1
Tripathy RK, Frohock Z, Wang H, Cary GA, Keegan S, Carter GW, Li Y. Effective integration of multi-omics with prior knowledge to identify biomarkers via explainable graph neural networks. NPJ Syst Biol Appl 2025; 11:43. [PMID: 40341543 PMCID: PMC12062277 DOI: 10.1038/s41540-025-00519-9] [Received: 10/06/2024] [Accepted: 04/11/2025] [Indexed: 05/10/2025] Open
Abstract
The rapid growth of multi-omics datasets and the wealth of biological knowledge necessitate the development of effective methods for their integration. Such methods are essential for building predictive models and identifying drug targets based on a limited number of samples. We propose a framework called GNNRAI for the supervised integration of multi-omics data with biological priors represented as knowledge graphs. Our framework leverages graph neural networks (GNNs) to model the correlation structures among features from high-dimensional 'omics data, which reduces the effective dimensionality of the data and enables us to analyze thousands of genes simultaneously using hundreds of samples. Furthermore, our framework incorporates explainability methods to elucidate informative biomarkers. We apply our framework to Alzheimer's disease (AD) multi-omics data, showing that the integration of transcriptomics and proteomics data with prior AD knowledge is effective, improving the prediction accuracy of AD status over single-omics analyses and highlighting both known and novel AD-predictive biomarkers.
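As a rough illustration of how a graph neural network lets each gene's features be informed by its neighbours in a prior-knowledge graph, the sketch below implements one generic graph-convolution step in NumPy. It is not the GNNRAI architecture: the four-gene graph, the feature values, and the function names `normalize_adjacency` and `gcn_layer` are assumptions made purely for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard graph-convolution layer."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops so a node keeps its own signal
    deg = a_hat.sum(axis=1)                 # node degrees (including the self-loop)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ features @ weights, 0.0)

# Hypothetical toy knowledge graph over 4 genes (edges are made up).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 2))  # e.g. one transcript and one protein value per gene
weights = rng.normal(size=(2, 3))   # in practice these would be learned, not random

hidden = gcn_layer(adj, features, weights)  # one 3-dimensional embedding per gene
```

Each row of `hidden` mixes a gene's own measurements with those of its graph neighbours, which is the sense in which graph structure can constrain a high-dimensional feature space.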
Affiliation(s)
- Rohit K Tripathy
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Zachary Frohock
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Hong Wang
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Yi Li
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
2
Anteghini M, Gualdi F, Oliva B. How did we get there? AI applications to biological networks and sequences. Comput Biol Med 2025; 190:110064. [PMID: 40184941 DOI: 10.1016/j.compbiomed.2025.110064] [Received: 10/28/2024] [Revised: 03/18/2025] [Accepted: 03/20/2025] [Indexed: 04/07/2025]
Abstract
The rapidly advancing field of artificial intelligence (AI) has transformed numerous scientific domains, including biology, where a vast and complex volume of data is available for analysis. This paper provides a comprehensive overview of the current state of AI-driven methodologies in genomics, proteomics, and systems biology. We discuss how machine learning algorithms, particularly deep learning models, have enhanced the accuracy and efficiency of embedding sequences, motif discovery, and the prediction of gene expression and protein structure. Additionally, we explore the integration of AI in the embedding and analysis of biological networks, including protein-protein interaction networks and multi-layered networks. By leveraging large-scale biological data, AI techniques have enabled unprecedented insights into complex biological processes and disease mechanisms. This work underlines the potential of applying AI to complex biological data, highlighting current applications and suggesting directions for future research to further explore AI in this rapidly evolving field.
Affiliation(s)
- Marco Anteghini
- BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy; Visual and Data-Centric Computing, Zuse Institut Berlin, Berlin, Germany.
- Francesco Gualdi
- Structural Bioinformatics Lab, Universitat Pompeu Fabra, Barcelona, Spain; Istituto dalle Molle di Studi sull'Intelligenza Artificiale, USI/SUPSI (Università Svizzera Italiana/Scuola Universitaria Professionale Svizzera Italiana) Lugano, Switzerland.
- Baldo Oliva
- Structural Bioinformatics Lab, Universitat Pompeu Fabra, Barcelona, Spain.
3
Karuppan Perumal MK, Rajan Renuka R, Kumar Subbiah S, Manickam Natarajan P. Artificial intelligence-driven clinical decision support systems for early detection and precision therapy in oral cancer: a mini review. Frontiers in Oral Health 2025; 6:1592428. [PMID: 40356851 PMCID: PMC12066789 DOI: 10.3389/froh.2025.1592428] [Received: 03/12/2025] [Accepted: 04/17/2025] [Indexed: 05/15/2025] Open
Abstract
Oral cancer (OC) is a significant global health burden, with life-saving improvements in survival and outcomes being dependent on early diagnosis and precise treatment planning. However, diagnosis and treatment planning are predicated on the synthesis of complicated information derived from clinical assessment, imaging, histopathology and patient histories. Artificial intelligence-based clinical decision support systems (AI-CDSS) provide a viable solution that can be implemented via advanced methodologies for data analysis and synthesis for better diagnostic and prognostic evaluation. This review presents AI-CDSS as a promising solution through advanced methodologies for comprehensive data analysis. In addition, it examines current implementations of AI-CDSS that facilitate early OC detection, precise staging, and personalized treatment planning by processing multimodal patient information through machine learning, computer vision, and natural language processing. These systems effectively interpret clinical results, identify critical disease patterns (including clinical stage, site, tumor dimensions, histopathologic grading, and molecular profiles), and construct comprehensive patient profiles. This comprehensive AI-CDSS approach allows for early cancer detection, a reduction in diagnostic delays and improved intervention outcomes. Moreover, AI-CDSS also optimize treatment plans on the basis of unique patient parameters, tumor stages and risk factors, providing personalized therapy.
Affiliation(s)
- Manoj Kumar Karuppan Perumal
- Centre for Stem Cell Mediated Advanced Research Therapeutics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Remya Rajan Renuka
- Centre for Stem Cell Mediated Advanced Research Therapeutics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Suresh Kumar Subbiah
- Centre for Stem Cell Mediated Advanced Research Therapeutics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Prabhu Manickam Natarajan
- Department of Clinical Sciences, College of Dentistry, Centre of Medical and Bio-Allied Health Sciences and Research, Ajman University, Ajman, United Arab Emirates
4
Magateshvaren Saras MA, Mitra MK, Tyagi S. Navigating the Multiverse: a Hitchhiker's guide to selecting harmonization methods for multimodal biomedical data. Biol Methods Protoc 2025; 10:bpaf028. [PMID: 40308831 PMCID: PMC12043205 DOI: 10.1093/biomethods/bpaf028] [Received: 01/21/2025] [Revised: 03/20/2025] [Accepted: 04/15/2025] [Indexed: 05/02/2025] Open
Abstract
The application of machine learning (ML) techniques in predictive modelling has greatly advanced our comprehension of biological systems. There is a notable shift towards integration methods that specifically target the simultaneous analysis of multiple modes or types of data, showcasing superior results compared to individual analyses. Despite the availability of diverse ML architectures for researchers interested in embracing a multimodal approach, the current literature lacks a comprehensive taxonomy that includes the pros and cons of these methods to guide the entire process. Closing this gap is imperative, necessitating the creation of a robust framework. This framework should not only categorize the diverse ML architectures suitable for multimodal analysis but also offer insights into their respective advantages and limitations. Additionally, such a framework can serve as a valuable guide for selecting an appropriate workflow for multimodal analysis. This comprehensive taxonomy would provide clear guidance and support informed decision-making within the progressively intricate landscape of biomedical and clinical data analysis. This is an essential step towards advancing personalized medicine. The aims of the work are to comprehensively study and describe the harmonization processes that are performed and reported in the literature and to present a working guide that would enable planning and selecting an appropriate integrative model. We present harmonization as a dual process of representation and integration, each with multiple methods and categories. The various representation and integration methods are classified into six broad categories and detailed with their advantages, disadvantages and examples. A guide flowchart describing the step-by-step processes that are needed to adopt a multimodal approach is also presented along with examples and references.
This review provides a thorough taxonomy of methods for harmonizing multimodal data and introduces a foundational 10-step guide for newcomers to implement a multimodal workflow.
Affiliation(s)
- Murali Aadhitya Magateshvaren Saras
- IITB-Monash Research Academy, Mumbai, Maharashtra 400076, India
- Department of Physics, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
- School of Translational Medicine, Monash University, Melbourne, Victoria 3181, Australia
- Mithun K Mitra
- Department of Physics, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
- Sonika Tyagi
- School of Translational Medicine, Monash University, Melbourne, Victoria 3181, Australia
- School of Computing Technologies, RMIT University, Melbourne, Victoria 3001, Australia
5
Yates J, Van Allen EM. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell 2025; 43:708-727. [PMID: 40233719 PMCID: PMC12007700 DOI: 10.1016/j.ccell.2025.03.018] [Received: 01/24/2025] [Revised: 03/04/2025] [Accepted: 03/12/2025] [Indexed: 04/17/2025]
Abstract
Artificial intelligence (AI) is increasingly being utilized in cancer research as a computational strategy for analyzing multiomics datasets. Advances in single-cell and spatial profiling technologies have contributed significantly to our understanding of tumor biology, and AI methodologies are now being applied to accelerate translational efforts, including target discovery, biomarker identification, patient stratification, and therapeutic response prediction. Despite these advancements, the integration of AI into clinical workflows remains limited, presenting both challenges and opportunities. This review discusses AI applications in multiomics analysis and translational oncology, emphasizing their role in advancing biological discoveries and informing clinical decision-making. Key areas of focus include cellular heterogeneity, tumor microenvironment interactions, and AI-aided diagnostics. Challenges such as reproducibility, interpretability of AI models, and clinical integration are explored, with attention to strategies for addressing these hurdles. Together, these developments underscore the potential of AI and multiomics to enhance precision oncology and contribute to advancements in cancer care.
Affiliation(s)
- Josephine Yates
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Institute for Machine Learning, Department of Computer Science, ETH Zürich, Zurich, Switzerland; ETH AI Center, ETH Zurich, Zurich, Switzerland; Swiss Institute for Bioinformatics (SIB), Lausanne, Switzerland
- Eliezer M Van Allen
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Medical Sciences, Harvard University, Boston, MA, USA; Parker Institute for Cancer Immunotherapy, Dana-Farber Cancer Institute, Boston, MA, USA.
6
Llinas-Bertran A, Butjosa-Espín M, Barberi V, Seoane JA. Multimodal data integration in early-stage breast cancer. Breast 2025; 80:103892. [PMID: 39922065 PMCID: PMC11973824 DOI: 10.1016/j.breast.2025.103892] [Received: 10/10/2024] [Revised: 12/13/2024] [Accepted: 01/27/2025] [Indexed: 02/10/2025] Open
Abstract
The use of biomarkers in breast cancer has significantly improved patient outcomes through targeted therapies, such as hormone therapy, anti-HER2 therapy, and CDK4/6 or PARP inhibitors. However, existing knowledge does not fully encompass the diverse nature of breast cancer, particularly in triple-negative tumors. The integration of multi-omics and multimodal data has the potential to provide new insights into biological processes, to improve breast cancer patient stratification, enhance prognosis and response prediction, and identify new biomarkers. This review presents a comprehensive overview of the state-of-the-art multimodal (including molecular and image) data integration algorithms that have been developed and are applicable to breast cancer stratification, prognosis, or biomarker identification. We examined the primary challenges and opportunities of these multimodal data integration algorithms, including their advantages, limitations, and critical considerations for future research. We aimed to describe models that are not only academically and preclinically relevant, but also applicable to clinical settings.
Affiliation(s)
- Arnau Llinas-Bertran
- Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Maria Butjosa-Espín
- Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Vittoria Barberi
- Breast Cancer Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Jose A Seoane
- Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain.
7
Zhang S, Lv J, Zhang J, Fan Z, Gu B, Fan B, Li C, Wang C, Zhang T. Benchmarking multi-omics integrative clustering methods for subtype identification in colorectal cancer. Computer Methods and Programs in Biomedicine 2025; 261:108603. [PMID: 39826483 DOI: 10.1016/j.cmpb.2025.108603] [Received: 06/05/2024] [Revised: 11/27/2024] [Accepted: 01/12/2025] [Indexed: 01/22/2025]
Abstract
BACKGROUND AND OBJECTIVE: Colorectal cancer (CRC) is a heterogeneous malignancy with a concerning global burden of incidence and mortality. The traditional tumor-node-metastasis staging system has exhibited certain limitations. With the advancement of omics technologies, researchers are directing their focus on developing a more precise multi-omics molecular classification. The use of unsupervised multi-omics integrative clustering methods in CRC therefore calls for a comprehensive benchmark with practical guidelines.
METHODS: In this study, we obtained CRC multi-omics data, encompassing DNA methylation, gene expression, and protein expression, from The Cancer Genome Atlas (TCGA) database. We then generated interrelated CRC multi-omics data with various structures based on realistic multi-omics correlations, and performed a comprehensive evaluation of eight representative methods categorized as early integration, intermediate integration, and late integration using complementary benchmarks for subtype classification accuracy. Lastly, we employed these methods to integrate real-world CRC multi-omics data; survival and differential analyses were used to highlight differences among newly identified multi-omics subtypes.
RESULTS: Through in-depth comparisons, we observed that similarity network fusion (SNF) exhibited exceptional performance in integrating multi-omics data derived from simulations. Additionally, SNF effectively distinguished CRC patients into five subgroups with the highest classification accuracy. Moreover, we found significant survival differences and molecular distinctions among SNF subtypes.
CONCLUSIONS: The findings consistently demonstrate that SNF outperforms other methods in CRC multi-omics integrative clustering. The significant survival differences and molecular distinctions among SNF subtypes provide novel insights into the multi-omics perspective on CRC heterogeneity, with potential implications for clinical treatment.
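To make the distinction between integration levels more concrete, the following sketch contrasts feature-level (early) integration, where standardized omics matrices are concatenated, with fusion at the patient-similarity level. The data are random stand-ins and the helper names `zscore` and `rbf_similarity` are assumptions. The fusion step here is a plain average of per-omics similarity matrices, a deliberate simplification: it does not reproduce SNF's iterative cross-network diffusion or any of the eight benchmarked methods.

```python
import numpy as np

def zscore(x):
    """Standardize each feature (column) to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def rbf_similarity(x):
    """Patient-by-patient similarity from squared Euclidean distances."""
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    bandwidth = np.median(sq_dist[sq_dist > 0])  # simple data-driven scale
    return np.exp(-sq_dist / bandwidth)

# Random placeholder data: 6 patients, three omics layers of different widths.
rng = np.random.default_rng(0)
n_patients = 6
omics = {
    "methylation": rng.normal(size=(n_patients, 5)),
    "expression": rng.normal(size=(n_patients, 8)),
    "protein": rng.normal(size=(n_patients, 3)),
}

# Early integration: concatenate standardized features, then compare patients once.
early_features = np.hstack([zscore(m) for m in omics.values()])
early_similarity = rbf_similarity(early_features)

# Similarity-level fusion (simplified): one network per omics layer, then average.
fused_similarity = np.mean(
    [rbf_similarity(zscore(m)) for m in omics.values()], axis=0
)
```

Either similarity matrix could then be passed to a clustering step; averaging gives each omics layer equal weight regardless of how many features it has, whereas concatenation lets wider layers dominate the distances.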
Affiliation(s)
- Shuai Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Jiali Lv
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Jinglan Zhang
- School of Life Science, Shandong University, Qingdao, 266237, China
- Zhe Fan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Bingbing Gu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Bingbing Fan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Chunxia Li
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Cheng Wang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China.
- Tao Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China; Department of Epidemiology and Biostatistics, School of Public Health, Tianjin Medical University, Tianjin, 300070, China.
8
Gliozzo J, Soto Gomez MA, Bonometti A, Patak A, Casiraghi E, Valentini G. miss-SNF: a multimodal patient similarity network integration approach to handle completely missing data sources. Bioinformatics 2025; 41:btaf150. [PMID: 40184204 PMCID: PMC12011365 DOI: 10.1093/bioinformatics/btaf150] [Received: 01/22/2025] [Revised: 03/06/2025] [Accepted: 04/02/2025] [Indexed: 04/05/2025] Open
Abstract
MOTIVATION: Precision medicine leverages patient-specific multimodal data to improve prevention, diagnosis, prognosis, and treatment of diseases. Advancing precision medicine requires the non-trivial integration of complex, heterogeneous, and potentially high-dimensional data sources, such as multi-omics and clinical data. In the literature, several approaches have been proposed to manage missing data, but they are usually limited to the recovery of subsets of features for a subset of patients. A largely overlooked problem is the integration of multiple sources of data when one or more of them are completely missing for a subset of patients, a relatively common condition in clinical practice.
RESULTS: We propose miss-Similarity Network Fusion (miss-SNF), a novel general-purpose data integration approach designed to manage completely missing data in the context of patient similarity networks. miss-SNF integrates incomplete unimodal patient similarity networks by leveraging a non-linear message-passing strategy borrowed from the SNF algorithm. miss-SNF is able to recover missing patient similarities and is "task agnostic", in the sense that it can integrate partial data for both unsupervised and supervised prediction tasks. Experimental analyses on nine cancer datasets from The Cancer Genome Atlas (TCGA) demonstrate that miss-SNF achieves state-of-the-art results in recovering similarities and in identifying patient subgroups enriched in clinically relevant variables and having differential survival. Moreover, amputation experiments show that miss-SNF supervised prediction of cancer clinical outcomes and Alzheimer's disease diagnosis with completely missing data achieves results comparable to those obtained when all the data are available.
AVAILABILITY AND IMPLEMENTATION: miss-SNF code, implemented in R, is available at https://github.com/AnacletoLAB/missSNF.
Affiliation(s)
- Jessica Gliozzo
- AnacletoLab, Dipartimento di Informatica “Giovanni Degli Antoni”, Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, 20133, Italy
- European Commission, Joint Research Centre (JRC), Ispra, 21027, Italy
- Mauricio A Soto Gomez
- AnacletoLab, Dipartimento di Informatica “Giovanni Degli Antoni”, Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, 20133, Italy
- Arturo Bonometti
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele (MI), 20072, Italy
- Department of Pathology, IRCCS Humanitas Clinical and Research Hospital, Via Alessandro Manzoni 56, Rozzano (MI), 20089, Italy
- Alex Patak
- European Commission, Joint Research Centre (JRC), Ispra, 21027, Italy
- Elena Casiraghi
- AnacletoLab, Dipartimento di Informatica “Giovanni Degli Antoni”, Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, 20133, Italy
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, United States
- Milan Unit, ELLIS—European Laboratory for Learning and Intelligent Systems, Italy
- Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica “Giovanni Degli Antoni”, Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, 20133, Italy
- Milan Unit, ELLIS—European Laboratory for Learning and Intelligent Systems, Italy
9
Tan CY, Ong HF, Lim CH, Tan MS, Ooi EH, Wong K. Amogel: a multi-omics classification framework using associative graph neural networks with prior knowledge for biomarker identification. BMC Bioinformatics 2025; 26:94. [PMID: 40155814 PMCID: PMC11954243 DOI: 10.1186/s12859-025-06111-6] [Received: 11/29/2024] [Accepted: 03/10/2025] [Indexed: 04/01/2025] Open
Abstract
The advent of high-throughput sequencing technologies, such as DNA microarray and DNA sequencing, has enabled effective analysis of cancer subtypes and targeted treatment. Furthermore, numerous studies have highlighted the capability of graph neural networks (GNNs) to model complex biological systems and capture non-linear interactions in high-throughput data. GNNs have proven useful in leveraging multiple types of omics data, including prior biological knowledge from various sources, such as transcriptomics, genomics, proteomics, and metabolomics, to improve cancer classification. However, current works do not fully utilize the non-linear learning potential of GNNs and lack the ability to analyse high-throughput multi-omics data simultaneously with prior biological knowledge. Nevertheless, relying on limited prior knowledge in generating gene graphs might lead to less accurate classification due to undiscovered significant gene-gene interactions, which may require expert intervention and can be time-consuming. Hence, this study proposes a graph classification model called associative multi-omics graph embedding learning (AMOGEL) to effectively integrate multi-omics datasets and prior knowledge through GNNs coupled with association rule mining (ARM). AMOGEL employs an early fusion technique using ARM to mine intra-omics and inter-omics relationships, forming a multi-omics synthetic information graph before model training. Moreover, AMOGEL introduces multi-dimensional edges, with multi-omics gene associations or edges as the main contributors and prior knowledge edges as auxiliary contributors. Additionally, it uses a gene ranking technique based on attention scores, considering the relationships between neighbouring genes.
Several experiments were performed on BRCA and KIPAN cancer subtypes to demonstrate the integration of multi-omics datasets (miRNA, mRNA, and DNA methylation) with prior biological knowledge of protein-protein interactions, KEGG pathways and Gene Ontology. The experimental results showed that the AMOGEL outperformed the current state-of-the-art models in terms of classification accuracy, F1 score and AUC score. The findings of this study represent a crucial step forward in advancing the effective integration of multi-omics data and prior knowledge to improve cancer subtype classification.
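For readers unfamiliar with association rule mining, the sketch below computes the two standard quantities, support and confidence, for a rule linking features from two omics layers. It is only a generic illustration of ARM, not AMOGEL's mining procedure: the discretized sample states, the gene and miRNA labels (`GENE_A`, `MIR_X`), and the helper names are all hypothetical.

```python
def support(transactions, itemset):
    """Fraction of samples that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support of both over support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical samples, each described by discretized multi-omics states.
transactions = [
    {"mRNA:GENE_A=high", "meth:GENE_A=low", "miRNA:MIR_X=low"},
    {"mRNA:GENE_A=high", "meth:GENE_A=low"},
    {"mRNA:GENE_A=low", "meth:GENE_A=high", "miRNA:MIR_X=high"},
    {"mRNA:GENE_A=high", "miRNA:MIR_X=low"},
]

# Rule: low methylation of GENE_A  ->  high mRNA expression of GENE_A
rule_support = support(transactions, {"meth:GENE_A=low", "mRNA:GENE_A=high"})
rule_confidence = confidence(transactions, {"meth:GENE_A=low"}, {"mRNA:GENE_A=high"})
```

In this toy data the rule holds in two of four samples (support 0.5) and in every sample where methylation is low (confidence 1.0); rules passing chosen thresholds could then be treated as candidate edges in a feature graph.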
Affiliation(s)
- Chia Yan Tan
- School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia.
- Huey Fang Ong
- School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- Chern Hong Lim
- School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- Mei Sze Tan
- School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- Ean Hin Ooi
- School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
- KokSheik Wong
- School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Petaling Jaya, Selangor, Malaysia
10
Liao L, Xie M, Zheng X, Zhou Z, Deng Z, Gao J. Molecular insights fast-tracked: AI in biosynthetic pathway research. Nat Prod Rep 2025. [PMID: 40130306 DOI: 10.1039/d4np00003j] [Indexed: 03/26/2025]
Abstract
Covering: 2000 to 2025
This review explores the potential of artificial intelligence (AI) in addressing challenges and accelerating molecular insights in biosynthetic pathway research, which is crucial for developing bioactive natural products with applications in pharmacology, agriculture, and biotechnology. It provides an overview of various AI techniques relevant to this research field, including machine learning (ML), deep learning (DL), natural language processing, network analysis, and data mining. AI-powered applications across three main areas, namely, pathway discovery and mining, pathway design, and pathway optimization, are discussed, and the benefits and challenges of integrating omics data and AI for enhanced pathway research are also elucidated. This review also addresses the current limitations, future directions, and the importance of synergy between AI and experimental approaches in unlocking rapid advancements in biosynthetic pathway research. The review concludes with an evaluation of AI's current capabilities and future outlook, emphasizing the transformative impact of AI on biosynthetic pathway research and the potential for new opportunities in the discovery and optimization of bioactive natural products.
Affiliation(s)
- Lijuan Liao
- Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, P. R. China
- Mengjun Xie
- Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
- Xiaoshan Zheng
- Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
- Zhao Zhou
- Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
- Zixin Deng
- State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Jiangtao Gao
- Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
11
Zhang H, Goedegebuure SP, Ding L, DeNardo D, Fields RC, Province M, Chen Y, Payne P, Li F. M3NetFlow: A multi-scale multi-hop graph AI model for integrative multi-omic data analysis. iScience 2025; 28:111920. [PMID: 40034855 PMCID: PMC11872513 DOI: 10.1016/j.isci.2025.111920] [Received: 03/21/2024] [Revised: 10/17/2024] [Accepted: 01/27/2025] [Indexed: 03/05/2025] Open
Abstract
Multi-omic data-driven studies are at the forefront of precision medicine by characterizing complex disease signaling systems across multiple views and levels. The integration and interpretation of multi-omic data are critical for identifying disease targets and deciphering disease signaling pathways. However, it remains an open problem due to the complex signaling interactions among many proteins. Herein, we propose a multi-scale multi-hop multi-omic network flow model, M3NetFlow, to facilitate both hypothesis-guided and generic multi-omic data analysis tasks. We evaluated M3NetFlow using two independent case studies: (1) uncovering mechanisms of synergy of drug combinations (hypothesis/anchor-target guided multi-omic analysis) and (2) identifying biomarkers of Alzheimer's disease (generic multi-omic analysis). The evaluation and comparison results showed that M3NetFlow achieved the best prediction accuracy and identified a set of drug combination synergy- and disease-associated targets. The model can be directly applied to other multi-omic data-driven studies.
Affiliation(s)
- Heming Zhang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University in St. Louis, St. Louis, MO, USA
- S. Peter Goedegebuure
- Department of Surgery, Washington University in St. Louis, St. Louis, MO, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Li Ding
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- David DeNardo
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Ryan C. Fields
- Department of Surgery, Washington University in St. Louis, St. Louis, MO, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Michael Province
- Division of Statistical Genomics, Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA
- Yixin Chen
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Philip Payne
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University in St. Louis, St. Louis, MO, USA
- Fuhai Li
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University in St. Louis, St. Louis, MO, USA
- Division of Statistical Genomics, Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA
- Department of Pediatrics, Washington University in St. Louis, St. Louis, MO, USA
12
|
Zhang W, Huang H, Wang L, Lehmann BD, Chen SX. An Integrative Multi-Omics Random Forest Framework for Robust Biomarker Discovery. bioRxiv 2025:2025.03.05.641533. [PMID: 40093058 PMCID: PMC11908250 DOI: 10.1101/2025.03.05.641533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2025]
Abstract
High-throughput technologies now produce a wide array of omics data, from genomic and transcriptomic profiles to epigenomic and proteomic measurements. Integrating these diverse data types can yield deeper insights into the biological mechanisms driving complex traits and diseases. Yet, extracting key shared biomarkers from multiple data layers remains a major challenge. We present a multivariate random forest (MRF)-based framework enhanced by a novel inverse minimal depth (IMD) metric for integrative variable selection. By assigning response variables to tree nodes and employing IMD to rank predictors, our approach efficiently identifies essential features across different omics types, even when confronted with high-dimensionality and noise. Through extensive simulations and analyses of multi-omics datasets from The Cancer Genome Atlas, we demonstrate that our method outperforms established integrative techniques in uncovering biologically meaningful biomarkers and pathways. Our findings show that selected biomarkers not only correlate with known regulatory and signaling networks but can also stratify patient subgroups with distinct clinical outcomes. The method's scalable, interpretable, and user-friendly implementation ensures broad applicability to a range of research questions. This MRF-based framework advances robust biomarker discovery and integrative multi-omics analyses, accelerating the translation of complex molecular data into tangible biological and clinical insights.
Affiliation(s)
- Wei Zhang
  - Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miller School of Medicine, Miami, FL 33136, USA
- Hanchen Huang
  - Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miller School of Medicine, Miami, FL 33136, USA
- Lily Wang
  - Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miller School of Medicine, Miami, FL 33136, USA
  - Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL 33136, USA
  - Dr. John T Macdonald Foundation Department of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL 33136, USA
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Brian D. Lehmann
  - Division of Hematology and Oncology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Steven X. Chen
  - Division of Biostatistics, Department of Public Health Sciences, University of Miami, Miller School of Medicine, Miami, FL 33136, USA
  - Sylvester Comprehensive Cancer Center, University of Miami, Miller School of Medicine, Miami, FL 33136, USA
|
13
|
Zhi P, Liu Y, Zhao C, He K. GCBRGCN: Integration of ceRNA and RGCN to Identify Gastric Cancer Biomarkers. Bioengineering (Basel) 2025; 12:255. [PMID: 40150719 PMCID: PMC11939766 DOI: 10.3390/bioengineering12030255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2025] [Revised: 02/26/2025] [Accepted: 02/27/2025] [Indexed: 03/29/2025] Open
Abstract
Gastric cancer (GC) is a prevalent malignancy, and the discovery of biomarkers plays a crucial role in the diagnosis and prognosis of GC. However, current strategies for identifying GC biomarkers often focus on a single ribonucleic acid (RNA) class, neglecting the potential for multiple RNA types to collectively serve as biomarkers with improved predictive capabilities. To bridge this gap, our study introduces the GC biomarker relation graph convolution neural network (GCBRGCN) model, which integrates the competing endogenous RNA (ceRNA) network with GC clinical information and whole-transcriptome data, leveraging the relational graph convolutional network (RGCN) to predict GC biomarkers. The model surpasses traditional machine learning and graph neural network algorithms, achieving an area under the curve (AUC) of 0.8172 in the task of predicting GC biomarkers. Our study identified three previously unreported potential GC biomarkers: CCNG1, CYP1B1, and CITED2. Moreover, FOXC1 and LINC00324 were characterized as biomarkers with significance in both prognosis and diagnosis. Our work offers a novel framework for GC biomarker identification, highlighting the critical role of interactions among multiple RNA types in oncological research.
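For orientation, the layer update usually quoted for relational graph convolutional networks is reproduced below. The abstract does not describe GCBRGCN's exact architecture, so this is the generic formulation rather than the authors' model; the symbols (node state $h_i^{(l)}$, relation set $\mathcal{R}$, relation-specific neighbors $\mathcal{N}_i^r$, normalization constant $c_{i,r}$, nonlinearity $\sigma$) are assumptions with respect to this paper.

```latex
h_i^{(l+1)} = \sigma\left( W_0^{(l)} h_i^{(l)}
  + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r}
    \frac{1}{c_{i,r}} \, W_r^{(l)} h_j^{(l)} \right)
```

In a ceRNA setting, each relation $r$ could plausibly correspond to one interaction type (for example miRNA-mRNA or miRNA-lncRNA edges), so that each interaction type gets its own weight matrix $W_r^{(l)}$; that mapping is a hypothetical illustration, not a detail stated in the abstract.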
Affiliation(s)
- Peng Zhi
  - Chinese PLA Medical School, Chinese PLA General Hospital, Beijing 100853, China
  - Medical Innovation Research Department of PLA General Hospital, Chinese PLA General Hospital, Beijing 100853, China
  - Key Laboratory for Research and Evaluation of Artificial Intelligence Medical Devices, Chinese PLA General Hospital, Beijing 100853, China
  - Medical Engineering Laboratory of Chinese PLA General Hospital, Chinese PLA General Hospital, Beijing 100853, China
  - Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing 100853, China
- Yue Liu
  - School of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
- Chenghui Zhao
  - Medical Innovation Research Department of PLA General Hospital, Chinese PLA General Hospital, Beijing 100853, China
  - Key Laboratory for Research and Evaluation of Artificial Intelligence Medical Devices, Chinese PLA General Hospital, Beijing 100853, China
  - Medical Engineering Laboratory of Chinese PLA General Hospital, Chinese PLA General Hospital, Beijing 100853, China
  - Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing 100853, China
- Kunlun He
  - Medical Innovation Research Department of PLA General Hospital, Chinese PLA General Hospital, Beijing 100853, China
  - Key Laboratory for Research and Evaluation of Artificial Intelligence Medical Devices, Chinese PLA General Hospital, Beijing 100853, China
  - Medical Engineering Laboratory of Chinese PLA General Hospital, Chinese PLA General Hospital, Beijing 100853, China
  - Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing 100853, China
|
14
|
Gurazada SGR, Kennedy HM, Braatz RD, Mehrman SJ, Polson SW, Rombel IT. HEK-omics: The promise of omics to optimize HEK293 for recombinant adeno-associated virus (rAAV) gene therapy manufacturing. Biotechnol Adv 2025; 79:108506. [PMID: 39708987 DOI: 10.1016/j.biotechadv.2024.108506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Revised: 11/14/2024] [Accepted: 12/15/2024] [Indexed: 12/23/2024]
Abstract
Gene therapy is poised to transition from niche to mainstream medicine, with recombinant adeno-associated virus (rAAV) as the vector of choice. However, robust, scalable, industrialized production is required to meet demand and provide affordable patient access, which has not yet materialized. Closing the chasm between demand and supply requires innovation in biomanufacturing to achieve the essential step change in rAAV product yield and quality. Omics provides a rich source of mechanistic knowledge that can be applied to HEK293, the most commonly used cell line for rAAV production. In this review, the findings from a growing number of diverse studies that apply genomics, epigenomics, transcriptomics, proteomics, and metabolomics to HEK293 bioproduction are explored. Learnings from CHO-omics, application of omics approaches to improve CHO bioproduction, provide a framework to explore the potential of "HEK-omics" as a multi-omics-informed approach providing actionable mechanistic insights for improved transient and stable production of rAAV and other recombinant products in HEK293.
Affiliation(s)
- Sai Guna Ranjan Gurazada
  - Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, DE, United States
- Richard D Braatz
  - Massachusetts Institute of Technology, Cambridge, MA, United States
- Steven J Mehrman
  - Johnson & Johnson, J&J Innovative Medicine, Spring House, PA, United States
- Shawn W Polson
  - Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, DE, United States
|
15
|
Pateras J, Lodi M, Rana P, Ghosh P. Heterogeneous Clustering of Multiomics Data for Breast Cancer Subgroup Classification and Detection. Int J Mol Sci 2025; 26:1707. [PMID: 40004168 PMCID: PMC11855380 DOI: 10.3390/ijms26041707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Revised: 02/10/2025] [Accepted: 02/11/2025] [Indexed: 02/27/2025] Open
Abstract
The rapid growth of diverse -omics datasets has made multiomics data integration crucial in cancer research. This study adapts the expectation-maximization routine for the joint latent variable modeling of multiomics patient profiles. By combining this approach with traditional biological feature selection methods, this study optimizes latent distribution, enabling efficient patient clustering from well-studied cancer types with reduced computational expense. The proposed optimization subroutines enhance survival analysis and improve runtime performance. This article presents a framework for distinguishing cancer subtypes and identifying potential biomarkers for breast cancer. Key insights into individual subtype expression and function were obtained through differentially expressed gene analysis and pathway enrichment for BRCA patients. The analysis compared 302 tumor samples to 113 normal samples across 60,660 genes. The highly upregulated gene COL10A1, promoting breast cancer progression and poor prognosis, and the consistently downregulated gene CD300LG, linked to brain metastatic cancer, were identified. Pathway enrichment analysis revealed similarities in cellular matrix organization pathways across subtypes, with notable differences in functions like cell proliferation regulation and endocytosis by host cells. GO Semantic Similarity analysis quantified gene relationships in each subtype, identifying potential biomarkers like MATN2, similar to COL10A1. These insights suggest deeper relationships within clusters and highlight personalized treatment potential based on subtypes.
Affiliation(s)
- Joseph Pateras
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Musaddiq Lodi
  - Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA 23284, USA
- Pratip Rana
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Preetam Ghosh
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
16
|
Xu Y, Jiang X, Hu Z. Synergizing metabolomics and artificial intelligence for advancing precision oncology. Trends Mol Med 2025:S1471-4914(25)00016-4. [PMID: 39956738 DOI: 10.1016/j.molmed.2025.01.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2024] [Revised: 01/22/2025] [Accepted: 01/24/2025] [Indexed: 02/18/2025]
Abstract
Metabolomics has emerged as a transformative tool in precision oncology, with substantial potential for advancing biomarker discovery, monitoring treatment responses, and aiding drug development. Integrating artificial intelligence (AI) into metabolomics optimizes data acquisition and analysis, facilitating the interpretation of complex metabolic networks and enabling more effective multiomics integration. In this opinion, we explore recent advances in the application of metabolomics within precision oncology, emphasizing the unique advantages that AI-driven metabolomics offers. We propose that AI not only complements but also amplifies the potential of current platforms, accelerating research progress and ultimately improving patient outcomes. Finally, we discuss the opportunities and challenges involved in translating AI-driven metabolomics into clinical practice for precision oncology.
Affiliation(s)
- Yipeng Xu
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Xiaojuan Jiang
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
- Zeping Hu
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
  - Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing, 100084, China
|
17
|
Alharbi F, Vakanski A, Zhang B, Elbashir MK, Mohammed M. Comparative Analysis of Multi-Omics Integration Using Graph Neural Networks for Cancer Classification. IEEE Access 2025; 13:37724-37736. [PMID: 40123934 PMCID: PMC11928009 DOI: 10.1109/access.2025.3540769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025]
Abstract
Recent studies on integrating multiple omics data highlighted the potential to advance our understanding of the cancer disease process. Computational models based on graph neural networks and attention-based architectures have demonstrated promising results for cancer classification due to their ability to model complex relationships among biological entities. However, challenges related to addressing the high dimensionality and complexity in integrating multi-omics data, as well as in constructing graph structures that effectively capture the interactions between nodes, remain active areas of research. This study evaluates graph neural network architectures for multi-omics (MO) data integration based on graph-convolutional networks (GCN), graph-attention networks (GAT), and graph-transformer networks (GTN). Differential gene expression and LASSO (Least Absolute Shrinkage and Selection Operator) regression are employed for reducing the omics data dimensionality and feature selection; hence, the developed models are referred to as LASSO-MOGCN, LASSO-MOGAT, and LASSO-MOGTN. Graph structures constructed using sample correlation matrices and protein-protein interaction networks are investigated. Experimental validation is performed with a dataset of 8,464 samples from 31 cancer types and normal tissue, comprising messenger-RNA, micro-RNA, and DNA methylation data. The results show that the models integrating multi-omics data outperformed the models trained on single omics data, where LASSO-MOGAT achieved the best overall performance, with an accuracy of 95.9%. The findings also suggest that correlation-based graph structures enhance the models' ability to identify shared cancer-specific signatures across patients in comparison to protein-protein interaction networks-based graph structures. The code and data used in this study are available in the link (https://github.com/FadiAlharbi2024/Graph_Based_Architecture.git).
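One way to picture the correlation-based graph construction mentioned in this abstract is the sketch below, which links two samples whenever the Pearson correlation of their feature profiles reaches a cutoff. The function names, the 0.5 threshold, and the toy profiles are illustrative assumptions; they are not taken from the paper or its repository, and the published models may build their graphs differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_graph(profiles, threshold=0.5):
    """Return undirected edges (i, j), i < j, for sample pairs with correlation >= threshold."""
    edges = set()
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if pearson(profiles[i], profiles[j]) >= threshold:
                edges.add((i, j))
    return edges


# Hypothetical toy data: samples 0 and 1 share one expression pattern,
# samples 2 and 3 share the opposite pattern.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.0],
]
edges = correlation_graph(profiles)
```

With a positive threshold, anti-correlated samples stay unconnected, so the resulting graph groups samples that share a signature; whether to also link strongly negative correlations is a design choice the abstract does not address.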
Affiliation(s)
- Fadi Alharbi
  - College of Engineering, Department of Computer Science, University of Idaho, Moscow, ID 83844, USA
- Aleksandar Vakanski
  - College of Engineering, Department of Computer Science, University of Idaho, Moscow, ID 83844, USA
- Boyu Zhang
  - College of Engineering, Department of Computer Science, University of Idaho, Moscow, ID 83844, USA
- Murtada K Elbashir
  - College of Computer and Information Sciences, Department of Information Systems, Jouf University, Sakaka, Al-Jouf 72441, Saudi Arabia
- Mohanad Mohammed
  - School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg 3209, South Africa
|
18
|
Zhang Y, Zheng H, Meng X, Wang Q, Li Z, Wu W. MOCapsNet: Multiomics Data Integration for Cancer Subtype Analysis Based on Dynamic Self-Attention Learning and Capsule Networks. J Chem Inf Model 2025; 65:1653-1665. [PMID: 39818771 DOI: 10.1021/acs.jcim.4c02130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
Background and Objective: With the rapid accumulation of large-scale multiomics data sets, integrating various omics data to study disease from multiple perspectives can provide stronger support for precise treatment. However, due to the complexity of multiomics data, feature redundancy and noise often do not receive sufficient attention when processing high-dimensional data. Moreover, simple concatenation strategies may overlook the correlations between different omics data, failing to effectively capture the unique information inherent in multiomics data. Meanwhile, deep neural networks often rely on complex structures and numerous parameters for training and inference, making their internal feature representations and decision-making processes difficult to interpret. Methods: We propose an interpretable multiomics data integration method for cancer subtype classification, named MOCapsNet, based on self-attention and capsule networks. Specifically, the self-attention confidence learning module is implemented to assess the feature information within each omic and to assign weights to the embedded representations of various groups, resulting in more targeted integrated information. Furthermore, the capsule network structure is employed for the final cancer classification task. Results: The model achieved strong performance on both tasks: 87.8% accuracy on the BRCA multiclass classification data set and 83.6% accuracy with an AUC of 88.8% on the LGG data set. Conclusions: The proposed framework was tested extensively on omics data sets and consistently proved effective in integrating multiomics data. It improves classification accuracy while enhancing the interpretability of results by fully utilizing the feature information.
Affiliation(s)
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, Shandong, China
- Haoyu Zheng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, Shandong, China
- Xiaokun Meng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, Shandong, China
- Qihao Wang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, Shandong, China
- Zimin Li
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, Shandong, China
- Wenhao Wu
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, Shandong, China
|
19
|
Yang H, Zhang X, Jia Z, Wang H, Wu J, Wei X, Huang Y, Yan W, Lin Y. Targeting ferroptosis in prostate cancer management: molecular mechanisms, multidisciplinary strategies and translational perspectives. J Transl Med 2025; 23:166. [PMID: 39920771 PMCID: PMC11806579 DOI: 10.1186/s12967-025-06180-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2024] [Accepted: 01/25/2025] [Indexed: 02/09/2025] Open
Abstract
Prostate cancer (PCa) is a malignant solid tumor commonly observed among males worldwide. Increasing incidence combined with therapeutic resistance has become the leading issue in PCa clinical management. Ferroptosis is a new form of regulated cell death caused by iron-dependent lipid peroxidation, which has a dual role in PCa evolution and treatment due to the multi-omics cascade of interactions among pathways and environmental stimuli. Hence, deciphering the role of ferroptosis in carcinogenesis would provide novel insights and strategies for precision medicine and personalized healthcare against PCa. In this study, the mechanisms of ferroptosis during cancer development were summarized at both the molecular and tumor microenvironment levels. Then, literature-reported ferroptosis-related signatures in PCa, e.g., genes, non-coding RNAs, metabolites, natural products and drug components, were manually collected and functionally compared as drivers/inducers, suppressors/inhibitors, and biomarkers according to their regulatory patterns in PCa ferroptosis and pathogenesis. The state-of-the-art techniques for ferroptosis-related data integration, knowledge identification, and translational application to PCa theranostics were discussed from the combined perspective of artificial intelligence-powered modelling and advanced material-oriented therapeutic scheme design. Finally, the prospects and challenges in ferroptosis-based PCa research were highlighted to guide the translation of current findings from bench to bedside.
Affiliation(s)
- Hubo Yang
  - Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Xuefeng Zhang
  - Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Zongming Jia
  - Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- He Wang
  - Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Jixiang Wu
  - Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Xuedong Wei
  - Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Yuhua Huang
  - Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Wenying Yan
  - Suzhou Key Lab of Multi-modal Data Fusion and Intelligent Healthcare, Suzhou, 215104, China
  - School of Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, 215123, China
- Yuxin Lin
  - Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
  - Suzhou Key Lab of Multi-modal Data Fusion and Intelligent Healthcare, Suzhou, 215104, China
|
20
|
Monette A, Aguilar-Mahecha A, Altinmakas E, Angelos MG, Assad N, Batist G, Bommareddy PK, Bonilla DL, Borchers CH, Church SE, Ciliberto G, Cogdill AP, Fattore L, Hacohen N, Haris M, Lacasse V, Lie WR, Mehta A, Ruella M, Sater HA, Spatz A, Taouli B, Tarhoni I, Gonzalez-Kozlova E, Tirosh I, Wang X, Gnjatic S. The Society for Immunotherapy of Cancer Perspective on Tissue-Based Technologies for Immuno-Oncology Biomarker Discovery and Application. Clin Cancer Res 2025; 31:439-456. [PMID: 39625818 DOI: 10.1158/1078-0432.ccr-24-2469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Revised: 09/27/2024] [Accepted: 11/12/2024] [Indexed: 02/04/2025]
Abstract
With immuno-oncology becoming the standard of care for a variety of cancers, identifying biomarkers that reliably classify patient response, resistance, or toxicity becomes the next critical barrier toward improving care. Multiparametric, multi-omics, and computational platforms generating an unprecedented depth of data are poised to usher in the discovery of increasingly robust biomarkers for enhanced patient selection and personalized treatment approaches. Deciding which developing technologies to ultimately implement in clinical settings, applied either alone or in combination, relies on weighing pros and cons, from minimizing patient sampling to maximizing data outputs, and assessing the reproducibility and representativeness of findings, while lessening data fragmentation toward harmonization. These factors are all assessed while taking into consideration the shortest turnaround time. The Society for Immunotherapy of Cancer Biomarkers Committee convened to identify important advances in biomarker technologies and to address advances in biomarker discovery using multiplexed IHC and immunofluorescence, their coupling to single-cell transcriptomics, along with mass spectrometry-based quantitative and spatially resolved proteomics imaging technologies. We summarize key metrics obtained, ease of interpretation, limitations and dependencies, technical improvements, and outward comparisons of these technologies. By highlighting the most interesting recent data contributed by these technologies and by providing ways to improve their outputs, we hope to guide correlative research directions and assist in their evolution toward becoming clinically useful in immuno-oncology.
Affiliation(s)
- Anne Monette
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Adriana Aguilar-Mahecha
  - Lady Davis Institute for Medical Research, The Segal Cancer Center, Jewish General Hospital, Montreal, Quebec, Canada
- Emre Altinmakas
  - Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York
  - Department of Radiology, Koç University School of Medicine, Istanbul, Turkey
- Mathew G Angelos
  - Division of Hematology and Oncology, Department of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Nima Assad
  - Icahn School of Medicine at Mount Sinai, New York, New York
- Gerald Batist
  - McGill Centre for Translational Research, Jewish General Hospital, Montreal, Quebec, Canada
- Christoph H Borchers
  - Gerald Bronfman Department of Oncology, Segal Cancer Proteomics Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
  - Division of Experimental Medicine, Department of Pathology, McGill University, Montreal, Quebec, Canada
- Gennaro Ciliberto
  - Scientific Direction, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Luigi Fattore
  - SAFU Laboratory, Department of Research, Advanced Diagnostics and Technological Innovation, Translational Research Area, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Nir Hacohen
  - Massachusetts General Hospital Cancer Center, Boston, Massachusetts
- Mohammad Haris
  - Department of Radiology, Center for Advanced Metabolic Imaging in Precision Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
  - Laboratory Animal Research Center, Qatar University, Doha, Qatar
- Vincent Lacasse
  - Segal Cancer Proteomics Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
- Arnav Mehta
  - Massachusetts General Hospital Cancer Center, Boston, Massachusetts
- Marco Ruella
  - Division of Hematology-Oncology, Center for Cellular Immunotherapies, University of Pennsylvania, Philadelphia, Pennsylvania
- Alan Spatz
  - Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, McGill University Health Center, Montreal, Quebec, Canada
- Bachir Taouli
  - Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York
- Imad Tarhoni
  - Department of Anatomy and Cell Biology, Rush University Medical Center, Chicago, Illinois
- Itay Tirosh
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Xiaodong Wang
  - Key Laboratory of Mass Spectrometry Imaging and Metabolomics, College of Life and Environmental Sciences, Minzu University of China, Beijing, China
- Sacha Gnjatic
  - Icahn School of Medicine at Mount Sinai, New York, New York
|
21
|
Gliozzo J, Soto-Gomez M, Guarino V, Bonometti A, Cabri A, Cavalleri E, Reese J, Robinson PN, Mesiti M, Valentini G, Casiraghi E. Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing. Artif Intell Med 2025; 160:103049. [PMID: 39673960 DOI: 10.1016/j.artmed.2024.103049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/03/2024] [Accepted: 12/04/2024] [Indexed: 12/16/2024]
Abstract
Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, its multimodal nature requires effective data-integration pipelines. While several dimensionality reduction and data fusion algorithms have been proposed, crucial aspects are often overlooked. Specifically, the choice of projection space dimension is typically heuristic and uniformly applied across all omics, neglecting the unique high dimension small sample size challenges faced by individual omics. This paper introduces a novel multi-modal dimensionality reduction pipeline tailored to individual views. By leveraging intrinsic dimensionality estimators, we assess the curse-of-dimensionality impact on each view and propose a two-step reduction strategy for significantly affected views, combining feature selection with feature extraction. Compared to traditional uniform reduction pipelines in a crucial and supervised multi-omics analysis setting, our approach shows significant improvement. Additionally, we explore three effective unsupervised multi-omics data fusion methods rooted in the main data fusion strategies to gain insights into their performance under crucial, yet overlooked, settings.
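To make the idea of an intrinsic-dimensionality estimator concrete, the sketch below implements the maximum-likelihood form of the TwoNN estimator, which uses only the ratio of each point's second- to first-nearest-neighbor distance. The abstract does not say which estimators the authors used, so TwoNN is chosen here purely for illustration; the function name and the toy data are assumptions.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension as N / sum(log(r2 / r1)) over all points,
    where r1 and r2 are the distances to the first and second nearest neighbors."""
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, for which the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


# Hypothetical toy data: a 2-D plane embedded in 10-D space by zero padding,
# so the ambient dimension is 10 while the intrinsic dimension is 2.
rng = random.Random(0)
points = [[rng.random(), rng.random()] + [0.0] * 8 for _ in range(400)]
estimate = two_nn_dimension(points)
```

On data like this the estimate should land near 2 rather than 10, which is the kind of gap between ambient and intrinsic dimension that could motivate a per-view choice of projection size. The brute-force neighbor search is quadratic in the number of samples, which is acceptable for the small sample sizes typical of multi-omics cohorts.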
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; European Commission, Joint Research Centre (JRC), Ispra, Italy
- Mauricio Soto-Gomez
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Valentina Guarino
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Arturo Bonometti
  - Department of Biomedical Sciences, Humanitas University, Milan, Italy; Department of Pathology, IRCCS Humanitas Clinical and Research Hospital, Milan, Italy
- Alberto Cabri
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Emanuele Cavalleri
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Justin Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Marco Mesiti
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Giorgio Valentini
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
- Elena Casiraghi
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; CINI, Infolife National Laboratory, Roma, Italy; Department of Computer Science, Aalto University, Espoo, Finland
22
Li J, Li B, Zhang X, Ma X, Li Z. MDMNI-DGD: A novel graph neural network approach for druggable gene discovery based on the integration of multi-omics data and the multi-view network. Comput Biol Med 2025; 185:109511. [PMID: 39644579 DOI: 10.1016/j.compbiomed.2024.109511] [Received: 10/08/2024] [Revised: 11/14/2024] [Accepted: 11/29/2024] [Indexed: 12/09/2024]
Abstract
Accurately predicting druggable genes is of paramount importance for enhancing the efficacy of targeted therapies, reducing drug-related toxicities and improving patients' survival rates. Nevertheless, accurately predicting candidate cancer-druggable genes remains a critical challenge in translational medicine due to the high heterogeneity and complexity of cancer data. In this study, we proposed a novel graph neural approach called Druggable Gene Discovery based on the Integration of Multi-omics Data and the Multi-view Network (MDMNI-DGD), aiming to predict and evaluate cancer-druggable genes. MDMNI-DGD integrated a comprehensive set of multi-omics data, including copy number variations, DNA methylation, somatic mutations, and gene expression profiles. Simultaneously, it constructed the multi-view gene association network based on protein-protein interactions (PPI), protein structural domains, gene co-expression, pathway co-occurrence, gene sequence and gene ontology. Compared to other state-of-the-art approaches, MDMNI-DGD exhibits excellent performance in key evaluation metrics such as AUROC and AUPR. Moreover, the case study has also demonstrated the efficacy of our approach in discovering potentially druggable genes. Among more than 20,000 protein-coding genes, MDMNI-DGD successfully identified 872 potentially druggable genes. The findings from this investigation may serve to bolster the assessment of pan-cancer druggable genes, potentially catalyzing the development of more personalized and efficacious therapeutic interventions.
Affiliation(s)
- Jianwei Li
  - School of Artificial Intelligence, Hebei University of Technology, 300401, Tianjin, China
- Bing Li
  - School of Artificial Intelligence, Hebei University of Technology, 300401, Tianjin, China
- Xukun Zhang
  - School of Artificial Intelligence, Hebei University of Technology, 300401, Tianjin, China
- Xuxu Ma
  - School of Artificial Intelligence, Hebei University of Technology, 300401, Tianjin, China
- Ziyu Li
  - School of Artificial Intelligence, Hebei University of Technology, 300401, Tianjin, China
23
Ooka T. The Era of Preemptive Medicine: Developing Medical Digital Twins through Omics, IoT, and AI Integration. JMA J 2025; 8:1-10. [PMID: 39926086 PMCID: PMC11799569 DOI: 10.31662/jmaj.2024-0213] [Received: 08/07/2024] [Accepted: 08/26/2024] [Indexed: 02/11/2025] Open
Abstract
Preemptive medicine represents a paradigm shift from reactive treatment to proactive disease prevention. The integration of omics technologies, the Internet of Things (IoT), and artificial intelligence (AI) has facilitated the development of personalized, predictive, and preemptive healthcare strategies. Omics technologies, such as genomics, proteomics, and metabolomics, provide comprehensive insights into the molecular profile of an individual, revealing potential disease predispositions and health trajectories. IoT devices, such as wearables and smartphones, enable continuous and periodic monitoring of physiological parameters, thus providing a dynamic view of an individual's health status. AI algorithms analyze comprehensive and complex data from omics and IoT technologies to identify patterns and correlations that inform predictive models of disease risk, progression, and response to interventions. Medical digital twins, or virtual replicas of an individual's biological processes, have emerged as the cornerstone of preemptive medicine. The integration of omics, IoT, and AI enables the development of medical digital twins, which in turn allows for precise simulation of human physiological profiles, prediction of future health outcomes, and virtual individual clinical trials, facilitating personalized proactive interventions and preemptive disease control. This review demonstrates the convergence of omics, IoT, and AI in preemptive medicine, highlighting their potential to revolutionize healthcare by enabling early disease detection, personalized treatment strategies, and chronic disease prevention. We show how AI leverages omics and IoT in preemptive medicine through several case studies while also discussing the necessary data for developing medical digital twins and addressing ethical and social aspects that warrant consideration.
Medical digital twins signify a fundamental transformation in health management, shifting from treating diseases after their occurrence to controlling them before their occurrence. This approach enhances the effectiveness of medical interventions and improves overall health outcomes, preparing for a healthier future.
Affiliation(s)
- Tadao Ooka
  - Department of Health Sciences, University of Yamanashi, Chuo, Japan
  - Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, USA
24
Zeng S, Adusumilli T, Awan SZ, Immadi MS, Xu D, Joshi T. G2PDeep-v2: a web-based deep-learning framework for phenotype prediction and biomarker discovery for all organisms using multi-omics data. Res Sq 2025:rs.3.rs-5776937. [PMID: 39866874 PMCID: PMC11760241 DOI: 10.21203/rs.3.rs-5776937/v1] [Indexed: 01/28/2025]
Abstract
The G2PDeep-v2 server is a web-based platform powered by deep learning for phenotype prediction and marker discovery from multi-omics data in any organism, including humans, plants, animals, and viruses. The server provides multiple services for researchers to create deep-learning models through an interactive interface and train these models using an automated hyperparameter tuning algorithm on high-performance computing resources. Users can visualize the results of phenotype and marker predictions and perform Gene Set Enrichment Analysis on the significant markers to gain insights into the molecular mechanisms underlying the complex diseases, conditions, and other biological phenotypes being studied. The G2PDeep-v2 server is publicly available at https://g2pdeep.org/ and can be utilized for all organisms.
25
Ni J, Yan D, Lu S, Xie Z, Liu Y, Zhang X. MiRS-HF: A Novel Deep Learning Predictor for Cancer Classification and miRNA Expression Patterns. IEEE J Biomed Health Inform 2025; 29:679-689. [PMID: 39383085 DOI: 10.1109/jbhi.2024.3476672] [Indexed: 10/11/2024]
Abstract
Cancer classification and biomarker identification are crucial for guiding personalized treatment. To make effective use of miRNA associations and expression data, we have developed a deep learning model for cancer classification and biomarker identification. We propose an approach for cancer classification called MiRNA Selection and Hybrid Fusion (MiRS-HF), which consists of early fusion and intermediate fusion. The early fusion involves applying a Layer Attention Graph Convolutional Network (LAGCN) to a miRNA-disease heterogeneous network, resulting in a miRNA-disease association degree score matrix. The intermediate fusion employs a Graph Convolutional Network (GCN) in the classification tasks, weighting the expression data based on the miRNA-disease association degree score. Furthermore, MiRS-HF can identify the important miRNA biomarkers and their expression patterns. The proposed method demonstrates superior performance in the classification tasks of six cancers compared to other methods. Simultaneously, we incorporated the feature weighting strategy into the comparison algorithm, leading to a significant improvement in the algorithm's results, highlighting the extreme importance of this strategy.
26
Meng A, Zhuang Y, Huang Q, Tang L, Yang J, Gong P. Development and validation of a cross-modality tensor fusion model using multi-modality MRI radiomics features and clinical radiological characteristics for the prediction of microvascular invasion in hepatocellular carcinoma. Eur J Surg Oncol 2025; 51:109364. [PMID: 39536525 DOI: 10.1016/j.ejso.2024.109364] [Received: 08/13/2024] [Revised: 10/29/2024] [Accepted: 11/03/2024] [Indexed: 11/16/2024]
Abstract
OBJECTIVES To develop and validate a cross-modality tensor fusion (CMTF) model using multi-modality MRI radiomics features and clinical radiological characteristics for the prediction of microvascular invasion (MVI) in hepatocellular carcinoma (HCC). MATERIALS AND METHODS This study included 174 HCC patients (47 MVI-positive and 127 MVI-negative) confirmed by postoperative pathology. The synthetic minority over-sampling technique was used to augment MVI-positive samples. The augmented dataset of 254 samples (127 MVI-positive and 127 MVI-negative) was randomly divided into training and test cohorts in a 7:3 ratio. Radiomics features were extracted separately from arterial phase, delayed phase, diffusion-weighted imaging, and fat-suppressed T2-weighted imaging. The least absolute shrinkage and selection operator was used for feature selection. Univariate and multivariate logistic regression analyses were employed to identify independent clinical and radiological predictors. The selected multi-modality MRI radiomics features and clinical and radiological characteristics were used to construct the CMTF model, a single-modality (SM) model, and an early fusion (EF) model. RESULTS The CMTF model demonstrated superior performance in predicting MVI compared to the SM and EF models. When integrating four MRI modalities, the CMTF model achieved an area under the curve (AUC) with 95% confidence interval (95% CI) of 0.894 (0.820-0.968). Additionally, incorporating clinical and radiological characteristics further enhanced the predictive performance of the CMTF model, with the AUC (95% CI) increasing to 0.945 (0.892-0.998). CONCLUSION The CMTF model showed promising performance in preoperative MVI prediction, providing a more effective non-invasive detection tool for HCC patients.
Affiliation(s)
- Ao Meng
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yinping Zhuang
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Qian Huang
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Li Tang
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jing Yang
  - Department of Interventional Radiology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, 221006, Jiangsu, China
- Ping Gong
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
27
Liu Y, Li HD, Wang J. CrossIsoFun: predicting isoform functions using the integration of multi-omics data. Bioinformatics 2024; 41:btae742. [PMID: 39680906 PMCID: PMC11706537 DOI: 10.1093/bioinformatics/btae742] [Received: 06/17/2024] [Revised: 11/16/2024] [Accepted: 12/13/2024] [Indexed: 12/18/2024] Open
Abstract
MOTIVATION Isoforms spliced from the same gene may carry distinct biological functions. Therefore, annotating functions at the isoform level provides valuable insights into the functional diversity of genomes. Since experimental approaches for determining isoform functions are time- and cost-demanding, computational methods have been proposed. In this case, multi-omics data integration helps enhance the model performance, providing complementary insights for isoform functions. However, current methods underperform in leveraging diverse omics data, primarily due to the limited power to integrate the heterogeneous feature domains. Besides, among the multi-omics data, isoform-isoform interactions (IIIs) are a key data source, as isoforms interact with each other to perform functions. Unfortunately, IIIs remain largely underutilized in isoform function predictions until now. RESULTS We introduce CrossIsoFun, a multi-omics data analysis framework for isoform function prediction. CrossIsoFun combines omics-specific and cross-omics learning for data integration and function prediction. In detail, CrossIsoFun uses a graph convolutional network (GCN) as the omics-specific classifier for each data source. The initial label predictions from GCNs are forwarded to the View Correlation Discovery Network (VCDN) and processed as a cross-omics integrative representation. The representation is then used to produce final predictions of isoform functions. In addition, an autoencoder within a cycle-consistency generative adversarial network (cycleGAN) is designed to generate IIIs from PPIs and thereby enrich the interactomics data. Our method outperforms the state-of-the-art methods on three tissue-naive datasets and 15 tissue-specific datasets with mRNA expression, sequence, and PPI data. The prediction of CrossIsoFun is further validated by its consistency with subcellular localization and isoform-level annotations with literature support.
AVAILABILITY AND IMPLEMENTATION CrossIsoFun is freely available at https://github.com/genemine/CrossIsoFun.
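To make the "cross-omics integrative representation" step concrete, here is a small sketch. It assumes the VCDN follows the outer-product construction used in earlier multi-view classification work, where each entry of the cross-view tensor is the product of one class probability from each view; the abstract itself does not spell this out. The function name and the probability values are hypothetical.

```python
from functools import reduce

import numpy as np

def cross_view_tensor(view_probs):
    """Combine per-view class-probability vectors into one flattened tensor.

    Entry (i, j, k, ...) equals p_view1[i] * p_view2[j] * p_view3[k] * ...,
    so the tensor captures agreement and disagreement between views.
    """
    vectors = [np.asarray(p, dtype=float) for p in view_probs]
    tensor = reduce(np.multiply.outer, vectors)
    return tensor.reshape(-1)

# Hypothetical initial predictions for one isoform from three omics-specific
# classifiers (binary label: has the function / does not).
p_expression = [0.9, 0.1]
p_sequence = [0.6, 0.4]
p_ppi = [0.7, 0.3]

fused = cross_view_tensor([p_expression, p_sequence, p_ppi])
```

In such a design the flattened vector would be passed to a small fully connected network that outputs the final prediction; that downstream network is omitted here.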
Affiliation(s)
- Yiwei Liu
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Hong-Dong Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
28
Wu J, Chen Z, Xiao S, Liu G, Wu W, Wang S. DeepMoIC: multi-omics data integration via deep graph convolutional networks for cancer subtype classification. BMC Genomics 2024; 25:1209. [PMID: 39695368 DOI: 10.1186/s12864-024-11112-5] [Received: 06/17/2024] [Accepted: 12/02/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND Achieving precise cancer subtype classification is imperative for effective prognosis and treatment. Multi-omics studies, encompassing diverse data modalities, have emerged as powerful tools for unraveling the complexities of cancer. However, owing to the intricacies of biological data, multi-omics datasets generally show variations in data types, scales, and distributions. These intractable problems lead to challenges in exploring intact representations from heterogeneous data, which often result in inaccuracies in multi-omics information analysis. RESULTS To address the challenges of multi-omics research, our approach DeepMoIC presents a novel framework derived from deep Graph Convolutional Network (GCN). Leveraging autoencoder modules, DeepMoIC extracts compact representations from omics data and incorporates a patient similarity network through the similarity network fusion algorithm. To handle non-Euclidean data and explore high-order omics information effectively, we design a Deep GCN module with two strategies: residual connection and identity mapping. With extracted higher-order representations, our approach consistently outperforms state-of-the-art models on a pan-cancer dataset and 3 cancer subtype datasets. CONCLUSION The introduction of Deep GCN shows encouraging performance in terms of supervised multi-omics feature learning, offering promising insights for precision medicine in cancer research. DeepMoIC can potentially be an important tool in the field of cancer subtype classification because of its capacity to handle complex multi-omics data and produce reliable classification findings.
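The two strategies named above, residual connection and identity mapping, can be sketched as a single propagation layer. This is an illustration under assumptions, not the DeepMoIC implementation: it uses the GCNII-style update H' = ReLU(((1 - alpha) P H + alpha H0)((1 - beta) I + beta W)), and the toy graph, `alpha`, `beta`, and all function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def deep_gcn_layer(H, H0, P, W, alpha=0.1, beta=0.5):
    """One layer combining an initial residual connection and identity mapping."""
    # Residual connection back to the input representation H0.
    propagated = (1.0 - alpha) * (P @ H) + alpha * H0
    # Identity mapping: the weight matrix is blended with the identity.
    transform = (1.0 - beta) * np.eye(W.shape[0]) + beta * W
    return np.maximum(propagated @ transform, 0.0)  # ReLU

# Toy patient similarity graph: 4 patients connected in a chain.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = normalized_adjacency(A)

rng = np.random.default_rng(0)
H0 = rng.random((4, 3))                # non-negative input features
W = rng.normal(scale=0.1, size=(3, 3))

# Stack several layers; the link back to H0 is what limits over-smoothing.
H = H0
for _ in range(8):
    H = deep_gcn_layer(H, H0, P, W)
```

With `alpha=1` and `beta=0` the layer returns the non-negative input unchanged, which shows how both terms anchor deep stacks to the original features.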
Affiliation(s)
- Jiecheng Wu
  - College of Computer and Data Science, Fuzhou University, Fuzhou, 350108, China
- Zhaoliang Chen
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China
- Shunxin Xiao
  - School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Genggeng Liu
  - College of Computer and Data Science, Fuzhou University, Fuzhou, 350108, China
- Wenjie Wu
  - Department of Ophthalmology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou University Affiliated Provincial Hospital, Fuzhou, 350001, China
- Shiping Wang
  - College of Computer and Data Science, Fuzhou University, Fuzhou, 350108, China
29
Wang H, Han X, Niu S, Cheng H, Ren J, Duan Y. DFASGCNS: A prognostic model for ovarian cancer prediction based on dual fusion channels and stacked graph convolution. PLoS One 2024; 19:e0315924. [PMID: 39680618 DOI: 10.1371/journal.pone.0315924] [Received: 05/30/2024] [Accepted: 12/03/2024] [Indexed: 12/18/2024] Open
Abstract
Ovarian cancer is a malignant tumor with different clinicopathological and molecular characteristics. Due to its nonspecific early symptoms, the majority of patients are diagnosed with local or extensive metastasis, severely affecting treatment and prognosis. The occurrence of ovarian cancer is influenced by multiple complex mechanisms spanning the genome, transcriptome, and proteome. Integrating multiple types of omics data aids in predicting the survival rate of ovarian cancer patients. However, existing methods only fuse multi-omics data at the feature level, neglecting the shared and complementary neighborhood information among samples of multi-omics data, and failing to consider the potential interactions between different omics data at the molecular level. In this paper, we propose a prognostic model for ovarian cancer prediction named Dual Fusion Channels and Stacked Graph Convolutional Neural Network (DFASGCNS). DFASGCNS utilizes dual fusion channels to learn feature representations of different omics data and the associations between samples. A stacked graph convolutional network is used to comprehensively learn the deep and intricate correlation networks present in multi-omics data, enhancing the model's ability to represent multi-omics data. An attention mechanism is introduced to allocate different weights to important features of different omics data, optimizing the feature representation of multi-omics data. Experimental results demonstrate that, compared to existing methods, the DFASGCNS model exhibits significant advantages in ovarian cancer prognosis prediction and survival analysis. Kaplan-Meier curve analysis results indicate significant differences in the survival subgroups predicted by the DFASGCNS model, contributing to a deeper understanding of the pathogenesis of ovarian cancer and providing more reliable auxiliary diagnostic information for the prognosis assessment of ovarian cancer patients.
Affiliation(s)
- Huiqing Wang
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
- Xiao Han
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
- Shuaijun Niu
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
- Hao Cheng
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
- Jianxue Ren
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
- Yimeng Duan
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
30
Wu B, Xiong H, Zhuo L, Xiao Y, Yan J, Yang W. Multi-view BLUP: a promising solution for post-omics data integrative prediction. J Genet Genomics 2024:S1673-8527(24)00332-1. [PMID: 39645028 DOI: 10.1016/j.jgg.2024.11.017] [Received: 07/23/2024] [Revised: 11/27/2024] [Accepted: 11/27/2024] [Indexed: 12/09/2024]
Abstract
Phenotypic prediction is a promising strategy for accelerating plant breeding. Data from multiple sources (called multi-view data) can provide complementary information to characterize a biological object from various aspects. By integrating multi-view information into phenotypic prediction, a multi-view best linear unbiased prediction (MVBLUP) method is proposed in this paper. To measure the importance of multiple data views, the differential evolution algorithm with an early stopping mechanism is used, by which we obtain a multi-view kinship matrix and then incorporate it into the BLUP model for phenotypic prediction. To further illustrate the characteristics of MVBLUP, we perform empirical experiments on four multi-view datasets in different crops. Compared to the single-view method, the prediction accuracy of the MVBLUP method improves by 0.038-0.201 on average. The results demonstrate that MVBLUP is an effective integrative prediction method for multi-view data.
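The core idea of a weighted multi-view kinship matrix feeding a BLUP predictor can be sketched as follows. Several simplifications are assumed and are not taken from the paper: kinship is a plain centered cross-product, the view weights are fixed by hand rather than found by differential evolution, and `lam` (the ratio of residual to genetic variance) is set arbitrarily. All names and the simulated data are hypothetical.

```python
import numpy as np

def linear_kinship(X):
    """Simple relationship matrix from one data view (samples x features)."""
    Z = X - X.mean(axis=0)
    return Z @ Z.T / X.shape[1]

def combine_kinships(kinships, weights):
    """Weighted sum of per-view kinship matrices; weights are normalized to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kinships))

def blup_predict(K, y, train_idx, test_idx, lam=1.0):
    """Predict test phenotypes from training ones via the kinship matrix K."""
    mu = y[train_idx].mean()
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_st = K[np.ix_(test_idx, train_idx)]
    coef = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + K_st @ coef

# Simulated data: 30 lines, two views (e.g. markers and expression).
rng = np.random.default_rng(1)
view_a = rng.normal(size=(30, 100))
view_b = rng.normal(size=(30, 40))
y = view_a[:, :5].sum(axis=1) + 0.1 * rng.normal(size=30)

K_a, K_b = linear_kinship(view_a), linear_kinship(view_b)
K_multi = combine_kinships([K_a, K_b], weights=[0.7, 0.3])  # hand-picked weights

train_idx, test_idx = np.arange(24), np.arange(24, 30)
preds = blup_predict(K_multi, y, train_idx, test_idx)
```

In the method described above, the weights passed to `combine_kinships` would be the quantity being optimized, with prediction accuracy on held-out data as a natural objective.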
Affiliation(s)
- Bingjie Wu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Huijuan Xiong
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Lin Zhuo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Yingjie Xiao
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Wenyu Yang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
31
Du L, Gao P, Liu Z, Yin N, Wang X. TMODINET: A trustworthy multi-omics dynamic learning integration network for cancer diagnostic. Comput Biol Chem 2024; 113:108202. [PMID: 39243551 DOI: 10.1016/j.compbiolchem.2024.108202] [Received: 05/22/2024] [Revised: 07/23/2024] [Accepted: 08/31/2024] [Indexed: 09/09/2024]
Abstract
Multiple types of omics data contain a wealth of biomedical information that reflects different aspects of clinical samples. Multi-omics integrated analysis is more likely to lead to accurate clinical decisions. Existing cancer diagnostic methods based on multi-omics data integration mainly focus on the classification accuracy of the model, while neglecting the interpretability of the internal mechanism and the reliability of the results, which are crucial in specific domains such as precision medicine and the life sciences. To overcome this limitation, we propose a trustworthy multi-omics dynamic learning framework (TMODINET) for cancer diagnosis. The framework employs multi-omics adaptive dynamic learning to process each sample and provide patient-centered, personalized diagnosis by using self-attention learning of features and modalities. To characterize the correlation between samples well, we introduce a graph dynamic learning method that adaptively adjusts the graph structure according to the specific classification results for graph convolutional network (GCN) learning. Moreover, we utilize an uncertainty mechanism employing the Dirichlet distribution and Dempster-Shafer theory to obtain uncertainty and integrate multi-omics data at the decision level, ensuring trustworthiness in cancer diagnosis. Extensive experiments on four real-world multimodal medical datasets are conducted. Compared to state-of-the-art methods, the superior performance and trustworthiness of our proposed algorithm are clearly validated. Our model has great potential for clinical diagnosis.
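The decision-level uncertainty mechanism can be illustrated with a short sketch. It assumes the formulation common in trusted multi-view classification: non-negative evidence per class defines a Dirichlet distribution with parameters evidence + 1, from which belief masses and an uncertainty mass are derived, and two views are merged with a reduced Dempster combination rule. The abstract does not give these formulas, so the details and all names and values below are assumptions.

```python
import numpy as np

def opinion_from_evidence(evidence):
    """Map per-class evidence to (belief masses, uncertainty mass)."""
    evidence = np.asarray(evidence, dtype=float)
    n_classes = evidence.size
    strength = np.sum(evidence + 1.0)  # Dirichlet strength, alpha_k = e_k + 1
    return evidence / strength, n_classes / strength

def combine_opinions(b1, u1, b2, u2):
    """Reduced Dempster combination of two opinions over the same classes."""
    # Conflict: mass that the two views assign to different classes.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 / (1.0 - conflict)
    belief = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    uncertainty = scale * u1 * u2
    return belief, uncertainty

# Hypothetical evidence from two omics views for a 3-class problem.
b_rna, u_rna = opinion_from_evidence([9.0, 1.0, 0.0])
b_meth, u_meth = opinion_from_evidence([4.0, 0.0, 0.0])

b_fused, u_fused = combine_opinions(b_rna, u_rna, b_meth, u_meth)
```

When both views support the same class, the fused uncertainty is lower than that of either view alone; a view with little evidence (high uncertainty) has little influence on the fused belief.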
Affiliation(s)
- Ling Du
  - Department of Software, Tiangong University, Tianjin, China
- Peipei Gao
  - Department of Computer Science and Technology, Tiangong University, Tianjin, China
- Zhuang Liu
  - School of FinTech, Research Center of Applied Finance Dongbei University of Finance & Economics, Dalian, China
- Nan Yin
  - Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Xiaochao Wang
  - Department of Mathematical Sciences, Tiangong University, Tianjin, China
32
Chen F, Peng W, Dai W, Wei S, Fu X, Liu L, Liu L. Supervised graph contrastive learning for cancer subtype identification through multi-omics data integration. Health Inf Sci Syst 2024; 12:12. [PMID: 38404715 PMCID: PMC10891026 DOI: 10.1007/s13755-024-00274-x] [Received: 05/28/2023] [Accepted: 01/09/2024] [Indexed: 02/27/2024] Open
Abstract
Cancer is one of the deadliest diseases in the world. Accurate cancer subtype classification is critical for patient diagnosis, treatment, and prognosis. Ever-increasing multi-omics data describe the characteristics of patients from different views and serve as complementary information to promote cancer subtype identification. However, omics data generally have different distributions and high dimensions. How to effectively integrate multiple omics data to classify cancer subtypes accurately is a challenge for researchers. This work proposes a method integrating multi-omics data based on supervised graph contrastive learning (MCRGCN) to classify cancer subtypes. The method considers the unique feature distribution of each omics data type and the interaction of different omics data features to improve the accuracy of cancer subtype classification. To achieve this, MCRGCN first constructs different sample networks based on the multi-omics data of the samples. Then, it feeds the omics data and the sample adjacency matrix into different residual graph convolution models to obtain multi-omics features of the samples, which are trained with a supervised contrastive loss that encourages the sample features from each omics to be as consistent as possible. Finally, the sample features combining multi-omics features are input into a classifier to obtain the cancer subtypes. We applied MCRGCN to the invasive breast carcinoma (BRCA) and glioblastoma multiforme (GBM) datasets, integrating gene expression, miRNA expression, and DNA methylation data. The results demonstrate that our model is superior to other methods in integrating multi-omics data. Moreover, the results of survival analysis experiments demonstrate that the cancer subtypes identified by our model have significant clinical features. Furthermore, our model can help to identify potential biomarkers and pathways associated with cancer subtypes.
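As an illustration of the supervised loss mentioned above, the sketch below implements a generic supervised contrastive loss over sample embeddings: samples sharing a subtype label are pulled together and others pushed apart. It is a standard formulation assumed for illustration, not necessarily the exact loss used in MCRGCN, and the embeddings, labels, and `temperature` are hypothetical.

```python
import numpy as np

def supervised_contrastive_loss(Z, labels, temperature=0.5):
    """Mean over anchors of -log p(positive | anchor), averaged over positives."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # cosine similarity
    sim = Z @ Z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)

    # Log of the softmax denominator over all other samples (stable log-sum-exp).
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positives: other samples with the same label as the anchor.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0  # anchors without any positive are skipped
    per_anchor = (log_prob * positives).sum(axis=1)[has_pos] / n_pos[has_pos]
    return -per_anchor.mean()

# Four sample embeddings forming two tight groups.
Z = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
loss_matching = supervised_contrastive_loss(Z, labels=[0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(Z, labels=[0, 1, 0, 1])
```

The loss is lower when the labels agree with the geometry of the embeddings, which is the pressure that drives same-subtype samples toward consistent features.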
Affiliation(s)
- Fangxu Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Shoulin Wei
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
33
Ma Q, Jiang H, Tan S, You F, Zheng C, Wang Q, Ren Y. Emerging trends and hotspots in lung cancer-prediction models research. Ann Med Surg (Lond) 2024; 86:7178-7192. [PMID: 39649903 PMCID: PMC11623829 DOI: 10.1097/ms9.0000000000002648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Accepted: 10/02/2024] [Indexed: 12/11/2024] Open
Abstract
Objective: In recent years, lung cancer-prediction models have become popular. However, few bibliometric analyses have been performed in this field. Methods: This study aimed to reveal the scientific output and trends in lung cancer-prediction models from a global perspective. Publications were retrieved and extracted from the Web of Science Core Collection (WoSCC) database. CiteSpace 6.1.R3 and VOSviewer 1.6.18 were used to analyze hotspots and theme trends. Results: A marked increase in the number of publications related to lung cancer-prediction models was observed. A total of 2711 institutions from 64 countries/regions published 2139 documents in 566 academic journals. China and the United States were the leading countries in the field of lung cancer-prediction models. The institutions represented by Fudan University had significant academic influence in the field. Analysis of keywords revealed that lncRNA, tumor microenvironment, immune, cancer statistics, The Cancer Genome Atlas, nomogram, and machine learning were the current focus of research in lung cancer-prediction models. Conclusions: Over the last two decades, research on risk-prediction models for lung cancer has attracted increasing attention. Prognosis, machine learning, and multi-omics technologies are both current hotspots and future trends in this field. In the future, in-depth explorations using different omics should increase the sensitivity and accuracy of lung cancer-prediction models and reduce the global burden of lung cancer.
Affiliation(s)
- Qiong Ma
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
- Hua Jiang
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
- Shiyan Tan
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
- Fengming You
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
  - TCM Regulating Metabolic Diseases Key Laboratory of Sichuan Province, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
- Chuan Zheng
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
  - TCM Regulating Metabolic Diseases Key Laboratory of Sichuan Province, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
- Qian Wang
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
  - TCM Regulating Metabolic Diseases Key Laboratory of Sichuan Province, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
- Yifeng Ren
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China
  - TCM Regulating Metabolic Diseases Key Laboratory of Sichuan Province, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan Province, China

34
Abdelaziz EH, Ismail R, Mabrouk MS, Amin E. Multi-omics data integration and analysis pipeline for precision medicine: Systematic review. Comput Biol Chem 2024; 113:108254. [PMID: 39447405 DOI: 10.1016/j.compbiolchem.2024.108254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 09/05/2024] [Accepted: 10/14/2024] [Indexed: 10/26/2024]
Abstract
Precision medicine has gained considerable popularity since the "one-size-fits-all" approach did not seem very effective or reflective of the complexity of the human body. Likewise, because single-omics analysis does not reflect the complexity of the human body's inner workings, it did not result in the expected advancement in the medical field, and the multi-omics approach emerged in response. The multi-omics approach involves integrating data from different omics technologies, such as DNA sequencing, RNA sequencing, mass spectrometry, and others, using computational methods and then analyzing the integrated result for downstream applications such as survival analysis, cancer classification, or biomarker identification. Most recent reviews are constrained to one aspect of the multi-omics analysis pipeline, such as the dimensionality reduction step, the integration methods, or the interpretability aspect; very few provide a comprehensive review of every step of the analysis. This study gives an overview of the multi-omics analysis pipeline: it starts with the most popular multi-omics databases used in recent literature and dimensionality reduction techniques, details the different types of data integration techniques and their downstream analysis applications, describes the most commonly used evaluation metrics, highlights the importance of model interpretability, and lastly discusses the challenges and potential future work for multi-omics data integration in precision medicine.
Affiliation(s)
- Rasha Ismail
  - Faculty of Computer and Information Sciences, Ainshams University, Cairo, Egypt.
- Mai S Mabrouk
  - Information Technology and Computer Science School, Nile University, Cairo, Egypt.
- Eman Amin
  - Faculty of Computer and Information Sciences, Ainshams University, Cairo, Egypt.

35
Bu Y, Liang J, Li Z, Wang J, Wang J, Yu G. Cancer molecular subtyping using limited multi-omics data with missingness. PLoS Comput Biol 2024; 20:e1012710. [PMID: 39724112 PMCID: PMC11709273 DOI: 10.1371/journal.pcbi.1012710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 01/08/2025] [Accepted: 12/10/2024] [Indexed: 12/28/2024] Open
Abstract
Diagnosing cancer subtypes is a prerequisite for precise treatment. Existing diagnostic solutions based on multi-omics data fusion require sufficient samples with complete multi-omics data, which are challenging to obtain in clinical applications. To address this bottleneck, we propose a flexible integrative model (CancerSD) to diagnose cancer subtypes using limited samples with incomplete multi-omics data. CancerSD designs contrastive learning tasks and masking-and-reconstruction tasks to reliably impute missing omics, and fuses available omics data with the imputed ones to accurately diagnose cancer subtypes. To address the issue of limited clinical samples, it introduces a category-level contrastive loss to extend the meta-learning framework, effectively transferring knowledge from external datasets to pretrain the diagnostic model. Experiments on benchmark datasets show that CancerSD not only gives accurate diagnoses, but also maintains high authenticity and good interpretability. In addition, CancerSD identifies important molecular characteristics associated with cancer subtypes, and it defines the Integrated CancerSD Score, which can serve as an independent predictive factor for patient prognosis.
Affiliation(s)
- Yongqi Bu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong, China
- Jiaxuan Liang
  - School of Software, Shandong University, Jinan, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong, China
- Zhen Li
  - Department of Gastroenterology, Qilu Hospital of Shandong University, Jinan, Shandong, China
- Jianbo Wang
  - Department of Radiation Oncology, Qilu Hospital of Shandong University, Jinan, Shandong, China
- Jun Wang
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong, China

36
Wang FA, Li Y, Zeng T. Deep Learning of radiology-genomics integration for computational oncology: A mini review. Comput Struct Biotechnol J 2024; 23:2708-2716. [PMID: 39035833 PMCID: PMC11260400 DOI: 10.1016/j.csbj.2024.06.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 06/18/2024] [Accepted: 06/18/2024] [Indexed: 07/23/2024] Open
Abstract
In the field of computational oncology, patient status is often assessed using radiology-genomics, which combines two key technologies and data types: radiology and genomics. Recent advances in deep learning have facilitated the integration of radiology-genomics data, and even new omics data, significantly improving the robustness and accuracy of clinical predictions. These factors are driving artificial intelligence (AI) closer to practical clinical applications. In particular, deep learning models are crucial in identifying new radiology-genomics biomarkers and therapeutic targets, supported by explainable AI (xAI) methods. This review focuses on recent developments in deep learning for radiology-genomics integration, highlights current challenges, and outlines research directions for multimodal integration and biomarker discovery in radiology-genomics or radiology-omics that are urgently needed in computational oncology.
Affiliation(s)
- Feng-ao Wang
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Yixue Li
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - Guangzhou National Laboratory, Guangzhou, China
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
- Tao Zeng
  - Guangzhou National Laboratory, Guangzhou, China
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China

37
Pan L, Wang X, Liang Q, Shang J, Liu W, Xu L, Peng S. DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data. Comput Methods Programs Biomed 2024; 257:108478. [PMID: 39504713 DOI: 10.1016/j.cmpb.2024.108478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 10/03/2024] [Accepted: 10/23/2024] [Indexed: 11/08/2024]
Abstract
BACKGROUND AND OBJECTIVE: Given the high heterogeneity and clinical diversity of cancer, substantial variations exist in multi-omics data and clinical features across different cancer subtypes. METHODS: We propose a model, named DEDUCE, based on symmetric multi-head attention encoders (SMAE), for unsupervised contrastive learning to analyze multi-omics cancer data, with the aim of identifying and characterizing cancer subtypes. This model adopts an unsupervised SMAE that can deeply extract contextual features and long-range dependencies from multi-omics data, thereby mitigating the impact of noise. Importantly, DEDUCE introduces a subtype decoupled contrastive learning method based on a multi-head attention mechanism to simultaneously learn features from multi-omics data and perform clustering for identifying cancer subtypes. Subtypes are clustered by calculating the similarity between samples in both the feature space and sample space of multi-omics data. The fundamental concept involves decoupling various attributes of multi-omics data features and learning them as contrasting terms. A contrastive loss function is constructed to quantify the disparity between positive and negative examples, and the model minimizes this difference, thereby promoting the acquisition of enhanced feature representation. RESULTS: The DEDUCE model undergoes extensive experiments on simulated multi-omics datasets, single-cell multi-omics datasets, and cancer multi-omics datasets, outperforming 10 deep learning models. The DEDUCE model outperforms state-of-the-art methods, and ablation experiments demonstrate the effectiveness of each module in the DEDUCE model. Finally, we applied the DEDUCE model to identify six cancer subtypes of AML. CONCLUSION: In this paper, we proposed the DEDUCE model, which learns features from multi-omics data through SMAE, while the subtype decoupled contrastive learning consistently optimizes the model for clustering and identifying cancer subtypes. The DEDUCE model demonstrates a significant capability in discovering new cancer subtypes. We applied the DEDUCE model to identify six subtypes of AML. Through the analysis of GO function enrichment, subtype-specific biological functions, and GSEA of AML using the DEDUCE model, the interpretability of the DEDUCE model in identifying cancer subtypes is further enhanced.
Affiliation(s)
- Liangrui Pan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Xiang Wang
  - Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410083, Hunan, China.
- Qingchun Liang
  - Department of Pathology, The Second Xiangya Hospital, Central South University, Changsha, 410083, Hunan, China.
- Jiandong Shang
  - National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou, 450001, Henan, China.
- Wenjuan Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Liwen Xu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.

38
Hayes CN, Nakahara H, Ono A, Tsuge M, Oka S. From Omics to Multi-Omics: A Review of Advantages and Tradeoffs. Genes (Basel) 2024; 15:1551. [PMID: 39766818 PMCID: PMC11675490 DOI: 10.3390/genes15121551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2024] [Revised: 11/25/2024] [Accepted: 11/28/2024] [Indexed: 01/11/2025] Open
Abstract
Bioinformatics is a rapidly evolving field charged with cataloging, disseminating, and analyzing biological data. Bioinformatics started with genomics, but while genomics focuses more narrowly on the genes comprising a genome, bioinformatics now encompasses a much broader range of omics technologies. Overcoming barriers of scale and effort that plagued earlier sequencing methods, bioinformatics adopted an ambitious strategy involving high-throughput and highly automated assays. However, as the list of omics technologies continues to grow, the field of bioinformatics has changed in two fundamental ways. Despite enormous success in expanding our understanding of the biological world, the failure of bulk methods to account for biologically important variability among cells of the same or different type has led to a major shift toward single-cell and spatially resolved omics methods, which attempt to disentangle the conflicting signals contained in heterogeneous samples by examining individual cells or cell clusters. The second major shift has been the attempt to integrate two or more different classes of omics data in a single multimodal analysis to identify patterns that bridge biological layers. For example, unraveling the cause of disease may reveal a metabolite deficiency caused by the failure of an enzyme to be phosphorylated because a gene is not expressed due to aberrant methylation as a result of a rare germline variant. Conclusions: There is a fine line between superficial understanding and analysis paralysis, but like a detective novel, multi-omics increasingly provides the clues we need, if only we are able to see them.
Affiliation(s)
- C. Nelson Hayes
  - Department of Gastroenterology, Graduate School of Biomedical & Health Sciences, Hiroshima University, Hiroshima 734-8551, Japan
- Hikaru Nakahara
  - Department of Clinical and Molecular Genetics, Hiroshima University, Hiroshima 734-8551, Japan
- Atsushi Ono
  - Department of Gastroenterology, Graduate School of Biomedical & Health Sciences, Hiroshima University, Hiroshima 734-8551, Japan
- Masataka Tsuge
  - Department of Gastroenterology, Graduate School of Biomedical & Health Sciences, Hiroshima University, Hiroshima 734-8551, Japan
  - Liver Center, Hiroshima University, Hiroshima 734-8551, Japan
- Shiro Oka
  - Department of Gastroenterology, Graduate School of Biomedical & Health Sciences, Hiroshima University, Hiroshima 734-8551, Japan

39
Tang X, Prodduturi N, Thompson K, Weinshilboum R, O’Sullivan C, Boughey J, Tizhoosh H, Klee E, Wang L, Goetz M, Suman V, Kalari K. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. Nucleic Acids Res 2024; 52:e99. [PMID: 39445795 PMCID: PMC11602161 DOI: 10.1093/nar/gkae915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 08/14/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024] Open
Abstract
The OmicsFootPrint framework addresses the need for advanced multi-omics data analysis methodologies by transforming data into intuitive two-dimensional circular images and facilitating the interpretation of complex diseases. Utilizing deep neural networks and incorporating the SHapley Additive exPlanations algorithm, the framework enhances model interpretability. Tested with The Cancer Genome Atlas data, OmicsFootPrint effectively classified lung and breast cancer subtypes, achieving high area under the curve (AUC) scores: 0.98 ± 0.02 for lung cancer subtype differentiation and 0.83 ± 0.07 for breast cancer PAM50 subtypes. It also successfully distinguished between invasive lobular and ductal carcinomas in breast cancer, showcasing its robustness. The framework further demonstrated notable performance in predicting drug responses in cancer cell lines, with a median AUC of 0.74, surpassing nine existing methods. Furthermore, its effectiveness persists even with reduced training sample sizes. OmicsFootPrint marks an enhancement in multi-omics research, offering a novel, efficient and interpretable approach that contributes to a deeper understanding of disease mechanisms.
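For readers less familiar with the AUC metric reported above, the following minimal sketch computes a binary AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The function name, labels, and scores are hypothetical and unrelated to the OmicsFootPrint data or code.

```python
def roc_auc(labels, scores):
    """Binary AUC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 positives and 3 negatives, one positive ranked below a negative.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

A "mean ± standard deviation" AUC such as those quoted in the abstract would typically come from repeating this calculation over several cross-validation folds or resampling runs.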
Affiliation(s)
- Xiaojia Tang
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Naresh Prodduturi
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Kevin J Thompson
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Richard Weinshilboum
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Judy C Boughey
  - Department of Surgery, Mayo Clinic, Rochester, MN 55905, USA
- Hamid R Tizhoosh
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, USA
- Eric W Klee
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Matthew P Goetz
  - Department of Oncology, Mayo Clinic, Rochester, MN 55905, USA
- Vera Suman
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Krishna R Kalari
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA

40
Yan F, Chen B, Ma Z, Chen Q, Jin Z, Wang Y, Qu F, Meng Q. Exploring molecular mechanisms of postoperative delirium through multi-omics strategies in plasma exosomes. Sci Rep 2024; 14:29466. [PMID: 39604493 PMCID: PMC11603267 DOI: 10.1038/s41598-024-80865-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Accepted: 11/22/2024] [Indexed: 11/29/2024] Open
Abstract
Currently, the diagnosis of delirium is based solely on clinical observation and lacks objective diagnostic tools, and the regulatory networks and pathological mechanisms behind it are not yet fully understood. Exosomes have garnered considerable interest as potential biomarkers for a variety of illnesses. This research aimed to delineate both the proteomic and metabolomic landscapes of exosomes, assessing their diagnostic utility in postoperative delirium (POD) and clarifying the underlying pathophysiological frameworks. Integrated analyses of proteomics and metabolomics were conducted on exosomes derived from the plasma of individuals in the non-postoperative delirium (NPOD) control group and the POD group. Subsequently, the study used the Connectivity Map (CMap) methodology to identify promising small-molecule drugs and carried out molecular docking assessments to explore the binding affinities of these molecules for the enzyme MMP9. We identified significant differences in exosomal metabolites and proteins between the POD and control groups, highlighting pathways related to neuroinflammation and blood-brain barrier (BBB) integrity. Our CMap analysis identified potential small-molecule therapeutics, and molecular docking studies revealed two compounds with high affinity to MMP9, suggesting a new therapeutic avenue for POD. This study highlights MMP9, TLR2, ICAM1, S100B, and glutamate as key biomarkers in the pathophysiology of POD, emphasizing the roles of neuroinflammation and BBB integrity. Notably, molecular docking suggests mirin and orantinib as potential inhibitors targeting MMP9, providing new therapeutic avenues. The findings broaden our understanding of POD mechanisms and suggest targeted strategies for its management, reinforcing the importance of multidimensional biomarker analysis and molecular targeting in POD intervention.
Affiliation(s)
- Fuhui Yan
  - School of Clinical Medicine, Jining Medical University, Jining, China
- Bowang Chen
  - Department of Intensive Care Unit, Affiliated Jining First People's Hospital of Shandong First Medical University, Jining, Shandong, China
- Zhen Ma
  - Department of Intensive Care Unit, Affiliated Jining First People's Hospital of Shandong First Medical University, Jining, Shandong, China
- Qirong Chen
  - Department of Intensive Care Unit, Affiliated Jining First People's Hospital of Shandong First Medical University, Jining, Shandong, China
- Zhi Jin
  - Department of Intensive Care Unit, Affiliated Jining First People's Hospital of Shandong First Medical University, Jining, Shandong, China
- Yujie Wang
  - School of Clinical and Basic Medical Sciences, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, People's Republic of China
- Feng Qu
  - Department of Intensive Care Unit, Affiliated Jining First People's Hospital of Shandong First Medical University, Jining, Shandong, China.
- Qiang Meng
  - Department of Intensive Care Unit, Affiliated Jining First People's Hospital of Shandong First Medical University, Jining, Shandong, China.

41
Briscik M, Tazza G, Vidács L, Dillies MA, Déjean S. Supervised multiple kernel learning approaches for multi-omics data integration. BioData Min 2024; 17:53. [PMID: 39580456 PMCID: PMC11585117 DOI: 10.1186/s13040-024-00406-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 11/14/2024] [Indexed: 11/25/2024] Open
Abstract
BACKGROUND: Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining. RESULTS: We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches. CONCLUSION: Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to be a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery and further development of methods for heterogeneous data integration.
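The idea of a meta-kernel built from per-omics kernels can be sketched as follows. This is only an illustration of one simple fusion strategy (a weighted sum of kernel matrices), not the algorithms proposed in the paper; the function names, the choice of linear and RBF kernels, the uniform weights, and the toy data are all assumptions.

```python
import math


def linear_kernel(X):
    """Gram matrix of inner products between samples."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix between samples."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def fuse_kernels(kernels, weights):
    """Weighted sum of kernel matrices; non-negative weights keep it a valid kernel."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Two hypothetical omics views of the same three samples.
mrna = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
methylation = [[0.2], [0.4], [0.9]]

K_meta = fuse_kernels([linear_kernel(mrna), rbf_kernel(methylation)],
                      weights=[0.5, 0.5])
```

A fused matrix like `K_meta` could then be handed to any kernel method that accepts a precomputed Gram matrix, for example a support vector machine; learning the weights rather than fixing them is where MKL methods differ from this sketch.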
Affiliation(s)
- Mitja Briscik
  - Institut de Mathématiques de Toulouse, UMR5219, CNRS, UPS, Université de Toulouse, Cedex 9, Toulouse, 31062, France.
- Gabriele Tazza
  - Department of Computer Science, Applied Artificial Intelligence Group, University of Szeged, Szeged, 6720, Hungary.
- László Vidács
  - Department of Computer Science, Applied Artificial Intelligence Group, University of Szeged, Szeged, 6720, Hungary
- Marie-Agnès Dillies
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015, Paris, France
- Sébastien Déjean
  - Institut de Mathématiques de Toulouse, UMR5219, CNRS, UPS, Université de Toulouse, Cedex 9, Toulouse, 31062, France

42
Zou B, Xenakis JG, Xiao M, Ribeiro A, Divaris K, Wu D, Zou F. A deep learning feature importance test framework for integrating informative high-dimensional biomarkers to improve disease outcome prediction. Brief Bioinform 2024; 26:bbae709. [PMID: 39815828 PMCID: PMC11735761 DOI: 10.1093/bib/bbae709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Revised: 12/01/2024] [Accepted: 12/26/2024] [Indexed: 01/18/2025] Open
Abstract
Many human diseases result from a complex interplay of behavioral, clinical, and molecular factors. Integrating low-dimensional behavioral and clinical features with high-dimensional molecular profiles can significantly improve disease outcome prediction and diagnosis. However, while some biomarkers are crucial, many lack informative value. To enhance prediction accuracy and understand disease mechanisms, it is essential to integrate relevant features and identify key biomarkers, separating meaningful data from noise and modeling complex associations. To address these challenges, we introduce the High-dimensional Feature Importance Test (HdFIT) framework for machine learning models. HdFIT includes a feature screening step for dimension reduction and leverages machine learning to model complex associations between biomarkers and disease outcomes. It robustly evaluates each feature's impact. Extensive Monte Carlo experiments and a real microbiome study demonstrate HdFIT's efficacy, especially when integrated with advanced models like deep neural networks. Our framework shows significant improvements in identifying crucial features and enhancing prediction accuracy, even in high-dimensional settings.
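To make the notion of "evaluating each feature's impact" concrete, here is a minimal sketch of permutation-style feature importance, a common model-agnostic technique. The abstract does not describe HdFIT's actual test procedure, so this should be read as a generic stand-in: the function names, the toy model, and the data are hypothetical, and a fixed cyclic shift replaces a random shuffle to keep the example deterministic.

```python
def mse(model, X, y):
    """Mean squared error of a prediction function on a dataset."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, feature):
    """Increase in error after breaking the link between one feature and the outcome."""
    column = [row[feature] for row in X]
    shifted = column[1:] + column[:1]  # deterministic stand-in for a random shuffle
    X_perm = [row[:feature] + [v] + row[feature + 1:]
              for row, v in zip(X, shifted)]
    return mse(model, X_perm, y) - mse(model, X, y)


def toy_model(row):
    """Hypothetical fitted model that relies on feature 0 only."""
    return 2.0 * row[0]


# Feature 0 drives the outcome; feature 1 is pure noise.
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 8.0], [3.0, 1.0]]
y = [2.0 * row[0] for row in X]

importances = [permutation_importance(toy_model, X, y, f) for f in range(2)]
```

Here the informative feature shows a large error increase when scrambled, while the noise feature shows none; a formal test would compare such increases against a null distribution built from many random permutations.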
Affiliation(s)
- Baiming Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
  - School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- James G Xenakis
  - Department of Statistics, Harvard University, Cambridge, MA 02138, United States
- Meisheng Xiao
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Apoena Ribeiro
  - School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Kimon Divaris
  - School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Di Wu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
  - School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Fei Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States

43
Wang Y, Wang Z, Yu X, Wang X, Song J, Yu DJ, Ge F. MORE: a multi-omics data-driven hypergraph integration network for biomedical data classification and biomarker identification. Brief Bioinform 2024; 26:bbae658. [PMID: 39692449 DOI: 10.1093/bib/bbae658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 11/18/2024] [Accepted: 12/04/2024] [Indexed: 12/19/2024] Open
Abstract
High-throughput sequencing methods have brought about a huge change in omics-based biomedical research. Integrating various omics data can help identify correlations across data modalities, thus improving our understanding of the underlying biological mechanisms and complexity. Nevertheless, most existing graph-based feature extraction methods overlook the complementary information and correlations across modalities. Moreover, these methods tend to treat the features of each omics modality equally, which contradicts current biological principles. To address these challenges, we introduce a novel approach for integrating multi-omics data termed Multi-Omics hypeRgraph integration nEtwork (MORE). MORE first constructs a comprehensive hyperedge group by extensively investigating the informative correlations within and across modalities. Subsequently, the multi-omics hypergraph encoding module is employed to learn the enriched omics-specific information. The multi-omics self-attention mechanism is then used to adaptively aggregate valuable correlations across modalities for representation learning and for making the final prediction. We assess MORE's performance on datasets characterized by messenger RNA (mRNA) expression, deoxyribonucleic acid (DNA) methylation, and microRNA (miRNA) expression for Alzheimer's disease, invasive breast carcinoma, and glioblastoma. The results from three classification tasks highlight the competitive advantage of MORE over current state-of-the-art (SOTA) methods. Moreover, the results show that MORE can identify a greater variety of disease-related biomarkers than existing methods, highlighting its advantages in biomedical data mining and interpretation. Overall, MORE can serve as a valuable tool for facilitating multi-omics analysis and novel biomarker discovery. Our code and data can be publicly accessed at https://github.com/Wangyuhanxx/MORE.
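As a rough illustration of what a hyperedge group over samples can look like, the sketch below builds one hyperedge per sample from its k nearest neighbours and encodes the result as an incidence matrix. The abstract does not specify how MORE constructs its hyperedges, so the k-nearest-neighbour rule, the function names, and the toy data are assumptions made only for this example.

```python
def knn_hyperedges(X, k):
    """One hyperedge per sample: the sample plus its k nearest neighbours."""
    edges = []
    for i, xi in enumerate(X):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
                       for j, xj in enumerate(X) if j != i)
        edges.append(sorted([i] + [j for _, j in dists[:k]]))
    return edges


def incidence_matrix(edges, n_nodes):
    """H[v][e] = 1 if node v belongs to hyperedge e, else 0."""
    return [[1 if v in edge else 0 for edge in edges] for v in range(n_nodes)]


# Six toy samples with one feature each, forming two obvious groups.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
edges = knn_hyperedges(X, k=2)
H = incidence_matrix(edges, n_nodes=len(X))
```

Unlike an ordinary graph edge, each hyperedge here links three samples at once, which is what allows hypergraph models to represent higher-order relations among samples or features.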
Affiliation(s)
- Yuhan Wang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Wellington Rd, Clayton, Melbourne, VIC 3800, Australia
- Xuan Yu
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
- Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Wellington Rd, Clayton, Melbourne, VIC 3800, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Wellington Rd, Clayton, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Wellington Rd, Clayton, Melbourne, VIC 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan, Nanjing 210023, China

44
Liu J, Xue X, Wen P, Song Q, Yao J, Ge S. Multi-fusion strategy network-guided cancer subtypes discovering based on multi-omics data. Front Genet 2024; 15:1466825. [PMID: 39610828 PMCID: PMC11602503 DOI: 10.3389/fgene.2024.1466825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Accepted: 11/04/2024] [Indexed: 11/30/2024] Open
Abstract
Introduction: The combination of next-generation sequencing technology and The Cancer Genome Atlas (TCGA) data provides unprecedented opportunities for the discovery of cancer subtypes. Through comprehensive, in-depth analysis of the genomic data of a large number of cancer patients, researchers can more accurately identify different cancer subtypes and reveal their molecular heterogeneity. Methods: In this paper, we propose the SMMSN (Self-supervised Multi-fusion Strategy Network) model for the discovery of cancer subtypes. SMMSN not only fuses multi-level data representations of single-omics data using a Graph Convolutional Network (GCN) and a Stacked Autoencoder Network (SAE), but also achieves the organic fusion of multi-omics data through multiple fusion strategies. To address the lack of label information in multi-omics data, SMMSN uses a dual self-supervised method to cluster cancer subtypes from the integrated data. Results: We conducted experiments on three labeled and five unlabeled multi-omics datasets to distinguish potential cancer subtypes. Kaplan-Meier survival curves and other results showed that SMMSN can obtain cancer subtypes with significant differences. Discussion: In the case analysis of Glioblastoma Multiforme (GBM) and Breast Invasive Carcinoma (BIC), we conducted survival time and age distribution analysis, drug response analysis, differential expression analysis, and functional enrichment analysis on the predicted cancer subtypes. The results showed that SMMSN can discover clinically meaningful cancer subtypes.
Affiliation(s)
- Jian Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xinzheng Xue
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Pengbo Wen
- School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, China
- Qian Song
- Department of Gynecology and Obstetrics, Taizhou Cancer Hospital, Wenling, China
- Jun Yao
- Department of Colorectal Surgery, Taizhou Cancer Hospital, Wenling, China
- Shuguang Ge
- School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, China

45
Kruta J, Carapito R, Trendelenburg M, Martin T, Rizzi M, Voll RE, Cavalli A, Natali E, Meier P, Stawiski M, Mosbacher J, Mollet A, Santoro A, Capri M, Giampieri E, Schkommodau E, Miho E. Machine learning for precision diagnostics of autoimmunity. Sci Rep 2024; 14:27848. [PMID: 39537649 PMCID: PMC11561187 DOI: 10.1038/s41598-024-76093-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 10/10/2024] [Indexed: 11/16/2024] Open
Abstract
Early and accurate diagnosis is crucial to prevent disease development and define therapeutic strategies. Due to predominantly unspecific symptoms, diagnosis of autoimmune diseases (AID) is notoriously challenging. Clinical decision support systems (CDSS) are a promising method with the potential to enhance and expedite precise diagnostics by physicians. However, due to the difficulties of integrating and encoding multi-omics data with clinical values, as well as a lack of standardization, such systems are often limited to certain data types. Accordingly, even sophisticated data models fall short when making accurate disease diagnoses and presenting data analyses in a user-friendly form. Therefore, the integration of various data types is not only an opportunity but also a competitive advantage for research and industry. We have developed an integration pipeline to enable the use of machine learning for patient classification based on multi-omics data in combination with clinical values and laboratory results. The application of our framework resulted in up to 96% prediction accuracy of autoimmune diseases with machine learning models. Our results deliver insights into autoimmune disease research and have the potential to be adapted for applications across disease conditions.
Affiliation(s)
- Jan Kruta
- School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz, 4132, Switzerland
- Raphael Carapito
- Laboratoire d'ImmunoRhumatologie Moléculaire, plateforme GENOMAX, Faculté de Médecine, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Institut Thématique Interdisciplinaire TRANSPLANTEX NG, INSERM UMR_S 1109, Fédération Hospitalo-Universitaire OMICARE, Université de Strasbourg, 4 rue Kirschleger, Strasbourg, 67085, France
- Service d'Immunologie Biologique, Pôle de Biologie, Plateau Technique de Biologie, Nouvel Hôpital Civil, 1 place de l'Hôpital, Strasbourg, 67091, France
- Marten Trendelenburg
- Division of Internal Medicine, University Hospital Basel, Basel, 4031, Switzerland
- Thierry Martin
- Laboratoire d'ImmunoRhumatologie Moléculaire, plateforme GENOMAX, Faculté de Médecine, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Institut Thématique Interdisciplinaire TRANSPLANTEX NG, INSERM UMR_S 1109, Fédération Hospitalo-Universitaire OMICARE, Université de Strasbourg, 4 rue Kirschleger, Strasbourg, 67085, France
- Marta Rizzi
- Department of Rheumatology and Clinical Immunology, Medical Center, University of Freiburg, 79106, Freiburg, Germany
- Reinhard E Voll
- Department of Rheumatology and Clinical Immunology, Medical Center, University of Freiburg, 79106, Freiburg, Germany
- Andrea Cavalli
- FaBiT Department of Pharmacy and Biotechnology, Università di Bologna, Bologna, 40126, Italy
- Eriberto Natali
- School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz, 4132, Switzerland
- Patrick Meier
- School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz, 4132, Switzerland
- Marc Stawiski
- School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz, 4132, Switzerland
- Johannes Mosbacher
- School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz, 4132, Switzerland
- Annette Mollet
- Institute of Pharmaceutical Medicine, University of Basel, Basel, 4056, Switzerland
- Aurelia Santoro
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, 40126, Italy
- Miriam Capri
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, 40126, Italy
- Enrico Giampieri
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, 40126, Italy
- Erik Schkommodau
- School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz, 4132, Switzerland
- Enkelejda Miho
- School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz, 4132, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- aiNET GmbH, Lichtstrasse 35, Basel, 4056, Switzerland

46
Liang H, Luo H, Sang Z, Jia M, Jiang X, Wang Z, Cong S, Yao X. GREMI: An Explainable Multi-Omics Integration Framework for Enhanced Disease Prediction and Module Identification. IEEE J Biomed Health Inform 2024; 28:6983-6996. [PMID: 39110558 DOI: 10.1109/jbhi.2024.3439713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]
Abstract
Multi-omics integration has demonstrated promising performance in complex disease prediction. However, existing research typically focuses on maximizing prediction accuracy, while often neglecting the essential task of discovering meaningful biomarkers. This issue is particularly important in biomedicine, as molecules often interact rather than function individually to influence disease outcomes. To this end, we propose a two-phase framework named GREMI to assist multi-omics classification and explanation. In the prediction phase, we propose to improve prediction performance by employing a graph attention architecture on sample-wise co-functional networks to incorporate biomolecular interaction information for enhanced feature representation, followed by the integration of a joint-late mixed strategy and the true-class-probability block to adaptively evaluate classification confidence at both feature and omics levels. In the interpretation phase, we propose a multi-view approach to explain disease outcomes from the interaction module perspective, providing a more intuitive understanding and biomedical rationale. We incorporate Monte Carlo tree search (MCTS) to explore local-view subgraphs and pinpoint modules that highly contribute to disease characterization from the global-view. Extensive experiments demonstrate that the proposed framework outperforms state-of-the-art methods in seven different classification tasks, and our model effectively addresses data mutual interference when the number of omics types increases. We further illustrate the functional- and disease-relevance of the identified modules, as well as validate the classification performance of discovered modules using an independent cohort.

47
Zhang D, Nayak R, Bashar MA. Pre-gating and contextual attention gate - A new fusion method for multi-modal data tasks. Neural Netw 2024; 179:106553. [PMID: 39053303 DOI: 10.1016/j.neunet.2024.106553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 01/29/2024] [Accepted: 07/16/2024] [Indexed: 07/27/2024]
Abstract
Multi-modal representation learning has received significant attention across diverse research domains due to its ability to model a scenario comprehensively. Learning the cross-modal interactions is essential to combining multi-modal data into a joint representation. However, conventional cross-attention mechanisms can produce noisy and non-meaningful values in the absence of useful cross-modal interactions among input features, thereby introducing uncertainty into the feature representation. These factors have the potential to degrade the performance of downstream tasks. This paper introduces a novel Pre-gating and Contextual Attention Gate (PCAG) module for multi-modal learning comprising two gating mechanisms that operate at distinct information processing levels within the deep learning model. The first gate filters out interactions that lack informativeness for the downstream task, while the second gate reduces the uncertainty introduced by the cross-attention module. Experimental results on eight multi-modal classification tasks spanning various domains show that the multi-modal fusion model with PCAG outperforms state-of-the-art multi-modal fusion models. Additionally, we elucidate how PCAG effectively processes cross-modality interactions.
Affiliation(s)
- Duoyi Zhang
- Centre for Data Science, School of Computer Science, Queensland University of Technology, 4000, Brisbane, Australia
- Richi Nayak
- Centre for Data Science, School of Computer Science, Queensland University of Technology, 4000, Brisbane, Australia
- Md Abul Bashar
- Centre for Data Science, School of Computer Science, Queensland University of Technology, 4000, Brisbane, Australia

48
Tao L, Xie Y, Deng JD, Shen H, Deng HW, Zhou W, Zhao C. SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data. ARXIV 2024:arXiv:2410.11046v1. [PMID: 39483351 PMCID: PMC11527097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Alzheimer's disease (AD) is a chronic neurodegenerative disorder and the leading cause of dementia, significantly impacting cost, mortality, and burden worldwide. The advent of high-throughput omics technologies, such as genomics, transcriptomics, proteomics, and epigenomics, has revolutionized the molecular understanding of AD. Conventional AI approaches typically require all omics data to be collected at the outset to achieve optimal AD diagnosis, which is inefficient and may be unnecessary. To reduce the clinical cost and improve the accuracy of AD diagnosis using multi-omics data, we propose a novel staged graph convolutional network with uncertainty quantification (SGUQ). SGUQ begins with mRNA and progressively incorporates DNA methylation and miRNA data only when necessary, reducing overall costs and exposure to harmful tests. Experimental results indicate that 46.23% of the samples can be reliably predicted using only single-modal omics data (mRNA), while an additional 16.04% of the samples can achieve reliable predictions when combining two omics data types (mRNA + DNA methylation). In addition, the proposed staged SGUQ achieved an accuracy of 0.858 on the ROSMAP dataset, significantly outperforming existing methods. The proposed SGUQ can not only be applied to AD diagnosis using multi-omics data, but also has the potential to support clinical decision making using multi-view data. Our implementation is publicly available at https://github.com/chenzhao2023/multiomicsuncertainty.
Affiliation(s)
- Liang Tao
- Department of Computer Science, Kennesaw State University, Marietta, GA 30060
- Yixin Xie
- Department of Information Technology, Kennesaw State University, Marietta, GA 30060
- Jeffrey D Deng
- Geisel School of Medicine at Dartmouth College, Hanover, NH 03755
- Hui Shen
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112
- Hong-Wen Deng
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112
- Weihua Zhou
- Department of Applied Computing, Michigan Technological University, Houghton, MI 49931
- Center for Biocomputing and Digital Health, Institute of Computing and Cybersystems, and Health Research Institute, Michigan Technological University, Houghton, MI 49931
- Chen Zhao
- Department of Computer Science, Kennesaw State University, Marietta, GA 30060

49
Zhang H, Cao D, Chen Z, Zhang X, Chen Y, Sessions C, Cruchaga C, Payne P, Li G, Province M, Li F. mosGraphGen: a novel tool to generate multi-omics signaling graphs to facilitate integrative and interpretable graph AI model development. BIOINFORMATICS ADVANCES 2024; 4:vbae151. [PMID: 39506989 PMCID: PMC11540438 DOI: 10.1093/bioadv/vbae151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Revised: 08/22/2024] [Accepted: 10/04/2024] [Indexed: 11/08/2024]
Abstract
Motivation: Multi-omics data, i.e. genomics, epigenomics, transcriptomics, and proteomics data, characterize complex cellular signaling systems at multiple levels and from multiple views, and provide a holistic view of complex cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data for mining critical biomarkers. Graph AI models have been widely used to analyze graph-structured datasets, and are ideal for integrative multi-omics data analysis because they can naturally integrate and represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret multi-omics data via graph node and edge ranking analysis. Nevertheless, it is nontrivial for graph AI model developers to pre-analyze multi-omics data and convert the data into biologically meaningful graphs that can be directly fed into graph AI models. Results: To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), which generates multi-omics signaling graphs (mos-graphs) of individual samples by mapping multi-omics data onto a biologically meaningful multi-level background signaling network, with data normalization by aggregating measurements and aligning to the reference genome. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. We illustrate mosGraphGen on two widely used multi-omics datasets: The Cancer Genome Atlas (TCGA) and Alzheimer's disease (AD) samples. Availability and implementation: The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/FuhaiLiAiLab/mosGraphGen.
Affiliation(s)
- Heming Zhang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Dekang Cao
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Zirui Chen
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Xiuyuan Zhang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Yixin Chen
- Department of Computer Science and Engineering, Washington University in St. Louis, Saint Louis, MO 63130, United States
- Cole Sessions
- Department of Pediatrics, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- NeuroGenomics and Informatics Center, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Philip Payne
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Guangfu Li
- Department of Surgery, School of Medicine, University of Connecticut, Farmington, CT 06030, United States
- Michael Province
- Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Fuhai Li
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Department of Pediatrics, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- NeuroGenomics and Informatics Center, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States

50
Nagpal S, Srivastava SK. Colon or semicolon: gut sampling microdevices for omics insights. NPJ Biofilms Microbiomes 2024; 10:97. [PMID: 39358351 PMCID: PMC11447266 DOI: 10.1038/s41522-024-00536-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 07/19/2024] [Indexed: 10/04/2024] Open
Abstract
Ingestible microdevices represent a breakthrough in non-invasive sampling of the human gastrointestinal (GI) tract. By capturing the native spatiotemporal microbiome and intricate biochemical gradients, these devices allow non-invasive multi-omic access to the unperturbed host-microbiota crosstalk, immune/nutritional landscapes, and gut-organ connections. We present the current progress of GI sampling microdevices towards personalized metabolism, with a view to fostering collaboration among clinicians, engineers, and data scientists.
Affiliation(s)
- Sunil Nagpal
- TCS Research, Tata Consultancy Services Ltd, Pune, India
- CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi, India
- Sarvesh Kumar Srivastava
- Centre for Biomedical Engineering, Indian Institute of Technology Delhi, New Delhi, India
- Department of Biomedical Engineering, All India Institute of Medical Sciences, New Delhi, India