1
Anyaegbunam UA, Vagiona AC, ten Cate V, Bauer K, Schmidlin T, Distler U, Tenzer S, Araldi E, Bindila L, Wild P, Andrade-Navarro MA. A Map of the Lipid-Metabolite-Protein Network to Aid Multi-Omics Integration. Biomolecules 2025; 15:484. [PMID: 40305217 PMCID: PMC12024871 DOI: 10.3390/biom15040484] [Received: 02/05/2025] [Revised: 03/13/2025] [Accepted: 03/20/2025] [Indexed: 05/02/2025] Open
Abstract
The integration of multi-omics data offers transformative potential for elucidating complex molecular mechanisms underlying biological processes and diseases. In this study, we developed a lipid-metabolite-protein network that combines a protein-protein interaction network and enzymatic and genetic interactions of proteins with metabolites and lipids to provide a unified framework for multi-omics integration. Using hyperbolic embedding, the network visualizes connections across omics layers, accessible through a user-friendly Shiny R (version 1.10.0) software package. This framework ranks molecules across omics layers based on functional proximity, enabling intuitive exploration. Application in a cardiovascular disease (CVD) case study identified lipids and metabolites associated with CVD-related proteins. The analysis confirmed known associations, like cholesterol esters and sphingomyelin, and highlighted potential novel biomarkers, such as 4-imidazoleacetate and indoleacetaldehyde. Furthermore, we used the network to analyze empagliflozin's temporal effects on lipid metabolism. Functional enrichment analysis of proteins associated with lipid signatures revealed dynamic shifts in biological processes, with early effects impacting phospholipid metabolism and long-term effects affecting sphingolipid biosynthesis. Our framework offers a versatile tool for hypothesis generation, functional analysis, and biomarker discovery. By bridging molecular layers, this approach advances our understanding of disease mechanisms and therapeutic effects, with broad applications in computational biology and precision medicine.
Affiliation(s)
- Uchenna Alex Anyaegbunam
  - Computational Biology and Data Mining Group (CBDM), Institute of Organismic and Molecular Evolution (iOME), Johannes Gutenberg University, 55122 Mainz, Germany
- Aimilia-Christina Vagiona
  - Computational Biology and Data Mining Group (CBDM), Institute of Organismic and Molecular Evolution (iOME), Johannes Gutenberg University, 55122 Mainz, Germany
- Vincent ten Cate
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center, Johannes-Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
  - Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis (CTH), University Medical Center, 55131 Mainz, Germany
  - German Center for Cardiovascular Research (DZHK), Partner Site Rhine Main, University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
- Katrin Bauer
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center, Johannes-Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
  - German Center for Cardiovascular Research (DZHK), Partner Site Rhine Main, University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
  - Computational Systems Medicine, Center for Thrombosis and Hemostasis (CTH), 55131 Mainz, Germany
- Thierry Schmidlin
  - Institute of Immunology, University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
  - Research Centre for Immunotherapy (FZI), University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
- Ute Distler
  - Institute of Immunology, University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
  - Research Centre for Immunotherapy (FZI), University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
- Stefan Tenzer
  - Institute of Immunology, University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
  - Research Centre for Immunotherapy (FZI), University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
- Elisa Araldi
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center, Johannes-Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
  - German Center for Cardiovascular Research (DZHK), Partner Site Rhine Main, University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
  - Computational Systems Medicine, Center for Thrombosis and Hemostasis (CTH), 55131 Mainz, Germany
  - Systems Medicine Laboratory, Department of Medicine and Surgery, University of Parma, 43121 Parma, Italy
- Laura Bindila
  - Institute of Physiological Chemistry, University Medical Center, 55131 Mainz, Germany
- Philipp Wild
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center, Johannes-Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
  - Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis (CTH), University Medical Center, 55131 Mainz, Germany
  - German Center for Cardiovascular Research (DZHK), Partner Site Rhine Main, University Medical Center, Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
- Miguel A. Andrade-Navarro
  - Computational Biology and Data Mining Group (CBDM), Institute of Organismic and Molecular Evolution (iOME), Johannes Gutenberg University, 55122 Mainz, Germany
2
Li Z, Chen W, Zhong H, Liang C. PCLSurv: a prototypical contrastive learning-based multi-omics data integration model for cancer survival prediction. Brief Bioinform 2025; 26:bbaf124. [PMID: 40127182 PMCID: PMC11932092 DOI: 10.1093/bib/bbaf124] [Received: 12/17/2024] [Revised: 02/23/2025] [Accepted: 03/01/2025] [Indexed: 03/26/2025] Open
Abstract
Accurate cancer survival prediction remains a critical challenge in clinical oncology, largely due to the complex and multi-omics nature of cancer data. Existing methods often struggle to capture the comprehensive range of informative features required for precise predictions. Here, we introduce PCLSurv, an innovative deep learning framework designed for cancer survival prediction using multi-omics data. PCLSurv integrates autoencoders to extract omics-specific features and employs sample-level contrastive learning to identify distinct yet complementary characteristics across data views. Then, features are fused via a bilinear fusion module to construct a unified representation. To further enhance the model's capacity to capture high-level semantic relationships, PCLSurv aligns similar samples with shared prototypes while separating unrelated ones via prototypical contrastive learning. As a result, PCLSurv effectively distinguishes patient groups with varying survival outcomes at different semantic similarity levels, providing a robust framework for stratifying patients based on clinical and molecular features. We conduct extensive experiments on 11 cancer datasets. The comparison results confirm the superior performance of PCLSurv over existing alternatives. The source code of PCLSurv is freely available at https://github.com/LiangSDNULab/PCLSurv.
Affiliation(s)
- Zhimin Li
  - School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
- Wenlan Chen
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hai Zhong
  - Department of Radiology, the Second Hospital of Shandong University, Jinan 250033, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
  - Department of Radiology, the Second Hospital of Shandong University, Jinan 250033, China
3
Tran D, Nguyen H, Pham VD, Nguyen P, Nguyen Luu H, Minh Phan L, Blair DeStefano C, Jim Yeung SC, Nguyen T. A comprehensive review of cancer survival prediction using multi-omics integration and clinical variables. Brief Bioinform 2025; 26:bbaf150. [PMID: 40221959 PMCID: PMC11994034 DOI: 10.1093/bib/bbaf150] [Received: 10/27/2024] [Revised: 01/29/2025] [Accepted: 03/19/2025] [Indexed: 04/15/2025] Open
Abstract
Cancer is an umbrella term that includes a wide spectrum of disease severity, from those that are malignant, metastatic, and aggressive to benign lesions with very low potential for progression or death. The ability to prognosticate patient outcomes would facilitate management of various malignancies: patients whose cancer is likely to advance quickly would receive necessary treatment that is commensurate with the predicted biology of the disease. Former prognostic models based on clinical variables (age, gender, cancer stage, tumor grade, etc.), though helpful, cannot account for genetic differences, molecular etiology, tumor heterogeneity, and important host biological mechanisms. Therefore, recent prognostic models have shifted toward the integration of complementary information available in both molecular data and clinical variables to better predict patient outcomes: vital status (overall survival), metastasis (metastasis-free survival), and recurrence (progression-free survival). In this article, we review 20 survival prediction approaches that integrate multi-omics and clinical data to predict patient outcomes. We discuss their strategies for modeling survival time (continuous and discrete), the incorporation of molecular measurements and clinical variables into risk models (clinical and multi-omics data), how to cope with censored patient records, the effectiveness of data integration techniques, prediction methodologies, model validation, and assessment metrics. The goal is to inform life scientists of available resources, and to provide a complete review of important building blocks in survival prediction. At the same time, we thoroughly describe the pros and cons of each methodology, and discuss in depth the outstanding challenges that need to be addressed in future method development.
Affiliation(s)
- Dao Tran
  - Department of Computer Science and Software Engineering, Auburn University, 345 W Magnolia Avenue, Auburn, AL 36849, United States
- Ha Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, 345 W Magnolia Avenue, Auburn, AL 36849, United States
- Van-Dung Pham
  - Department of Computer Science and Software Engineering, Auburn University, 345 W Magnolia Avenue, Auburn, AL 36849, United States
- Phuong Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, 345 W Magnolia Avenue, Auburn, AL 36849, United States
- Hung Nguyen Luu
  - UPMC Hillman Cancer Center, University of Pittsburgh Medical Center, 5150 Centre Avenue, Pittsburgh, PA 15232, United States
  - Department of Epidemiology, School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, United States
- Liem Minh Phan
  - David Grant USAF Medical Center, Clinical Investigation Facility, 60 Medical Group, Defense Health Agency, 101 Bodin Circle, Travis Air Force Base, CA 94535, United States
- Christin Blair DeStefano
  - Walter Reed National Military Medical Center, Defense Health Agency, 8901 Rockville Pike, Bethesda, MD 20889, United States
- Sai-Ching Jim Yeung
  - Department of Emergency Medicine, The University of Texas MD Anderson Cancer Center, 1400 Pressler Street, Houston, TX 77030, United States
- Tin Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, 345 W Magnolia Avenue, Auburn, AL 36849, United States
4
Bharadwaj N, Sharma R, Subramanian M, Ragini G, Nagarajan SA, Rahi M. Omics Approaches in Understanding Insecticide Resistance in Mosquito Vectors. Int J Mol Sci 2025; 26:1854. [PMID: 40076478 PMCID: PMC11899280 DOI: 10.3390/ijms26051854] [Received: 11/11/2024] [Revised: 12/09/2024] [Accepted: 01/07/2025] [Indexed: 03/14/2025] Open
Abstract
In recent years, the emergence of insecticide resistance has been a major challenge to global public health. Understanding the molecular mechanisms of this phenomenon in mosquito vectors is paramount for the formulation of effective vector control strategies. This review explores the current knowledge of insecticide resistance mechanisms through omics approaches. Genomic, transcriptomic, proteomic, and metabolomics approaches have proven crucial to understand these resilient vectors. Genomic studies have identified multiple genes associated with insecticide resistance, while transcriptomics has revealed dynamic gene expression patterns in response to insecticide exposure and other environmental stimuli. Proteomics and metabolomics offer insights into protein expression and metabolic pathways involved in detoxification and resistance. Integrating omics data holds immense potential to expand our knowledge on the molecular basis of insecticide resistance in mosquitoes via information obtained from different omics platforms to understand regulatory mechanisms and differential expression of genes and protein, and to identify the transcription factors and novel molecules involved in the detoxification of insecticides. Eventually, these data will help construct predictive models, identify novel strategies, and develop targeted interventions to control vector-borne diseases.
Affiliation(s)
- Nikhil Bharadwaj
  - Division of Vector Biology and Control, ICMR-Vector Control Research Centre, Medical Complex, Indira Nagar, Puducherry 605006, India; (M.S.); (G.R.); (S.A.N.); (M.R.)
- Rohit Sharma
  - Division of Vector Biology and Control, ICMR-Vector Control Research Centre, Medical Complex, Indira Nagar, Puducherry 605006, India; (M.S.); (G.R.); (S.A.N.); (M.R.)
5
Yadalam PK, Natarajan PM, Ardila CM. Variational graph autoencoder for reconstructed transcriptomic data associated with NLRP3 mediated pyroptosis in periodontitis. Sci Rep 2025; 15:1962. [PMID: 39809940 PMCID: PMC11733260 DOI: 10.1038/s41598-025-86455-4] [Received: 10/16/2024] [Accepted: 01/10/2025] [Indexed: 01/16/2025] Open
Abstract
The NLRP3 inflammasome, regulated by TLR4, plays a pivotal role in periodontitis by mediating inflammatory cytokine release and bone loss induced by Porphyromonas gingivalis. Periodontal disease creates a hypoxic environment, favoring anaerobic bacteria survival and exacerbating inflammation. The NLRP3 inflammasome triggers pyroptosis, a programmed cell death that amplifies inflammation and tissue damage. This study evaluates the efficacy of Variational Graph Autoencoders (VGAEs) in reconstructing gene data related to NLRP3-mediated pyroptosis in periodontitis. The NCBI GEO dataset GSE262663, containing three samples with and without hypoxia exposure, was analyzed using unsupervised K-means clustering. This method identifies natural groupings within biological data without prior labels. VGAE, a deep learning model, captures complex graph relationships for tasks like link prediction and edge detection. The VGAE model demonstrated exceptional performance with an accuracy of 99.42% and perfect precision. While it identified 5,820 false negatives, indicating a conservative approach, it accurately predicted 4,080 out of 9,900 positive samples. The model's latent space distribution differed significantly from the original data, suggesting a tightly clustered representation of the gene expression patterns. K-means clustering and VGAE show promise in gene expression analysis and graph structure reconstruction for periodontitis research.
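The counts quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration built on the reported numbers: it assumes that "perfect precision" means zero false positives and that the 99.42% accuracy is computed over all predicted links, so the total sample count it derives is inferred here and is not a figure reported in the abstract.

```python
# Counts quoted in the abstract above.
true_positives = 4080
false_negatives = 5820
positives = true_positives + false_negatives  # 9,900 positive samples

# Assumption: "perfect precision" is read as zero false positives.
false_positives = 0

precision = true_positives / (true_positives + false_positives)
recall = true_positives / positives

# Assumption: every error is a false negative, so
# accuracy = 1 - FN / N, which rearranges to N = FN / (1 - accuracy).
accuracy = 0.9942
implied_total = false_negatives / (1 - accuracy)

print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
print(f"implied total samples ~ {implied_total:,.0f}")
```

Under these assumptions the recall is 4,080 / 9,900, or about 0.41, which is why the high accuracy and perfect precision coexist with a large number of missed positives: the negative class would have to be far larger than the positive class.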
Affiliation(s)
- Pradeep K Yadalam
  - Department of Periodontics, Saveetha Dental College, Saveetha Institute of Medical and Technology Sciences, SIMATS, Saveetha University, Chennai, 600077, Tamil Nadu, India
- Prabhu Manickam Natarajan
  - Department of Clinical Sciences, Center of Medical and Bio-allied Health Sciences and Research, College of Dentistry, Ajman University, Ajman, 346, United Arab Emirates
- Carlos M Ardila
  - Department of Basic Sciences, Faculty of Dentistry, Universidad de Antioquia U de A, Medellín, 050010, Colombia
6
Kidenya BR, Mboowa G. Unlocking the future of complex human diseases prediction: multi-omics risk score breakthrough. Front Bioinform 2024; 4:1510352. [PMID: 39737249 PMCID: PMC11682975 DOI: 10.3389/fbinf.2024.1510352] [Received: 10/12/2024] [Accepted: 11/29/2024] [Indexed: 01/01/2025] Open
Affiliation(s)
- Benson R. Kidenya
  - Department of Biochemistry and Molecular Biology, Weill Bugando School of Medicine, Catholic University of Health and Allied Sciences, Mwanza, Tanzania
  - Train-The-Trainers for Bioinformatics Group, Human Heredity and Health for Africa Bioinformatics Network (H3ABioNet), Cape Town, South Africa
- Gerald Mboowa
  - Department of Immunology and Molecular Biology, College of Health Sciences, School of Biomedical Sciences, Makerere University, Kampala, Uganda
  - The African Center of Excellence in Bioinformatics and Data-Intensive Sciences, The Infectious Diseases Institute, College of Health Sciences, Makerere University, Kampala, Uganda
  - Africa Centres for Disease Control and Prevention, African Union Commission, Addis Ababa, Ethiopia
7
Mildau K, Ehlers H, Meisenburg M, Del Pup E, Koetsier RA, Torres Ortega LR, de Jonge NF, Singh KS, Ferreira D, Othibeng K, Tugizimana F, Huber F, van der Hooft JJJ. Effective data visualization strategies in untargeted metabolomics. Nat Prod Rep 2024. [PMID: 39620439 PMCID: PMC11610048 DOI: 10.1039/d4np00039k] [Received: 07/30/2024] [Indexed: 12/11/2024]
Abstract
Covering: 2014 to 2023 for metabolomics, 2002 to 2023 for information visualization. LC-MS/MS-based untargeted metabolomics is a rapidly developing research field spawning increasing numbers of computational metabolomics tools assisting researchers with their complex data processing, analysis, and interpretation tasks. In this article, we review the entire untargeted metabolomics workflow from the perspective of information visualization, visual analytics and visual data integration. Data visualization is a crucial step at every stage of the metabolomics workflow, where it provides core components of data inspection, evaluation, and sharing capabilities. However, due to the large number of available data analysis tools and corresponding visualization components, it is hard for both users and developers to get an overview of what is already available and which tools are suitable for their analysis. In addition, there is little cross-pollination between the fields of data visualization and metabolomics, leaving visual tools to be designed in a secondary and mostly ad hoc fashion. With this review, we aim to bridge the gap between the fields of untargeted metabolomics and data visualization. First, we introduce data visualization to the untargeted metabolomics field as a topic worthy of its own dedicated research, and provide a primer on cutting-edge visualization research into data visualization for both researchers as well as developers active in metabolomics. We extend this primer with a discussion of best practices for data visualization as they have emerged from data visualization studies. Second, we provide a practical roadmap to the visual tool landscape and its use within the untargeted metabolomics field. Here, for several computational analysis stages within the untargeted metabolomics workflow, we provide an overview of commonly used visual strategies with practical examples. In this context, we will also outline promising areas for further research and development. We end the review with a set of recommendations for developers and users on how to make the best use of visualizations for more effective and transparent communication of results.
Affiliation(s)
- Kevin Mildau
  - Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands
- Henry Ehlers
  - Visualization Group, Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria
- Mara Meisenburg
  - Adaptation Physiology Group, Wageningen University & Research, Wageningen, The Netherlands
- Elena Del Pup
  - Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands
- Robert A Koetsier
  - Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands
- Niek F de Jonge
  - Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands
- Kumar Saurabh Singh
  - Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands
  - Maastricht University Faculty of Science and Engineering, Plant Functional Genomics Maastricht, Limburg, The Netherlands
  - Faculty of Environment, Science and Economy, University of Exeter, Penryn, Cornwall, UK
- Kgalaletso Othibeng
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Fidele Tugizimana
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Florian Huber
  - Centre for Digitalisation and Digitality, Düsseldorf University of Applied Sciences, Düsseldorf, Germany
- Justin J J van der Hooft
  - Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
8
Abdelaziz EH, Ismail R, Mabrouk MS, Amin E. Multi-omics data integration and analysis pipeline for precision medicine: Systematic review. Comput Biol Chem 2024; 113:108254. [PMID: 39447405 DOI: 10.1016/j.compbiolchem.2024.108254] [Received: 06/21/2024] [Revised: 09/05/2024] [Accepted: 10/14/2024] [Indexed: 10/26/2024]
Abstract
Precision medicine has gained considerable popularity since the "one-size-fits-all" approach did not seem very effective or reflective of the complexity of the human body. Similarly, since single-omics does not reflect the complexity of the human body's inner workings, it did not result in the expected advancement in the medical field. Therefore, the multi-omics approach has emerged. The multi-omics approach involves integrating data from different omics technologies, such as DNA sequencing, RNA sequencing, mass spectrometry, and others, using computational methods and then analyzing the integrated result for different downstream analysis applications such as survival analysis, cancer classification, or biomarker identification. Most of the recent reviews were constrained to discussing one aspect of the multi-omics analysis pipeline, such as the dimensionality reduction step, the integration methods, or the interpretability aspect; however, very few provide a comprehensive review of every step of the analysis. This study gives an overview of the multi-omics analysis pipeline: it starts with the most popular multi-omics databases used in recent literature and dimensionality reduction techniques, details the different types of data integration techniques and their downstream analysis applications, describes the most commonly used evaluation metrics, highlights the importance of model interpretability, and lastly discusses the challenges and potential future work for multi-omics data integration in precision medicine.
Affiliation(s)
- Rasha Ismail
  - Faculty of Computer and Information Sciences, Ainshams University, Cairo, Egypt
- Mai S Mabrouk
  - Information Technology and Computer Science School, Nile University, Cairo, Egypt
- Eman Amin
  - Faculty of Computer and Information Sciences, Ainshams University, Cairo, Egypt
9
Bu Y, Liang J, Li Z, Wang J, Wang J, Yu G. Cancer molecular subtyping using limited multi-omics data with missingness. PLoS Comput Biol 2024; 20:e1012710. [PMID: 39724112 PMCID: PMC11709273 DOI: 10.1371/journal.pcbi.1012710] [Received: 06/21/2024] [Revised: 01/08/2025] [Accepted: 12/10/2024] [Indexed: 12/28/2024] Open
Abstract
Diagnosing cancer subtypes is a prerequisite for precise treatment. Existing multi-omics data fusion-based diagnostic solutions build on the requisite of sufficient samples with complete multi-omics data, which is challenging to obtain in clinical applications. To address the bottleneck of collecting sufficient samples with complete data in clinical applications, we proposed a flexible integrative model (CancerSD) to diagnose cancer subtype using limited samples with incomplete multi-omics data. CancerSD designs contrastive learning tasks and masking-and-reconstruction tasks to reliably impute missing omics, and fuses available omics data with the imputed ones to accurately diagnose cancer subtypes. To address the issue of limited clinical samples, it introduces a category-level contrastive loss to extend the meta-learning framework, effectively transferring knowledge from external datasets to pretrain the diagnostic model. Experiments on benchmark datasets show that CancerSD not only gives accurate diagnosis, but also maintains a high authenticity and good interpretability. In addition, CancerSD identifies important molecular characteristics associated with cancer subtypes, and it defines the Integrated CancerSD Score that can serve as an independent predictive factor for patient prognosis.
Affiliation(s)
- Yongqi Bu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong, China
- Jiaxuan Liang
  - School of Software, Shandong University, Jinan, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong, China
- Zhen Li
  - Department of Gastroenterology, Qilu Hospital of Shandong University, Jinan, Shandong, China
- Jianbo Wang
  - Department of Radiation Oncology, Qilu Hospital of Shandong University, Jinan, Shandong, China
- Jun Wang
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong, China
10
Tang X, Prodduturi N, Thompson K, Weinshilboum R, O’Sullivan C, Boughey J, Tizhoosh H, Klee E, Wang L, Goetz M, Suman V, Kalari K. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. Nucleic Acids Res 2024; 52:e99. [PMID: 39445795 PMCID: PMC11602161 DOI: 10.1093/nar/gkae915] [Received: 04/26/2024] [Revised: 08/14/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024] Open
Abstract
The OmicsFootPrint framework addresses the need for advanced multi-omics data analysis methodologies by transforming data into intuitive two-dimensional circular images and facilitating the interpretation of complex diseases. Utilizing deep neural networks and incorporating the SHapley Additive exPlanations algorithm, the framework enhances model interpretability. Tested with The Cancer Genome Atlas data, OmicsFootPrint effectively classified lung and breast cancer subtypes, achieving high area under the curve (AUC) scores (0.98 ± 0.02 for lung cancer subtype differentiation and 0.83 ± 0.07 for breast cancer PAM50 subtypes), and successfully distinguished between invasive lobular and ductal carcinomas in breast cancer, showcasing its robustness. It also demonstrated notable performance in predicting drug responses in cancer cell lines, with a median AUC of 0.74, surpassing nine existing methods. Furthermore, its effectiveness persists even with reduced training sample sizes. OmicsFootPrint marks an enhancement in multi-omics research, offering a novel, efficient and interpretable approach that contributes to a deeper understanding of disease mechanisms.
Affiliation(s)
- Xiaojia Tang
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Naresh Prodduturi
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Kevin J Thompson
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Richard Weinshilboum
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Judy C Boughey
  - Department of Surgery, Mayo Clinic, Rochester, MN 55905, USA
- Hamid R Tizhoosh
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, USA
- Eric W Klee
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Matthew P Goetz
  - Department of Oncology, Mayo Clinic, Rochester, MN 55905, USA
- Vera Suman
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Krishna R Kalari
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
11
Liang H, Luo H, Sang Z, Jia M, Jiang X, Wang Z, Cong S, Yao X. GREMI: An Explainable Multi-Omics Integration Framework for Enhanced Disease Prediction and Module Identification. IEEE J Biomed Health Inform 2024; 28:6983-6996. [PMID: 39110558 DOI: 10.1109/jbhi.2024.3439713] [Indexed: 08/10/2024]
Abstract
Multi-omics integration has demonstrated promising performance in complex disease prediction. However, existing research typically focuses on maximizing prediction accuracy, while often neglecting the essential task of discovering meaningful biomarkers. This issue is particularly important in biomedicine, as molecules often interact rather than function individually to influence disease outcomes. To this end, we propose a two-phase framework named GREMI to assist multi-omics classification and explanation. In the prediction phase, we propose to improve prediction performance by employing a graph attention architecture on sample-wise co-functional networks to incorporate biomolecular interaction information for enhanced feature representation, followed by the integration of a joint-late mixed strategy and the true-class-probability block to adaptively evaluate classification confidence at both feature and omics levels. In the interpretation phase, we propose a multi-view approach to explain disease outcomes from the interaction module perspective, providing a more intuitive understanding and biomedical rationale. We incorporate Monte Carlo tree search (MCTS) to explore local-view subgraphs and pinpoint modules that highly contribute to disease characterization from the global-view. Extensive experiments demonstrate that the proposed framework outperforms state-of-the-art methods in seven different classification tasks, and our model effectively addresses data mutual interference when the number of omics types increases. We further illustrate the functional- and disease-relevance of the identified modules, as well as validate the classification performance of discovered modules using an independent cohort.
|
12
|
Ren Y, Wu C, Zhou H, Hu X, Miao Z. Dual-extraction modeling: A multi-modal deep-learning architecture for phenotypic prediction and functional gene mining of complex traits. Plant Communications 2024; 5:101002. [PMID: 38872306 DOI: 10.1016/j.xplc.2024.101002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 05/27/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024]
Abstract
Despite considerable advances in extracting crucial insights from bio-omics data to unravel the intricate mechanisms underlying complex traits, the absence of a universal multi-modal computational tool with robust interpretability for accurate phenotype prediction and identification of trait-associated genes remains a challenge. This study introduces the dual-extraction modeling (DEM) approach, a multi-modal deep-learning architecture designed to extract representative features from heterogeneous omics datasets, enabling the prediction of complex trait phenotypes. Through comprehensive benchmarking experiments, we demonstrate the efficacy of DEM in classification and regression prediction of complex traits. DEM consistently exhibits superior accuracy, robustness, generalizability, and flexibility. Notably, we establish its effectiveness in predicting pleiotropic genes that influence both flowering time and rosette leaf number, underscoring its commendable interpretability. In addition, we have developed user-friendly software to facilitate seamless utilization of DEM's functions. In summary, this study presents a state-of-the-art approach with the ability to effectively predict qualitative and quantitative traits and identify functional genes, confirming its potential as a valuable tool for exploring the genetic basis of complex traits.
Affiliation(s)
- Yanlin Ren
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chenhua Wu
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- He Zhou
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xiaona Hu
- College of Chemistry & Pharmacy, Northwest A&F University, Yangling, Shaanxi 712100, China.
- Zhenyan Miao
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China; Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China.
|
13
|
Vitorino R. Transforming Clinical Research: The Power of High-Throughput Omics Integration. Proteomes 2024; 12:25. [PMID: 39311198 PMCID: PMC11417901 DOI: 10.3390/proteomes12030025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 08/31/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
High-throughput omics technologies have dramatically changed biological research, providing unprecedented insights into the complexity of living systems. This review presents a comprehensive examination of the current landscape of high-throughput omics pipelines, covering key technologies, data integration techniques and their diverse applications. It looks at advances in next-generation sequencing, mass spectrometry and microarray platforms and highlights their contribution to data volume and precision. In addition, this review looks at the critical role of bioinformatics tools and statistical methods in managing the large datasets generated by these technologies. By integrating multi-omics data, researchers can gain a holistic understanding of biological systems, leading to the identification of new biomarkers and therapeutic targets, particularly in complex diseases such as cancer. The review also looks at the integration of omics data into electronic health records (EHRs) and the potential for cloud computing and big data analytics to improve data storage, analysis and sharing. Despite significant advances, there are still challenges such as data complexity, technical limitations and ethical issues. Future directions include the development of more sophisticated computational tools and the application of advanced machine learning techniques, which are critical for addressing the complexity and heterogeneity of omics datasets. This review aims to serve as a valuable resource for researchers and practitioners, highlighting the transformative potential of high-throughput omics technologies in advancing personalized medicine and improving clinical outcomes.
Affiliation(s)
- Rui Vitorino
- iBiMED, Department of Medical Sciences, University of Aveiro, 3810-193 Aveiro, Portugal;
- Department of Surgery and Physiology, Cardiovascular R&D Centre—UnIC@RISE, Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
|
14
|
Zhao Y, Li X, Zhou C, Peng H, Zheng Z, Chen J, Ding W. A review of cancer data fusion methods based on deep learning. Information Fusion 2024; 108:102361. [DOI: 10.1016/j.inffus.2024.102361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2025]
|
15
|
Cesnik A, Schaffer LV, Gaur I, Jain M, Ideker T, Lundberg E. Mapping the Multiscale Proteomic Organization of Cellular and Disease Phenotypes. Annu Rev Biomed Data Sci 2024; 7:369-389. [PMID: 38748859 PMCID: PMC11343683 DOI: 10.1146/annurev-biodatasci-102423-113534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research.
Affiliation(s)
- Anthony Cesnik
- Department of Bioengineering, Stanford University, Stanford, California, USA;
- Leah V Schaffer
- Department of Medicine, University of California San Diego, La Jolla, California, USA;
- Ishan Gaur
- Department of Bioengineering, Stanford University, Stanford, California, USA;
- Mayank Jain
- Department of Medicine, University of California San Diego, La Jolla, California, USA;
- Trey Ideker
- Departments of Computer Science and Engineering and Bioengineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA;
- Emma Lundberg
- Chan Zuckerberg Biohub, San Francisco, California, USA
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Pathology, Stanford University, Palo Alto, California, USA
- Department of Bioengineering, Stanford University, Stanford, California, USA;
|
16
|
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
- Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
- Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
|
17
|
Abbasi AF, Asim MN, Ahmed S, Vollmer S, Dengel A. Survival prediction landscape: an in-depth systematic literature review on activities, methods, tools, diseases, and databases. Front Artif Intell 2024; 7:1428501. [PMID: 39021434 PMCID: PMC11252047 DOI: 10.3389/frai.2024.1428501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024] Open
Abstract
Survival prediction integrates patient-specific molecular information and clinical signatures to forecast the anticipated time of an event, such as recurrence, death, or disease progression. Survival prediction proves valuable in guiding treatment decisions, optimizing resource allocation, and informing precision medicine interventions. The wide range of diseases, the existence of various variants within the same disease, and the reliance on available data necessitate disease-specific computational survival predictors. The widespread adoption of artificial intelligence (AI) methods in crafting survival predictors has undoubtedly revolutionized this field. However, the ever-increasing demand for more sophisticated and effective prediction models necessitates continued innovation. To catalyze these advancements, it is crucial to bring the knowledge and insights of existing survival predictors into a centralized platform. The paper at hand thoroughly examines 23 existing review studies and provides a concise overview of their scope and limitations. Focusing on a comprehensive set of the 90 most recent survival predictors across 44 diverse diseases, it delves into the diverse types of methods used in the development of disease-specific predictors. This exhaustive analysis encompasses the utilized data modalities along with a detailed analysis of subsets of clinical features, feature engineering methods, and the specific statistical, machine learning, or deep learning approaches that have been employed. It also provides insights about survival prediction data sources, open-source predictors, and survival prediction frameworks.
Affiliation(s)
- Ahtisham Fazeel Abbasi
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Muhammad Nabeel Asim
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sheraz Ahmed
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sebastian Vollmer
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
|
18
|
Li M, Guo H, Wang K, Kang C, Yin Y, Zhang H. AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification. Comput Biol Med 2024; 177:108614. [PMID: 38796884 DOI: 10.1016/j.compbiomed.2024.108614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 02/27/2024] [Accepted: 05/11/2024] [Indexed: 05/29/2024]
Abstract
Integration analysis of cancer multi-omics data for pan-cancer classification has the potential for clinical applications in various aspects such as tumor diagnosis, analyzing clinically significant features, and providing precision medicine. In these applications, embedding and feature selection on high-dimensional multi-omics data are clinically necessary. Recently, deep learning algorithms have become the most promising methods for cancer multi-omics integration analysis, owing to their powerful capability of capturing nonlinear relationships. Developing effective deep learning architectures for cancer multi-omics embedding and feature selection remains a challenge for researchers in view of high dimensionality and heterogeneity. In this paper, we propose a novel two-phase deep learning model named AVBAE-MODFR for pan-cancer classification. AVBAE-MODFR achieves embedding by a multi2multi autoencoder based on the adversarial variational Bayes method and further performs feature selection utilizing a dual-net-based feature ranking method. AVBAE-MODFR utilizes AVBAE to pre-train the network parameters, which improves the classification performance and enhances feature ranking stability in MODFR. Firstly, AVBAE learns high-quality representation among multiple omics features for unsupervised pan-cancer classification. We design an efficient discriminator architecture to distinguish the latent distributions for updating forward variational parameters. Secondly, we propose MODFR to simultaneously evaluate multi-omics feature importance for feature selection by training a designed multi2one selector network, where the efficient evaluation approach based on the average gradient of random mask subsets can avoid bias caused by input feature drift. We conduct experiments on the TCGA pan-cancer dataset and compare it with four state-of-the-art methods for each phase. The results show the superiority of AVBAE-MODFR over SOTA methods.
Affiliation(s)
- Minghe Li
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Huike Guo
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Keao Wang
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Chuanze Kang
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Yanbin Yin
- Department of Food Science and Technology, University of Nebraska - Lincoln, NE, USA
- Han Zhang
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China.
|
19
|
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Gaurav Sharma
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sricheta Karmakar
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Satarupa Banerjee
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
|
20
|
Tsiakiri A, Bakirtzis C, Plakias S, Vlotinou P, Vadikolias K, Terzoudi A, Christidi F. Predictive Models for the Transition from Mild Neurocognitive Disorder to Major Neurocognitive Disorder: Insights from Clinical, Demographic, and Neuropsychological Data. Biomedicines 2024; 12:1232. [PMID: 38927439 PMCID: PMC11201179 DOI: 10.3390/biomedicines12061232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 05/27/2024] [Accepted: 05/29/2024] [Indexed: 06/28/2024] Open
Abstract
Neurocognitive disorders (NCDs) are progressive conditions that severely impact cognitive function and daily living. Understanding the transition from mild to major NCD is crucial for personalized early intervention and effective management. Predictive models incorporating demographic variables, clinical data, and scores on neuropsychological and emotional tests can significantly enhance early detection and intervention strategies in primary healthcare settings. We aimed to develop and validate predictive models for the progression from mild NCD to major NCD using demographic, clinical, and neuropsychological data from 132 participants over a two-year period. Generalized Estimating Equations were employed for data analysis. Our final model achieved an accuracy of 83.7%. A higher body mass index and alcohol drinking increased the risk of progression from mild NCD to major NCD, while female sex, higher praxis abilities, and a higher score on the Geriatric Depression Scale reduced the risk. Here, we show that integrating multiple factors, ones that can be easily examined in clinical settings, into predictive models can improve early diagnosis of major NCD. This approach could facilitate timely interventions, potentially mitigating the progression of cognitive decline and improving patient outcomes in primary healthcare settings. Further research should focus on validating these models across diverse populations and exploring their implementation in various clinical contexts.
Affiliation(s)
- Anna Tsiakiri
- Neurology Department, School of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece; (A.T.); (K.V.); (A.T.)
- Christos Bakirtzis
- B’ Department of Neurology and the MS Center, School of Medicine, AHEPA University Hospital, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece;
- Spyridon Plakias
- Department of Physical Education and Sport Science, University of Thessaly, 41500 Trikala, Greece;
- Pinelopi Vlotinou
- Department of Occupational Therapy, University of West Attica, 12243 Athens, Greece;
- Konstantinos Vadikolias
- Neurology Department, School of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece; (A.T.); (K.V.); (A.T.)
- Aikaterini Terzoudi
- Neurology Department, School of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece; (A.T.); (K.V.); (A.T.)
- Foteini Christidi
- Neurology Department, School of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece; (A.T.); (K.V.); (A.T.)
|
21
|
Tang X, Prodduturi N, Thompson KJ, Weinshilboum RM, O'Sullivan CC, Boughey JC, Tizhoosh H, Klee EW, Wang L, Goetz MP, Suman V, Kalari KR. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. bioRxiv 2024:2024.03.21.586001. [PMID: 38585820 PMCID: PMC10996492 DOI: 10.1101/2024.03.21.586001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
The OmicsFootPrint framework addresses the need for advanced multi-omics data analysis methodologies by transforming data into intuitive two-dimensional circular images and facilitating the interpretation of complex diseases. Utilizing Deep Neural Networks and incorporating the SHapley Additive exPlanations (SHAP) algorithm, the framework enhances model interpretability. Tested with The Cancer Genome Atlas (TCGA) data, OmicsFootPrint effectively classified lung and breast cancer subtypes, achieving high Area Under Curve (AUC) scores of 0.98±0.02 for lung cancer subtype differentiation and 0.83±0.07 for breast cancer PAM50 subtypes, and successfully distinguished between invasive lobular and ductal carcinomas in breast cancer, showcasing its robustness. It also demonstrated notable performance in predicting drug responses in cancer cell lines, with a median AUC of 0.74, surpassing existing algorithms. Furthermore, its effectiveness persists even with reduced training sample sizes. OmicsFootPrint marks an enhancement in multi-omics research, offering a novel, efficient, and interpretable approach that contributes to a deeper understanding of disease mechanisms.
|
22
|
Bottosso M, Mosele F, Michiels S, Cournède PH, Dogan S, Labaki C, André F. Moving toward precision medicine to predict drug sensitivity in patients with metastatic breast cancer. ESMO Open 2024; 9:102247. [PMID: 38401248 PMCID: PMC10982863 DOI: 10.1016/j.esmoop.2024.102247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 01/03/2024] [Accepted: 01/10/2024] [Indexed: 02/26/2024] Open
Abstract
Tumor heterogeneity represents a major challenge in breast cancer, being associated with disease progression and treatment resistance. Precision medicine has been extensively applied to dissect tumor heterogeneity and, through a deeper molecular understanding of the disease, to personalize therapeutic strategies. In recent years, technological advances have greatly improved the understanding of breast cancer biology, and several trials have been developed to translate these new insights into clinical practice, with the ultimate aim of improving patients' outcomes. In the era of molecular oncology, genomic analyses and other methodologies are shaping a new treatment algorithm in breast cancer care. In this manuscript, we review the main steps of precision medicine to predict drug sensitivity in breast cancer from a translational point of view. Genomic developments and their clinical implications are discussed, along with technological advancements that could broaden precision medicine applications. Current achievements are put into perspective to provide an overview of the state of the art of breast cancer precision oncology as well as to identify future research directions.
Affiliation(s)
- M Bottosso
- INSERM Unit U981, Gustave Roussy Cancer Campus, Villejuif, France; Department of Surgery, Oncology and Gastroenterology, University of Padova, Padova, Italy
- F Mosele
- INSERM Unit U981, Gustave Roussy Cancer Campus, Villejuif, France; Department of Medical Oncology, Gustave Roussy, Villejuif
- S Michiels
- Gustave Roussy, Department of Biostatistics and Epidemiology, Villejuif; Oncostat U1018, Inserm, Université Paris-Saclay, Ligue Contre le Cancer, Villejuif
- P-H Cournède
- Université Paris-Saclay, Centrale Supélec, Laboratory of Mathematics and Computer Science (MICS), Gif-Sur-Yvette, France
- S Dogan
- INSERM Unit U981, Gustave Roussy Cancer Campus, Villejuif, France
- C Labaki
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, USA
- F André
- INSERM Unit U981, Gustave Roussy Cancer Campus, Villejuif, France; Department of Medical Oncology, Gustave Roussy, Villejuif; PRISM, INSERM, Gustave Roussy, Villejuif; Paris Saclay University, Gif Sur-Yvette, France.
|
23
|
Ranjbari S, Arslanturk S. Integration of incomplete multi-omics data using Knowledge Distillation and Supervised Variational Autoencoders for disease progression prediction. J Biomed Inform 2023; 147:104512. [PMID: 37813325 DOI: 10.1016/j.jbi.2023.104512] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2023] [Revised: 08/31/2023] [Accepted: 10/03/2023] [Indexed: 10/11/2023]
Abstract
OBJECTIVE: The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. METHODS: However, conventional approaches face challenges due to the dimensionality curse problem. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. RESULTS: Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatments. The favorable performance of our approach in comparison to several existing models suggests its potential to contribute to the advancement of cancer understanding and management. CONCLUSION: The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, our proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer.
Collapse
Affiliation(s)
- Sima Ranjbari
- Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA.
| | - Suzan Arslanturk
- Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA.
|
24
|
Gao J, Yi X, Wang Z. The application of multi-omics in the respiratory microbiome: Progresses, challenges and promises. Comput Struct Biotechnol J 2023; 21:4933-4943. [PMID: 37867968 PMCID: PMC10585227 DOI: 10.1016/j.csbj.2023.10.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/13/2023] [Revised: 10/10/2023] [Accepted: 10/10/2023] [Indexed: 10/24/2023]
Abstract
The study of the respiratory microbiome has entered a multi-omic era. Through integrating different omic data types such as metagenome, metatranscriptome, metaproteome, metabolome, culturome and radiome surveyed from respiratory specimens, holistic insights can be gained on the lung microbiome and its interaction with host immunity and inflammation in respiratory diseases. The power of multi-omics has moved the field forward from associative assessment of microbiome alterations to causative understanding of the lung microbiome in the pathogenesis of chronic, acute and other types of respiratory diseases. However, the application of multi-omics to the respiratory microbiome still faces unique challenges in sample processing, data integration, and downstream validation. In this review, we first introduce the respiratory sample types and omic data types applicable to studying the respiratory microbiome. We next describe approaches for multi-omic integration, focusing on dimensionality reduction, multi-omic association and prediction. We then summarize progress in the application of multi-omics to studying the microbiome in respiratory diseases. We finally discuss current challenges and share our thoughts on future promises in the field.
Affiliation(s)
- Jingyuan Gao
- Institute of Ecological Sciences, School of Life Sciences, South China Normal University, Guangzhou, Guangdong Province, China
- Xinzhu Yi
- Institute of Ecological Sciences, School of Life Sciences, South China Normal University, Guangzhou, Guangdong Province, China
- Zhang Wang
- Institute of Ecological Sciences, School of Life Sciences, South China Normal University, Guangzhou, Guangdong Province, China
|
25
|
Wekesa JS, Kimwele M. A review of multi-omics data integration through deep learning approaches for disease diagnosis, prognosis, and treatment. Front Genet 2023; 14:1199087. [PMID: 37547471 PMCID: PMC10398577 DOI: 10.3389/fgene.2023.1199087] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 04/02/2023] [Accepted: 07/11/2023] [Indexed: 08/08/2023]
Abstract
Accurate diagnosis is the key to providing prompt and explicit treatment and disease management. The recognized biological method for the molecular diagnosis of infectious pathogens is polymerase chain reaction (PCR). Recently, deep learning approaches have come to play a vital role in accurately identifying disease-related genes for diagnosis, prognosis, and treatment. The models reduce the time and cost used by wet-lab experimental procedures. Consequently, sophisticated computational approaches have been developed to facilitate the detection of cancer, a leading cause of death globally, and other complex diseases. In this review, we systematically evaluate the recent trends in multi-omics data analysis based on deep learning techniques and their application in disease prediction. We highlight the current challenges in the field and discuss how advances in deep learning methods and their optimization for application are vital in overcoming them. Ultimately, this review promotes the development of novel deep-learning methodologies for data integration, which is essential for disease detection and treatment.
|
26
|
Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023; 21:182. [PMID: 37189125 DOI: 10.1186/s12916-023-02858-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/28/2022] [Accepted: 04/03/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. METHODS Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. RESULTS The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. 
CONCLUSIONS This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.
Affiliation(s)
- Axel Benner
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Federico Ambrogi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Scientific Directorate, IRCCS Policlinico San Donato, San Donato Milanese, Italy
- Lara Lusa
- Department of Mathematics, Faculty of Mathematics, Natural Sciences and Information Technology, University of Primorska, Koper, Slovenia
- Institute of Biostatistics and Medical Informatics, University of Ljubljana, Ljubljana, Slovenia
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany
- Harald Binder
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Stefan Michiels
- Service de Biostatistique et d'Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Labeled Ligue Contre le Cancer, Villejuif, France
- Willi Sauerbrei
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Lisa McShane
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA.
|