1
|
Qattous H, Azzeh M, Ibrahim R, Abed Al-Ghafer I, Al Sorkhy M, Alkhateeb A. PaCMAP-embedded convolutional neural network for multi-omics data integration. Heliyon 2024; 10:e23195. [PMID: 38163104 PMCID: PMC10756978 DOI: 10.1016/j.heliyon.2023.e23195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Revised: 11/22/2023] [Accepted: 11/29/2023] [Indexed: 01/03/2024] Open
Abstract
Aims The multi-omics data integration has emerged as a prominent avenue within the healthcare industry, presenting substantial potential for enhancing predictive models. The main motivation behind this study stems from the imperative need to advance prognostic methodologies in cancer diagnosis, an area where precision is pivotal for effective clinical decision-making. In this context, the present study introduces an innovative methodology that integrates copy number alteration (CNA), DNA methylation, and gene expression data. Methods The three omics data were successfully merged into a two-dimensional (2D) map using the PaCMAP dimensionality reduction technique. Utilizing the RGB coloring scheme, a visual representation of the integration was produced utilizing the values of the three omics of each sample. Then, the colored 2D maps were fed into a convolutional neural network (CNN) to forecast the Gleason score. Results Our proposed model outperforms the cutting-edge i-SOM-GSN model by integrating multi-omics data and the CNN architecture with an accuracy of 98.89, and AUC of 0.9996. Conclusion This study demonstrates the effectiveness of multi-omics data integration in predicting health outcomes. The proposed methodology, combining PaCMAP for dimensionality reduction, RGB coloring for visualization, and CNN for prediction, offers a comprehensive framework for integrating heterogeneous omics data and improving predictive accuracy. These findings contribute to the advancement of personalized medicine and have the potential to aid in clinical decision-making for prostate cancer patients.
Collapse
Affiliation(s)
- Hazem Qattous
- Software Engineering Department, Princess Sumaya University for Technology, Amman P.O. Box 1438, Jordan
| | - Mohammad Azzeh
- Data Science Department, Princess Sumaya University for Technology, Amman P.O. Box 1438, Jordan
| | - Rahmeh Ibrahim
- Computer Science Department, Princess Sumaya University for Technology, Amman P.O. Box 1438, Jordan
| | - Ibrahim Abed Al-Ghafer
- Data Science Department, Princess Sumaya University for Technology, Amman P.O. Box 1438, Jordan
| | - Mohammad Al Sorkhy
- Heritage College of Osteopathic medicine, Ohio University, Cleveland, OH 44122, USA
| | - Abedalrhman Alkhateeb
- Computer Science Department, Lakehead University, 955 Oliver Rd, Thunder Bay, ON P7B 5E1, Ontario, Canada
| |
Collapse
|
2
|
Nguyen T, Bian X, Roberson D, Khanna R, Chen Q, Yan C, Beck R, Worman Z, Meerzaman D. Multi-omics Pathways Workflow (MOPAW): An Automated Multi-omics Workflow on the Cancer Genomics Cloud. Cancer Inform 2023; 22:11769351231180992. [PMID: 37342652 PMCID: PMC10278438 DOI: 10.1177/11769351231180992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2023] [Accepted: 05/22/2023] [Indexed: 06/23/2023] Open
Abstract
Introduction In the era of big data, gene-set pathway analyses derived from multi-omics are exceptionally powerful. When preparing and analyzing high-dimensional multi-omics data, the installation process and programing skills required to use existing tools can be challenging. This is especially the case for those who are not familiar with coding. In addition, implementation with high performance computing solutions is required to run these tools efficiently. Methods We introduce an automatic multi-omics pathway workflow, a point and click graphical user interface to Multivariate Single Sample Gene Set Analysis (MOGSA), hosted on the Cancer Genomics Cloud by Seven Bridges Genomics. This workflow leverages the combination of different tools to perform data preparation for each given data types, dimensionality reduction, and MOGSA pathway analysis. The Omics data includes copy number alteration, transcriptomics data, proteomics and phosphoproteomics data. We have also provided an additional workflow to help with downloading data from The Cancer Genome Atlas and Clinical Proteomic Tumor Analysis Consortium and preprocessing these data to be used for this multi-omics pathway workflow. Results The main outputs of this workflow are the distinct pathways for subgroups of interest provided by users, which are displayed in heatmaps if identified. In addition to this, graphs and tables are provided to users for reviewing. Conclusion Multi-omics Pathway Workflow requires no coding experience. Users can bring their own data or download and preprocess public datasets from The Cancer Genome Atlas and Clinical Proteomic Tumor Analysis Consortium using our additional workflow based on the samples of interest. Distinct overactivated or deactivated pathways for groups of interest can be found. This useful information is important in effective therapeutic targeting.
Collapse
Affiliation(s)
- Trinh Nguyen
- The Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | - Xiaopeng Bian
- The Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | | | - Rakesh Khanna
- The Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | - Qingrong Chen
- The Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | - Chunhua Yan
- The Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | | | | | - Daoud Meerzaman
- The Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| |
Collapse
|
3
|
Yu X, Zhou S, Zou H, Wang Q, Liu C, Zang M, Liu T. Survey of deep learning techniques for disease prediction based on omics data. HUMAN GENE 2023; 35:201140. [DOI: 10.1016/j.humgen.2022.201140] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2025]
|
4
|
Translational Bioinformatics for Human Reproductive Biology Research: Examples, Opportunities and Challenges for a Future Reproductive Medicine. Int J Mol Sci 2022; 24:ijms24010004. [PMID: 36613446 PMCID: PMC9819745 DOI: 10.3390/ijms24010004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2022] [Revised: 12/16/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022] Open
Abstract
Since 1978, with the first IVF (in vitro fertilization) baby birth in Manchester (England), more than eight million IVF babies have been born throughout the world, and many new techniques and discoveries have emerged in reproductive medicine. To summarize the modern technology and progress in reproductive medicine, all scientific papers related to reproductive medicine, especially papers related to reproductive translational medicine, were fully searched, manually curated and reviewed. Results indicated whether male reproductive medicine or female reproductive medicine all have made significant progress, and their markers have experienced the progress from karyotype analysis to single-cell omics. However, due to the lack of comprehensive databases, especially databases collecting risk exposures, disease markers and models, prevention drugs and effective treatment methods, the application of the latest precision medicine technologies and methods in reproductive medicine is limited.
Collapse
|
5
|
Lee M, Kim PJ, Joe H, Kim HG. Gene-centric multi-omics integration with convolutional encoders for cancer drug response prediction. Comput Biol Med 2022; 151:106192. [PMID: 36327883 DOI: 10.1016/j.compbiomed.2022.106192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Revised: 08/26/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]
Abstract
MOTIVATION Tumor heterogeneity, including genetic and transcriptomic characteristics, can reduce the efficacy of anticancer pharmacological therapy, resulting in clinical variability in patient response to therapeutic medications. Multi-omics integration can allow in silico models to provide an additional perspective on a biological system. METHODS In this study, we propose a gene-centric multi-channel (GCMC) architecture to integrate multi-omics for predicting cancer drug response. GCMC transformed multi-omics profiles into a three-dimensional tensor with an additional dimension for omics types. GCMC's convolutional encoders captures multi-omics profiles for each gene and yields gene-centric features to predict drug responses. RESULTS We evaluated GCMC on various datasets, including The Cancer Genome Atlas (TCGA) patients, patient-derived xenografts (PDX) mice models, and the Genomics of Drug Sensitivity in Cancer (GDSC) cell line datasets. GCMC achieved better performance than baseline models, including single-omics models, in more than 75% of 265 drugs from GDSC cell line datasets. Furthermore, as for the clinical applicability of GCMC, it achieved the best performance on TCGA and PDX datasets in terms of both AUPR and AUC. We also analyzed models' capability of integrating multi-omics profiles by measuring the contribution ratio of omics types. GCMC can incorporate multi-omics profiles in various manners to enhance performance for each drug type. These results suggested that GCMC can improve performance and feature extraction capability by integrating multi-omics profiles in a gene-centric manner.
Collapse
Affiliation(s)
- Munhwan Lee
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea.
| | - Pil-Jong Kim
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea.
| | - Hyunwhan Joe
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea.
| | - Hong-Gee Kim
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea.
| |
Collapse
|
6
|
ElKarami B, Alkhateeb A, Qattous H, Alshomali L, Shahrrava B. Multi-omics Data Integration Model Based on UMAP Embedding and Convolutional Neural Network. Cancer Inform 2022; 21:11769351221124205. [PMID: 36187912 PMCID: PMC9523837 DOI: 10.1177/11769351221124205] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2022] [Accepted: 08/14/2022] [Indexed: 11/15/2022] Open
Abstract
Introduction Multi-omics data integration facilitates collecting richer understanding and perceptions than separate omics data. Various promising integrative approaches have been utilized to analyze multi-omics data for biomedical applications, including disease prediction and disease subtypes, biomarker prediction, and others. Methods In this paper, we introduce a multi-omics data integration method that is constructed using the combination of gene similarity network (GSN) based on uniform manifold approximation and projection (UMAP) and convolutional neural networks (CNNs). The method utilizes UMAP to embed gene expression, DNA methylation, and copy number alteration (CNA) to a lower dimension creating two-dimensional RGB images. Gene expression is used as a reference to construct the GSN and then integrate other omics data with the gene expression for better prediction. We used CNNs to predict the Gleason score levels of prostate cancer patients and the tumor stage in breast cancer patients. Results The model proposed near perfection with accuracy above 99% with all other performance measurements at the same level. The proposed model outperformed the state-of-art iSOM-GSN model that constructs the GSN map based on the self-organizing map. Conclusion The results show that UMAP as an embedding technique can better integrate multi-omics maps into the prediction model than SOM. The proposed model can also be applied to build a multi-omics prediction model for other types of cancer.
Collapse
Affiliation(s)
- Bashier ElKarami
- Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada
| | - Abedalrhman Alkhateeb
- Software Engineering Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman, Jordan
- Abedalrhman Alkhateeb, Software Engineering Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, P. O. Box 1438, Al-Jubaiha, Amman 11941, Jordan.
| | - Hazem Qattous
- Software Engineering Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman, Jordan
| | - Lujain Alshomali
- Software Engineering Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman, Jordan
| | - Behnam Shahrrava
- Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada
| |
Collapse
|
7
|
Integrated Multi-Omics Maps of Lower-Grade Gliomas. Cancers (Basel) 2022; 14:cancers14112797. [PMID: 35681780 PMCID: PMC9179546 DOI: 10.3390/cancers14112797] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2022] [Revised: 05/18/2022] [Accepted: 05/31/2022] [Indexed: 02/01/2023] Open
Abstract
Multi-omics high-throughput technologies produce data sets which are not restricted to only one but consist of multiple omics modalities, often as patient-matched tumour specimens. The integrative analysis of these omics modalities is essential to obtain a holistic view on the otherwise fragmented information hidden in this data. We present an intuitive method enabling the combined analysis of multi-omics data based on self-organizing maps machine learning. It "portrays" the expression, methylation and copy number variations (CNV) landscapes of each tumour using the same gene-centred coordinate system. It enables the visual evaluation and direct comparison of the different omics layers on a personalized basis. We applied this combined molecular portrayal to lower grade gliomas, a heterogeneous brain tumour entity. It classifies into a series of molecular subtypes defined by genetic key lesions, which associate with large-scale effects on DNA methylation and gene expression, and in final consequence, drive with cell fate decisions towards oligodendroglioma-, astrocytoma- and glioblastoma-like cancer cell lineages with different prognoses. Consensus modes of concerted changes of expression, methylation and CNV are governed by the degree of co-regulation within and between the omics layers. The method is not restricted to the triple-omics data used here. The similarity landscapes reflect partly independent effects of genetic lesions and DNA methylation with consequences for cancer hallmark characteristics such as proliferation, inflammation and blocked differentiation in a subtype specific fashion. It can be extended to integrate other omics features such as genetic mutation, protein expression data as well as extracting prognostic markers.
Collapse
|
8
|
Kang M, Ko E, Mersha TB. A roadmap for multi-omics data integration using deep learning. Brief Bioinform 2022; 23:bbab454. [PMID: 34791014 PMCID: PMC8769688 DOI: 10.1093/bib/bbab454] [Citation(s) in RCA: 142] [Impact Index Per Article: 47.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Revised: 09/30/2021] [Accepted: 10/05/2021] [Indexed: 12/18/2022] Open
Abstract
High-throughput next-generation sequencing now makes it possible to generate a vast amount of multi-omics data for various applications. These data have revolutionized biomedical research by providing a more comprehensive understanding of the biological systems and molecular mechanisms of disease development. Recently, deep learning (DL) algorithms have become one of the most promising methods in multi-omics data analysis, due to their predictive performance and capability of capturing nonlinear and hierarchical features. While integrating and translating multi-omics data into useful functional insights remain the biggest bottleneck, there is a clear trend towards incorporating multi-omics analysis in biomedical research to help explain the complex relationships between molecular layers. Multi-omics data have a role to improve prevention, early detection and prediction; monitor progression; interpret patterns and endotyping; and design personalized treatments. In this review, we outline a roadmap of multi-omics integration using DL and offer a practical perspective into the advantages, challenges and barriers to the implementation of DL in multi-omics data.
Collapse
Affiliation(s)
- Mingon Kang
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
| | - Euiseong Ko
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
| | - Tesfaye B Mersha
- Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
| |
Collapse
|
9
|
McGrath SP, Benton ML, Tavakoli M, Tatonetti NP. Predictions, Pivots, and a Pandemic: a Review of 2020's Top Translational Bioinformatics Publications. Yearb Med Inform 2021; 30:219-225. [PMID: 34479393 PMCID: PMC8416221 DOI: 10.1055/s-0041-1726540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
OBJECTIVES Provide an overview of the emerging themes and notable papers which were published in 2020 in the field of Bioinformatics and Translational Informatics (BTI) for the International Medical Informatics Association Yearbook. METHODS A team of 16 individuals scanned the literature from the past year. Using a scoring rubric, papers were evaluated on their novelty, importance, and objective quality. 1,224 Medical Subject Headings (MeSH) terms extracted from these papers were used to identify themes and research focuses. The authors then used the scoring results to select notable papers and trends presented in this manuscript. RESULTS The search phase identified 263 potential papers and central themes of coronavirus disease 2019 (COVID-19), machine learning, and bioinformatics were examined in greater detail. CONCLUSIONS When addressing a once in a centruy pandemic, scientists worldwide answered the call, with informaticians playing a critical role. Productivity and innovations reached new heights in both TBI and science, but significant research gaps remain.
Collapse
Affiliation(s)
- Scott P. McGrath
- CITRIS Health, University of California Berkeley, Berkeley, CA, USA
| | | | - Maryam Tavakoli
- MTERMS Lab, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | | |
Collapse
|
10
|
Neisari A, Rueda L, Saad S. Spam review detection using self-organizing maps and convolutional neural networks. Comput Secur 2021. [DOI: 10.1016/j.cose.2021.102274] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
11
|
Mostavi M, Chiu YC, Chen Y, Huang Y. CancerSiamese: one-shot learning for predicting primary and metastatic tumor types unseen during model training. BMC Bioinformatics 2021; 22:244. [PMID: 33980137 PMCID: PMC8117642 DOI: 10.1186/s12859-021-04157-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2020] [Accepted: 04/27/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND The state-of-the-art deep learning based cancer type prediction can only predict cancer types whose samples are available during the training where the sample size is commonly large. In this paper, we consider how to utilize the existing training samples to predict cancer types unseen during the training. We hypothesize the existence of a set of type-agnostic expression representations that define the similarity/dissimilarity between samples of the same/different types and propose a novel one-shot learning model called CancerSiamese to learn this common representation. CancerSiamese accepts a pair of query and support samples (gene expression profiles) and learns the representation of similar or dissimilar cancer types through two parallel convolutional neural networks joined by a similarity function. RESULTS We trained CancerSiamese for cancer type prediction for primary and metastatic tumors using samples from the Cancer Genome Atlas (TCGA) and MET500. Network transfer learning was utilized to facilitate the training of the CancerSiamese models. CancerSiamese was tested for different N-way predictions and yielded an average accuracy improvement of 8% and 4% over the benchmark 1-Nearest Neighbor (1-NN) classifier for primary and metastatic tumors, respectively. Moreover, we applied the guided gradient saliency map and feature selection to CancerSiamese to examine 100 and 200 top marker-gene candidates for the prediction of primary and metastatic cancers, respectively. Functional analysis of these marker genes revealed several cancer related functions between primary and metastatic tumors. CONCLUSION This work demonstrated, for the first time, the feasibility of predicting unseen cancer types whose samples are limited. Thus, it could inspire new and ingenious applications of one-shot and few-shot learning solutions for improving cancer diagnosis, prognostic, and our understanding of cancer.
Collapse
Affiliation(s)
- Milad Mostavi
- Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
| | - Yu-Chiao Chiu
- Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
| | - Yidong Chen
- Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, 78229, USA.
- Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA.
| | - Yufei Huang
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA.
- Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, 78229, USA.
| |
Collapse
|
12
|
Exploratory Data Analysis and Foreground Detection with the Growing Hierarchical Neural Forest. Neural Process Lett 2020. [DOI: 10.1007/s11063-020-10360-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|