1
Mukherjee A, Abraham S, Singh A, Balaji S, Mukunthan KS. From Data to Cure: A Comprehensive Exploration of Multi-omics Data Analysis for Targeted Therapies. Mol Biotechnol 2025; 67:1269-1289. [PMID: 38565775 PMCID: PMC11928429 DOI: 10.1007/s12033-024-01133-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
In the dynamic landscape of targeted therapeutics, drug discovery has pivoted towards understanding underlying disease mechanisms, placing a strong emphasis on molecular perturbations and target identification. This paradigm shift, crucial for drug discovery, is underpinned by big data, a transformative force in the current era. Omics data, characterized by its heterogeneity and enormity, has ushered biological and biomedical research into the big data domain. Acknowledging the significance of integrating diverse omics data strata, known as multi-omics studies, researchers delve into the intricate interrelationships among various omics layers. This review navigates the expansive omics landscape, showcasing tailored assays for each molecular layer through genomes to metabolomes. The sheer volume of data generated necessitates sophisticated informatics techniques, with machine-learning (ML) algorithms emerging as robust tools. These datasets not only refine disease classification but also enhance diagnostics and foster the development of targeted therapeutic strategies. Through the integration of high-throughput data, the review focuses on targeting and modeling multiple disease-regulated networks, validating interactions with multiple targets, and enhancing therapeutic potential using network pharmacology approaches. Ultimately, this exploration aims to illuminate the transformative impact of multi-omics in the big data era, shaping the future of biological research.
Affiliation(s)
- Arnab Mukherjee
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Suzanna Abraham
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Akshita Singh
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- S Balaji
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- K S Mukunthan
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India.
2
Canida T, Ke H, Chen S, Ye Z, Ma T. Multivariate Bayesian variable selection for multi-trait genetic fine mapping. J R Stat Soc Ser C Appl Stat 2025; 74:331-351. [PMID: 40092670 PMCID: PMC11905884 DOI: 10.1093/jrsssc/qlae055] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/10/2023] [Revised: 08/21/2024] [Accepted: 10/01/2024] [Indexed: 03/19/2025]
Abstract
Genome-wide association studies (GWAS) have identified thousands of single-nucleotide polymorphisms (SNPs) associated with complex traits, but determining the underlying causal variants remains challenging. Fine mapping aims to pinpoint the potentially causal variants from a large number of correlated SNPs, possibly with group structure, in GWAS-enriched genomic regions using variable selection approaches. In multi-trait fine mapping, we are interested in identifying the causal variants for multiple related traits. Existing multivariate variable selection methods for fine mapping select variables for all responses without considering the possible heterogeneity across different responses. Here, we develop a novel multivariate Bayesian variable selection method for multi-trait fine mapping to select causal variants from a large number of grouped SNPs that target multiple correlated and possibly heterogeneous traits. Our new method features selection at multiple levels, incorporation of prior biological knowledge to guide selection, and identification of the best subset of traits that the variants target. We showed the advantage of our method over existing methods via comprehensive simulations that mimic typical fine-mapping settings and a real-world fine-mapping example in UK Biobank, where we identified critical causal variants potentially targeting different subsets of addictive behaviours and risk factors.
Affiliation(s)
- Travis Canida
- Department of Epidemiology and Biostatistics, University of Maryland, 4200 Valley Drive, College Park, MD 20742, USA
- Hongjie Ke
- Department of Epidemiology and Biostatistics, University of Maryland, 4200 Valley Drive, College Park, MD 20742, USA
- Shuo Chen
- Department of Epidemiology and Public Health, University of Maryland, 655 W. Baltimore Street, Baltimore, MD 21201, USA
- Zhenyao Ye
- Department of Epidemiology and Public Health, University of Maryland, 655 W. Baltimore Street, Baltimore, MD 21201, USA
- Tianzhou Ma
- Department of Epidemiology and Biostatistics, University of Maryland, 4200 Valley Drive, College Park, MD 20742, USA
3
Kidenya BR, Mboowa G. Unlocking the future of complex human diseases prediction: multi-omics risk score breakthrough. Frontiers in Bioinformatics 2024; 4:1510352. [PMID: 39737249 PMCID: PMC11682975 DOI: 10.3389/fbinf.2024.1510352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2024] [Accepted: 11/29/2024] [Indexed: 01/01/2025] Open
Affiliation(s)
- Benson R. Kidenya
- Department of Biochemistry and Molecular Biology, Weill Bugando School of Medicine, Catholic University of Health and Allied Sciences, Mwanza, Tanzania
- Train-The-Trainers for Bioinformatics Group, Human Heredity and Health for Africa Bioinformatics Network (H3ABioNet), Cape Town, South Africa
- Gerald Mboowa
- Department of Immunology and Molecular Biology, College of Health Sciences, School of Biomedical Sciences, Makerere University, Kampala, Uganda
- The African Center of Excellence in Bioinformatics and Data-Intensive Sciences, The Infectious Diseases Institute, College of Health Sciences, Makerere University, Kampala, Uganda
- Africa Centres for Disease Control and Prevention, African Union Commission, Addis Ababa, Ethiopia
4
Tao L, Xie Y, Deng JD, Shen H, Deng HW, Zhou W, Zhao C. SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data. arXiv 2024:arXiv:2410.11046v1. [PMID: 39483351 PMCID: PMC11527097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Alzheimer's disease (AD) is a chronic neurodegenerative disorder and the leading cause of dementia, significantly impacting cost, mortality, and burden worldwide. The advent of high-throughput omics technologies, such as genomics, transcriptomics, proteomics, and epigenomics, has revolutionized the molecular understanding of AD. Conventional AI approaches typically require the completion of all omics data at the outset to achieve optimal AD diagnosis, which is inefficient and may be unnecessary. To reduce the clinical cost and improve the accuracy of AD diagnosis using multi-omics data, we propose a novel staged graph convolutional network with uncertainty quantification (SGUQ). SGUQ begins with mRNA and progressively incorporates DNA methylation and miRNA data only when necessary, reducing overall costs and exposure to harmful tests. Experimental results indicate that 46.23% of the samples can be reliably predicted using only single-modal omics data (mRNA), while an additional 16.04% of the samples can achieve reliable predictions when combining two omics data types (mRNA + DNA methylation). In addition, the proposed staged SGUQ achieved an accuracy of 0.858 on the ROSMAP dataset, which outperformed existing methods significantly. The proposed SGUQ can not only be applied to AD diagnosis using multi-omics data, but also has the potential for clinical decision making using multi-viewed data. Our implementation is publicly available at https://github.com/chenzhao2023/multiomicsuncertainty.
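The staged idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' SGUQ implementation, which is in the repository linked above. The `staged_predict` function, the toy stage model, and the use of a plain maximum-probability threshold in place of the paper's uncertainty quantification are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a stage model maps the omics views collected so far
# to a list of class probabilities.
StageModel = Callable[[Dict[str, Sequence[float]]], List[float]]


def staged_predict(
    stages: List[Tuple[str, StageModel]],
    acquire: Callable[[str], Sequence[float]],
    threshold: float = 0.8,
) -> Tuple[int, float, List[str]]:
    """Add one omics view at a time and stop once the prediction is confident.

    Returns (predicted class index, confidence, names of the views used).
    Confidence here is simply the largest class probability (an assumption).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    views: Dict[str, Sequence[float]] = {}
    probs: List[float] = []
    for name, model in stages:
        views[name] = acquire(name)  # a view is only measured when its stage is reached
        probs = model(views)
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining, costlier views
    confidence = max(probs)
    return probs.index(confidence), confidence, list(views)


def toy_model(views: Dict[str, Sequence[float]]) -> List[float]:
    # Toy behaviour only: confidence grows with the number of available views.
    p = min(0.5 + 0.2 * len(views), 0.99)
    return [1.0 - p, p]


stages = [("mRNA", toy_model), ("DNA methylation", toy_model), ("miRNA", toy_model)]
data = {"mRNA": [0.1, 0.2], "DNA methylation": [0.3], "miRNA": [0.4]}

label, conf, used = staged_predict(stages, data.__getitem__, threshold=0.8)
print(label, round(conf, 2), used)
```

In this toy run the first stage is not confident enough, so the second view is acquired and the third is never requested. Raising or lowering `threshold` trades acquisition cost against how often the later views are needed.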
Affiliation(s)
- Liang Tao
- Department of Computer Science, Kennesaw State University, Marietta, GA 30060
- Yixin Xie
- Department of Information Technology, Kennesaw State University, Marietta, GA, 30060
- Jeffrey D Deng
- Geisel School of Medicine at Dartmouth College, Hanover, NH 03755
- Hui Shen
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112
- Hong-Wen Deng
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112
- Weihua Zhou
- Department of Applied Computing, Michigan Technological University, Houghton, MI, 49931
- Center for Biocomputing and Digital Health, Institute of Computing and Cybersystems, and Health Research Institute, Michigan Technological University, Houghton, MI 49931
- Chen Zhao
- Department of Computer Science, Kennesaw State University, Marietta, GA 30060
5
Hernández-Lemus E, Ochoa S. Methods for multi-omic data integration in cancer research. Front Genet 2024; 15:1425456. [PMID: 39364009 PMCID: PMC11446849 DOI: 10.3389/fgene.2024.1425456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 08/28/2024] [Indexed: 10/05/2024] Open
Abstract
Multi-omics data integration is a term that refers to the process of combining and analyzing data from different omic experimental sources, such as genomics, transcriptomics, methylation assays, and microRNA sequencing, among others. Such data integration approaches have the potential to provide a more comprehensive functional understanding of biological systems and have numerous applications in areas such as disease diagnosis, prognosis and therapy. However, quantitative integration of multi-omic data is a complex task that requires the use of highly specialized methods and approaches. Here, we discuss a number of data integration methods that have been developed with multi-omics data in view, including statistical methods, machine learning approaches, and network-based approaches. We also discuss the challenges and limitations of such methods and provide examples of their applications in the literature. Overall, this review aims to provide an overview of the current state of the field and highlight potential directions for future research.
Affiliation(s)
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Soledad Ochoa
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, United States
6
Zhu B, Zhang Z, Leung SY, Fan X. NetMIM: network-based multi-omics integration with block missingness for biomarker selection and disease outcome prediction. Brief Bioinform 2024; 25:bbae454. [PMID: 39288230 PMCID: PMC11407451 DOI: 10.1093/bib/bbae454] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/16/2023] [Revised: 07/24/2024] [Accepted: 08/30/2024] [Indexed: 09/19/2024] Open
Abstract
Compared with analyzing omics data from a single platform, an integrative analysis of multi-omics data provides a more comprehensive understanding of the regulatory relationships among biological features associated with complex diseases. However, most existing frameworks for integrative analysis overlook two crucial aspects of multi-omics data. Firstly, they neglect the known dependencies among biological features that exist in highly credible biological databases. Secondly, most existing integrative frameworks simply remove the subjects without full omics data to handle block missingness, which decreases statistical power. To overcome these issues, we propose a network-based integrative Bayesian framework for biomarker selection and disease outcome prediction based on multi-omics data. Our framework utilizes a Dirac spike-and-slab variable selection prior to identify a small subset of biomarkers. The incorporation of gene pathway information improves the interpretability of feature selection. Furthermore, with the strategy in the FBM (standing for "full Bayesian model with missingness") model where missing omics data are augmented via a mechanistic model, our framework handles block missingness in multi-omics data via a data augmentation approach. The real application illustrates that our approach, which incorporates existing gene pathway information and includes subjects without DNA methylation data, results in more interpretable feature selection results and more accurate predictions.
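To make the cost of complete-case deletion concrete, the following Python sketch counts how many subjects survive when every subject with a missing omics block is dropped, and contrasts that with keeping all subjects alongside a per-view availability mask. The cohort, the view names, and both helper functions are hypothetical, and the sketch does not implement the Bayesian data augmentation described in the abstract.

```python
from typing import Dict, List, Optional

# Hypothetical toy cohort: each subject maps a view name to its features,
# or to None when that whole omics block was never measured.
Subject = Dict[str, Optional[List[float]]]


def complete_cases(subjects: List[Subject], views: List[str]) -> List[Subject]:
    """Keep only subjects with every view observed (complete-case analysis)."""
    return [s for s in subjects if all(s.get(v) is not None for v in views)]


def availability_mask(subjects: List[Subject], views: List[str]) -> List[List[bool]]:
    """Keep every subject and record which blocks are observed for each one."""
    return [[s.get(v) is not None for v in views] for s in subjects]


views = ["expression", "methylation"]
cohort: List[Subject] = [
    {"expression": [1.2, 0.4], "methylation": [0.8]},
    {"expression": [0.9, 1.1], "methylation": None},
    {"expression": [1.5, 0.2], "methylation": [0.3]},
    {"expression": [0.7, 0.6], "methylation": None},
    {"expression": [1.0, 1.0], "methylation": [0.5]},
]

kept = complete_cases(cohort, views)
mask = availability_mask(cohort, views)
print(f"complete-case subjects: {len(kept)} of {len(cohort)}")
print(f"subjects retained with a mask: {len(mask)}")
```

The mask keeps the two subjects without methylation data available to any downstream model that can handle a missing block, which is the situation the augmentation strategy in the abstract is designed for.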
Affiliation(s)
- Bencong Zhu
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Zhen Zhang
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Suet Yi Leung
- Department of Pathology, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Queen Mary Hospital, Hong Kong SAR, China
- Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
7
Gutierrez Reyes CD, Alejo-Jacuinde G, Perez Sanchez B, Chavez Reyes J, Onigbinde S, Mogut D, Hernández-Jasso I, Calderón-Vallejo D, Quintanar JL, Mechref Y. Multi Omics Applications in Biological Systems. Curr Issues Mol Biol 2024; 46:5777-5793. [PMID: 38921016 PMCID: PMC11202207 DOI: 10.3390/cimb46060345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 05/31/2024] [Accepted: 06/05/2024] [Indexed: 06/27/2024] Open
Abstract
Traditional methodologies often fall short in addressing the complexity of biological systems. In this regard, systems biology omics approaches have provided invaluable tools for conducting comprehensive analysis. Current sequencing capabilities have revolutionized genetics and genomics studies, as well as the characterization of transcriptional profiles and dynamics of several species and sample types. Biological systems experience complex biochemical processes involving thousands of molecules. These processes occur at different levels that can be studied using mass spectrometry-based (MS-based) analysis, enabling high-throughput proteomics, glycoproteomics, glycomics, metabolomics, and lipidomics analysis. Here, we present the most up-to-date techniques utilized in the completion of omics analysis. Additionally, we include some interesting examples of the applicability of multi-omics to a variety of biological systems.
Affiliation(s)
- Gerardo Alejo-Jacuinde
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX 79409, USA; (G.A.-J.); (B.P.S.)
- Benjamin Perez Sanchez
- Department of Plant and Soil Science, Institute of Genomics for Crop Abiotic Stress Tolerance (IGCAST), Texas Tech University, Lubbock, TX 79409, USA; (G.A.-J.); (B.P.S.)
- Jesus Chavez Reyes
- Center of Basic Sciences, Department of Physiology and Pharmacology, Autonomous University of Aguascalientes, Aguascalientes 20392, Mexico; (J.C.R.); (I.H.-J.); (D.C.-V.); (J.L.Q.)
- Sherifdeen Onigbinde
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, TX 79409, USA;
- Damir Mogut
- Department of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, 10-719 Olsztyn, Poland;
- Irma Hernández-Jasso
- Center of Basic Sciences, Department of Physiology and Pharmacology, Autonomous University of Aguascalientes, Aguascalientes 20392, Mexico; (J.C.R.); (I.H.-J.); (D.C.-V.); (J.L.Q.)
- Denisse Calderón-Vallejo
- Center of Basic Sciences, Department of Physiology and Pharmacology, Autonomous University of Aguascalientes, Aguascalientes 20392, Mexico; (J.C.R.); (I.H.-J.); (D.C.-V.); (J.L.Q.)
- J. Luis Quintanar
- Center of Basic Sciences, Department of Physiology and Pharmacology, Autonomous University of Aguascalientes, Aguascalientes 20392, Mexico; (J.C.R.); (I.H.-J.); (D.C.-V.); (J.L.Q.)
- Yehia Mechref
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, TX 79409, USA;
8
Zhao C, Liu A, Zhang X, Cao X, Ding Z, Sha Q, Shen H, Deng HW, Zhou W. CLCLSA: Cross-omics linked embedding with contrastive learning and self attention for integration with incomplete multi-omics data. Comput Biol Med 2024; 170:108058. [PMID: 38295477 PMCID: PMC10959569 DOI: 10.1016/j.compbiomed.2024.108058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 12/30/2023] [Accepted: 01/26/2024] [Indexed: 02/02/2024]
Abstract
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding etiology of complex genetic diseases. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning is employed, which maximizes the mutual information between different types of omics. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Finally, a Softmax classifier is employed to perform multi-omics data classification. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicate that our proposed CLCLSA produces promising results in multi-omics data classification using both complete and incomplete multi-omics data.
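As a rough illustration of cross-omics contrastive learning, the Python sketch below computes an InfoNCE-style loss between paired embeddings from two omics views. InfoNCE is a widely used contrastive objective that lower-bounds mutual information; whether CLCLSA uses this exact loss is not stated in the abstract, so the choice of loss, the cosine similarity, the `temperature` value, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a: List[Sequence[float]],
             view_b: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over samples.

    view_a[i] and view_b[i] are embeddings of the same sample from two omics
    types (the positive pair); every view_b[j] with j != i acts as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings: two samples, two omics views.
omics_a = [[1.0, 0.0], [0.0, 1.0]]
omics_b_aligned = [[1.0, 0.0], [0.0, 1.0]]   # each sample matches its own pair
omics_b_shuffled = [[0.0, 1.0], [1.0, 0.0]]  # pairs are swapped

aligned_loss = info_nce(omics_a, omics_b_aligned)
shuffled_loss = info_nce(omics_a, omics_b_shuffled)
print(aligned_loss < shuffled_loss)
```

The loss is small when each sample's two embeddings agree and large when they do not, so minimizing it pulls the views of the same subject together in the shared embedding space.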
Affiliation(s)
- Chen Zhao
- Department of Computer Science, Kennesaw State University, Marietta, GA, 30060, USA
- Anqi Liu
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Xiao Zhang
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA
- Zhengming Ding
- Department of Computer Science, Tulane University, New Orleans, LA, 70118, USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA
- Hui Shen
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Hong-Wen Deng
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA, 70112, USA.
- Weihua Zhou
- Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI, 49931, USA; Center for Biocomputing and Digital Health, Institute of Computing and Cybersystems, and Health Research Institute, Michigan Technological University, Houghton, MI, 49931, USA.
9
Zhao C, Liu A, Zhang X, Cao X, Ding Z, Sha Q, Shen H, Deng HW, Zhou W. CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data. Research Square 2023:rs.3.rs-2768563. [PMID: 37205427 PMCID: PMC10187371 DOI: 10.21203/rs.3.rs-2768563/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.
Affiliation(s)
- Chen Zhao
- Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA
- Anqi Liu
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Xiao Zhang
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA
- Zhengming Ding
- Department of Computer Science, Tulane University, New Orleans, LA 70118, USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA
- Hui Shen
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Hong-Wen Deng
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Weihua Zhou
- Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA
- Center for Biocomputing and Digital Health, Institute of Computing and Cybersystems, and Health Research Institute, Michigan Technological University, Houghton, MI 49931, USA
10
Zhao C, Liu A, Zhang X, Cao X, Ding Z, Sha Q, Shen H, Deng HW, Zhou W. CLCLSA: Cross-omics Linked embedding with Contrastive Learning and Self Attention for multi-omics integration with incomplete multi-omics data. arXiv 2023:arXiv:2304.05542v1. [PMID: 37090237 PMCID: PMC10120753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Integration of heterogeneous and high-dimensional multi-omics data is becoming increasingly important in understanding genetic data. Each omics technique only provides a limited view of the underlying biological process and integrating heterogeneous omics layers simultaneously would lead to a more comprehensive and detailed understanding of diseases and phenotypes. However, one obstacle faced when performing multi-omics data integration is the existence of unpaired multi-omics data due to instrument sensitivity and cost. Studies may fail if certain aspects of the subjects are missing or incomplete. In this paper, we propose a deep learning method for multi-omics integration with incomplete data by Cross-omics Linked unified embedding with Contrastive Learning and Self Attention (CLCLSA). Utilizing complete multi-omics data as supervision, the model employs cross-omics autoencoders to learn the feature representation across different types of biological data. The multi-omics contrastive learning, which is used to maximize the mutual information between different types of omics, is employed before latent feature concatenation. In addition, the feature-level self-attention and omics-level self-attention are employed to dynamically identify the most informative features for multi-omics data integration. Extensive experiments were conducted on four public multi-omics datasets. The experimental results indicated that the proposed CLCLSA outperformed the state-of-the-art approaches for multi-omics data classification using incomplete multi-omics data.
Affiliation(s)
- Chen Zhao
- Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA
- Anqi Liu
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Xiao Zhang
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA
- Zhengming Ding
- Department of Computer Science, Tulane University, New Orleans, LA 70118, USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA
- Hui Shen
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Hong-Wen Deng
- Division of Biomedical Informatics and Genomics, Tulane Center of Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, LA 70112, USA
- Weihua Zhou
- Department of Applied Computing, Michigan Technological University, 1400 Townsend Dr, Houghton, MI 49931, USA
- Center for Biocomputing and Digital Health, Institute of Computing and Cybersystems, and Health Research Institute, Michigan Technological University, Houghton, MI 49931, USA
11
Donovan SM, Aghaeepour N, Andres A, Azad MB, Becker M, Carlson SE, Järvinen KM, Lin W, Lönnerdal B, Slupsky CM, Steiber AL, Raiten DJ. Evidence for human milk as a biological system and recommendations for study design-a report from "Breastmilk Ecology: Genesis of Infant Nutrition (BEGIN)" Working Group 4. Am J Clin Nutr 2023; 117 Suppl 1:S61-S86. [PMID: 37173061 PMCID: PMC10356565 DOI: 10.1016/j.ajcnut.2022.12.021] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/10/2022] [Revised: 12/06/2022] [Accepted: 12/08/2022] [Indexed: 05/15/2023] Open
Abstract
Human milk contains all of the essential nutrients required by the infant within a complex matrix that enhances the bioavailability of many of those nutrients. In addition, human milk is a source of bioactive components, living cells and microbes that facilitate the transition to life outside the womb. Our ability to fully appreciate the importance of this matrix relies on the recognition of short- and long-term health benefits and, as highlighted in previous sections of this supplement, its ecology (i.e., interactions among the lactating parent and breastfed infant as well as within the context of the human milk matrix itself). Designing and interpreting studies to address this complexity depends on the availability of new tools and technologies that account for such complexity. Past efforts have often compared human milk to infant formula, which has provided some insight into the bioactivity of human milk, as a whole, or of individual milk components supplemented with formula. However, this experimental approach cannot capture the contributions of the individual components to the human milk ecology, the interaction between these components within the human milk matrix, or the significance of the matrix itself to enhance human milk bioactivity on outcomes of interest. This paper presents approaches to explore human milk as a biological system and the functional implications of that system and its components. Specifically, we discuss study design and data collection considerations and how emerging analytical technologies, bioinformatics, and systems biology approaches could be applied to advance our understanding of this critical aspect of human biology.
Affiliation(s)
- Sharon M Donovan
  - Department of Food Science and Human Nutrition, University of Illinois, Urbana-Champaign, IL, USA
- Nima Aghaeepour
  - Department of Anesthesiology, Pain, and Perioperative Medicine, Department of Pediatrics, and Department of Biomedical Data Sciences, School of Medicine, Stanford University, Stanford, CA, USA
- Aline Andres
  - Arkansas Children's Nutrition Center and Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Meghan B Azad
  - Manitoba Interdisciplinary Lactation Centre (MILC), Children's Hospital Research Institute of Manitoba, Department of Pediatrics and Child Health and Department of Immunology, University of Manitoba, Winnipeg, Manitoba, Canada
- Martin Becker
  - Department of Anesthesiology, Pain, and Perioperative Medicine, Department of Pediatrics, and Department of Biomedical Data Sciences, School of Medicine, Stanford University, Stanford, CA, USA
- Susan E Carlson
  - Department of Dietetics and Nutrition, University of Kansas Medical Center, Kansas City, KS, USA
- Kirsi M Järvinen
  - Department of Pediatrics, Division of Allergy and Immunology and Center for Food Allergy, University of Rochester Medical Center, New York, NY, USA
- Weili Lin
  - Biomedical Research Imaging Center and Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Bo Lönnerdal
  - Department of Nutrition, University of California, Davis, CA, USA
- Carolyn M Slupsky
  - Department of Nutrition, University of California, Davis, CA, USA; Department of Food Science and Technology, University of California, Davis, CA, USA
- Daniel J Raiten
  - Pediatric Growth and Nutrition Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
12
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023]
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has presented the need for the development of integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analysis of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
  - Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
  - Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
13
Raufaste-Cazavieille V, Santiago R, Droit A. Multi-omics analysis: Paving the path toward achieving precision medicine in cancer treatment and immuno-oncology. Front Mol Biosci 2022; 9:962743. [PMID: 36304921 PMCID: PMC9595279 DOI: 10.3389/fmolb.2022.962743] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 06/06/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
The acceleration of large-scale sequencing and the progress in high-throughput computational analyses, defined as omics, was a hallmark for the comprehension of the biological processes in human health and diseases. In cancerology, the omics approach, initiated by genomics and transcriptomics studies, has revealed an incredible complexity with unsuspected molecular diversity within the same tumor type as well as spatial and temporal heterogeneity of tumors. The integration of multiple biological layers of omics studies brought oncology to a new paradigm, from tumor site classification to pan-cancer molecular classification, offering new therapeutic opportunities for precision medicine. In this review, we will provide a comprehensive overview of the latest innovations for multi-omics integration in oncology and summarize the largest multi-omics datasets available for adult and pediatric cancers. We will present multi-omics techniques for characterizing cancer biology and show how multi-omics data can be combined with clinical data for the identification of prognostic and treatment-specific biomarkers, opening the way to personalized therapy. To conclude, we will detail the newest strategies for dissecting the tumor immune environment and host–tumor interaction. We will explore the advances in immunomics and microbiomics for biomarker identification to guide therapeutic decisions in immuno-oncology.
Affiliation(s)
- Raoul Santiago
  - CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  - Division of Pediatric Hematology-Oncology, Centre Hospitalier Universitaire de L’Université Laval, Charles Bruneau Cancer Center, Québec, QC, Canada
  - *Correspondence: Raoul Santiago; Arnaud Droit
- Arnaud Droit
  - CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  - *Correspondence: Raoul Santiago; Arnaud Droit
14
Hung RJ, Khodayari Moez E, Kim SJ, Budhathoki S, Brooks JD. Considerations of biomarker application for cancer continuum in the era of precision medicine. Curr Epidemiol Rep 2022; 9:200-211. [PMID: 36090700 PMCID: PMC9454320 DOI: 10.1007/s40471-022-00295-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/09/2022] [Indexed: 11/25/2022]
Abstract
Purpose of the review: The goal of this review is to highlight emerging biomarker research by the key phases of the cancer continuum and outline the methodological considerations for biomarker application. Recent findings: While biomarkers have an established role in targeted therapy and, to some extent, disease monitoring, their role in early detection and survivorship remains to be elucidated. With the advent of omics technology, the discovery of biomarkers has accelerated exponentially; therefore, careful consideration to ensure an unbiased study design and robust validity is crucial. Summary: The rigor of biomarker research holds the key to the success of precision health care. The potential clinical utility and the feasibility of implementation should be central to future biomarker research study design.
Affiliation(s)
- Rayjean J Hung
  - Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Elham Khodayari Moez
  - Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Canada
- Shana J Kim
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Sanjeev Budhathoki
  - Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Canada
- Jennifer D Brooks
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
15
Wolc A, Dekkers JCM. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 2022; 54:31. [PMID: 35562659 PMCID: PMC9103490 DOI: 10.1186/s12711-022-00724-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/28/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022]
Abstract
Background: Bayesian genomic prediction methods were developed to simultaneously fit all genotyped markers to a set of available phenotypes for prediction of breeding values for quantitative traits, allowing for differences in the genetic architecture (distribution of marker effects) of traits. These methods also provide a flexible and reliable framework for genome-wide association (GWA) studies. The objective here was to review developments in Bayesian hierarchical and variable selection models for GWA analyses. Results: By fitting all genotyped markers simultaneously, Bayesian GWA methods implicitly account for population structure and the multiple-testing problem of classical single-marker GWA. Implemented using Markov chain Monte Carlo methods, Bayesian GWA methods allow for control of error rates using probabilities obtained from posterior distributions. Power of GWA studies using Bayesian methods can be enhanced by using informative priors based on previous association studies, gene expression analyses, or functional annotation information. Applied to multiple traits, Bayesian GWA analyses can give insight into pleiotropic effects by multi-trait, structural equation, or graphical models. Bayesian methods can also be used to combine genomic, transcriptomic, proteomic, and other -omics data to infer causal genotype to phenotype relationships and to suggest external interventions that can improve performance. Conclusions: Bayesian hierarchical and variable selection methods provide a unified and powerful framework for genomic prediction, GWA, integration of prior information, and integration of information from other -omics platforms to identify causal mutations for complex quantitative traits.
Affiliation(s)
- Anna Wolc
  - Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA
  - Hy-Line International, 2583 240th Street, Dallas Center, IA, 50063, USA
- Jack C M Dekkers
  - Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA
16
Das S, Mukhopadhyay I. TiMEG: an integrative statistical method for partially missing multi-omics data. Sci Rep 2021; 11:24077. [PMID: 34911979 PMCID: PMC8674330 DOI: 10.1038/s41598-021-03034-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/01/2021] [Accepted: 11/24/2021] [Indexed: 11/25/2022]
Abstract
Multi-omics data integration is widely used to understand the genetic architecture of disease. In multi-omics association analysis, data collected on multiple omics for the same set of individuals are immensely important for biomarker identification. But when the sample size of such data is limited, the presence of partially missing individual-level observations poses a major challenge in data integration. More often, genotype data are available for all individuals under study, but gene expression and/or methylation information is missing for different subsets of those individuals. Here, we develop a statistical model, TiMEG, for the identification of disease-associated biomarkers in a case-control paradigm by integrating the above-mentioned data types, especially in the presence of missing omics data. Based on a likelihood approach, TiMEG exploits the inter-relationship among multiple omics data to capture weaker signals that remain unidentified in single-omic analysis or common imputation-based methods. Its application to a real tuberous sclerosis dataset identified functionally relevant genes in the disease pathway.
Affiliation(s)
- Sarmistha Das
  - Human Genetics Unit, Indian Statistical Institute, Kolkata, 700108, India
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, USA
17
Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021; 13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022]
Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
Affiliation(s)
- Youngjun Park
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Dominik Heider
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Anne-Christin Hauschild
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
  - Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany
18
Mirzaei A, Carter SR, Patanwala AE, Schneider CR. Missing data in surveys: Key concepts, approaches, and applications. Res Social Adm Pharm 2021; 18:2308-2316. [PMID: 33775556 DOI: 10.1016/j.sapharm.2021.03.009] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 11/19/2020] [Revised: 02/26/2021] [Accepted: 03/11/2021] [Indexed: 10/21/2022]
Abstract
A recent review of missing data in pharmacy literature has highlighted that a low proportion of studies reported how missing data were handled. In this paper we discuss the concept of missing data in survey research, how missing data are classified, common techniques to account for missingness, and how to report on missing data. The paper provides guidance to mitigate the occurrence of missing data through planning. Considerations include estimating expected missing data, intended vs unintended missing data, survey length, working with electronic surveys, choosing between standard and filtered form questions, forced responses and straight-lining, as well as responses that can generate missingness like "I don't know" and "Not Applicable". We introduce methods for analysing data with missing values, such as deletion, imputation and likelihood methods. The manuscript provides a framework and flow chart for choosing the appropriate analysis method based on how much missing data is observed and the type of missingness. Special circumstances involving missing data are discussed, such as studies with repeated or cohort measures, factor analysis, or data integration. Finally, a checklist of questions is provided to guide researchers in reporting missing data when conducting future research.
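As a rough illustration of two of the method families the abstract names (deletion and imputation), the sketch below contrasts complete-case deletion with column-mean imputation. The survey rows, field names, and helper functions are hypothetical and are not taken from the paper; likelihood-based methods are not shown.

```python
from statistics import mean

# Hypothetical survey responses (not from the paper); None marks a missing answer.
responses = [
    {"age": 34, "satisfaction": 4},
    {"age": None, "satisfaction": 5},
    {"age": 51, "satisfaction": None},
    {"age": 29, "satisfaction": 3},
]


def listwise_delete(rows):
    """Keep only rows with no missing values (complete-case analysis)."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(rows[0].keys())
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


complete = listwise_delete(responses)  # drops the two incomplete rows
imputed = mean_impute(responses)       # keeps all four rows, fills the gaps
```

Deletion shrinks the sample, while mean imputation keeps every row but understates variability; which trade-off is acceptable depends on how much data is missing and why.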
Affiliation(s)
- Ardalan Mirzaei
  - School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Australia
- Stephen R Carter
  - School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Australia
- Asad E Patanwala
  - School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Australia
- Carl R Schneider
  - School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Australia
19
Espinosa C, Becker M, Marić I, Wong RJ, Shaw GM, Gaudilliere B, Aghaeepour N, Stevenson DK. Data-Driven Modeling of Pregnancy-Related Complications. Trends Mol Med 2021; 27:762-776. [PMID: 33573911 DOI: 10.1016/j.molmed.2021.01.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/09/2020] [Revised: 12/01/2020] [Accepted: 01/20/2021] [Indexed: 12/11/2022]
Abstract
A healthy pregnancy depends on complex interrelated biological adaptations involving placentation, maternal immune responses, and hormonal homeostasis. Recent advances in high-throughput technologies have provided access to multiomics biological data that, combined with clinical and social data, can provide a deeper understanding of normal and abnormal pregnancies. Integration of these heterogeneous datasets using state-of-the-art machine-learning methods can enable the prediction of short- and long-term health trajectories for a mother and offspring and the development of treatments to prevent or minimize complications. We review advanced machine-learning methods that could: provide deeper biological insights into a pregnancy not yet unveiled by current methodologies; clarify the etiologies and heterogeneity of pathologies that affect a pregnancy; and suggest the best approaches to address disparities in outcomes affecting vulnerable populations.
Affiliation(s)
- Camilo Espinosa
  - Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Martin Becker
  - Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA
- Ivana Marić
  - Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Ronald J Wong
  - Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Gary M Shaw
  - Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Brice Gaudilliere
  - Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Nima Aghaeepour
  - Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA; Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
- David K Stevenson
  - Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA
20
A network embedding based method for partial multi-omics integration in cancer subtyping. Methods 2020; 192:67-76. [PMID: 32805397 DOI: 10.1016/j.ymeth.2020.08.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/05/2020] [Revised: 07/18/2020] [Accepted: 08/03/2020] [Indexed: 12/13/2022]
Abstract
Integrative analysis of multiple omics offers the opportunity to uncover coordinated cellular processes acting across different omics layers. The ever-increasing volume of multi-omics data provides comprehensive insight into cancer subtyping. Many multi-omics integrative methods have been developed, but few of them can deal with partial datasets in which some samples only have data for a subset of the omics. In this study, we propose a partial multi-omics integrative method, MSNE (Multiple Similarity Network Embedding), for cancer subtyping. MSNE integrates the multi-omics information by embedding the neighbor relations of samples defined by the random walk on multiple similarity networks. We compared MSNE with five existing multi-omics integrative methods on twelve datasets in both full and partial scenarios. MSNE achieved the best result on pan-cancer and image datasets. Furthermore, on ten cancer subtyping datasets, MSNE obtained the most enriched clinical parameters and comparable log-rank test P-values in survival analysis. In conclusion, MSNE is an effective and efficient integrative method for multi-omics data and performs especially well on partial datasets.
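To make the idea of a random walk over several similarity networks more concrete, here is a minimal sketch of one walk step when some samples are missing from one omic. It is not the MSNE algorithm: the averaging rule, the toy networks, and the function name `transition_probs` are all assumptions chosen for illustration, and the embedding step is omitted.

```python
# Hypothetical per-omic similarity networks over samples (illustrative values).
# A sample that was not measured in an omic has no node in that network.
networks = {
    "expression": {
        "s1": {"s2": 0.9, "s3": 0.1},
        "s2": {"s1": 0.9, "s3": 0.3},
        "s3": {"s1": 0.1, "s2": 0.3},
    },
    "methylation": {  # s3 is missing from this omic (partial data)
        "s1": {"s2": 0.5},
        "s2": {"s1": 0.5},
    },
}


def transition_probs(sample, networks):
    """One-step walk probabilities from `sample`.

    Assumed rule: pick uniformly among the networks that contain the sample,
    then move to a neighbor with probability proportional to edge similarity.
    Assumes every node has at least one edge with positive weight.
    """
    present = [net for net in networks.values() if sample in net]
    probs = {}
    for net in present:
        total = sum(net[sample].values())
        for neighbor, weight in net[sample].items():
            share = weight / total / len(present)
            probs[neighbor] = probs.get(neighbor, 0.0) + share
    return probs


p_s1 = transition_probs("s1", networks)  # uses both networks
p_s3 = transition_probs("s3", networks)  # falls back to expression only
```

Because a sample absent from one network still walks over the networks where it is present, partially observed samples retain neighbor relations instead of being dropped.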