1
Mao R, Wan L, Zhou M, Li D. Cox-Sage: enhancing Cox proportional hazards model with interpretable graph neural networks for cancer prognosis. Brief Bioinform 2025; 26:bbaf108. [PMID: 40067266 PMCID: PMC11894944 DOI: 10.1093/bib/bbaf108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 02/08/2025] [Accepted: 02/25/2025] [Indexed: 03/15/2025] Open
Abstract
High-throughput sequencing technologies have facilitated a deeper exploration of prognostic biomarkers. While many deep learning (DL) methods primarily focus on feature extraction or employ simplistic fully connected layers within prognostic modules, the interpretability of DL-extracted features can be challenging. To address these challenges, we propose an interpretable cancer prognosis model called Cox-Sage. Specifically, we first propose an algorithm to construct a patient similarity graph from heterogeneous clinical data, and then extract protein-coding genes from the patient's gene expression data to embed them as features into the graph nodes. We utilize multilayer graph convolution to model the proportional hazards pattern and introduce a mathematical method to clearly explain the meaning of our model's parameters. Based on this approach, we propose two metrics for measuring gene importance from different perspectives: the mean hazard ratio and the reciprocal of the mean hazard ratio. These metrics can be used to discover two types of important genes: genes whose low expression levels are associated with high cancer prognosis risk, and genes whose high expression levels are associated with high cancer prognosis risk. We conducted experiments on seven datasets from TCGA, and our model achieved superior prognostic performance compared with some state-of-the-art methods. As a preliminary study, we performed prognostic biomarker discovery on the LIHC (Liver Hepatocellular Carcinoma) dataset. Our code and dataset can be found at https://github.com/beeeginner/Cox-sage.
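To make the two gene-importance views in this abstract concrete, the following is a minimal sketch based on a plain linear Cox model, where the hazard ratio for a one-unit increase in gene j is exp(beta_j). It is not the Cox-Sage definition of the mean hazard ratio, which the abstract does not spell out; the gene names and coefficient values are hypothetical.

```python
import math

def hazard_ratios(coefs):
    """Per-unit hazard ratio exp(beta_j) for each gene in a plain linear Cox model."""
    return {gene: math.exp(beta) for gene, beta in coefs.items()}

def rank_genes(coefs):
    """Rank genes from two perspectives.

    high_expr_risk: largest hazard ratio first (high expression, high risk).
    low_expr_risk: largest reciprocal hazard ratio first (low expression, high risk).
    """
    hr = hazard_ratios(coefs)
    high_expr_risk = sorted(hr, key=lambda g: hr[g], reverse=True)
    low_expr_risk = sorted(hr, key=lambda g: 1.0 / hr[g], reverse=True)
    return hr, high_expr_risk, low_expr_risk

# Hypothetical coefficients, for illustration only.
coefs = {"GENE_A": 0.7, "GENE_B": -0.4, "GENE_C": 0.0}
hr, high_first, low_first = rank_genes(coefs)
```

A hazard ratio above 1 marks a gene whose higher expression goes with higher risk, while a reciprocal above 1 marks a gene whose lower expression goes with higher risk; a coefficient of zero gives a ratio of exactly 1 in both views.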
Affiliation(s)
- Ruijun Mao
- College of Artificial Intelligence, Taiyuan University of Technology, 79 Yingze West Avenue, Wanbailin District, Taiyuan, Shanxi Province 030024, China
- Li Wan
- College of Artificial Intelligence, Taiyuan University of Technology, 79 Yingze West Avenue, Wanbailin District, Taiyuan, Shanxi Province 030024, China
- Minghao Zhou
- College of Artificial Intelligence, Taiyuan University of Technology, 79 Yingze West Avenue, Wanbailin District, Taiyuan, Shanxi Province 030024, China
- Dongxi Li
- College of Computer Science and Technology, 79 Yingze West Avenue, Wanbailin District, Taiyuan University of Technology, Taiyuan, Shanxi Province 030024, China
2
V U P, T I M, K K M. An integrative analysis to identify pancancer epigenetic biomarkers. Comput Biol Chem 2024; 113:108260. [PMID: 39467487 DOI: 10.1016/j.compbiolchem.2024.108260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 09/13/2024] [Accepted: 10/15/2024] [Indexed: 10/30/2024]
Abstract
Integrating and analyzing the pancancer data collected from different experiments is crucial for gaining insights into the common mechanisms at the molecular level underlying the development and progression of cancers. Epigenetic study of the pancancer data can provide promising results in biomarker discovery. The genes that are epigenetically dysregulated in different cancers are powerful biomarkers for drug-related studies. This paper identifies the genes having altered expression due to aberrant methylation patterns using differential analysis of TCGA pancancer data of 12 different cancers. We identified a comprehensive set of 115 epigenetic biomarker genes, of which 106 have pancancer properties. Correlation analysis, gene set enrichment, protein-protein interaction analysis, pancancer characteristics analysis, and diagnostic modeling were performed on these biomarkers to illustrate the power of this signature, and the biomarkers were found to be important in different molecular operations related to cancer. An accuracy of 97.56% was obtained on the TCGA pancancer gene expression dataset for binary classification of tumor versus normal samples. The source code and dataset of this work are available at https://github.com/panchamisuneeth/EpiPanCan.git.
Affiliation(s)
- Panchami V U
- Adi Shankara Institute of Engineering and Technology, Ernakulam, 683574, Kerala, India; Government Engineering College Thrissur, 680009, Kerala, India; APJ Abdul Kalam Technological University, 695016, Kerala, India
- Manish T I
- SCMS School of Engineering and Technology, Ernakulam, 683576, Kerala, India; APJ Abdul Kalam Technological University, 695016, Kerala, India
- Manesh K K
- Government Engineering College Thrissur, 680009, Kerala, India; APJ Abdul Kalam Technological University, 695016, Kerala, India
3
Chaudhary V, Taha BA, Lucky, Rustagi S, Khosla A, Papakonstantinou P, Bhalla N. Nose-on-Chip Nanobiosensors for Early Detection of Lung Cancer Breath Biomarkers. ACS Sens 2024; 9:4469-4494. [PMID: 39248694 PMCID: PMC11443536 DOI: 10.1021/acssensors.4c01524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/10/2024]
Abstract
Lung cancer remains a global health concern, demanding the development of noninvasive, prompt, selective, and point-of-care diagnostic tools. Correspondingly, breath analysis using nanobiosensors has emerged as a promising noninvasive nose-on-chip technique for the early detection of lung cancer through monitoring diversified biomarkers such as volatile organic compounds/gases in exhaled breath. This comprehensive review summarizes the state-of-the-art breath-based lung cancer diagnosis employing chemiresistive-module nanobiosensors supported by theoretical findings. It unveils the fundamental mechanisms and biological basis of breath biomarker generation associated with lung cancer, technological advancements, and clinical implementation of nanobiosensor-based breath analysis. It explores the merits, challenges, and potential alternate solutions in implementing these nanobiosensors in clinical settings, including standardization, biocompatibility/toxicity analysis, green and sustainable technologies, life-cycle assessment, and scheming regulatory modalities. It highlights nanobiosensors' role in facilitating precise, real-time, and on-site detection of lung cancer through breath analysis, leading to improved patient outcomes, enhanced clinical management, and remote personalized monitoring. Additionally, integrating these biosensors with artificial intelligence, machine learning, Internet-of-things, bioinformatics, and omics technologies is discussed, providing insights into the prospects of intelligent nose-on-chip lung cancer sniffing nanobiosensors. Overall, this review consolidates knowledge on breathomic biosensor-based lung cancer screening, shedding light on its significance and potential applications in advancing state-of-the-art medical diagnostics to reduce the burden on hospitals and save human lives.
Affiliation(s)
- Vishal Chaudhary
- Physics Department, Bhagini Nivedita College, University of Delhi, 110043 Delhi, India
- Centre for Research Impact & Outcome, Chitkara University, Punjab 140401, India
- Bakr Ahmed Taha
- Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, UKM, 43600 Bangi, Malaysia
- Lucky
- Dr. B. R. Ambedkar Center for Biomedical Research, University of Delhi, 110007 Delhi, India
- Sarvesh Rustagi
- School of Applied and Life Sciences, Uttaranchal University, Dehradun, Uttarakhand 248007, India
- Ajit Khosla
- School of Advanced Materials and Nanotechnology, Xidian University, Xi'an 710126, China
- Pagona Papakonstantinou
- Nanotechnology and Integrated Bioengineering Centre (NIBEC), School of Engineering, Ulster University, 2-24 York Street, Belfast, Northern Ireland BT15 1AP, United Kingdom
- Nikhil Bhalla
- Nanotechnology and Integrated Bioengineering Centre (NIBEC), School of Engineering, Ulster University, 2-24 York Street, Belfast, Northern Ireland BT15 1AP, United Kingdom
- Healthcare Technology Hub, Ulster University, 2-24 York Street, Belfast, Northern Ireland BT15 1AP, United Kingdom
4
Abbasi AF, Asim MN, Ahmed S, Vollmer S, Dengel A. Survival prediction landscape: an in-depth systematic literature review on activities, methods, tools, diseases, and databases. Front Artif Intell 2024; 7:1428501. [PMID: 39021434 PMCID: PMC11252047 DOI: 10.3389/frai.2024.1428501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024] Open
Abstract
Survival prediction integrates patient-specific molecular information and clinical signatures to forecast the anticipated time of an event, such as recurrence, death, or disease progression. Survival prediction proves valuable in guiding treatment decisions, optimizing resource allocation, and informing precision medicine interventions. The wide range of diseases, the existence of various variants within the same disease, and the reliance on available data necessitate disease-specific computational survival predictors. The widespread adoption of artificial intelligence (AI) methods in crafting survival predictors has undoubtedly revolutionized this field. However, the ever-increasing demand for more sophisticated and effective prediction models necessitates continued innovation. To catalyze these advancements, it is crucial to bring existing knowledge and insights on survival predictors into a centralized platform. The paper at hand thoroughly examines 23 existing review studies and provides a concise overview of their scope and limitations. Focusing on a comprehensive set of the 90 most recent survival predictors across 44 diverse diseases, it delves into the diverse types of methods that are used in the development of disease-specific predictors. This exhaustive analysis encompasses the utilized data modalities along with a detailed analysis of subsets of clinical features, feature engineering methods, and the specific statistical, machine learning, or deep learning approaches that have been employed. It also provides insights about survival prediction data sources, open-source predictors, and survival prediction frameworks.
Affiliation(s)
- Ahtisham Fazeel Abbasi
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Muhammad Nabeel Asim
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sheraz Ahmed
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sebastian Vollmer
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
5
Tran TO, Vo TH, Le NQK. Omics-based deep learning approaches for lung cancer decision-making and therapeutics development. Brief Funct Genomics 2024; 23:181-192. [PMID: 37519050 DOI: 10.1093/bfgp/elad031] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/16/2023] [Revised: 07/04/2023] [Accepted: 07/13/2023] [Indexed: 08/01/2023] Open
Abstract
Lung cancer is the most common cancer and the leading cause of cancer deaths globally. Besides clinicopathological observations and traditional molecular tests, the advent of robust and scalable techniques for nucleic acid analysis has revolutionized biological research and medicinal practice in lung cancer treatment. In response to the demands for minimally invasive procedures and technology development over the past decade, many types of multi-omics data at various genome levels have been generated. As omics data grow, artificial intelligence models, particularly deep learning, are prominent in developing more rapid and effective methods to potentially improve lung cancer patient diagnosis, prognosis and treatment strategy. This decade has seen genome-based deep learning models thriving in various lung cancer tasks, including cancer prediction, subtype classification, prognosis estimation, cancer molecular signature identification, treatment response prediction and biomarker development. In this study, we summarize available data sources for deep-learning-based lung cancer mining and provide an update on recent deep learning models in lung cancer genomics. Subsequently, we review the current issues and discuss future research directions of deep-learning-based lung cancer genomics research.
Affiliation(s)
- Thi-Oanh Tran
- International Ph.D. Program in Cell Therapy and Regenerative Medicine, College of Medicine, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- AIBioMed Research Group, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- Hematology and Blood Transfusion Center, Bach Mai Hospital, No 78 Giai Phong Street, Hanoi, Viet Nam
- Thanh Hoa Vo
- Department of Science, School of Science and Computing, South East Technological University, Waterford X91 K0EK, Ireland
- Pharmaceutical and Molecular Biotechnology Research Center (PMBRC), South East Technological University, Waterford X91 K0EK, Ireland
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
- AIBioMed Research Group, Taipei Medical University, No 250 Wuxing Street, 110, Taipei, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing Street, 110, Taipei, Taiwan
6
Zhang Y, Yao L, Chung CR, Huang Y, Li S, Zhang W, Pang Y, Lee TY. KinPred-RNA-kinase activity inference and cancer type classification using machine learning on RNA-seq data. iScience 2024; 27:109333. [PMID: 38523792 PMCID: PMC10959666 DOI: 10.1016/j.isci.2024.109333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 12/07/2023] [Accepted: 02/21/2024] [Indexed: 03/26/2024] Open
Abstract
Kinases are important enzymes that transfer phosphate groups from high-energy, phosphate-donating molecules to specific substrates and play essential roles in various cellular processes. Existing algorithms for inferring kinase activity from phosphoproteomics data are often costly, requiring valuable samples. Moreover, methods to extract kinase activities from bulk RNA sequencing data remain undeveloped. In this study, we propose a computational framework, KinPred-RNA, to derive kinase activities from bulk RNA-sequencing data in cancer samples. The KinPred-RNA framework, using the extreme gradient boosting (XGBoost) regression model, outperforms random forest regression, multiple linear regression, and support vector machine regression models in predicting kinase activities from cancer-related RNA sequencing data. Efficient gene signatures from the LINCS-L1000 dataset were used as inputs for KinPred-RNA. The results highlight its potential to be related to biological function. In conclusion, KinPred-RNA constitutes a significant advance in cancer research by potentially facilitating the identification of cancer.
Affiliation(s)
- Yuntian Zhang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Lantian Yao
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan 320953, Taiwan
- Yixian Huang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Shangfu Li
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Wenyang Zhang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Yuxuan Pang
- Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
- Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
7
Xu Z, Liao H, Huang L, Chen Q, Lan W, Li S. IBPGNET: lung adenocarcinoma recurrence prediction based on neural network interpretability. Brief Bioinform 2024; 25:bbae080. [PMID: 38557672 PMCID: PMC10982951 DOI: 10.1093/bib/bbae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/31/2024] [Accepted: 02/07/2024] [Indexed: 04/04/2024] Open
Abstract
Lung adenocarcinoma (LUAD) is the most common histologic subtype of lung cancer. Early-stage patients have a 30-50% probability of metastatic recurrence after surgical treatment. Here, we propose a new computational framework, Interpretable Biological Pathway Graph Neural Networks (IBPGNET), based on pathway hierarchy relationships to predict LUAD recurrence and explore the internal regulatory mechanisms of LUAD. IBPGNET can integrate different omics data efficiently and provide global interpretability. In addition, our experimental results show that IBPGNET outperforms other classification methods in 5-fold cross-validation. IBPGNET identified PSMC1 and PSMD11 as genes associated with LUAD recurrence, and their expression levels were significantly higher in LUAD cells than in normal cells. The knockdown of PSMC1 and PSMD11 in LUAD cells increased their sensitivity to afatinib and decreased cell migration, invasion and proliferation. In addition, the cells showed significantly lower EGFR expression, indicating that PSMC1 and PSMD11 may mediate therapeutic sensitivity through EGFR expression.
Affiliation(s)
- Zhanyu Xu
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Haibo Liao
- School of computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Liuliu Huang
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Qingfeng Chen
- School of computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Wei Lan
- School of computer, Electronic and Information, Guangxi University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
- Shikang Li
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region 530021, China
8
Boubnovski Martell M, Linton-Reid K, Hindocha S, Chen M, Moreno P, Álvarez-Benito M, Salvatierra Á, Lee R, Posma JM, Calzado MA, Aboagye EO. Deep representation learning of tissue metabolome and computed tomography annotates NSCLC classification and prognosis. NPJ Precis Oncol 2024; 8:28. [PMID: 38310164 PMCID: PMC10838282 DOI: 10.1038/s41698-024-00502-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 01/04/2024] [Indexed: 02/05/2024] Open
Abstract
The rich chemical information from tissue metabolomics provides a powerful means to elaborate tissue physiology or tumor characteristics at cellular and tumor microenvironment levels. However, the process of obtaining such information requires invasive biopsies, is costly, and can delay clinical patient management. Conversely, computed tomography (CT) is a clinical standard of care but does not intuitively harbor histological or prognostic information. Furthermore, the ability to embed metabolome information into CT to subsequently use the learned representation for classification or prognosis has yet to be described. This study develops a deep learning-based framework, tissue-metabolomic-radiomic-CT (TMR-CT), by combining 48 paired CT images and tumor/normal tissue metabolite intensities to generate ten image embeddings to infer metabolite-derived representation from CT alone. In clinical NSCLC settings, we ascertain whether TMR-CT results in an enhanced feature generation model solving histology classification/prognosis tasks in an unseen international CT dataset of 742 patients. TMR-CT non-invasively determines histological classes (adenocarcinoma/squamous cell carcinoma) with an F1-score = 0.78 and further asserts patients' prognosis with a c-index = 0.72, surpassing the performance of radiomics models and deep learning on single modality CT feature extraction. Additionally, our work shows the potential to generate informative biology-inspired CT-led features to explore connections between hard-to-obtain tissue metabolic profiles and routine lesion-derived image data.
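For readers unfamiliar with the c-index reported in this abstract, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a generic illustration, not the authors' evaluation code; the function name and the toy data are assumptions, and pairs with tied survival times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index: fraction of comparable pairs ordered correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter time than subject j. It is concordant when the
    model assigns i the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data, for illustration only: the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of risk against survival time, which is the scale on which the reported 0.72 should be read.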
Affiliation(s)
- Sumeet Hindocha
- Early Diagnosis and Detection Centre, National Institute for Health and Care Research Biomedical Research Centre at the Royal Marsden and Institute of Cancer Research, London, SW3 6JJ, UK
- Mitchell Chen
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK
- Paula Moreno
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Departamento de Cirugía Toráxica y Trasplante de Pulmón, Hospital Universitario Reina Sofía, Córdoba, 14014, Spain
- Marina Álvarez-Benito
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Unidad de Radiodiagnóstico y Cáncer de Mama, Hospital Universitario Reina Sofía, Córdoba, 14004, Spain
- Ángel Salvatierra
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Unidad de Radiodiagnóstico y Cáncer de Mama, Hospital Universitario Reina Sofía, Córdoba, 14004, Spain
- Richard Lee
- Early Diagnosis and Detection Centre, National Institute for Health and Care Research Biomedical Research Centre at the Royal Marsden and Institute of Cancer Research, London, SW3 6JJ, UK
- National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse Street, London, SW3 6LY, UK
- Joram M Posma
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK
- Marco A Calzado
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Departamento de Biología Celular, Fisiología e Inmunología, Universidad de Córdoba, Córdoba, 14014, Spain
- Eric O Aboagye
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK
9
Luo H, Liang H, Liu H, Fan Z, Wei Y, Yao X, Cong S. TEMINET: A Co-Informative and Trustworthy Multi-Omics Integration Network for Diagnostic Prediction. Int J Mol Sci 2024; 25:1655. [PMID: 38338932 PMCID: PMC10855161 DOI: 10.3390/ijms25031655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 01/20/2024] [Accepted: 01/26/2024] [Indexed: 02/12/2024] Open
Abstract
Advancing the domain of biomedical investigation, integrated multi-omics data have shown exceptional performance in elucidating complex human diseases. However, as the variety of omics information expands, precisely perceiving the informativeness of intra- and inter-omics becomes challenging due to the intricate interrelations, thus presenting significant challenges in the integration of multi-omics data. To address this, we introduce a novel multi-omics integration approach, referred to as TEMINET. This approach enhances diagnostic prediction by leveraging an intra-omics co-informative representation module and a trustworthy learning strategy used to address inter-omics fusion. Considering the multifactorial nature of complex diseases, TEMINET utilizes intra-omics features to construct disease-specific networks; then, it applies graph attention networks and a multi-level framework to capture more collective informativeness than pairwise relations. To perceive the contribution of co-informative representations within intra-omics, we designed a trustworthy learning strategy to identify the reliability of each omics in integration. To integrate inter-omics information, a combined-beliefs fusion approach is deployed to harmonize the trustworthy representations of different omics types effectively. Our experiments across four different diseases using mRNA, methylation, and miRNA data demonstrate that TEMINET achieves advanced performance and robustness in classification tasks.
Affiliation(s)
- Haoran Luo
- Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Hong Liang
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Hongwei Liu
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Zhoujie Fan
- Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
- Yanhui Wei
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Xiaohui Yao
- Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Shan Cong
- Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
10
Viana JN, Pilbeam C, Howard M, Scholz B, Ge Z, Fisser C, Mitchell I, Raman S, Leach J. Maintaining High-Touch in High-Tech Digital Health Monitoring and Multi-Omics Prognostication: Ethical, Equity, and Societal Considerations in Precision Health for Palliative Care. OMICS: A Journal of Integrative Biology 2023; 27:461-473. [PMID: 37861713 DOI: 10.1089/omi.2023.0120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/21/2023]
Abstract
Advances in digital health, systems biology, environmental monitoring, and artificial intelligence (AI) continue to revolutionize health care, ushering a precision health future. More than disease treatment and prevention, precision health aims at maintaining good health throughout the lifespan. However, how can precision health impact care for people with a terminal or life-limiting condition? We examine here the ethical, equity, and societal/relational implications of two precision health modalities, (1) integrated systems biology/multi-omics analysis for disease prognostication and (2) digital health technologies for health status monitoring and communication. We focus on three main ethical and societal considerations: benefits and risks associated with integration of these modalities into the palliative care system; inclusion of underrepresented and marginalized groups in technology development and deployment; and the impact of high-tech modalities on palliative care's highly personalized and "high-touch" practice. We conclude with 10 recommendations for ensuring that precision health technologies, such as multi-omics prognostication and digital health monitoring, for palliative care are developed, tested, and implemented ethically, inclusively, and equitably.
Affiliation(s)
- John Noel Viana
- Australian National Centre for the Public Awareness of Science, College of Science, The Australian National University, Canberra, Australia
- Responsible Innovation Future Science Platform, Commonwealth Scientific and Industrial Research Organisation, Brisbane, Australia
- Caitlin Pilbeam
- School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Canberra, Australia
- Mark Howard
- Monash Data Futures Institute, Monash University, Clayton, Australia
- Department of Philosophy, School of Philosophical, Historical and International Studies, Monash University, Clayton, Australia
- Brett Scholz
- School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Canberra, Australia
- Zongyuan Ge
- Monash Data Futures Institute, Monash University, Clayton, Australia
- Department of Data Science & AI, Monash University, Clayton, Australia
- Carys Fisser
- Australian National Centre for the Public Awareness of Science, College of Science, The Australian National University, Canberra, Australia
- School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Canberra, Australia
- Imogen Mitchell
- School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Canberra, Australia
- Intensive Care Unit, Canberra Hospital, Canberra, Australia
- Sujatha Raman
- Australian National Centre for the Public Awareness of Science, College of Science, The Australian National University, Canberra, Australia
| | - Joan Leach
- Australian National Centre for the Public Awareness of Science, College of Science, The Australian National University, Canberra, Australia
| |
Collapse
|
11
|
Yassi M, Chatterjee A, Parry M. Application of deep learning in cancer epigenetics through DNA methylation analysis. Brief Bioinform 2023; 24:bbad411. [PMID: 37985455 PMCID: PMC10661960 DOI: 10.1093/bib/bbad411] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/22/2023] [Revised: 10/08/2023] [Accepted: 10/25/2023] [Indexed: 11/22/2023] Open
Abstract
DNA methylation is a fundamental epigenetic modification involved in various biological processes and diseases. Analysis of DNA methylation data at a genome-wide and high-throughput level can provide insights into diseases influenced by epigenetics, such as cancer. Recent technological advances have led to the development of high-throughput approaches, such as genome-scale profiling, that allow for computational analysis of epigenetics. Deep learning (DL) methods are essential in facilitating computational studies in epigenetics for DNA methylation analysis. In this systematic review, we assessed the various applications of DL applied to DNA methylation data or multi-omics data to discover cancer biomarkers, perform classification, imputation and survival analysis. The review first introduces state-of-the-art DL architectures and highlights their usefulness in addressing challenges related to cancer epigenetics. Finally, the review discusses potential limitations and future research directions in this field.
Affiliation(s)
- Maryam Yassi
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
- Aniruddha Chatterjee
  - Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
  - Honorary Professor, UPES University, Dehradun, India
- Matthew Parry
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
  - Te Pūnaha Matatini Centre of Research Excellence, University of Auckland, Auckland, New Zealand
|
12
|
Ellen JG, Jacob E, Nikolaou N, Markuzon N. Autoencoder-based multimodal prediction of non-small cell lung cancer survival. Sci Rep 2023; 13:15761. [PMID: 37737469 PMCID: PMC10517020 DOI: 10.1038/s41598-023-42365-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 09/09/2023] [Indexed: 09/23/2023] Open
Abstract
The ability to accurately predict non-small cell lung cancer (NSCLC) patient survival is crucial for informing physician decision-making, and the increasing availability of multi-omics data offers the promise of enhancing prognosis predictions. We present a multimodal integration approach that leverages microRNA, mRNA, DNA methylation, long non-coding RNA (lncRNA) and clinical data to predict NSCLC survival and identify patient subtypes, utilizing denoising autoencoders for data compression and integration. Survival performance for patients with lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) was compared across modality combinations and data integration methods. Using The Cancer Genome Atlas data, our results demonstrate that survival prediction models combining multiple modalities outperform single modality models. The highest performance was achieved with a combination of only two modalities, lncRNA and clinical, at concordance indices (C-indices) of 0.69 ± 0.03 for LUAD and 0.62 ± 0.03 for LUSC. Models utilizing all five modalities achieved mean C-indices of 0.67 ± 0.04 and 0.63 ± 0.02 for LUAD and LUSC, respectively, while the best individual modality performance reached C-indices of 0.64 ± 0.03 for LUAD and 0.59 ± 0.03 for LUSC. Analysis of biological differences revealed two distinct survival subtypes with over 900 differentially expressed transcripts.
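The concordance indices quoted in this abstract measure how often a model ranks pairs of patients in the right survival order. The following is a minimal sketch of Harrell's C-index, not the authors' evaluation code; the function name, the toy data, and the simplification of skipping pairs with tied times are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times; a pair is comparable only when
        # the shorter time corresponds to an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.6]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.59 to 0.69 should be read.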
Affiliation(s)
- Jacob G Ellen
  - Institute of Health Informatics, University College London, London, UK
- Etai Jacob
  - AstraZeneca, Oncology Data Science, Waltham, MA, USA
|
13
|
Zhang Y, Zhang N, Chai X, Sun T. Machine learning for image-based multi-omics analysis of leaf veins. Journal of Experimental Botany 2023; 74:4928-4941. [PMID: 37410807 DOI: 10.1093/jxb/erad251] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/01/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Veins are a critical component of the plant growth and development system, playing an integral role in supporting and protecting leaves, as well as transporting water, nutrients, and photosynthetic products. A comprehensive understanding of the form and function of veins requires a dual approach that combines plant physiology with cutting-edge image recognition technology. The latest advancements in computer vision and machine learning have facilitated the creation of algorithms that can identify vein networks and explore their developmental progression. Here, we review the functional, environmental, and genetic factors associated with vein networks, along with the current status of research on image analysis. In addition, we discuss the methods of venous phenotype extraction and multi-omics association analysis using machine learning technology, which could provide a theoretical basis for improving crop productivity by optimizing the vein network architecture.
Affiliation(s)
- Yubin Zhang
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Ning Zhang
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Xiujuan Chai
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Tan Sun
  - Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing, China
  - Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
|
14
|
Yuan T, Edelmann D, Fan Z, Alwers E, Kather JN, Brenner H, Hoffmeister M. Machine learning in the identification of prognostic DNA methylation biomarkers among patients with cancer: A systematic review of epigenome-wide studies. Artif Intell Med 2023; 143:102589. [PMID: 37673571 DOI: 10.1016/j.artmed.2023.102589] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/21/2022] [Revised: 04/19/2023] [Accepted: 04/30/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND DNA methylation biomarkers have great potential in improving prognostic classification systems for patients with cancer. Machine learning (ML)-based analytic techniques might help overcome the challenges of analyzing high-dimensional data in relatively small sample sizes. This systematic review summarizes the current use of ML-based methods in epigenome-wide studies for the identification of DNA methylation signatures associated with cancer prognosis. METHODS We searched three electronic databases (PubMed, EMBASE, and Web of Science) for articles published until 2 January 2023. ML-based methods and workflows used to identify DNA methylation signatures associated with cancer prognosis were extracted and summarized. Two authors independently assessed the methodological quality of included studies using a seven-item checklist adapted from 'A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies (PROBAST)' and from the 'Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK)'. Different ML methods and workflows used in included studies were summarized and visualized by a sunburst chart, a bubble chart, and Sankey diagrams, respectively. RESULTS Eighty-three studies were included in this review. Three major types of ML-based workflows were identified: (1) unsupervised clustering, (2) supervised feature selection, and (3) deep learning-based feature transformation. For the three workflows, the most frequently used ML techniques were consensus clustering, least absolute shrinkage and selection operator (LASSO), and autoencoder, respectively. The systematic review revealed that the performance of these approaches has not yet been adequately evaluated and that methodological and reporting flaws were common in the identified studies using ML techniques.
CONCLUSIONS There is great heterogeneity in ML-based methodological strategies used by epigenome-wide studies to identify DNA methylation markers associated with cancer prognosis. In theory, most existing workflows could not handle the high multi-collinearity and potentially non-linear interactions in epigenome-wide DNA methylation data. Benchmarking studies are needed to compare the relative performance of various approaches for specific cancer types. Adherence to relevant methodological and reporting guidelines is urgently needed.
Affiliation(s)
- Tanwei Yuan
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany; Medical Faculty Heidelberg, Heidelberg University, Heidelberg, Germany
- Dominic Edelmann
  - Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ziwen Fan
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Elizabeth Alwers
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jakob Nikolas Kather
  - Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Medical Oncology, National Center of Tumour Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
- Hermann Brenner
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany; Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany; German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Michael Hoffmeister
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
|
15
|
Zhu J, Oh JH, Simhal AK, Elkin R, Norton L, Deasy JO, Tannenbaum A. Geometric graph neural networks on multi-omics data to predict cancer survival outcomes. Comput Biol Med 2023; 163:107117. [PMID: 37329617 PMCID: PMC10638676 DOI: 10.1016/j.compbiomed.2023.107117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2023] [Revised: 05/25/2023] [Accepted: 05/30/2023] [Indexed: 06/19/2023]
Abstract
The advance of sequencing technologies has enabled a thorough molecular characterization of the genome in human cancers. To improve patient prognosis predictions and subsequent treatment strategies, it is imperative to develop advanced computational methods to analyze large-scale, high-dimensional genomic data. However, traditional machine learning methods face a challenge in handling the high-dimensional, low-sample size problem that is shown in most genomic data sets. To address this, our group has developed geometric network analysis techniques on multi-omics data in connection with prior biological knowledge derived from protein-protein interactions (PPIs) or pathways. Geometric features obtained from the genomic network, such as Ollivier-Ricci curvature and the invariant measure of the associated Markov chain, have been shown to be predictive of survival outcomes in various cancers. In this study, we propose a novel supervised deep learning method called geometric graph neural network (GGNN) that incorporates such geometric features into deep learning for enhanced predictive power and interpretability. More specifically, we utilize a state-of-the-art graph neural network with sparse connections between the hidden layers based on known biology of the PPI network and pathway information. Geometric features along with multi-omics data are then incorporated into the corresponding layers. The proposed approach utilizes a local-global principle in such a manner that highly predictive features are selected at the front layers and fed directly to the last layer for multivariable Cox proportional-hazards regression modeling. The method was applied to multi-omics data from the CoMMpass study of multiple myeloma and ten major cancers in The Cancer Genome Atlas (TCGA). In most experiments, our method showed superior predictive performance compared to other alternative methods.
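The final layer described in this abstract performs Cox proportional-hazards regression, which is typically trained by minimizing the negative partial log-likelihood. The sketch below shows that loss in plain Python under stated assumptions: the function name and toy data are hypothetical, tied event times are assumed absent, and nothing here reproduces the GGNN architecture itself.

```python
import math


def cox_neg_log_partial_likelihood(times, events, scores):
    """Negative Cox partial log-likelihood, assuming no tied event times.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if censored
    scores -- model outputs interpreted as log-risk values
    """
    loss = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients only appear inside risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set_sum = sum(math.exp(s) for t, s in zip(times, scores) if t >= t_i)
        loss -= scores[i] - math.log(risk_set_sum)
    return loss


# Toy data, invented for illustration only.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
baseline = cox_neg_log_partial_likelihood(times, events, [0.0, 0.0, 0.0])
improved = cox_neg_log_partial_likelihood(times, events, [1.0, 0.0, 0.0])
```

Assigning a higher log-risk to the patient who fails first lowers the loss, which is the behavior a network trained with this objective is pushed toward.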
Affiliation(s)
- Jiening Zhu
  - Department of Applied Mathematics & Statistics, Stony Brook University, NY, USA
- Jung Hun Oh
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA
- Anish K Simhal
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA
- Rena Elkin
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA
- Larry Norton
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, NY, USA
- Joseph O Deasy
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA
- Allen Tannenbaum
  - Department of Applied Mathematics & Statistics, Stony Brook University, NY, USA; Department of Computer Science, Stony Brook University, NY, USA
|
16
|
Guan J, Yao L, Chung CR, Chiang YC, Lee TY. StackTHPred: Identifying Tumor-Homing Peptides through GBDT-Based Feature Selection with Stacking Ensemble Architecture. Int J Mol Sci 2023; 24:10348. [PMID: 37373494 DOI: 10.3390/ijms241210348] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/25/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023] Open
Abstract
One of the major challenges in cancer therapy lies in the limited targeting specificity exhibited by existing anti-cancer drugs. Tumor-homing peptides (THPs) have emerged as a promising solution to this issue, due to their capability to specifically bind to and accumulate in tumor tissues while minimally impacting healthy tissues. THPs are short oligopeptides that offer a superior biological safety profile, with minimal antigenicity, and faster incorporation rates into target cells/tissues. However, identifying THPs experimentally, using methods such as phage display or in vivo screening, is a complex, time-consuming task, hence the need for computational methods. In this study, we proposed StackTHPred, a novel machine learning-based framework that predicts THPs using optimal features and a stacking architecture. With an effective feature selection algorithm and three tree-based machine learning algorithms, StackTHPred has demonstrated advanced performance, surpassing existing THP prediction methods. It achieved an accuracy of 0.915 and a 0.831 Matthews Correlation Coefficient (MCC) score on the main dataset, and an accuracy of 0.883 and a 0.767 MCC score on the small dataset. StackTHPred also offers favorable interpretability, enabling researchers to better understand the intrinsic characteristics of THPs. Overall, StackTHPred is beneficial for both the exploration and identification of THPs and facilitates the development of innovative cancer therapies.
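The abstract reports both accuracy and the Matthews Correlation Coefficient (MCC). As a brief illustration of how the two relate, the sketch below computes both from the four cells of a binary confusion matrix. The function name and the counts are invented for illustration and are not taken from the StackTHPred experiments.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Hypothetical counts, invented for illustration only.
accuracy, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are imbalanced.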
Affiliation(s)
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Chia-Ru Chung
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Ying-Chih Chiang
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
|
17
|
Wang HQ, Li HL, Han JL, Feng ZP, Deng HX, Han X. MMDAE-HGSOC: A novel method for high-grade serous ovarian cancer molecular subtypes classification based on multi-modal deep autoencoder. Comput Biol Chem 2023; 105:107906. [PMID: 37336028 DOI: 10.1016/j.compbiolchem.2023.107906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 05/10/2023] [Accepted: 06/11/2023] [Indexed: 06/21/2023]
Abstract
High-grade serous ovarian cancer (HGSOC) is a type of ovarian cancer developed from serous tubal intraepithelial carcinoma. The intrinsic differences among molecular subtypes are closely associated with prognosis and pathological characteristics. At present, multi-omics data integration methods include early integration and late integration. Most existing HGSOC molecular subtypes classification methods are based on the early integration of multi-omics data. The mutual interference among multi-omics data is ignored, which affects the effectiveness of feature learning. High-dimensional multi-omics data contains genes unassociated with HGSOC molecular subtypes, resulting in redundant information, which is not conducive to model training. In this paper, we propose a multi-modal deep autoencoder learning method, MMDAE-HGSOC. MiRNA expression, DNA methylation, and copy number variation (CNV) are integrated with mRNA expression data to construct a multi-omics feature space. The multi-modal deep autoencoder network is used to learn the high-level feature representation of multi-omics data. The superposition LASSO (S-LASSO) regression algorithm is proposed to fully obtain the associated genes of HGSOC molecular subtypes. The experimental results show that MMDAE-HGSOC is superior to the existing classification methods. Finally, we analyze the enrichment gene ontology (GO) terms and biological pathways of these significant genes, which are discovered during the gene selection process.
Affiliation(s)
- Hui-Qing Wang
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Hao-Lin Li
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Jia-Le Han
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Zhi-Peng Feng
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Hong-Xia Deng
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Xiao Han
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
|
18
|
Li X, Yang L, Jiao X. Deep learning-based multiomics integration model for predicting axillary lymph node metastasis in breast cancer. Future Oncol 2023; 19:1429-1438. [PMID: 37489287 DOI: 10.2217/fon-2023-0070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2023] Open
Abstract
Aim: To develop a deep learning-based multiomics integration model. Materials & methods: Five types of omics data (mRNA, DNA methylation, miRNA, copy number variation and protein expression) were used to build a deep learning-based multiomics integration model via a deep neural network, incorporating an attention mechanism that adaptively considers the weights of multiomics features. Results: Compared with other methods, the deep learning-based multiomics integration model achieved remarkable results, with an area under the curve of 0.89 (95% CI: 0.863-0.910). Conclusion: The deep learning-based multiomics integration model achieved promising results and is an effective method for predicting axillary lymph node metastasis in breast cancer.
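The abstract does not specify the form of its attention mechanism. One common pattern, shown here purely as an assumption, scores each omics modality, turns the scores into softmax weights, and fuses the per-modality embeddings as a weighted sum. All names, embeddings, and scores below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings, scores):
    """Fuse per-modality embeddings with softmax attention weights.

    embeddings -- one equal-length vector per omics modality
    scores     -- one unnormalized relevance score per modality
                  (in a trained model these would be learned)
    """
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(w * emb[k] for w, emb in zip(weights, embeddings)) for k in range(dim)]
    return weights, fused


# Hypothetical 2-dimensional embeddings for three modalities.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, fused = attention_fuse(embeddings, scores=[2.0, 0.5, 1.0])
```

Because the weights sum to one, the fused vector stays on the same scale as the inputs while emphasizing the modalities with higher scores.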
Affiliation(s)
- Xue Li
  - College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, People's Republic of China
- Lifeng Yang
  - College of Computer Science & Technology, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, People's Republic of China
- Xiong Jiao
  - College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, People's Republic of China
|
19
|
Local augmented graph neural network for multi-omics cancer prognosis prediction and analysis. Methods 2023; 213:1-9. [PMID: 36933628 DOI: 10.1016/j.ymeth.2023.02.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/29/2022] [Revised: 12/30/2022] [Accepted: 02/25/2023] [Indexed: 03/17/2023] Open
Abstract
Cancer prognosis prediction and analysis can help patients understand their life expectancy and help clinicians provide correct therapeutic guidance. Thanks to the development of sequencing technology, multi-omics data and biological networks have been used for cancer prognosis prediction. In addition, graph neural networks can simultaneously consider multi-omics features and molecular interactions in biological networks, and have become mainstream in cancer prognosis prediction and analysis. However, the limited number of neighboring genes in biological networks restricts the accuracy of graph neural networks. To solve this problem, a local augmented graph convolutional network named LAGProg is proposed in this paper for cancer prognosis prediction and analysis. The process is as follows: first, given a patient's multi-omics data features and biological network, the corresponding augmented conditional variational autoencoder generates features. Then, the generated augmented features and the original features are fed into a cancer prognosis prediction model to complete the cancer prognosis prediction task. The conditional variational autoencoder consists of two parts: an encoder and a decoder. In the encoding phase, the encoder learns the conditional distribution of the multi-omics data. As a generative model, the decoder takes the conditional distribution and the original features as inputs to generate the enhanced features. The cancer prognosis prediction model consists of a two-layer graph convolutional neural network and a Cox proportional risk network. The Cox proportional risk network consists of fully connected layers. Extensive experiments on 15 real-world datasets from TCGA demonstrated the effectiveness and efficiency of the proposed method in predicting cancer prognosis. LAGProg improved the C-index values by an average of 8.5% over the state-of-the-art graph neural network method. 
Moreover, we confirmed that the local augmentation technique could enhance the model's ability to represent multi-omics features, improve the model's robustness to missing multi-omics features, and prevent the model's over-smoothing during training. Finally, based on genes identified through differential expression analysis, we discovered 13 prognostic markers highly associated with breast cancer, among which ten genes have been proved by literature review.
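The prediction model in this abstract stacks two graph convolutional layers in front of a Cox risk network. The sketch below illustrates that general shape in plain Python using the widely used symmetric normalization of the adjacency matrix with self-loops. It is not the LAGProg implementation: the toy graph, all weights, and the reduction of the fully connected Cox network to a single linear layer are assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for an undirected graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, value) for value in row] for row in out]


# Toy 3-node path graph with 2 features per node (all values hypothetical).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[0.5, -0.2], [0.1, 0.3]]
w2 = [[0.4, 0.2], [-0.6, 0.1]]
cox_weights = [[0.7], [-0.3]]  # single linear layer standing in for the Cox network

a_norm = normalized_adjacency(adj)
hidden = gcn_layer(a_norm, gcn_layer(a_norm, features, w1), w2)
log_risk = [row[0] for row in matmul(hidden, cox_weights)]  # one score per node
```

Each layer mixes a node's features with those of its neighbors, so after two layers every score depends on the two-hop neighborhood of its node.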
|
20
|
Mohammed MA, Abdulkareem KH, Dinar AM, Zapirain BG. Rise of Deep Learning Clinical Applications and Challenges in Omics Data: A Systematic Review. Diagnostics (Basel) 2023; 13:664. [PMID: 36832152 PMCID: PMC9955380 DOI: 10.3390/diagnostics13040664] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/24/2022] [Revised: 02/05/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023] Open
Abstract
This research aims to review and evaluate the most relevant scientific studies about deep learning (DL) models in the omics field. It also aims to demonstrate the potential of DL techniques in omics data analysis and to identify the key challenges that must be addressed for that potential to be fully realized. Surveying the existing literature shows that several elements are essential for understanding these studies; for example, the clinical applications and datasets they report. The published literature also highlights the difficulties encountered by other researchers. In addition to looking for other studies, such as guidelines, comparative studies, and review papers, a systematic approach is used to search all relevant publications on omics and DL using different keyword variants. From 2018 to 2022, the search procedure was conducted on four Internet search engines: IEEE Xplore, Web of Science, ScienceDirect, and PubMed. These indexes were chosen because they offer enough coverage and linkages to numerous papers in the biological field. The inclusion and exclusion criteria were specified, and a total of 65 articles were added to the final list. Of the 65 publications, 42 are clinical applications of DL in omics data. Furthermore, 16 of the 65 articles were review publications based on single- and multi-omics data from the proposed taxonomy. Finally, only a small number of articles (7/65) focused on comparative analysis and guidelines. The use of DL in studying omics data presented several obstacles related to DL itself, preprocessing procedures, datasets, model validation, and testbed applications. Numerous relevant investigations were performed to address these issues. Unlike other review papers, our study distinctly reflects different observations on omics with DL model areas. 
We believe that the result of this study can be a useful guideline for practitioners who look for a comprehensive view of the role of DL in omics data analysis.
Affiliation(s)
- Mazin Abed Mohammed
  - College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq
  - eVIDA Lab, University of Deusto, 48007 Bilbao, Spain
- Karrar Hameed Abdulkareem
  - College of Agriculture, Al-Muthanna University, Samawah 66001, Iraq
  - College of Engineering, University of Warith Al-Anbiyaa, Karbala 56001, Iraq
- Ahmed M. Dinar
  - Computer Engineering Department, University of Technology- Iraq, Baghdad 19006, Iraq
|
21
|
Yu X, Zhou S, Zou H, Wang Q, Liu C, Zang M, Liu T. Survey of deep learning techniques for disease prediction based on omics data. Human Gene 2023; 35:201140. [DOI: 10.1016/j.humgen.2022.201140] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/05/2025]
|
22
|
Mechanistic insights into dietary (poly)phenols and vascular dysfunction-related diseases using multi-omics and integrative approaches: Machine learning as a next challenge in nutrition research. Mol Aspects Med 2023; 89:101101. [PMID: 35728999 DOI: 10.1016/j.mam.2022.101101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/07/2022] [Revised: 06/09/2022] [Accepted: 06/11/2022] [Indexed: 02/04/2023]
Abstract
Dietary (poly)phenols have been extensively studied for their vasculoprotective effects and consequently their role in preventing or delaying the onset of cardiovascular and metabolic diseases. Even though early studies ascribed the vasculoprotective properties of (poly)phenols primarily to their putative free radical scavenging properties, recent data indicate that in biological systems, (poly)phenols act primarily through genomic and epigenomic mechanisms. The molecular mechanisms underlying their health properties are still not well identified, mainly due to the use of physiologically non-relevant conditions (native molecules or extracts at high concentrations, rather than circulating metabolites), but also due to the use of targeted genomic approaches that evaluate the effect on only a few specific genes, which prevents the detailed molecular mechanisms involved from being deciphered. The use of state-of-the-art untargeted analytical methods represents a significant breakthrough in nutrigenomics, as these methods enable detailed insights into the effects at each specific omics level. Moreover, the implementation of multi-omics approaches allows integration of different levels of regulation of cellular functions, to obtain a comprehensive picture of the molecular mechanisms of action of (poly)phenols. In combination with bioinformatics and machine learning methods, multi-omics has the potential to make a major contribution to nutrition science. The aim of this review is to provide an overview of the use of omics, multi-omics, and integrative approaches in studying the vasculoprotective properties of dietary (poly)phenols and to address the potential for using machine learning in nutrigenomics.
|
23
|
Mathema VB, Sen P, Lamichhane S, Orešič M, Khoomrung S. Deep learning facilitates multi-data type analysis and predictive biomarker discovery in cancer precision medicine. Comput Struct Biotechnol J 2023; 21:1372-1382. [PMID: 36817954 PMCID: PMC9929204 DOI: 10.1016/j.csbj.2023.01.043] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 08/17/2022] [Revised: 01/28/2023] [Accepted: 01/29/2023] [Indexed: 02/02/2023] Open
Abstract
Cancer progression is linked to gene-environment interactions that alter cellular homeostasis. The use of biomarkers as early indicators of disease manifestation and progression can substantially improve diagnosis and treatment. Large omics datasets generated by high-throughput profiling technologies, such as microarrays, RNA sequencing, whole-genome shotgun sequencing, nuclear magnetic resonance, and mass spectrometry, have enabled data-driven biomarker discoveries. The identification of differentially expressed traits as molecular markers has traditionally relied on statistical techniques that are often limited to linear parametric modeling. The heterogeneity, epigenetic changes, and high degree of polymorphism observed in oncogenes demand biomarker-assisted personalized medication schemes. Deep learning (DL), a major subunit of machine learning (ML), has been increasingly utilized in recent years to investigate various diseases. The combination of ML/DL approaches for performance optimization across multi-omics datasets produces robust ensemble-learning prediction models, which are becoming useful in precision medicine. This review focuses on the recent development of ML/DL methods to provide integrative solutions in discovering cancer-related biomarkers, and their utilization in precision medicine.
Affiliation(s)
- Vivek Bhakta Mathema
- Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
| | - Partho Sen
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
| | - Santosh Lamichhane
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
| | - Matej Orešič
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
| | - Sakda Khoomrung
- Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Center of Excellence for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Bangkok, Thailand
- Corresponding author at: Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand.
|
24
|
Machine learning to analyse omic-data for COVID-19 diagnosis and prognosis. BMC Bioinformatics 2023; 24:7. [PMID: 36609221 PMCID: PMC9817417 DOI: 10.1186/s12859-022-05127-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/25/2022] [Accepted: 12/23/2022] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND With the global spread of COVID-19, the world has seen many patients, including many severe cases. The rapid development of machine learning (ML) has led to significant achievements in disease diagnosis and prediction. Current studies have confirmed that omics data at the host level can reflect the development process and prognosis of the disease. Since early diagnosis and effective treatment of severe COVID-19 patients remain challenging, this research aims to use omics data in different ML models for COVID-19 diagnosis and prognosis. We applied several ML models to omics data from a large number of individuals, first to predict whether patients are COVID-19 positive or negative, and then to predict the severity of the disease. RESULTS On the COVID-19 diagnosis task, we obtained the best AUC of 0.99 with our multilayer perceptron model and the highest F1-score of 0.95 with our logistic regression (LR) model. For the severity prediction task, we achieved the highest accuracy of 0.76 with an LR model. Beyond classification and predictive modeling, our study found that ML models performed better on integrated multi-omics data than on single-omics data. By comparing top features from different omics datasets, we also demonstrated the robustness of our model, with a wider range of applicability across diverse datasets related to COVID-19. Additionally, we found that omics-based models performed better than image-based or physiological feature-based models, indicating the importance of omics-based datasets for future model development. CONCLUSIONS This study diagnoses COVID-19 positive cases and predicts severity levels. It lowers the dependence on clinical data and professional judgment by leveraging state-of-the-art models. Our model showed wider applicability across different omics datasets and is highly transferable to other respiratory or similar diseases. Hospital and public health care mechanisms can optimize the distribution of medical resources and improve the robustness of the medical system.
|
25
|
Liao J, Li X, Gan Y, Han S, Rong P, Wang W, Li W, Zhou L. Artificial intelligence assists precision medicine in cancer treatment. Front Oncol 2023; 12:998222. [PMID: 36686757 PMCID: PMC9846804 DOI: 10.3389/fonc.2022.998222] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 07/19/2022] [Accepted: 11/22/2022] [Indexed: 01/06/2023] Open
Abstract
Cancer is a major medical problem worldwide. Due to its high heterogeneity, the same drugs or surgical methods may have different curative effects in patients with the same tumor, leading to the need for more accurate treatment methods for tumors and personalized treatments for patients. The precise treatment of tumors is essential, which makes it urgent to obtain an in-depth understanding of the changes that tumors undergo, including changes in their genes, proteins and cancer cell phenotypes, in order to develop targeted treatment strategies for patients. Artificial intelligence (AI) based on big data can extract the hidden patterns, important information, and corresponding knowledge behind the enormous amount of data. For example, ML and deep learning, which are subsets of AI, can be used to mine the deep-level information in genomics, transcriptomics, proteomics, radiomics, digital pathological images, and other data, which can help clinicians understand tumors comprehensively. In addition, AI can find new biomarkers from data to assist tumor screening, detection, diagnosis, treatment and prognosis prediction, so as to provide the best treatment for individual patients and improve their clinical outcomes.
Affiliation(s)
- Jinzhuang Liao
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
| | - Xiaoying Li
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
| | - Yu Gan
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
| | - Shuangze Han
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
| | - Pengfei Rong
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Wei Wang
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Wei Li
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Li Zhou
- Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Department of Pathology, The Xiangya Hospital of Central South University, Changsha, Hunan, China
|
26
|
Chen S, Zang Y, Xu B, Lu B, Ma R, Miao P, Chen B. An Unsupervised Deep Learning-Based Model Using Multiomics Data to Predict Prognosis of Patients with Stomach Adenocarcinoma. Comput Math Methods Med 2022; 2022:5844846. [PMID: 36339684 PMCID: PMC9633210 DOI: 10.1155/2022/5844846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2022] [Revised: 09/25/2022] [Accepted: 10/08/2022] [Indexed: 09/08/2023]
Abstract
METHODS Patients (363 in total) with stomach adenocarcinoma from The Cancer Genome Atlas (TCGA) cohort were included. An autoencoder was constructed to integrate the RNA sequencing, miRNA sequencing, and methylation data. The features of the bottleneck layer were clustered with the k-means algorithm to obtain subgroups for evaluating the prognosis-related risk of stomach adenocarcinoma. The model's robustness was verified using 10-fold cross-validation (CV). Survival was analyzed by the Kaplan-Meier method. Univariate and multivariate Cox regression were used to estimate hazard ratios. The model was validated in three independent cohorts with different endpoints. RESULTS The patients were divided into low-risk and high-risk groups according to the k-means clustering algorithm. The high-risk group had a significantly higher risk of poor survival (log-rank P value = 2.80e-06; adjusted hazard ratio = 2.386, 95% confidence interval: 1.607-3.543), with a concordance index (C-index) of 0.714 and a Brier score of 0.184. The model performed well both in the 10-fold CV procedure and in three independent cohorts from the Gene Expression Omnibus (GEO) repository. CONCLUSIONS A robust and generalizable model based on the autoencoder was proposed to integrate multiomics data and predict the prognosis of patients with stomach adenocarcinoma. The model demonstrates better performance than two alternative approaches on prognosis prediction. The results might provide grounds for further exploring potential biomarkers to predict the prognosis of patients with stomach adenocarcinoma.
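The concordance index (C-index) reported in this abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation; it is a plain-Python version of Harrell's C-index under the usual assumption that a pair of patients is comparable only when the one with the shorter follow-up time had an observed event. The function name `concordance_index` and the toy inputs are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs that the risk score orders correctly.

    times  -- follow-up times
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- predicted risk scores (higher means worse expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose shorter follow-up ended in censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical values): four patients, the third is censored.
c = concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.714 should be read.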
Affiliation(s)
- Sizhen Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Yiteng Zang
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Biyun Xu
- Department of Biostatistics, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing 210008, China
| | - Beier Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Rongji Ma
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Pengcheng Miao
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
| | - Bingwei Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
|
27
|
Tsimenidis S, Vrochidou E, Papakostas GA. Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:12272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022] Open
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and are still expanding as the cost of next-generation sequencing technologies decreases. Deep learning (DL) is a viable method to exploit this massive data stream, since it has advanced quickly through successive innovations. However, an obstacle to scientific progress emerges: applying DL to biology is difficult because both fields are evolving at a breakneck pace, which makes it hard for an individual to stay at the front lines of both. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. This study could also be useful to researchers in biology who want to understand and utilize the power of DL to gain better insights into, and extract important information from, omics data.
Affiliation(s)
| | | | - George A. Papakostas
- MLV Research Group, Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
|
28
|
Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational - The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022; 20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022] Open
Abstract
Multi-omics technologies are being increasingly utilized in angiogenesis research. Yet, computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break boundaries between both fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible, machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we will discuss challenges and recommend some guidelines which can help to optimize the process of accurate target discovery.
Affiliation(s)
- Abhishek Subramanian
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
| | - Pooya Zakeri
- Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Centre for Brain and Disease Research, Flanders Institute for Biotechnology (VIB), Leuven, Belgium
- Department of Neurosciences and Leuven Brain Institute, KU Leuven, Leuven, Belgium
| | - Mira Mousa
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Halima Alnaqbi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Fatima Yousif Alshamsi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Leo Bettoni
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
| | - Ernesto Damiani
- Robotics and Intelligent Systems Institute, Khalifa University, Abu Dhabi, United Arab Emirates
| | - Habiba Alsafar
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Yvan Saeys
- Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Peter Carmeliet
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
|
29
|
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, Wang M, Zhang Z, He S, Bo X. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Genome Biol 2022; 23:171. [PMID: 35945544 PMCID: PMC9361561 DOI: 10.1186/s13059-022-02739-2] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A fused method using a combination of multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationship of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from a large number of samples. RESULTS In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each of the datasets, two tasks are designed: classification and clustering. The classification performance is evaluated by using three benchmarking metrics including accuracy, F1 macro, and F1 weighted. Meanwhile, the clustering performance is evaluated by using four benchmarking metrics including the Jaccard index (JI), C-index, silhouette score, and Davies Bouldin score. For the cancer multi-omics datasets, the methods' strength in capturing the association of multi-omics dimensionality reduction results with survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance. Meanwhile, efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in clustering tasks. CONCLUSIONS Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate deep learning-based multi-omics data fusion methods, but also suggest the future directions for the development of more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo.
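The difference between the "F1 macro" and "F1 weighted" metrics named in this abstract can be made concrete with a short sketch. The code below is an illustration only, not taken from the benchmark's repository; the function names and the toy labels are hypothetical. Macro F1 averages the per-class F1 scores equally, while weighted F1 weights each class by its number of true samples, so the two diverge on imbalanced data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its one-vs-rest F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts the same."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def f1_weighted(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy imbalanced example (hypothetical labels): four "a" samples, two "b".
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]
macro = f1_macro(y_true, y_pred)
weighted = f1_weighted(y_true, y_pred)
```

In this toy case the majority class has the higher per-class F1, so the weighted score comes out above the macro score; reporting both, as the benchmark does, shows whether a method's performance is carried by the frequent classes.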
Affiliation(s)
- Dongjin Leng
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Linyi Zheng
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Yuqi Wen
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Yunhao Zhang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, People’s Republic of China
| | - Jing Wang
- School of Medicine, Tsinghua University, Beijing, People’s Republic of China
| | - Meihong Wang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Song He
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
|
30
|
Niu J, Xu W, Wei D, Qian K, Wang Q. Deep Learning Framework for Integrating Multibatch Calibration, Classification, and Pathway Activities. Anal Chem 2022; 94:8937-8946. [PMID: 35709357 DOI: 10.1021/acs.analchem.2c00601] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
The amount of available biological data has exploded since the emergence of high-throughput technologies, which is not only revolutionizing the way we recognize molecules and diseases but also bringing novel analytical challenges to bioinformatics analysis. In recent years, deep learning has become a dominant technique in data science. However, classification accuracy is plagued by domain discrepancy. Notably, in the presence of multiple batches, domain discrepancy typically arises between individual batches. Most pairwise adaptation approaches may be suboptimal, as they fail to eliminate external factors across multiple batches while simultaneously taking the classification task into account. We propose a joint deep learning framework that integrates batch effect removal, classification, and downstream pathway activities on biological data. We validate it on two MALDI MS-based metabolomics datasets, achieving the highest diagnostic accuracy (ACC), with a notable ∼10% improvement over other methods. Overall, these results indicate that our approach removes batch effects more effectively than state-of-the-art methods and yields more accurate classification as well as biomarkers for smart diagnosis.
Affiliation(s)
- JingYang Niu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - Wei Xu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - DongMing Wei
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - Kun Qian
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
| | - Qian Wang
- School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China
|
31
|
Wei Z, Han D, Zhang C, Wang S, Liu J, Chao F, Song Z, Chen G. Deep Learning-Based Multi-Omics Integration Robustly Predicts Relapse in Prostate Cancer. Front Oncol 2022; 12:893424. [PMID: 35814412 PMCID: PMC9259796 DOI: 10.3389/fonc.2022.893424] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/10/2022] [Accepted: 05/13/2022] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE Post-operative biochemical relapse (BCR) continues to occur in a significant percentage of patients with localized prostate cancer (PCa). Current stratification methods are not adequate to identify high-risk patients. The present study exploits the ability of deep learning (DL) algorithms, using the H2O package, to combine multi-omics data to resolve this problem. METHODS Five-omics data from 417 PCa patients from The Cancer Genome Atlas (TCGA) were used to construct the DL-based, relapse-sensitive model. Among them, 265 (63.5%) individuals experienced BCR. Five additional independent validation sets were applied to assess its predictive robustness. Bioinformatics analyses of two relapse-associated subgroups were then performed for identification of differentially expressed genes (DEGs), enriched pathway analysis, copy number analysis and immune cell infiltration analysis. RESULTS The DL-based model, with a significant difference (P = 6e-9) between two subgroups and a good concordance index (C-index = 0.767), was proven to be robust by external validation. A total of 1530 DEGs, including 678 up-regulated and 852 down-regulated genes, were identified in the high-risk subgroup S2 compared with the low-risk subgroup S1. Enrichment analyses found that five hallmark gene sets were up-regulated while 13 were down-regulated. We then found that DNA damage repair pathways were significantly enriched in the S2 subgroup. CNV analysis showed that 30.18% of genes were significantly up-regulated and gene amplification on chromosomes 7 and 8 was significantly elevated in the S2 subgroup. Moreover, enrichment analysis revealed that some DEGs and pathways were associated with immunity. Three tumor-infiltrating immune cell (TIIC) groups with a higher proportion in the S2 subgroup (P = 1e-05, P = 8.7e-06, P = 0.00014) and one TIIC group with a higher proportion in the S1 subgroup (P = 1.3e-06) were identified. CONCLUSION We developed a novel, robust classification for understanding PCa relapse. This study validated the effectiveness of deep learning techniques in prognosis prediction, and the method may benefit patients and prevent relapse by improving early detection and advancing early intervention.
Affiliation(s)
- Ziwei Wei
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Dunsheng Han
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Cong Zhang
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Shiyu Wang
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Jinke Liu
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Fan Chao
- Department of Urology, Zhongshan Hospital, Fudan University (Xiamen Branch), Xiamen, China
| | - Zhenyu Song
- Ovarian Cancer Program, Department of Gynecologic Oncology, Zhongshan Hospital, Fudan University, Shanghai, China
- *Correspondence: Gang Chen; Zhenyu Song
| | - Gang Chen
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
- *Correspondence: Gang Chen; Zhenyu Song
|
32
|
Carrillo-Perez F, Morales JC, Castillo-Secilla D, Gevaert O, Rojas I, Herrera LJ. Machine-Learning-Based Late Fusion on Multi-Omics and Multi-Scale Data for Non-Small-Cell Lung Cancer Diagnosis. J Pers Med 2022; 12:601. [PMID: 35455716 PMCID: PMC9025878 DOI: 10.3390/jpm12040601] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/12/2022] [Revised: 03/29/2022] [Accepted: 04/06/2022] [Indexed: 01/27/2023] Open
Abstract
Differentiation between the various non-small-cell lung cancer subtypes is crucial for providing an effective treatment to the patient. For this purpose, machine learning techniques have been used in recent years over the available biological data from patients. However, in most cases this problem has been treated using a single-modality approach, not exploring the potential of the multi-scale and multi-omic nature of cancer data for the classification. In this work, we study the fusion of five multi-scale and multi-omic modalities (RNA-Seq, miRNA-Seq, whole-slide imaging, copy number variation, and DNA methylation) by using a late fusion strategy and machine learning techniques. We train an independent machine learning model for each modality and we explore the interactions and gains that can be obtained by fusing their outputs in an increasing manner, by using a novel optimization approach to compute the parameters of the late fusion. The final classification model, using all modalities, obtains an F1 score of 96.81±1.07, an AUC of 0.993±0.004, and an AUPRC of 0.980±0.016, improving those results that each independent model obtains and those presented in the literature for this problem. These obtained results show that leveraging the multi-scale and multi-omic nature of cancer data can enhance the performance of single-modality clinical decision support systems in personalized medicine, consequently improving the diagnosis of the patient.
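The late fusion strategy described in this abstract, where each modality has its own model and only the outputs are combined, can be sketched as a weighted average of per-modality class probabilities. This is a generic illustration and not the paper's optimization approach: the weights here are fixed by hand, and the modality keys (`"rna"`, `"wsi"`, `"mirna"`), function names, and probability values are all hypothetical.

```python
def late_fusion(prob_by_modality, weights):
    """Fuse per-modality class-probability vectors for one sample.

    prob_by_modality -- dict: modality name -> list of class probabilities
    weights          -- dict: modality name -> non-negative weight
    Only modalities present for this sample contribute, and their weights
    are renormalised, so a sample with a missing modality can still be scored.
    """
    total = sum(weights[m] for m in prob_by_modality)
    if total <= 0:
        raise ValueError("weights of the available modalities must sum to a positive value")
    n_classes = len(next(iter(prob_by_modality.values())))
    fused = [0.0] * n_classes
    for modality, probs in prob_by_modality.items():
        for k, p in enumerate(probs):
            fused[k] += weights[modality] * p / total
    return fused


def predict(prob_by_modality, weights):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(prob_by_modality, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy example (hypothetical outputs): two modalities available, three classes.
sample = {"rna": [0.7, 0.2, 0.1], "wsi": [0.2, 0.6, 0.2]}
weights = {"rna": 0.75, "wsi": 0.25, "mirna": 1.0}  # "mirna" is unused for this sample
fused = late_fusion(sample, weights)
label = predict(sample, weights)
```

Because fusion happens on the outputs, each modality's model can be trained and replaced independently; the cost is that interactions between raw features of different modalities are never modeled directly.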
Affiliation(s)
- Francisco Carrillo-Perez
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, 1265 Welch Rd, Stanford, CA 94305, USA
| | - Juan Carlos Morales
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain
| | - Daniel Castillo-Secilla
- Fujitsu Technology Solutions S.A, CoE Data Intelligence, Camino del Cerro de los Gamos, 1, Pozuelo de Alarcón, 28224 Madrid, Spain
| | - Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, 1265 Welch Rd, Stanford, CA 94305, USA
| | - Ignacio Rojas
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain
| | - Luis Javier Herrera
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain
|
33
|
Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022; 23:bbab569. [PMID: 35089332 PMCID: PMC8921642 DOI: 10.1093/bib/bbab569] [Citation(s) in RCA: 146] [Impact Index Per Article: 48.7] [Received: 10/21/2021] [Revised: 12/06/2021] [Accepted: 12/11/2021] [Indexed: 02/06/2023] Open
Abstract
Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
Affiliation(s)
| | | | - Jane Synnergren
- Systems Biology Research Center, University of Skövde, Sweden
|
34
|
Song C, Li X. Cost-Sensitive KNN Algorithm for Cancer Prediction Based on Entropy Analysis. Entropy 2022; 24:e24020253. [PMID: 35205547 PMCID: PMC8871087 DOI: 10.3390/e24020253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/06/2021] [Revised: 01/29/2022] [Accepted: 01/31/2022] [Indexed: 02/06/2023]
Abstract
Early diagnosis of cancer is beneficial in the formulation of the best treatment plan; it can improve the survival rate and the quality of patient life. However, the commonly used methods of imaging detection and needle biopsy not only struggle to diagnose tumors effectively at an early stage, but can also do great harm to the human body. Since changes in a patient's health status cause changes in blood protein indexes, diagnosing cancer at an early stage from changes in blood indexes would make it convenient to track the course of treatment and would also reduce patients' pain and costs. In this paper, 39 serum protein markers were taken as research objects. The differences in the entropies of serum protein marker sequences among different types of patients were analyzed, and on this basis a cost-sensitive analysis model was established to improve the accuracy of cancer recognition. The results showed that there were significant differences in entropy among different cancer patients, and that the complexity of serum protein markers in normal people was higher than that in cancer patients. Although the dataset was rather imbalanced, containing 897 instances (799 normal instances, 44 liver cancer instances, and 54 ovarian cancer instances), the accuracy of our model still reached 95.21%. Other evaluation indicators were also stable and satisfactory; precision, recall, F1 and AUC reached 0.807, 0.833, 0.819 and 0.92, respectively. This study has theoretical and practical significance for cancer prediction and clinical application and can also provide a research basis for intelligent medical treatment.
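The idea behind a cost-sensitive KNN on an imbalanced dataset such as the one described here can be sketched as follows. This is a generic illustration, not the authors' algorithm: instead of a majority vote, the classifier picks the label with the lowest expected misclassification cost among the k nearest neighbours. The function name, the cost values, the class names, and the two-dimensional toy points are all assumptions made for the example.

```python
import math
from collections import Counter


def cost_sensitive_knn(train_x, train_y, query, k, cost):
    """Predict the label with the lowest expected cost among the k nearest neighbours.

    cost[true][pred] is the (hypothetical) penalty for predicting `pred`
    when the actual class is `true`; correct predictions cost 0.
    """
    # Sort the training points by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    labels = sorted(cost)

    def expected_cost(pred):
        # Each neighbour of class `true` contributes the cost of calling it `pred`.
        return sum(votes[true] * cost[true][pred] for true in labels)

    return min(labels, key=expected_cost)


# Toy data (hypothetical): the three nearest neighbours are two normal, one cancer.
train_x = [(0.1, 0.0), (0.0, 0.2), (0.3, 0.0), (1.0, 1.0), (2.0, 2.0)]
train_y = ["normal", "normal", "cancer", "normal", "cancer"]

uniform = {"normal": {"normal": 0, "cancer": 1}, "cancer": {"normal": 1, "cancer": 0}}
skewed = {"normal": {"normal": 0, "cancer": 1}, "cancer": {"normal": 5, "cancer": 0}}

plain_vote = cost_sensitive_knn(train_x, train_y, (0.0, 0.0), 3, uniform)
costly_miss = cost_sensitive_knn(train_x, train_y, (0.0, 0.0), 3, skewed)
```

With uniform costs the rule reduces to an ordinary majority vote; raising the cost of missing a cancer case shifts borderline queries toward the minority class, which is the usual motivation for cost-sensitive learning when normal instances dominate.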
|
35
|
Arjmand B, Hamidpour SK, Tayanloo-Beik A, Goodarzi P, Aghayan HR, Adibi H, Larijani B. Machine Learning: A New Prospect in Multi-Omics Data Analysis of Cancer. Front Genet 2022; 13:824451. [PMID: 35154283 PMCID: PMC8829119 DOI: 10.3389/fgene.2022.824451] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 11/29/2021] [Accepted: 01/10/2022] [Indexed: 12/11/2022] Open
Abstract
Cancer is defined as a large group of diseases that are associated with abnormal cell growth and uncontrollable cell division, and that may impinge on other tissues of the body by different mechanisms through metastasis. What makes cancer so important is that its incidence rate is growing worldwide, which can have major health, economic, and even social impacts on both patients and governments. Early cancer prognosis, diagnosis, and treatment can therefore play a crucial role at the front line of combating cancer. The onset and progression of cancer can occur under the influence of complicated mechanisms and alterations at the level of the genome, proteome, transcriptome, metabolome, etc. Consequently, the advent of omics science and its broad research branches (such as genomics, proteomics, transcriptomics, metabolomics, and so forth) as revolutionary biological approaches has opened new doors to a comprehensive view of the cancer landscape. Due to the complexities of the formation and development of cancer, the study of mechanisms underlying cancer has gone beyond just one field of the omics arena. Therefore, connecting the data from different branches of omics science and examining them in a multi-omics setting can facilitate the discovery of novel prognostic, diagnostic, and therapeutic approaches. As the volume and complexity of data from omics studies in cancer are increasing dramatically, leading-edge technologies such as machine learning can have a promising role in the assessment of cancer research data. Machine learning is categorized as a subset of artificial intelligence that aims at data parsing, classification, and data pattern identification by applying statistical methods and algorithms. This acquired knowledge subsequently allows computers to learn and to improve the accuracy of predictions through experience from data processing.
In this context, the application of machine learning, as a novel computational technology offers new opportunities for achieving in-depth knowledge of cancer by analysis of resultant data from multi-omics studies. Therefore, it can be concluded that the use of artificial intelligence technologies such as machine learning can have revolutionary roles in the fight against cancer.
Affiliation(s)
- Babak Arjmand
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- *Correspondence: Babak Arjmand; Bagher Larijani
| | - Shayesteh Kokabi Hamidpour
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Akram Tayanloo-Beik
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Parisa Goodarzi
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Hamid Reza Aghayan
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Hossein Adibi
- Diabetes Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Bagher Larijani
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- *Correspondence: Babak Arjmand; Bagher Larijani
| |
|
36
|
Kang M, Ko E, Mersha TB. A roadmap for multi-omics data integration using deep learning. Brief Bioinform 2022; 23:bbab454. [PMID: 34791014 PMCID: PMC8769688 DOI: 10.1093/bib/bbab454] [Citation(s) in RCA: 133] [Impact Index Per Article: 44.3] [Received: 07/27/2021] [Revised: 09/30/2021] [Accepted: 10/05/2021] [Indexed: 12/18/2022] Open
Abstract
High-throughput next-generation sequencing now makes it possible to generate a vast amount of multi-omics data for various applications. These data have revolutionized biomedical research by providing a more comprehensive understanding of the biological systems and molecular mechanisms of disease development. Recently, deep learning (DL) algorithms have become one of the most promising methods in multi-omics data analysis, due to their predictive performance and capability of capturing nonlinear and hierarchical features. While integrating and translating multi-omics data into useful functional insights remain the biggest bottleneck, there is a clear trend towards incorporating multi-omics analysis in biomedical research to help explain the complex relationships between molecular layers. Multi-omics data have a role to improve prevention, early detection and prediction; monitor progression; interpret patterns and endotyping; and design personalized treatments. In this review, we outline a roadmap of multi-omics integration using DL and offer a practical perspective into the advantages, challenges and barriers to the implementation of DL in multi-omics data.
Affiliation(s)
- Mingon Kang
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
| | - Euiseong Ko
- Department of Computer Science at the University of Nevada, Las Vegas, NV, USA
| | - Tesfaye B Mersha
- Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
| |
|
37
|
Sun N, Chu J, Hu W, Chen X, Yi N, Shen Y. A novel 14-gene signature for overall survival in lung adenocarcinoma based on the Bayesian hierarchical Cox proportional hazards model. Sci Rep 2022; 12:27. [PMID: 34996932 PMCID: PMC8741994 DOI: 10.1038/s41598-021-03645-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/15/2021] [Accepted: 12/06/2021] [Indexed: 12/14/2022] Open
Abstract
There have been few investigations of cancer prognosis models based on Bayesian hierarchical models. In this study, we used a novel Bayesian method to screen mRNAs and estimate the effects of mRNAs on the prognosis of patients with lung adenocarcinoma. Based on the identified mRNAs, we can build a prognostic model combining mRNAs and clinical features, allowing us to explore new molecules with the potential to predict the prognosis of lung adenocarcinoma. The mRNA data (n = 594) and clinical data (n = 470) for lung adenocarcinoma were obtained from the TCGA database. Gene set enrichment analysis (GSEA), univariate Cox proportional hazards regression, and the Bayesian hierarchical Cox proportional hazards model were used to explore the mRNAs related to the prognosis of lung adenocarcinoma. Multivariate Cox proportional hazard regression was used to identify independent markers. The prediction performance of the prognostic model was evaluated not only by internal cross-validation but also by external validation based on the GEO dataset (n = 437). With the Bayesian hierarchical Cox proportional hazards model, a 14-gene signature that included CPS1, CTPS2, DARS2, IGFBP3, MCM5, MCM7, NME4, NT5E, PLK1, POLR3G, PTTG1, SERPINB5, TXNRD1, and TYMS was established to predict overall survival in lung adenocarcinoma. Multivariate analysis demonstrated that the 14-gene signature (HR 3.960, 95% CI 2.710–5.786), T classification (T1, reference; T3, HR 1.925, 95% CI 1.104–3.355) and N classification (N0, reference; N1, HR 2.212, 95% CI 1.520–3.220; N2, HR 2.260, 95% CI 1.499–3.409) were independent predictors. The C-index of the model was 0.733 after cross-validation and 0.735 after external validation, and a nomogram was provided for better prediction in clinical application.
Bayesian hierarchical Cox proportional hazards models can be used to integrate high-dimensional omics information into a prediction model for lung adenocarcinoma to improve the prognostic prediction and discover potential targets. This approach may be a powerful predictive tool for clinicians treating malignant tumours.
Affiliation(s)
- Na Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China
| | - Jiadong Chu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China
| | - Wei Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China
| | - Xuanli Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China
| | - Nengjun Yi
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
| | - Yueping Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China.
| |
|
38
|
Arslan E, Schulz J, Rai K. Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine. Biochim Biophys Acta Rev Cancer 2021; 1876:188588. [PMID: 34245839 PMCID: PMC8595561 DOI: 10.1016/j.bbcan.2021.188588] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/09/2021] [Revised: 05/29/2021] [Accepted: 07/02/2021] [Indexed: 02/01/2023]
Abstract
The recent deluge of genome-wide technologies for the mapping of the epigenome and resulting data in cancer samples has provided the opportunity for gaining insights into and understanding the roles of epigenetic processes in cancer. However, the complexity, high-dimensionality, sparsity, and noise associated with these data pose challenges for extensive integrative analyses. Machine Learning (ML) algorithms are particularly suited for epigenomic data analyses due to their flexibility and ability to learn underlying hidden structures. We will discuss four overlapping but distinct major categories under ML: dimensionality reduction, unsupervised methods, supervised methods, and deep learning (DL). We review the preferred use cases of these algorithms in analyses of cancer epigenomics data with the hope to provide an overview of how ML approaches can be used to explore fundamental questions on the roles of epigenome in cancer biology and medicine.
Affiliation(s)
- Emre Arslan
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Jonathan Schulz
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Kunal Rai
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America.
| |
|
39
|
Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 2021; 49:107739. [PMID: 33794304 DOI: 10.1016/j.biotechadv.2021.107739] [Citation(s) in RCA: 360] [Impact Index Per Article: 90.0] [Received: 12/15/2020] [Revised: 03/01/2021] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
With the development of modern high-throughput omic measurement platforms, it has become essential for biomedical studies to undertake an integrative (combined) approach to fully utilise these data to gain insights into biological systems. Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated to unravel the intricate working of systems biology using machine learning-based predictive algorithms. Machine learning methods offer novel techniques to integrate and analyse the various omics data enabling the discovery of new biomarkers. These biomarkers have the potential to help in accurate disease prediction, patient stratification and delivery of precision medicine. This review paper explores different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease. It provides insight and recommendations for interdisciplinary professionals who envisage employing machine learning skills in multi-omics studies.
Affiliation(s)
- Parminder S Reel
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
| | - Smarti Reel
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
| | - Ewan Pearson
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
| | - Emanuele Trucco
- VAMPIRE project, Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
| | - Emily Jefferson
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom.
| |
|
40
|
Boniolo F, Dorigatti E, Ohnmacht AJ, Saur D, Schubert B, Menden MP. Artificial intelligence in early drug discovery enabling precision medicine. Expert Opin Drug Discov 2021; 16:991-1007. [PMID: 34075855 DOI: 10.1080/17460441.2021.1918096] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 02/08/2023]
Abstract
Introduction: Precision medicine is the concept of treating diseases based on environmental factors, lifestyles, and molecular profiles of patients. This approach has been found to increase success rates of clinical trials and accelerate drug approvals. However, current precision medicine applications in early drug discovery use only a handful of molecular biomarkers to make decisions, whilst clinics gear up to capture the full molecular landscape of patients in the near future. This deep multi-omics characterization demands new analysis strategies to identify appropriate treatment regimens, which we envision will be pioneered by artificial intelligence. Areas covered: In this review, the authors discuss the current state of drug discovery in precision medicine and present our vision of how artificial intelligence will impact biomarker discovery and drug design. Expert opinion: Precision medicine is expected to revolutionize modern medicine; however, its traditional form is focusing on a few biomarkers, thus not equipped to leverage the full power of molecular landscapes. For learning how the development of drugs can be tailored to the heterogeneity of patients across their molecular profiles, artificial intelligence algorithms are the next frontier in precision medicine and will enable a fully personalized approach in drug design, and thus ultimately impacting clinical practice.
Affiliation(s)
- Fabio Boniolo
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Centre for Environmental Health, Munich, Germany.,School of Medicine, Chair of Translational Cancer Research and Institute for Experimental Cancer Therapy, Klinikum Rechts Der Isar, Technische Universität München, Munich, Germany
| | - Emilio Dorigatti
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Centre for Environmental Health, Munich, Germany.,Statistical Learning and Data Science, Department of Statistics, Ludwig Maximilian Universität München, Munich, Germany
| | - Alexander J Ohnmacht
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Centre for Environmental Health, Munich, Germany.,Department of Biology, Ludwig-Maximilians University Munich, Martinsried, Germany
| | - Dieter Saur
- School of Medicine, Chair of Translational Cancer Research and Institute for Experimental Cancer Therapy, Klinikum Rechts Der Isar, Technische Universität München, Munich, Germany
| | - Benjamin Schubert
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Centre for Environmental Health, Munich, Germany.,Department of Mathematics, Technical University of Munich, Garching, Germany
| | - Michael P Menden
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Centre for Environmental Health, Munich, Germany.,Department of Biology, Ludwig-Maximilians University Munich, Martinsried, Germany.,German Centre for Diabetes Research (DZD e.V.), Neuherberg, Germany
| |
|
41
|
Comprehensive Analysis of Prognostic and Genetic Signatures for General Transcription Factor III (GTF3) in Clinical Colorectal Cancer Patients Using Bioinformatics Approaches. Curr Issues Mol Biol 2021; 43:cimb43010002. [PMID: 33925358 PMCID: PMC8935981 DOI: 10.3390/cimb43010002] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/02/2021] [Revised: 04/19/2021] [Accepted: 04/22/2021] [Indexed: 02/07/2023] Open
Abstract
Colorectal cancer (CRC) has the fourth-highest incidence of all cancer types, and its incidence has steadily increased in the last decade. The general transcription factor III (GTF3) family, comprising GTF3A, GTF3B, GTF3C1, and GTF3C2, has been reported to be linked with the development of different types of cancers; however, the messenger (m)RNA expression and prognostic value of its members in colorectal cancer need to be further investigated. To study the transcriptomic expression levels of GTF3 gene members in colorectal cancer in both cancerous tissues and cell lines, we first performed high-throughput screening using the Oncomine, GEPIA, and CCLE databases. We then applied the Prognoscan database to query correlations of their mRNA expression with the disease-specific survival (DSS), overall survival (OS), and disease-free survival (DFS) status of colorectal cancer patients. Furthermore, protein expression of GTF3 family members in clinical colorectal cancer specimens was also examined using the Human Protein Atlas. Finally, genomic alterations of GTF3 family genes in colorectal cancer and their signal transduction pathways were studied using the cBioPortal, ClueGO, CluePedia, and MetaCore platforms. Our findings revealed that the expression of GTF3 family members was significantly correlated with the cell cycle, oxidative stress, WNT/β-catenin signaling, Rho GTPases, and G-protein-coupled receptors (GPCRs). Clinically, high GTF3A and GTF3B expression was significantly correlated with poor prognoses in colorectal cancer patients. Collectively, our study indicates that GTF3A was overexpressed in cancer tissues and cell lines, particularly in colorectal cancer, and that it could serve as a potential prognostic biomarker.
|
42
|
Wu Z, Lawrence PJ, Ma A, Zhu J, Xu D, Ma Q. Single-Cell Techniques and Deep Learning in Predicting Drug Response. Trends Pharmacol Sci 2020; 41:1050-1065. [PMID: 33153777 PMCID: PMC7669610 DOI: 10.1016/j.tips.2020.10.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 08/05/2020] [Revised: 10/04/2020] [Accepted: 10/09/2020] [Indexed: 12/19/2022]
Abstract
Rapidly developing single-cell sequencing analyses produce more comprehensive profiles of the genomic, transcriptomic, and epigenomic heterogeneity of tumor subpopulations than do traditional bulk sequencing analyses. Moreover, single-cell techniques allow the response of a tumor to drug exposure to be more thoroughly investigated. Deep learning (DL) models have successfully extracted features from complex bulk sequence data to predict drug responses. We review recent innovations in single-cell technologies and DL-based approaches related to drug sensitivity predictions. We believe that, by using insights from bulk sequence data, deep transfer learning (DTL) can facilitate the use of single-cell data for training superior DL-based drug prediction models.
Affiliation(s)
- Zhenyu Wu
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
| | - Patrick J Lawrence
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
| | - Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
| | - Jian Zhu
- Department of Pathology, The Ohio State University, Columbus, OH 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA.
| |
|
43
|
Affiliation(s)
- Yanan Gao
- Department of Information, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Rui Zhou
- Department of Information, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Qingwen Lyu
- Department of Information, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| |
|