1
Valous NA, Popp F, Zörnig I, Jäger D, Charoentong P. Graph machine learning for integrated multi-omics analysis. Br J Cancer 2024:10.1038/s41416-024-02706-7. [PMID: 38729996 DOI: 10.1038/s41416-024-02706-7] [Received: 11/15/2023] [Revised: 04/25/2024] [Accepted: 04/26/2024] [Indexed: 05/12/2024] Open
Abstract
Multi-omics experiments at bulk or single-cell resolution facilitate the discovery of hypothesis-generating biomarkers for predicting response to therapy, as well as aid in uncovering mechanistic insights into cellular and microenvironmental processes. Many methods for data integration have been developed for the identification of key elements that explain or predict disease risk or other biological outcomes. The heterogeneous graph representation of multi-omics data provides an advantage for discerning patterns suitable for predictive/exploratory analysis, thus permitting the modeling of complex relationships. Graph-based approaches, including graph neural networks, potentially offer a reliable methodological toolset that can provide a tangible alternative to scientists and clinicians who seek ideas and implementation strategies for the integrated analysis of their omics sets in biomedical research. Graph-based workflows continue to push the limits of the technological envelope, and this perspective provides a focused literature review of research articles in which graph machine learning is utilized for integrated multi-omics data analyses, with several examples that demonstrate the effectiveness of graph-based approaches.
Affiliation(s)
- Nektarios A Valous
  - Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany.
  - Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany.
- Ferdinand Popp
  - Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
  - Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Inka Zörnig
  - Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
  - Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Dirk Jäger
  - Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
  - Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
  - Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Pornpimol Charoentong
  - Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
  - Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
2
Wang M, Yan X, Dong Y, Li X, Gao B. Machine learning and multi-omics data reveal driver gene-based molecular subtypes in hepatocellular carcinoma for precision treatment. PLoS Comput Biol 2024; 20:e1012113. [PMID: 38728362 DOI: 10.1371/journal.pcbi.1012113] [Received: 01/02/2024] [Accepted: 04/24/2024] [Indexed: 05/12/2024] Open
Abstract
The heterogeneity of Hepatocellular Carcinoma (HCC) poses a barrier to effective treatment. Stratifying highly heterogeneous HCC into molecular subtypes with similar features is crucial for personalized anti-tumor therapies. Although driver genes play pivotal roles in cancer progression, their potential in HCC subtyping has been largely overlooked. This study aims to utilize driver genes to construct HCC subtype models and unravel their molecular mechanisms. Utilizing a novel computational framework, we expanded the initially identified 96 driver genes to 1192 based on mutational aspects and an additional 233 considering driver dysregulation. These genes were subsequently employed as stratification markers for further analyses. A novel multi-omics subtype classification algorithm was developed, leveraging mutation and expression data of the identified stratification genes. This algorithm successfully categorized HCC into two distinct subtypes, CLASS A and CLASS B, demonstrating significant differences in survival outcomes. Integrating multi-omics and single-cell data unveiled substantial distinctions between these subtypes regarding transcriptomics, mutations, copy number variations, and epigenomics. Moreover, our prognostic model exhibited excellent predictive performance in training and external validation cohorts. Finally, a 10-gene classification model for these subtypes identified TTK as a promising therapeutic target with robust classification capabilities. This comprehensive study provides a novel perspective on HCC stratification, offering crucial insights for a deeper understanding of its pathogenesis and the development of promising treatment strategies.
Affiliation(s)
- Meng Wang
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
- Xinyue Yan
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
- Yanan Dong
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
- Xiaoqin Li
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
- Bin Gao
  - Faculty of Environment and Life of Beijing University of Technology, Beijing, China
3
Tang X, Prodduturi N, Thompson KJ, Weinshilboum RM, O'Sullivan CC, Boughey JC, Tizhoosh H, Klee EW, Wang L, Goetz MP, Suman V, Kalari KR. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. bioRxiv 2024:2024.03.21.586001. [PMID: 38585820 PMCID: PMC10996492 DOI: 10.1101/2024.03.21.586001] [Indexed: 04/09/2024]
Abstract
The OmicsFootPrint framework addresses the need for advanced multi-omics data analysis methodologies by transforming data into intuitive two-dimensional circular images and facilitating the interpretation of complex diseases. Utilizing Deep Neural Networks and incorporating the SHapley Additive exPlanations (SHAP) algorithm, the framework enhances model interpretability. Tested with The Cancer Genome Atlas (TCGA) data, OmicsFootPrint effectively classified lung and breast cancer subtypes, achieving high Area Under Curve (AUC) scores (0.98±0.02 for lung cancer subtype differentiation and 0.83±0.07 for breast cancer PAM50 subtypes), and successfully distinguished between invasive lobular and ductal carcinomas in breast cancer, showcasing its robustness. It also demonstrated notable performance in predicting drug responses in cancer cell lines, with a median AUC of 0.74, surpassing existing algorithms. Furthermore, its effectiveness persists even with reduced training sample sizes. OmicsFootPrint marks an enhancement in multi-omics research, offering a novel, efficient, and interpretable approach that contributes to a deeper understanding of disease mechanisms.
4
Hu S, Xiao Q, Gao R, Qin J, Nie J, Chen Y, Lou J, Ding M, Pan Y, Wang S. Identification of BGN positive fibroblasts as a driving factor for colorectal cancer and development of its related prognostic model combined with machine learning. BMC Cancer 2024; 24:516. [PMID: 38654221 PMCID: PMC11041013 DOI: 10.1186/s12885-024-12251-4] [Received: 12/29/2023] [Accepted: 04/11/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND Numerous studies have indicated that cancer-associated fibroblasts (CAFs) play a crucial role in the progression of colorectal cancer (CRC). However, there are still many unknowns regarding the exact role of CAF subtypes in CRC. METHODS The data for this study were obtained from bulk, single-cell, and spatial transcriptomic sequencing data. Bioinformatics analysis, in vitro experiments, and machine learning methods were employed to investigate the functional characteristics of CAF subtypes and construct prognostic models. RESULTS Our study demonstrates that Biglycan (BGN) positive cancer-associated fibroblasts (BGN + Fib) serve as a driver in colorectal cancer (CRC). The proportion of BGN + Fib increases gradually with the progression of CRC, and high infiltration of BGN + Fib is associated with poor prognosis in terms of overall survival (OS) and recurrence-free survival (RFS) in CRC. Downregulation of BGN expression in cancer-associated fibroblasts (CAFs) significantly reduces migration and proliferation of CRC cells. Among 101 combinations of 10 machine learning algorithms, the StepCox[both] + plsRcox combination was utilized to develop a BGN + Fib derived risk signature (BGNFRS). BGNFRS was identified as an independent adverse prognostic factor for CRC OS and RFS, outperforming 92 previously published risk signatures. A Nomogram model constructed based on BGNFRS and clinical-pathological features proved to be a valuable tool for predicting CRC prognosis. CONCLUSION In summary, our study identified BGN + Fib as drivers of CRC, and the derived BGNFRS was effective in predicting the OS and RFS of CRC patients.
Affiliation(s)
- Shangshang Hu
  - School of Medicine, Southeast University, 210009, Nanjing, Jiangsu, China
  - General Clinical Research Center, Nanjing First Hospital, Nanjing Medical University, No. 68, Changle Road, 210006, Nanjing, Jiangsu, China
- Qianni Xiao
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, 211122, Nanjing, Jiangsu, China
- Rui Gao
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, 211122, Nanjing, Jiangsu, China
- Jian Qin
  - School of Medicine, Southeast University, 210009, Nanjing, Jiangsu, China
  - General Clinical Research Center, Nanjing First Hospital, Nanjing Medical University, No. 68, Changle Road, 210006, Nanjing, Jiangsu, China
- Junjie Nie
  - School of Medicine, Southeast University, 210009, Nanjing, Jiangsu, China
  - General Clinical Research Center, Nanjing First Hospital, Nanjing Medical University, No. 68, Changle Road, 210006, Nanjing, Jiangsu, China
- Yuhan Chen
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, 211122, Nanjing, Jiangsu, China
- Jinwei Lou
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, 211122, Nanjing, Jiangsu, China
- Muzi Ding
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, 211122, Nanjing, Jiangsu, China
- Yuqin Pan
  - General Clinical Research Center, Nanjing First Hospital, Nanjing Medical University, No. 68, Changle Road, 210006, Nanjing, Jiangsu, China.
  - Jiangsu Collaborative Innovation Center on Cancer Personalized Medicine, Nanjing Medical University, 211100, Nanjing, Jiangsu, China.
- Shukui Wang
  - School of Medicine, Southeast University, 210009, Nanjing, Jiangsu, China.
  - General Clinical Research Center, Nanjing First Hospital, Nanjing Medical University, No. 68, Changle Road, 210006, Nanjing, Jiangsu, China.
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, 211122, Nanjing, Jiangsu, China.
  - Jiangsu Collaborative Innovation Center on Cancer Personalized Medicine, Nanjing Medical University, 211100, Nanjing, Jiangsu, China.
5
Zhang Y, Yu L, Yang M, Han B, Luo J, Jing R. Model fusion for predicting unconventional proteins secreted by exosomes using deep learning. Proteomics 2024:e2300184. [PMID: 38643383 DOI: 10.1002/pmic.202300184] [Received: 04/12/2023] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/22/2024]
Abstract
Unconventional secretory proteins (USPs) are vital for cell-to-cell communication and are necessary for proper physiological processes. Unlike classical proteins that follow the conventional secretory pathway via the Golgi apparatus, these proteins are released using unconventional pathways. The primary modes of secretion for USPs are exosomes and ectosomes, which originate from the endoplasmic reticulum. Accurate and rapid identification of exosome-mediated secretory proteins is crucial for gaining valuable insights into the regulation of non-classical protein secretion and intercellular communication, as well as for the advancement of novel therapeutic approaches. Although computational methods based on amino acid sequence prediction exist for predicting unconventional proteins secreted by exosomes (UPSEs), they suffer from significant limitations in terms of algorithmic accuracy. In this study, we propose a novel approach to predict UPSEs by combining multiple deep learning models that incorporate both protein sequences and evolutionary information. Our approach utilizes a convolutional neural network (CNN) to extract protein sequence information, while various densely connected neural networks (DNNs) are employed to capture evolutionary conservation patterns. By combining six distinct deep learning models, we have created a superior framework that surpasses previous approaches, achieving an ACC score of 77.46% and an MCC score of 0.5406 on an independent test dataset.
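For readers unfamiliar with the two metrics reported in this abstract, the following is a minimal sketch of how accuracy (ACC) and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The function name and the toy counts are hypothetical illustrations and are not taken from the study.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical counts, for illustration only (not data from the paper).
acc, mcc = accuracy_and_mcc(tp=40, tn=45, fp=5, fn=10)
print(f"ACC = {acc:.4f}, MCC = {mcc:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside ACC when class sizes may differ.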
Affiliation(s)
- Yonglin Zhang
  - Department of Clinical Pharmacy and Pharmacy Management, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China
- Lezheng Yu
  - School of Chemistry and Materials Science, Guizhou Education University, Guiyang, Guizhou, China
- Ming Yang
  - Department of Clinical Pharmacy and Pharmacy Management, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China
- Bin Han
  - GCP Center/Institute of Drug Clinical Trials, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- Jiesi Luo
  - Basic Medical College, Southwest Medical University, Luzhou, Sichuan, China
- Runyu Jing
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan, China
6
Xu H, Yan R, Ye C, Li J, Ji G. Specific mortality in patients with diffuse large B-cell lymphoma: a retrospective analysis based on the surveillance, epidemiology, and end results database. Eur J Med Res 2024; 29:241. [PMID: 38643217 PMCID: PMC11031870 DOI: 10.1186/s40001-024-01833-4] [Received: 01/17/2024] [Accepted: 04/06/2024] [Indexed: 04/22/2024] Open
Abstract
BACKGROUND The full potential of competing risk modeling approaches in the context of diffuse large B-cell lymphoma (DLBCL) patients has yet to be fully harnessed. This study aims to address this gap by developing a sophisticated competing risk model specifically designed to predict specific mortality in DLBCL patients. METHODS We extracted DLBCL patients' data from the SEER (Surveillance, Epidemiology, and End Results) database. To identify relevant variables, we conducted a two-step screening process using univariate and multivariate Fine and Gray regression analyses. Subsequently, a nomogram was constructed based on the results. The model's consistency index (C-index) was calculated to assess its performance. Additionally, calibration curves and receiver operator characteristic (ROC) curves were generated to validate the model's effectiveness. RESULTS This study enrolled a total of 24,402 patients. The feature selection analysis identified 13 variables that were statistically significant and therefore included in the model. The model validation results demonstrated that the area under the receiver operating characteristic (ROC) curve (AUC) for predicting 6-month, 1-year, and 3-year DLBCL-specific mortality was 0.748, 0.718, and 0.698, respectively, in the training cohort. In the validation cohort, the AUC values were 0.747, 0.721, and 0.697. The calibration curves indicated good consistency between the training and validation cohorts. CONCLUSION The most significant predictor of DLBCL-specific mortality is the age of the patient, followed by the Ann Arbor stage and the administration of chemotherapy. This predictive model has the potential to facilitate the identification of high-risk DLBCL patients by clinicians, ultimately leading to improved prognosis.
Affiliation(s)
- Hui Xu
  - Department of Hematology, Taixing People's Hospital, No. 98, Runtai South Road, Taixing, 225400, Jiangsu, China
- Rong Yan
  - Taixing People's Hospital, Taixing, Jiangsu, China
- Chunmei Ye
  - Department of Hematology, Taixing People's Hospital, No. 98, Runtai South Road, Taixing, 225400, Jiangsu, China
- Jun Li
  - Department of Hematology, Taixing People's Hospital, No. 98, Runtai South Road, Taixing, 225400, Jiangsu, China
- Guo Ji
  - Department of Hematology, Taixing People's Hospital, No. 98, Runtai South Road, Taixing, 225400, Jiangsu, China.
7
Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024:elae013. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024] Open
Abstract
Multi-omics data play a crucial role in precision medicine, mainly to understand the diverse biological interaction between different omics. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
8
Zhang W, Mou M, Hu W, Lu M, Zhang H, Zhang H, Luo Y, Xu H, Tao L, Dai H, Gao J, Zhu F. MOINER: A Novel Multiomics Early Integration Framework for Biomedical Classification and Biomarker Discovery. J Chem Inf Model 2024; 64:2720-2732. [PMID: 38373720 DOI: 10.1021/acs.jcim.4c00013] [Indexed: 02/21/2024]
Abstract
In the context of precision medicine, multiomics data integration provides a comprehensive understanding of underlying biological processes and is critical for disease diagnosis and biomarker discovery. One commonly used integration method is early integration through concatenation of multiple dimensionally reduced omics matrices, owing to its simplicity and ease of implementation. However, this approach is seriously limited by information loss and lack of latent feature interaction. Herein, a novel multiomics early integration framework (MOINER) based on information enhancement and image representation learning is presented to address these challenges. MOINER employs the self-attention mechanism to capture the intrinsic correlations of omics features, which makes it significantly outperform the existing state-of-the-art methods for multiomics data integration. Moreover, visualizing the attention embedding and identifying potential biomarkers offer interpretable insights into the prediction results. All source code and models for MOINER are freely available at https://github.com/idrblab/MOINER.
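The baseline that this abstract contrasts against, early integration by concatenating dimensionally reduced omics matrices, can be sketched roughly as follows. This is not MOINER and not the authors' code: the function names and toy matrices are hypothetical, and top-variance feature selection is assumed here as a simple stand-in for whatever reduction step a real pipeline would use.

```python
from statistics import pvariance


def top_variance_features(matrix, k):
    """Keep the k highest-variance columns of a samples-by-features matrix.

    Assumption: variance filtering stands in for the dimension-reduction step.
    """
    n_cols = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])  # restore original column order
    return [[row[j] for j in keep] for row in matrix]


def early_integration(blocks, k):
    """Reduce each omics block, then concatenate features sample by sample."""
    reduced = [top_variance_features(block, k) for block in blocks]
    n_samples = len(reduced[0])
    if any(len(block) != n_samples for block in reduced):
        raise ValueError("all omics blocks must describe the same samples")
    return [sum((block[i] for block in reduced), []) for i in range(n_samples)]


# Hypothetical toy data: 3 samples, two omics layers.
expression = [[1.0, 5.0, 0.0], [1.0, 7.0, 1.0], [1.0, 9.0, 2.0]]
methylation = [[0.1, 0.9], [0.5, 0.9], [0.9, 0.9]]
fused = early_integration([expression, methylation], k=1)
print(fused)
```

Because each block is reduced independently before concatenation, interactions between features from different layers are not modeled at this stage, which is the limitation the abstract attributes to this kind of integration.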
Affiliation(s)
- Wei Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Wei Hu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hongning Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hongquan Xu
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Haibin Dai
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
9
Lac L, Leung CK, Hu P. Computational frameworks integrating deep learning and statistical models in mining multimodal omics data. J Biomed Inform 2024; 152:104629. [PMID: 38552994 DOI: 10.1016/j.jbi.2024.104629] [Received: 01/03/2024] [Revised: 02/26/2024] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on the strong assumptions of distribution. Statistical methods such as testing and differential expression are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods have been developed for omics studies that combine statistical models and deep learning algorithms. METHODS AND RESULTS The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review report discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.
Affiliation(s)
- Leann Lac
  - Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada
- Carson K Leung
  - Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada
- Pingzhao Hu
  - Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Biochemistry, Western University, London, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada; Department of Oncology, Western University, London, Ontario, Canada; Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada; The Children's Health Research Institute, Lawson Health Research Institute, London, Ontario, Canada.
10
Sun C, Cheng X, Xu J, Chen H, Tao J, Dong Y, Wei S, Chen R, Meng X, Ma Y, Tian H, Guo X, Bi S, Zhang C, Kang J, Zhang M, Lv H, Shang Z, Lv W, Zhang R, Jiang Y. A review of disease risk prediction methods and applications in the omics era. Proteomics 2024:e2300359. [PMID: 38522029 DOI: 10.1002/pmic.202300359] [Received: 09/15/2023] [Revised: 03/08/2024] [Accepted: 03/12/2024] [Indexed: 03/25/2024]
Abstract
Risk prediction and disease prevention are the innovative care challenges of the 21st century. Apart from freeing the individual from the pain of disease, it will lead to low medical costs for society. Until very recently, risk assessments have ushered in a new era with the emergence of omics technologies, including genomics, transcriptomics, epigenomics, proteomics, and so on, which potentially advance the ability of biomarkers to aid prediction models. While risk prediction has achieved great success, there are still some challenges and limitations. We reviewed the general process of omics-based disease risk model construction and the applications in four typical diseases. Meanwhile, we highlighted the problems in current studies and explored the potential opportunities and challenges for future clinical practice.
Affiliation(s)
- Chen Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Xiangshu Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Jing Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Haiyan Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Junxian Tao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Yu Dong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Siyu Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Rui Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xin Meng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yingnan Ma
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
- Hongsheng Tian
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xuying Guo
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shuo Bi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jingxuan Kang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Mingming Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongchao Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zhenwei Shang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wenhua Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Ruijie Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - The EWAS Project, Harbin, China
11
Chang TH, Chen YD, Lu HHS, Wu JL, Mak K, Yu CS. Specific patterns and potential risk factors to predict 3-year risk of death among non-cancer patients with advanced chronic kidney disease by machine learning. Medicine (Baltimore) 2024; 103:e37112. [PMID: 38363886 PMCID: PMC10869094 DOI: 10.1097/md.0000000000037112] [Received: 10/17/2023] [Accepted: 01/09/2024] [Indexed: 02/18/2024] Open
Abstract
Chronic kidney disease (CKD) is a major public health concern. However, there are few machine learning studies on non-cancer patients with advanced CKD, and the results of machine learning studies on cancer patients with CKD may not apply directly to non-cancer patients. We aimed to conduct a comprehensive investigation of risk factors for the 3-year risk of death among non-cancer advanced CKD patients with an estimated glomerular filtration rate < 60.0 mL/min/1.73 m² using several machine learning algorithms. In this retrospective cohort study, we collected data from in-hospital and emergency care patients from 2 hospitals in Taiwan from 2009 to 2019, including their International Classification of Diseases codes at admission and laboratory data from the hospitals' electronic medical records (EMRs). Several machine learning algorithms were used to analyze the potential impact and degree of influence of each factor on mortality and survival. Data from 2 hospitals in northern Taiwan were collected, with 6565 enrolled patients. After data cleaning, 26 risk factors and approximately 3887 advanced CKD patients from Shuang Ho Hospital were used as the training set. The validation set contained 2299 patients from Taipei Medical University Hospital. Albumin, PT-INR, and age were the top 3 significant risk factors, with the greatest influence on mortality prediction. In the receiver operating characteristic analysis, the random forest had the highest accuracy values, above 0.80. MLP and AdaBoost performed better on sensitivity and F1-score than the other methods. Additionally, SVM with a linear kernel function had the highest specificity (0.9983), but its sensitivity and F1-score were poor. Logistic regression had the best performance, with an area under the curve of 0.8527. Evaluating Taiwanese advanced CKD patients' EMRs with machine learning algorithms could provide physicians with a good approximation of the patients' 3-year risk of death.
Affiliation(s)
- Tzu-Hao Chang
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Yu-Da Chen
- Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
- School of Health Care Administration, College of Management, Taipei Medical University, Taipei, Taiwan
- Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Henry Horng-Shing Lu
- Institute of Statistics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Jenny L. Wu
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Cheng-Sheng Yu
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan
- Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei, Taiwan
- Fintech RD Center, Nan Shan Life Insurance Co., Ltd
|
12
|
Chen Z, Liu Y, Lin Z, Huang W. Understand how machine learning impact lung cancer research from 2010 to 2021: A bibliometric analysis. Open Med (Wars) 2024; 19:20230874. [PMID: 38463530 PMCID: PMC10921441 DOI: 10.1515/med-2023-0874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2023] [Revised: 11/18/2023] [Accepted: 11/20/2023] [Indexed: 03/12/2024] Open
Abstract
Advances in lung cancer research applying machine learning (ML) technology have generated a large body of relevant literature. However, there is no bibliometric analysis that supports a comprehensive understanding of this field and its progress. This article presents, for the first time, a bibliometric analysis to clarify the research status and focus from 2010 to 2021. A total of 2,312 relevant publications were retrieved from the Web of Science Core Collection database, on which we conducted a bibliometric analysis and further visualization. During that time, exponentially growing annual publication output and our model indicated a flourishing research prospect. Annual citations peaked in 2017. Researchers from the United States and China produced most of the relevant literature and had the strongest partnership. Medical Image Analysis and Nature appeared to attract the most attention. Computer-aided diagnosis, precision medicine, and survival prediction were the focus of research, reflecting the development trend of that period. ML made a substantial difference in lung cancer research over the past decade.
Affiliation(s)
- Zijian Chen
- Department of Cardiothoracic Surgery, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
- Yangqi Liu
- Department of Cardiothoracic Surgery, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
- Zeying Lin
- Department of Cardiothoracic Surgery, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
- Weizhe Huang
- Department of Cardiothoracic Surgery, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
|
13
|
Feng X, Shu W, Li M, Li J, Xu J, He M. Pathogenomics for accurate diagnosis, treatment, prognosis of oncology: a cutting edge overview. J Transl Med 2024; 22:131. [PMID: 38310237 PMCID: PMC10837897 DOI: 10.1186/s12967-024-04915-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 01/20/2024] [Indexed: 02/05/2024] Open
Abstract
The capability to gather heterogeneous data, alongside the increasing power of artificial intelligence to examine it, is leading a revolution in harnessing multimodal data in the life sciences. However, most approaches are limited to unimodal data, leaving integrated approaches across modalities relatively underdeveloped in computational pathology. Pathogenomics, as an invasive method to integrate advanced molecular diagnostics from genomic data, morphological information from histopathological imaging, and codified clinical data, enables the discovery of new multimodal cancer biomarkers to propel the field of precision oncology in the coming decade. In this perspective, we offer our opinions on synthesizing complementary modalities of data with emerging multimodal artificial intelligence methods in pathogenomics. These include the correlation between the pathological and genomic profiles of cancer and the fusion of the histology and genomics profiles of cancer. We also present challenges, opportunities, and avenues for future work.
Affiliation(s)
- Xiaobing Feng
- College of Electrical and Information Engineering, Hunan University, Changsha, China
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, Zhejiang, China
- Wen Shu
- College of Electrical and Information Engineering, Hunan University, Changsha, China
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, Zhejiang, China
- Mingya Li
- College of Electrical and Information Engineering, Hunan University, Changsha, China
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, Zhejiang, China
- Junyu Li
- College of Electrical and Information Engineering, Hunan University, Changsha, China
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, Zhejiang, China
- Junyao Xu
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, Zhejiang, China
- Min He
- College of Electrical and Information Engineering, Hunan University, Changsha, China.
- Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310022, Zhejiang, China.
|
14
|
Tong L, Shi W, Isgut M, Zhong Y, Lais P, Gloster L, Sun J, Swain A, Giuste F, Wang MD. Integrating Multi-Omics Data With EHR for Precision Medicine Using Advanced Artificial Intelligence. IEEE Rev Biomed Eng 2024; 17:80-97. [PMID: 37824325 DOI: 10.1109/rbme.2023.3324264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2023]
Abstract
With the recent advancement of novel biomedical technologies such as high-throughput sequencing and wearable devices, multi-modal biomedical data ranging from multi-omics molecular data to real-time continuous bio-signals are generated at an unprecedented speed and scale every day. For the first time, these multi-modal biomedical data can bring precision medicine close to reality. However, due to the volume and complexity of the data, making good use of these multi-modal biomedical data requires major effort. Researchers and clinicians are actively developing artificial intelligence (AI) approaches for data-driven knowledge discovery and causal inference using a variety of biomedical data modalities. These AI-based approaches have demonstrated promising results in various biomedical and healthcare applications. In this review paper, we summarize the state-of-the-art AI models for integrating multi-omics data and electronic health records (EHRs) for precision medicine. We discuss the challenges and opportunities in integrating multi-omics data with EHRs and future directions. We hope this review can inspire future research and development in integrating multi-omics data with EHRs for precision medicine.
|
15
|
Vanmathi P, Jose D. An ensemble-based serial cascaded attention network and improved variational auto encoder for breast cancer prognosis prediction using data. Comput Methods Biomech Biomed Engin 2024; 27:98-115. [PMID: 38006210 DOI: 10.1080/10255842.2023.2280883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023]
Abstract
Breast cancer is one of the most common types of cancer in women and causes a large number of deaths worldwide. Early recognition lessens its impact: it could convince patients to receive surgical therapy, which significantly improves the chance of recovery, whereas late recognition can lead to death. Machine learning techniques use patient information to find links within the data and to assess predictions for new cases. An accurate predictive framework for breast cancer is urgently needed. To accomplish this objective, an adaptive ensemble model is proposed for breast cancer prognosis prediction using data. At the initial stage, the raw data are fetched from benchmark datasets. This is followed by data cleaning and preprocessing. Subsequently, the pre-processed data are fed into the Improved Variational Autoencoder (IVAE), where the deep features are extracted. Finally, the resultant features are given as input to the Ensemble-based Serial Cascaded Attention Network (ESCANet), which is built with a Deep Temporal Convolution Network (DTCN), Bi-directional Long Short-Term Memory (BiLSTM), and a Recurrent Neural Network (RNN). The effectiveness of the model is validated and compared with conventional methodologies. The results indicate that the proposed methodology achieves extensive results and thus increases the system's efficiency.
Affiliation(s)
- P Vanmathi
- Full time Research Scholar, Department of ECE, KCG College of Technology, Karapakkam, Chennai, Tamil Nadu, India
- Deepa Jose
- Professor, Department of ECE, KCG College of Technology, Karapakkam, Chennai, Tamil Nadu, India
|
16
|
Yadav S, Zhou S, He B, Du Y, Garmire LX. Deep learning and transfer learning identify breast cancer survival subtypes from single-cell imaging data. Commun Med (Lond) 2023; 3:187. [PMID: 38114659 PMCID: PMC10730890 DOI: 10.1038/s43856-023-00414-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 11/23/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND Single-cell multiplex imaging data have provided new insights into disease subtypes and prognoses recently. However, quantitative models that explicitly capture single-cell resolution cell-cell interaction features to predict patient survival at a population scale are currently missing. METHODS We quantified hundreds of single-cell resolution cell-cell interaction features through neighborhood calculation, in addition to cellular phenotypes. We applied these features to a neural-network-based Cox-nnet survival model to identify survival-associated features. We used non-negative matrix factorization (NMF) to identify patient survival subtypes. We identified atypical subpopulations of triple-negative breast cancer (TNBC) patients with moderate prognosis and Luminal A patients with poor prognosis and validated these subpopulations by label transferring using the UNION-COM method. RESULTS The neural-network-based Cox-nnet survival model using all cellular phenotype and cell-cell interaction features is highly predictive of patient survival in the test data (Concordance Index > 0.8). We identify seven survival subtypes using the top survival features, presenting distinct profiles of epithelial, immune, and fibroblast cells and their interactions. We reveal atypical subpopulations of TNBC patients with moderate prognosis (marked by GATA3 over-expression) and Luminal A patients with poor prognosis (marked by KRT6 and ACTA2 over-expression and CDH1 under-expression). These atypical subpopulations are validated in TCGA-BRCA and METABRIC datasets. CONCLUSIONS This work provides an approach to bridge single-cell level information toward population-level survival prediction.
Affiliation(s)
- Shashank Yadav
- Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, MI, 48105, USA
- Shu Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, MI, 48105, USA
- Bing He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, MI, 48105, USA
- Yuheng Du
- Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, MI, 48105, USA
- Lana X Garmire
- Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, MI, 48105, USA.
|
17
|
Feng K, Ren F, Xing Z, Zhao Y, Yang C, Liu J, Shang Q, Wang X, Wang X. Microbiome and its implications in oncogenesis: a Mendelian randomization perspective. Am J Cancer Res 2023; 13:5785-5804. [PMID: 38187050 PMCID: PMC10767327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 12/02/2023] [Indexed: 01/09/2024] Open
Abstract
The human microbiome, an intricate ecological network, has garnered significant attention due to its potential implications in oncogenesis. This paper delves into the multifaceted relationships between the microbiome, its metabolites, and cancer development, emphasizing the human intestinal tract as the primary microbial habitat. Highlighting the potential causative associations between microbial disturbances and cancer progression, we underscore the role of specific bacterial strains in various cancers, such as stomach and colorectal cancer. Traditional causality assessment methods, like randomized controlled trials (RCTs), have limitations. Therefore, we advocate using Mendelian Randomization (MR) as a powerful alternative to study causal relationships, leveraging genetic variants as instrumental variables. With the proliferation of genome-wide association studies, MR harnesses genetic variations to infer causality, which is especially beneficial when addressing confounders like diet and lifestyle that can skew microbial research. We systematically review MR's application in understanding the microbiome-cancer nexus, emphasizing its strengths and challenges. While MR offers a unique perspective on causality, it faces hurdles like horizontal pleiotropy and weak instrumental variable bias. Integrating MR with multi-omics data, encompassing genomics, transcriptomics, proteomics, and metabolomics, holds promise for future research, potentially heralding groundbreaking discoveries in microbiology and genetics. This comprehensive review underscores the critical role of the human microbiome in oncogenesis and champions MR as an indispensable tool for advancing our understanding in this domain.
Affiliation(s)
- Kexin Feng
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Fei Ren
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Zeyu Xing
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Yifan Zhao
- School of Engineering, RMIT University, Bundoora, VIC 3083, Australia
- Chenxuan Yang
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Jiaxiang Liu
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Qingyao Shang
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Xin Wang
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Xiang Wang
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
|
18
|
Su D, Xiong Y, Wang S, Wei H, Ke J, Li H, Wang T, Zuo Y, Yang L. Structural deep clustering network for stratification of breast cancer patients through integration of somatic mutation profiles. Comput Methods Programs Biomed 2023; 242:107808. [PMID: 37716222 DOI: 10.1016/j.cmpb.2023.107808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/15/2023] [Accepted: 09/10/2023] [Indexed: 09/18/2023]
Abstract
BACKGROUND AND OBJECTIVE Breast cancer is among the most malignant tumors occurring in women and is one of the leading causes of death from gynecologic malignancy worldwide. The high degree of heterogeneity that characterizes breast cancer makes it challenging to devise effective therapeutic strategies. Accumulating evidence highlights the crucial role of stratifying breast cancer patients into clinically significant subtypes to achieve better prognoses and treatments. The structural deep clustering network is a graph convolutional network-based clustering algorithm that integrates structural information and has achieved state-of-the-art performance in various applications. METHODS In this study, we employed a structural deep clustering network to integrate somatic mutation profiles for stratifying 2526 breast cancer patients from the Memorial Sloan Kettering Cancer Center into two clinically differentiable subtypes. RESULTS Breast cancer patients in cluster 1 exhibited a better prognosis than those in cluster 2, and the difference between them was statistically significant. The immunogenomic landscape further demonstrated that cluster 1 was associated with remarkable infiltration of tumor-infiltrating lymphocytes. The clustering subtype could be used to evaluate the therapeutic benefit of immunotherapy and chemotherapy in breast cancer patients. Furthermore, our approach effectively classified patients from eight different cancer types, demonstrating its generalizability. CONCLUSIONS Our study represents a step towards a generic methodology for classifying cancer patients using only somatic mutation data and structural deep clustering network approaches. Employing a structural deep clustering network to identify breast cancer subtypes is promising and can inform the development of more accurate and personalized therapies.
Affiliation(s)
- Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yuqiang Xiong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Haodong Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Jiawei Ke
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Honghao Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Tao Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd. Hohhot, 010010, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
|
19
|
Lin YT, Zhou Q, Tan J, Tao Y. Multimodal and multi-omics-based deep learning model for screening of optic neuropathy. Heliyon 2023; 9:e22244. [PMID: 38046141 PMCID: PMC10686864 DOI: 10.1016/j.heliyon.2023.e22244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 12/05/2023] Open
Abstract
Purpose: To examine the use of multimodal data and multi-omics strategies for optic nerve disease screening. Methods: This was a single-center retrospective study. A deep learning model was created from fundus photography and infrared reflectance (IR) images of patients with diabetic optic neuropathy, glaucomatous optic neuropathy, and optic neuritis. Patients who were seen at the Ophthalmology Department of the First Affiliated Hospital of Nanchang University in Jiangxi Province from November 2019 to April 2023 were included in this study. The data were analyzed in single and multimodal modes using the traditional omics, Resnet101, and fusion models. The accuracy and area-under-the-curve (AUC) of each model were compared. Results: A total of 312 fundus and infrared fundus images were collected from 156 patients. When multimodal data were used, the accuracies of the traditional omics, Resnet101, and fusion models on the training set were 0.97, 0.98, and 0.99, respectively. The accuracies of the same models on the test sets were 0.72, 0.87, and 0.88, respectively. We compared single- and multi-mode states by applying the data to the different groups in the learning model. In the traditional omics model, the macro-average AUCs of the features extracted from fundus photography, IR images, and multimodal data were 0.94, 0.90, and 0.96, respectively. When the same data were processed in the Resnet101 model, the scores were all 0.97. However, when multimodal data were utilized, the macro-average AUCs in the traditional omics, Resnet101, and fusion models were 0.96, 0.97, and 0.99, respectively. Conclusion: The deep learning model based on multimodal data and multi-omics strategies can improve the accuracy of screening and diagnosing diabetic optic neuropathy, glaucomatous optic neuropathy, and optic neuritis.
Affiliation(s)
- Ye-ting Lin
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, China
- Qiong Zhou
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, China
- Jian Tan
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, China
- Yulin Tao
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, China
|
20
|
Peng J, Xiao L, Zhu H, Han L, Ma H. Determining the prognosis of Lung cancer from mutated genes using a deep learning survival model: a large multi-center study. Cancer Cell Int 2023; 23:262. [PMID: 37925409 PMCID: PMC10625246 DOI: 10.1186/s12935-023-03118-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 10/30/2023] [Indexed: 11/06/2023] Open
Abstract
BACKGROUND Gene status has become the focus of prognosis prediction. Furthermore, deep learning has frequently been implemented in medical imaging to diagnose, prognosticate, and evaluate treatment responses in patients with cancer. However, few deep learning survival (DLS) models based on mutational genes that are directly associated with patient prognosis in terms of progression-free survival (PFS) or overall survival (OS) have been reported. Additionally, DLS models have not been applied to determine IO-related prognosis based on mutational genes. Herein, we developed a deep learning method to predict the prognosis of patients with lung cancer treated with or without immunotherapy (IO). METHODS Samples from 6542 patients from different centers were subjected to genome sequencing. A DLS model based on multi-panels of somatic mutations was trained and validated to predict OS in patients treated without IO and PFS in patients treated with IO. RESULTS In patients treated without IO, the DLS model (low vs. high DLS) was trained using the training MSK-MET cohort (HR = 0.241 [0.213-0.273], P < 0.001) and tested in the inter-validation MSK-MET cohort (HR = 0.175 [0.148-0.206], P < 0.001). The DLS model was then validated with the OncoSG, MSK-CSC, and TCGA-LUAD cohorts (HR = 0.420 [0.272-0.649], P < 0.001; HR = 0.550 [0.424-0.714], P < 0.001; HR = 0.215 [0.159-0.291], P < 0.001, respectively). Subsequently, it was fine-tuned and retrained in patients treated with IO. The DLS model (low vs. high DLS) could predict PFS and OS in the MIND, MSKCC, and POPLAR/OAK cohorts (P < 0.001, respectively). Compared with tumor-node-metastasis staging, the COX model, tumor mutational burden, and programmed death-ligand 1 expression, the DLS model had the highest C-index in patients treated with or without IO. CONCLUSIONS The DLS model based on mutational genes can robustly predict the prognosis of patients with lung cancer treated with or without IO.
Affiliation(s)
- Jie Peng
- Department of Medical Oncology, The Second Affiliated Hospital, Guizhou Medical University, Kaili, China.
- Lushan Xiao
- Hepatology Unit, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Hongbo Zhu
- Department of Medical Oncology, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, China
- Lijie Han
- Department of Hematology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Honglian Ma
- Department of Radiation Oncology, Cancer Hospital of the University of Chinese Academy of Sciences, Hangzhou, China
|
21
|
Wei L, Niraula D, Gates EDH, Fu J, Luo Y, Nyflot MJ, Bowen SR, El Naqa IM, Cui S. Artificial intelligence (AI) and machine learning (ML) in precision oncology: a review on enhancing discoverability through multiomics integration. Br J Radiol 2023; 96:20230211. [PMID: 37660402 PMCID: PMC10546458 DOI: 10.1259/bjr.20230211] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/28/2023] [Revised: 06/15/2023] [Accepted: 06/27/2023] [Indexed: 09/05/2023] Open
Abstract
Multiomics data, including imaging radiomics and various types of molecular biomarkers, have been increasingly investigated for better diagnosis and therapy in the era of precision oncology. Artificial intelligence (AI), including machine learning (ML) and deep learning (DL) techniques, combined with the exponential growth of multiomics data, may have great potential to revolutionize cancer subtyping, risk stratification, prognostication, prediction and clinical decision-making. In this article, we first present different categories of multiomics data and their roles in diagnosis and therapy. Second, AI-based data fusion methods and modeling methods as well as different validation schemes are illustrated. Third, the applications and examples of multiomics research in oncology are demonstrated. Finally, the challenges regarding the heterogeneity of data sets, the availability of omics data, and the validation of the research are discussed. The transition of multiomics research to real clinics still requires consistent efforts in standardizing omics data collection and analysis, building computational infrastructure for data sharing and storing, developing advanced methods to improve data fusion and interpretability, and ultimately, conducting large-scale prospective clinical trials to fill the gap between study findings and clinical benefits.
Affiliation(s)
- Lise Wei
- Department of Radiation Oncology, University of Michigan, Michigan, United States
- Dipesh Niraula
- Department of Radiation Oncology, Moffitt Cancer Center, Tampa, United States
- Evan D. H. Gates
- Department of Radiation Oncology, University of Washington, Washington, United States
- Jie Fu
- Department of Radiation Oncology, Stanford University, Stanford, California, United States
- Yi Luo
- Department of Radiation Oncology, Moffitt Cancer Center, Tampa, United States
- Matthew J. Nyflot
- Department of Radiation Oncology, University of Washington, Washington, United States
- Stephen R. Bowen
- Department of Radiation Oncology, University of Washington, Washington, United States
- Issam M El Naqa
- Department of Radiation Oncology, Moffitt Cancer Center, Tampa, United States
- Sunan Cui
- Department of Radiation Oncology, University of Washington, Washington, United States
|
22
|
Nguyen QTN, Nguyen P, Wang C, Phuc PT, Lin R, Hung C, Kuo N, Cheng Y, Lin S, Hsieh Z, Cheng C, Hsu M, Hsu JC. Machine learning approaches for predicting 5-year breast cancer survival: A multicenter study. Cancer Sci 2023; 114:4063-4072. [PMID: 37489252 PMCID: PMC10551582 DOI: 10.1111/cas.15917] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/15/2023] [Revised: 06/27/2023] [Accepted: 07/05/2023] [Indexed: 07/26/2023] Open
Abstract
The study used clinical data to develop a prediction model for breast cancer survival. Breast cancer prognostic factors were explored using machine learning techniques. We conducted a retrospective study using data from the Taipei Medical University Clinical Research Database, which contains electronic medical records from three affiliated hospitals in Taiwan. The study included female patients aged over 20 years who were diagnosed with primary breast cancer and had medical records in hospitals between January 1, 2009 and December 31, 2020. The data were divided into training and external testing datasets. Nine different machine learning algorithms were applied to develop the models. The performances of the algorithms were measured using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. A total of 3914 patients were included in the study. The highest AUC of 0.95 was observed with the artificial neural network model (accuracy, 0.90; sensitivity, 0.71; specificity, 0.73; PPV, 0.28; NPV, 0.94; and F1-score, 0.37). Other models showed relatively high AUC, ranging from 0.75 to 0.83. According to the optimal model results, cancer stage, tumor size, diagnosis age, surgery, and body mass index were the most critical factors for predicting breast cancer survival. The study successfully established accurate 5-year survival predictive models for breast cancer. Furthermore, the study found key factors that could affect breast cancer survival in Taiwanese women. Its results might be used as a reference for the clinical practice of breast cancer treatment.
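As a reading aid for the metrics listed in this abstract, here is a minimal Python sketch of how accuracy, sensitivity, specificity, PPV, NPV, and F1-score follow from confusion-matrix counts. The function name `binary_metrics` and the example counts are illustrative assumptions, not taken from the study, and the sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are hypothetical counts of true positives, false positives,
    true negatives, and false negatives; all denominators are assumed non-zero.
    """
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Illustrative counts only (not data from the study).
metrics = binary_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced outcomes, a low PPV can coexist with high accuracy and NPV, which is why such studies typically report all of these metrics alongside the AUC.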
Affiliation(s)
- Quynh Thi Nhu Nguyen
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
- Phung‐Anh Nguyen
- Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan
- Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei City, Taiwan
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
- Chun‐Jung Wang
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
- Phan Thanh Phuc
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
- Ruo‐Kai Lin
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
- Chin‐Sheng Hung
- Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, Taipei City, Taiwan
- Nei‐Hui Kuo
- Oncology Center, Taipei Medical University Hospital, Taipei City, Taiwan
- Yu‐Wen Cheng
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
- Shwu‐Jiuan Lin
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
- Zong‐You Hsieh
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
- Chi‐Tsun Cheng
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
- Min‐Huei Hsu
- Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
- Jason C. Hsu
- Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan
- Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei City, Taiwan
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
- International Ph.D. Program in Biotech and Healthcare Management, College of Management, Taipei Medical University, Taipei City, Taiwan
|
23
|
Yadav S, Zhou S, He B, Du Y, Garmire LX. Deep-learning and transfer learning identify new breast cancer survival subtypes from single-cell imaging data. medRxiv 2023:2023.09.14.23295578. [PMID: 37745392 PMCID: PMC10516066 DOI: 10.1101/2023.09.14.23295578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Quantitative models that explicitly capture single-cell resolution cell-cell interaction features to predict patient survival at population scale are currently missing. Here, we computationally extracted hundreds of features describing single-cell based cell-cell interactions and cellular phenotypes from a large, published cohort of cyto-images of breast cancer patients. We applied these features to a neural-network based Cox-nnet survival model and obtained high accuracy in predicting patient survival in test data (Concordance Index > 0.8). We identified seven survival subtypes using the top survival features, which present distinct profiles of epithelial, immune, fibroblast cells, and their interactions. We identified atypical subpopulations of TNBC patients with moderate prognosis (marked by GATA3 over-expression) and Luminal A patients with poor prognosis (marked by KRT6 and ACTA2 over-expression and CDH1 under-expression). These atypical subpopulations are validated in TCGA-BRCA and METABRIC datasets. This work provides important guidelines on bridging single-cell level information towards population-level survival prediction. STATEMENT OF TRANSLATIONAL RELEVANCE Our findings from a breast cancer population cohort demonstrate the clinical utility of using the single-cell level imaging mass cytometry (IMC) data as a new type of patient prognosis prediction marker. Not only did the prognosis prediction achieve high accuracy with a Concordance index score greater than 0.8, it also enabled the discovery of seven survival subtypes that are more distinguishable than the molecular subtypes. These new subtypes present distinct profiles of epithelial, immune, fibroblast cells, and their interactions. Most importantly, this study identified and validated atypical subpopulations of TNBC patients with moderate prognosis (GATA3 over-expression) and Luminal A patients with poor prognosis (KRT6 and ACTA2 over-expression and CDH1 under-expression), using multiple large breast cancer cohorts.
|
24
|
Zhang Y, Zhang N, Chai X, Sun T. Machine learning for image-based multi-omics analysis of leaf veins. J Exp Bot 2023; 74:4928-4941. [PMID: 37410807 DOI: 10.1093/jxb/erad251] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/01/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Veins are a critical component of the plant growth and development system, playing an integral role in supporting and protecting leaves, as well as transporting water, nutrients, and photosynthetic products. A comprehensive understanding of the form and function of veins requires a dual approach that combines plant physiology with cutting-edge image recognition technology. The latest advancements in computer vision and machine learning have facilitated the creation of algorithms that can identify vein networks and explore their developmental progression. Here, we review the functional, environmental, and genetic factors associated with vein networks, along with the current status of research on image analysis. In addition, we discuss the methods of venous phenotype extraction and multi-omics association analysis using machine learning technology, which could provide a theoretical basis for improving crop productivity by optimizing the vein network architecture.
Affiliation(s)
- Yubin Zhang
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Ning Zhang
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Xiujuan Chai
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Tan Sun
- Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing, China
- Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
|
25
|
Martinez-Garcia M, Olmos PM. Handling Ill-Conditioned Omics Data With Deep Probabilistic Models. IEEE J Biomed Health Inform 2023; 27:4601-4610. [PMID: 37224378 DOI: 10.1109/jbhi.2023.3279493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The advent of high-throughput technologies has produced an increase in the dimensionality of omics datasets, which limits the application of machine learning methods due to the great unbalance between the number of observations and features. In this scenario, dimensionality reduction is essential to extract the relevant information within these datasets and project it in a low-dimensional space, and probabilistic latent space models are becoming popular given their capability to capture the underlying structure of the data as well as the uncertainty in the information. This article aims to provide a general classification and dimensionality reduction method based on deep latent space models that tackles two of the main problems that arise in omics datasets: the presence of missing data and the limited number of observations against the number of features. We propose a semi-supervised Bayesian latent space model that infers a low-dimensional embedding driven by the target label: the Deep Bayesian Logistic Regression (DBLR) model. During inference, the model also learns a global vector of weights that allows it to make predictions given the low-dimensional embedding of the observations. Since this kind of dataset is prone to overfitting, we introduce an additional probabilistic regularization method based on the semi-supervised nature of the model. We compared the performance of the DBLR against several state-of-the-art methods for dimensionality reduction, both in synthetic and real datasets with different data types. The proposed model provides more informative low-dimensional representations, outperforms the baseline methods in classification, and can naturally handle missing entries.
|
26
|
Zhang K, Ye B, Wu L, Ni S, Li Y, Wang Q, Zhang P, Wang D. Machine learning‑based prediction of survival prognosis in esophageal squamous cell carcinoma. Sci Rep 2023; 13:13532. [PMID: 37598277 PMCID: PMC10439907 DOI: 10.1038/s41598-023-40780-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/19/2022] [Accepted: 08/16/2023] [Indexed: 08/21/2023] Open
Abstract
The current prognostic tools for esophageal squamous cell carcinoma (ESCC) lack the necessary accuracy to facilitate individualized patient management strategies. To address this issue, this study was conducted to develop a machine learning (ML) prediction model for ESCC patients' survival management. Six ML approaches, including Rpart, Elastic Net, GBM, Random Forest, GLMboost, and the machine learning-extended CoxPH method, were employed to develop risk prediction models. The model was trained on a dataset of 1954 ESCC patients with 27 clinical features and validated on a dataset of 487 ESCC patients. The discriminative performance of the models was assessed using the concordance index (C-index). The best performing model was used for risk stratification and clinical evaluation. The study found that N stage, T stage, surgical margin, tumor grade, tumor length, sex, MPV, AST, FIB, and Mg are important features for ESCC patients' survival. The machine learning-extended CoxPH model, Elastic Net, and Random Forest had similar performance in predicting the mortality risk of ESCC patients, and outperformed GBM, GLMboost, and Rpart. The risk scores derived from the CoxPH model effectively stratified ESCC patients into low-, intermediate-, and high-risk groups with distinctly different 3-year overall survival (OS) probabilities of 80.8%, 58.2%, and 29.5%, respectively. This risk stratification was also observed in the validation cohort. Furthermore, the risk model demonstrated greater discriminative ability and net benefit than the AJCC8th stage, suggesting its potential as a prognostic tool for predicting survival events and guiding clinical decision-making. The classical algorithm of the CoxPH method was also found to be sufficiently good for interpretive studies.
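To make the concordance index mentioned in this abstract concrete, the following is a minimal Python sketch of Harrell's C-index for right-censored data. It is a simplified illustration, not the study's implementation: the function name and toy data are assumptions, a pair counts as comparable only when the subject with the strictly shorter time had an observed event, and tied risk scores count as half-concordant.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index (simplified sketch).

    times       : observed follow-up times
    events      : 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores : model output where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                # Subject i failed before subject j was last seen: pair is comparable.
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk matched the earlier event
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data (illustrative only): four patients, the third is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the models above are compared.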
Affiliation(s)
- Kaijiong Zhang
- Department of Clinical Laboratory, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
- Bo Ye
- Department of Clinical Laboratory, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
- Lichun Wu
- Department of Clinical Laboratory, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
- Sujiao Ni
- Department of Clinical Laboratory, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
- Yang Li
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Qifeng Wang
- Department of Radiation Oncology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China.
- Peng Zhang
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
- Dongsheng Wang
- Department of Clinical Laboratory, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China.
|
27
|
Chen C, Wang J, Pan D, Wang X, Xu Y, Yan J, Wang L, Yang X, Yang M, Liu G. Applications of multi-omics analysis in human diseases. MedComm (Beijing) 2023; 4:e315. [PMID: 37533767 PMCID: PMC10390758 DOI: 10.1002/mco2.315] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/22/2023] [Revised: 05/25/2023] [Accepted: 05/31/2023] [Indexed: 08/04/2023] Open
Abstract
Multi-omics usually refers to the crossover application of multiple high-throughput screening technologies represented by genomics, transcriptomics, single-cell transcriptomics, proteomics and metabolomics, spatial transcriptomics, and so on, which play a great role in promoting the study of human diseases. Most of the current reviews focus on describing the development of multi-omics technologies, data integration, and application to a particular disease; however, few of them provide a comprehensive and systematic introduction of multi-omics. This review outlines the existing technical categories of multi-omics and cautions for experimental design. It focuses on integrated analysis methods for multi-omics, especially machine learning and deep learning approaches to multi-omics data integration and the corresponding tools, and on the application of multi-omics in medical research (e.g., cancer, neurodegenerative diseases, aging, and drug target discovery) together with the corresponding open-source analysis tools and databases. Finally, it discusses the challenges and future directions of multi-omics integration and application in precision medicine. Single-cell multi-omics and spatial multi-omics, which are important directions for future disease research enabled by the development of high-throughput technologies and data integration algorithms, are also introduced in detail. This review will provide important guidance for researchers, especially those who are just entering multi-omics medical research.
Affiliation(s)
- Chongyang Chen
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Co‐innovation Center of Neurodegeneration, Nantong University, Nantong, China
- Jing Wang
- Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Donghui Pan
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Xinyu Wang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Yuping Xu
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Junjie Yan
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Lizhen Wang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Xifei Yang
- Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Min Yang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Gong‐Ping Liu
- Co‐innovation Center of Neurodegeneration, Nantong University, Nantong, China
- Department of Pathophysiology, School of Basic Medicine, Key Laboratory of Ministry of Education of China and Hubei Province for Neurological Disorders, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
28
|
Sanjaya P, Maljanen K, Katainen R, Waszak SM, Aaltonen LA, Stegle O, Korbel JO, Pitkänen E. Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping. Genome Med 2023; 15:47. [PMID: 37420249 DOI: 10.1186/s13073-023-01204-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Accepted: 06/21/2023] [Indexed: 07/09/2023] Open
Abstract
BACKGROUND Cancer genome sequencing enables accurate classification of tumours and tumour subtypes. However, prediction performance is still limited using exome-only sequencing and for tumour types with low somatic mutation burden such as many paediatric tumours. Moreover, the ability to leverage deep representation learning in discovery of tumour entities remains unknown. METHODS We introduce here Mutation-Attention (MuAt), a deep neural network to learn representations of simple and complex somatic alterations for prediction of tumour types and subtypes. In contrast to many previous methods, MuAt utilizes the attention mechanism on individual mutations instead of aggregated mutation counts. RESULTS We trained MuAt models on 2587 whole cancer genomes (24 tumour types) from the Pan-Cancer Analysis of Whole Genomes (PCAWG) and 7352 cancer exomes (20 types) from the Cancer Genome Atlas (TCGA). MuAt achieved prediction accuracy of 89% for whole genomes and 64% for whole exomes, and a top-5 accuracy of 97% and 90%, respectively. MuAt models were found to be well-calibrated and perform well in three independent whole cancer genome cohorts with 10,361 tumours in total. We show MuAt to be able to learn clinically and biologically relevant tumour entities including acral melanoma, SHH-activated medulloblastoma, SPOP-associated prostate cancer, microsatellite instability, POLE proofreading deficiency, and MUTYH-associated pancreatic endocrine tumours without these tumour subtypes and subgroups being provided as training labels. Finally, scrutiny of MuAt attention matrices revealed both ubiquitous and tumour-type specific patterns of simple and complex somatic mutations. CONCLUSIONS Integrated representations of somatic alterations learnt by MuAt were able to accurately identify histological tumour types and identify tumour entities, with potential to impact precision cancer medicine.
Affiliation(s)
- Prima Sanjaya
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
- Katri Maljanen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
- Riku Katainen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
- Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Sebastian M Waszak
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo and Oslo University Hospital, Oslo, Norway
- Swiss Institute for Experimental Cancer Research School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Department of Neurology, University of California, San Francisco (UCSF), San Francisco, CA, USA
- Lauri A Aaltonen
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Oliver Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Jan O Korbel
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Esa Pitkänen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
- iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland.
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
|
29
|
Abstract
Data are the most important elements of bioinformatics: Computational analysis of bioinformatics data, in fact, can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can be even more helpful, because each of these different data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data plays a pivotal role in running a successful bioinformatics study. In the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been labelled -omics data, as a unique name to refer to them, and the integration of these omics data has gained importance in all biological areas. Even if this omics data integration is useful and relevant, due to its heterogeneity, it is not uncommon to make mistakes during the integration phases. We therefore decided to present these ten quick tips to perform an omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Even if we designed our ten guidelines for beginners, by using a simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all the bioinformaticians performing omics data integration, including experts.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Fabio Cumbo
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Claudio Angione
- School of Computing Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
|
30
|
Zheng ZQ, Yuan GQ, Zhang GG, Nie QQ, Wang Z. Development and validation of a predictive model in diagnosis and prognosis of primary glioblastoma patients based on Homeobox A family. Discov Oncol 2023; 14:108. [PMID: 37351805 DOI: 10.1007/s12672-023-00726-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 06/13/2023] [Indexed: 06/24/2023] Open
Abstract
BACKGROUND The Homeobox A (HOXA) family is involved in the development of malignancies as either tumor suppressors or oncogenes. However, their roles in glioblastoma (GBM) and clinical significance have not been fully elucidated. METHODS HOXA mutations and expression across cancers were investigated using GSCA and Oncomine, and those in GBM were validated using cBioPortal, Chinese Glioma Genome Atlas (CGGA), and The Cancer Genome Atlas (TCGA) datasets. Kaplan-Meier analyses were conducted to determine prognostic values of HOXAs at genetic and mRNA levels. Diagnostic roles of HOXAs in tumor classification were explored by GlioVis and R software. Independent prognostic HOXAs were identified using Cox survival analyses, the least absolute shrinkage and selection operator (LASSO) regression, quantitative real-time PCR, and immunohistochemical staining. A HOXAs-based nomogram survival prediction model was developed and evaluated using Kaplan-Meier analysis, time-dependent Area Under Curve, calibration plots, and Decision Curve Analysis in training and validation cohorts. RESULTS HOXAs were highly mutated and overexpressed across cancers, especially in CGGA and TCGA GBM datasets. Genetic alteration and mRNA expression of HOXAs were both found to be prognostic. Specific HOXAs could distinguish IDH mutation (HOXA1-7, HOXA9, HOXA13) and molecular GBM subtypes (HOXA1-2, HOXA9-11, HOXA13). HOXA1/2/3/10 were confirmed to be independent prognostic members, with high expression validated in clinical GBM tissues. The HOXAs-based nomogram model exhibited good prediction performance and net benefits for patients in training and validation cohorts. CONCLUSION The HOXA family has diagnostic value, and the HOXAs-based nomogram model is effective in survival prediction, providing a novel approach to support the treatment of GBM patients.
Affiliation(s)
- Zong-Qing Zheng
- Department of Neurosurgery & Brain and Nerve Research Laboratory, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou, 215006, Jiangsu Province, China
- Gui-Qiang Yuan
- Beijing Neurosurgical Institute & Department of Neurosurgery, Beijing Tiantan Hospital Affiliated to Capital Medical University, Capital Medical University, Beijing, China
- Guo-Guo Zhang
- Department of Neurosurgery & Brain and Nerve Research Laboratory, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou, 215006, Jiangsu Province, China
- Qian-Qian Nie
- Department of Neurosurgery & Brain and Nerve Research Laboratory, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou, 215006, Jiangsu Province, China
- Zhong Wang
- Department of Neurosurgery & Brain and Nerve Research Laboratory, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Suzhou, 215006, Jiangsu Province, China.
|
31
|
Lee M. Deep Learning Techniques with Genomic Data in Cancer Prognosis: A Comprehensive Review of the 2021-2023 Literature. Biology (Basel) 2023; 12:893. [PMID: 37508326 PMCID: PMC10376033 DOI: 10.3390/biology12070893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 06/16/2023] [Accepted: 06/20/2023] [Indexed: 07/30/2023]
Abstract
Deep learning has brought about a significant transformation in machine learning, leading to an array of novel methodologies and consequently broadening its influence. The application of deep learning in various sectors, especially biomedical data analysis, has initiated a period filled with noteworthy scientific developments. This trend has majorly influenced cancer prognosis, where the interpretation of genomic data for survival analysis has become a central research focus. The capacity of deep learning to decode intricate patterns embedded within high-dimensional genomic data has provoked a paradigm shift in our understanding of cancer survival. Given the swift progression in this field, there is an urgent need for a comprehensive review that focuses on the most influential studies from 2021 to 2023. This review, through its careful selection and thorough exploration of dominant trends and methodologies, strives to fulfill this need. The paper aims to enhance our existing understanding of applications of deep learning in cancer survival analysis, while also highlighting promising directions for future research in this vibrant and rapidly proliferating field.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
32
Ye W, Chen X, Li P, Tao Y, Wang Z, Gao C, Cheng J, Li F, Yi D, Wei Z, Yi D, Wu Y. OEDL: an optimized ensemble deep learning method for the prediction of acute ischemic stroke prognoses using union features. Front Neurol 2023; 14:1158555. [PMID: 37416306 PMCID: PMC10321134 DOI: 10.3389/fneur.2023.1158555] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/04/2023] [Accepted: 05/22/2023] [Indexed: 07/08/2023]
Abstract
Background: Early stroke prognosis assessments are critical for decision-making regarding therapeutic intervention. We introduced the concepts of data combination, method integration, and algorithm parallelization, aiming to build an integrated deep learning model based on a combination of clinical and radiomics features and to analyze its application value in prognosis prediction. Methods: The research steps in this study include data sourcing and feature extraction, data processing and feature fusion, model building and optimization, and model training. Using data from 441 stroke patients, clinical and radiomics features were extracted, and feature selection was performed. Clinical, radiomics, and combined features were used to construct predictive models. We applied the concept of deep integration to the joint analysis of multiple deep learning methods, used a metaheuristic algorithm to improve the parameter search efficiency, and finally developed an acute ischemic stroke (AIS) prognosis prediction method, namely, the optimized ensemble of deep learning (OEDL) method. Results: Among the clinical features, 17 features passed the correlation check. Among the radiomics features, 19 features were selected. In the comparison of prediction methods, the OEDL method, based on the concept of ensemble optimization, had the best classification performance. In the comparison of feature sets, the combined features yielded better classification performance than the clinical or radiomics features alone. In the comparison of class-balancing methods, SMOTEENN, which is based on hybrid sampling, achieved better classification performance than the unbalanced, oversampled, and undersampled methods. The OEDL method with combined features and mixed sampling achieved the best classification performance, with 97.89, 95.74, 94.75, 94.03, and 94.35% for Macro-AUC, ACC, Macro-R, Macro-P, and Macro-F1, respectively, and compared favorably with methods from previous studies. Conclusion: The proposed OEDL approach effectively improved stroke prognosis prediction performance; modeling with combined data was significantly better than using single clinical or radiomics feature models, and the proposed method had better intervention guidance value. Our approach is beneficial for optimizing the early clinical intervention process and providing the necessary clinical decision support for personalized treatment.
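The macro-averaged scores named in this abstract (Macro-R, Macro-P, Macro-F1) are conventionally obtained by computing each metric per class and then taking the unweighted mean over classes. The sketch below illustrates that convention in plain Python. The function name `macro_metrics` and the toy labels are hypothetical, the abstract does not state which exact definitions the authors used, and Macro-AUC is omitted because it requires predicted scores rather than hard labels.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged recall, precision, and F1.

    Assumption: Macro-F1 is the unweighted mean of per-class F1 scores,
    which is one common convention; other definitions exist.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed

    recalls, precisions, f1s = [], [], []
    for c in labels:
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        recalls.append(r)
        precisions.append(p)
        f1s.append(f)

    n = len(labels)
    return {
        "ACC": sum(tp.values()) / len(y_true),
        "Macro-R": sum(recalls) / n,
        "Macro-P": sum(precisions) / n,
        "Macro-F1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy three-class labels, invented purely for illustration.
    truth = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    for name, value in macro_metrics(truth, preds).items():
        print(f"{name}: {value:.4f}")
```

Because every class contributes equally to a macro average, a minority outcome class influences the score as much as a majority one, which is one reason such metrics are often reported alongside resampling methods like SMOTEENN.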
Affiliation(s)
- Wei Ye
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Xicheng Chen
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Pengpeng Li
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Yongjun Tao
- Department of Neurology, Taizhou Municipal Hospital, Taizhou, Zhejiang, China
- Zhenyan Wang
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Chengcheng Gao
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Jian Cheng
- Department of Radiology, Taizhou Municipal Hospital, Taizhou, Zhejiang, China
- Fang Li
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Dali Yi
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Department of Health Education, College of Preventive Medicine, Army Medical University, Chongqing, China
- Zeliang Wei
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Dong Yi
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
- Yazhou Wu
- Department of Health Statistics, College of Preventive Medicine, Army Medical University, Chongqing, China
33
Zhang Z, Lu Y, Vosoughi S, Levy J, Christensen B, Salas L. HiTAIC: hierarchical tumor artificial intelligence classifier traces tissue of origin and tumor type in primary and metastasized tumors using DNA methylation. NAR Cancer 2023; 5:zcad017. [PMID: 37089814 PMCID: PMC10113876 DOI: 10.1093/narcan/zcad017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 04/04/2023] [Accepted: 04/13/2023] [Indexed: 04/25/2023]
Abstract
Human cancers are heterogeneous in their cell composition and site of origin. Cancer metastasis creates the conundrum of migrated tumor cells of unknown origin. Tracing tissue of origin and tumor type in primary and metastasized cancer is clinically important. DNA methylation alterations play a crucial role in carcinogenesis and mark cell fate differentiation, and can thus be used to trace tumor tissue of origin. In this study, we employed a novel tumor-type-specific hierarchical model using genome-scale DNA methylation data to develop a multilayer perceptron model, HiTAIC, to trace tissue of origin and tumor type in 27 cancers from 23 tissue sites in data from 7735 tumors with high resolution, accuracy, and specificity. In tracing primary cancer origin, HiTAIC accuracy was 99% in the test set and 93% in the external validation data set. Metastatic cancers were identified with 96% accuracy in the external data set. HiTAIC is available as a user-friendly web-based application at https://sites.dartmouth.edu/salaslabhitaic/. In conclusion, we developed HiTAIC, a DNA methylation-based algorithm, to trace tumor tissue of origin in primary and metastasized cancers. The high accuracy and resolution of tumor tracing using HiTAIC hold promise for clinical assistance in identifying cancer of unknown origin.
Affiliation(s)
- Ze Zhang
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Quantitative Biomedical Sciences Program, Guarini School of Graduate and Advanced Studies, Dartmouth College, Hanover, NH, USA
- Yunrui Lu
- Quantitative Biomedical Sciences Program, Guarini School of Graduate and Advanced Studies, Dartmouth College, Hanover, NH, USA
- Soroush Vosoughi
- Department of Computer Science, Dartmouth College, Hanover, NH, USA
- Joshua J Levy
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Quantitative Biomedical Sciences Program, Guarini School of Graduate and Advanced Studies, Dartmouth College, Hanover, NH, USA
- Department of Pathology and Dermatology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Brock C Christensen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Quantitative Biomedical Sciences Program, Guarini School of Graduate and Advanced Studies, Dartmouth College, Hanover, NH, USA
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Lucas A Salas
- To whom correspondence should be addressed. Tel: +1 603 646 5420;
34
Pang J, Xiu W, Ma X. Application of Artificial Intelligence in the Diagnosis, Treatment, and Prognostic Evaluation of Mediastinal Malignant Tumors. J Clin Med 2023; 12:2818. [PMID: 37109155 PMCID: PMC10144939 DOI: 10.3390/jcm12082818] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/21/2022] [Revised: 03/01/2023] [Accepted: 04/06/2023] [Indexed: 04/29/2023]
Abstract
Artificial intelligence (AI), also known as machine intelligence, is widely utilized in the medical field, promoting medical advances. Malignant tumors are the critical focus of medical research and improvement of clinical diagnosis and treatment. Mediastinal malignancy is an important tumor that attracts increasing attention today due to the difficulties in treatment. Combined with artificial intelligence, challenges from drug discovery to survival improvement are constantly being overcome. This article reviews the progress of the use of AI in the diagnosis, treatment, and prognostic prospects of mediastinal malignant tumors based on current literature findings.
Affiliation(s)
- Jiyun Pang
- Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu 610041, China
- State Key Laboratory of Biotherapy, Cancer Center, West China Hospital, Sichuan University, Chengdu 610041, China
- West China School of Medicine, Sichuan University, Chengdu 610041, China
- Weigang Xiu
- Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu 610041, China
- State Key Laboratory of Biotherapy, Cancer Center, West China Hospital, Sichuan University, Chengdu 610041, China
- West China School of Medicine, Sichuan University, Chengdu 610041, China
- Xuelei Ma
- Department of Biotherapy, Cancer Center, West China Hospital, Sichuan University, Chengdu 610041, China
35
Lee S, Jung H, Park J, Ahn J. Accurate Prediction of Cancer Prognosis by Exploiting Patient-Specific Cancer Driver Genes. Int J Mol Sci 2023; 24:6445. [PMID: 37047418 PMCID: PMC10095073 DOI: 10.3390/ijms24076445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 03/17/2023] [Accepted: 03/28/2023] [Indexed: 04/03/2023]
Abstract
Accurate prediction of the prognoses of cancer patients and identification of prognostic biomarkers are both important for the improved treatment of cancer patients, in addition to enhanced anticancer drugs. Many previous bioinformatic studies have been carried out to achieve this goal; however, there remains room for improvement in terms of accuracy. In this study, we demonstrated that patient-specific cancer driver genes could be used to predict cancer prognoses more accurately. To identify patient-specific cancer driver genes, we first generated patient-specific gene networks before using modified PageRank to generate feature vectors that represented the impacts genes had on the patient-specific gene network. Subsequently, the feature vectors of the good and poor prognosis groups were used to train the deep feedforward network. For the 11 cancer types in the TCGA data, the proposed method showed a significantly better prediction performance than the existing state-of-the-art methods for three cancer types (BRCA, CESC and PAAD), better performance for five cancer types (COAD, ESCA, HNSC, KIRC and STAD), and a similar or slightly worse performance for the remaining three cancer types (BLCA, LIHC and LUAD). Furthermore, the case study for the identified breast cancer and cervical squamous cell carcinoma prognostic genes and their subnetworks included several pathways associated with the progression of breast cancer and cervical squamous cell carcinoma. These results suggested that heterogeneous cancer driver information may be associated with cancer prognosis.
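The feature-generation step described in this abstract, scoring each gene by its influence on a patient-specific network, can be illustrated with ordinary PageRank. The sketch below is a plain power-iteration PageRank in Python, not the authors' modified variant, whose details the abstract does not give. The function name `pagerank`, the gene identifiers, and the toy edges are all hypothetical.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration.

    `adjacency` maps every node to the list of nodes it points to.
    Assumption: every node, including pure targets, appears as a key.
    """
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            targets = adjacency[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical patient-specific network: three genes point at gene_a.
    network = {
        "gene_a": ["gene_b"],
        "gene_b": ["gene_a"],
        "gene_c": ["gene_a"],
        "gene_d": ["gene_a"],
    }
    ranks = pagerank(network)
    # One feature vector per patient, with genes in a fixed order.
    feature_vector = [ranks[g] for g in sorted(ranks)]
    print(feature_vector)
```

In a pipeline of the kind the abstract outlines, one such vector per patient, with genes in a fixed order, would form the input rows for a downstream classifier such as a feedforward network.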
Affiliation(s)
- Suyeon Lee
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Heewon Jung
- Samsung Electronics Company Ltd., Suwon 16677, Republic of Korea
- Jiwoo Park
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Jaegyoon Ahn
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
36
Zhao J, Zhao B, Song X, Lyu C, Chen W, Xiong Y, Wei DQ. Subtype-DCC: decoupled contrastive clustering method for cancer subtype identification based on multi-omics data. Brief Bioinform 2023; 24:7005165. [PMID: 36702755 DOI: 10.1093/bib/bbad025] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/21/2022] [Revised: 12/21/2022] [Accepted: 01/08/2023] [Indexed: 01/28/2023]
Abstract
Due to the high heterogeneity and complexity of cancers, patients with different cancer subtypes often have distinct groups of genomic and clinical characteristics. Therefore, the discovery and identification of cancer subtypes are crucial to cancer diagnosis, prognosis and treatment. Recent technological advances have accelerated the increasing availability of multi-omics data for cancer subtyping. To take advantage of the complementary information from multi-omics data, it is necessary to develop computational models that can represent and integrate different layers of data into a single framework. Here, we propose a decoupled contrastive clustering method (Subtype-DCC) based on multi-omics data integration for clustering to identify cancer subtypes. The idea of contrastive learning is introduced into deep clustering based on deep neural networks to learn clustering-friendly representations. Experimental results demonstrate the superior performance of the proposed Subtype-DCC model in identifying cancer subtypes over the currently available state-of-the-art clustering methods. The strength of Subtype-DCC is also supported by the survival and clinical analysis.
Affiliation(s)
- Jing Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xiaotong Song
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- Chujun Lyu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Weizhi Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nayang, Henan, 473006, China
37
Unlu Yazici M, Marron JS, Bakir-Gungor B, Zou F, Yousef M. Invention of 3Mint for feature grouping and scoring in multi-omics. Front Genet 2023; 14:1093326. [PMID: 37007972 PMCID: PMC10050723 DOI: 10.3389/fgene.2023.1093326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 02/27/2023] [Indexed: 03/17/2023]
Abstract
Advanced genomic and molecular profiling technologies have accelerated the elucidation of the regulatory mechanisms behind cancer development and progression, and of targeted therapies in patients. Along this line, intense studies with immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer has been one of the leading causes of death around the world in recent years. Elucidation of genomic and epigenetic factors in Breast Cancer (BRCA) can provide a roadmap to uncover the disease mechanisms. Accordingly, unraveling the possible systematic connections between omics data types and their contribution to BRCA tumor progression is crucial. In this study, we have developed a novel machine learning (ML) based integrative approach for multi-omics data analysis. This integrative approach combines information from gene expression (mRNA), microRNA (miRNA) and methylation data. Due to the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis and treatment of disease through patterns only available from the 3-way interactions between these three omics datasets. In addition, the proposed method bridges the interpretation gap between the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint). This tool aims to perform grouping and scoring of groups using biological knowledge. Another major goal is improved gene selection via detection of novel groups of cross-omics biomarkers. Performance of 3Mint is assessed using different metrics. Our computational performance evaluations showed that 3Mint classifies the BRCA molecular subtypes with fewer genes than the miRcorrNet tool, which uses miRNA and mRNA gene expression profiles, while achieving similar performance metrics (95% accuracy). The incorporation of methylation data in 3Mint yields a much more focused analysis. The 3Mint tool and all other supplementary files are available at https://github.com/malikyousef/3Mint/.
Affiliation(s)
- Miray Unlu Yazici
- Department of Bioengineering, Abdullah Gül University, Kayseri, Türkiye
- J. S. Marron
- Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, United States
- Burcu Bakir-Gungor
- Department of Bioengineering, Abdullah Gül University, Kayseri, Türkiye
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Türkiye
- Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
- *Correspondence: Malik Yousef,
38
Santhanam B, Oikonomou P, Tavazoie S. Systematic assessment of prognostic molecular features across cancers. Cell Genom 2023; 3:100262. [PMID: 36950380 PMCID: PMC10025453 DOI: 10.1016/j.xgen.2023.100262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 09/29/2022] [Accepted: 01/12/2023] [Indexed: 02/05/2023]
Abstract
Precision oncology promises accurate prediction of disease trajectories by utilizing molecular features of tumors. We present a systematic analysis of the prognostic potential of diverse molecular features across large cancer cohorts. We find that the mRNA expression of biologically coherent sets of genes (modules) is substantially more predictive of patient survival than single-locus genomic and transcriptomic aberrations. Extending our analysis beyond existing curated gene modules, we find a large novel class of highly prognostic DNA/RNA cis-regulatory modules associated with dynamic gene expression within cancers. Remarkably, in more than 82% of cancers, modules substantially improve survival stratification compared with conventional clinical factors and prominent genomic aberrations. The prognostic potential of cancer modules generalizes to external cohorts better than conventionally used single-gene features. Finally, a machine-learning framework demonstrates the combined predictive power of multiple modules, yielding prognostic models that perform substantially better than existing histopathological and clinical factors in common use.
Affiliation(s)
- Balaji Santhanam
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10032, USA
- Panos Oikonomou
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10032, USA
- Saeed Tavazoie
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10032, USA
39
Pérez Del Barrio A, Esteve Domínguez AS, Menéndez Fernández-Miranda P, Sanz Bellón P, Rodríguez González D, Lloret Iglesias L, Marqués Fraguela E, González Mandly AA, Vega JA. A deep learning model for prognosis prediction after intracranial hemorrhage. J Neuroimaging 2023; 33:218-226. [PMID: 36585957 DOI: 10.1111/jon.13078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 12/13/2022] [Accepted: 12/20/2022] [Indexed: 01/01/2023]
Abstract
BACKGROUND AND PURPOSE Intracranial hemorrhage (ICH) is a common life-threatening condition that must be rapidly diagnosed and treated. However, there is still a lack of consensus regarding treatment, driven to some extent by prognostic uncertainty. While several prediction models for ICH detection have already been published, here we present a deep learning predictive model for ICH prognosis. METHODS We included patients with ICH (n = 262), and we trained a custom model for the classification of patients into poor prognosis and good prognosis, using a hybrid input consisting of brain CT images and other clinical variables. We compared it with two other models, one trained with images only (I-model) and the other with tabular data only (D-model). RESULTS Our hybrid model achieved an area under the receiver operating characteristic curve (AUC) of .924 (95% confidence interval [CI]: .831-.986), and an accuracy of .861 (95% CI: .760-.960). The I- and D-models achieved an AUC of .763 (95% CI: .622-.902) and .746 (95% CI: .598-.876), respectively. CONCLUSIONS The proposed hybrid model was able to accurately classify patients into good and poor prognosis. To the best of our knowledge, this is the first ICH prognosis prediction deep learning model. We concluded that deep learning can be applied for prognosis prediction in ICH that could have a great impact on clinical decision-making. Further, hybrid inputs could be a promising technique for deep learning in medical imaging.
Affiliation(s)
- Amaia Pérez Del Barrio
- Servicio de Radiodiagnóstico, Hospital Universitario "Marqués de Valdecilla", Santander, Spain
- Anna Salut Esteve Domínguez
- Advanced Computation and e-Science, Instituto de Física de Cantabria (IFCA), Consejo Superior de Investigaciones Científicas (CSIC), Santander, Spain
- Pablo Sanz Bellón
- Servicio de Radiodiagnóstico, Hospital Universitario "Marqués de Valdecilla", Santander, Spain
- David Rodríguez González
- Advanced Computation and e-Science, Instituto de Física de Cantabria (IFCA), Consejo Superior de Investigaciones Científicas (CSIC), Santander, Spain
- Lara Lloret Iglesias
- Advanced Computation and e-Science, Instituto de Física de Cantabria (IFCA), Consejo Superior de Investigaciones Científicas (CSIC), Santander, Spain
- José A Vega
- Departamento de Morfología y Biología Celular, Universidad de Oviedo, Oviedo, Spain
- Facultad de Ciencias de la Salud, Universidad Autónoma de Chile, Santiago de Chile, Chile
40
Li J, Li L, You P, Wei Y, Xu B. Towards artificial intelligence to multi-omics characterization of tumor heterogeneity in esophageal cancer. Semin Cancer Biol 2023; 91:35-49. [PMID: 36868394 DOI: 10.1016/j.semcancer.2023.02.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 02/21/2023] [Accepted: 02/28/2023] [Indexed: 03/05/2023]
Abstract
Esophageal cancer is a unique and complex malignancy with substantial tumor heterogeneity: at the cellular level, tumors are composed of tumor and stromal cellular components; at the genetic level, they comprise genetically distinct tumor clones; at the phenotypic level, cells in distinct microenvironmental niches acquire diverse phenotypic features. This heterogeneity affects almost every process of esophageal cancer progression, from onset to metastasis and recurrence. Intertumoral and intratumoral heterogeneity are major obstacles in the treatment of esophageal cancer, but also offer the potential to manipulate the heterogeneity itself as a new therapeutic strategy. The high-dimensional, multi-faceted characterization of the genomics, epigenomics, transcriptomics, proteomics, metabonomics, etc. of esophageal cancer has opened novel horizons for dissecting tumor heterogeneity. Artificial intelligence, especially machine learning and deep learning algorithms, is able to make decisive interpretations of data from multi-omics layers. To date, artificial intelligence has emerged as a promising computational tool for analyzing and dissecting esophageal patient-specific multi-omics data. This review provides a comprehensive overview of tumor heterogeneity from a multi-omics perspective. In particular, we discuss the novel techniques of single-cell sequencing and spatial transcriptomics, which have revolutionized our understanding of the cell composition of esophageal cancer and allowed us to identify novel cell types. We focus on the latest advances in artificial intelligence for integrating multi-omics data of esophageal cancer. Artificial intelligence-based computational tools for multi-omics data integration play a key role in tumor heterogeneity assessment, which will potentially boost the development of precision oncology in esophageal cancer.
Affiliation(s)
- Junyu Li
- Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China; Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Lin Li
- Department of Thoracic Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Peimeng You
- Nanchang University, Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Yiping Wei
- Department of Thoracic Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China.
- Bin Xu
- Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China.
41
Li H, Tao X, Liang T, Jiang J, Zhu J, Wu S, Chen L, Zhang Z, Zhou C, Sun X, Huang S, Chen J, Chen T, Ye Z, Chen W, Guo H, Yao Y, Liao S, Yu C, Fan B, Liu Y, Lu C, Hu J, Xie Q, Wei X, Fang C, Liu H, Huang C, Pan S, Zhan X, Liu C. Comprehensive AI-assisted tool for ankylosing spondylitis based on multicenter research outperforms human experts. Front Public Health 2023; 11:1063633. [PMID: 36844823 PMCID: PMC9947660 DOI: 10.3389/fpubh.2023.1063633] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/07/2022] [Accepted: 01/18/2023] [Indexed: 02/11/2023]
Abstract
Introduction The diagnosis and treatment of ankylosing spondylitis (AS) is a difficult task, especially in less developed countries without access to experts. To address this issue, a comprehensive artificial intelligence (AI) tool was created to help diagnose and predict the course of AS. Methods In this retrospective study, a dataset of 5389 pelvic radiographs (PXRs) from patients treated at a single medical center between March 2014 and April 2022 was used to create an ensemble deep learning (DL) model for diagnosing AS. The model was then tested on an additional 583 images from three other medical centers, and its performance was evaluated using the area under the receiver operating characteristic curve analysis, accuracy, precision, recall, and F1 scores. Furthermore, clinical prediction models for identifying high-risk patients and triaging patients were developed and validated using clinical data from 356 patients. Results The ensemble DL model demonstrated impressive performance in a multicenter external test set, with precision, recall, and area under the receiver operating characteristic curve values of 0.90, 0.89, and 0.96, respectively. This performance surpassed that of human experts, and the model also significantly improved the experts' diagnostic accuracy. Furthermore, the model's diagnosis results based on smartphone-captured images were comparable to those of human experts. Additionally, a clinical prediction model was established that accurately categorizes patients with AS into high-and low-risk groups with distinct clinical trajectories. This provides a strong foundation for individualized care. Discussion In this study, an exceptionally comprehensive AI tool was developed for the diagnosis and management of AS in complex clinical scenarios, especially in underdeveloped or rural areas that lack access to experts. This tool is highly beneficial in providing an efficient and effective system of diagnosis and management.
Affiliation(s)
- Hao Li
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Xiang Tao
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Tuo Liang
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Jie Jiang
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Jichong Zhu
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Shaofeng Wu
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Liyi Chen
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Zide Zhang
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Chenxing Zhou
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Xuhua Sun
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Shengsheng Huang
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Jiarui Chen
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Tianyou Chen
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Zhen Ye
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Wuhua Chen
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Hao Guo
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Yuanlin Yao
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Shian Liao
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Chaojie Yu
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Binguang Fan
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Yihong Liu
  - Guangxi Medical University, Nanning, Guangxi, China
- Chunai Lu
  - Guangxi Medical University, Nanning, Guangxi, China
- Junnan Hu
  - Guangxi Medical University, Nanning, Guangxi, China
- Qinghong Xie
  - Guangxi Medical University, Nanning, Guangxi, China
- Xiao Wei
  - Guangxi Medical University, Nanning, Guangxi, China
- Cairen Fang
  - Guangxi Medical University, Nanning, Guangxi, China
- Huijiang Liu
  - Orthopaedics of The First People's Hospital of Nanning, Nanning, Guangxi, China
- Chengqian Huang
  - Orthopaedics of People's Hospital of Baise, Baise, Guangxi, China
- Shixin Pan
  - Orthopaedics of Wuzhou Red Cross Hospital, Wuzhou, Guangxi, China
- Xinli Zhan
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Chong Liu
  - The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
  - Correspondence: Chong Liu
42
Feldner-Busztin D, Firbas Nisantzis P, Edmunds SJ, Boza G, Racimo F, Gopalakrishnan S, Limborg MT, Lahti L, de Polavieja GG. Dealing with dimensionality: the application of machine learning to multi-omics data. Bioinformatics 2023; 39:6986971. [PMID: 36637211 PMCID: PMC9907220 DOI: 10.1093/bioinformatics/btad021] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 07/20/2022] [Revised: 12/02/2022] [Accepted: 01/11/2023] [Indexed: 01/14/2023] Open
Abstract
MOTIVATION Machine learning (ML) methods are motivated by the need to automate information extraction from large datasets in order to support human users in data-driven tasks. This is an attractive approach for integrative joint analysis of vast amounts of omics data produced in next generation sequencing and other -omics assays. A systematic assessment of the current literature can help to identify key trends and potential gaps in methodology and applications. We surveyed the literature on ML multi-omic data integration and quantitatively explored the goals, techniques and data involved in this field. We were particularly interested in examining how researchers use ML to deal with the volume and complexity of these datasets. RESULTS Our main finding is that the methods used are those that address the challenges of datasets with few samples and many features. Dimensionality reduction methods are used to reduce the feature count alongside models that can also appropriately handle relatively few samples. Popular techniques include autoencoders, random forests and support vector machines. We also found that the field is heavily influenced by the use of The Cancer Genome Atlas dataset, which is accessible and contains many diverse experiments. AVAILABILITY AND IMPLEMENTATION All data and processing scripts are available at this GitLab repository: https://gitlab.com/polavieja_lab/ml_multi-omics_review/ or in Zenodo: https://doi.org/10.5281/zenodo.7361807. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dylan Feldner-Busztin
  - Champalimaud Centre for the Unknown, Champalimaud Foundation, 1400-038 Lisbon, Portugal
- Shelley Jane Edmunds
  - Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
- Gergely Boza
  - Centre for Ecological Research, 1113 Budapest, Hungary
- Fernando Racimo
  - Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Shyam Gopalakrishnan
  - Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
- Morten Tønsberg Limborg
  - Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, 1353 Copenhagen, Denmark
- Leo Lahti
  - Department of Computing, University of Turku, 20014 Turku, Finland
43
Juan H, Huang H. Quantitative analysis of high‐throughput biological data. WIREs Comput Mol Sci 2023. [DOI: 10.1002/wcms.1658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- Hsueh‐Fen Juan
  - Department of Life Science, Institute of Biomedical Electronics and Bioinformatics, and Center for Systems Biology, National Taiwan University, Taipei, Taiwan
  - Taiwan AI Labs, Taipei, Taiwan
- Hsuan‐Cheng Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
44
Attallah O, Ragab DA. Auto-MyIn: Automatic diagnosis of myocardial infarction via multiple GLCMs, CNNs, and SVMs. Biomed Signal Process Control 2023; 80:104273. [DOI: 10.1016/j.bspc.2022.104273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
45
Mathema VB, Sen P, Lamichhane S, Orešič M, Khoomrung S. Deep learning facilitates multi-data type analysis and predictive biomarker discovery in cancer precision medicine. Comput Struct Biotechnol J 2023; 21:1372-1382. [PMID: 36817954 PMCID: PMC9929204 DOI: 10.1016/j.csbj.2023.01.043] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Revised: 01/28/2023] [Accepted: 01/29/2023] [Indexed: 02/02/2023] Open
Abstract
Cancer progression is linked to gene-environment interactions that alter cellular homeostasis. The use of biomarkers as early indicators of disease manifestation and progression can substantially improve diagnosis and treatment. Large omics datasets generated by high-throughput profiling technologies, such as microarrays, RNA sequencing, whole-genome shotgun sequencing, nuclear magnetic resonance, and mass spectrometry, have enabled data-driven biomarker discoveries. The identification of differentially expressed traits as molecular markers has traditionally relied on statistical techniques that are often limited to linear parametric modeling. The heterogeneity, epigenetic changes, and high degree of polymorphism observed in oncogenes demand biomarker-assisted personalized medication schemes. Deep learning (DL), a major subunit of machine learning (ML), has been increasingly utilized in recent years to investigate various diseases. The combination of ML/DL approaches for performance optimization across multi-omics datasets produces robust ensemble-learning prediction models, which are becoming useful in precision medicine. This review focuses on the recent development of ML/DL methods to provide integrative solutions in discovering cancer-related biomarkers, and their utilization in precision medicine.
Affiliation(s)
- Vivek Bhakta Mathema
  - Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Partho Sen
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
- Santosh Lamichhane
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Matej Orešič
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
- Sakda Khoomrung
  - Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Center of Excellence for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Bangkok, Thailand
  - Corresponding author at: Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand.
46
Gong S, Bou Kheir G, Kabarriti A, Khosla L, Gong F, Van Laecke E, Weiss J, Everaert K, Hervé F. 'Nocturomics': transition to omics-driven biomarkers of nocturia, a systematic review and future prospects. BJU Int 2023; 131:675-684. [PMID: 36683403 DOI: 10.1111/bju.15975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2023]
Abstract
OBJECTIVE To systematically review studies that investigated different biomarkers of nocturia, including omics-driven biomarkers or 'Nocturomics'. MATERIALS AND METHODS PubMed®, Scopus®, and Embase® were searched systematically in May 2022 for research papers on biomarkers in physiological fluids and tissues from patients with nocturia. A distinction was made between biomarkers or candidates discovered by omics techniques, referred to as omics-driven biomarkers, and classical biomarkers, measured by standard laboratory techniques and mostly derived from pathophysiological hypotheses. RESULTS A total of 13 studies with 18 881 patients in total were included, eight of which focused on classical biomarkers including: atrial natriuretic peptide (ANP), B-type natriuretic peptide (BNP), C-reactive protein (CRP), aldosterone, and melatonin. Five were 'Nocturomics', including one that assessed the microbiome and identified 27 faecal and eight urinary bacteria correlated with nocturia; and four studies that identified candidate metabolomic biomarkers, including fatty acid metabolites, serotonin, glycerol, lauric acid, thiaproline, and imidazolelactic acid among others. To date, no biomarker is recommended in clinical practice. Nocturomics are in an embryonic phase of conception but are developing quickly. Candidate biomarkers are being identified, but none of them has yet been validated on a large sample, although some preclinical studies have shown a probable role of fatty acid metabolites as a possible biomarker of circadian rhythm and chronotherapy. CONCLUSION Further research is needed to validate biomarkers for nocturia within the framework of a diagnostic and therapeutic precision medicine perspective. We hope this study provides a summary of the current biomarker discoveries associated with nocturia and details future prospects for omics-driven biomarkers.
Affiliation(s)
- Susan Gong
  - Department of Urology, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- George Bou Kheir
  - Department of Urology, Ghent University Hospital, Ghent, Belgium
- Abdo Kabarriti
  - Department of Urology, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Lakshay Khosla
  - Department of Urology, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Fred Gong
  - Department of Urology, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Erik Van Laecke
  - Department of Urology, Ghent University Hospital, Ghent, Belgium
- Jeffrey Weiss
  - Department of Urology, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Karel Everaert
  - Department of Urology, Ghent University Hospital, Ghent, Belgium
- François Hervé
  - Department of Urology, Ghent University Hospital, Ghent, Belgium
47
Liu Z, Chen Y, Shen T. Evidence Based on an Integrative Analysis of Multi-Omics Data on METTL7A as a Molecular Marker in Pan-Cancer. Biomolecules 2023; 13. [PMID: 36830565 DOI: 10.3390/biom13020195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2022] [Revised: 01/05/2023] [Accepted: 01/16/2023] [Indexed: 01/20/2023] Open
Abstract
Methyltransferase-like protein 7A (METTL7A), an RNA N6-methyladenosine (m6A) methyltransferase, has attracted much attention as it has been found to be closely associated with various types of tumorigenesis and progression. This study provides a comprehensive assessment of METTL7A from a pan-cancer perspective using multi-omics data. The gene ontology enrichment analysis of METTL7A-binding proteins revealed a close association with methylation and lipid metabolism. We then explored the expression of METTL7A in normal tissues, cell lines, different subtypes and cancers, and found that METTL7A was differentially expressed in various cancer species, tumor molecular subtypes and immune subtypes. Evaluation of the diagnostic and prognostic value of METTL7A in pan-cancer revealed that METTL7A had high accuracy in tumor prediction. Moreover, the low expression of METTL7A significantly correlated with the poor prognosis, including kidney renal clear cell carcinoma (KIRC), mesothelioma and sarcoma, indicating that METTL7A could be a potential biomarker for tumor diagnosis and prognosis. We focused on KIRC after pre-screening and analyzed its expression and prognostic value in various clinical subgroups. We found that METTL7A was significantly related to tumor stage, metastasis stage, pathologic stage, primary therapy outcome, histologic grade and gender, and that low METTL7A expression was associated with poorer outcomes. Finally, we analyzed the immune infiltration and co-expressed genes of METTL7A as well as the differentially expressed genes in the high and low expression groups. In conclusion, METTL7A is a better molecular marker for pan-cancer diagnosis and prognosis and has high potential as a diagnostic and prognostic biomarker for KIRC.
48
Yan Y, Yang Y, Ning C, Wu N, Yan S, Sun L. Role of Traditional Chinese Medicine Syndrome Type, Gut Microbiome, and Host Immunity in Predicting Early and Advanced Stage Colorectal Cancer. Integr Cancer Ther 2023; 22:15347354221144051. [PMID: 36604798 PMCID: PMC9830091 DOI: 10.1177/15347354221144051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023] Open
Abstract
OBJECTIVE To investigate the role of Traditional Chinese Medicine (TCM) syndrome type, gut microbiome distribution, and host immunity function in predicting the early and advanced clinical stages of colorectal cancer (CRC). METHODS A cross-sectional case-control study was performed which included 48 early stage and 48 advanced patients with CRC enrolled from March 2018 to December 2020. 16S rRNA gene sequencing was performed to analyze the gut microbiomes of the patients, while T and B lymphocyte subsets in peripheral blood were assessed using flow cytometry. TCM syndrome type was measured using the spleen deficiency syndrome (SDS) scale. RESULTS The abundance levels of Prevotella, Escherichia-Shigella, and Faecalibacterium in the gut microbiota were significantly increased in the advanced group, while Bacteroides was significantly decreased. Phascolarctobacterium was detectable only in the early metaphase group, whereas Alistipes was detectable only in the advanced group. The lymphocyte (P = .006), T helper cell (TH) (P = .002), cytotoxic T cell (TC) (P = .003), double positive T cell (DPT) (P = .02), and total T counts (P = .001) were significantly higher in the early metaphase group than in the advanced metaphase group. Compared with patients with early stage CRC, the advanced group had a higher SDS score. After adjusting for clinical stage, Spearman's correlation analysis showed interactions among gut microbiome abundance, T cell level, and SDS score. Multivariate logistic analysis showed that after controlling for the SDS score, abundance of Alistipes and Faecalibacterium, and double negative T cell (DNT) level, DPT was significantly associated with a lower risk of advanced-stage disease (hazard ratio, 0.918; P = .022). CONCLUSION Our study suggested associations between clinical stage, SDS, gut microbiota, and T lymphocytes, which provided insights for a potential prediction model for the disease progression of CRC.
Affiliation(s)
- Yunzi Yan
  - Beijing University of Chinese Medicine, Beijing, China
  - China Academy of Chinese Medical Science, Beijing, China
- Yufei Yang
  - Beijing University of Chinese Medicine, Beijing, China
- Chunhui Ning
  - China Academy of Chinese Medical Science, Beijing, China
- Na Wu
  - Beijing University of Chinese Medicine, Beijing, China
- Shaohua Yan
  - Beijing University of Chinese Medicine, Beijing, China
- Lingyun Sun
  - China Academy of Chinese Medical Science, Beijing, China
  - Lingyun Sun, China Academy of Chinese Medical Sciences Xiyuan Hospital, Xiyuan Caochang Road, Haidian District, Beijing, 100091, China.
49
Liao J, Li X, Gan Y, Han S, Rong P, Wang W, Li W, Zhou L. Artificial intelligence assists precision medicine in cancer treatment. Front Oncol 2023; 12:998222. [PMID: 36686757 PMCID: PMC9846804 DOI: 10.3389/fonc.2022.998222] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/19/2022] [Accepted: 11/22/2022] [Indexed: 01/06/2023] Open
Abstract
Cancer is a major medical problem worldwide. Due to its high heterogeneity, the use of the same drugs or surgical methods in patients with the same tumor may have different curative effects, leading to the need for more accurate treatment methods for tumors and personalized treatments for patients. The precise treatment of tumors is essential, which makes it urgent to obtain an in-depth understanding of the changes that tumors undergo, including changes in their genes, proteins and cancer cell phenotypes, in order to develop targeted treatment strategies for patients. Artificial intelligence (AI) based on big data can extract the hidden patterns, important information, and corresponding knowledge behind the enormous amount of data. For example, ML and deep learning, which are subsets of AI, can be used to mine the deep-level information in genomics, transcriptomics, proteomics, radiomics, digital pathological images, and other data, which can help clinicians understand tumors comprehensively. In addition, AI can find new biomarkers from data to assist tumor screening, detection, diagnosis, treatment and prognosis prediction, so as to provide the best treatment for individual patients and improve their clinical outcomes.
Affiliation(s)
- Jinzhuang Liao
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Xiaoying Li
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Yu Gan
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Shuangze Han
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Pengfei Rong
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Correspondence: Pengfei Rong; Wei Wang; Wei Li; Li Zhou
- Wei Wang
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Correspondence: Pengfei Rong; Wei Wang; Wei Li; Li Zhou
- Wei Li
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Correspondence: Pengfei Rong; Wei Wang; Wei Li; Li Zhou
- Li Zhou
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Department of Pathology, The Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Correspondence: Pengfei Rong; Wei Wang; Wei Li; Li Zhou
50
Wu R, Luo J, Wan H, Zhang H, Yuan Y, Hu H, Feng J, Wen J, Wang Y, Li J, Liang Q, Gan F, Zhang G. Evaluation of machine learning algorithms for the prognosis of breast cancer from the Surveillance, Epidemiology, and End Results database. PLoS One 2023; 18:e0280340. [PMID: 36701415 PMCID: PMC9879508 DOI: 10.1371/journal.pone.0280340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Accepted: 12/26/2022] [Indexed: 01/27/2023] Open
Abstract
INTRODUCTION Many researchers used machine learning (ML) to predict the prognosis of breast cancer (BC) patients and noticed that the ML model had good individualized prediction performance. OBJECTIVE The cohort study was intended to establish a reliable data analysis model by comparing the performance of 10 common ML algorithms and the traditional American Joint Committee on Cancer (AJCC) stage, and to use this model in web application development to provide a good individualized prediction for others. METHODS This study included 63145 BC patients from the Surveillance, Epidemiology, and End Results database. RESULTS Comparing the performance of the 10 ML algorithms and the 7th AJCC stage in the optimal test set, we found that in terms of 5-year overall survival, multivariate adaptive regression splines (MARS) had the highest area under the curve (AUC) value (0.831) and F1-score (0.608), and both sensitivity (0.737) and specificity (0.772) were relatively high. Besides, MARS showed the highest AUC value (0.831, 95% confidence interval: 0.820-0.842) in comparison to the other ML algorithms and the 7th AJCC stage (all P < 0.05). MARS, the best performing model, was selected for web application development (https://w12251393.shinyapps.io/app2/). CONCLUSIONS The comparative study of multiple forecasting models utilizing a large dataset showed that the MARS-based model achieved a much better performance compared to the other ML algorithms and the 7th AJCC stage in individualized estimation of survival of BC patients, which was very likely to be the next step towards precision medicine.
Affiliation(s)
- Ruiyang Wu
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Jing Luo
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Hangyu Wan
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Haiyan Zhang
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Yewei Yuan
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Huihua Hu
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Jinyan Feng
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Jing Wen
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Yan Wang
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Junyan Li
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Qi Liang
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Fengjiao Gan
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Gang Zhang
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
  - * E-mail: