1
Sahoo K, Sundararajan V. Methods in DNA methylation array dataset analysis: A review. Comput Struct Biotechnol J 2024; 23:2304-2325. [PMID: 38845821 PMCID: PMC11153885 DOI: 10.1016/j.csbj.2024.05.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 04/25/2024] [Accepted: 05/08/2024] [Indexed: 06/09/2024] Open
Abstract
Understanding the intricate relationships between gene expression levels and epigenetic modifications in a genome is crucial to comprehending the pathogenic mechanisms of many diseases. With the advancement of DNA methylome profiling techniques, the emphasis on identifying Differentially Methylated Regions/Genes (DMRs/DMGs) has become crucial for biomarker discovery, offering new insights into the etiology of illnesses. This review surveys the current state of computational tools/algorithms for the analysis of microarray-based DNA methylation profiling datasets, focusing on key concepts underlying the diagnostic/prognostic CpG site extraction. It addresses methodological frameworks, algorithms, and pipelines employed by various authors, serving as a roadmap to address challenges and understand changing trends in the methodologies for analyzing array-based DNA methylation profiling datasets derived from diseased genomes. Additionally, it highlights the importance of integrating gene expression and methylation datasets for accurate biomarker identification, explores prognostic prediction models, and discusses molecular subtyping for disease classification. The review also emphasizes the contributions of machine learning, neural networks, and data mining to enhance diagnostic workflow development, thereby improving accuracy, precision, and robustness.
Affiliation(s)
- Vino Sundararajan
- Correspondence to: Department of Bio Sciences, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632 014, Tamil Nadu, India.
2
Li Y, Li H, Sun G, Xu S, Tang X, Zhang L, Wan L, Zhang L, Tang M. Integrative analyses of multi-omics data constructing tumor microenvironment and immune-related molecular prognosis model in human colorectal cancer. Heliyon 2024; 10:e32744. [PMID: 38975206 PMCID: PMC11226854 DOI: 10.1016/j.heliyon.2024.e32744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 05/30/2024] [Accepted: 06/07/2024] [Indexed: 07/09/2024] Open
Abstract
The increasing prevalence and incidence of colorectal cancer (CRC), particularly in young adults, underscore the imperative to comprehend its fundamental mechanisms, discover novel diagnostic and prognostic markers, and enhance therapeutic strategies. Here, we integrated multi-omics data, including gene expression, somatic mutation data and DNA methylation data, to unravel the intricacies of the tumor microenvironment (TME) in CRC and search for novel prognostic markers. By calculating the immune score for each patient from the expression profile, we delineated the differential immune cell fraction, constructed an immune-related multi-omics atlas, and identified molecular characteristics. The entire colorectal dataset (n = 343) was randomly divided into training (n = 249) and testing datasets (n = 94). We screened 144 immune-related genes, 6 mutant genes, and 38 methylation probes associated with overall survival (OS). These markers were then incorporated into a 10-gene prognostic model using Lasso and Cox regression in the training dataset, and the model's performance was evaluated in an independent validation dataset. The model exhibited satisfactory results (average concordance index [C-index] = 0.77), with the average 1-year, 3-year, and 5-year AUCs being 0.79, 0.76, and 0.76 in the training dataset and 0.74, 0.80, and 0.90 in the testing dataset. Furthermore, the prognostic model demonstrated applicability in guiding chemotherapy for CRC patients and exhibited a degree of pan-cancer utility in risk stratification. In conclusion, our integrated analysis of multi-omics data revealed immune-related genetic and epigenetic characteristics of the TME. We propose an integrative prognostic model that can stratify risk and guide chemotherapy for CRC patients. The generalizability of the model in risk stratification across different cancer types was validated in a pan-cancer cohort.
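The abstract above reports model quality as a concordance index. As a rough illustration of what that number measures, the following is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' code; the function name and the toy inputs are assumptions made for illustration, and tied event times are simply treated as non-comparable.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs where the higher-risk patient fails first.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- model risk score per patient (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only usable if the earlier patient had an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration, not from the study).
toy_c = concordance_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.6, 0.7, 0.1],
)
print(toy_c)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported 0.77 should be read.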
Affiliation(s)
- Yifei Li
- Clinical Biobank, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Hexin Li
- Clinical Biobank, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Gaoyuan Sun
- Clinical Biobank, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Siyuan Xu
- Clinical Biobank, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Xiaokun Tang
- Clinical Biobank, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Lanxin Zhang
- Clinical Biobank, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Li Wan
- Clinical Biobank, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Lili Zhang
- Clinical Biobank, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Min Tang
- Department of Medical Oncology, Institute of Geriatric Medicine, Beijing Hospital, National Center of Gerontology, Chinese Academy of Medical Sciences, Beijing, China
3
Peng Y, Wu Q, Ding X, Wang L, Gong H, Feng C, Liu T, Zhu H. A hypoxia- and lactate metabolism-related gene signature to predict prognosis of sepsis: discovery and validation in independent cohorts. Eur J Med Res 2023; 28:320. [PMID: 37661250 PMCID: PMC10476321 DOI: 10.1186/s40001-023-01307-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 08/21/2023] [Indexed: 09/05/2023] Open
Abstract
BACKGROUND High throughput gene expression profiling is a valuable tool in providing insight into the molecular mechanism of human diseases. Hypoxia- and lactate metabolism-related genes (HLMRGs) are fundamentally dysregulated in sepsis and have great predictive potential. Therefore, we attempted to build an HLMRG signature to predict the prognosis of patients with sepsis. METHODS Three publicly available transcriptomic profiles of peripheral blood mononuclear cells from patients with sepsis (GSE65682, E-MTAB-4421 and E-MTAB-4451, total n = 850) were included in this study. An HLMRG signature was created by employing Cox regression and least absolute shrinkage and selection operator estimation. The CIBERSORT method was used to analyze the abundances of 22 immune cell subtypes based on transcriptomic data. Metascape was used to investigate pathways related to the HLMRG signature. RESULTS We developed a prognostic signature based on five HLMRGs (ERO1L, SIAH2, TGFA, TGFBI, and THBS1). This classifier successfully discriminated patients with disparate 28-day mortality in the discovery cohort (GSE65682, n = 479), and consistent results were observed in the validation cohort (E-MTAB-4421 plus E-MTAB-4451, n = 371). Estimation of immune infiltration revealed significant associations between the risk score and a subset of immune cells. Enrichment analysis revealed that pathways related to antimicrobial immune responses, leukocyte activation, and cell adhesion and migration were significantly associated with the HLMRG signature. CONCLUSIONS Identification of a prognostic signature suggests the critical role of hypoxia and lactate metabolism in the pathophysiology of sepsis. The HLMRG signature can be used as an efficient tool for the risk stratification of patients with sepsis.
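A gene signature of this kind is usually applied as a linear risk score: a weighted sum of expression values, followed by a cutoff that splits patients into risk groups. The sketch below illustrates that idea only. The five gene names come from the abstract, but the coefficients are arbitrary placeholders (the abstract does not report them), and the median cutoff is an assumption rather than the authors' stated rule.

```python
from statistics import median
from typing import Dict, List

# PLACEHOLDER weights for illustration only; these are NOT the published coefficients.
PLACEHOLDER_COEFS: Dict[str, float] = {
    "ERO1L": 0.5,
    "SIAH2": -0.25,
    "TGFA": 0.25,
    "TGFBI": -0.5,
    "THBS1": 0.75,
}


def risk_score(expression: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def stratify(scores: List[float]) -> List[str]:
    """Label each patient 'high' or 'low' risk relative to the cohort median (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


# Toy patient with the same expression value for every gene (made-up data).
toy_patient = {gene: 2.0 for gene in PLACEHOLDER_COEFS}
print(risk_score(toy_patient, PLACEHOLDER_COEFS))
print(stratify([0.1, 0.5, 0.9, 1.3]))
```

In practice the cutoff would be fixed in the discovery cohort and reused unchanged in the validation cohort, so that the validation result does not depend on the validation data.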
Affiliation(s)
- Yaojun Peng
- Medical School of Chinese PLA General Hospital, Beijing, China
- Department of Emergency, The First Medical Center, Chinese PLA General Hospital, 28th Fuxing Road, Beijing, China
- Qiyan Wu
- Institute of Oncology, The Fifth Medical Centre, Chinese PLA General Hospital, Beijing, China
- Xinhuan Ding
- Medical School of Chinese PLA General Hospital, Beijing, China
- Department of Emergency, The First Medical Center, Chinese PLA General Hospital, 28th Fuxing Road, Beijing, China
- Lingxiong Wang
- Institute of Oncology, The Fifth Medical Centre, Chinese PLA General Hospital, Beijing, China
- Hanpu Gong
- Department of Emergency, The First Medical Center, Chinese PLA General Hospital, 28th Fuxing Road, Beijing, China
- Cong Feng
- Department of Emergency, The First Medical Center, Chinese PLA General Hospital, 28th Fuxing Road, Beijing, China
- Tianyi Liu
- Institute of Oncology, The Fifth Medical Centre, Chinese PLA General Hospital, Beijing, China
- Haiyan Zhu
- Department of Emergency, The First Medical Center, Chinese PLA General Hospital, 28th Fuxing Road, Beijing, China.
4
Yuan T, Edelmann D, Fan Z, Alwers E, Kather JN, Brenner H, Hoffmeister M. Machine learning in the identification of prognostic DNA methylation biomarkers among patients with cancer: A systematic review of epigenome-wide studies. Artif Intell Med 2023; 143:102589. [PMID: 37673571 DOI: 10.1016/j.artmed.2023.102589] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/21/2022] [Revised: 04/19/2023] [Accepted: 04/30/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND DNA methylation biomarkers have great potential in improving prognostic classification systems for patients with cancer. Machine learning (ML)-based analytic techniques might help overcome the challenges of analyzing high-dimensional data in relatively small sample sizes. This systematic review summarizes the current use of ML-based methods in epigenome-wide studies for the identification of DNA methylation signatures associated with cancer prognosis. METHODS We searched three electronic databases including PubMed, EMBASE, and Web of Science for articles published until 2 January 2023. ML-based methods and workflows used to identify DNA methylation signatures associated with cancer prognosis were extracted and summarized. Two authors independently assessed the methodological quality of included studies by a seven-item checklist adapted from 'A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies (PROBAST)' and from the 'Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK)'. Different ML methods and workflows used in included studies were summarized and visualized by a sunburst chart, a bubble chart, and Sankey diagrams, respectively. RESULTS Eighty-three studies were included in this review. Three major types of ML-based workflows were identified: 1) unsupervised clustering, 2) supervised feature selection, and 3) deep learning-based feature transformation. For the three workflows, the most frequently used ML techniques were consensus clustering, least absolute shrinkage and selection operator (LASSO), and autoencoder, respectively. The systematic review revealed that the performance of these approaches has not been adequately evaluated yet and that methodological and reporting flaws were common in the identified studies using ML techniques.
CONCLUSIONS There is great heterogeneity in ML-based methodological strategies used by epigenome-wide studies to identify DNA methylation markers associated with cancer prognosis. In theory, most existing workflows could not handle the high multi-collinearity and potentially non-linear interactions in epigenome-wide DNA methylation data. Benchmarking studies are needed to compare the relative performance of various approaches for specific cancer types. Adherence to relevant methodological and reporting guidelines is urgently needed.
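The review names LASSO as the most common supervised feature-selection technique. To show why LASSO selects features at all, here is a minimal sketch of LASSO for ordinary linear regression, solved by coordinate descent with soft-thresholding, in plain Python. It is an illustration under simplifying assumptions (no intercept, a fixed number of sweeps, a squared-error loss rather than the Cox partial likelihood used in survival studies), and all names and data are hypothetical.

```python
from typing import List


def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(
    X: List[List[float]], y: List[float], lam: float, n_iter: int = 100
) -> List[float]:
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the residual that ignores feature j
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy design with two uncorrelated features: the first is strongly related to y,
# the second only weakly (made-up data).
X_toy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y_toy = [2.0, -2.0, 0.1, -0.1]
print(lasso_coordinate_descent(X_toy, y_toy, lam=0.1))
```

The weakly related feature receives a coefficient of exactly zero, which is the selection behaviour that makes LASSO attractive for high-dimensional methylation data. With strongly correlated features the choice among them becomes unstable, which relates to the multi-collinearity concern raised in the conclusions above.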
Affiliation(s)
- Tanwei Yuan
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany; Medical Faculty Heidelberg, Heidelberg University, Heidelberg, Germany
- Dominic Edelmann
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ziwen Fan
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Elizabeth Alwers
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; Medical Oncology, National Center of Tumour Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
- Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany; Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany; German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Michael Hoffmeister
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany.
5
Fasano C, Grossi V, Forte G, Simone C. Short Linear Motifs in Colorectal Cancer Interactome and Tumorigenesis. Cells 2022; 11:3739. [PMID: 36496998 PMCID: PMC9737320 DOI: 10.3390/cells11233739] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/28/2022] [Revised: 11/16/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2022] Open
Abstract
Colorectal tumorigenesis is driven by alterations in genes and proteins responsible for cancer initiation, progression, and invasion. This multistage process is based on a dense network of protein-protein interactions (PPIs) that become dysregulated as a result of changes in various cell signaling effectors. PPIs in signaling and regulatory networks are known to be mediated by short linear motifs (SLiMs), which are conserved contiguous regions of 3-10 amino acids within interacting protein domains. SLiMs are the minimum sequences required for modulating cellular PPI networks. Thus, several in silico approaches have been developed to predict and analyze SLiM-mediated PPIs. In this review, we focus on emerging evidence supporting a crucial role for SLiMs in driver pathways that are disrupted in colorectal cancer (CRC) tumorigenesis and related PPI network alterations. As a result, SLiMs, along with short peptides, are attracting the interest of researchers to devise small molecules amenable to be used as novel anti-CRC targeted therapies. Overall, the characterization of SLiMs mediating crucial PPIs in CRC may foster the development of more specific combined pharmacological approaches.
Affiliation(s)
- Candida Fasano
- Medical Genetics, National Institute of Gastroenterology-IRCCS “Saverio de Bellis”, Castellana Grotte, 70013 Bari, Italy
- Valentina Grossi
- Medical Genetics, National Institute of Gastroenterology-IRCCS “Saverio de Bellis”, Castellana Grotte, 70013 Bari, Italy
- Giovanna Forte
- Medical Genetics, National Institute of Gastroenterology-IRCCS “Saverio de Bellis”, Castellana Grotte, 70013 Bari, Italy
- Cristiano Simone
- Medical Genetics, National Institute of Gastroenterology-IRCCS “Saverio de Bellis”, Castellana Grotte, 70013 Bari, Italy
- Medical Genetics, Department of Precision and Regenerative Medicine and Jonic Area (DiMePRe-J), University of Bari Aldo Moro, 70124 Bari, Italy
6
Zhao Z, Wang Z, Wu Y, Liao D, Zhao B. Comprehensive analysis of TAMs marker genes in glioma for predicting prognosis and immunotherapy response. Mol Immunol 2022; 144:78-95. [DOI: 10.1016/j.molimm.2022.02.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/28/2021] [Revised: 02/05/2022] [Accepted: 02/10/2022] [Indexed: 12/17/2022]