1
Kim S, Qin Y, Park HJ, Bohn RIC, Yue M, Xu Z, Forno E, Chen W, Celedón JC. MOSES: a methylation-based gene association approach for unveiling environmentally regulated genes linked to a trait or disease. Clin Epigenetics 2024; 16:161. [PMID: 39558360 PMCID: PMC11574994 DOI: 10.1186/s13148-024-01776-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 11/06/2024] [Indexed: 11/20/2024] Open
Abstract
BACKGROUND: DNA methylation is a critical regulatory mechanism of gene expression, influencing various human diseases and traits. While traditional expression quantitative trait loci (eQTL) studies have helped elucidate the genetic regulation of gene expression, there is a growing need to explore environmental influences on gene expression. Existing methods such as PrediXcan and FUSION focus on genotype-based associations but overlook the impact of environmental factors. To address this gap, we present MOSES (methylation-based gene association), a novel approach that utilizes DNA methylation to identify environmentally regulated genes associated with traits or diseases without relying on measured gene expression.
RESULTS: MOSES involves training, imputation, and association testing. It employs elastic-net penalized regression models to estimate the influence of CpGs and SNPs (if available) on gene expression. We developed and compared four MOSES versions incorporating different methylation and genetic data: (1) cis-DNA methylation within 1 Mb of promoter regions, (2) both cis-SNPs and cis-CpGs, (3) both cis-CpGs and a subset of trans-CpGs (within ±5 Mb of promoter regions), and (4) long-range DNA methylation (within ±10 Mb of promoter regions). Our analysis using nasal epithelium and white blood cell data from the Epigenetic Variation and Childhood Asthma in Puerto Ricans (EVA-PR) study demonstrated that MOSES, particularly the version incorporating long-range CpGs (MOSES-DNAm 10 M), significantly outperformed existing methods like PrediXcan, MethylXcan, and Biomethyl in predicting gene expression. MOSES-DNAm 10 M identified more differentially expressed genes (DEGs) associated with atopic asthma, particularly those involved in immune pathways, highlighting its superior performance in uncovering environmentally regulated genes. Further application of MOSES to lung tissue data from idiopathic pulmonary fibrosis (IPF) patients confirmed its robustness and versatility across different diseases and tissues.
CONCLUSION: MOSES represents an innovative advancement in gene association studies, leveraging DNA methylation to capture the influence of environmental factors on gene expression. By incorporating long-range CpGs, MOSES-DNAm 10 M provides superior predictive accuracy and gene association capabilities compared to traditional genotype-based methods. This novel approach offers valuable insights into the complex interplay between genetics and the environment, enhancing our understanding of disease mechanisms and potentially guiding therapeutic strategies. The user-friendly MOSES R package is publicly available to advance studies in various diseases, including immune-related conditions like asthma.
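The four MOSES versions differ mainly in which CpGs (and SNPs) around a promoter are offered to the elastic-net model. The following Python sketch illustrates only that window-selection step. The function name, the CpG identifiers, the toy coordinates, and the use of a single promoter position are illustrative assumptions and are not taken from the MOSES R package.

```python
# Hypothetical sketch: pick candidate CpGs by distance from a promoter.
# The window sizes (1, 5, 10 Mb) follow the abstract; all names and
# coordinates below are invented for illustration.

MB = 1_000_000


def cpgs_in_window(cpg_positions, promoter_pos, window_mb):
    """Return sorted IDs of CpGs lying within +/- window_mb of the promoter.

    cpg_positions maps a CpG ID to its base-pair position on the same
    chromosome as the promoter (an assumption of this sketch).
    """
    limit = window_mb * MB
    return sorted(
        cpg_id
        for cpg_id, pos in cpg_positions.items()
        if abs(pos - promoter_pos) <= limit
    )


# Toy CpG coordinates on one chromosome.
toy_cpgs = {
    "cpg_a": 50_200_000,  # 0.2 Mb from the promoter
    "cpg_b": 53_500_000,  # 3.5 Mb
    "cpg_c": 41_000_000,  # 9 Mb
    "cpg_d": 75_000_000,  # 25 Mb
}
promoter = 50_000_000

cis_1mb = cpgs_in_window(toy_cpgs, promoter, 1)
mid_5mb = cpgs_in_window(toy_cpgs, promoter, 5)
long_10mb = cpgs_in_window(toy_cpgs, promoter, 10)
print(cis_1mb, mid_5mb, long_10mb)
```

In a full pipeline the selected CpGs would become the predictor columns of a penalized regression fitted per gene; that fitting step is deliberately left out here.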
Affiliation(s)
- Soyeon Kim
  - Division of Pulmonary Medicine, Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Yidi Qin
  - Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Hyun Jung Park
  - Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Rebecca I Caldino Bohn
  - Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Molin Yue
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Zhongli Xu
  - School of Medicine, Tsinghua University, Beijing, China
- Erick Forno
  - Division of Pulmonary Medicine, Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Wei Chen
  - Division of Pulmonary Medicine, Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Juan C Celedón
  - Division of Pulmonary Medicine, Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
2
Bose B, Bozdag S. Identifying cell lines across pan-cancer to be used in preclinical research as a proxy for patient tumor samples. Commun Biol 2024; 7:1101. [PMID: 39244634 PMCID: PMC11380668 DOI: 10.1038/s42003-024-06812-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 08/30/2024] [Indexed: 09/09/2024] Open
Abstract
In pre-clinical trials of anti-cancer drugs, cell lines are used as models of patient tumor samples to understand drug response. However, in vitro culture generally alters the biology of cell lines and likely gives rise to systematic differences from the genomic profiles of tumor samples; hence the drug response of cell lines may deviate from that of actual patients. In this study, we computed a similarity score to select cell lines that closely or distantly resemble patient tumor samples in twenty-two cancer types at the genetic, genomic, and epigenetic levels by integrating multi-omics datasets. We also incorporated the presence of immune cells in tumor samples and cancer-related biological pathways into this score, which aids personalized medicine research in cancer. We showed that, based on these similarity scores, cell lines were able to recapitulate the drug response of patient tumor samples for several FDA-approved cancer drugs in multiple cancer types. Based on these scores, several of the high-ranking cell lines were shown to closely resemble the corresponding tumor type in previously reported in vitro experiments.
Affiliation(s)
- Banabithi Bose
  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
  - Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Serdar Bozdag
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX, USA
  - Department of Mathematics, University of North Texas, Denton, TX, USA
  - BioDiscovery Institute, University of North Texas, Denton, TX, USA
  - Center for Computational Life Sciences, University of North Texas, Denton, TX, USA
3
Gristina V, Pepe F, Genova C, Bazan Russo TD, Gottardo A, Russo G, Incorvaia L, Galvano A, Badalamenti G, Bazan V, Troncone G, Russo A, Malapelle U. Harnessing the potential of genomic characterization of mutational profiles to improve early diagnosis of lung cancer. Expert Rev Mol Diagn 2024; 24:793-802. [PMID: 39267426 DOI: 10.1080/14737159.2024.2403081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Accepted: 09/08/2024] [Indexed: 09/17/2024]
Abstract
INTRODUCTION: Lung Cancer (LC) continues to be a leading cause of cancer-related mortality globally, largely due to the asymptomatic nature of its early stages and the limitations of current diagnostic methods such as Low-Dose Computed Tomography (LDCT), which often result in late diagnosis. This highlights an urgent need for innovative, minimally invasive diagnostic techniques that can improve early detection rates.
AREAS COVERED: This review delves into the potential of genomic characterization and mutational profiling to enhance early LC diagnosis, exploring the current state and limitations of traditional diagnostic approaches and the revolutionary role of Liquid Biopsies (LB), including cell-free DNA (cfDNA) analysis through fragmentomics and methylomics. New genomic technologies that allow for earlier detection of LC are scrutinized, alongside a detailed discussion of the literature that shaped our understanding of this field.
EXPERT OPINION: Despite the promising advancements in genomic characterization techniques, several challenges remain, such as the heterogeneity of LC mutations and the high cost and limited accessibility of Next-Generation Sequencing (NGS) technologies. Additionally, there is a critical need for standardized protocols for interpreting mutational data. Future research should focus on overcoming these barriers to integrate these novel diagnostic methods into standard clinical practice, potentially revolutionizing the management of LC patients.
Affiliation(s)
- Valerio Gristina
  - Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C.), University of Palermo, Palermo, Italy
- Francesco Pepe
  - Department of Public Health, University of Naples Federico II, Naples, Italy
- Carlo Genova
  - Department of Internal Medicine and Medical Specialties, University of Genoa, Genoa, Italy
  - Academic Oncology Unit, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Tancredi Didier Bazan Russo
  - Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C.), University of Palermo, Palermo, Italy
- Andrea Gottardo
  - Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C.), University of Palermo, Palermo, Italy
- Gianluca Russo
  - Department of Public Health, University of Naples Federico II, Naples, Italy
- Lorena Incorvaia
  - Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C.), University of Palermo, Palermo, Italy
- Antonio Galvano
  - Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C.), University of Palermo, Palermo, Italy
- Giuseppe Badalamenti
  - Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C.), University of Palermo, Palermo, Italy
- Viviana Bazan
  - Department of Biomedicine, Neuroscience and Advanced Diagnostics (BIND), University of Palermo, Palermo, Italy
- Giancarlo Troncone
  - Department of Public Health, University of Naples Federico II, Naples, Italy
- Antonio Russo
  - Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C.), University of Palermo, Palermo, Italy
- Umberto Malapelle
  - Department of Public Health, University of Naples Federico II, Naples, Italy
4
Dupont ME, Christiansen SN, Jacobsen SB, Kampmann ML, Olsen KB, Tfelt-Hansen J, Banner J, Morling N, Andersen JD. DNA quality evaluation of formalin-fixed paraffin-embedded heart tissue for DNA methylation array analysis. Sci Rep 2023; 13:2004. [PMID: 36737451 PMCID: PMC9898234 DOI: 10.1038/s41598-023-29120-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 01/31/2023] [Indexed: 02/05/2023] Open
Abstract
Archived formalin-fixed and paraffin-embedded (FFPE) heart tissue from autopsied individuals represents an important resource for investigating the DNA methylation of heart tissue of deceased individuals. The DNA quality of FFPE tissue from autopsies may be decreased, affecting the DNA methylation measurements. Therefore, inexpensive screening methods for estimating DNA quality are valuable. We investigated the correlation between the DNA quality of archived FFPE heart tissue examined with the Illumina Infinium HD FFPE QC assay (Infinium QC) and Thermo Fisher's Quantifiler Trio DNA Quantification kit (QuantifilerTrio), respectively, and the amount of usable DNA methylation data as measured by the probe detection rate (probe DR) obtained with the Illumina Infinium MethylationEPIC array. We observed a high correlation (r² = 0.75; p < 10⁻¹¹) between the QuantifilerTrio degradation index, DI, and the amount of usable DNA methylation data analysed with SeSAMe, whereas a much weaker correlation was observed between the Infinium QC and SeSAMe probe DR (r² = 0.17; p < 0.05). Based on the results, QuantifilerTrio DI seems to predict the proportion of usable DNA methylation data analysed with the Illumina Infinium MethylationEPIC array and SeSAMe by a linear model: SeSAMe probe DR = 0.80 − 0.25 × log₁₀(DI).
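The linear model quoted at the end of this abstract can be evaluated directly. The sketch below, in Python, only restates that fitted line; the function name and the example DI values are illustrative assumptions, and no claim is made about the range of DI over which the fit is valid.

```python
import math


def predicted_probe_dr(degradation_index):
    """Predicted SeSAMe probe detection rate from the QuantifilerTrio DI.

    Implements the line quoted in the abstract:
    probe DR = 0.80 - 0.25 * log10(DI).
    """
    if degradation_index <= 0:
        raise ValueError("degradation index must be positive")
    return 0.80 - 0.25 * math.log10(degradation_index)


# Example DI values chosen purely for illustration.
for di in (1, 10, 100):
    print(di, round(predicted_probe_dr(di), 2))
```

Each tenfold increase in DI lowers the predicted probe detection rate by 0.25 under this model.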
Affiliation(s)
- Mikkel E Dupont
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Steffan N Christiansen
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Stine B Jacobsen
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Marie-Louise Kampmann
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kristine B Olsen
  - Section of Forensic Pathology, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Jacob Tfelt-Hansen
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Jytte Banner
  - Section of Forensic Pathology, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Niels Morling
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark
- Jeppe D Andersen
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
5
Manu DM, Mwinyi J, Schiöth HB. Challenges in Analyzing Functional Epigenetic Data in Perspective of Adolescent Psychiatric Health. Int J Mol Sci 2022; 23:5856. [PMID: 35628666 PMCID: PMC9147258 DOI: 10.3390/ijms23105856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 05/11/2022] [Accepted: 05/18/2022] [Indexed: 12/10/2022] Open
Abstract
The formative period of adolescence plays a crucial role in the development of skills and abilities for adulthood. Adolescents who are affected by mental health conditions are at risk of suicide and social and academic impairments. Gene-environment complementary contributions to the molecular mechanisms involved in psychiatric disorders have emphasized the need to analyze epigenetic marks such as DNA methylation (DNAm) and non-coding RNAs. However, the wide range of bioinformatic and statistical methods in use, including the choice of confounders in statistical models, the application of multiple-testing adjustment methods, questions regarding the correlation of DNAm across tissues, and sex-dependent differences in results, has made the results difficult to interpret. Using generalized anxiety disorder (GAD) and major depressive disorder (MDD) as examples, we shed light on the current knowledge and usage of methodological tools in analyzing epigenetics. Statistical robustness is an essential prerequisite for a better understanding and interpretation of epigenetic modifications and helps to find novel targets for personalized therapeutics in psychiatric diseases.
Affiliation(s)
- Diana M. Manu
  - Department of Surgical Sciences, Functional Pharmacology and Neuroscience, Uppsala University, 751 24 Uppsala, Sweden; (J.M.); (H.B.S.)
6
Muthamilselvan S, Raghavendran A, Palaniappan A. Stage-differentiated ensemble modeling of DNA methylation landscapes uncovers salient biomarkers and prognostic signatures in colorectal cancer progression. PLoS One 2022; 17:e0249151. [PMID: 35202405 PMCID: PMC8870460 DOI: 10.1371/journal.pone.0249151] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/16/2021] [Accepted: 02/01/2022] [Indexed: 12/15/2022] Open
Abstract
Background: Aberrant DNA methylation acts epigenetically to skew the gene transcription rate up or down, contributing to cancer etiology. A gap in our understanding concerns the epigenomics of stagewise cancer progression. In this study, we have developed a comprehensive computational framework for the stage-differentiated modelling of DNA methylation landscapes in colorectal cancer (CRC).
Methods: The methylation β-matrix was derived from the public-domain TCGA data, converted into an M-value matrix, annotated with AJCC stages, and analysed for stage-salient genes using an ensemble of approaches involving stage-differentiated modelling of methylation patterns and/or expression patterns. Differentially methylated genes (DMGs) were identified using a contrast against controls (adjusted p-value <0.001 and |log fold-change of M-value| >2), and then filtered using a series of all possible pairwise stage contrasts (p-value <0.05) to obtain stage-salient DMGs. These were then subjected to a consensus analysis, followed by matching with clinical data and performing Kaplan–Meier survival analysis to evaluate the impact of methylation patterns of consensus stage-salient biomarkers on disease prognosis.
Results: We found significant genome-wide changes in methylation patterns in cancer cases relative to controls, agnostic of stage. The stage-differentiated models yielded the following consensus salient genes: one stage-I gene (FBN1), one stage-II gene (FOXG1), one stage-III gene (HCN1) and four stage-IV genes (NELL1, ZNF135, FAM123A, LAMA1). All the biomarkers were significantly hypermethylated in the promoter regions, indicating down-regulation of expression and implying a putative CpG island Methylator Phenotype (CIMP) manifestation. A prognostic signature consisting of FBN1 and FOXG1 survived all the analytical filters, and represents a novel early-stage epigenetic biomarker / target.
Conclusions: We have designed and executed a workflow for stage-differentiated epigenomic analysis of colorectal cancer progression, and identified several stage-salient diagnostic biomarkers, and an early-stage prognostic biomarker panel. The study has led to the discovery of an alternative CIMP-like signature in colorectal cancer, reinforcing the role of CIMP drivers in tumor pathophysiology.
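The two-step filter in the Methods can be sketched as follows in Python. The thresholds come from the abstract; the function names, the toy p-values, and in particular the reading of "stage-salient" as "exactly one stage is significant against every other stage" are assumptions made for illustration and may differ from the authors' workflow.

```python
# Hypothetical sketch of the two-step DMG filter described above.

ADJ_P_MAX = 0.001       # adjusted p-value cutoff, cancer vs control
ABS_LFC_MIN = 2.0       # cutoff on |log fold-change of M-value|
PAIRWISE_P_MAX = 0.05   # cutoff for each stage-vs-stage contrast
STAGES = ("I", "II", "III", "IV")


def is_dmg(adj_p, lfc):
    """Step 1: differentially methylated relative to controls."""
    return adj_p < ADJ_P_MAX and abs(lfc) > ABS_LFC_MIN


def salient_stage(pairwise_p):
    """Step 2: return the single stage that differs from all others, or None.

    pairwise_p maps frozenset({stage_a, stage_b}) to a contrast p-value.
    Requiring exactly one such stage is this sketch's assumption.
    """
    hits = [
        s for s in STAGES
        if all(pairwise_p[frozenset((s, t))] < PAIRWISE_P_MAX
               for t in STAGES if t != s)
    ]
    return hits[0] if len(hits) == 1 else None


def make_pairs(values):
    """Build the pairwise p-value table from a compact dict of tuples."""
    return {frozenset(pair): p for pair, p in values.items()}


# Toy gene: passes step 1 and is distinct only at stage I.
gene_x_pairs = make_pairs({
    ("I", "II"): 0.01, ("I", "III"): 0.02, ("I", "IV"): 0.001,
    ("II", "III"): 0.40, ("II", "IV"): 0.30, ("III", "IV"): 0.60,
})
gene_x_stage = salient_stage(gene_x_pairs) if is_dmg(1e-5, 2.6) else None
print(gene_x_stage)
```

Genes that fail the control contrast are dropped before any stage contrast is inspected, which mirrors the order given in the abstract.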
Affiliation(s)
- Sangeetha Muthamilselvan
  - Department of Bioinformatics, School of Chemical and BioTechnology, SASTRA Deemed University, Thanjavur, India
- Abirami Raghavendran
  - Department of Bioinformatics, School of Chemical and BioTechnology, SASTRA Deemed University, Thanjavur, India
- Ashok Palaniappan
  - Department of Bioinformatics, School of Chemical and BioTechnology, SASTRA Deemed University, Thanjavur, India
7
Minegishi R, Gotoh O, Tanaka N, Maruyama R, Chang JT, Mori S. A method of sample-wise region-set enrichment analysis for DNA methylomics. Epigenomics 2021; 13:1081-1093. [PMID: 34241544 DOI: 10.2217/epi-2021-0065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open
Abstract
Aim: Gene set analysis has commonly been used to interpret DNA methylome data. However, summarizing the DNA methylation level of a gene is challenging due to variability in the number, density and methylation levels of CpG sites, and the numerous intergenic CpGs. Instead, we propose to use region sets to annotate the DNA methylome. Methods: We developed single sample region-set enrichment analysis for DNA methylome (methyl-ssRSEA) to conduct sample-wise, region-set enrichment analysis. Results: Methyl-ssRSEA can handle both microarray- and sequencing-based platforms and reproducibly recover the known biology from the methylation profiles of peripheral blood cells and breast cancers. The performance was superior to existing tools for region-set analysis in discriminating blood cell types. Conclusion: Methyl-ssRSEA offers a novel way to functionally interpret the DNA methylome in the cell.
Affiliation(s)
- Ryu Minegishi
  - Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Osamu Gotoh
  - Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Norio Tanaka
  - Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Reo Maruyama
  - Project for Cancer Epigenomics, Cancer Institute, Japanese Foundation for Cancer Research, Tokyo, Japan
- Jeffrey T Chang
  - Department of Integrative Biology & Pharmacology, University of Texas Health Science Center, Houston, TX 77030, USA
- Seiichi Mori
  - Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
8
Wu M, Yi H, Ma S. Vertical integration methods for gene expression data analysis. Brief Bioinform 2021; 22:bbaa169. [PMID: 32793970 PMCID: PMC8138889 DOI: 10.1093/bib/bbaa169] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/24/2020] [Revised: 06/18/2020] [Accepted: 07/04/2020] [Indexed: 12/12/2022] Open
Abstract
Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a 'lack of information' problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where vertical integration methods collectively analyze data on gene expressions as well as their regulators (such as mutations, DNA methylation and miRNAs). In this article, we conduct a selective review of vertical data integration methods for gene expression data. The reviewed methods cover both marginal and joint analysis and supervised and unsupervised analysis. The main goal is to provide a sketch of the vertical data integration paradigm without digging into too many technical details. We also briefly discuss potential pitfalls, directions for future developments and application notes.
Affiliation(s)
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics
- Huangdi Yi
  - Department of Biostatistics at Yale University
- Shuangge Ma
  - Department of Biostatistics at Yale University
9
Kruppa J, Sieg M, Richter G, Pohrt A. Estimands in epigenome-wide association studies. Clin Epigenetics 2021; 13:98. [PMID: 33926513 PMCID: PMC8086103 DOI: 10.1186/s13148-021-01083-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/12/2020] [Accepted: 04/19/2021] [Indexed: 12/11/2022] Open
Abstract
Background: In DNA methylation analyses like epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites. As biological effect measures, differences of M-values are more or less meaningless. Beta-values are of more interest since they can be interpreted directly as differences in percentage of DNA methylation at a given CpG site, but they have poor statistical properties. Different frameworks are proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both.
Results: We present and discuss four possible approaches to obtaining estimands in DNA methylation analysis. In addition, we present the usage of M-values or Beta-values in the context of bioinformatical pipelines, which often demand a predefined outcome. We show how differences in M-values relate to differences in Beta-values in two data simulations: an analysis with and an analysis without a confounder effect. Without confounder effects, M-values can be used for the statistical analysis and Beta-value statistics for the reporting. If confounder effects exist, we demonstrate the deviations and correct the effects by the intercept method. Finally, we demonstrate the theoretical problem on two large human genome-wide DNA methylation datasets to verify the results.
Conclusions: The usage of M-values in the analysis of DNA methylation data will produce effect estimates that cannot be biologically interpreted. The parallel usage of Beta-value statistics ignores possible confounder effects and therefore cannot be recommended. Hence, if the differences in Beta-values are the focus of the study, the intercept method is recommended. Hyper- or hypomethylated CpG sites must then be carefully evaluated. If an exploratory analysis of possible CpG sites is the aim of the study, M-values can be used for inference.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13148-021-01083-9.
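The interpretability problem described in this abstract follows from the nonlinear link between the two scales. The Python sketch below uses the standard logit-base-2 definition of the M-value, which the abstract itself does not spell out; the function names and baseline values are illustrative assumptions.

```python
import math


def beta_to_m(beta):
    """M-value: log2 ratio of methylated to unmethylated signal fraction."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse transform back to the 0-1 Beta scale."""
    return 2.0 ** m / (1.0 + 2.0 ** m)


# The same M-value difference applied at two different baseline Beta-values.
delta_m = 1.0
for baseline in (0.5, 0.9):
    shifted = m_to_beta(beta_to_m(baseline) + delta_m)
    print(baseline, round(shifted - baseline, 3))
```

A shift of one M-value unit moves a site from 50% methylation to about 67%, but a site at 90% only to about 95%, which is why an M-value difference alone does not translate into a fixed change in percentage methylation.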
Affiliation(s)
- Jochen Kruppa
  - Charité - University Medicine, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany
  - Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany
- Miriam Sieg
  - Charité - University Medicine, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany
  - Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany
- Gesa Richter
  - Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany
  - Department of Periodontology and Synoptic Dentistry, Institute of Dental, Oral and Maxillary Medicine, Charité - University Medicine, Charitéplatz 1, 10117, Berlin, Germany
- Anne Pohrt
  - Charité - University Medicine, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany
  - Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany
10
Mattesen TB, Andersen CL, Bramsen JB. MethCORR infers gene expression from DNA methylation and allows molecular analysis of ten common cancer types using fresh-frozen and formalin-fixed paraffin-embedded tumor samples. Clin Epigenetics 2021; 13:20. [PMID: 33509261 PMCID: PMC7842045 DOI: 10.1186/s13148-021-01000-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2020] [Accepted: 01/01/2021] [Indexed: 11/10/2022] Open
Abstract
Background: Transcriptional analysis is widely used to study the molecular biology of cancer and holds great biomarker potential for clinical patient stratification. Yet, accurate transcriptional profiling requires RNA of a high quality, which often cannot be retrieved from formalin-fixed, paraffin-embedded (FFPE) tumor tissue that is routinely collected and archived in clinical departments. To overcome this roadblock to clinical testing, we previously developed MethCORR, a method that infers gene expression from DNA methylation data, which is robustly retrieved from FFPE tissue. MethCORR was originally developed for colorectal cancer and with this study, we aim to: (1) extend the MethCORR method to 10 additional cancer types and (2) illustrate that the inferred gene expression is accurate and clinically informative.
Results: Regression models to infer gene expression information from DNA methylation were developed for ten common cancer types using matched RNA sequencing and DNA methylation profiles (HumanMethylation450 BeadChip) from The Cancer Genome Atlas Project. Robust and accurate gene expression profiles were inferred for all cancer types: on average, the expression of 11,000 genes was modeled with good accuracy and an intra-sample correlation of R² = 0.90 between inferred and measured gene expression was observed. Molecular pathway analysis and transcriptional subtyping were performed for breast, prostate, and lung cancer samples to illustrate the general usability of the inferred gene expression profiles: overall, a high correlation of r = 0.96 (Pearson) in pathway enrichment scores and a 76% correspondence in molecular subtype calls were observed when using measured and inferred gene expression as input. Finally, inferred expression from FFPE tissue correlated better with RNA sequencing data from matched fresh-frozen tissue than did RNA sequencing data from FFPE tissue (P < 0.0001; Wilcoxon rank-sum test).
Conclusions: In all cancers investigated, MethCORR enabled DNA methylation-based transcriptional analysis, thus enabling future analysis of cancer in situations where high-quality DNA, but not RNA, is available. Here, we provide the framework and resources for MethCORR modeling of ten common cancer types, thereby widely expanding the possibilities for transcriptional studies of archival FFPE material.
Affiliation(s)
- Trine B Mattesen
  - Department of Molecular Medicine, Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
- Claus L Andersen
  - Department of Molecular Medicine, Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
- Jesper B Bramsen
  - Department of Molecular Medicine, Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
11
A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data. Genes (Basel) 2020; 11:genes11080931. [PMID: 32806782 PMCID: PMC7465138 DOI: 10.3390/genes11080931] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/21/2020] [Revised: 08/03/2020] [Accepted: 08/06/2020] [Indexed: 12/12/2022] Open
Abstract
DNA methylation change has been useful for cancer biomarker discovery, classification, and potential treatment development. So far, existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address these limitations, in this study, we introduced a combinatorial framework using linear regression, differential expression analysis, and a deep learning method for accurate biological interpretation of DNA methylation through integrating DNA methylation data and corresponding TCGA gene expression data. We demonstrated it for uterine cervical cancer. First, we pre-filtered outliers from the data set and then determined the predicted gene expression value from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) by the Empirical Bayes test using Limma. Then we applied a deep learning method, "nnet", to classify the cervical cancer label of those DEGs and to determine all classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760, 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with false discovery rate (FDR) <0.001. After performing deep learning analysis, we obtained an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancer labels. This performance is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using Cytoscape. We reported five top in-degree genes (PAIP2, GRWD1, VPS4B, CRADD and LLPH) and five top out-degree genes (MRPL35, FAM177A1, STAT4, ASPSCR1 and FABP7). After that, we performed KEGG pathway and Gene Ontology enrichment analysis of DEGs using the WebGestalt tool (WEB-based Gene SeT AnaLysis Toolkit). In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach to better interpret DNA methylation analysis and gene expression data in disease study.
|
12
|
Kim S, Park HJ, Cui X, Zhi D. Collective effects of long-range DNA methylations predict gene expressions and estimate phenotypes in cancer. Sci Rep 2020; 10:3920. [PMID: 32127627 PMCID: PMC7054398 DOI: 10.1038/s41598-020-60845-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/10/2019] [Accepted: 02/07/2020] [Indexed: 01/12/2023]
Abstract
DNA methylation of various genomic regions has been found to be associated with gene expression in diverse biological contexts. However, most genome-wide studies have focused on the effect of (1) methylation in cis, not in trans, and (2) a single CpG, not the collective effects of multiple CpGs, on gene expression. In this study, we developed a statistical machine learning model, geneEXPLORE (gene expression prediction by long-range epigenetics), that quantifies the collective effects of both cis- and trans-methylation on gene expression. By applying geneEXPLORE to The Cancer Genome Atlas (TCGA) breast cancer data and data from 10 other cancer types, we found that most genes are associated with methylation located as far as 10 Mb or more from their promoters, and that long-range methylation explains 50% of the variation in gene expression on average, far more than cis-methylation. geneEXPLORE outperforms competing methods such as BioMethyl and MethylXcan. Further, the predicted gene expression could predict clinical phenotypes such as breast tumor status and estrogen receptor status (AUC = 0.999 and 0.94, respectively) as accurately as the measured gene expression levels. These results suggest that geneEXPLORE provides a means for accurate imputation of gene expression, which can be further used to predict clinical phenotypes.
Affiliation(s)
- Soyeon Kim
  - Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States; Division of Pediatric Pulmonary Medicine, UPMC Children's Hospital of Pittsburgh, Pittsburgh, Pennsylvania, United States
- Hyun Jung Park
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pennsylvania, United States
- Xiangqin Cui
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States
- Degui Zhi
  - Center for Precision Health, School of Biomedical Informatics, School of Public Health, University of Texas Health Center at Houston, Houston, Texas, United States
|
13
|
Malousi A, Kouidou S, Tsagiopoulou M, Papakonstantinou N, Bouras E, Georgiou E, Tzimagiorgis G, Stamatopoulos K. MeinteR: A framework to prioritize DNA methylation aberrations based on conformational and cis-regulatory element enrichment. Sci Rep 2019; 9:19148. [PMID: 31844073 PMCID: PMC6915744 DOI: 10.1038/s41598-019-55453-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/01/2019] [Accepted: 11/19/2019] [Indexed: 12/16/2022]
Abstract
DNA methylation studies have been reformed with the advent of single-base resolution arrays and bisulfite sequencing methods, enabling deeper investigation of methylation-mediated mechanisms. In addition to these advancements, numerous bioinformatics tools address important computational challenges, covering DNA methylation calling up to multi-modal interpretative analyses. However, contrary to the analytical frameworks that detect driver mutational signatures, the identification of putatively actionable epigenetic events remains an unmet need. The present work describes a novel computational framework, called MeinteR, that prioritizes critical DNA methylation events based on the following hypothesis: critical aberrations of DNA methylation more likely occur on a genomic substrate that is enriched in cis-acting regulatory elements with distinct structural characteristics, rather than in genomic “deserts”. In this context, the framework incorporates functional cis-elements, e.g. transcription factor binding sites, tentative splice sites, as well as conformational features, such as G-quadruplexes and palindromes, to identify critical epigenetic aberrations with potential implications on transcriptional regulation. The evaluation on multiple, public cancer datasets revealed significant associations between the highest-ranking loci with gene expression and known driver genes, enabling for the first time the computational identification of high impact epigenetic changes based on high-throughput DNA methylation data.
Affiliation(s)
- Andigoni Malousi
  - Lab. of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Sofia Kouidou
  - Lab. of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Maria Tsagiopoulou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Nikos Papakonstantinou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Emmanouil Bouras
  - Lab. of Hygiene, Social-Preventive Medicine & Medical Statistics, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Elisavet Georgiou
  - Lab. of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Georgios Tzimagiorgis
  - Lab. of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Kostas Stamatopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
|
14
|
Rauluseviciute I, Drabløs F, Rye MB. DNA methylation data by sequencing: experimental approaches and recommendations for tools and pipelines for data analysis. Clin Epigenetics 2019; 11:193. [PMID: 31831061 PMCID: PMC6909609 DOI: 10.1186/s13148-019-0795-x] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 10/21/2019] [Accepted: 12/04/2019] [Indexed: 02/06/2023]
Abstract
Sequencing technologies have changed not only our approaches to classical genetics, but also the field of epigenetics. Specific methods allow scientists to identify novel genome-wide epigenetic patterns of DNA methylation down to single-nucleotide resolution. DNA methylation is the most researched epigenetic mark involved in various processes in the human cell, including gene regulation and the development of diseases such as cancer. Increasing numbers of DNA methylation sequencing datasets from the human genome are produced using various platforms, from methylated DNA precipitation to whole-genome bisulfite sequencing. Many of those datasets are fully accessible for repeated analyses. Sequencing experiments have become routine in laboratories around the world, while analysis of the resulting data is still a challenge for the majority of scientists, since in many cases it requires advanced computational skills. Even though various tools are being created and published, guidelines for their selection are often not clear, especially to non-bioinformaticians with limited experience in computational analyses. Separate tools are often used for individual steps in the analysis, and these can be challenging to manage and integrate. However, in some instances, tools are combined into pipelines that are capable of completing all the essential steps to achieve the result. In the case of DNA methylation sequencing analysis, the goal of such a pipeline is to map sequencing reads, calculate methylation levels, and distinguish differentially methylated positions and/or regions. The objective of this review is to describe basic principles and steps in the analysis of DNA methylation sequencing data that in particular have been used for mammalian genomes, and more importantly to present and discuss the most prominent computational pipelines that can be used to analyze such data. We aim to provide a good starting point for scientists with limited experience in computational analyses of DNA methylation and hydroxymethylation data, and recommend a few tools that are powerful, but still easy enough to use for their own data analysis.
Affiliation(s)
- Ieva Rauluseviciute
  - Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, P.O. Box 8905, NO-7491, Trondheim, Norway
- Finn Drabløs
  - Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, P.O. Box 8905, NO-7491, Trondheim, Norway
- Morten Beck Rye
  - Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, P.O. Box 8905, NO-7491, Trondheim, Norway; Clinic of Surgery, St. Olavs Hospital, Trondheim University Hospital, NO-7030, Trondheim, Norway
|