1
Fu Y, Timp W, Sedlazeck FJ. Computational analysis of DNA methylation from long-read sequencing. Nat Rev Genet 2025:10.1038/s41576-025-00822-5. [PMID: 40155770 DOI: 10.1038/s41576-025-00822-5] [Accepted: 01/30/2025] [Indexed: 04/01/2025]
Abstract
DNA methylation is a critical epigenetic mechanism in numerous biological processes, including gene regulation, development, ageing and the onset of various diseases such as cancer. Studies of methylation are increasingly using single-molecule long-read sequencing technologies to simultaneously measure epigenetic states such as DNA methylation with genomic variation. These long-read data sets have spurred the continuous development of advanced computational methods to gain insights into the roles of methylation in regulating chromatin structure and gene regulation. In this Review, we discuss the computational methods for calling methylation signals, contrasting methylation between samples, analysing cell-type diversity and gaining additional genomic insights, and then further discuss the challenges and future perspectives of tool development for DNA methylation research.
Affiliation(s)
- Yilei Fu
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
2
Fu MPY, Merrill SM, Korthauer K, Kobor MS. Examining cellular heterogeneity in human DNA methylation studies: Overview and recommendations. STAR Protoc 2025; 6:103638. [PMID: 39951379 PMCID: PMC11969412 DOI: 10.1016/j.xpro.2025.103638] [Received: 09/15/2023] [Revised: 11/20/2024] [Accepted: 01/23/2025] [Indexed: 02/16/2025]
Abstract
Intersample cellular heterogeneity (ISCH) is one of the largest contributors to DNA methylation (DNAme) variability. It is imperative to account for ISCH to accurately interpret analysis results in epigenome-wide association studies. We compiled this primer based on the current literature to guide researchers through the process of estimating and accounting for ISCH in DNA methylation studies. This primer outlines the procedure of bioinformatic ISCH prediction, including using reference-based and reference-free algorithms. It then follows with descriptions of several methods to account for ISCH in downstream analyses, including robust linear regression and principal-component-analysis-based adjustments. Finally, we outlined three methods for estimating differential DNAme signals in a cell-type-specific manner. Throughout the primer, we provided statistical and biological justification for our recommendations, as well as R code examples for ease of implementation.
Affiliation(s)
- Maggie Po-Yuan Fu
  - BC Children's Hospital Research Institute, Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Sarah Martin Merrill
  - BC Children's Hospital Research Institute, Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Department of Psychiatry and Human Behavior, The Warren Alpert Medical School at Brown University, Providence, RI, USA
- Keegan Korthauer
  - BC Children's Hospital Research Institute, Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Department of Statistics, University of British Columbia, Vancouver, BC, Canada
- Michael Steffen Kobor
  - BC Children's Hospital Research Institute, Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Edwin S.H. Leong Centre for Healthy Aging, University of British Columbia, Vancouver, BC, Canada
3
Cheng Y, Cai B, Li H, Zhang X, D'Souza G, Shrestha S, Edmonds A, Meyers J, Fischl M, Kassaye S, Anastos K, Cohen M, Aouizerat BE, Xu K, Zhao H. HBI: a hierarchical Bayesian interaction model to estimate cell-type-specific methylation quantitative trait loci incorporating priors from cell-sorted bisulfite sequencing data. Genome Biol 2024; 25:273. [PMID: 39407252 PMCID: PMC11476968 DOI: 10.1186/s13059-024-03411-7] [Received: 12/19/2023] [Accepted: 09/30/2024] [Indexed: 10/20/2024]
Abstract
Methylation quantitative trait loci (meQTLs) quantify the effects of genetic variants on DNA methylation levels. However, most published studies utilize bulk methylation datasets composed of different cell types, which limits our understanding of cell-type-specific methylation regulation. We propose a hierarchical Bayesian interaction (HBI) model to infer cell-type-specific meQTLs, which integrates large-scale bulk methylation data with small-scale cell-type-specific methylation data. Through simulations, we show that HBI enhances the estimation of cell-type-specific meQTLs. In real data analyses, we demonstrate that HBI can further improve the functional annotation of genetic variants and identify biologically relevant cell types for complex traits.
Affiliation(s)
- Youshu Cheng
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06511, USA
  - VA Connecticut Healthcare System, West Haven, CT, 06516, USA
- Biao Cai
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06511, USA
- Hongyu Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06511, USA
- Xinyu Zhang
  - VA Connecticut Healthcare System, West Haven, CT, 06516, USA
  - Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA
- Gypsyamber D'Souza
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Sadeep Shrestha
  - Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Andrew Edmonds
  - The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jacquelyn Meyers
  - Department of Psychiatry, SUNY Downstate Health Sciences University School of Medicine, Brooklyn, NY, USA
- Margaret Fischl
  - Department of Medicine, University of Miami School of Medicine, Miami, FL, USA
- Seble Kassaye
  - Division of Infectious Diseases and Tropical Medicine, Georgetown University, Washington, DC, USA
- Kathryn Anastos
  - Department of Medicine, Albert Einstein College of Medicine, New York, NY, USA
- Mardge Cohen
  - Hektoen Institute for Medical Research, Chicago, IL, USA
- Bradley E Aouizerat
  - Bluestone Center for Clinical Research, College of Dentistry, New York University, New York, NY, USA
  - Department of Oral and Maxillofacial Surgery, College of Dentistry, New York University, New York, NY, USA
- Ke Xu
  - VA Connecticut Healthcare System, West Haven, CT, 06516, USA
  - Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06511, USA
  - VA Connecticut Healthcare System, West Haven, CT, 06516, USA
4
Gorla A, Witonsky J, Elhawary JR, Chen ZJ, Mefford J, Perez-Garcia J, Huntsman S, Hu D, Eng C, Woodruff PG, Sankararaman S, Ziv E, Flint J, Zaitlen N, Burchard E, Rahmani E. Epigenetic patient stratification via contrastive machine learning refines hallmark biomarkers in minoritized children with asthma. Research Square 2024:rs.3.rs-5066762. [PMID: 39315258 PMCID: PMC11419268 DOI: 10.21203/rs.3.rs-5066762/v1] [Indexed: 09/25/2024]
Abstract
Identifying and refining clinically significant patient stratification is a critical step toward realizing the promise of precision medicine in asthma. Several peripheral blood hallmarks, including total peripheral blood eosinophil count (BEC) and immunoglobulin E (IgE) levels, are routinely used in asthma clinical practice for endotype classification and predicting response to state-of-the-art targeted biologic drugs. However, these biomarkers appear ineffective in predicting treatment outcomes in some patients, and they differ in distribution between racially and ethnically diverse populations, potentially compromising medical care and hindering health equity due to biases in drug eligibility. Here, we propose constructing an unbiased patient stratification score based on DNA methylation (DNAm) and utilizing it to refine the efficacy of hallmark biomarkers for predicting drug response. We developed Phenotype Aware Component Analysis (PACA), a novel contrastive machine-learning method for learning combinations of DNAm sites reflecting biomedically meaningful patient stratifications. Leveraging whole-blood DNAm from Latino (discovery; n=1,016) and African American (replication; n=756) pediatric asthma case-control cohorts, we applied PACA to refine the prediction of bronchodilator response (BDR) to the short-acting β2-agonist albuterol, the most used drug to treat acute bronchospasm worldwide. While BEC and IgE correlate with BDR in the general patient population, our PACA-derived DNAm score renders these biomarkers predictive of drug response only in patients with high DNAm scores. BEC correlates with BDR in patients with upper-quartile DNAm scores (OR 1.12; 95% CI [1.04, 1.22]; P=7.9e-4) but not in patients with lower-quartile scores (OR 1.05; 95% CI [0.95, 1.17]; P=0.21); and IgE correlates with BDR in above-median (OR for response 1.42; 95% CI [1.24, 1.63]; P=3.9e-7) but not in below-median patients (OR 1.05; 95% CI [0.92, 1.2]; P=0.57).
These results hold within the commonly recognized type 2 (T2)-high asthma endotype but not in T2-low patients, suggesting that our DNAm score primarily represents an unknown variation of T2 asthma. Among T2-high patients with high DNAm scores, elevated BEC or IgE also corresponds to baseline clinical presentation that is known to benefit more from biologic treatment, including higher exacerbation scores, higher allergen sensitization, lower BMI, more recent oral corticosteroids prescription, and lower lung function. Our findings suggest that BEC and IgE, the traditional asthma biomarkers of T2-high asthma, are poor biomarkers for millions worldwide. Revisiting existing drug eligibility criteria relying on these biomarkers in asthma medical care may enhance precision and equity in treatment.
Affiliation(s)
- Aditya Gorla
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Jonathan Witonsky
  - Division of Allergy, Immunology, and Bone Marrow Transplant, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Jennifer R Elhawary
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Zeyuan Johnson Chen
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Joel Mefford
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Javier Perez-Garcia
  - Genomics and Health Group, Department of Biochemistry, Microbiology, Cell Biology, and Genetics, University of La Laguna, La Laguna, Spain
- Scott Huntsman
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Donglei Hu
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Celeste Eng
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Prescott G Woodruff
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Sriram Sankararaman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Elad Ziv
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Jonathan Flint
  - Department of Psychiatry and Behavioral Sciences, Brain Research Institute, University of California Los Angeles, Los Angeles, CA, USA
- Noah Zaitlen
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Esteban Burchard
  - Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Elior Rahmani
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
5
Herzog C, Jones A, Evans I, Raut JR, Zikan M, Cibula D, Wong A, Brenner H, Richmond RC, Widschwendter M. Cigarette Smoking and E-cigarette Use Induce Shared DNA Methylation Changes Linked to Carcinogenesis. Cancer Res 2024; 84:1898-1914. [PMID: 38503267 PMCID: PMC11148547 DOI: 10.1158/0008-5472.can-23-2957] [Received: 09/25/2023] [Revised: 11/30/2023] [Accepted: 03/11/2024] [Indexed: 03/21/2024]
Abstract
Tobacco use is a major modifiable risk factor for adverse health outcomes, including cancer, and elicits profound epigenetic changes thought to be associated with long-term cancer risk. While electronic cigarettes (e-cigarettes) have been advocated as harm reduction alternatives to tobacco products, recent studies have revealed potential detrimental effects, highlighting the urgent need for further research into the molecular and health impacts of e-cigarettes. Here, we applied computational deconvolution methods to dissect the cell- and tissue-specific epigenetic effects of tobacco or e-cigarette use on DNA methylation (DNAme) in over 3,500 buccal/saliva, cervical, or blood samples, spanning epithelial and immune cells at directly and indirectly exposed sites. The 535 identified smoking-related DNAme loci [cytosine-phosphate-guanine sites (CpG)] clustered into four functional groups, including detoxification or growth signaling, based on cell type and anatomic site. Loci hypermethylated in buccal epithelial cells of smokers associated with NOTCH1/RUNX3/growth factor receptor signaling also exhibited elevated methylation in cancer tissue and progressing lung carcinoma in situ lesions, and hypermethylation of these sites predicted lung cancer development in buccal samples collected from smokers up to 22 years prior to diagnosis, suggesting a potential role in driving carcinogenesis. Alarmingly, these CpGs were also hypermethylated in e-cigarette users with a limited smoking history. This study sheds light on the cell type-specific changes to the epigenetic landscape induced by smoking-related products. SIGNIFICANCE The use of both cigarettes and e-cigarettes elicits cell- and exposure-specific epigenetic effects that are predictive of carcinogenesis, suggesting caution when broadly recommending e-cigarettes as aids for smoking cessation.
Affiliation(s)
- Chiara Herzog
  - European Translational Oncology Prevention and Screening (EUTOPS) Institute, Universität Innsbruck, Innsbruck, Austria
  - Research Institute for Biomedical Aging, Universität Innsbruck, Innsbruck, Austria
- Allison Jones
  - Department of Women's Cancer, UCL EGA Institute for Women's Health, University College London, London, United Kingdom
- Iona Evans
  - Department of Women's Cancer, UCL EGA Institute for Women's Health, University College London, London, United Kingdom
- Janhavi R. Raut
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Michal Zikan
  - Department of Gynecology and Obstetrics, First Faculty of Medicine and Hospital Na Bulovce, Charles University in Prague, Prague, Czech Republic
- David Cibula
  - Gynecologic Oncology Center, Department of Obstetrics and Gynecology, First Faculty of Medicine, Charles University in Prague, General University Hospital in Prague, Prague, Czech Republic
- Andrew Wong
  - MRC Unit for Lifelong Health and Ageing, Institute of Cardiovascular Science, University College London, London, United Kingdom
- Hermann Brenner
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Rebecca C. Richmond
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Martin Widschwendter
  - European Translational Oncology Prevention and Screening (EUTOPS) Institute, Universität Innsbruck, Innsbruck, Austria
  - Research Institute for Biomedical Aging, Universität Innsbruck, Innsbruck, Austria
  - Department of Women's Cancer, UCL EGA Institute for Women's Health, University College London, London, United Kingdom
  - Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
6
Nguyen H, Nguyen H, Tran D, Draghici S, Nguyen T. Fourteen years of cellular deconvolution: methodology, applications, technical evaluation and outstanding challenges. Nucleic Acids Res 2024; 52:4761-4783. [PMID: 38619038 PMCID: PMC11109966 DOI: 10.1093/nar/gkae267] [Received: 10/26/2023] [Revised: 03/01/2024] [Accepted: 04/02/2024] [Indexed: 04/16/2024]
Abstract
Single-cell RNA sequencing (scRNA-Seq) is a recent technology that allows for the measurement of the expression of all genes in each individual cell contained in a sample. Information at the single-cell level has been shown to be extremely useful in many areas. However, performing single-cell experiments is expensive. Although cellular deconvolution cannot provide the same comprehensive information as single-cell experiments, it can extract cell-type information from bulk RNA data, and therefore it allows researchers to conduct studies at cell-type resolution from existing bulk datasets. For these reasons, a great effort has been made to develop such methods for cellular deconvolution. The large number of methods available, the requirement of coding skills, inadequate documentation, and lack of performance assessment all make it extremely difficult for life scientists to choose a suitable method for their experiment. This paper aims to fill this gap by providing a comprehensive review of 53 deconvolution methods regarding their methodology, applications, performance, and outstanding challenges. More importantly, the article presents a benchmarking of all these 53 methods using 283 cell types from 30 tissues of 63 individuals. We also provide an R package named DeconBenchmark that allows readers to execute and benchmark the reviewed methods (https://github.com/tinnlab/DeconBenchmark).
Affiliation(s)
- Hung Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Ha Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Duc Tran
  - Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Sorin Draghici
  - Department of Computer Science, Wayne State University, Detroit, MI, USA
  - Advaita Bioinformatics, Ann Arbor, MI, USA
- Tin Nguyen
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
7
Lee MK, Azizgolshani N, Zhang Z, Perreard L, Kolling FW, Nguyen LN, Zanazzi GJ, Salas LA, Christensen BC. Associations in cell type-specific hydroxymethylation and transcriptional alterations of pediatric central nervous system tumors. Nat Commun 2024; 15:3635. [PMID: 38688903 PMCID: PMC11061294 DOI: 10.1038/s41467-024-47943-9] [Received: 02/18/2023] [Accepted: 04/16/2024] [Indexed: 05/02/2024]
Abstract
Although intratumoral heterogeneity has been established in pediatric central nervous system tumors, epigenomic alterations at the cell type level have largely remained unresolved. To identify cell type-specific alterations to cytosine modifications in pediatric central nervous system tumors, we utilize a multi-omic approach that integrates bulk DNA cytosine modification data (methylation and hydroxymethylation) with both bulk and single-cell RNA-sequencing data. We demonstrate a large reduction in the scope of significantly differentially modified cytosines in tumors when accounting for tumor cell type composition. In the progenitor-like cell types of tumors, we identify a preponderance of differential Cytosine-phosphate-Guanine site hydroxymethylation rather than methylation. Genes with differential hydroxymethylation, like histone deacetylase 4 and insulin-like growth factor 1 receptor, are associated with cell type-specific changes in gene expression in tumors. Our results highlight the importance of epigenomic alterations in the progenitor-like cell types and their role in cell type-specific transcriptional regulation in pediatric central nervous system tumors.
Affiliation(s)
- Min Kyung Lee
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Nasim Azizgolshani
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Department of Surgery, Columbia University Medical Center, New York, NY, USA
- Ze Zhang
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Laurent Perreard
  - Dartmouth Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Fred W Kolling
  - Dartmouth Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Lananh N Nguyen
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- George J Zanazzi
  - Dartmouth Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Department of Pathology and Laboratory Medicine, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Lucas A Salas
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Brock C Christensen
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Department of Community and Family Medicine, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
8
Ferro dos Santos MR, Giuili E, De Koker A, Everaert C, De Preter K. Computational deconvolution of DNA methylation data from mixed DNA samples. Brief Bioinform 2024; 25:bbae234. [PMID: 38762790 PMCID: PMC11102637 DOI: 10.1093/bib/bbae234] [Received: 02/07/2024] [Revised: 03/30/2024] [Accepted: 04/30/2024] [Indexed: 05/20/2024]
Abstract
In this review, we provide a comprehensive overview of the different computational tools that have been published for the deconvolution of bulk DNA methylation (DNAm) data. Here, deconvolution refers to the estimation of cell-type proportions that constitute a mixed sample. The paper reviews and compares 25 deconvolution methods (supervised, unsupervised or hybrid) developed between 2012 and 2023 and compares the strengths and limitations of each approach. Moreover, in this study, we describe the impact of the platform used for the generation of methylation data (including microarrays and sequencing), the applied data pre-processing steps and the used reference dataset on the deconvolution performance. Next to reference-based methods, we also examine methods that require only partial reference datasets or require no reference set at all. In this review, we provide guidelines for the use of specific methods dependent on the DNA methylation data type and data availability.
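As a reading aid for the definition of deconvolution given in this abstract, the following is a minimal sketch of one reference-based formulation: the bulk methylation profile is modelled as a non-negative mixture of reference cell-type profiles, and the proportions are estimated by projected gradient descent on the least-squares error. It is not any of the 25 reviewed methods. The function name `deconvolve`, the toy reference matrix, and the simulated proportions are all hypothetical and chosen only for illustration.

```python
def deconvolve(bulk, reference, n_iter=2000):
    """Estimate cell-type proportions from a bulk methylation profile.

    bulk:      list of beta values, one per CpG (hypothetical input).
    reference: list of rows, one per CpG; each row holds that CpG's
               beta value in each reference cell type.
    Solves min ||R w - b||^2 subject to w >= 0 by projected gradient
    descent, then rescales w so the proportions sum to 1.
    """
    n_cpg = len(reference)
    n_types = len(reference[0])
    # The squared Frobenius norm bounds the largest eigenvalue of R^T R,
    # so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in reference for v in row)
    w = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        # Residual R w - b, one value per CpG.
        resid = [sum(r * wk for r, wk in zip(row, w)) - b
                 for row, b in zip(reference, bulk)]
        # Gradient R^T (R w - b), one value per cell type.
        grad = [sum(reference[i][k] * resid[i] for i in range(n_cpg))
                for k in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        w = [max(0.0, wk - step * g) for wk, g in zip(w, grad)]
    total = sum(w)
    return [wk / total for wk in w] if total > 0 else w


# Toy example: 4 CpGs, 2 cell types, simulated noise-free mixture.
reference = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
true_w = [0.3, 0.7]  # assumed proportions used to simulate the bulk sample
bulk = [sum(r * t for r, t in zip(row, true_w)) for row in reference]
estimated = deconvolve(bulk, reference)
```

On noise-free toy data such as this the estimate should recover the simulated proportions closely; real array or sequencing data would need marker selection and a solver better suited to many CpGs and cell types.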
Affiliation(s)
- Maísa R Ferro dos Santos
  - VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Edoardo Giuili
  - VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Andries De Koker
  - VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Celine Everaert
  - VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
- Katleen De Preter
  - VIB-UGent Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Zwijnaarde, Belgium
  - Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium
9
Garmire LX, Li Y, Huang Q, Xu C, Teichmann SA, Kaminski N, Pellegrini M, Nguyen Q, Teschendorff AE. Challenges and perspectives in computational deconvolution of genomics data. Nat Methods 2024; 21:391-400. [PMID: 38374264 DOI: 10.1038/s41592-023-02166-6] [Received: 11/04/2022] [Accepted: 12/26/2023] [Indexed: 02/21/2024]
Abstract
Deciphering cell-type heterogeneity is crucial for systematically understanding tissue homeostasis and its dysregulation in diseases. Computational deconvolution is an efficient approach for estimating cell-type abundances from a variety of omics data. Despite substantial methodological progress in computational deconvolution in recent years, challenges are still outstanding. Here we list four important challenges related to computational deconvolution: the quality of the reference data, generation of ground truth data, limitations of computational methodologies, and benchmarking design and implementation. Finally, we make recommendations on reference data generation, new directions of computational methodologies, and strategies to promote rigorous benchmarking.
Affiliation(s)
- Lana X Garmire
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yijun Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Qianhui Huang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chuan Xu
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Naftali Kaminski
  - Pulmonary, Critical Care & Sleep Medicine, Yale University School of Medicine, New Haven, CT, USA
- Matteo Pellegrini
  - Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
- Quan Nguyen
  - Institute for Molecular Bioscience, The University of Queensland and QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Andrew E Teschendorff
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - UCL Cancer Institute, University College London, London, UK
10
Zheng Y, Jun J, Brennan K, Gevaert O. EpiMix is an integrative tool for epigenomic subtyping using DNA methylation. Cell Reports Methods 2023; 3:100515. [PMID: 37533639 PMCID: PMC10391348 DOI: 10.1016/j.crmeth.2023.100515] [Received: 01/03/2023] [Revised: 04/12/2023] [Accepted: 06/01/2023] [Indexed: 08/04/2023]
Abstract
DNA methylation (DNAme) is a major epigenetic factor influencing gene expression with alterations leading to cancer and immunological and cardiovascular diseases. Recent technological advances have enabled genome-wide profiling of DNAme in large human cohorts. There is a need for analytical methods that can more sensitively detect differential methylation profiles present in subsets of individuals from these heterogeneous, population-level datasets. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared with existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and long non-coding RNAs (lncRNAs). Using cell-type-specific data from two separate studies, we discover epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven ncRNAs in non-small cell lung cancer.
Affiliation(s)
- Yuanning Zheng
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- John Jun
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Kevin Brennan
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
11
Lee MK, Azizgolshani N, Zhang Z, Perreard L, Kolling FW, Nguyen LN, Zanazzi GJ, Salas LA, Christensen BC. Hydroxymethylation alterations in progenitor-like cell types of pediatric central nervous system tumors are associated with cell type-specific transcriptional changes. Research Square 2023:rs.3.rs-2517758. [PMID: 36909536 PMCID: PMC10002842 DOI: 10.21203/rs.3.rs-2517758/v1] [Indexed: 03/06/2023]
Abstract
Although intratumoral heterogeneity has been established in pediatric central nervous system tumors, epigenomic alterations at the cell type level have largely remained unresolved. To identify cell type-specific alterations to cytosine modifications in pediatric central nervous system tumors we utilized a multi-omic approach that integrated bulk DNA cytosine modification data (methylation and hydroxymethylation) with both bulk and single-cell RNA-sequencing data. We demonstrate a large reduction in the scope of significantly differentially modified cytosines in tumors when accounting for tumor cell type composition. In the progenitor-like cell types of tumors, we identified a preponderance of differential CpG hydroxymethylation rather than methylation. Genes with differential hydroxymethylation, like HDAC4 and IGF1R, were associated with cell type-specific changes in gene expression in tumors. Our results highlight the importance of epigenomic alterations in the progenitor-like cell types and their role in cell type-specific transcriptional regulation in pediatric CNS tumors.
Affiliation(s)
- Min Kyung Lee
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
| | - Nasim Azizgolshani
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Department of Cardiothoracic Surgery, Columbia University Medical Center, New York, NY, USA
| | - Ze Zhang
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
| | - Laurent Perreard
- Dartmouth Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
| | - Fred W Kolling
- Dartmouth Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
| | - Lananh N Nguyen
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
| | - George J Zanazzi
- Dartmouth Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Department of Pathology and Laboratory Medicine, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
| | - Lucas A Salas
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
| | - Brock C Christensen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Department of Community and Family Medicine, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
| |
|
12
|
Chen L, Li Z, Wu H. CeDAR: incorporating cell type hierarchy improves cell type-specific differential analyses in bulk omics data. Genome Biol 2023; 24:37. [PMID: 36855165 PMCID: PMC9972684 DOI: 10.1186/s13059-023-02857-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Accepted: 01/17/2023] [Indexed: 03/02/2023] Open
Abstract
Bulk high-throughput omics data contain signals from a mixture of cell types. Recent developments of deconvolution methods facilitate cell type-specific inferences from bulk data. Our real data exploration suggests that differential expression or methylation status is often correlated among cell types. Based on this observation, we develop a novel statistical method named CeDAR to incorporate the cell type hierarchy in cell type-specific differential analyses of bulk data. Extensive simulation and real data analyses demonstrate that this approach significantly improves the accuracy and power in detecting cell type-specific differential signals compared with existing methods, especially in low-abundance cell types.
Affiliation(s)
- Luxiao Chen
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
| | - Ziyi Li
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Hao Wu
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055 P.R. China
| |
|
13
|
Zheng Y, Jun J, Brennan K, Gevaert O. EpiMix: an integrative tool for epigenomic subtyping using DNA methylation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.03.522660. [PMID: 36711917 PMCID: PMC9881910 DOI: 10.1101/2023.01.03.522660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
DNA methylation (DNAme) is a major epigenetic factor influencing gene expression with alterations leading to cancer, immunological, and cardiovascular diseases. Recent technological advances enable genome-wide quantification of DNAme in large human cohorts. So far, existing methods have not been evaluated to identify differential DNAme present in large and heterogeneous patient cohorts. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared to existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and lncRNAs. Using cell-type specific data from two separate studies, we discovered novel epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven non-coding RNAs in non-small cell lung cancer.
Affiliation(s)
- Yuanning Zheng
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - John Jun
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Kevin Brennan
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine & Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| |
|
14
|
Song J, Kuan PF. A systematic assessment of cell type deconvolution algorithms for DNA methylation data. Brief Bioinform 2022; 23:bbac449. [PMID: 36242584 PMCID: PMC9947552 DOI: 10.1093/bib/bbac449] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/24/2022] [Revised: 08/11/2022] [Accepted: 09/20/2022] [Indexed: 12/14/2022] Open
Abstract
We performed a systematic assessment of computational deconvolution methods that play an important role in the estimation of cell type proportions from bulk methylation data. The proposed framework methylDeConv (available as an R package) integrates several deconvolution methods for methylation profiles (Illumina HumanMethylation450 and MethylationEPIC arrays) and offers different cell-type-specific CpG selection strategies to construct the extended reference library, which incorporates the main immune cell subsets, epithelial cells and cell-free DNAs. We compared the performance of different deconvolution algorithms via simulations and benchmark datasets and further investigated the associations of the estimated cell type proportions with cancer therapy in breast cancer and with subtypes in melanoma methylation case studies. Our results indicated that deconvolution based on the extended reference library is critical to obtaining accurate estimates of cell proportions in non-blood tissues.
Affiliation(s)
- Junyan Song
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY
| | - Pei-Fen Kuan
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY
| |
|
15
|
Shi C, Zhu J, Shen Y, Luo S, Zhu H, Song R. Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2110876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Affiliation(s)
| | | | - Ye Shen
- North Carolina State University
| | | | - Hongtu Zhu
- University of North Carolina at Chapel Hill
| | | |
|
16
|
Jeong Y, de Andrade e Sousa LB, Thalmeier D, Toth R, Ganslmeier M, Breuer K, Plass C, Lutsik P. Systematic evaluation of cell-type deconvolution pipelines for sequencing-based bulk DNA methylomes. Brief Bioinform 2022; 23:bbac248. [PMID: 35794707 PMCID: PMC9294431 DOI: 10.1093/bib/bbac248] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/25/2022] [Revised: 05/18/2022] [Accepted: 05/26/2022] [Indexed: 11/18/2022] Open
Abstract
DNA methylation analysis by sequencing is becoming increasingly popular, yielding methylomes at single-base pair and single-molecule resolution. It has tremendous potential for cell-type heterogeneity analysis using intrinsic read-level information. Although diverse deconvolution methods were developed to infer cell-type composition based on bulk sequencing-based methylomes, systematic evaluation has not been performed yet. Here, we thoroughly benchmark six previously published methods: Bayesian epiallele detection, DXM, PRISM, csmFinder+coMethy, ClubCpG and MethylPurify, together with two array-based methods, MeDeCom and Houseman, as a comparison group. Sequencing-based deconvolution methods consist of two main steps, informative region selection and cell-type composition estimation, thus each was individually assessed. With this elaborate evaluation, we aimed to establish which method achieves the highest performance in different scenarios of synthetic bulk samples. We found that cell-type deconvolution performance is influenced by different factors depending on the number of cell types within the mixture. Finally, we propose a best-practice deconvolution strategy for sequencing data and point out limitations that need to be handled. Array-based methods, both reference-based and reference-free, generally outperformed sequencing-based methods, despite the absence of read-level information. This implies that the current sequencing-based methods still struggle with correctly identifying cell-type-specific signals and eliminating confounding methylation patterns, which needs to be handled in future studies.
Affiliation(s)
- Yunhee Jeong
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Faculty of Mathematics and Informatics, Heidelberg University, Im Neuenheimer Feld 205, 69120, Heidelberg, Germany
| | | | - Dominik Thalmeier
- Helmholtz AI, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| | - Reka Toth
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Marlene Ganslmeier
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Kersten Breuer
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Christoph Plass
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Pavlo Lutsik
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| |
|
17
|
Vara EL, Langefeld CD, Wolf BJ, Howard TD, Hawkins GA, Quet Q, Moultrie LH, Quinnette King L, Molano ID, Bray SL, Ueberroth LA, Lim SS, Williams EM, Kamen DL, Ramos PS. Social Factors, Epigenomics and Lupus in African American Women (SELA) Study: protocol for an observational mechanistic study examining the interplay of multiple individual and social factors on lupus outcomes in a health disparity population. Lupus Sci Med 2022; 9:9/1/e000698. [PMID: 35768168 PMCID: PMC9244713 DOI: 10.1136/lupus-2022-000698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 06/14/2022] [Indexed: 11/08/2022]
Abstract
INTRODUCTION Despite the disproportional impact of SLE on historically marginalised communities, the individual and sociocultural factors underlying these health disparities remain elusive. We report the design and methods for a study aimed at identifying epigenetic biomarkers associated with racism and resiliency that affect gene function and thereby influence SLE in a health disparity population. METHODS AND ANALYSIS The Social Factors, Epigenomics and Lupus in African American Women (SELA) Study is a cross-sectional, case–control study. A total of 600 self-reported African American women will be invited to participate. All participants will respond to questionnaires that capture detailed sociodemographic and medical history, validated measures of racial discrimination, social support, as well as disease activity and damage for cases. Participants who wish will receive their genetic ancestry estimates and be involved in research. Blood samples are required to provide peripheral blood mononuclear cell counts, DNA and RNA. The primary goals of SELA are to identify variation in DNA methylation (DNAm) associated with self-reported exposure to racial discrimination and social support, to evaluate whether social DNAm sites affect gene expression, to identify the synergistic effects of social factors on DNAm changes on SLE and to develop a social factors-DNAm predictive model for disease outcomes. This study is conducted in cooperation with the Sea Island Families Project Citizen Advisory Committee. DISCUSSION AND DISSEMINATION SELA will respond to the pressing need to clarify the interplay and regulatory mechanism by which various positive and negative social exposures influence SLE. Results will be published and shared with patients and the community. Knowledge of the biological impact of social exposures on SLE, as informed by the results of this study, can be leveraged by advocacy efforts to develop psychosocial interventions that prevent or mitigate risk exposures, and services or interventions that promote positive exposures. Implementation of such interventions is paramount to the closure of the health disparities gap.
Affiliation(s)
- Emily L Vara
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Carl D Langefeld
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
| | - Bethany J Wolf
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Timothy D Howard
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
| | - Gregory A Hawkins
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
| | - Queen Quet
- Gullah/Geechee Nation, St Helena Island, South Carolina, USA
| | - Lee H Moultrie
- Lee H Moultrie & Associates, North Charleston, South Carolina, USA
| | - L Quinnette King
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Ivan D Molano
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Stephanie L Bray
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Lori Ann Ueberroth
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - S Sam Lim
- Department of Medicine, Emory University, Atlanta, Georgia, USA
- Department of Epidemiology, Emory University, Atlanta, Georgia, USA
| | - Edith M Williams
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Diane L Kamen
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Paula S Ramos
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
| |
|
18
|
Salas LA, Peres LC, Thayer ZM, Smith RWA, Guo Y, Chung W, Si J, Liang L. A transdisciplinary approach to understand the epigenetic basis of race/ethnicity health disparities. Epigenomics 2021; 13:1761-1770. [PMID: 33719520 PMCID: PMC8579937 DOI: 10.2217/epi-2020-0080] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/25/2020] [Accepted: 04/07/2020] [Indexed: 11/21/2022] Open
Abstract
Health disparities correspond to differences in disease burden and mortality among socially defined population groups. Such disparities may emerge according to race/ethnicity, socioeconomic status and a variety of other social contexts, and are documented for a wide range of diseases. Here, we provide a transdisciplinary perspective on the contribution of epigenetics to the understanding of health disparities, with a special emphasis on disparities across socially defined racial/ethnic groups. Scientists in the fields of biological anthropology, bioinformatics and molecular epidemiology provide a summary of theoretical, statistical and practical considerations for conducting epigenetic health disparities research, and provide examples of successful applications from cancer research using this approach.
Affiliation(s)
- Lucas A Salas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Lebanon, NH 03756, USA
| | - Lauren C Peres
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
| | - Zaneta M Thayer
- Department of Anthropology, Dartmouth College, Hanover, NH 03755, USA
| | - Rick WA Smith
- Department of Anthropology, Dartmouth College, Hanover, NH 03755, USA
- The William H. Neukom Institute for Computational Science, Dartmouth College, Hanover, NH 03755, USA
| | | | - Wonil Chung
- Department of Statistics & Actuarial Science, Soongsil University, Seoul, 06478, Korea
- Program in Genetic Epidemiology & Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Jiahui Si
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Biostatistics & Epidemiology, Peking University School of Public Health, Beijing, 100191, China
| | - Liming Liang
- Program in Genetic Epidemiology & Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| |
|
19
|
Meier R, Nissen E, Koestler DC. Low variability in the underlying cellular landscape adversely affects the performance of interaction-based approaches for conducting cell-specific analyses of DNA methylation in bulk samples. Stat Appl Genet Mol Biol 2021; 20:73-84. [PMID: 34378875 PMCID: PMC9125800 DOI: 10.1515/sagmb-2021-0004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/18/2021] [Accepted: 07/19/2021] [Indexed: 11/15/2022]
Abstract
Statistical methods that allow for cell type-specific DNA methylation (DNAm) analyses based on bulk-tissue methylation data have great potential to improve our understanding of human disease and have created unprecedented opportunities for new insights using the wealth of publicly available bulk-tissue methylation data. These methodologies involve incorporating interaction terms formed between the phenotypes/exposures of interest and proportions of the cell types underlying the bulk-tissue sample used for DNAm profiling. Despite growing interest in such "interaction-based" methods, there has been no comprehensive assessment of how variability in the cellular landscape across study samples affects their performance. To answer this question, we used numerous publicly available whole-blood DNAm data sets along with extensive simulation studies and evaluated the performance of interaction-based approaches in detecting cell-specific methylation effects. Our results show that low cell proportion variability results in large estimation error and low statistical power for detecting cell-specific effects of DNAm. Further, we identified that many studies targeting methylation profiling in whole blood may be at risk of being underpowered due to low variability in the cellular landscape across study samples. Finally, we discuss guidelines for researchers seeking to conduct studies utilizing interaction-based approaches to help ensure that their studies are adequately powered.
Affiliation(s)
- Richard Meier
- Department of Biostatistics & Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, USA
| | - Emily Nissen
- Department of Biostatistics & Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, USA
| | - Devin C. Koestler
- Department of Biostatistics & Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, USA
| |
|
20
|
Zhang W, Wu H, Li Z. Complete deconvolution of DNA methylation signals from complex tissues: a geometric approach. Bioinformatics 2021; 37:1052-1059. [PMID: 33135072 PMCID: PMC8150138 DOI: 10.1093/bioinformatics/btaa930] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/19/2020] [Revised: 10/16/2020] [Accepted: 10/21/2020] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION It is common practice in epigenetics research to profile DNA methylation on tissue samples, which are usually mixtures of different cell types. To properly account for the mixture, estimating cell compositions has been recognized as an important first step. Many methods were developed for quantifying cell compositions from DNA methylation data, but they mostly have limited applications due to a lack of reference or prior information. RESULTS We develop Tsisal, a novel complete deconvolution method which accurately estimates cell compositions from DNA methylation data without any prior knowledge of cell types or their proportions. Tsisal is a full pipeline to estimate the number of cell types and the cell compositions, and to identify cell-type-specific CpG sites. It can also assign cell type labels when a full or partial reference panel is available. Extensive simulation studies and analyses of seven real datasets demonstrate the favorable performance of our proposed method compared with existing deconvolution methods serving a similar purpose. AVAILABILITY AND IMPLEMENTATION The proposed method Tsisal is implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Weiwei Zhang
- School of Science, East China University of Technology, Nanchang, Jiangxi 330013, China
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
| | - Ziyi Li
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| |
|
21
|
Comprehensive cell type decomposition of circulating cell-free DNA with CelFiE. Nat Commun 2021; 12:2717. [PMID: 33976150 PMCID: PMC8113516 DOI: 10.1038/s41467-021-22901-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/14/2020] [Accepted: 03/23/2021] [Indexed: 12/18/2022] Open
Abstract
Circulating cell-free DNA (cfDNA) in the bloodstream originates from dying cells and is a promising noninvasive biomarker for cell death. Here, we propose an algorithm, CelFiE, to accurately estimate the relative abundances of cell types and tissues contributing to cfDNA from epigenetic cfDNA sequencing. In contrast to previous work, CelFiE accommodates low coverage data, does not require CpG site curation, and estimates contributions from multiple unknown cell types that are not available in external reference data. In simulations, CelFiE accurately estimates known and unknown cell type proportions from low coverage and noisy cfDNA mixtures, including from cell types composing less than 1% of the total mixture. When used in two clinically-relevant situations, CelFiE correctly estimates a large placenta component in pregnant women, and an elevated skeletal muscle component in amyotrophic lateral sclerosis (ALS) patients, consistent with the occurrence of muscle wasting typical in these patients. Together, these results show how CelFiE could be a useful tool for biomarker discovery and monitoring the progression of degenerative disease. Tissue damage and turnover lead to the release of DNA in the blood and can be used to monitor changes in tissue state. Here, the authors developed a tool to accurately estimate the proportion of cell types contributing to cell-free DNA in the blood, with an application to pregnant women and ALS patients.
|
22
|
EMeth: An EM algorithm for cell type decomposition based on DNA methylation data. Sci Rep 2021; 11:5717. [PMID: 33707472 PMCID: PMC7952399 DOI: 10.1038/s41598-021-84864-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/15/2020] [Accepted: 02/22/2021] [Indexed: 12/31/2022] Open
Abstract
We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on the existing reference-based methods by detecting the CpGs whose DNA methylation are inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it allows a cell type with known proportions but unknown reference and estimates its methylation. This is motivated by the case of studying methylation in tumor cells while bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and tumor cell proportion can be estimated by copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated by DNA methylation have expected associations with mutation load and survival time, while the estimates from gene expression miss such associations.
|
23
|
Scherer M, Schmidt F, Lazareva O, Walter J, Baumbach J, Schulz MH, List M. Machine learning for deciphering cell heterogeneity and gene regulation. NATURE COMPUTATIONAL SCIENCE 2021; 1:183-191. [PMID: 38183187 DOI: 10.1038/s43588-021-00038-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 02/08/2021] [Indexed: 12/14/2022]
Abstract
Epigenetics studies inheritable and reversible modifications of DNA that allow cells to control gene expression throughout their development and in response to environmental conditions. In computational epigenomics, machine learning is applied to study various epigenetic mechanisms genome wide. Its aim is to expand our understanding of cell differentiation, that is their specialization, in health and disease. Thus far, most efforts focus on understanding the functional encoding of the genome and on unraveling cell-type heterogeneity. Here, we provide an overview of state-of-the-art computational methods and their underlying statistical concepts, which range from matrix factorization and regularized linear regression to deep learning methods. We further show how the rise of single-cell technology leads to new computational challenges and creates opportunities to further our understanding of epigenetic regulation.
Affiliation(s)
- Michael Scherer
- Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
- Computational Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
- Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany
| | | | - Olga Lazareva
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Jörn Walter
- Computational Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
| | - Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Computational BioMedicine Lab, Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Marcel H Schulz
- Institute of Cardiovascular Regeneration, University Hospital and Goethe University Frankfurt, Frankfurt, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
| |
|
24
|
Mancarella D, Plass C. Epigenetic signatures in cancer: proper controls, current challenges and the potential for clinical translation. Genome Med 2021; 13:23. [PMID: 33568205 PMCID: PMC7874645 DOI: 10.1186/s13073-021-00837-7] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 04/27/2020] [Accepted: 01/21/2021] [Indexed: 12/26/2022] Open
Abstract
Epigenetic alterations are associated with normal biological processes such as aging or differentiation. Changes in global epigenetic signatures, together with genetic alterations, are driving events in several diseases including cancer. Comparative studies of cancer and healthy tissues found alterations in patterns of DNA methylation, histone posttranslational modifications, and changes in chromatin accessibility. Driven by sophisticated, next-generation sequencing-based technologies, recent studies discovered cancer epigenomes to be dominated by epigenetic patterns already present in the cell-of-origin, which transformed into a neoplastic cell. Tumor-specific epigenetic changes therefore need to be redefined and factors influencing epigenetic patterns need to be studied to unmask truly disease-specific alterations. The underlying mechanisms inducing cancer-associated epigenetic alterations are poorly understood. Studies of mutated epigenetic modifiers, enzymes that write, read, or edit epigenetic patterns, or mutated chromatin components, for example oncohistones, help to provide functional insights on how cancer epigenomes arise. In this review, we highlight the importance and define challenges of proper control tissues and cell populations to exploit cancer epigenomes. We summarize recent advances describing mechanisms leading to epigenetic changes in tumorigenesis and briefly discuss advances in investigating their translational potential.
Affiliation(s)
- Daniela Mancarella
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany; Faculty of Biosciences, Ruprecht-Karls-University of Heidelberg, 69120, Heidelberg, Germany.
| | - Christoph Plass
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany; German Consortium for Translational Cancer Research (DKTK), 69120, Heidelberg, Germany
| |
|
25
|
Maternal DNA Methylation During Pregnancy: a Review. Reprod Sci 2021; 28:2758-2769. [PMID: 33469876 DOI: 10.1007/s43032-020-00456-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/29/2020] [Accepted: 12/29/2020] [Indexed: 12/19/2022]
Abstract
Multiple environmental, behavioral, and hereditary factors affect pregnancy. Recent studies suggest that epigenetic modifications, such as DNA methylation (DNAm), affect both maternal and fetal health during the period of gestation. Some of the pregnancy-related risk factors can influence maternal DNAm, thus predisposing both the mother and the neonate to clinical adversities with long-lasting consequences. DNAm alterations in the promoter and enhancer regions modulate gene expression changes which play vital physiological role. In this review, we have discussed the recent advances in our understanding of maternal DNA methylation changes during pregnancy and its associated complications such as gestational diabetes and anemia, adverse pregnancy outcomes like preterm birth, and preeclampsia. We have also highlighted some major gaps and limitations in the area which if addressed might improve our understanding of pregnancy and its associated adverse clinical conditions, ultimately leading to healthy pregnancies and reduction of public health burden.
|
26
|
Campbell KA, Colacino JA, Park SK, Bakulski KM. Cell Types in Environmental Epigenetic Studies: Biological and Epidemiological Frameworks. Curr Environ Health Rep 2021; 7:185-197. [PMID: 32794033 DOI: 10.1007/s40572-020-00287-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/13/2022]
Abstract
PURPOSE OF REVIEW This article introduces the roles of perinatal DNA methylation in human health and disease, highlights the challenges of tissue and cellular heterogeneity to studying DNA methylation, summarizes approaches to overcome these challenges, and offers recommendations in conducting research in environmental epigenetics. RECENT FINDINGS Epigenetic modifications are essential for human development and are labile to environmental influences, especially during gestation. Epigenetic dysregulation is also a hallmark of multiple diseases. Environmental epigenetic studies routinely measure DNA methylation in readily available tissues. However, tissues and cell types exhibit specific epigenetic patterning and heterogeneity between samples complicates epigenetic studies. Failure to account for cell-type heterogeneity limits identification of biological mechanisms and biases study results. Tissue-level epigenetic measures represent a convolution of epigenetic signals from individual cell types. Tissue-specific epigenetics is an evolving field and the use of disease-affected target, surrogate, or multiple tissues has inherent trade-offs and affects inference. Likewise, experimental and bioinformatic approaches to accommodate cell-type heterogeneity have varying assumptions and inherent trade-offs that affect inference. The relationships between exposure, disease, tissue-level DNA methylation, cell type-specific DNA methylation, and cell-type heterogeneity must be carefully considered in study design and analysis. Causal diagrams can inform study design and analytic strategies. Properly addressing cell-type heterogeneity limits sources of potential bias, avoids misinterpretation of study results, and allows investigators to distinguish shifts in cell-type proportions from direct changes to cellular epigenetic programming, both of which provide insights into environmental disease etiology and aid development of novel methods for prevention and treatment.
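The statement that tissue-level measures are a convolution of cell-type signals can be illustrated with a minimal sketch. It assumes the simplest possible model, a proportion-weighted average of fixed cell-type methylation levels; the function name `bulk_methylation` and all numbers are hypothetical and are not taken from the article.

```python
def bulk_methylation(proportions, profiles):
    """Bulk methylation as a proportion-weighted average of cell-type profiles.

    proportions: one fraction per cell type (expected to sum to 1).
    profiles: one list of per-CpG methylation levels per cell type.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("cell-type proportions must sum to 1")
    n_sites = len(profiles[0])
    return [
        sum(p * profile[i] for p, profile in zip(proportions, profiles))
        for i in range(n_sites)
    ]


# Hypothetical methylation levels at three CpG sites for two cell types.
profiles = [
    [0.9, 0.5, 0.1],  # cell type A
    [0.1, 0.5, 0.8],  # cell type B
]

# Same cell-intrinsic methylation in both samples; only the mixture differs.
unexposed = bulk_methylation([0.5, 0.5], profiles)
exposed = bulk_methylation([0.7, 0.3], profiles)
differences = [e - u for e, u in zip(exposed, unexposed)]
```

Under these assumed values, the bulk signal differs at the first and third sites even though no cell type changed its methylation, while the second site, where both cell types share the same level, is unaffected. This is the kind of proportion shift that has to be distinguished from a direct change in cellular epigenetic programming.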
Affiliation(s)
- Kyle A Campbell
- Department of Epidemiology, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI, USA.
| | - Justin A Colacino
- Department of Environmental Health Sciences, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Sung Kyun Park
- Department of Epidemiology, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI, USA; Department of Environmental Health Sciences, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Kelly M Bakulski
- Department of Epidemiology, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI, USA
| |
|
27
|
Chen Z, Wu A. Progress and challenge for computational quantification of tissue immune cells. Brief Bioinform 2021; 22:6065002. [PMID: 33401306 DOI: 10.1093/bib/bbaa358] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/28/2020] [Revised: 10/23/2020] [Accepted: 11/07/2020] [Indexed: 12/28/2022] Open
Abstract
Tissue immune cells have long been recognized as important regulators for the maintenance of balance in the body system. Quantification of the abundance of different immune cells will provide enhanced understanding of the correlation between immune cells and normal or abnormal situations. Currently, computational methods to predict tissue immune cell compositions from bulk transcriptomes have been largely developed. Therefore, summarizing the advantages and disadvantages is appropriate. In addition, an examination of the challenges and possible solutions for these computational models will assist the development of this field. The common hypothesis of these models is that the expression of signature genes for immune cell types might represent the proportion of immune cells that contribute to the tissue transcriptome. In general, we grouped all reported tools into three groups, including reference-free, reference-based scoring and reference-based deconvolution methods. In this review, a summary of all the currently reported computational immune cell quantification tools and their applications, limitations, and perspectives are presented. Furthermore, some critical problems are found that have limited the performance and application of these models, including inadequate immune cell type, the collinearity problem, the impact of the tissue environment on the immune cell expression level, and the deficiency of standard datasets for model validation. To address these issues, tissue specific training datasets that include all known immune cells, a hierarchical computational framework, and benchmark datasets including both tissue expression profiles and the abundances of all the immune cells are proposed to further promote the development of this field.
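As a rough illustration of the reference-based deconvolution idea summarized above, the following sketch estimates the fraction of one cell type in a two-component mixture by least squares. It is a toy under stated assumptions (exactly two cell types, known signature-gene profiles, fractions summing to 1) and does not reproduce any of the reviewed tools; `estimate_fraction` and the expression values are hypothetical.

```python
def estimate_fraction(bulk, ref_a, ref_b):
    """Least-squares estimate of p in bulk = p*ref_a + (1-p)*ref_b, clipped to [0, 1]."""
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        # Identical references make the fraction unidentifiable (extreme collinearity).
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    p = sum((y - b) * d for y, b, d in zip(bulk, ref_b, diff)) / denom
    return min(1.0, max(0.0, p))


# Hypothetical signature-gene expression for an immune cell type and the remaining tissue.
ref_immune = [10.0, 1.0, 5.0]
ref_other = [0.5, 8.0, 5.0]

# Simulated bulk sample with a known immune fraction of 0.3.
true_fraction = 0.3
bulk = [true_fraction * a + (1 - true_fraction) * b for a, b in zip(ref_immune, ref_other)]
estimated = estimate_fraction(bulk, ref_immune, ref_other)
```

The third gene has the same level in both references and therefore contributes nothing to the estimate. This is a small-scale view of the collinearity problem: the more similar the reference profiles, the less the data constrain the proportions.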
Affiliation(s)
- Ziyi Chen
- Suzhou Institute of Systems Medicine, Center for Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Jiangsu, Suzhou, China
| | - Aiping Wu
- Suzhou Institute of Systems Medicine, Center for Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Jiangsu, Suzhou, China
| |
|
28
|
Qin Y, Zhang W, Sun X, Nan S, Wei N, Wu HJ, Zheng X. Deconvolution of heterogeneous tumor samples using partial reference signals. PLoS Comput Biol 2020; 16:e1008452. [PMID: 33253170 PMCID: PMC7728196 DOI: 10.1371/journal.pcbi.1008452] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/15/2020] [Revised: 12/10/2020] [Accepted: 10/19/2020] [Indexed: 12/16/2022] Open
Abstract
Deconvolution of heterogeneous bulk tumor samples into distinct cellular populations is an important yet challenging problem, particularly when only partial references are available. A common approach to dealing with this problem is to deconvolve the mixed signals using available references and leverage the remaining signal as a new cell component. However, as indicated in our simulation, such an approach tends to over-estimate the proportions of known cell types and fails to detect novel cell types. Here, we propose PREDE, a partial reference-based deconvolution method using an iterative non-negative matrix factorization algorithm. Our method is verified to be effective in estimating cell proportions and expression profiles of unknown cell types based on simulated datasets at a variety of parameter settings. Applying our method to TCGA tumor samples, we found that proportions of pure cancer cells better indicate different subtypes of tumor samples. We also detected several cell types for each cancer type whose proportions successfully predicted patient survival. Our method makes a significant contribution to deconvolution of heterogeneous tumor samples and could be widely applied to varieties of high throughput bulk data. PREDE is implemented in R and is freely available from GitHub (https://xiaoqizheng.github.io/PREDE).
Tumor tissues are mixtures of different cell types. Identification and quantification of constitutional cell types within tumor tissues are important tasks in cancer research. The problem can be readily solved using regression-based methods if reference signals are available. But in most clinical applications, only partial references are available, which significantly reduces the deconvolution accuracy of the existing regression-based methods. In this paper, we propose a partial-reference based deconvolution model, PREDE, integrating the non-negative matrix factorization framework with an iterative optimization strategy. We conducted comprehensive evaluations for PREDE using both simulation and real data analyses, demonstrating better performance of our method than other existing methods.
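The general idea of combining non-negative matrix factorization with partial references can be sketched as follows. This is not PREDE's algorithm (PREDE is distributed as an R package); it is a generic multiplicative-update NMF in which the known reference columns are simply held fixed, written in plain Python for readability. The function name `partial_reference_nmf`, its parameters and the toy matrices are all assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(Y, W, H):
    """Squared Frobenius norm of Y - W @ H."""
    WH = matmul(W, H)
    return sum((y - x) ** 2 for ry, rx in zip(Y, WH) for y, x in zip(ry, rx))


def partial_reference_nmf(Y, W_known, n_unknown, n_iter=200, seed=0, eps=1e-9):
    """Factor Y (features x samples) as W @ H, keeping the known columns of W fixed.

    W_known: features x k_known reference profiles (never updated).
    n_unknown: number of additional, unreferenced profiles to learn.
    """
    rng = random.Random(seed)
    n_feat, n_samp = len(Y), len(Y[0])
    k_known = len(W_known[0])
    k = k_known + n_unknown
    # Known profiles first, randomly initialised unknown profiles appended.
    W = [list(row) + [rng.uniform(0.1, 1.0) for _ in range(n_unknown)] for row in W_known]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n_samp)] for _ in range(k)]
    history = [squared_error(Y, W, H)]
    for _ in range(n_iter):
        # Multiplicative update for all mixing weights: H <- H * (W^T Y) / (W^T W H).
        Wt = transpose(W)
        num, den = matmul(Wt, Y), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samp)] for i in range(k)]
        # Same style of update for W, applied to the unknown columns only.
        Ht = transpose(H)
        num, den = matmul(Y, Ht), matmul(W, matmul(H, Ht))
        for i in range(n_feat):
            for j in range(k_known, k):
                W[i][j] *= num[i][j] / (den[i][j] + eps)
        history.append(squared_error(Y, W, H))
    return W, H, history


# Toy data: three cell types, of which only the first two have references.
W_true = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.1], [0.1, 0.9, 0.4],
          [0.2, 0.7, 0.9], [0.5, 0.5, 0.05], [0.3, 0.6, 0.8]]
H_true = [[0.6, 0.2, 0.3, 0.5, 0.1],
          [0.3, 0.5, 0.3, 0.1, 0.2],
          [0.1, 0.3, 0.4, 0.4, 0.7]]
Y = matmul(W_true, H_true)
W_known = [row[:2] for row in W_true]

W, H, history = partial_reference_nmf(Y, W_known, n_unknown=1)
```

In this sketch the rows of `H` are unnormalised mixing weights; dividing each column of `H` by its sum would turn them into proportions. How the unknown component is initialised and constrained is exactly where a dedicated method would differ from this generic version.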
Affiliation(s)
- Yufang Qin
- College of Information Technology, Shanghai Ocean University, Shanghai, China
- Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
| | - Weiwei Zhang
- School of Science, East China University of Technology, Nanchang, Jiangxi, China
| | - Xiaoqiang Sun
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
| | - Siwei Nan
- Department of Mathematics, Shanghai Normal University, Shanghai, China
| | - Nana Wei
- Department of Mathematics, Shanghai Normal University, Shanghai, China
| | - Hua-Jun Wu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, Massachusetts, United States of America
| | - Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, China
| |
|
29
|
Li Z, Guo Z, Cheng Y, Jin P, Wu H. Robust partial reference-free cell composition estimation from tissue expression. Bioinformatics 2020; 36:3431-3438. [PMID: 32167531 DOI: 10.1093/bioinformatics/btaa184] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/22/2019] [Revised: 03/05/2020] [Accepted: 03/10/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder the cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy. RESULTS We introduce TOols for the Analysis of heterogeneouS Tissues TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods. AVAILABILITY AND IMPLEMENTATION The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. CONTACT ziyi.li@emory.edu or hao.wu@emory.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziyi Li
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
| | - Zhenxing Guo
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
| | - Ying Cheng
- Institute of Biomedical Research, Yunnan University, Kunming, China
| | - Peng Jin
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
| |
|
30
|
Shi M, Sheng Z, Tang H. Prognostic outcome prediction by semi-supervised least squares classification. Brief Bioinform 2020; 22:5935498. [PMID: 33094318 DOI: 10.1093/bib/bbaa249] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/15/2020] [Revised: 09/04/2020] [Accepted: 09/04/2020] [Indexed: 11/13/2022] Open
Abstract
Although great progress has been made in prognostic outcome prediction, small sample size remains a challenge in obtaining accurate and robust classifiers. We proposed the Rescaled linear square Regression based Least Squares Learning (RRLSL), a jointly developed semi-supervised feature selection and classifier, for predicting prognostic outcome of cancer patients. RRLSL used the least square regression to identify the scale factors and then rank the features in available multiple types of molecular data. We applied the unlabeled multiple molecular data in conjunction with the labeled data to develop a similarity graph. RRLSL produced the constraint with kernel functions to bridge the gap between label information and geometry information from messenger RNA and microRNA expression profiling. Importantly, this semi-supervised model proposed the least squares learning with L2 regularization to develop a semi-supervised classifier. RRLSL suggested the performance improvement in the prognostic outcome prediction and successfully discriminated between the recurrent patients and non-recurrent ones. We also demonstrated that RRLSL improved the accuracy and Area Under the Precision Recall Curve (AUPRC) as compared to the baseline semi-supervised methods. RRLSL is available for a stand-alone software package (https://github.com/ShiMGLab/RRLSL).
A short abstract: We proposed the Rescaled linear square Regression based Least Squares Learning (RRLSL), a jointly developed semi-supervised feature selection and classifier, for predicting prognostic outcome of cancer patients. RRLSL used the least square regression to identify the scale factors to rank the features in available multiple types of molecular data. RRLSL produced the constraint with kernel functions to bridge the gap between label information and geometry information from messenger RNA and microRNA expression profiling. Importantly, this semi-supervised model proposed the least squares learning with L2 regularization to develop the semi-supervised classifier. RRLSL suggested the performance improvement in the prognostic outcome prediction and successfully discriminated between the recurrent patients and non-recurrent ones.
Affiliation(s)
- Mingguang Shi
- School of Electric Engineering and Automation, Hefei University of Technology, Hefei, Anhui, 230009 China
| | - Zhou Sheng
- School of Electric Engineering and Automation, Hefei University of Technology, Hefei, Anhui, 230009 China
| | - Hao Tang
- School of Electric Engineering and Automation, Hefei University of Technology, Hefei, Anhui, 230009 China
| |
|
31
|
Scherer M, Nazarov PV, Toth R, Sahay S, Kaoma T, Maurer V, Vedeneev N, Plass C, Lengauer T, Walter J, Lutsik P. Reference-free deconvolution, visualization and interpretation of complex DNA methylation data using DecompPipeline, MeDeCom and FactorViz. Nat Protoc 2020; 15:3240-3263. [PMID: 32978601 DOI: 10.1038/s41596-020-0369-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/20/2019] [Accepted: 05/29/2020] [Indexed: 12/13/2022]
Abstract
DNA methylation profiling offers unique insights into human development and diseases. Often the analysis of complex tissues and cell mixtures is the only feasible option to study methylation changes across large patient cohorts. Since DNA methylomes are highly cell type specific, deconvolution methods can be used to recover cell type-specific information in the form of latent methylation components (LMCs) from such 'bulk' samples. Reference-free deconvolution methods retrieve these components without the need for DNA methylation profiles of purified cell types. Currently no integrated and guided procedure is available for data preparation and subsequent interpretation of deconvolution results. Here, we describe a three-stage protocol for reference-free deconvolution of DNA methylation data comprising: (i) data preprocessing, confounder adjustment using independent component analysis (ICA) and feature selection using DecompPipeline, (ii) deconvolution with multiple parameters using MeDeCom, RefFreeCellMix or EDec and (iii) guided biological inference and validation of deconvolution results with the R/Shiny graphical user interface FactorViz. Our protocol simplifies the analysis and guides the initial interpretation of DNA methylation data derived from complex samples. The harmonized approach is particularly useful to dissect and evaluate cell heterogeneity in complex systems such as tumors. We apply the protocol to lung cancer methylomes from The Cancer Genome Atlas (TCGA) and show that our approach identifies the proportions of stromal cells and tumor-infiltrating immune cells, as well as associations of the detected components with clinical parameters. The protocol takes slightly >3 d to complete and requires basic R skills.
Affiliation(s)
- Michael Scherer
- Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany; Computational Biology, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
| | - Petr V Nazarov
- Quantitative Biology Unit, Luxembourg Institute of Health, Strassen, Luxembourg
| | - Reka Toth
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; Division of Thoracic Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Shashwat Sahay
- Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany; Center for Digital Health, Berlin Institute of Health and Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Tony Kaoma
- Quantitative Biology Unit, Luxembourg Institute of Health, Strassen, Luxembourg
| | - Valentin Maurer
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Christoph Plass
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Thomas Lengauer
- Computational Biology, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
| | - Jörn Walter
- Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
| | - Pavlo Lutsik
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
| |
|
32
|
Fan F, Chen D, Zhao Y, Wang H, Sun H, Sun K. Rapid preliminary purity evaluation of tumor biopsies using deep learning approach. Comput Struct Biotechnol J 2020; 18:1746-1753. [PMID: 32695267 PMCID: PMC7352054 DOI: 10.1016/j.csbj.2020.06.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/28/2020] [Revised: 05/18/2020] [Accepted: 06/05/2020] [Indexed: 12/29/2022] Open
Abstract
Tumor biopsy is one of the most widely used materials in cancer diagnoses and molecular studies, where the purity of the biopsies (i.e., proportion of cells that are cancerous) is crucial for both applications. However, conventional approaches for tumor biopsy purity evaluation require experienced pathologists and/or various materials/experiments therefore were time-consuming and error prone. Rapid, easy-to-perform and cost-effective methods are thus still of demand. Recent studies had demonstrated that molecular signatures were informative to this task. Previously, we had developed GeneCT, a deep learning-based cancerous status and tissue-of-origin classifier for pan-tumor/tissue biopsies. In the current work, we applied GeneCT on datasets collected from various groups, where the experimental protocols and cancer types differed from each other. We found that GeneCT showed high accuracies on most datasets; for samples with unexpected results, in-depth investigations suggested that they might suffer from imperfect purity. In silico mixture experiments further showed that GeneCT classification was highly indicative in predicting the purity of the tumor biopsies. Considering that transcriptome profiling is a common and inexpensive experiment in molecular cancer studies, our deep learning-based GeneCT could thus serve as a valuable tool for rapid, preliminary tumor biopsy purity assessment.
Affiliation(s)
- Fei Fan
- Department of Neurosurgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
| | - Dan Chen
- The Third Affiliated Hospital (Provisional) of The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
| | - Yu Zhao
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
| | - Huating Wang
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China; Department of Orthopaedics and Traumatology, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
| | - Hao Sun
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
| | - Kun Sun
- Shenzhen Bay Laboratory, Shenzhen 518132, China
| |
|
33
|
Hicks SC, Irizarry RA. methylCC: technology-independent estimation of cell type composition using differentially methylated regions. Genome Biol 2019; 20:261. [PMID: 31783894 PMCID: PMC6883691 DOI: 10.1186/s13059-019-1827-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/17/2019] [Accepted: 09/19/2019] [Indexed: 01/01/2023] Open
Abstract
A major challenge in the analysis of DNA methylation (DNAm) data is variability introduced from intra-sample cellular heterogeneity, such as whole blood which is a convolution of DNAm profiles across a unique cell type. When this source of variability is confounded with an outcome of interest, if unaccounted for, false positives ensue. Current methods to estimate the cell type proportions in whole blood DNAm samples are only appropriate for one technology and lead to technology-specific biases if applied to data generated from other technologies. Here, we propose the technology-independent alternative: methylCC, which is available at https://github.com/stephaniehicks/methylCC.
Affiliation(s)
- Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, USA
| | - Rafael A Irizarry
- Department of Data Sciences, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, USA.
| |
|
34
|
Yin L, Luo Y, Xu X, Wen S, Wu X, Lu X, Xie H. Virtual methylome dissection facilitated by single-cell analyses. Epigenetics Chromatin 2019; 12:66. [PMID: 31711526 PMCID: PMC6844058 DOI: 10.1186/s13072-019-0310-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/02/2019] [Accepted: 10/21/2019] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Numerous cell types can be identified within plant tissues and animal organs, and the epigenetic modifications underlying such enormous cellular heterogeneity are just beginning to be understood. It remains a challenge to infer cellular composition using DNA methylomes generated for mixed cell populations. Here, we propose a semi-reference-free procedure to perform virtual methylome dissection using the nonnegative matrix factorization (NMF) algorithm. RESULTS In the pipeline that we implemented to predict cell-subtype percentages, putative cell-type-specific methylated (pCSM) loci were first determined according to their DNA methylation patterns in bulk methylomes and clustered into groups based on their correlations in methylation profiles. A representative set of pCSM loci was then chosen to decompose target methylomes into multiple latent DNA methylation components (LMCs). To test the performance of this pipeline, we made use of single-cell brain methylomes to create synthetic methylomes of known cell composition. Compared with highly variable CpG sites, pCSM loci achieved a higher prediction accuracy in the virtual methylome dissection of synthetic methylomes. In addition, pCSM loci were shown to be good predictors of the cell type of the sorted brain cells. The software package developed in this study is available in the GitHub repository (https://github.com/Gavin-Yinld). CONCLUSIONS We anticipate that the pipeline implemented in this study will be an innovative and valuable tool for the decoding of cellular heterogeneity.
Affiliation(s)
- Liduo Yin
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, 100101, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
| | - Yanting Luo
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Xiguang Xu
- Epigenomics and Computational Biology Lab, Fralin Life Sciences Institute at Virginia Tech, Virginia Tech, Blacksburg, VA, 24061, USA; Department of Biological Sciences, Virginia Tech, Blacksburg, VA, 24061, USA
| | - Shiyu Wen
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Xiaowei Wu
- Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA
| | - Xuemei Lu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing, 100101, China.
| | - Hehuang Xie
- Epigenomics and Computational Biology Lab, Fralin Life Sciences Institute at Virginia Tech, Virginia Tech, Blacksburg, VA, 24061, USA; Department of Biological Sciences, Virginia Tech, Blacksburg, VA, 24061, USA; Department of Biomedical Sciences and Pathobiology, Virginia-Maryland College of Veterinary Medicine, Virginia Tech, Blacksburg, VA, 24061, USA.
| |
|
35
|
Li Z, Wu H. TOAST: improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol 2019; 20:190. [PMID: 31484546 PMCID: PMC6727351 DOI: 10.1186/s13059-019-1778-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 04/24/2019] [Accepted: 07/30/2019] [Indexed: 02/07/2023] Open
Abstract
In the analysis of high-throughput data from complex samples, cell composition is an important factor that needs to be accounted for. Except for a limited number of tissues with known pure cell type profiles, a majority of genomics and epigenetics data relies on the "reference-free deconvolution" methods to estimate cell composition. We develop a novel computational method to improve reference-free deconvolution, which iteratively searches for cell type-specific features and performs composition estimation. Simulation studies and applications to six real datasets including both DNA methylation and gene expression data demonstrate favorable performance of the proposed method. TOAST is available at https://bioconductor.org/packages/TOAST.
Affiliation(s)
- Ziyi Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, 30322, GA, USA
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, 30322, GA, USA.
| |
|
36
|
Rahmani E, Schweiger R, Rhead B, Criswell LA, Barcellos LF, Eskin E, Rosset S, Sankararaman S, Halperin E. Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology. Nat Commun 2019; 10:3417. [PMID: 31366909 PMCID: PMC6668473 DOI: 10.1038/s41467-019-11052-9] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 10/24/2018] [Accepted: 06/17/2019] [Indexed: 02/07/2023] Open
Abstract
High costs and technical limitations of cell sorting and single-cell techniques currently restrict the collection of large-scale, cell-type-specific DNA methylation data. This, in turn, impedes our ability to tackle key biological questions that pertain to variation within a population, such as identification of disease-associated genes at a cell-type-specific resolution. Here, we show mathematically and empirically that cell-type-specific methylation levels of an individual can be learned from its tissue-level bulk data, conceptually emulating the case where the individual has been profiled with a single-cell resolution and then signals were aggregated in each cell population separately. Provided with this unprecedented way to perform powerful large-scale epigenetic studies with cell-type-specific resolution, we revisit previous studies with tissue-level bulk methylation and reveal novel associations with leukocyte composition in blood and with rheumatoid arthritis. For the latter, we further show consistency with validation data collected from sorted leukocyte sub-types.
Affiliation(s)
- Elior Rahmani
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Regev Schweiger
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 6997801, Israel
  - MyHeritage Ltd., Or Yehuda, 6037606, Israel
- Brooke Rhead
  - Computational Biology Graduate Group, University of California, Berkeley, Berkeley, CA, 94720, USA
- Lindsey A Criswell
  - Russell/Engleman Rheumatology Research Center, Department of Medicine, University of California, San Francisco, San Francisco, CA, 94143, USA
- Lisa F Barcellos
  - School of Public Health, University of California, Berkeley, Berkeley, CA, 94720, USA
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Saharon Rosset
  - Department of Statistics, Tel Aviv University, Tel Aviv, 6997801, Israel
- Sriram Sankararaman
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Eran Halperin
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Department of Anesthesiology and Perioperative Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
|
37
|
Luo X, Yang C, Wei Y. Detection of cell-type-specific risk-CpG sites in epigenome-wide association studies. Nat Commun 2019; 10:3113. [PMID: 31308366 PMCID: PMC6629651 DOI: 10.1038/s41467-019-10864-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/30/2018] [Accepted: 06/06/2019] [Indexed: 02/06/2023] Open
Abstract
In epigenome-wide association studies, the measured signals for each sample are a mixture of methylation profiles from different cell types. Current approaches to the association detection claim whether a cytosine-phosphate-guanine (CpG) site is associated with the phenotype or not at aggregate level and can suffer from low statistical power. Here, we propose a statistical method, HIgh REsolution (HIRE), which not only improves the power of association detection at aggregate level as compared to the existing methods but also enables the detection of risk-CpG sites for individual cell types. Cellular heterogeneity is one of the major confounding factors in EWAS studies. Here the authors present a statistical method, HIgh REsolution (HIRE), which enables the detection of risk-CpG sites for individual cell types.
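The mixture described in this abstract can be made concrete with a small sketch. The Python below is not HIRE; it only illustrates, under a simple linear-mixture assumption and with made-up numbers, how a bulk methylation value arises as a proportion-weighted average of cell-type-specific levels, and why a change confined to one cell type appears diluted at the aggregate level. The function name `bulk_methylation` and every value are hypothetical.

```python
def bulk_methylation(proportions, cell_type_levels):
    """Proportion-weighted average of cell-type-specific methylation at one CpG site."""
    if len(proportions) != len(cell_type_levels):
        raise ValueError("need one methylation level per cell type")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("cell-type proportions must sum to 1")
    return sum(p * m for p, m in zip(proportions, cell_type_levels))


# Hypothetical sample composition: 70% cell type A, 30% cell type B.
proportions = [0.7, 0.3]

# Hypothetical methylation levels (beta values) at a single CpG site.
control_levels = [0.20, 0.80]
case_levels = [0.20, 0.50]  # only cell type B differs between cases and controls

control_bulk = bulk_methylation(proportions, control_levels)
case_bulk = bulk_methylation(proportions, case_levels)

cell_type_effect = case_levels[1] - control_levels[1]  # change within cell type B
bulk_effect = case_bulk - control_bulk                 # change visible in the mixture

print(f"effect within cell type B: {cell_type_effect:+.2f}")
print(f"effect in the bulk signal: {bulk_effect:+.2f}")
```

In this toy case the bulk difference is the cell-type difference scaled by the proportion of cell type B, which is one way to see why aggregate-level tests can lose power.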
Affiliation(s)
- Xiangyu Luo
  - Institute of Statistics and Big Data, Renmin University of China, 100872, Beijing, China
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
- Can Yang
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Yingying Wei
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
|
38
|
Thompson M, Chen ZJ, Rahmani E, Halperin E. CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets. Genome Biol 2019; 20:138. [PMID: 31300005 PMCID: PMC6624895 DOI: 10.1186/s13059-019-1743-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/06/2019] [Accepted: 06/21/2019] [Indexed: 12/11/2022] Open
Abstract
Methylation datasets are affected by innumerable sources of variability, both biological (cell-type composition, genetics) and technical (batch effects). Here, we propose a reference-free method based on sparse canonical correlation analysis to separate the biological from technical sources of variability. We show through simulations and real data that our method, CONFINED, is not only more accurate than the state-of-the-art reference-free methods for capturing known, replicable biological variability, but it is also considerably more robust to dataset-specific technical variability than previous approaches. CONFINED is available as an R package as detailed at https://github.com/cozygene/CONFINED.
Affiliation(s)
- Mike Thompson
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Zeyuan Johnson Chen
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Elior Rahmani
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Eran Halperin
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Biomathematics, University of California Los Angeles, Los Angeles, CA, USA
|
39
|
BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Genome Biol 2018; 19:141. [PMID: 30241486 PMCID: PMC6151042 DOI: 10.1186/s13059-018-1513-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 11/01/2017] [Accepted: 08/20/2018] [Indexed: 11/10/2022] Open
Abstract
We introduce a Bayesian semi-supervised method for estimating cell counts from DNA methylation by leveraging an easily obtainable prior knowledge on the cell-type composition distribution of the studied tissue. We show mathematically and empirically that alternative methods which attempt to infer cell counts without methylation reference only capture linear combinations of cell counts rather than provide one component per cell type. Our approach allows the construction of components such that each component corresponds to a single cell type, and provides a new opportunity to investigate cell compositions in genomic studies of tissues for which it was not possible before.
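The point about reference-free methods recovering only linear combinations of cell counts can be illustrated with a toy factorization. The sketch below is not BayesCCE and not any published method; it assumes a simple linear mixture (bulk = proportions × profiles) and shows that any invertible mixing matrix yields a second pair of factors that reproduces the same bulk data, so the data alone cannot pin down one component per cell type. All matrices and names (`matmul`, `mixing`) are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical cell-type proportions: 3 samples x 2 cell types.
proportions = [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]

# Hypothetical cell-type methylation profiles: 2 cell types x 3 CpG sites.
profiles = [[0.2, 0.9, 0.1], [0.8, 0.1, 0.5]]

# Observed bulk methylation: 3 samples x 3 CpG sites.
bulk = matmul(proportions, profiles)

# An arbitrary invertible mixing matrix and its inverse.
mixing = [[2.0, 1.0], [1.0, 1.0]]
mixing_inv = [[1.0, -1.0], [-1.0, 2.0]]

# Alternative factors: each new component is a linear combination of the
# true cell-type proportions, yet the product is unchanged.
proportions_mixed = matmul(proportions, mixing)
profiles_mixed = matmul(mixing_inv, profiles)
bulk_mixed = matmul(proportions_mixed, profiles_mixed)

max_gap = max(
    abs(x - y) for row_x, row_y in zip(bulk, bulk_mixed) for x, y in zip(row_x, row_y)
)
print("largest difference between the two reconstructions:", max_gap)
print("first sample, true proportions: ", proportions[0])
print("first sample, mixed components:", proportions_mixed[0])
```

Resolving this ambiguity requires outside information; the abstract describes using prior knowledge on the cell-type composition distribution for that purpose.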
|