1
Nazer N, Sepehri MH, Mohammadzade H, Mehrmohamadi M. A novel approach toward optimal workflow selection for DNA methylation biomarker discovery. BMC Bioinformatics 2024; 25:37. [PMID: 38262949 PMCID: PMC10804576 DOI: 10.1186/s12859-024-05658-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 01/15/2024] [Indexed: 01/25/2024] Open
Abstract
DNA methylation is a major epigenetic modification involved in many physiological processes. Normal methylation patterns are disrupted in many diseases, and methylation-based biomarkers have shown promise in several contexts. Marker discovery typically involves the analysis of publicly available DNA methylation data from high-throughput assays. Numerous methods for identifying differentially methylated biomarkers have been developed, creating a strong need for best-practice guidelines and context-specific analysis workflows. To this end, we propose TASA, a novel method for simulating methylation array data in various scenarios. We then comprehensively assess different data analysis workflows using real and simulated data and suggest optimal start-to-finish analysis workflows. Our study demonstrates that the choice of analysis pipeline for DNA methylation-based marker discovery is crucial and differs across contexts.
Affiliation(s)
- Naghme Nazer
- Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran
- Hoda Mohammadzade
- Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran
- Mahya Mehrmohamadi
- Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran.
2
Giuili E, Grolaux R, Macedo CZNM, Desmyter L, Pichon B, Neuens S, Vilain C, Olsen C, Van Dooren S, Smits G, Defrance M. Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs). Hum Genet 2023; 142:1721-1735. [PMID: 37889307 PMCID: PMC10676303 DOI: 10.1007/s00439-023-02609-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/28/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023]
Abstract
Episignatures are popular tools for the diagnosis of rare neurodevelopmental disorders. They are commonly based on a set of differentially methylated CpGs used in combination with a support vector machine model. DNA methylation (DNAm) data often include missing values due to changes in data generation technology and batch effects. While many normalization methods exist for DNAm data, their impact on episignature performance has never been assessed. In addition, technologies to quantify DNAm evolve quickly, and this may lead to poor transposition of existing episignatures generated on deprecated array versions to new ones. Indeed, probe removal between array versions, between technologies, or during preprocessing leads to missing values. Thus, the effect of missing data on episignature performance must also be carefully evaluated and addressed through imputation or an innovative approach to episignature design. In this paper, we used data from patients with Kabuki and Sotos syndromes to evaluate the influence of normalization methods, classification models and missing data on the prediction performance of two existing episignatures. We compare how six popular normalization methods for methylarray data affect episignature classification performance in Kabuki and Sotos syndromes and provide best-practice suggestions for building new episignatures. In this setting, we show that the Illumina, Noob and Funnorm normalization methods achieved higher classification performance on the testing sets than the Quantile, Raw and Swan normalization methods. We further show that penalized logistic regression and support vector machines perform best in the classification of Kabuki and Sotos syndrome patients. Then, we describe a new paradigm to build episignatures based on the detection of differentially methylated regions (DMRs) and evaluate their performance compared to classical differentially methylated cytosines (DMCs)-based episignatures in the presence of missing data.
We show that the performance of classical DMC-based episignatures suffers more from the presence of missing data than that of the DMR-based approach. We present a comprehensive evaluation of how the normalization of DNA methylation data affects episignature performance, using three popular classification models. We further evaluate how missing data affect those models' predictions. Finally, we propose a novel methodology to develop episignatures based on the identification of differentially methylated regions and show that this method slightly outperforms classical episignatures in the presence of missing data.
Affiliation(s)
- Edoardo Giuili
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Brussels, Belgium
- Robin Grolaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Brussels, Belgium
- Catarina Z N M Macedo
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Brussels, Belgium
- Laurence Desmyter
- Center for Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium
- Bruno Pichon
- Center for Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium
- Sebastian Neuens
- Center for Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium
- Department of Genetics, Hôpital Universitaire Des Enfants Reine Fabiola, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium
- Catheline Vilain
- Center for Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium
- Department of Genetics, Hôpital Universitaire Des Enfants Reine Fabiola, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium
- Catharina Olsen
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Brussels, Belgium
- Clinical Sciences, Research Group Reproduction and Genetics, Brussels Interuniversity Genomics High Throughput Core (BRIGHTcore), Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Brussels, Belgium
- Clinical Sciences, Research Group Reproduction and Genetics, Centre for Medical Genetics, Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Brussels, Belgium
- Sonia Van Dooren
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Brussels, Belgium
- Clinical Sciences, Research Group Reproduction and Genetics, Brussels Interuniversity Genomics High Throughput Core (BRIGHTcore), Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Brussels, Belgium
- Clinical Sciences, Research Group Reproduction and Genetics, Centre for Medical Genetics, Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Brussels, Belgium
- Guillaume Smits
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Brussels, Belgium
- Center for Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium
- Department of Genetics, Hôpital Universitaire Des Enfants Reine Fabiola, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium
- Matthieu Defrance
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Brussels, Belgium.
3
Singh M, Spendlove SJ, Wei A, Bondhus LM, Nava AA, de L Vitorino FN, Amano S, Lee J, Echeverria G, Gomez D, Garcia BA, Arboleda VA. KAT6A mutations in Arboleda-Tham syndrome drive epigenetic regulation of posterior HOXC cluster. Hum Genet 2023; 142:1705-1720. [PMID: 37861717 PMCID: PMC10676314 DOI: 10.1007/s00439-023-02608-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 09/28/2023] [Indexed: 10/21/2023]
Abstract
Arboleda-Tham Syndrome (ARTHS) is a rare genetic disorder caused by heterozygous, de novo mutations in Lysine(K) acetyltransferase 6A (KAT6A). ARTHS is clinically heterogeneous and characterized by several common features, including intellectual disability, developmental and speech delay, and hypotonia, and affects multiple organ systems. KAT6A is the enzymatic core of a histone-acetylation protein complex; however, the direct histone targets and gene regulatory effects remain unknown. In this study, we use ARTHS patient (n = 8) and control (n = 14) dermal fibroblasts and perform comprehensive profiling of the epigenome and transcriptome caused by KAT6A mutations. We identified differential chromatin accessibility within the promoter or gene body of 23% (14/60) of genes that were differentially expressed between ARTHS and controls. Within fibroblasts, we show a distinct set of genes from the posterior HOXC gene cluster (HOXC10, HOXC11, HOXC-AS3, HOXC-AS2, and HOTAIR) that are overexpressed in ARTHS and are transcription factors critical for early development body segment patterning. The genomic loci harboring HOXC genes are epigenetically regulated with increased chromatin accessibility, high levels of H3K23ac, and increased gene-body DNA methylation compared to controls, all of which are consistent with transcriptomic overexpression. Finally, we used unbiased proteomic mass spectrometry and identified two new histone post-translational modifications (PTMs) that are disrupted in ARTHS: H2A and H3K56 acetylation. Our multi-omics assays have identified novel histone and gene regulatory roles of KAT6A in a large group of ARTHS patients harboring diverse pathogenic mutations. This work provides insight into the role of KAT6A on the epigenomic regulation in somatic cell types.
Affiliation(s)
- Meghna Singh
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Sarah J Spendlove
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Interdepartmental BioInformatics Program, UCLA, Los Angeles, CA, USA
- Angela Wei
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Interdepartmental BioInformatics Program, UCLA, Los Angeles, CA, USA
- Leroy M Bondhus
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Aileen A Nava
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Francisca N de L Vitorino
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, MO, USA
- Seth Amano
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Jacob Lee
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Gesenia Echeverria
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Dianne Gomez
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Benjamin A Garcia
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, MO, USA
- Valerie A Arboleda
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA.
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
- Interdepartmental BioInformatics Program, UCLA, Los Angeles, CA, USA.
4
Singh M, Spendlove S, Wei A, Bondhus L, Nava A, de L. Vitorino FN, Amano S, Lee J, Echeverria G, Gomez D, Garcia BA, Arboleda VA. KAT6A mutations in Arboleda-Tham syndrome drive epigenetic regulation of posterior HOXC cluster. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.03.550595. [PMID: 37577627 PMCID: PMC10418288 DOI: 10.1101/2023.08.03.550595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Arboleda-Tham Syndrome (ARTHS) is a rare genetic disorder caused by heterozygous, de novo truncating mutations in Lysine(K) acetyltransferase 6A (KAT6A). ARTHS is clinically heterogeneous and characterized by several common features, including intellectual disability, developmental and speech delay, and hypotonia, and affects multiple organ systems. KAT6A is highly expressed in early development and plays a key role in cell-type specific differentiation. KAT6A is the enzymatic core of a histone-acetylation protein complex; however, the direct histone targets and gene regulatory effects remain unknown. In this study, we use ARTHS patient (n = 8) and control (n = 14) dermal fibroblasts and perform comprehensive profiling of the epigenome and transcriptome caused by KAT6A mutations. We identified differential chromatin accessibility within the promoter or gene body of 23% (14/60) of genes that were differentially expressed between ARTHS and controls. Within fibroblasts, we show a distinct set of genes from the posterior HOXC gene cluster (HOXC10, HOXC11, HOXC-AS3, HOXC-AS2, and HOTAIR) that are overexpressed in ARTHS and are transcription factors critical for early development body segment patterning. The genomic loci harboring HOXC genes are epigenetically regulated with increased chromatin accessibility, high levels of H3K23ac, and increased gene-body DNA methylation compared to controls, all of which are consistent with transcriptomic overexpression. Finally, we used unbiased proteomic mass spectrometry and identified two new histone post-translational modifications (PTMs) that are disrupted in ARTHS: H2A and H3K56 acetylation. Our multi-omics assays have identified novel histone and gene regulatory roles of KAT6A in a large group of ARTHS patients harboring diverse pathogenic mutations. This work provides insight into the role of KAT6A on the epigenomic regulation in somatic cell types.
Affiliation(s)
- Meghna Singh
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Sarah Spendlove
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Interdepartmental BioInformatics Program, UCLA
- Angela Wei
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Interdepartmental BioInformatics Program, UCLA
- Leroy Bondhus
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Aileen Nava
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Seth Amano
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Jacob Lee
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Gesenia Echeverria
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Dianne Gomez
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Benjamin A. Garcia
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis
- Valerie A. Arboleda
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Interdepartmental BioInformatics Program, UCLA
5
Budkina A, Medvedeva YA, Stupnikov A. Assessing the Differential Methylation Analysis Quality for Microarray and NGS Platforms. Int J Mol Sci 2023; 24:ijms24108591. [PMID: 37239934 DOI: 10.3390/ijms24108591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 04/28/2023] [Accepted: 05/07/2023] [Indexed: 05/28/2023] Open
Abstract
Differential methylation (DM) is actively used in different types of fundamental and translational studies. Currently, microarray- and NGS-based approaches for methylation analysis are the most widely used, with multiple statistical models designed to extract differential methylation signatures. The benchmarking of DM models is challenging due to the absence of gold standard data. In this study, we analyze an extensive number of publicly available NGS and microarray datasets with divergent and widely utilized statistical models and apply the recently suggested and validated rank-statistic-based approach Hobotnica to evaluate the quality of their results. Overall, microarray-based methods demonstrate more robust and convergent results, while NGS-based models are highly dissimilar. Tests on simulated NGS data tend to overestimate the quality of the DM methods and should therefore be used with caution. Evaluation of the top 10 DMCs and top 100 DMCs in addition to the not-subset signature also shows more stable results for microarray data. In summary, given the observed heterogeneity in NGS methylation data, the evaluation of newly generated methylation signatures is a crucial step in DM analysis. The Hobotnica metric is consistent with previously developed quality metrics and provides a robust, sensitive, and informative estimation of methods' performance and DM signatures' quality in the absence of gold standard data, solving a long-standing problem in DM analysis.
Affiliation(s)
- Anna Budkina
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Yulia A Medvedeva
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Federal State Institution «Federal Research Centre «Fundamentals of Biotechnology» of the Russian Academy of Sciences», 119071 Moscow, Russia
- Alexey Stupnikov
- Department of Biomedical Physics, Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia