1
Stricker M, Zhang W, Cheng WY, Gazal S, Dendrou C, Nahkuri S, Palamara PF. Genome-wide classification of epigenetic activity reveals regions of enriched heritability in immune-related traits. Cell Genomics 2024; 4:100469. [PMID: 38190103 PMCID: PMC10794845 DOI: 10.1016/j.xgen.2023.100469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 07/04/2023] [Accepted: 11/29/2023] [Indexed: 01/09/2024]
Abstract
Epigenetics underpins the regulation of genes known to play a key role in the adaptive and innate immune system (AIIS). We developed a method, EpiNN, that leverages epigenetic data to detect AIIS-relevant genomic regions and used it to detect 2,765 putative AIIS loci. Experimental validation of one of these loci, DNMT1, provided evidence for a novel AIIS-specific transcription start site. We built a genome-wide AIIS annotation and used linkage disequilibrium (LD) score regression to test whether it predicts regional heritability using association statistics for 176 traits. We detected significant heritability effects (average |τ∗|=1.65) for 20 out of 26 immune-relevant traits. In a meta-analysis, immune-relevant traits and diseases were 4.45× more enriched for heritability than other traits. The EpiNN annotation was also depleted of trans-ancestry genetic correlation, indicating ancestry-specific effects. These results underscore the effectiveness of leveraging supervised learning algorithms and epigenetic data to detect loci implicated in specific classes of traits and diseases.
Affiliation(s)
- Weijiao Zhang
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Wei-Yi Cheng
- Data & Analytics, Roche Pharma Research & Early Development, Roche Innovation Center New York, Little Falls, NJ, USA
- Steven Gazal
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Calliope Dendrou
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Satu Nahkuri
- Data & Analytics, Roche Pharma Research & Early Development, Roche Innovation Center Zürich, Zürich, Switzerland
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK; Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
2
Niggl E, Bouman A, Briere LC, Hoogenboezem RM, Wallaard I, Park J, Admard J, Wilke M, Harris-Mostert EDRO, Elgersma M, Bain J, Balasubramanian M, Banka S, Benke PJ, Bertrand M, Blesson AE, Clayton-Smith J, Ellingford JM, Gillentine MA, Goodloe DH, Haack TB, Jain M, Krantz I, Luu SM, McPheron M, Muss CL, Raible SE, Robin NH, Spiller M, Starling S, Sweetser DA, Thiffault I, Vetrini F, Witt D, Woods E, Zhou D, Elgersma Y, van Esbroeck ACM. HNRNPC haploinsufficiency affects alternative splicing of intellectual disability-associated genes and causes a neurodevelopmental disorder. Am J Hum Genet 2023; 110:1414-1435. [PMID: 37541189 PMCID: PMC10432175 DOI: 10.1016/j.ajhg.2023.07.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/26/2023] [Revised: 07/04/2023] [Accepted: 07/07/2023] [Indexed: 08/06/2023]
Abstract
Heterogeneous nuclear ribonucleoprotein C (HNRNPC) is an essential, ubiquitously abundant protein involved in mRNA processing. Genetic variants in other members of the HNRNP family have been associated with neurodevelopmental disorders. Here, we describe 13 individuals with global developmental delay, intellectual disability, behavioral abnormalities, and subtle facial dysmorphology with heterozygous HNRNPC germline variants. Five of them bear an identical in-frame deletion of nine amino acids in the extreme C terminus. To study the effect of this recurrent variant as well as HNRNPC haploinsufficiency, we used induced pluripotent stem cells (iPSCs) and fibroblasts obtained from affected individuals. While protein localization and oligomerization were unaffected by the recurrent C-terminal deletion variant, total HNRNPC levels were decreased. Previously, reduced HNRNPC levels have been associated with changes in alternative splicing. Therefore, we performed a meta-analysis on published RNA-seq datasets of three different cell lines to identify a ubiquitous HNRNPC-dependent signature of alternatively spliced exons. The identified signature was not only confirmed in fibroblasts obtained from an affected individual but also showed a significant enrichment for genes associated with intellectual disability. Hence, we assessed the effect of decreased and increased levels of HNRNPC on neuronal arborization and neuronal migration and found that either condition affects neuronal function. Taken together, our data indicate that HNRNPC haploinsufficiency affects alternative splicing of multiple intellectual disability-associated genes and that the developing brain is sensitive to aberrant levels of HNRNPC. Hence, our data strongly support the inclusion of HNRNPC in the family of HNRNP-related neurodevelopmental disorders.
Affiliation(s)
- Eva Niggl
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, the Netherlands; ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, the Netherlands
- Arjan Bouman
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, the Netherlands
- Lauren C Briere
- Center for Genomic Medicine and Department of Pediatrics, Massachusetts General Hospital, Boston, MA 02114, USA
- Ilse Wallaard
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, the Netherlands; ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, the Netherlands
- Joohyun Park
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, 72076 Tübingen, Germany
- Jakob Admard
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, 72076 Tübingen, Germany; NGS Competence Center Tübingen, Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Martina Wilke
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, the Netherlands
- Emilio D R O Harris-Mostert
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, the Netherlands; ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, the Netherlands
- Minetta Elgersma
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, the Netherlands; ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, the Netherlands
- Jennifer Bain
- Department of Neurology Division of Child Neurology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Meena Balasubramanian
- Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, S5 7AU Sheffield, UK; Department of Oncology & Metabolism, University of Sheffield, S5 7AU Sheffield, UK
- Siddharth Banka
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester M13 9WL, UK; Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, M13 9PL Manchester, UK
- Paul J Benke
- Division of Clinical Genetics, Joe DiMaggio Children's Hospital, Hollywood, FL 33021, USA
- Miriam Bertrand
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, 72076 Tübingen, Germany
- Alyssa E Blesson
- Department of Neurogenetics, Kennedy Krieger Institute, Baltimore, MD 21205, USA
- Jill Clayton-Smith
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester M13 9WL, UK; Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, M13 9PL Manchester, UK
- Jamie M Ellingford
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Manchester M13 9WL, UK; Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, M13 9PL Manchester, UK
- Dana H Goodloe
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Tobias B Haack
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, 72076 Tübingen, Germany; Center for Rare Diseases, University of Tübingen, 72076 Tübingen, Germany
- Mahim Jain
- Department of Neurogenetics, Kennedy Krieger Institute, Baltimore, MD 21205, USA
- Ian Krantz
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Sharon M Luu
- Waisman Center, University of Wisconsin Hospitals and Clinics, Madison, WI 53704, USA; Department of Medical and Molecular Genetics, Indiana University, Indianapolis, IN 46202, USA
- Molly McPheron
- Department of Medical and Molecular Genetics, Indiana University, Indianapolis, IN 46202, USA
- Candace L Muss
- Nemours / AI DuPont Hospital for Children, Wilmington, DE 19803, USA
- Sarah E Raible
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Nathaniel H Robin
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Michael Spiller
- Sheffield Diagnostic Genetics Service, Sheffield Children's NHS Foundation Trust, Sheffield, UK
- Susan Starling
- Division of Clinical Genetics, Children's Mercy, Kansas City, MO 64108, USA; School of Medicine, University of Missouri-Kansas City, Kansas City, MO 64108, USA
- David A Sweetser
- Center for Genomic Medicine and Department of Pediatrics, Massachusetts General Hospital, Boston, MA 02114, USA
- Isabelle Thiffault
- Division of Clinical Genetics, Children's Mercy, Kansas City, MO 64108, USA; Genomic Medicine Center, Children's Mercy Research Institute, Kansas City, MO 64108, USA; Department of Pathology and Laboratory Medicine, Children's Mercy Kansas City, Kansas City, MO 64108, USA
- Francesco Vetrini
- Department of Medical and Molecular Genetics, Indiana University, Indianapolis, IN 46202, USA; Undiagnosed Rare Disease Clinic (URDC), Indiana University, Indianapolis, IN 46202, USA
- Dennis Witt
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, 72076 Tübingen, Germany
- Emily Woods
- Sheffield Clinical Genetics Service, Sheffield Children's NHS Foundation Trust, S5 7AU Sheffield, UK
- Dihong Zhou
- Division of Clinical Genetics, Children's Mercy, Kansas City, MO 64108, USA; School of Medicine, University of Missouri-Kansas City, Kansas City, MO 64108, USA
- Ype Elgersma
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, the Netherlands; ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, the Netherlands
- Annelot C M van Esbroeck
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, the Netherlands; ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, the Netherlands
3
Li RY, Huang Y, Zhao Z, Qin ZS. Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome. Data Brief 2023; 46:108827. [PMID: 36582986 PMCID: PMC9792340 DOI: 10.1016/j.dib.2022.108827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 12/15/2022]
Abstract
This manuscript presents a comprehensive collection of diverse epigenomic profiling data for the human genome at 100-bp resolution with full genome-wide coverage. The datasets are processed from raw read count data for five types of sequencing-based assays collected by the Encyclopedia of DNA Elements consortium (ENCODE, http://www.encodeproject.org). Data from high-throughput sequencing assays were processed and consolidated into a total of 6,305 genome-wide profiles. To ensure the quality of the features, we filtered out assays with low read depth, inconsistent read counts, and poor data quality. The types of sequencing-based experiment assays include DNase-seq, histone and TF ChIP-seq, ATAC-seq, and Poly(A) RNA-seq. Processed data were merged by averaging read counts across technical replicates to obtain signals in about 30 million predefined 100-bp bins that tile the entire genome. We provide an example of fetching read counts using disease-related risk variants from the GWAS Catalog. Additionally, we have created a tabix index enabling fast user retrieval of read counts given coordinates in the human genome. The data processing pipeline is replicable for users' own purposes and for other experimental assays. The processed data can be found on Zenodo at https://zenodo.org/record/7015783. These data can be used as features for statistical and machine learning models to predict or infer a wide range of variables of biological interest. They can also be applied to generate novel insights into gene expression, chromatin accessibility, and epigenetic modifications across the human genome. Finally, the processing pipeline can be easily applied to data from any other genome-wide profiling assays, expanding the amount of available data.
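The binning and replicate-averaging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `bin_index` and `average_replicates`, the 0-based coordinate convention, and the toy read counts are all assumptions made for illustration.

```python
from typing import List, Sequence

BIN_SIZE = 100  # bin width in bp, as stated in the abstract


def bin_index(position: int, bin_size: int = BIN_SIZE) -> int:
    """Map a 0-based genomic position to the index of the fixed-width bin containing it.

    The 0-based convention is an assumption; the abstract does not state one.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    return position // bin_size


def average_replicates(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Average per-bin read counts across technical replicates (hypothetical helper).

    Each replicate is a sequence of read counts, one value per bin, and all
    replicates must cover the same bins in the same order.
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    n_bins = len(replicates[0])
    if any(len(rep) != n_bins for rep in replicates):
        raise ValueError("all replicates must have the same number of bins")
    return [sum(column) / len(replicates) for column in zip(*replicates)]


# Toy example: two replicates covering three bins.
merged = average_replicates([[2, 4, 6], [4, 4, 0]])
```

With 100-bp bins, a genome of roughly 3.1 billion bp yields on the order of 31 million bins, which is consistent with the "about 30 million" bins quoted above.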
Affiliation(s)
- Ronnie Y. Li
- Graduate program in Neuroscience, Emory University, United States
- Yanting Huang
- Department of Computer Science, Emory University, United States
- Zhiyue Zhao
- Department of Computer Science, Emory University, United States
- Zhaohui S. Qin
- Department of Biostatistics and Bioinformatics, Emory University, United States
4
Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: taxonomy, trends and challenges of computational models. Brief Bioinform 2022; 23:6686738. [PMID: 36056743 DOI: 10.1093/bib/bbac358] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 06/20/2022] [Revised: 07/24/2022] [Accepted: 07/30/2022] [Indexed: 12/12/2022]
Abstract
Since the problem was proposed in the late 2000s, microRNA-disease association (MDA) predictions have been implemented based on the data fusion paradigm. Integrating diverse data sources provides a more comprehensive research perspective but makes it challenging to design algorithms that generate accurate, concise and consistent representations of the fused data. After more than a decade of research progress, a relatively simple algorithm like the score function or a single computation layer may no longer be sufficient for further improving predictive performance. Advanced model design has become more frequent in recent years, particularly in the form of reasonably combining multiple algorithms, a process known as model fusion. In the current review, we present 29 state-of-the-art models and introduce the taxonomy of computational models for MDA prediction based on model fusion and non-fusion. The new taxonomy exhibits notable changes in the algorithmic architecture of models, compared with that of earlier ones in the 2017 review by Chen et al. Moreover, we discuss the progress that has been made towards overcoming the obstacles to effective MDA prediction since 2017 and elaborate on how future models can be designed according to a set of new schemas. Lastly, we analyse the strengths and weaknesses of each model category in the proposed taxonomy and propose future research directions from diverse perspectives for enhancing model performance.
Affiliation(s)
- Li Huang
- Academy of Arts and Design, Tsinghua University, Beijing, 10084, China; The Future Laboratory, Tsinghua University, Beijing, 10084, China
- Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China; Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
5
Yuan Y, Yan H, Cui Z, Liu Z, Su W, Zhang R. Quantum Chemical Calculations with Machine Learning for Multipolar Electrostatics Prediction in RNA: An Application to Pentose. J Chem Inf Model 2022; 62:4122-4133. [PMID: 36036609 DOI: 10.1021/acs.jcim.2c00747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
To develop a realistic electrostatic model that allows for the anisotropy of the atomic electron density, high-rank atomic multipole moments computed by quantum chemical calculations have been studied extensively. However, it is hard to process huge RNA systems by relying only on quantum chemical calculations because of their high computational cost. In this study, we employ five machine learning methods, namely Gaussian process regression with automatic relevance determination (ARDGPR), Kriging, radial basis function neural networks, Bagging, and generalized regression neural networks, to predict atomic multipole moments. Atom-atom electrostatic interaction energies are subsequently computed using the predicted atomic multipole moments in the pilot system pentose of RNA. Here, the performance of the five methods is compared in terms of both the multipole moment prediction errors and the electrostatic energy prediction errors. For the predicted high-rank multipole moments of the four elements (O, C, N, and H) in capped pentose, ARDGPR and Kriging consistently outperform the other three methods. Therefore, the multipole moments predicted by the two best methods, ARDGPR and Kriging, are then used to predict the electrostatic interaction energy of each pentose. Finally, the absolute average energy errors of ARDGPR and Kriging are 1.83 and 4.33 kJ mol⁻¹, respectively. Compared to Kriging, the ARDGPR method achieves a 58% decrease in the absolute average energy error. These satisfactory results demonstrate that the ARDGPR method, with its strong feature extraction ability, can predict the electrostatic interaction energy of pentose in RNA correctly and reliably.
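The 58% figure follows from the two reported errors as a relative decrease with Kriging as the baseline. The short check below reproduces that arithmetic; the helper name `relative_decrease` is hypothetical, and only the two error values come from the abstract.

```python
def relative_decrease(baseline: float, improved: float) -> float:
    """Return the fractional decrease of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline


# Absolute average energy errors reported in the abstract (kJ/mol).
kriging_error = 4.33
ardgpr_error = 1.83

decrease = relative_decrease(kriging_error, ardgpr_error)
print(f"{decrease:.1%}")  # about 57.7%, which rounds to the reported 58%
```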
Affiliation(s)
- Yongna Yuan
- School of Information Science & Engineering, Lanzhou University, Lanzhou, China, 730000
- Haoqiu Yan
- School of Information Science & Engineering, Lanzhou University, Lanzhou, China, 730000
- Zeyang Cui
- School of Information Science & Engineering, Lanzhou University, Lanzhou, China, 730000
- Zhenyu Liu
- School of Cyberspace Security, Gansu University of Political Science and Law, Lanzhou, China, 730070
- Wei Su
- School of Information Science & Engineering, Lanzhou University, Lanzhou, China, 730000
- Ruisheng Zhang
- School of Information Science & Engineering, Lanzhou University, Lanzhou, China, 730000
6
Gong Y, Srinivasan SS, Zhang R, Kessenbrock K, Zhang J. scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data. Biomolecules 2022; 12:874. [PMID: 35883430 PMCID: PMC9312957 DOI: 10.3390/biom12070874] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/06/2022] [Revised: 06/16/2022] [Accepted: 06/16/2022] [Indexed: 02/04/2023]
Abstract
Recent advances in the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) allow cellular heterogeneity dissection and regulatory landscape reconstruction with an unprecedented resolution. However, compared to bulk sequencing, its ultra-high missingness remarkably reduces usable reads in each cell type, resulting in broader, fuzzier peak boundary definitions and limiting our ability to pinpoint functional regions and interpret variant impacts precisely. We propose a weakly supervised learning method, scEpiLock, to directly identify core functional regions from coarse peak labels and quantify variant impacts in a cell-type-specific manner. First, scEpiLock uses a multi-label classifier to predict chromatin accessibility via a deep convolutional neural network. Then, its weakly supervised object detection module further refines the peak boundary definition using gradient-weighted class activation mapping (Grad-CAM). Finally, scEpiLock provides cell-type-specific variant impacts within a given peak region. We applied scEpiLock to various scATAC-seq datasets and found that it achieves an area under the receiver operating characteristic curve (AUC) of ~0.9 and an area under the precision-recall curve (AUPR) above 0.7. Moreover, scEpiLock's object detection condenses coarse peaks to only ⅓ of their original size while still reporting higher conservation scores. In addition, we applied scEpiLock to brain scATAC-seq data and reported several genome-wide association study (GWAS) variants disrupting regulatory elements around known risk genes for Alzheimer's disease, demonstrating its potential to provide cell-type-specific biological insights in disease studies.
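For readers unfamiliar with the AUC metric quoted above, the following minimal sketch computes the area under the ROC curve for one binary label through its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. It is a generic illustration, not scEpiLock's evaluation code; the function name `auroc` and the toy labels and scores are assumptions.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 1 for accessible (positive) and 0 for inaccessible (negative)
    regions; `scores` holds the classifier's predicted values.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive-negative pairs are ranked correctly.
example_auc = auroc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations only; for a multi-label classifier the metric would typically be computed per label and then summarized.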
Affiliation(s)
- Yanwen Gong
- Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA; Department of Biological Chemistry, School of Medicine, University of California, Irvine, CA 92697, USA
- Ruiyi Zhang
- Department of Computer Science, University of California, Irvine, CA 92697, USA
- Kai Kessenbrock
- Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA; Department of Biological Chemistry, School of Medicine, University of California, Irvine, CA 92697, USA
- Jing Zhang
- Department of Computer Science, University of California, Irvine, CA 92697, USA