1
Liu W, Kurkewich JL, Stoddart A, Khan S, Anandan D, Gaubil AN, Wolfgeher DJ, Jueng L, Kron SJ, McNerney ME. CUX1 regulates human hematopoietic stem cell chromatin accessibility via the BAF complex. Cell Rep 2024; 43:114227. [PMID: 38735044 DOI: 10.1016/j.celrep.2024.114227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 03/16/2024] [Accepted: 04/26/2024] [Indexed: 05/14/2024]
Abstract
CUX1 is a homeodomain-containing transcription factor that is essential for the development and differentiation of multiple tissues. CUX1 is recurrently mutated or deleted in cancer, particularly in myeloid malignancies. However, the mechanism by which CUX1 regulates gene expression and differentiation remains poorly understood, creating a barrier to understanding the tumor-suppressive functions of CUX1. Here, we demonstrate that CUX1 directs the BAF chromatin remodeling complex to DNA to increase chromatin accessibility in hematopoietic cells. CUX1 preferentially regulates lineage-specific enhancers, and CUX1 target genes are predictive of cell fate in vivo. These data indicate that CUX1 regulates hematopoietic lineage commitment and homeostasis via pioneer factor activity, and CUX1 deficiency disrupts these processes in stem and progenitor cells, facilitating transformation.
Affiliation(s)
- Weihan Liu
  - Department of Pathology, The University of Chicago, Chicago, IL 60637, USA; Committee on Cancer Biology, The University of Chicago, Chicago, IL 60637, USA
- Angela Stoddart
  - Department of Pathology, The University of Chicago, Chicago, IL 60637, USA
- Saira Khan
  - Department of Pathology, The University of Chicago, Chicago, IL 60637, USA
- Dhivyaa Anandan
  - Department of Pathology, The University of Chicago, Chicago, IL 60637, USA
- Alexandre N Gaubil
  - Department of Pathology, The University of Chicago, Chicago, IL 60637, USA
- Donald J Wolfgeher
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL 60637, USA
- Lia Jueng
  - Department of Pathology, The University of Chicago, Chicago, IL 60637, USA
- Stephen J Kron
  - The University of Chicago Medicine Comprehensive Cancer Center, The University of Chicago, Chicago, IL 60637, USA; Committee on Cancer Biology, The University of Chicago, Chicago, IL 60637, USA; Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL 60637, USA
- Megan E McNerney
  - Department of Pathology, The University of Chicago, Chicago, IL 60637, USA; The University of Chicago Medicine Comprehensive Cancer Center, The University of Chicago, Chicago, IL 60637, USA; Committee on Cancer Biology, The University of Chicago, Chicago, IL 60637, USA; Department of Pediatrics, Section of Hematology/Oncology, The University of Chicago, Chicago, IL 60637, USA
2
Xiang G, He X, Giardine BM, Isaac KJ, Taylor DJ, McCoy RC, Jansen C, Keller CA, Wixom AQ, Cockburn A, Miller A, Qi Q, He Y, Li Y, Lichtenberg J, Heuston EF, Anderson SM, Luan J, Vermunt MW, Yue F, Sauria MEG, Schatz MC, Taylor J, Gottgens B, Hughes JR, Higgs DR, Weiss MJ, Cheng Y, Blobel GA, Bodine DM, Zhang Y, Li Q, Mahony S, Hardison RC. Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes. bioRxiv 2024:2023.04.02.535219. [PMID: 37066352 PMCID: PMC10103973 DOI: 10.1101/2023.04.02.535219] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]
Abstract
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
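As a rough illustration of how per-state regression coefficients could be combined into one score per cCRE, the sketch below sums the fraction of a cCRE covered by each epigenetic state, weighted by that state's coefficient. The weighted-sum form, the function name `esrp_score`, the state names, and the coefficient values are all assumptions made for illustration; the abstract does not give the exact formula used in the VISION project.

```python
def esrp_score(state_coverage, state_coef):
    """Weighted sum of state coverage fractions (hypothetical scoring form).

    state_coverage: dict mapping state name -> fraction of the cCRE in that state
    state_coef:     dict mapping state name -> regression coefficient (assumed given)
    """
    return sum(frac * state_coef[state] for state, frac in state_coverage.items())


# Hypothetical coefficients, standing in for values a regression would supply.
coef = {"active_enhancer": 2.0, "quiescent": -0.5}

# A toy cCRE that is half active enhancer and half quiescent in one cell type.
coverage = {"active_enhancer": 0.5, "quiescent": 0.5}

score = esrp_score(coverage, coef)
```

Computing such a score per cCRE and per cell type yields a matrix that could then be clustered, which is the kind of downstream use the abstract describes.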
3
Foroozandeh Shahraki M, Farahbod M, Libbrecht MW. Robust chromatin state annotation. Genome Res 2024; 34:469-483. [PMID: 38514204 PMCID: PMC11067878 DOI: 10.1101/gr.278343.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 03/19/2024] [Indexed: 03/23/2024]
Abstract
With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of chromatin state assignments. Here, we propose the first method for assigning calibrated confidence scores to chromatin state annotations. Toward this goal, we performed a comprehensive evaluation of the reproducibility of the two most widely used existing SAGA methods, ChromHMM and Segway. We found that their predictions are frequently irreproducible. For example, when applying the same SAGA method on two sets of experimental replicates, 27%-69% of predicted enhancers fail to replicate. This suggests that a substantial fraction of predicted elements in existing chromatin state annotations cannot be relied upon. To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (r-value) to chromatin state annotations. SAGAconf works with any SAGA method and assigns an r-value to each genomic bin of a chromatin state annotation that represents the probability that the label of this bin will be reproduced in a replicated experiment. Thus, SAGAconf allows a researcher to select only the reliable predictions from a chromatin annotation for use in downstream analyses.
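To make the notion of replicate-level reproducibility concrete, the following minimal sketch computes, for each chromatin state label, the fraction of bins carrying that label in one replicate annotation that receive the same label in a second replicate. This is a naive empirical overlap, not SAGAconf's calibrated r-value; the function name and the toy annotations are hypothetical.

```python
from collections import Counter


def label_reproducibility(rep1, rep2):
    """Per-label fraction of replicate-1 bins whose label recurs in replicate 2."""
    if len(rep1) != len(rep2):
        raise ValueError("annotations must cover the same genomic bins")
    total = Counter(rep1)  # number of bins per label in replicate 1
    matched = Counter(a for a, b in zip(rep1, rep2) if a == b)  # bins that agree
    return {label: matched[label] / n for label, n in total.items()}


# Toy annotations over six genomic bins (hypothetical labels).
rep1 = ["enhancer", "enhancer", "promoter", "quiescent", "enhancer", "quiescent"]
rep2 = ["enhancer", "quiescent", "promoter", "quiescent", "promoter", "quiescent"]

scores = label_reproducibility(rep1, rep2)
```

In this toy case only one of the three enhancer bins replicates, which mirrors the kind of enhancer irreproducibility the abstract reports.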
Affiliation(s)
- Marjan Farahbod
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Maxwell W Libbrecht
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
4
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequencing information (k-mers, transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles to computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
5
Xiang G, Guo Y, Bumcrot D, Sigova A. JMnorm: a novel joint multi-feature normalization method for integrative and comparative epigenomics. Nucleic Acids Res 2024; 52:e11. [PMID: 38055833 PMCID: PMC10810286 DOI: 10.1093/nar/gkad1146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 10/25/2023] [Accepted: 11/14/2023] [Indexed: 12/08/2023]
Abstract
Combinatorial patterns of epigenetic features reflect transcriptional states and functions of genomic regions. While many epigenetic features have correlated relationships, most existing data normalization approaches analyze each feature independently. Such strategies may distort relationships between functionally correlated epigenetic features and hinder biological interpretation. We present a novel approach named JMnorm that simultaneously normalizes multiple epigenetic features across cell types, species, and experimental conditions by leveraging information from partially correlated epigenetic features. We demonstrate that JMnorm-normalized data can better preserve cross-epigenetic-feature correlations across different cell types and enhance consistency between biological replicates than data normalized by other methods. Additionally, we show that JMnorm-normalized data can consistently improve the performance of various downstream analyses, which include candidate cis-regulatory element clustering, cross-cell-type gene expression prediction, detection of transcription factor binding and changes upon perturbations. These findings suggest that JMnorm effectively minimizes technical noise while preserving true biologically significant relationships between epigenetic datasets. We anticipate that JMnorm will enhance integrative and comparative epigenomics.
Affiliation(s)
- Guanjue Xiang
  - CAMP4 Therapeutics Corp., One Kendall Square, Building 1400 West, Cambridge, MA 02139, USA
- Yuchun Guo
  - CAMP4 Therapeutics Corp., One Kendall Square, Building 1400 West, Cambridge, MA 02139, USA
- David Bumcrot
  - CAMP4 Therapeutics Corp., One Kendall Square, Building 1400 West, Cambridge, MA 02139, USA
- Alla Sigova
  - CAMP4 Therapeutics Corp., One Kendall Square, Building 1400 West, Cambridge, MA 02139, USA
6
Gao C, Amador C, Walker RM, Campbell A, Madden RA, Adams MJ, Bai X, Liu Y, Li M, Hayward C, Porteous DJ, Shen X, Evans KL, Haley CS, McIntosh AM, Navarro P, Zeng Y. Phenome-wide analyses identify an association between the parent-of-origin effects dependent methylome and the rate of aging in humans. Genome Biol 2023; 24:117. [PMID: 37189164 PMCID: PMC10184337 DOI: 10.1186/s13059-023-02953-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 04/26/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND: The variation in the rate at which humans age may be rooted in early events acting through the genomic regions that are influenced by such events and subsequently are related to health phenotypes in later life. The parent-of-origin-effect (POE)-regulated methylome includes regions enriched for genetically controlled imprinting effects (the typical type of POE) and regions influenced by environmental effects associated with parents (the atypical POE). This part of the methylome is heavily influenced by early events, making it a potential route connecting early exposures, the epigenome, and aging. We aim to test the association of POE-CpGs with early and later exposures and subsequently with health-related phenotypes and adult aging.
RESULTS: We perform a phenome-wide association analysis for the POE-influenced methylome using GS:SFHS (Ndiscovery = 5087, Nreplication = 4450). We identify and replicate 92 POE-CpG-phenotype associations. Most of the associations are contributed by the POE-CpGs belonging to the atypical class where the most strongly enriched associations are with aging (DNAmTL acceleration), intelligence, and parental (maternal) smoking exposure phenotypes. A proportion of the atypical POE-CpGs form co-methylation networks (modules) which are associated with these phenotypes, with one of the aging-associated modules displaying increased within-module methylation connectivity with age. The atypical POE-CpGs also display high levels of methylation heterogeneity, fast information loss with age, and a strong correlation with CpGs contained within epigenetic clocks.
CONCLUSIONS: These results identify the association between the atypical POE-influenced methylome and aging and provide new evidence for the "early development of origin" hypothesis for aging in humans.
Affiliation(s)
- Chenhao Gao
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Carmen Amador
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Rosie M Walker
  - Centre for Clinical Brain Sciences, Chancellor's Building, 49 Little France Crescent, Edinburgh BioQuarter, Edinburgh, UK
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
  - School of Psychology, University of Exeter, Perry Road, Exeter, UK
- Archie Campbell
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mark J Adams
  - Division of Psychiatry, University of Edinburgh, Edinburgh, UK
- Xiaomeng Bai
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Ying Liu
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Miaoxin Li
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Caroline Hayward
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- David J Porteous
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Xueyi Shen
  - Division of Psychiatry, University of Edinburgh, Edinburgh, UK
- Kathryn L Evans
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Chris S Haley
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
  - Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Pau Navarro
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
  - Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Yanni Zeng
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
  - Guangdong Province Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
7
Gao C, Amador C, Walker RM, Campbell A, Madden RA, Adams MJ, Bai X, Liu Y, Li M, Hayward C, Porteous DJ, Shen X, Evans KL, Haley CS, McIntosh AM, Navarro P, Zeng Y. Phenome-wide analysis identifies parent-of-origin effects on the human methylome associated with changes in the rate of aging. bioRxiv 2023:2023.01.18.524653. [PMID: 36711749 PMCID: PMC9882261 DOI: 10.1101/2023.01.18.524653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2023]
Abstract
Variation in the rate at which humans age may be rooted in early life events acting through genomic regions that are influenced by such events and subsequently are related to health phenotypes in later life. The parent-of-origin-effect (POE)-regulated methylome includes regions either enriched for genetically controlled imprinting effects (the typical type of POE) or atypical POE introduced by environmental effects associated with parents. This part of the methylome is heavily influenced by early life events, making it a potential route connecting early environmental exposures, the epigenome and the rate of aging. Here, we aim to test the association of POE-influenced methylation of CpG dinucleotides (POE-CpG sites) with early and later environmental exposures and subsequently with health-related phenotypes and adult aging phenotypes. We do this by performing phenome-wide association analyses of the POE-influenced methylome using a large family-based population cohort (GS:SFHS, Ndiscovery=5,087, Nreplication=4,450). At the single CpG level, 92 associations of POE-CpGs with phenotypic variation were identified and replicated. Most of the associations were contributed by POE-CpGs belonging to the atypical class and the most strongly enriched associations were with aging (DNAmTL acceleration), intelligence and parental (maternal) smoking exposure phenotypes. We further found that a proportion of the atypical-POE-CpGs formed co-methylation networks (modules) which are associated with these phenotypes, with one of the aging-associated modules displaying increased internal module connectivity (strength of methylation correlation across constituent CpGs) with age. Atypical POE-CpGs also displayed high levels of methylation heterogeneity and epigenetic drift (i.e. information loss with age) and a strong correlation with CpGs contained within epigenetic clocks. 
These results identified associations between the atypical-POE-influenced methylome and aging and provided new evidence for the "early development of origin" hypothesis for aging in humans.
Affiliation(s)
- Chenhao Gao
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- Carmen Amador
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Rosie M. Walker
  - Centre for Clinical Brain Sciences, Chancellor’s Building, 49 Little France Crescent, Edinburgh BioQuarter, Edinburgh, UK
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
  - School of Psychology, University of Exeter, Perry Road, Exeter, UK
- Archie Campbell
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Rebecca A Madden
  - Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom
- Mark J. Adams
  - Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom
- Xiaomeng Bai
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- Ying Liu
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- Miaoxin Li
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
- Caroline Hayward
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- David J. Porteous
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Xueyi Shen
  - Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom
- Kathryn L. Evans
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Chris S. Haley
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
  - Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- Andrew M. McIntosh
  - Division of Psychiatry, University of Edinburgh, Edinburgh, United Kingdom
- Pau Navarro
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Yanni Zeng
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
  - Guangdong Province Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
8
Lynall ME, Soskic B, Hayhurst J, Schwartzentruber J, Levey DF, Pathak GA, Polimanti R, Gelernter J, Stein MB, Trynka G, Clatworthy MR, Bullmore E. Genetic variants associated with psychiatric disorders are enriched at epigenetically active sites in lymphoid cells. Nat Commun 2022; 13:6102. [PMID: 36243721 PMCID: PMC9569335 DOI: 10.1038/s41467-022-33885-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/05/2021] [Accepted: 10/06/2022] [Indexed: 02/06/2023]
Abstract
Multiple psychiatric disorders have been associated with abnormalities in both the innate and adaptive immune systems. The role of these abnormalities in pathogenesis, and whether they are driven by psychiatric risk variants, remains unclear. We test for enrichment of GWAS variants associated with multiple psychiatric disorders (cross-disorder or trans-diagnostic risk), or 5 specific disorders (cis-diagnostic risk), in regulatory elements in immune cells. We use three independent epigenetic datasets representing multiple organ systems and immune cell subsets. Trans-diagnostic and cis-diagnostic risk variants (for schizophrenia and depression) are enriched at epigenetically active sites in brain tissues and in lymphoid cells, especially stimulated CD4+ T cells. There is no evidence for enrichment of either trans-risk or cis-risk variants for schizophrenia or depression in myeloid cells. This suggests a possible model where environmental stimuli activate T cells to unmask the effects of psychiatric risk variants, contributing to the pathogenesis of mental health disorders.
Affiliation(s)
- Mary-Ellen Lynall
  - Department of Psychiatry, Herchel Smith Building of Brain & Mind Sciences, Cambridge Biomedical Campus, University of Cambridge, Cambridge, CB2 0SZ, UK
  - Cambridgeshire & Peterborough NHS Foundation Trust, Cambridge, UK
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, Cambridge, UK
  - Cellular Genetics, Wellcome Sanger Institute, Cambridge, UK
- Blagoje Soskic
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, UK
  - Human Technopole, Milan, Italy
- Daniel F Levey
  - VA Connecticut Healthcare System, West Haven, CT, USA
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Gita A Pathak
  - VA Connecticut Healthcare System, West Haven, CT, USA
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Renato Polimanti
  - VA Connecticut Healthcare System, West Haven, CT, USA
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Joel Gelernter
  - VA Connecticut Healthcare System, West Haven, CT, USA
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
  - Departments of Genetics and Neuroscience, Yale University School of Medicine, New Haven, CT, USA
- Murray B Stein
  - VA San Diego Healthcare System, San Diego, CA, USA
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Gosia Trynka
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, UK
- Menna R Clatworthy
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, Cambridge, UK
  - Cellular Genetics, Wellcome Sanger Institute, Cambridge, UK
- Ed Bullmore
  - Department of Psychiatry, Herchel Smith Building of Brain & Mind Sciences, Cambridge Biomedical Campus, University of Cambridge, Cambridge, CB2 0SZ, UK
  - Cambridgeshire & Peterborough NHS Foundation Trust, Cambridge, UK
9
Dsouza KB, Li AY, Bhargava VK, Libbrecht MW. Latent Representation of the Human Pan-Celltype Epigenome Through a Deep Recurrent Neural Network. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2313-2323. [PMID: 34043510 DOI: 10.1109/tcbb.2021.3084147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
The availability of thousands of assays of epigenetic activity necessitates compressed representations of these data sets that summarize the epigenetic landscape of the genome. Until recently, most such representations were cell type-specific, applying to a single tissue or cell state. Recently, neural networks have made it possible to summarize data across tissues to produce a pan-cell type representation. In this work, we propose Epi-LSTM, a deep long short-term memory (LSTM) recurrent neural network autoencoder to capture the long-term dependencies in the epigenomic data. The latent representations from Epi-LSTM capture a variety of genomic phenomena, including gene-expression, promoter-enhancer interactions, replication timing, frequently interacting regions, and evolutionary conservation. These representations outperform existing methods in a majority of cell types while yielding smoother representations along the genomic axis due to their sequential nature.
10
Costallat M, Batsché E, Rachez C, Muchardt C. The 'Alu-ome' shapes the epigenetic environment of regulatory elements controlling cellular defense. Nucleic Acids Res 2022; 50:5095-5110. [PMID: 35544277 PMCID: PMC9122584 DOI: 10.1093/nar/gkac346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/06/2022] [Revised: 04/19/2022] [Accepted: 04/23/2022] [Indexed: 11/13/2022]
Abstract
Promoters and enhancers are sites of transcription initiation (TSSs) and carry specific histone modifications, including H3K4me1, H3K4me3, and H3K27ac. Yet, the principles governing the boundaries of such regulatory elements are still poorly characterized. Alu elements are good candidates for a boundary function, being highly abundant in gene-rich regions, while essentially excluded from regulatory elements. Here, we show that the interval ranging from TSS to first upstream Alu, accommodates all H3K4me3 and most H3K27ac marks, while excluding DNA methylation. Remarkably, the average length of these intervals greatly varies in-between tissues, being longer in stem- and shorter in immune-cells. The very shortest TSS-to-first-Alu intervals were observed at promoters active in T-cells, particularly at immune genes, where first-Alus were traversed by RNA polymerase II transcription, while accumulating H3K4me1 signal. Finally, DNA methylation at first-Alus was found to evolve with age, regressing from young to middle-aged, then recovering later in life. Thus, the first-Alus upstream of TSSs appear as dynamic boundaries marking the transition from DNA methylation to active histone modifications at regulatory elements, while also participating in the recording of immune gene transcriptional events by positioning H3K4me1-modified nucleosomes.
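The TSS-to-first-upstream-Alu interval described above can be sketched as a strand-aware nearest-neighbor lookup. This is a simplified illustration, not the authors' pipeline: it assumes 0-based half-open Alu coordinates on a single chromosome, sorted and non-overlapping, and it ignores any Alu that overlaps the TSS itself. The function name and all coordinates are hypothetical.

```python
import bisect


def tss_to_first_upstream_alu(tss, strand, alus):
    """Gap in bp between a TSS and the nearest Alu lying upstream of it.

    alus: sorted, non-overlapping (start, end) half-open intervals.
    Returns None when no Alu lies upstream of the TSS.
    """
    starts = [s for s, _ in alus]
    ends = [e for _, e in alus]
    if strand == "+":
        # Upstream means lower coordinates: last Alu ending at or before the TSS.
        i = bisect.bisect_right(ends, tss) - 1
        return tss - ends[i] if i >= 0 else None
    if strand == "-":
        # Upstream means higher coordinates: first Alu starting after the TSS.
        i = bisect.bisect_right(starts, tss)
        return starts[i] - tss - 1 if i < len(starts) else None
    raise ValueError("strand must be '+' or '-'")


# Toy Alu annotation on one chromosome (hypothetical coordinates).
alus = [(100, 400), (1000, 1300), (5000, 5300)]

plus_gap = tss_to_first_upstream_alu(900, "+", alus)
minus_gap = tss_to_first_upstream_alu(900, "-", alus)
```

Averaging such gaps over the promoters active in a given tissue would give a per-tissue interval length comparable in spirit to the comparison the abstract reports.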
Affiliation(s)
- Mickael Costallat
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Biological Adaptation and Ageing, B2A-IBPS, 75005, Paris, France
- Eric Batsché
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Biological Adaptation and Ageing, B2A-IBPS, 75005, Paris, France
- Christophe Rachez
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Biological Adaptation and Ageing, B2A-IBPS, 75005, Paris, France
- Christian Muchardt
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Biological Adaptation and Ageing, B2A-IBPS, 75005, Paris, France
11
Daneshpajouh H, Chen B, Shokraneh N, Masoumi S, Wiese KC, Libbrecht MW. Continuous chromatin state feature annotation of the human epigenome. Bioinformatics 2022; 38:3029-3036. [PMID: 35451453 PMCID: PMC9154241 DOI: 10.1093/bioinformatics/btac283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Revised: 02/18/2022] [Accepted: 04/18/2022] [Indexed: 12/02/2022]
Abstract
Motivation: Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These methods take as input a set of sequencing-based assays of epigenomic activity, such as ChIP-seq measurements of histone modification and transcription factor binding. They output an annotation of the genome that assigns a chromatin state label to each genomic position. Existing SAGA methods have several limitations caused by the discrete annotation framework: such annotations cannot easily represent varying strengths of genomic elements, and they cannot easily represent combinatorial elements that simultaneously exhibit multiple types of activity. To remedy these limitations, we propose an annotation strategy that instead outputs a vector of chromatin state features at each position rather than a single discrete label. Continuous modeling is common in other fields, such as in topic modeling of text documents. We propose a method, epigenome-ssm-nonneg, that uses a non-negative state space model to efficiently annotate the genome with chromatin state features. We also propose several measures of the quality of a chromatin state feature annotation and we compare the performance of several alternative methods according to these quality measures.
Results: We show that chromatin state features from epigenome-ssm-nonneg are more useful for several downstream applications than both continuous and discrete alternatives, including their ability to identify expressed genes and enhancers. Therefore, we expect that these continuous chromatin state features will be valuable reference annotations to be used in visualization and downstream analysis.
Availability and implementation: Source code for epigenome-ssm is available at https://github.com/habibdanesh/epigenome-ssm and Zenodo (DOI: 10.5281/zenodo.6507585).
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Habib Daneshpajouh
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
| | - Bowen Chen
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
| | - Neda Shokraneh
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
| | - Shohre Masoumi
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
| | - Kay C Wiese
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
| | - Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
|
12
|
Leone M, Galeota E, Masseroli M, Pelizzola M. Identification, semantic annotation and comparison of combinations of functional elements in multiple biological conditions. Bioinformatics 2022; 38:1183-1190. [PMID: 34864898 DOI: 10.1093/bioinformatics/btab815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/20/2021] [Revised: 10/12/2021] [Accepted: 11/30/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Approaches such as chromatin immunoprecipitation followed by sequencing (ChIP-seq) represent the standard for the identification of binding sites of DNA-associated proteins, including transcription factors and histone marks. Public repositories of omics data contain a huge number of experimental ChIP-seq data, but their reuse and integrative analysis across multiple conditions remain a daunting task. RESULTS We present the Combinatorial and Semantic Analysis of Functional Elements (CombSAFE), an efficient computational method able to integrate and take advantage of the valuable and numerous, but heterogeneous, ChIP-seq data publicly available in big data repositories. Leveraging natural language processing techniques, it integrates omics data samples with semantic annotations from selected biomedical ontologies; then, using hidden Markov models, it identifies combinations of static and dynamic functional elements throughout the genome for the corresponding samples. CombSAFE allows analyzing the whole genome, by clustering patterns of regions with similar functional elements and through enrichment analyses to discover ontological terms significantly associated with them. Moreover, it allows comparing functional states of a specific genomic region to analyze their different behavior throughout the various semantic annotations. Such findings can provide novel insights by identifying unexpected combinations of functional elements in different biological conditions. AVAILABILITY AND IMPLEMENTATION The Python implementation of the CombSAFE pipeline is freely available for non-commercial use at: https://github.com/DEIB-GECO/CombSAFE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michele Leone
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
| | - Eugenia Galeota
- Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), 20139 Milan, Italy
| | - Marco Masseroli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
| | - Mattia Pelizzola
- Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), 20139 Milan, Italy
|
13
|
Cao Z, Huang Y, Duan R, Jin P, Qin ZS, Zhang S. Disease category-specific annotation of variants using an ensemble learning framework. Brief Bioinform 2021; 23:6394995. [PMID: 34643213 DOI: 10.1093/bib/bbab438] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/09/2021] [Revised: 09/03/2021] [Accepted: 09/22/2021] [Indexed: 02/01/2023] Open
Abstract
Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to predict genomic loci in terms of disease category-specific risk. Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk prediction of 24 major categories of diseases throughout the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of the disease-specific and disease category-specific risk for non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to facilitate easy access to CASAVA scores.
Affiliation(s)
- Zhen Cao
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yanting Huang
- Department of Computer Science, Emory University, Atlanta, GA 30322, USA
| | - Ran Duan
- Department of Software Engineering, Yunnan University, Kunming 650500, China
| | - Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Zhaohui S Qin
- Department of Computer Science, Emory University, Atlanta, GA 30322, USA; Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
| | - Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
|
14
|
Libbrecht MW, Chan RCW, Hoffman MM. Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns. PLoS Comput Biol 2021; 17:e1009423. [PMID: 34648491 PMCID: PMC8516206 DOI: 10.1371/journal.pcbi.1009423] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022] Open
Abstract
Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
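As a rough illustration of the segment-and-label idea described in this abstract, the following minimal sketch decodes a toy two-state hidden Markov model over a binarized signal track and merges the resulting labels into segments. It is not taken from the review: the state names, probabilities, and the helper names `viterbi` and `to_segments` are assumptions chosen for the example, and real SAGA methods learn their parameters from data without supervision rather than using hand-set values.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a sequence of observations."""
    # best[t][s] is the log-probability of the best path ending in state s at t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def to_segments(path):
    """Merge runs of identical labels into half-open (start, end, label) segments."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            segments.append((start, i, path[start]))
            start = i
    return segments

# Hypothetical hand-set parameters (a real SAGA method would estimate these).
states = ["quiescent", "active"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {
    "quiescent": {"quiescent": math.log(0.9), "active": math.log(0.1)},
    "active": {"quiescent": math.log(0.1), "active": math.log(0.9)},
}
# Emission probabilities for a binarized mark: 1 = signal present, 0 = absent.
log_emit = {
    "quiescent": {0: math.log(0.9), 1: math.log(0.1)},
    "active": {0: math.log(0.2), 1: math.log(0.8)},
}

obs = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]  # toy track, one value per genomic bin
path = viterbi(obs, states, log_start, log_trans, log_emit)
segments = to_segments(path)
print(segments)
```

The decoding step assigns one label per bin, and the merge step turns runs of equal labels into segments, which mirrors the simultaneous partitioning and labeling the abstract describes.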
Affiliation(s)
| | - Rachel C. W. Chan
- Department of Computer Science, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
| | - Michael M. Hoffman
- Department of Computer Science, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
|
15
|
Fang K, Li T, Huang Y, Jin VX. NucHMM: a method for quantitative modeling of nucleosome organization identifying functional nucleosome states distinctly associated with splicing potentiality. Genome Biol 2021; 22:250. [PMID: 34446075 PMCID: PMC8390234 DOI: 10.1186/s13059-021-02465-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/01/2020] [Accepted: 08/12/2021] [Indexed: 01/01/2023] Open
Abstract
We develop a novel computational method, NucHMM, to identify functional nucleosome states associated with cell type-specific combinatorial histone marks and nucleosome organization features such as phasing, spacing and positioning. We test it on publicly available MNase-seq and ChIP-seq data in MCF7, H1, and IMR90 cells and identify 11 distinct functional nucleosome states. We demonstrate these nucleosome states are distinctly associated with the splicing potentiality of skipping exons. This advances our understanding of the chromatin function at the nucleosome level and offers insights into the interplay between nucleosome organization and splicing processes.
Affiliation(s)
- Kun Fang
- Department of Molecular Medicine, UTHSA-UTSA Joint Biomedical Engineering Program, San Antonio, TX, 78229, USA
| | - Tianbao Li
- Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA
| | - Yufei Huang
- Department of Medicine, UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, USA
| | - Victor X Jin
- Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA.
|
16
|
Bayat F, Libbrecht M. VSS: Variance-stabilized signals for sequencing-based genomic signals. Bioinformatics 2021; 37:4383-4391. [PMID: 34165492 PMCID: PMC8652025 DOI: 10.1093/bioinformatics/btab457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2020] [Revised: 04/28/2021] [Accepted: 06/17/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION A sequencing-based genomic assay such as ChIP-seq outputs a real-valued signal for each position in the genome that measures the strength of activity at that position. Most genomic signals lack the property of variance stabilization. That is, a difference between 0 and 100 reads usually has a very different statistical importance from a difference between 1,000 and 1,100 reads. A statistical model such as a negative binomial distribution can account for this pattern, but learning these models is computationally challenging. Therefore, many applications - including imputation and segmentation and genome annotation (SAGA) - instead use Gaussian models and use a transformation such as log or inverse hyperbolic sine (asinh) to stabilize variance. RESULTS We show here that existing transformations do not fully stabilize variance in genomic data sets. To solve this issue, we propose VSS, a method that produces variance-stabilized signals for sequencing-based genomic signals. VSS learns the empirical relationship between the mean and variance of a given signal data set and produces transformed signals that normalize for this dependence. We show that VSS successfully stabilizes variance and that doing so improves downstream applications such as SAGA. VSS will eliminate the need for downstream methods to implement complex mean-variance relationship models, and will enable genomic signals to be easily understood by eye. AVAILABILITY https://github.com/faezeh-bayat/VSS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
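The variance-stabilization problem in this abstract can be made concrete with a small sketch. The code below applies the asinh transformation mentioned in the abstract and estimates a crude empirical mean-variance curve from two replicate tracks. It is a generic illustration, not the VSS implementation: the helper names `asinh_transform` and `mean_variance_by_bin`, the binning scheme, and the toy replicate values are all assumptions made for the example.

```python
import math
from statistics import mean

def asinh_transform(signal):
    """Apply the inverse hyperbolic sine to each value of a signal track."""
    return [math.asinh(x) for x in signal]

def mean_variance_by_bin(rep1, rep2, n_bins=4):
    """Estimate (mean, variance) points from two replicates.

    For each position, the mean is (a + b) / 2 and the two-sample variance
    estimate is (a - b)^2 / 2. Positions are sorted by mean and grouped into
    roughly n_bins equal-sized bins, and both quantities are averaged per bin.
    """
    pairs = sorted(((a + b) / 2, (a - b) ** 2 / 2) for a, b in zip(rep1, rep2))
    size = max(1, len(pairs) // n_bins)
    curve = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        curve.append((mean([m for m, _ in chunk]), mean([v for _, v in chunk])))
    return curve

# Hypothetical read counts for the same positions in two replicates.
rep1 = [0, 1, 2, 10, 12, 9, 100, 110, 95, 1000, 1100, 950]
rep2 = [1, 0, 3, 12, 9, 11, 90, 100, 105, 1100, 1000, 1050]

# asinh compresses large values: a 100-read gap near zero stays large after
# the transform, while the same gap near 1,000 reads becomes small.
low = asinh_transform([0, 100])
high = asinh_transform([1000, 1100])
print(low[1] - low[0], high[1] - high[0])

# In this toy data the variance grows with the mean, i.e. it is not stabilized.
for bin_mean, bin_var in mean_variance_by_bin(rep1, rep2):
    print(round(bin_mean, 1), round(bin_var, 1))
```

A fixed transform such as asinh assumes one particular mean-variance relationship. Estimating the curve from the data, as sketched here in a simplified form, is the kind of information a learned transformation could use instead.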
Affiliation(s)
- Faezeh Bayat
- Department of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | - Maxwell Libbrecht
- Department of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
|
17
|
Xiang G, Giardine BM, Mahony S, Zhang Y, Hardison RC. S3V2-IDEAS: a package for normalizing, denoising and integrating epigenomic datasets across different cell types. Bioinformatics 2021; 37:3011-3013. [PMID: 33681991 PMCID: PMC8479670 DOI: 10.1093/bioinformatics/btab148] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/15/2020] [Revised: 01/26/2021] [Accepted: 03/01/2021] [Indexed: 02/02/2023] Open
Abstract
SUMMARY Epigenetic modifications reflect key aspects of transcriptional regulation, and many epigenomic datasets have been generated under different biological contexts to provide insights into regulatory processes. However, the technical noise in epigenomic datasets and the many dimensions (features) examined make it challenging to effectively extract biologically meaningful inferences from these datasets. We developed a package that reduces noise while normalizing the epigenomic data by a novel normalization method, followed by integrative dimensional reduction by learning and assigning epigenetic states. This package, called S3V2-IDEAS, can be used to identify epigenetic states for multiple features, or identify discretized signal intensity levels and a master peak list across different cell types for a single feature. We illustrate the outputs and performance of S3V2-IDEAS using 137 epigenomics datasets from the VISION project that provides ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis. AVAILABILITY AND IMPLEMENTATION S3V2-IDEAS pipeline is freely available as open source software released under an MIT license at: https://github.com/guanjue/S3V2_IDEAS_ESMP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guanjue Xiang
- The Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- To whom correspondence should be addressed.
| | - Belinda M Giardine
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Shaun Mahony
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Yu Zhang
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
| | - Ross C Hardison
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- To whom correspondence should be addressed.
|
18
|
He P, Williams BA, Trout D, Marinov GK, Amrhein H, Berghella L, Goh ST, Plajzer-Frick I, Afzal V, Pennacchio LA, Dickel DE, Visel A, Ren B, Hardison RC, Zhang Y, Wold BJ. The changing mouse embryo transcriptome at whole tissue and single-cell resolution. Nature 2020; 583:760-767. [PMID: 32728245 PMCID: PMC7410830 DOI: 10.1038/s41586-020-2536-x] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Received: 09/20/2018] [Accepted: 06/22/2020] [Indexed: 02/07/2023]
Abstract
During mammalian embryogenesis, differential gene expression gradually builds the identity and complexity of each tissue and organ system. Here we systematically quantified mouse polyA-RNA from day 10.5 of embryonic development to birth, sampling 17 tissues and organs. The resulting developmental transcriptome is globally structured by dynamic cytodifferentiation, body-axis and cell-proliferation gene sets that were further characterized by the transcription factor motif codes of their promoters. We decomposed the tissue-level transcriptome using single-cell RNA-seq (sequencing of RNA reverse transcribed into cDNA) and found that neurogenesis and haematopoiesis dominate at both the gene and cellular levels, jointly accounting for one-third of differential gene expression and more than 40% of identified cell types. By integrating promoter sequence motifs with companion ENCODE epigenomic profiles, we identified a prominent promoter de-repression mechanism in neuronal expression clusters that was attributable to known and novel repressors. Focusing on the developing limb, single-cell RNA data identified 25 candidate cell types that included progenitor and differentiating states with computationally inferred lineage relationships. We extracted cell-type transcription factor networks and complementary sets of candidate enhancer elements by using single-cell RNA-seq to decompose integrative cis-element (IDEAS) models that were derived from whole-tissue epigenome chromatin data. These ENCODE reference data, computed network components and IDEAS chromatin segmentations are companion resources to the matching epigenomic developmental matrix, and are available for researchers to further mine and integrate.
Affiliation(s)
- Peng He
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Brian A Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | | | - Henry Amrhein
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Libera Berghella
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Say-Tar Goh
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Ingrid Plajzer-Frick
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Veena Afzal
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Len A Pennacchio
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Comparative Biochemistry Program, University of California, Berkeley, Berkeley, CA, USA
| | - Diane E Dickel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Axel Visel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- School of Natural Sciences, University of California, Merced, Merced, CA, USA
| | - Bing Ren
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Ross C Hardison
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
| | - Yu Zhang
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
| | - Barbara J Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
|
19
|
Wu C, Pan W. Integration of methylation QTL and enhancer-target gene maps with schizophrenia GWAS summary results identifies novel genes. Bioinformatics 2020; 35:3576-3583. [PMID: 30850848 DOI: 10.1093/bioinformatics/btz161] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/12/2018] [Revised: 02/04/2019] [Accepted: 03/05/2019] [Indexed: 01/06/2023] Open
Abstract
MOTIVATION Most trait-associated genetic variants identified in genome-wide association studies (GWASs) are located in non-coding regions of the genome and thought to act through their regulatory roles. RESULTS To account for enriched association signals in DNA regulatory elements, we propose a novel and general gene-based association testing strategy that integrates enhancer-target gene pairs and methylation quantitative trait locus data with GWAS summary results; it aims to both boost statistical power for new discoveries and enhance mechanistic interpretability of any new discovery. By reanalyzing two large-scale schizophrenia GWAS summary datasets, we demonstrate that the proposed method could identify some significant and novel genes (containing no genome-wide significant SNPs nearby) that would have been missed by other competing approaches, including the standard and some integrative gene-based association methods, such as one incorporating enhancer-target gene pairs and one integrating expression quantitative trait loci. AVAILABILITY AND IMPLEMENTATION Software: wuchong.org/egmethyl.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
|
20
|
Xiang G, Keller CA, Heuston E, Giardine BM, An L, Wixom AQ, Miller A, Cockburn A, Sauria MEG, Weaver K, Lichtenberg J, Göttgens B, Li Q, Bodine D, Mahony S, Taylor J, Blobel GA, Weiss MJ, Cheng Y, Yue F, Hughes J, Higgs DR, Zhang Y, Hardison RC. An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. Genome Res 2020; 30:472-484. [PMID: 32132109 PMCID: PMC7111515 DOI: 10.1101/gr.255760.119] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/09/2019] [Accepted: 02/21/2020] [Indexed: 01/29/2023]
Abstract
Thousands of epigenomic data sets have been generated in the past decade, but it is difficult for researchers to effectively use all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for validated systematic integration of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By using IDEAS as our integrative and discriminative epigenome annotation system, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of more than 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture enhancers and promoters. The transitions in epigenetic states of these cCREs across cell types provided insights into mechanisms of regulation, including decreases in numbers of active cCREs during differentiation of most lineages, transitions from poised to active or inactive states, and shifts in nuclease accessibility of CTCF-bound elements. Regression modeling of epigenetic states at cCREs and gene expression produced a versatile resource to improve selection of cCREs potentially regulating target genes. These resources are available from our VISION website to aid research in genomics and hematopoiesis.
Affiliation(s)
- Guanjue Xiang
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Cheryl A Keller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Elisabeth Heuston
- NHGRI Hematopoiesis Section, Genetics and Molecular Biology Branch, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Belinda M Giardine
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Lin An
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Alexander Q Wixom
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Amber Miller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - April Cockburn
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Michael E G Sauria
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 20218, USA
| | - Kathryn Weaver
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 20218, USA
| | - Jens Lichtenberg
- NHGRI Hematopoiesis Section, Genetics and Molecular Biology Branch, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Berthold Göttgens
- Wellcome and MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 1TN, United Kingdom
| | - Qunhua Li
- Department of Statistics, Program in Bioinformatics and Genomics, Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - David Bodine
- NHGRI Hematopoiesis Section, Genetics and Molecular Biology Branch, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Shaun Mahony
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - James Taylor
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 20218, USA
| | - Gerd A Blobel
- Department of Pediatrics, Children's Hospital of Philadelphia and University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
| | - Mitchell J Weiss
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
| | - Yong Cheng
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
| | - Feng Yue
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
| | - Jim Hughes
- MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, United Kingdom
| | - Douglas R Higgs
- MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, United Kingdom
| | - Yu Zhang
- Department of Statistics, Program in Bioinformatics and Genomics, Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Ross C Hardison
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
21
|
Hardison RC, Zhang Y, Keller CA, Xiang G, Heuston EF, An L, Lichtenberg J, Giardine BM, Bodine D, Mahony S, Li Q, Yue F, Weiss MJ, Blobel GA, Taylor J, Hughes J, Higgs DR, Göttgens B. Systematic integration of GATA transcription factors and epigenomes via IDEAS paints the regulatory landscape of hematopoietic cells. IUBMB Life 2020; 72:27-38. [PMID: 31769130 PMCID: PMC6972633 DOI: 10.1002/iub.2195] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/02/2019] [Accepted: 10/17/2019] [Indexed: 01/15/2023]
Abstract
Members of the GATA family of transcription factors play key roles in the differentiation of specific cell lineages by regulating the expression of target genes. Three GATA factors play distinct roles in hematopoietic differentiation. In order to better understand how these GATA factors function to regulate genes throughout the genome, we are studying the epigenomic and transcriptional landscapes of hematopoietic cells in a model-driven, integrative fashion. We have formed the collaborative multi-lab VISION project to conduct ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis. The epigenomic data included nuclease accessibility in chromatin, CTCF occupancy, and histone H3 modifications for 20 cell types covering hematopoietic stem cells, multilineage progenitor cells, and mature cells across the blood cell lineages of mouse. The analysis used the Integrative and Discriminative Epigenome Annotation System (IDEAS), which learns all common combinations of features (epigenetic states) simultaneously in two dimensions: along chromosomes and across cell types. The result is a segmentation that effectively paints the regulatory landscape in readily interpretable views, revealing constitutively active or silent loci as well as the loci specifically induced or repressed in each stage and lineage. Nuclease accessible DNA segments in active chromatin states were designated candidate cis-regulatory elements in each cell type, providing one of the most comprehensive registries of candidate hematopoietic regulatory elements to date. Applications of VISION resources are illustrated for the regulation of genes encoding GATA1, GATA2, GATA3, and Ikaros. VISION resources are freely available from our website http://usevision.org.
Affiliation(s)
- Ross C. Hardison
- Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
| | - Yu Zhang
- Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
| | - Cheryl A. Keller
- Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
| | - Guanjue Xiang
- Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
| | - Elisabeth F. Heuston
- Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
| | - Lin An
- Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
| | - Jens Lichtenberg
- Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
| | - Belinda M. Giardine
- Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
| | - David Bodine
- Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
| | - Shaun Mahony
- Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
| | - Qunhua Li
- Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
| | - Feng Yue
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA
| | - Mitchell J. Weiss
- Hematology Department, St. Jude Children's Research Hospital, Memphis, TN
| | | | - James Taylor
- Departments of Biology and of Computer Science, Johns Hopkins University, Baltimore, MD
| | - Jim Hughes
- Laboratory of Gene Regulation, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
| | - Douglas R. Higgs
- Laboratory of Gene Regulation, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
| | - Berthold Göttgens
- Department of Hematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
| |
|
22
|
Zhang Y, Mahony S. Direct prediction of regulatory elements from partial data without imputation. PLoS Comput Biol 2019; 15:e1007399. [PMID: 31682602 PMCID: PMC6855516 DOI: 10.1371/journal.pcbi.1007399] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/23/2019] [Revised: 11/14/2019] [Accepted: 09/12/2019] [Indexed: 01/07/2023] Open
Abstract
Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. In order to analyze regulatory state differences across cell types, current genome segmentation approaches typically require that the same regulatory genomics assays have been performed in all analyzed cell types. This necessarily limits both the numbers of cell types that can be analyzed and the complexity of the resulting regulatory states, as only a small number of histone modifications have been profiled across many cell types. Data imputation approaches that aim to estimate missing regulatory signals have been applied before genome segmentation. However, this approach is computationally costly and propagates any errors in imputation to produce incorrect genome segmentation results downstream. We present an extension to the IDEAS genome segmentation platform which can perform genome segmentation on incomplete regulatory genomics dataset collections without using imputation. Instead of relying on imputed data, we use an expectation-maximization approach to estimate marginal density functions within each regulatory state. We demonstrate that our genome segmentation results compare favorably with approaches based on imputation or other strategies for handling missing data. We further show that our approach can accurately impute missing data after genome segmentation, reversing the typical order of imputation/genome segmentation pipelines. Finally, we present a new 2D genome segmentation analysis of 127 human cell types studied by the Roadmap Epigenomics Consortium. By using an expanded set of chromatin marks that have been profiled in subsets of these cell types, our new segmentation results capture a more complex picture of combinatorial regulatory patterns that appear on the human genome.
Affiliation(s)
- Yu Zhang
- Department of Statistics, Penn State University, University Park, Pennsylvania, United States of America
| | - Shaun Mahony
- Department of Biochemistry & Molecular Biology and Center for Eukaryotic Gene Regulation, Penn State University, University Park, Pennsylvania, United States of America
| |
|
23
|
Libbrecht MW, Rodriguez OL, Weng Z, Bilmes JA, Hoffman MM, Noble WS. A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types. Genome Biol 2019; 20:180. [PMID: 31462275 PMCID: PMC6714098 DOI: 10.1186/s13059-019-1784-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/25/2018] [Accepted: 08/05/2019] [Indexed: 12/31/2022] Open
Abstract
Semi-automated genome annotation methods such as Segway take as input a set of genome-wide measurements, such as histone modification or DNA accessibility data, and output an annotation of genomic activity in the target cell type. Here we present annotations of 164 human cell types using 1615 data sets. To produce these annotations, we automated the label interpretation step to produce a fully automated annotation strategy. Using these annotations, we developed a measure of the importance of each genomic position called the “conservation-associated activity score.” We further combined all annotations into a single, cell type-agnostic encyclopedia that catalogs all human regulatory elements.
Affiliation(s)
| | - Oscar L Rodriguez
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Boston, USA
| | - Jeffrey A Bilmes
- Department of Electrical Engineering, University of Washington, Seattle, USA
| | - Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada; Department of Computer Science, University of Toronto, Toronto, Canada
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, USA; Department of Computer Science, University of Washington, Seattle, USA
| |
|
24
|
Poulet A, Li B, Dubos T, Rivera-Mulia JC, Gilbert DM, Qin ZS. RT States: systematic annotation of the human genome using cell type-specific replication timing programs. Bioinformatics 2019; 35:2167-2176. [PMID: 30475980 PMCID: PMC6681175 DOI: 10.1093/bioinformatics/bty957] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/29/2018] [Revised: 11/05/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION: The replication timing (RT) program has been linked to many key biological processes, including cell fate commitment, 3D chromatin organization and transcription regulation. Significant technological progress now allows the RT program to be characterized across the entire human genome in a high-throughput and high-resolution fashion. These experiments suggest that RT changes dynamically during development in coordination with gene activity. Since RT is such a fundamental biological process, we believe that an effective quantitative profile of the local RT program from a diverse set of cell types in various developmental stages and lineages can provide crucial biological insights for a genomic locus. RESULTS: In this study, we explored recurrent and spatially coherent combinatorial profiles from 42 RT programs collected from multiple lineages at diverse differentiation states. We found that a Hidden Markov Model with 15 hidden states provides a good model to describe these genome-wide RT profiling data. Each hidden state represents a unique combination of RT profiles across different cell types, which we refer to as 'RT states'. To understand the biological properties of these RT states, we inspected their relationship with chromatin states, gene expression, functional annotation and 3D chromosomal organization. We found that the newly defined RT states possess interesting genome-wide functional properties that add complementary information to the existing annotation of the human genome. AVAILABILITY AND IMPLEMENTATION: R scripts for inferring HMM models and Perl scripts for further analysis are available at https://github.com/PouletAxel/script_HMM_Replication_timing. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Axel Poulet
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Ben Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | | | - Juan Carlos Rivera-Mulia
- Department of Biological Science, Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, FL, USA
| | - David M Gilbert
- Department of Biological Science, Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, FL, USA
| | - Zhaohui S Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| |
|
25
|
Dapas M, Sisk R, Legro RS, Urbanek M, Dunaif A, Hayes MG. Family-based quantitative trait meta-analysis implicates rare noncoding variants in DENND1A in polycystic ovary syndrome. J Clin Endocrinol Metab 2019; 104:3835-3850. [PMID: 31038695 PMCID: PMC6660913 DOI: 10.1210/jc.2018-02496] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 11/19/2018] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
Abstract
CONTEXT: Polycystic ovary syndrome (PCOS) is among the most common endocrine disorders of premenopausal women, affecting 5-15% of this population depending on the diagnostic criteria applied. It is characterized by hyperandrogenism, ovulatory dysfunction and polycystic ovarian morphology. PCOS is highly heritable, but only a small proportion of this heritability can be accounted for by the common genetic susceptibility variants identified to date. OBJECTIVE: The objective of this study was to test whether rare genetic variants contribute to PCOS pathogenesis. DESIGN, PATIENTS, AND METHODS: We performed whole-genome sequencing on DNA from 261 individuals from 62 families with one or more daughters with PCOS. We tested for associations of rare variants with PCOS and its concomitant hormonal traits using a quantitative trait meta-analysis. RESULTS: We found rare variants in DENND1A (P = 5.31 × 10⁻⁵, Padj = 0.039) that were significantly associated with reproductive and metabolic traits in PCOS families. CONCLUSIONS: Common variants in DENND1A have previously been associated with PCOS diagnosis in genome-wide association studies. Subsequent studies indicated that DENND1A is an important regulator of human ovarian androgen biosynthesis. Our findings provide additional evidence that DENND1A plays a central role in PCOS and suggest that rare noncoding variants contribute to disease pathogenesis.
Affiliation(s)
- Matthew Dapas
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
| | - Ryan Sisk
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
| | - Richard S Legro
- Department of Obstetrics and Gynecology, Penn State College of Medicine, Hershey, Pennsylvania
| | - Margrit Urbanek
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Reproductive Science, Northwestern University Feinberg School of Medicine, Chicago, Illinois
| | - Andrea Dunaif
- Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, New York
- Correspondence and Reprint Requests: M. Geoffrey Hayes, PhD, Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, 303 East Chicago Avenue, Chicago, Illinois 60611; or Andrea Dunaif, MD, Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, 5 East 98th Street, 3rd Floor, New York, New York 10029.
| | - M Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Department of Anthropology, Northwestern University, Evanston, Illinois
| |
|
26
|
Backenroth D, He Z, Kiryluk K, Boeva V, Pethukova L, Khurana E, Christiano A, Buxbaum JD, Ionita-Laza I. FUN-LDA: A Latent Dirichlet Allocation Model for Predicting Tissue-Specific Functional Effects of Noncoding Variation: Methods and Applications. Am J Hum Genet 2018; 102:920-942. [PMID: 29727691 PMCID: PMC5986983 DOI: 10.1016/j.ajhg.2018.03.026] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 12/04/2017] [Accepted: 03/21/2018] [Indexed: 10/17/2022] Open
Abstract
We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).
Affiliation(s)
- Daniel Backenroth
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
| | - Zihuai He
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
| | - Krzysztof Kiryluk
- Department of Medicine, Columbia University, New York, NY 10032, USA
| | - Valentina Boeva
- INSERM, U900, 75005 Paris, France; Institut Curie, Mines ParisTech, PSL Research University, 75005 Paris, France
| | - Lynn Pethukova
- Department of Epidemiology, Columbia University, New York, NY 10032, USA; Department of Dermatology, Columbia University, New York, NY 10032, USA
| | - Ekta Khurana
- Department of Physiology and Biophysics, Weill Medical College, Cornell University, New York, NY 10021, USA
| | - Angela Christiano
- Department of Dermatology, Columbia University, New York, NY 10032, USA; Department of Genetics and Development, Columbia University, New York, NY 10032, USA
| | - Joseph D Buxbaum
- Departments of Psychiatry, Neuroscience, and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Friedman Brain Institute and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | | |
|
27
|
Verma A, Lucas A, Verma SS, Zhang Y, Josyula N, Khan A, Hartzel DN, Lavage DR, Leader J, Ritchie MD, Pendergrass SA. PheWAS and Beyond: The Landscape of Associations with Medical Diagnoses and Clinical Measures across 38,662 Individuals from Geisinger. Am J Hum Genet 2018; 102:592-608. [PMID: 29606303 PMCID: PMC5985339 DOI: 10.1016/j.ajhg.2018.02.017] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 10/30/2017] [Accepted: 02/20/2018] [Indexed: 01/23/2023] Open
Abstract
Most phenome-wide association studies (PheWASs) to date have used a small to moderate number of SNPs for association with phenotypic data. We performed a large-scale single-cohort PheWAS, using electronic health record (EHR)-derived case-control status for 541 diagnoses using International Classification of Disease version 9 (ICD-9) codes and 25 median clinical laboratory measures. We calculated associations between these diagnoses and traits with ∼630,000 common frequency SNPs with minor allele frequency > 0.01 for 38,662 individuals. In this landscape PheWAS, we explored results within diseases and traits, comparing results to those previously reported in genome-wide association studies (GWASs), as well as previously published PheWASs. We further leveraged the context of functional impact from protein-coding to regulatory regions, providing a deeper interpretation of these associations. The comprehensive nature of this PheWAS allows for novel hypothesis generation, the identification of phenotypes for further study for future phenotypic algorithm development, and identification of cross-phenotype associations.
Affiliation(s)
- Anurag Verma
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
| | - Anastasia Lucas
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Shefali S Verma
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
| | - Yu Zhang
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
| | - Navya Josyula
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA 17822, USA
| | - Anqa Khan
- Mount Holyoke College, South Hadley, MA 01075, USA
| | - Dustin N Hartzel
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA 17822, USA
| | - Daniel R Lavage
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA 17822, USA
| | - Joseph Leader
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA 17822, USA
| | - Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA; Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA 17822, USA.
| |
|
28
|
Abstract
Transcription is regulated by transcription factor (TF) binding at promoters and distal regulatory elements and histone modifications that control the accessibility of these elements. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) has become the standard assay for identifying genome-wide protein-DNA interactions in vitro and in vivo. As large-scale ChIP-seq data sets have been collected for different TFs and histone modifications, their potential to predict gene expression can be used to test hypotheses about the mechanisms of gene regulation. In addition, complementary functional genomics assays provide a global view of chromatin accessibility and long-range cis-regulatory interactions that are being combined with TF binding and histone remodeling to study the regulation of gene expression. Thus, ChIP-seq analysis is now widely integrated with other functional genomics assays to better understand gene regulatory mechanisms. In this review, we discuss advances and challenges in integrating ChIP-seq data to identify context-specific chromatin states associated with gene activity. We describe the overall computational design of integrating ChIP-seq data with other functional genomics assays. We also discuss the challenges of extending these methods to low-input ChIP-seq assays and related single-cell assays.
Affiliation(s)
| | - Ali Mortazavi
- Corresponding author: Ali Mortazavi, Department of Developmental and Cell Biology, 2300 Biological Sciences 3, University of California, Irvine, CA 92697, USA. Tel: (949) 824-6762
| |
|