1
|
Foroozandeh Shahraki M, Farahbod M, Libbrecht MW. Robust chromatin state annotation. Genome Res 2024; 34:469-483. [PMID: 38514204 PMCID: PMC11067878 DOI: 10.1101/gr.278343.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 03/19/2024] [Indexed: 03/23/2024]
Abstract
With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of chromatin state assignments. Here, we propose the first method for assigning calibrated confidence scores to chromatin state annotations. Toward this goal, we performed a comprehensive evaluation of the reproducibility of the two most widely used existing SAGA methods, ChromHMM and Segway. We found that their predictions are frequently irreproducible. For example, when applying the same SAGA method on two sets of experimental replicates, 27%-69% of predicted enhancers fail to replicate. This suggests that a substantial fraction of predicted elements in existing chromatin state annotations cannot be relied upon. To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (r-value) to chromatin state annotations. SAGAconf works with any SAGA method and assigns an r-value to each genomic bin of a chromatin state annotation that represents the probability that the label of this bin will be reproduced in a replicated experiment. Thus, SAGAconf allows a researcher to select only the reliable predictions from a chromatin annotation for use in downstream analyses.
Affiliation(s)
- Marjan Farahbod
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
|
2
|
Bell CG. Epigenomic insights into common human disease pathology. Cell Mol Life Sci 2024; 81:178. [PMID: 38602535 PMCID: PMC11008083 DOI: 10.1007/s00018-024-05206-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/13/2024] [Indexed: 04/12/2024]
Abstract
The epigenome (the chemical modifications and chromatin-related packaging of the genome) enables the same genetic template to be activated or repressed in different cellular settings. This multi-layered mechanism facilitates cell-type-specific function by setting the local sequence and 3D interactive activity level. Gene transcription is further modulated through the interplay with transcription factors and co-regulators. The human body requires this epigenomic apparatus to be precisely installed throughout development and then adequately maintained during the lifespan. The causal role of the epigenome in human pathology, beyond imprinting disorders and specific tumour suppressor genes, was further brought into the spotlight by large-scale sequencing projects identifying that mutations in epigenomic machinery genes could be critical drivers in both cancer and developmental disorders. Abrogation of this cellular mechanism is providing new molecular insights into pathogenesis. However, deciphering the full breadth and implications of these epigenomic changes remains challenging. Knowledge is accruing regarding disease mechanisms and clinical biomarkers, through pathogenically relevant and surrogate tissue analyses, respectively. Advances include consortia-generated cell-type-specific reference epigenomes, high-throughput DNA methylome association studies, as well as insights into ageing-related diseases from biological 'clocks' constructed by machine learning algorithms. Also, 3rd-generation sequencing is beginning to disentangle the complexity of genetic and DNA modification haplotypes. Cell-free DNA methylation as a cancer biomarker has clear clinical utility and further potential to assess organ damage across many disorders. Finally, molecular understanding of disease aetiology brings with it the opportunity for exact therapeutic alteration of the epigenome through CRISPR-activation or inhibition.
Affiliation(s)
- Christopher G Bell
- William Harvey Research Institute, Barts & The London Faculty of Medicine, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK.
|
3
|
Mar D, Babenko IM, Zhang R, Noble WS, Denisenko O, Vaisar T, Bomsztyk K. A High-Throughput PIXUL-Matrix-Based Toolbox to Profile Frozen and Formalin-Fixed Paraffin-Embedded Tissues Multiomes. Lab Invest 2024; 104:100282. [PMID: 37924947 PMCID: PMC10872585 DOI: 10.1016/j.labinv.2023.100282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 10/23/2023] [Accepted: 10/27/2023] [Indexed: 11/06/2023] Open
Abstract
Large-scale high-dimensional multiomics studies are essential to unravel molecular complexity in health and disease. We developed an integrated system for tissue sampling (CryoGrid), analyte preparation (PIXUL), and downstream multiomic analysis in a 96-well plate format (Matrix), MultiomicsTracks96, which we used to interrogate matched frozen and formalin-fixed paraffin-embedded (FFPE) mouse organs. Using this system, we generated 8-dimensional omics data sets encompassing 4 molecular layers of intracellular organization: epigenome (H3K27Ac, H3K4me3, RNA polymerase II, and 5mC levels), transcriptome (messenger RNA levels), epitranscriptome (m6A levels), and proteome (protein levels) in brain, heart, kidney, and liver. There was a high correlation between data from matched frozen and FFPE organs. The Segway genome segmentation algorithm applied to epigenomic profiles confirmed known organ-specific superenhancers in both FFPE and frozen samples. Linear regression analysis showed that proteomic profiles, known to be poorly correlated with transcriptomic data, can be more accurately predicted by the full suite of multiomics data, compared with using epigenomic, transcriptomic, or epitranscriptomic measurements individually.
Affiliation(s)
- Daniel Mar
- UW Medicine South Lake Union, University of Washington, Seattle, Washington; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington
- Ilona M Babenko
- Diabetes Institute, University of Washington, Seattle, Washington
- Ran Zhang
- Department of Genome Sciences, University of Washington, Seattle, Washington
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington
- Oleg Denisenko
- UW Medicine South Lake Union, University of Washington, Seattle, Washington
- Tomas Vaisar
- Diabetes Institute, University of Washington, Seattle, Washington
- Karol Bomsztyk
- UW Medicine South Lake Union, University of Washington, Seattle, Washington; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington; Matchstick Technologies, Inc, Kirkland, Washington.
|
4
|
Pan JH, Du PF. SilenceREIN: seeking silencers on anchors of chromatin loops by deep graph neural networks. Brief Bioinform 2023; 25:bbad494. [PMID: 38168841 PMCID: PMC10782921 DOI: 10.1093/bib/bbad494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 11/09/2023] [Accepted: 12/04/2023] [Indexed: 01/05/2024] Open
Abstract
Silencers are repressive cis-regulatory elements that play crucial roles in transcriptional regulation. Experimental methods for identifying silencers are always costly and time-consuming. Computational methods, which rely on genomic sequence features, have been introduced as alternative approaches. However, silencers do not have a significant epigenomic signature. Therefore, we explore a new way to computationally identify silencers, by incorporating chromatin structural information. We propose the SilenceREIN method, which focuses on finding silencers on anchors of chromatin loops. By using graph neural networks, we extracted chromatin structural information from a regulatory element interaction network. SilenceREIN integrated the chromatin structural information with linear genomic signatures to find silencers. The predictive performance of SilenceREIN is comparable to or better than that of other state-of-the-art methods. We performed a genome-wide scan to systematically find silencers in the human genome. Results suggest that silencers are widespread on anchors of chromatin loops. In addition, enrichment analysis of transcription factor binding motifs supports our prediction results. As far as we can tell, this is the first attempt to incorporate chromatin structural information in finding silencers. All datasets and source codes of SilenceREIN have been deposited in a GitHub repository (https://github.com/JianHPan/SilenceREIN).
Affiliation(s)
- Jian-Hua Pan
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
|
5
|
Fan K, Pfister E, Weng Z. Toward a comprehensive catalog of regulatory elements. Hum Genet 2023; 142:1091-1111. [PMID: 36935423 DOI: 10.1007/s00439-023-02519-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 01/03/2023] [Indexed: 03/21/2023]
Abstract
Regulatory elements are the genomic regions that interact with transcription factors to control cell-type-specific gene expression in different cellular environments. A precise and complete catalog of functional elements encoded by the human genome is key to understanding mammalian gene regulation. Here, we review the current state of regulatory element annotation. We first provide an overview of assays for characterizing functional elements, including genome, epigenome, transcriptome, three-dimensional chromatin interaction, and functional validation assays. We then discuss computational methods for defining regulatory elements, including peak-calling and other statistical modeling methods. Finally, we introduce several high-quality lists of regulatory element annotations and suggest potential future directions.
Affiliation(s)
- Kaili Fan
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138, USA
- Edith Pfister
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA.
|
6
|
Kumar S, Gahramanov V, Patel S, Yaglom J, Kaczmarczyk L, Alexandrov IA, Gerlitz G, Salmon-Divon M, Sherman MY. Evolution of Resistance to Irinotecan in Cancer Cells Involves Generation of Topoisomerase-Guided Mutations in Non-Coding Genome That Reduce the Chances of DNA Breaks. Int J Mol Sci 2023; 24:ijms24108717. [PMID: 37240063 DOI: 10.3390/ijms24108717] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/11/2023] [Revised: 05/01/2023] [Accepted: 05/07/2023] [Indexed: 05/28/2023] Open
Abstract
Resistance to chemotherapy is a leading cause of treatment failure. Drug resistance mechanisms involve mutations in specific proteins or changes in their expression levels. It is commonly understood that resistance mutations happen randomly prior to treatment and are selected during the treatment. However, the selection of drug-resistant mutants in culture could be achieved by multiple drug exposures of cloned genetically identical cells and thus cannot result from the selection of pre-existent mutations. Accordingly, adaptation must involve the generation of mutations de novo upon drug treatment. Here we explored the origin of resistance mutations to a widely used Top1 inhibitor, irinotecan, which triggers DNA breaks, causing cytotoxicity. The resistance mechanism involved the gradual accumulation of recurrent mutations in non-coding regions of DNA at Top1-cleavage sites. Surprisingly, cancer cells had a higher number of such sites than the reference genome, which may define their increased sensitivity to irinotecan. Homologous recombination repairs of DNA double-strand breaks at these sites following initial drug exposures gradually reverted cleavage-sensitive "cancer" sequences back to cleavage-resistant "normal" sequences. These mutations reduced the generation of DNA breaks upon subsequent exposures, thus gradually increasing drug resistance. Together, large target sizes for mutations and their Top1-guided generation lead to their gradual and rapid accumulation, synergistically accelerating the development of resistance.
Affiliation(s)
- Santosh Kumar
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel 40700, Israel
- Valid Gahramanov
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel 40700, Israel
- Shivani Patel
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel 40700, Israel
- Julia Yaglom
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel 40700, Israel
- Lukasz Kaczmarczyk
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel 40700, Israel
- Ivan A Alexandrov
- Department of Anatomy and Anthropology & Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel
- Gabi Gerlitz
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel 40700, Israel
- Michael Y Sherman
- Department of Molecular Biology, Faculty of Natural Sciences, Ariel University, Ariel 40700, Israel
|
7
|
Dsouza KB, Li AY, Bhargava VK, Libbrecht MW. Latent Representation of the Human Pan-Celltype Epigenome Through a Deep Recurrent Neural Network. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2313-2323. [PMID: 34043510 DOI: 10.1109/tcbb.2021.3084147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
The availability of thousands of assays of epigenetic activity necessitates compressed representations of these data sets that summarize the epigenetic landscape of the genome. Until recently, most such representations were cell type-specific, applying to a single tissue or cell state. Recently, neural networks have made it possible to summarize data across tissues to produce a pan-cell type representation. In this work, we propose Epi-LSTM, a deep long short-term memory (LSTM) recurrent neural network autoencoder to capture the long-term dependencies in the epigenomic data. The latent representations from Epi-LSTM capture a variety of genomic phenomena, including gene expression, promoter-enhancer interactions, replication timing, frequently interacting regions, and evolutionary conservation. These representations outperform existing methods in a majority of cell types while yielding smoother representations along the genomic axis due to their sequential nature.
|
8
|
Mulero Hernández J, Fernández-Breis JT. Analysis of the landscape of human enhancer sequences in biological databases. Comput Struct Biotechnol J 2022; 20:2728-2744. [PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 12/01/2022] Open
Abstract
The process of gene regulation extends as a network in which both genetic sequences and proteins are involved. The levels of regulation and the mechanisms involved are multiple. Transcription is the main control mechanism for most genes, with the downstream steps responsible for refining the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies are focused on analyzing the contribution of enhancers in the development of diseases and their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next generation sequencing techniques. This work has generated a large volume of information, which has been transferred to a growing number of public repositories. In this article, we analyze the content of those public repositories that contain information about human enhancers with the aim of detecting whether the knowledge generated by scientific research is contained in those databases in a way that could be computationally exploited. The analysis will be based on three main aspects identified in the literature: types of enhancers, type of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, that most types of enhancers are not represented in the databases, and that there is a need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.
Affiliation(s)
- Juan Mulero Hernández
- Dept. Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Spain
|
9
|
Daneshpajouh H, Chen B, Shokraneh N, Masoumi S, Wiese KC, Libbrecht MW. Continuous chromatin state feature annotation of the human epigenome. Bioinformatics 2022; 38:3029-3036. [PMID: 35451453 PMCID: PMC9154241 DOI: 10.1093/bioinformatics/btac283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Revised: 02/18/2022] [Accepted: 04/18/2022] [Indexed: 12/02/2022] Open
Abstract
Motivation: Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These methods take as input a set of sequencing-based assays of epigenomic activity, such as ChIP-seq measurements of histone modification and transcription factor binding. They output an annotation of the genome that assigns a chromatin state label to each genomic position. Existing SAGA methods have several limitations caused by the discrete annotation framework: such annotations cannot easily represent varying strengths of genomic elements, and they cannot easily represent combinatorial elements that simultaneously exhibit multiple types of activity. To remedy these limitations, we propose an annotation strategy that instead outputs a vector of chromatin state features at each position rather than a single discrete label. Continuous modeling is common in other fields, such as in topic modeling of text documents. We propose a method, epigenome-ssm-nonneg, that uses a non-negative state space model to efficiently annotate the genome with chromatin state features. We also propose several measures of the quality of a chromatin state feature annotation and we compare the performance of several alternative methods according to these quality measures. Results: We show that chromatin state features from epigenome-ssm-nonneg are more useful for several downstream applications than both continuous and discrete alternatives, including their ability to identify expressed genes and enhancers. Therefore, we expect that these continuous chromatin state features will be valuable reference annotations to be used in visualization and downstream analysis. Availability and implementation: Source code for epigenome-ssm is available at https://github.com/habibdanesh/epigenome-ssm and Zenodo (DOI: 10.5281/zenodo.6507585). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Habib Daneshpajouh
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Bowen Chen
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Neda Shokraneh
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Shohre Masoumi
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Kay C Wiese
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
|
10
|
Ahmad K, Henikoff S, Ramachandran S. Managing the Steady State Chromatin Landscape by Nucleosome Dynamics. Annu Rev Biochem 2022; 91:183-195. [PMID: 35303789 DOI: 10.1146/annurev-biochem-032620-104508] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/09/2022]
Abstract
Gene regulation arises out of dynamic competition between nucleosomes, transcription factors, and other chromatin proteins for the opportunity to bind genomic DNA. The timescales of nucleosome assembly and binding of factors to DNA determine the outcomes of this competition at any given locus. Here, we review how these properties of chromatin proteins and the interplay between the dynamics of different factors are critical for gene regulation. We discuss how molecular structures of large chromatin-associated complexes, kinetic measurements, and high resolution mapping of protein-DNA complexes in vivo set the boundary conditions for chromatin dynamics, leading to models of how the steady state behaviors of regulatory elements arise. Expected final online publication date for the Annual Review of Biochemistry, Volume 91 is June 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Kami Ahmad
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Steven Henikoff
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA; Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Srinivas Ramachandran
- Department of Biochemistry and Molecular Genetics and RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, Colorado, USA
|
11
|
Long H, Reeves R, Simon MM. Mouse genomic and cellular annotations. Mamm Genome 2022; 33:19-30. [PMID: 35124726 PMCID: PMC8913471 DOI: 10.1007/s00335-021-09936-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Accepted: 11/22/2021] [Indexed: 11/28/2022]
Abstract
Mice have emerged as one of the most popular and valuable model organisms in the research of human biology. This is due to their genetic and physiological similarity to humans, short generation times, availability of genetically homologous inbred strains, and relatively easy laboratory maintenance. Therefore, following the release of the initial human reference genome, the generation of the mouse reference genome was prioritised and represented an important scientific resource for the mouse genetics community. In 2002, the Mouse Genome Sequencing Consortium published an initial draft of the mouse reference genome which contained ~96% of the euchromatic genome of female C57BL/6J mice. Almost two decades on from the publication of the initial draft, sequencing efforts have continued to increase the completeness and accuracy of the C57BL/6J reference genome alongside advances in genome annotation. Additionally, new sequencing technologies have provided a wealth of data that has added to the repertoire of annotations associated with traditional genomic annotations, including but not limited to advances in regulatory elements, the 3D genome and individual cellular states. In this review we focus on the reference genome C57BL/6J and summarise the different aspects of genomic and cellular annotations, as well as their relevance to mouse genetic research. We denote a genomic annotation as a functional unit of the genome. Cellular annotations are annotations of cell type or state, defined by the transcriptomic expression profile of a cell. Due to the wide-ranging number and diversity of annotations describing the mouse genome, we focus on gene, repeat and regulatory element annotation as well as two relatively new technologies, 3D genome architecture and single-cell sequencing, outlining their utility in genetic research and their current challenges.
Affiliation(s)
- Helen Long
- MRC Harwell Institute, Mammalian Genetics Unit, Harwell Campus, Oxfordshire, OX11 0RD, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Richard Reeves
- MRC Harwell Institute, Mammalian Genetics Unit, Harwell Campus, Oxfordshire, OX11 0RD, UK
- Michelle M Simon
- MRC Harwell Institute, Mammalian Genetics Unit, Harwell Campus, Oxfordshire, OX11 0RD, UK.
|
12
|
Mansouri M, Khakabimamaghani S, Chindelevitch L, Ester M. Aristotle: stratified causal discovery for omics data. BMC Bioinformatics 2022; 23:42. [PMID: 35033007 PMCID: PMC8760642 DOI: 10.1186/s12859-021-04521-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Accepted: 12/08/2021] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND There has been a simultaneous increase in demand and accessibility across genomics, transcriptomics, proteomics and metabolomics data, known as omics data. This has encouraged widespread application of omics data in life sciences, from personalized medicine to the discovery of underlying pathophysiology of diseases. Causal analysis of omics data may provide important insight into the underlying biological mechanisms. Existing causal analysis methods yield promising results when identifying potential general causes of an observed outcome based on omics data. However, they may fail to discover the causes specific to a particular stratum of individuals and missing from others. METHODS To fill this gap, we introduce the problem of stratified causal discovery and propose a method, Aristotle, for solving it. Aristotle addresses the two challenges intrinsic to omics data: high dimensionality and hidden stratification. It employs existing biological knowledge and a state-of-the-art patient stratification method to tackle the above challenges and applies a quasi-experimental design method to each stratum to find stratum-specific potential causes. RESULTS Evaluation based on synthetic data shows better performance for Aristotle in discovering true causes under different conditions compared to existing causal discovery methods. Experiments on a real dataset on Anthracycline Cardiotoxicity indicate that Aristotle's predictions are consistent with the existing literature. Moreover, Aristotle makes additional predictions that suggest further investigations.
Affiliation(s)
- Mehrdad Mansouri
- School of Computing Science, Simon Fraser University (GRID: grid.61971.38; ISNI: 0000 0004 1936 7494), 8888 University Drive, Burnaby, BC, Canada
- Sahand Khakabimamaghani
- School of Computing Science, Simon Fraser University (GRID: grid.61971.38; ISNI: 0000 0004 1936 7494), 8888 University Drive, Burnaby, BC, Canada
- Leonid Chindelevitch
- School of Computing Science, Simon Fraser University (GRID: grid.61971.38; ISNI: 0000 0004 1936 7494), 8888 University Drive, Burnaby, BC, Canada
- Martin Ester
- School of Computing Science, Simon Fraser University (GRID: grid.61971.38; ISNI: 0000 0004 1936 7494), 8888 University Drive, Burnaby, BC, Canada
|
13
|
Wu T, Jiang D, Zou M, Sun W, Wu D, Cui J, Huntress I, Peng X, Li G. Coupling high-throughput mapping with proteomics analysis delineates cis-regulatory elements at high resolution. Nucleic Acids Res 2022; 50:e5. [PMID: 34634809 PMCID: PMC8754656 DOI: 10.1093/nar/gkab890] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/19/2021] [Revised: 08/20/2021] [Accepted: 09/17/2021] [Indexed: 12/30/2022] Open
Abstract
Growing evidence suggests that functional cis-regulatory elements (cis-REs) not only exist in epigenetically marked but also in unmarked sites of the human genome. While it is already difficult to identify cis-REs in the epigenetically marked sites, interrogating cis-REs residing within the unmarked sites is even more challenging. Here, we report adapting Reel-seq, an in vitro high-throughput (HTP) technique, to fine-map cis-REs at high resolution over a large region of the human genome in a systematic and continuous manner. Using Reel-seq, as a proof-of-principle, we identified 408 candidate cis-REs by mapping a 58 kb core region on the aging-related CDKN2A/B locus that harbors p16INK4a. By coupling Reel-seq with FREP-MS, a proteomics analysis technique, we characterized two cis-REs, one in an epigenetically marked site and the other in an epigenetically unmarked site. These elements are shown to regulate the p16INK4a expression over an ∼100 kb distance by recruiting the poly(A) binding protein PABPC1 and the transcription factor FOXC2. Downregulation of either PABPC1 or FOXC2 in human endothelial cells (ECs) can induce the p16INK4a-dependent cellular senescence. Thus, we confirmed the utility of Reel-seq and FREP-MS analyses for the systematic identification of cis-REs at high resolution over a large region of the human genome.
Affiliation(s)
- Ting Wu
- Aging Institute, University of Pittsburgh, Pittsburgh, PA 15219, USA
- Department of Medicine, Xiangya School of Medicine, Central South University, Changsha 410083, China
| | - Danli Jiang
- Aging Institute, University of Pittsburgh, Pittsburgh, PA 15219, USA
| | - Meijuan Zou
- Aging Institute, University of Pittsburgh, Pittsburgh, PA 15219, USA
| | - Wei Sun
- Center for Pulmonary Vascular Biology and Medicine, Pittsburgh Heart, Lung, Blood, and Vascular Medicine Institute, University of Pittsburgh School of Medicine and University of Pittsburgh Medical Center, Pittsburgh, PA 15261, USA
| | - Di Wu
- Division of Oral Craniofacial Health Science, Adams School of Dentistry, Department of Biostatistics, UNC Gillings School of Global Public Health, University of North Carolina, NC 27599, USA
| | - Jing Cui
- Department of Medicine, Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Ian Huntress
- Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC 27607, USA
- Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695, USA
| | - Xinxia Peng
- Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC 27695, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
| | - Gang Li
- Aging Institute, University of Pittsburgh, Pittsburgh, PA 15219, USA
- Department of Medicine, Division of Cardiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15223, USA
| |
Collapse
|
14
|
Vu H, Ernst J. Universal annotation of the human genome through integration of over a thousand epigenomic datasets. Genome Biol 2022; 23:9. [PMID: 34991667 PMCID: PMC8734071 DOI: 10.1186/s13059-021-02572-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/31/2020] [Accepted: 12/08/2021] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative "stacked modeling" approach was previously suggested, where chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, such an approach was previously applied only for small-scale specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges. RESULTS: Using a version of ChromHMM enhanced for large-scale applications, we apply the stacked modeling approach to produce a universal chromatin state annotation of the human genome using over 1000 datasets from more than 100 cell types, with the learned model denoted as the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we use in characterizing each state. Compared to per-cell-type annotations, the full-stack annotations directly differentiate constitutive from cell type-specific activity and are more predictive of locations of external genomic annotations. CONCLUSIONS: The full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect this to be a useful resource that complements existing per-cell-type annotations for studying the non-coding human genome.
Affiliation(s)
- Ha Vu
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, 90095, USA
  - Department of Biological Chemistry, University of California, Los Angeles, CA, 90095, USA
- Jason Ernst
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, 90095, USA
  - Department of Biological Chemistry, University of California, Los Angeles, CA, 90095, USA
  - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, CA, 90095, USA
  - Computer Science Department, University of California, Los Angeles, CA, 90095, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, 90095, USA
  - Molecular Biology Institute, University of California, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, University of California, Los Angeles, CA, 90095, USA
|
15
|
Giacopuzzi E, Popitsch N, Taylor JC. GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data. Nucleic Acids Res 2022; 50:2522-2535. [PMID: 35234913 PMCID: PMC8934622 DOI: 10.1093/nar/gkac130] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/26/2021] [Revised: 02/02/2022] [Accepted: 02/14/2022] [Indexed: 11/25/2022]
Abstract
Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.
Affiliation(s)
- Edoardo Giacopuzzi
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
  - National Institute for Health Research Oxford Biomedical Research Centre, Oxford OX4 2PG, UK
- Niko Popitsch
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
  - Max Perutz Labs, University of Vienna, Dr. Bohr-Gasse 9, 1030 Vienna, Austria
- Jenny C Taylor
  - To whom correspondence should be addressed. Tel: +44 01865 287631;
|
16
|
Libbrecht MW, Chan RCW, Hoffman MM. Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns. PLoS Comput Biol 2021; 17:e1009423. [PMID: 34648491 PMCID: PMC8516206 DOI: 10.1371/journal.pcbi.1009423] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
Abstract
Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
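As a rough illustration of the "partition the genome and assign a label to each segment" idea in this abstract, the toy sketch below labels fixed-size bins by their nearest centroid and merges runs of identical labels into segments. It is not how ChromHMM or Segway work (those learn probabilistic models from the data); the centroid values, the label names, the 200 bp bin size, and the function names are all assumptions made for this example.

```python
from itertools import groupby

# Hypothetical label centroids over two signal tracks (illustrative values only).
CENTROIDS = {
    "quiescent": (0.0, 0.0),
    "promoter-like": (5.0, 1.0),
    "enhancer-like": (1.0, 5.0),
}

def nearest_label(vector):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        CENTROIDS,
        key=lambda name: sum((v - c) ** 2 for v, c in zip(vector, CENTROIDS[name])),
    )

def segment(bins, bin_size=200):
    """Label each bin, then merge runs of identical labels into (start, end, label)."""
    labels = [nearest_label(v) for v in bins]
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run)) * bin_size
        segments.append((start, start + length, label))
        start += length
    return segments

# Six bins, each with signal from two hypothetical tracks.
signals = [(0.1, 0.2), (0.0, 0.1), (4.8, 1.2), (5.1, 0.9), (0.9, 4.7), (0.2, 0.0)]
segments = segment(signals)
for seg in segments:
    print(seg)
```

Positions that share a label have similar input patterns, and adjacent bins with the same label collapse into one segment, which is the sense in which segmentation and labeling happen together.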
Affiliation(s)
- Rachel C. W. Chan
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Michael M. Hoffman
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Canada
|
17
|
Kouakou MR, Cameron D, Hannon E, Dempster EL, Mill J, Hill MJ, Bray NJ. Sites of active gene regulation in the prenatal frontal cortex and their role in neuropsychiatric disorders. Am J Med Genet B Neuropsychiatr Genet 2021; 186:376-388. [PMID: 34632689 DOI: 10.1002/ajmg.b.32877] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/06/2020] [Revised: 04/06/2021] [Accepted: 09/21/2021] [Indexed: 12/21/2022]
Abstract
Common genetic variation appears to largely influence risk for neuropsychiatric disorders through effects on gene regulation. It is therefore possible to shed light on the biology of these conditions by testing for enrichment of associated genetic variation within regulatory genomic regions operating in specific tissues or cell types. Here, we have used the assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-Seq) to map open chromatin (an index of active regulatory genomic regions) in bulk tissue, NeuN+ and NeuN- nuclei from the prenatal human frontal cortex, and tested enrichment of single-nucleotide polymorphism (SNP) heritability for five neuropsychiatric disorders (autism spectrum disorder, attention deficit hyperactivity disorder [ADHD], bipolar disorder, major depressive disorder, and schizophrenia) within these regions. We observed significant enrichment of SNP heritability for ADHD, major depressive disorder, and schizophrenia within open chromatin regions (OCRs) mapped in bulk fetal frontal cortex, and for all five tested neuropsychiatric conditions when we restricted these sites to those overlapping histone modifications indicative of enhancers (H3K4me1) or promoters (H3K4me3) in fetal brain. SNP heritability for neuropsychiatric disorders was significantly enriched in OCRs identified in fetal frontal cortex NeuN- as well as NeuN+ nuclei overlapping fetal brain H3K4me1 or H3K4me3 sites. We additionally demonstrate the utility of our mapped OCRs for prioritizing potentially functional SNPs at genome-wide significant risk loci for neuropsychiatric disorders. Our data provide evidence for an early neurodevelopmental component to a range of neuropsychiatric conditions and highlight an important role for regulatory genomic regions active within both NeuN+ and NeuN- cells of the prenatal brain.
Affiliation(s)
- Manuela R Kouakou
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
- Darren Cameron
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
- Eilis Hannon
  - University of Exeter Medical School, University of Exeter, Exeter, UK
- Emma L Dempster
  - University of Exeter Medical School, University of Exeter, Exeter, UK
- Jonathan Mill
  - University of Exeter Medical School, University of Exeter, Exeter, UK
- Matthew J Hill
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
- Nicholas J Bray
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
|
18
|
Fang K, Li T, Huang Y, Jin VX. NucHMM: a method for quantitative modeling of nucleosome organization identifying functional nucleosome states distinctly associated with splicing potentiality. Genome Biol 2021; 22:250. [PMID: 34446075 PMCID: PMC8390234 DOI: 10.1186/s13059-021-02465-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/01/2020] [Accepted: 08/12/2021] [Indexed: 01/01/2023]
Abstract
We develop a novel computational method, NucHMM, to identify functional nucleosome states associated with cell type-specific combinatorial histone marks and nucleosome organization features such as phasing, spacing and positioning. We test it on publicly available MNase-seq and ChIP-seq data in MCF7, H1, and IMR90 cells and identify 11 distinct functional nucleosome states. We demonstrate these nucleosome states are distinctly associated with the splicing potentiality of skipping exons. This advances our understanding of the chromatin function at the nucleosome level and offers insights into the interplay between nucleosome organization and splicing processes.
Affiliation(s)
- Kun Fang
  - Department of Molecular Medicine, UTHSA-UTSA Joint Biomedical Engineering Program, San Antonio, TX, 78229, USA
- Tianbao Li
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA
- Yufei Huang
  - Department of Medicine, UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15232, USA
- Victor X Jin
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA
|
19
|
Bayat F, Libbrecht M. VSS: Variance-stabilized signals for sequencing-based genomic signals. Bioinformatics 2021; 37:4383-4391. [PMID: 34165492 PMCID: PMC8652025 DOI: 10.1093/bioinformatics/btab457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2020] [Revised: 04/28/2021] [Accepted: 06/17/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION: A sequencing-based genomic assay such as ChIP-seq outputs a real-valued signal for each position in the genome that measures the strength of activity at that position. Most genomic signals lack the property of variance stabilization. That is, a difference between 0 and 100 reads usually has a very different statistical importance from a difference between 1,000 and 1,100 reads. A statistical model such as a negative binomial distribution can account for this pattern, but learning these models is computationally challenging. Therefore, many applications - including imputation and segmentation and genome annotation (SAGA) - instead use Gaussian models and use a transformation such as log or inverse hyperbolic sine (asinh) to stabilize variance. RESULTS: We show here that existing transformations do not fully stabilize variance in genomic data sets. To solve this issue, we propose VSS, a method that produces variance-stabilized signals for sequencing-based genomic signals. VSS learns the empirical relationship between the mean and variance of a given signal data set and produces transformed signals that normalize for this dependence. We show that VSS successfully stabilizes variance and that doing so improves downstream applications such as SAGA. VSS will eliminate the need for downstream methods to implement complex mean-variance relationship models, and will enable genomic signals to be easily understood by eye. AVAILABILITY: https://github.com/faezeh-bayat/VSS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
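The 0-versus-100 and 1,000-versus-1,100 comparison in this abstract can be made concrete with the asinh transform it mentions. The sketch below only illustrates that generic transform, not the VSS method itself; the helper name `asinh_gap` is hypothetical.

```python
import math

def asinh_gap(low, high):
    """Difference between two read counts after the asinh transform."""
    return math.asinh(high) - math.asinh(low)

# Both pairs differ by exactly 100 raw reads.
gap_low = asinh_gap(0, 100)        # near-zero coverage
gap_high = asinh_gap(1000, 1100)   # high coverage

print(f"0 -> 100:     {gap_low:.3f}")
print(f"1000 -> 1100: {gap_high:.3f}")
```

After the transform the first gap is roughly 5.3 while the second is roughly 0.095, so the same raw difference counts for far less at high coverage. Whether that compression matches the true mean-variance relationship of a given data set is the question the abstract raises.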
Affiliation(s)
- Faezeh Bayat
  - Department of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Maxwell Libbrecht
  - Department of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
|
20
|
Singh G, Mullany S, Moorthy SD, Zhang R, Mehdi T, Tian R, Duncan AG, Moses AM, Mitchell JA. A flexible repertoire of transcription factor binding sites and a diversity threshold determines enhancer activity in embryonic stem cells. Genome Res 2021; 31:564-575. [PMID: 33712417 PMCID: PMC8015845 DOI: 10.1101/gr.272468.120] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/01/2020] [Accepted: 02/19/2021] [Indexed: 12/28/2022]
Abstract
Transcriptional enhancers are critical for development and phenotype evolution and are often mutated in disease contexts; however, even in well-studied cell types, the sequence code conferring enhancer activity remains unknown. To examine the enhancer regulatory code for pluripotent stem cells, we identified genomic regions with conserved binding of multiple transcription factors in mouse and human embryonic stem cells (ESCs). Examination of these regions revealed that they contain on average 12.6 conserved transcription factor binding site (TFBS) sequences. Enriched TFBSs are a diverse repertoire of 70 different sequences representing the binding sequences of both known and novel ESC regulators. Using a diverse set of TFBSs from this repertoire was sufficient to construct short synthetic enhancers with activity comparable to native enhancers. Site-directed mutagenesis of conserved TFBSs in endogenous enhancers or TFBS deletion from synthetic sequences revealed a requirement for 10 or more different TFBSs. Furthermore, specific TFBSs, including the POU5F1:SOX2 comotif, are dispensable, despite cobinding the POU5F1 (also known as OCT4), SOX2, and NANOG master regulators of pluripotency. These findings reveal that a TFBS sequence diversity threshold overrides the need for optimized regulatory grammar and individual TFBSs that recruit specific master regulators.
Affiliation(s)
- Gurdeep Singh
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Shanelle Mullany
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Sakthi D Moorthy
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Richard Zhang
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Tahmid Mehdi
  - Department of Computer Science, University of Toronto, Toronto, M5S 2E4, Canada
- Ruxiao Tian
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Andrew G Duncan
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
- Alan M Moses
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
  - Department of Computer Science, University of Toronto, Toronto, M5S 2E4, Canada
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B3, Canada
- Jennifer A Mitchell
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, M5S 3G5, Canada
|
21
|
Human progranulin-expressing mice as a novel tool for the development of progranulin-modulating therapeutics. Neurobiol Dis 2021; 153:105314. [PMID: 33636385 DOI: 10.1016/j.nbd.2021.105314] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/02/2020] [Revised: 01/24/2021] [Accepted: 02/22/2021] [Indexed: 11/24/2022]
Abstract
The granulin protein (also known as, and hereafter referred to as, progranulin) is a secreted glycoprotein that contributes to overall brain health. Heterozygous loss-of-function mutations in the gene encoding the progranulin protein (Granulin Precursor, GRN) are a common cause of familial frontotemporal dementia (FTD). Gene therapy approaches that aim to increase progranulin expression from a single wild-type allele, an area of active investigation for the potential treatment of GRN-dependent FTD, will benefit from the availability of a mouse model that expresses a genomic copy of the human GRN gene. Here we report the development and characterization of a novel mouse model that expresses the entire human GRN gene in its native genomic context as a single copy inserted into a defined locus (Hprt) in the mouse genome. We show that human and mouse progranulin are expressed in a similar tissue-specific pattern, suggesting that the two genes are regulated by similar mechanisms. Human progranulin rescues a phenotype characteristic of progranulin-null mice, the exaggerated and early deposition of the aging pigment lipofuscin in the brain, indicating that the two proteins are functionally similar. Longitudinal behavioural and neuropathological analyses revealed no significant differences between wild-type and human progranulin-overexpressing mice up to 18 months of age, providing evidence that long-term increase of progranulin levels is well tolerated in mice. Finally, we demonstrate that human progranulin expression can be increased in the brain using an antisense oligonucleotide that inhibits a known GRN-regulating micro-RNA, demonstrating that the transgene is responsive to potential gene therapy drugs. Human progranulin-expressing mice represent a novel and valuable tool to expedite the development of progranulin-modulating therapeutics.
|
22
|
Identification and characterization of constrained non-exonic bases lacking predictive epigenomic and transcription factor binding annotations. Nat Commun 2020; 11:6168. [PMID: 33268804 PMCID: PMC7710766 DOI: 10.1038/s41467-020-19962-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/29/2020] [Accepted: 11/06/2020] [Indexed: 12/12/2022]
Abstract
Annotations of evolutionary sequence constraint based on multi-species genome alignments and genome-wide maps of epigenomic marks and transcription factor binding provide important complementary information for understanding the human genome and genetic variation. Here we developed the Constrained Non-Exonic Predictor (CNEP) to quantify the evidence of each base in the genome being in an evolutionarily constrained non-exonic element from an input of over 60,000 epigenomic and transcription factor binding features. We find that the CNEP score outperforms baseline and related existing scores at predicting evolutionarily constrained non-exonic bases from such data. However, a subset of them are still not well predicted by CNEP. We developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) that is predictive of those bases. We further characterize the nature of constrained non-exonic bases with low CNEP scores using additional types of information. CNEP and CSS-CNEP are resources for analyzing constrained non-exonic bases in the genome. Genome-wide maps of evolutionary constraint and large-scale compendia of epigenomic and transcription factor data provide complementary information for genome annotation. Here, the authors develop the Constrained Non-Exonic Predictor (CNEP) that enables better understanding of their relationship.
|
23
|
van der Lee R, Correard S, Wasserman WW. Deregulated Regulators: Disease-Causing cis Variants in Transcription Factor Genes. Trends Genet 2020; 36:523-539. [DOI: 10.1016/j.tig.2020.04.006] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/21/2020] [Revised: 04/15/2020] [Accepted: 04/16/2020] [Indexed: 12/12/2022]
|
24
|
Dapas M, Dunaif A. The contribution of rare genetic variants to the pathogenesis of polycystic ovary syndrome. Curr Opin Endocr Metab Res 2020; 12:26-32. [PMID: 32440573 DOI: 10.1016/j.coemr.2020.02.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
Abstract
Polycystic ovary syndrome (PCOS) is a highly heritable disorder, but only a small proportion of the heritability can be accounted for by common genetic risk variants identified to date. It is possible that variants with lower allele frequencies that cannot be detected using genome-wide association study arrays contribute to PCOS. Here, we discuss the challenges inherent to studying rare genetic variants in complex disease and review several recent studies that have used DNA sequencing techniques to investigate whether rare variants play a role in PCOS pathogenesis. We evaluate these findings in the context of the latest literature in PCOS and complex disease genetics.
|
25
|
Zhou Y, Sun Y, Huang D, Li MJ. epiCOLOC: Integrating Large-Scale and Context-Dependent Epigenomics Features for Comprehensive Colocalization Analysis. Front Genet 2020; 11:53. [PMID: 32117461 PMCID: PMC7029718 DOI: 10.3389/fgene.2020.00053] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/08/2019] [Accepted: 01/17/2020] [Indexed: 12/18/2022]
Abstract
High-throughput genome-wide epigenomic assays, such as ChIP-seq, DNase-seq and ATAC-seq, have profiled a huge number of functional elements across numerous human tissues/cell types, providing an unprecedented opportunity to interpret the human genome and disease in a context-dependent manner. Colocalization analysis determines whether genomic features are functionally related to a given query and facilitates identifying the underlying biological functions that characterize their relationships with the queried genomic regions. Existing colocalization methods leverage diverse assumptions and background models to assess the significance of enrichment; however, they provide only limited and predefined sets of epigenomic features. Here, we comprehensively collected and integrated over 44,385 bulk or single-cell epigenomic assays across 53 human tissues/cell types, covering transcription factor binding, histone modification, open chromatin and transcriptional events. By classifying these profiles into a hierarchy of tissues/cell types, we developed a web portal, epiCOLOC (http://mulinlab.org/epicoloc or http://mulinlab.tmu.edu.cn/epicoloc), for users to perform context-dependent colocalization analysis in a convenient way.
Affiliation(s)
- Yao Zhou
  - Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Yongzheng Sun
  - Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Dandan Huang
  - Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Mulin Jun Li
  - Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
  - Collaborative Innovation Center of Tianjin for Medical Epigenetics, Tianjin Key Laboratory of Medical Epigenetics, Tianjin Medical University, Tianjin, China
|