1
Xiang G, He X, Giardine BM, Isaac KJ, Taylor DJ, McCoy RC, Jansen C, Keller CA, Wixom AQ, Cockburn A, Miller A, Qi Q, He Y, Li Y, Lichtenberg J, Heuston EF, Anderson SM, Luan J, Vermunt MW, Yue F, Sauria MEG, Schatz MC, Taylor J, Göttgens B, Hughes JR, Higgs DR, Weiss MJ, Cheng Y, Blobel GA, Bodine DM, Zhang Y, Li Q, Mahony S, Hardison RC. Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes. Genome Res 2024; 34:1089-1105. [PMID: 38951027 PMCID: PMC11368181 DOI: 10.1101/gr.277950.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 06/24/2024] [Indexed: 07/03/2024]
Abstract
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state regulatory potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbor distinctive transcription factor binding motifs that are similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we show that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
Affiliation(s)
- Guanjue Xiang
- Bioinformatics and Genomics Graduate Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02215, USA
- Xi He
- Bioinformatics and Genomics Graduate Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Belinda M Giardine
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Kathryn J Isaac
- Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Camden Jansen
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Cheryl A Keller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Alexander Q Wixom
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- April Cockburn
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Amber Miller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Qian Qi
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Yanghua He
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Department of Human Nutrition, Food and Animal Sciences, University of Hawaiʻi at Mānoa, Honolulu, Hawaii 96822, USA
- Yichao Li
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Jens Lichtenberg
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
- Elisabeth F Heuston
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
- Stacie M Anderson
- Flow Cytometry Core, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
- Jing Luan
- Department of Pediatrics, Children's Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Marit W Vermunt
- Department of Pediatrics, Children's Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Feng Yue
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Evanston, Illinois 60611, USA
- Michael E G Sauria
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- James Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Berthold Göttgens
- Wellcome and MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, United Kingdom
- Jim R Hughes
- MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, United Kingdom
- Douglas R Higgs
- MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, United Kingdom
- Mitchell J Weiss
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Yong Cheng
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Gerd A Blobel
- Department of Pediatrics, Children's Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- David M Bodine
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
- Yu Zhang
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Qunhua Li
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Shaun Mahony
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Ross C Hardison
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Computational Biology and Bioinformatics, Genome Sciences Institute, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
2
Xiang G, He X, Giardine BM, Isaac KJ, Taylor DJ, McCoy RC, Jansen C, Keller CA, Wixom AQ, Cockburn A, Miller A, Qi Q, He Y, Li Y, Lichtenberg J, Heuston EF, Anderson SM, Luan J, Vermunt MW, Yue F, Sauria MEG, Schatz MC, Taylor J, Göttgens B, Hughes JR, Higgs DR, Weiss MJ, Cheng Y, Blobel GA, Bodine DM, Zhang Y, Li Q, Mahony S, Hardison RC. Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes. bioRxiv 2024:2023.04.02.535219. [PMID: 37066352 PMCID: PMC10103973 DOI: 10.1101/2023.04.02.535219] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]
Abstract
Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.
3
Foroozandeh Shahraki M, Farahbod M, Libbrecht MW. Robust chromatin state annotation. Genome Res 2024; 34:469-483. [PMID: 38514204 PMCID: PMC11067878 DOI: 10.1101/gr.278343.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 03/19/2024] [Indexed: 03/23/2024]
Abstract
With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of chromatin state assignments. Here, we propose the first method for assigning calibrated confidence scores to chromatin state annotations. Toward this goal, we performed a comprehensive evaluation of the reproducibility of the two most widely used existing SAGA methods, ChromHMM and Segway. We found that their predictions are frequently irreproducible. For example, when applying the same SAGA method on two sets of experimental replicates, 27%-69% of predicted enhancers fail to replicate. This suggests that a substantial fraction of predicted elements in existing chromatin state annotations cannot be relied upon. To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (r-value) to chromatin state annotations. SAGAconf works with any SAGA method and assigns an r-value to each genomic bin of a chromatin state annotation that represents the probability that the label of this bin will be reproduced in a replicated experiment. Thus, SAGAconf allows a researcher to select only the reliable predictions from a chromatin annotation for use in downstream analyses.
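The replicate comparison described in this abstract can be illustrated with a much simpler quantity than the calibrated r-value. The following is a minimal sketch, not SAGAconf itself: it assumes two replicate annotations over the same genomic bins whose state labels have already been matched, and it reports, for each state, the fraction of its bins that receive the same label in the other replicate. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def per_state_reproducibility(rep1, rep2):
    """For each label in rep1, return the fraction of its bins that carry
    the same label in rep2. Both lists must describe the same bins, in order."""
    if len(rep1) != len(rep2):
        raise ValueError("annotations must cover the same bins")
    totals = Counter(rep1)
    # Count only the bins where the two replicates agree, keyed by label.
    matches = Counter(a for a, b in zip(rep1, rep2) if a == b)
    return {state: matches[state] / totals[state] for state in totals}


# Toy annotations for six bins (hypothetical labels).
rep1 = ["enhancer", "enhancer", "promoter", "quiescent", "enhancer", "quiescent"]
rep2 = ["enhancer", "quiescent", "promoter", "quiescent", "quiescent", "quiescent"]
scores = per_state_reproducibility(rep1, rep2)
for state, score in sorted(scores.items()):
    print(f"{state}: {score:.2f}")
```

A raw agreement fraction like this is not a calibrated probability; it only shows the kind of replicate-versus-replicate bookkeeping on which a per-bin confidence score could be built.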
Affiliation(s)
- Marjan Farahbod
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
4
Fan K, Pfister E, Weng Z. Toward a comprehensive catalog of regulatory elements. Hum Genet 2023; 142:1091-1111. [PMID: 36935423 DOI: 10.1007/s00439-023-02519-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 01/03/2023] [Indexed: 03/21/2023]
Abstract
Regulatory elements are the genomic regions that interact with transcription factors to control cell-type-specific gene expression in different cellular environments. A precise and complete catalog of functional elements encoded by the human genome is key to understanding mammalian gene regulation. Here, we review the current state of regulatory element annotation. We first provide an overview of assays for characterizing functional elements, including genome, epigenome, transcriptome, three-dimensional chromatin interaction, and functional validation assays. We then discuss computational methods for defining regulatory elements, including peak-calling and other statistical modeling methods. Finally, we introduce several high-quality lists of regulatory element annotations and suggest potential future directions.
Affiliation(s)
- Kaili Fan
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, 02138, USA
- Edith Pfister
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, 368 Plantation Street, ASC5-1069, Worcester, MA, 01605, USA.
5
Guneri-Sozeri PY, Özden-Yılmaz G, Kisim A, Cakiroglu E, Eray A, Uzuner H, Karakülah G, Pesen-Okvur D, Senturk S, Erkek-Ozhan S. FLI1 and FRA1 transcription factors drive the transcriptional regulatory networks characterizing muscle invasive bladder cancer. Commun Biol 2023; 6:199. [PMID: 36805539 PMCID: PMC9941102 DOI: 10.1038/s42003-023-04561-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2021] [Accepted: 02/07/2023] [Indexed: 02/22/2023]
Abstract
Bladder cancer is mostly present in the form of urothelial carcinoma, causing over 150,000 deaths each year. Its histopathological classification as muscle invasive (MIBC) or non-muscle invasive (NMIBC) is the most prominent aspect, affecting the prognosis and progression of this disease. In this study, we defined the active regulatory landscape of MIBC and NMIBC cell lines using H3K27ac ChIP-seq and used an integrative approach to combine our findings with existing data. Our analysis revealed FRA1 and FLI1 as two critical transcription factors differentially regulating the MIBC regulatory landscape. We show that FRA1 and FLI1 regulate the genes involved in epithelial cell migration and cell junction organization. Knock-down of FRA1 and FLI1 in MIBC revealed the downregulation of several EMT-related genes such as MAP4K4 and FLOT1. Further, ChIP-SICAP performed for FRA1 and FLI1 enabled us to infer chromatin binding partners of these transcription factors and link this information with their target genes. Finally, we show that knock-down of FRA1 and FLI1 results in a significant reduction in the invasion capacity of MIBC cells toward the muscle microenvironment using IC-CHIP assays. Our results collectively highlight the role of these transcription factors in the selection and design of targeted options for treatment of MIBC.
Affiliation(s)
- Perihan Yagmur Guneri-Sozeri
- Izmir Biomedicine and Genome Center, Inciralti, 35340 Izmir, Turkey
- Dokuz Eylül University Izmir International Biomedicine and Genome Institute, Inciralti, 35340 Izmir, Turkey
- Gülden Özden-Yılmaz
- Izmir Biomedicine and Genome Center, Inciralti, 35340 Izmir, Turkey
- Asli Kisim
- Izmir Institute of Technology, Urla, 35430 Izmir, Turkey
- Ece Cakiroglu
- Izmir Biomedicine and Genome Center, Inciralti, 35340 Izmir, Turkey
- Dokuz Eylül University Izmir International Biomedicine and Genome Institute, Inciralti, 35340 Izmir, Turkey
- Aleyna Eray
- Izmir Biomedicine and Genome Center, Inciralti, 35340 Izmir, Turkey
- Dokuz Eylül University Izmir International Biomedicine and Genome Institute, Inciralti, 35340 Izmir, Turkey
- Hamdiye Uzuner
- Izmir Biomedicine and Genome Center, Inciralti, 35340 Izmir, Turkey
- Dokuz Eylül University Izmir International Biomedicine and Genome Institute, Inciralti, 35340 Izmir, Turkey
- Gökhan Karakülah
- Izmir Biomedicine and Genome Center, Inciralti, 35340 Izmir, Turkey
- Dokuz Eylül University Izmir International Biomedicine and Genome Institute, Inciralti, 35340 Izmir, Turkey
- Devrim Pesen-Okvur
- Izmir Institute of Technology, Urla, 35430 Izmir, Turkey
- Serif Senturk
- Izmir Biomedicine and Genome Center, Inciralti, 35340 Izmir, Turkey
- Dokuz Eylül University Izmir International Biomedicine and Genome Institute, Inciralti, 35340 Izmir, Turkey
- Serap Erkek-Ozhan
- Izmir Biomedicine and Genome Center, Inciralti, 35340, Izmir, Turkey.
6
Orouji E, Raman AT. Computational methods to explore chromatin state dynamics. Brief Bioinform 2022; 23:6751148. [PMID: 36208178 PMCID: PMC9677473 DOI: 10.1093/bib/bbac439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 08/25/2022] [Accepted: 09/09/2022] [Indexed: 12/14/2022]
Abstract
The human genome is marked by several singular and combinatorial histone modifications that shape the different states of chromatin and its three-dimensional organization. Genome-wide mapping of these marks as well as histone variants and open chromatin regions is commonly carried out via profiling DNA-protein binding or via chromatin accessibility methods. After the generation of epigenomic datasets in a cell type, statistical models can be used to annotate the noncoding regions of DNA and infer the combinatorial histone marks or chromatin states (CS). These methods involve partitioning the genome and labeling individual segments based on their CS patterns. Chromatin labels enable the systematic discovery of genomic function and activity and can label the gene body, promoters or enhancers without using other genomic maps. CSs are dynamic and change under different cell conditions, such as in normal, preneoplastic or tumor cells. This review aims to explore the available computational tools that have been developed to capture CS alterations under two or more cellular conditions.
Affiliation(s)
- Elias Orouji
- Corresponding author: Elias Orouji, Epigenomics Lab, Princess Margaret Cancer Centre, University Health Network (UHN), 101 College St., Toronto, ON M5G 1L7, Canada. Tel: +1 (917) 647-2202
- Ayush T Raman
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Cambridge, Massachusetts, USA
7
Libbrecht MW, Chan RCW, Hoffman MM. Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns. PLoS Comput Biol 2021; 17:e1009423. [PMID: 34648491 PMCID: PMC8516206 DOI: 10.1371/journal.pcbi.1009423] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/19/2022]
Abstract
Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
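The two simultaneous tasks named in this abstract, labeling positions and partitioning the genome, can be sketched with a small hidden Markov model. This is an illustrative assumption rather than the framework of any particular SAGA tool: the parameters below are fixed by hand instead of being learned from data, the input is a single binarized signal track, and the state names `active` and `quiet` are hypothetical.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a sequence of observations."""
    # best[s] is the best log-probability of any path ending in state s.
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            ptr[s] = p
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def to_segments(labels):
    """Collapse per-bin labels into (start_bin, end_bin_exclusive, label) runs."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


states = ["active", "quiet"]
log_start = {s: math.log(0.5) for s in states}
# Hand-picked parameters: states tend to persist from one bin to the next.
log_trans = {a: {b: math.log(0.9 if a == b else 0.1) for b in states} for a in states}
# Probability of observing the mark (1) or not (0) in each state.
log_emit = {
    "active": {1: math.log(0.8), 0: math.log(0.2)},
    "quiet": {1: math.log(0.1), 0: math.log(0.9)},
}

obs = [0, 0, 1, 1, 1, 0, 0]  # binarized signal in seven consecutive bins
path = viterbi(obs, states, log_start, log_trans, log_emit)
segments = to_segments(path)
print(segments)  # [(0, 2, 'quiet'), (2, 5, 'active'), (5, 7, 'quiet')]
```

In an unsupervised setting the transition and emission parameters would be estimated from the data, and the resulting states would be given biological names only after inspection.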
Affiliation(s)
- Rachel C. W. Chan
- Department of Computer Science, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Michael M. Hoffman
- Department of Computer Science, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
8
Xiang G, Giardine BM, Mahony S, Zhang Y, Hardison RC. S3V2-IDEAS: a package for normalizing, denoising and integrating epigenomic datasets across different cell types. Bioinformatics 2021; 37:3011-3013. [PMID: 33681991 PMCID: PMC8479670 DOI: 10.1093/bioinformatics/btab148] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/15/2020] [Revised: 01/26/2021] [Accepted: 03/01/2021] [Indexed: 02/02/2023]
Abstract
Summary: Epigenetic modifications reflect key aspects of transcriptional regulation, and many epigenomic datasets have been generated under different biological contexts to provide insights into regulatory processes. However, the technical noise in epigenomic datasets and the many dimensions (features) examined make it challenging to effectively extract biologically meaningful inferences from these datasets. We developed a package that reduces noise while normalizing the epigenomic data by a novel normalization method, followed by integrative dimensional reduction by learning and assigning epigenetic states. This package, called S3V2-IDEAS, can be used to identify epigenetic states for multiple features, or identify discretized signal intensity levels and a master peak list across different cell types for a single feature. We illustrate the outputs and performance of S3V2-IDEAS using 137 epigenomics datasets from the VISION project that provides ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis. Availability and implementation: S3V2-IDEAS pipeline is freely available as open source software released under an MIT license at: https://github.com/guanjue/S3V2_IDEAS_ESMP. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guanjue Xiang
- The Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- To whom correspondence should be addressed.
- Belinda M Giardine
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Shaun Mahony
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Yu Zhang
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Ross C Hardison
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- To whom correspondence should be addressed.
9
Kunz T, Rieber L, Mahony S. Assessing relationships between chromatin interactions and regulatory genomic activities using the self-organizing map. Methods 2020; 189:12-21. [PMID: 32652235 DOI: 10.1016/j.ymeth.2020.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/16/2020] [Revised: 06/09/2020] [Accepted: 07/03/2020] [Indexed: 11/24/2022]
Abstract
Few existing methods enable the visualization of relationships between regulatory genomic activities and genome organization as captured by Hi-C experimental data. Genome-wide Hi-C datasets are often displayed using "heatmap" matrices, but it is difficult to intuit from these heatmaps which biochemical activities are compartmentalized together. High-dimensional Hi-C data vectors can alternatively be projected onto three-dimensional space using dimensionality reduction techniques. The resulting three-dimensional structures can serve as scaffolds for projecting other forms of genomic information, thereby enabling the exploration of relationships between genome organization and various genome annotations. However, while three-dimensional models are contextually appropriate for chromatin interaction data, some analyses and visualizations may be more intuitively and conveniently performed in two-dimensional space. We present a novel approach to the visualization and analysis of chromatin organization based on the Self-Organizing Map (SOM). The SOM algorithm provides a two-dimensional manifold which adapts to represent the high dimensional chromatin interaction space. The resulting data structure can then be used to assess relationships between regulatory genomic activities and chromatin interactions. For example, given a set of genomic coordinates corresponding to a given biochemical activity, the degree to which this activity is segregated or compartmentalized in chromatin interaction space can be intuitively visualized on the 2D SOM grid and quantified using Lorenz curve analysis. We demonstrate our approach for exploratory analysis of genome compartmentalization in a high-resolution Hi-C dataset from the human GM12878 cell line. Our SOM-based approach provides an intuitive visualization of the large-scale structure of Hi-C data and serves as a platform for integrative analyses of the relationships between various genomic activities and genome organization.
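The Lorenz curve quantification mentioned in this abstract can be sketched generically. The example below assumes that each SOM node has already been assigned a count of genomic loci carrying a given activity; how those counts are obtained, and whether the authors normalize by node occupancy, is not stated here, so the function names and toy counts are hypothetical. A Gini coefficient near 0 indicates that the activity is spread evenly across nodes, and a value near 1 indicates that it is concentrated in a few nodes.

```python
def lorenz_curve(counts):
    """Return Lorenz curve points (cumulative share of nodes, cumulative
    share of the activity), with nodes sorted from lowest to highest count."""
    ordered = sorted(counts)
    total = sum(ordered)
    if total == 0:
        raise ValueError("at least one node must have a nonzero count")
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((i / len(ordered), running / total))
    return points


def gini(counts):
    """Gini coefficient: one minus twice the area under the Lorenz curve."""
    pts = lorenz_curve(counts)
    # Trapezoid rule over consecutive Lorenz points.
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area


even = [5, 5, 5, 5]           # activity spread evenly over four SOM nodes
concentrated = [0, 0, 0, 20]  # activity confined to a single node
print(round(gini(even), 3), round(gini(concentrated), 3))
```

With a finite number of nodes the maximum attainable value is (n - 1) / n, so the concentrated example above yields 0.75 rather than 1.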
Affiliation(s)
- Timothy Kunz
- Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- Lila Rieber
- Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- Shaun Mahony
- Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA.
10
Xiang G, Keller CA, Heuston E, Giardine BM, An L, Wixom AQ, Miller A, Cockburn A, Sauria MEG, Weaver K, Lichtenberg J, Göttgens B, Li Q, Bodine D, Mahony S, Taylor J, Blobel GA, Weiss MJ, Cheng Y, Yue F, Hughes J, Higgs DR, Zhang Y, Hardison RC. An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. Genome Res 2020; 30:472-484. [PMID: 32132109 PMCID: PMC7111515 DOI: 10.1101/gr.255760.119] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 08/09/2019] [Accepted: 02/21/2020] [Indexed: 01/29/2023]
Abstract
Thousands of epigenomic data sets have been generated in the past decade, but it is difficult for researchers to effectively use all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for validated systematic integration of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By using IDEAS as our integrative and discriminative epigenome annotation system, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of more than 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture enhancers and promoters. The transitions in epigenetic states of these cCREs across cell types provided insights into mechanisms of regulation, including decreases in numbers of active cCREs during differentiation of most lineages, transitions from poised to active or inactive states, and shifts in nuclease accessibility of CTCF-bound elements. Regression modeling of epigenetic states at cCREs and gene expression produced a versatile resource to improve selection of cCREs potentially regulating target genes. These resources are available from our VISION website to aid research in genomics and hematopoiesis.
Affiliation(s)
- Guanjue Xiang
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Cheryl A Keller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Elisabeth Heuston
- NHGRI Hematopoiesis Section, Genetics and Molecular Biology Branch, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Belinda M Giardine
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Lin An
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Alexander Q Wixom
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Amber Miller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - April Cockburn
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Michael E G Sauria
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 20218, USA
| | - Kathryn Weaver
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 20218, USA
| | - Jens Lichtenberg
- NHGRI Hematopoiesis Section, Genetics and Molecular Biology Branch, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Berthold Göttgens
- Welcome and MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 1TN, United Kingdom
| | - Qunhua Li
- Department of Statistics, Program in Bioinformatics and Genomics, Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - David Bodine
- NHGRI Hematopoiesis Section, Genetics and Molecular Biology Branch, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Shaun Mahony
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - James Taylor
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 20218, USA
| | - Gerd A Blobel
- Department of Pediatrics, Children's Hospital of Philadelphia and University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
| | - Mitchell J Weiss
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
| | - Yong Cheng
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
| | - Feng Yue
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
| | - Jim Hughes
- MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, United Kingdom
| | - Douglas R Higgs
- MRC Weatherall Institute of Molecular Medicine, Oxford University, Oxford OX3 9DS, United Kingdom
| | - Yu Zhang
- Department of Statistics, Program in Bioinformatics and Genomics, Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Ross C Hardison
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| |
Collapse
|
11
|
Hardison RC, Zhang Y, Keller CA, Xiang G, Heuston EF, An L, Lichtenberg J, Giardine BM, Bodine D, Mahony S, Li Q, Yue F, Weiss MJ, Blobel GA, Taylor J, Hughes J, Higgs DR, Göttgens B. Systematic integration of GATA transcription factors and epigenomes via IDEAS paints the regulatory landscape of hematopoietic cells. IUBMB Life 2020; 72:27-38. [PMID: 31769130 PMCID: PMC6972633 DOI: 10.1002/iub.2195] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/02/2019] [Accepted: 10/17/2019] [Indexed: 01/15/2023]
Abstract
Members of the GATA family of transcription factors play key roles in the differentiation of specific cell lineages by regulating the expression of target genes. Three GATA factors play distinct roles in hematopoietic differentiation. To better understand how these GATA factors function to regulate genes throughout the genome, we are studying the epigenomic and transcriptional landscapes of hematopoietic cells in a model-driven, integrative fashion. We have formed the collaborative multi-lab VISION project to conduct ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis. The epigenomic data included nuclease accessibility in chromatin, CTCF occupancy, and histone H3 modifications for 20 cell types covering hematopoietic stem cells, multilineage progenitor cells, and mature cells across the blood cell lineages of mouse. The analysis used the Integrative and Discriminative Epigenome Annotation System (IDEAS), which learns all common combinations of features (epigenetic states) simultaneously in two dimensions: along chromosomes and across cell types. The result is a segmentation that effectively paints the regulatory landscape in readily interpretable views, revealing constitutively active or silent loci as well as the loci specifically induced or repressed in each stage and lineage. Nuclease-accessible DNA segments in active chromatin states were designated candidate cis-regulatory elements in each cell type, providing one of the most comprehensive registries of candidate hematopoietic regulatory elements to date. Applications of VISION resources are illustrated for the regulation of genes encoding GATA1, GATA2, GATA3, and Ikaros. VISION resources are freely available from our website, http://usevision.org.
Affiliation(s)
- Ross C. Hardison
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Yu Zhang
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Cheryl A. Keller
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Guanjue Xiang
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Elisabeth F. Heuston
  - Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
- Lin An
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Jens Lichtenberg
  - Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
- Belinda M. Giardine
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- David Bodine
  - Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
- Shaun Mahony
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Qunhua Li
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Feng Yue
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA
- Mitchell J. Weiss
  - Hematology Department, St. Jude Children's Research Hospital, Memphis, TN
- James Taylor
  - Departments of Biology and of Computer Science, Johns Hopkins University, Baltimore, MD
- Jim Hughes
  - Laboratory of Gene Regulation, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Douglas R. Higgs
  - Laboratory of Gene Regulation, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Berthold Göttgens
  - Department of Hematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
|