1. Skribbe M, Soneson C, Stadler MB, Schwaiger M, Suma Sreechakram VN, Iesmantavicius V, Hess D, Moreno EPF, Braun S, Seebacher J, Smallwood SA, Bühler M. A comprehensive Schizosaccharomyces pombe atlas of physical transcription factor interactions with proteins and chromatin. Mol Cell 2025; 85:1426-1444.e8. [PMID: 40015273 DOI: 10.1016/j.molcel.2025.01.032] [Received: 08/23/2024] [Revised: 12/16/2024] [Accepted: 01/30/2025] [Indexed: 03/01/2025]
Abstract
Transcription factors (TFs) are key regulators of gene expression, yet many of their targets and modes of action remain unknown. In Schizosaccharomyces pombe, one-third of TFs are solely homology predicted, with few experimentally validated. We created a comprehensive library of 89 endogenously tagged S. pombe TFs, mapping their protein and chromatin interactions using immunoprecipitation-mass spectrometry and chromatin immunoprecipitation sequencing. Our study identified protein interactors for half the TFs, with over a quarter potentially forming stable complexes. We discovered DNA-binding sites for most TFs across 2,027 unique genomic regions, revealing motifs for 38 TFs and uncovering a complex network of extensive TF cross- and autoregulation. Characterization of the largest TF family revealed conserved DNA sequence preferences but diverse binding patterns and identified a repressive heterodimer, Ntu1/Ntu2, linked to perinuclear gene localization. Our TFexplorer webtool makes all data interactively accessible, offering insights into TF interactions and regulatory mechanisms with broad biological relevance.
Affiliation(s)
- Merle Skribbe
  - Friedrich Miescher Institute for Biomedical Research, Fabrikstrasse 24, Basel, Switzerland; University of Basel, Petersplatz 10, Basel, Switzerland
- Charlotte Soneson
  - Friedrich Miescher Institute for Biomedical Research, Fabrikstrasse 24, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Michael B Stadler
  - Friedrich Miescher Institute for Biomedical Research, Fabrikstrasse 24, Basel, Switzerland; University of Basel, Petersplatz 10, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Michaela Schwaiger
  - Friedrich Miescher Institute for Biomedical Research, Fabrikstrasse 24, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Daniel Hess
  - Friedrich Miescher Institute for Biomedical Research, Fabrikstrasse 24, Basel, Switzerland
- Sigurd Braun
  - Institute for Genetics, Justus-Liebig-University Giessen, Giessen, Germany
- Jan Seebacher
  - Friedrich Miescher Institute for Biomedical Research, Fabrikstrasse 24, Basel, Switzerland
- Sebastien A Smallwood
  - Friedrich Miescher Institute for Biomedical Research, Fabrikstrasse 24, Basel, Switzerland
- Marc Bühler
  - Friedrich Miescher Institute for Biomedical Research, Fabrikstrasse 24, Basel, Switzerland; University of Basel, Petersplatz 10, Basel, Switzerland
2. Wall BPG, Ogata JD, Nguyen M, McClay JL, Harrell JC, Dozmorov MG. Beyond Blacklists: A Critical Assessment of Exclusion Set Generation Strategies and Alternative Approaches. bioRxiv 2025:2025.02.06.636968. [PMID: 39975128 PMCID: PMC11839099 DOI: 10.1101/2025.02.06.636968] [Indexed: 02/21/2025]
Abstract
Short-read sequencing data can be affected by alignment artifacts in certain genomic regions. Removing reads that overlap these exclusion regions, previously known as Blacklists, can help improve biological signal. Tools like the widely used Blacklist software facilitate this process, but their algorithmic details and parameter choices are not always clearly documented, affecting reproducibility and biological relevance. We examined the Blacklist software and found that pre-generated exclusion sets were difficult to reproduce due to variability in input data, aligner choice, and read length. We also identified and addressed a coding issue that led to over-annotation of high-signal regions. We further explored the use of "sponge" sequences (unassembled genomic regions such as satellite DNA, ribosomal DNA, and mitochondrial DNA) as an alternative approach. Aligning reads to a genome that includes sponge sequences reduced signal correlation in ChIP-seq data comparably to Blacklist-derived exclusion sets while preserving biological signal. Sponge-based alignment also had minimal impact on RNA-seq gene counts, suggesting broader applicability beyond chromatin profiling. These results highlight the limitations of fixed exclusion sets and suggest that sponge sequences offer a flexible, alignment-guided strategy for reducing artifacts and improving functional genomics analyses.
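The basic exclusion step mentioned in this abstract, discarding reads that overlap excluded regions, can be sketched as below. This is a minimal illustration only, not the Blacklist software and not the authors' sponge-based approach; it assumes BED-style 0-based, half-open coordinates and reads already reduced to (chromosome, start, end) tuples, and all function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical helpers; coordinates are assumed 0-based and half-open (BED-style).

def build_exclusion_index(regions):
    """Group (chrom, start, end) regions by chromosome and merge overlapping ones."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlaps or touches the previous region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        # Keep starts and ends in separate sorted lists so starts can be binary-searched.
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index

def overlaps_exclusion(index, chrom, start, end):
    """Return True if the half-open read [start, end) overlaps any excluded region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Index of the last merged region that begins before the read ends.
    i = bisect_left(starts, end) - 1
    # Merged regions are disjoint and sorted, so region i has the largest end
    # among all candidates; checking it alone is sufficient.
    return i >= 0 and ends[i] > start

def filter_reads(reads, index):
    """Keep only reads that do not overlap an excluded region."""
    return [read for read in reads if not overlaps_exclusion(index, *read)]

index = build_exclusion_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)])
reads = [("chr1", 50, 100), ("chr1", 250, 350), ("chr1", 300, 400),
         ("chr2", 10, 20), ("chr3", 0, 10)]
print(filter_reads(reads, index))
# prints [('chr1', 50, 100), ('chr1', 300, 400), ('chr3', 0, 10)]
```

Merging the regions first keeps each lookup to a single binary search per read; in practice, established interval tools would normally be used for this step.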
Affiliation(s)
- Brydon P. G. Wall
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Jonathan D. Ogata
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Joseph L. McClay
  - Department of Pharmacotherapy and Outcomes Science, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
3. Kudron M, Gevirtzman L, Victorsen A, Lear BC, Gao J, Xu J, Samanta S, Frink E, Tran-Pearson A, Huynh C, Vafeados D, Hammonds A, Fisher W, Wall M, Wesseling G, Hernandez V, Lin Z, Kasparian M, White K, Allada R, Gerstein M, Hillier L, Celniker SE, Reinke V, Waterston RH. Binding profiles for 961 Drosophila and C. elegans transcription factors reveal tissue-specific regulatory relationships. Genome Res 2024; 34:2319-2334. [PMID: 39438113 PMCID: PMC11694743 DOI: 10.1101/gr.279037.124] [Received: 01/25/2024] [Accepted: 10/17/2024] [Indexed: 10/25/2024]
Abstract
A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here, we present the culmination of the efforts of the modENCODE (model organism Encyclopedia of DNA Elements) and modERN (model organism Encyclopedia of Regulatory Networks) consortia to systematically assay TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). These data sets comprise 605 TFs identifying 3.6 M sites in the fly and 356 TFs identifying 0.9 M sites in the worm, and represent the majority of the regulatory space in each genome. We demonstrate that TFs associate with chromatin in clusters termed "metapeaks," that larger metapeaks have characteristics of high-occupancy target (HOT) regions, and that the importance of consensus sequence motifs bound by TFs depends on metapeak size and complexity. Combining ChIP-seq data with single-cell RNA-seq data in a machine-learning model identifies TFs with a prominent role in promoting target gene expression in specific cell types, even differentiating between parent-daughter cells during embryogenesis. These data are a rich resource for the community that should fuel and guide future investigations into TF function. To facilitate data accessibility and utility, all strains expressing green fluorescent protein (GFP)-tagged TFs are available at the stock centers for each organism. The chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center, GEO, and through a direct interface that provides rapid access to processed data sets and summary analyses, as well as widgets to probe the cell-type-specific TF-target relationships.
Affiliation(s)
- Michelle Kudron
  - Department of Genetics, Yale University, New Haven, Connecticut 06520, USA
- Louis Gevirtzman
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Alec Victorsen
  - Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Bridget C Lear
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Jiahao Gao
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Jinrui Xu
  - Department of Biology, Howard University, Washington, District of Columbia 20059, USA
  - Center for Applied Data Science and Analytics, Howard University, Washington, District of Columbia 20059, USA
- Swapna Samanta
  - Department of Genetics, Yale University, New Haven, Connecticut 06520, USA
- Emily Frink
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Adri Tran-Pearson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Chau Huynh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Dionne Vafeados
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Ann Hammonds
  - Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- William Fisher
  - Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Martha Wall
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois 60637, USA
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Greg Wesseling
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Vanessa Hernandez
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Zhichun Lin
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Mary Kasparian
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Kevin White
  - Department of Biochemistry and Precision Medicine Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, 117597 Singapore
- Ravi Allada
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  - Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06520, USA
- LaDeana Hillier
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Susan E Celniker
  - Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Valerie Reinke
  - Department of Genetics, Yale University, New Haven, Connecticut 06520, USA
- Robert H Waterston
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
4. Hudaiberdiev S, Ovcharenko I. Functional characteristics and computational model of abundant hyperactive loci in the human genome. eLife 2024; 13:RP95170. [PMID: 39535534 PMCID: PMC11560132 DOI: 10.7554/elife.95170] [Indexed: 11/16/2024]
Abstract
Enhancers and promoters are classically considered to be bound by a small set of transcription factors (TFs) in a sequence-specific manner. This assumption has come under increasing skepticism as TF ChIP-seq datasets have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs, often with no detectable correlation between ChIP-seq peaks and the presence of DNA-binding motifs. Here, we used a set of 1,003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at the promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on their affinities for promoters and enhancers, we detected five distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation, thus challenging the classical model of enhancer activity, and we propose a model of HOT locus formation based on the existence of large transcriptional condensates.
Affiliation(s)
- Sanjarbek Hudaiberdiev
  - National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
- Ivan Ovcharenko
  - National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
5. Herbert A. A Compendium of G-Flipon Biological Functions That Have Experimental Validation. Int J Mol Sci 2024; 25:10299. [PMID: 39408629 PMCID: PMC11477331 DOI: 10.3390/ijms251910299] [Received: 08/21/2024] [Revised: 09/16/2024] [Accepted: 09/18/2024] [Indexed: 10/20/2024]
Abstract
As with all new fields of discovery, work on the biological role of G-quadruplexes (GQs) has produced a number of results that at first glance are quite baffling, sometimes because they do not fit well together, but mostly because they are different from commonly held expectations. Like other classes of flipons, those that form G-quadruplexes have a repeat sequence motif that enables the fold. The canonical DNA motif (G3N1-7)3G3, where N is any nucleotide and G is guanine, is a feature that is under active selection in avian and mammalian genomes. The involvement of G-flipons in genome maintenance traces back to the invertebrate Caenorhabditis elegans and to ancient DNA repair pathways. The role of GQs in transcription is supported by the observation that yeast Rap1 protein binds both B-DNA, in a sequence-specific manner, and GQs, in a structure-specific manner, through the same helix. Other sequence-specific transcription factors (TFs) also engage both conformations to actuate cellular transactions. Noncoding RNAs can also modulate GQ formation in a sequence-specific manner and engage the same cellular machinery as localized by TFs, linking the ancient RNA world with the modern protein world. The coevolution of noncoding RNAs and sequence-specific proteins is supported by studies of early embryonic development, where the transient formation of G-quadruplexes coordinates the epigenetic specification of cell fate.
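The canonical motif (G3N1-7)3G3 quoted in this abstract can be read as four runs of three guanines separated by three loops of 1 to 7 nucleotides. A minimal Python sketch of a regular-expression scan for it is shown below; the pattern and function name are hypothetical, loops are restricted to A/C/G/T, only the given strand is scanned (a C-rich match would require scanning the reverse complement), and this is an illustration rather than a method from the cited review.

```python
import re

# Hypothetical pattern for the canonical motif (G3N1-7)3G3:
# three repeats of (GGG followed by a 1-7 nt loop), then a final GGG.
# The lookahead makes matches zero-width so overlapping candidates are all reported.
G4_PATTERN = re.compile(r"(?=((?:G{3}[ACGT]{1,7}){3}G{3}))")

def find_g4_motifs(sequence):
    """Return (0-based start, matched sequence) for each canonical G4 motif start."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in G4_PATTERN.finditer(seq)]

print(find_g4_motifs("ttGGGaGGGtaGGGcccGGGtt"))
# prints [(2, 'GGGAGGGTAGGGCCCGGG')]
```

Using a lookahead rather than a plain match is a design choice: it reports every start position that begins a valid motif, instead of skipping candidates that overlap an earlier hit.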
Affiliation(s)
- Alan Herbert
  - Discovery, InsideOutBio, 42 8th Street, Unit 3412, Charlestown, MA 02129, USA
6. Anderson AG, Moyers BA, Loupe JM, Rodriguez-Nunez I, Felker SA, Lawlor JMJ, Bunney WE, Bunney BG, Cartagena PM, Sequeira A, Watson SJ, Akil H, Mendenhall EM, Cooper GM, Myers RM. Allele-specific transcription factor binding across human brain regions offers mechanistic insight into eQTLs. Genome Res 2024; 34:1224-1234. [PMID: 39152038 PMCID: PMC11444172 DOI: 10.1101/gr.278601.123] [Received: 10/06/2023] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
Transcription factors (TFs) regulate gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Because TF occupancy is driven in part by recognition of DNA sequence, genetic variation can influence TF-DNA associations and gene regulation. To identify variants that impact TF binding in human brain tissues, we assessed allele-specific binding (ASB) at heterozygous variants for 94 TFs in nine brain regions from two donors. Leveraging graph genomes constructed from phased genomic sequence data, we compared ChIP-seq signals between alleles at heterozygous variants within each brain region and identified thousands of variants exhibiting ASB for at least one TF. ASB reproducibility was measured by comparisons between independent experiments both within and between donors. We found that rare alleles in the general population more frequently led to reduced TF binding, whereas common alleles had an equal likelihood of increasing or decreasing binding. Further, for ASB variants in predicted binding motifs, the favored allele tended to be the one with the stronger expected motif match, but this concordance was not observed within highly occupied sites. We also found that neuron-specific candidate cis-regulatory elements (cCREs), in contrast with oligodendrocyte-specific cCREs, showed depletion of ASB variants. We identified 2670 ASB variants associated with evidence for allele-specific gene expression in the brain from GTEx data and observed increasing eQTL effect direction concordance as ASB significance increases. These results provide a valuable and unique resource for mechanistic analysis of cis-regulatory variation in human brain tissue.
Affiliation(s)
- Ashlyn G Anderson
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
  - University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
- Belle A Moyers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Jacob M Loupe
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- James M J Lawlor
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- William E Bunney
  - Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
- Blynn G Bunney
  - Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
- Preston M Cartagena
  - Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
- Adolfo Sequeira
  - Department of Psychiatry and Human Behavior, University of California, Irvine, California 92697, USA
- Stanley J Watson
  - The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
- Huda Akil
  - The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
- Eric M Mendenhall
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Gregory M Cooper
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Richard M Myers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
7. Sprang M, Möllmann J, Andrade-Navarro MA, Fontaine JF. Overlooked poor-quality patient samples in sequencing data impair reproducibility of published clinically relevant datasets. Genome Biol 2024; 25:222. [PMID: 39152483 PMCID: PMC11328481 DOI: 10.1186/s13059-024-03331-6] [Received: 06/13/2023] [Accepted: 07/08/2024] [Indexed: 08/19/2024]
Abstract
BACKGROUND Reproducibility is a major concern in biomedical studies, and existing publication guidelines do not solve the problem. Batch effects and quality imbalances between groups of biological samples are major factors hampering reproducibility. Yet quality imbalance is rarely considered in the scientific literature. RESULTS Our analysis uses 40 clinically relevant RNA-seq datasets to quantify the impact of quality imbalance between groups of samples on the reproducibility of gene expression studies. Strong quality imbalance is frequent (14 datasets; 35%), and hundreds of quality markers are present in more than 50% of the datasets. Enrichment analysis suggests common stress-driven effects among the low-quality samples and highlights a complementary role of transcription factors and miRNAs in regulating the stress response. Preliminary ChIP-seq results show similar trends. Quality imbalance affects the number of differential genes derived by comparing control to disease samples (the higher the imbalance, the higher the number of genes), the proportion of quality markers in top differential genes (the higher the imbalance, the higher the proportion; up to 22%), and the proportion of known disease genes in top differential genes (the higher the imbalance, the lower the proportion). We show that removing outliers based on their quality score improves the resulting downstream analysis. CONCLUSIONS Using a stringent selection of well-designed datasets, we demonstrate that quality imbalance between groups of samples can significantly reduce the relevance of differential genes, consequently reducing reproducibility between studies. Appropriate experimental design and analysis methods can substantially reduce the problem.
Affiliation(s)
- Maximilian Sprang
  - Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany
- Jannik Möllmann
  - Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany
- Jean-Fred Fontaine
  - Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany
  - Central Institute for Decision Support Systems in Crop Protection (ZEPP), Rüdesheimer Str. 60-68, Bad Kreuznach, 55545, Germany
8. Hudaiberdiev S, Ovcharenko I. Functional characteristics and computational model of abundant hyperactive loci in the human genome. bioRxiv 2024:2023.02.05.527203. [PMID: 36945558 PMCID: PMC10028745 DOI: 10.1101/2023.02.05.527203] [Indexed: 02/07/2023]
Abstract
Enhancers and promoters are classically considered to be bound by a small set of transcription factors (TFs) in a sequence-specific manner. This assumption has come under increasing skepticism as TF ChIP-seq datasets have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs, often with no detectable correlation between ChIP-seq peaks and the presence of DNA-binding motifs. Here, we used a set of 1,003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at the promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on their affinities for promoters and enhancers, we detected five distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation, thus challenging the classical model of enhancer activity, and we propose a model of HOT locus formation based on the existence of large transcriptional condensates.
Affiliation(s)
- Sanjarbek Hudaiberdiev
  - National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Ivan Ovcharenko
  - National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
9. Loupe JM, Anderson AG, Rizzardi LF, Rodriguez-Nunez I, Moyers B, Trausch-Lowther K, Jain R, Bunney WE, Bunney BG, Cartagena P, Sequeira A, Watson SJ, Akil H, Cooper GM, Myers RM. Multiomic profiling of transcription factor binding and function in human brain. Nat Neurosci 2024; 27:1387-1399. [PMID: 38831039 DOI: 10.1038/s41593-024-01658-8] [Received: 06/20/2023] [Accepted: 04/19/2024] [Indexed: 06/05/2024]
Abstract
Transcription factors (TFs) orchestrate gene expression programs crucial for brain function, but we lack detailed information about TF binding in human brain tissue. We generated a multiomic resource (ChIP-seq, ATAC-seq, RNA-seq, DNA methylation) on bulk tissues and sorted nuclei from several postmortem brain regions, including binding maps for more than 100 TFs. We demonstrate improved measurements of TF activity, including motif recognition and gene expression modeling, upon identification and removal of high TF occupancy regions. Further, predictive TF binding models demonstrate a bias for these high-occupancy sites. Neuronal TFs SATB2 and TBR1 bind unique regions depleted for such sites and promote neuronal gene expression. Binding sites for TFs, including TBR1 and PKNOX1, are enriched for risk variants associated with neuropsychiatric disorders, predominantly in neurons. This work, titled BrainTF, is a powerful resource for future studies seeking to understand the roles of specific TFs in regulating gene expression in the human brain.
Affiliation(s)
- Jacob M Loupe
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Lindsay F Rizzardi
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
  - Department of Biochemistry and Molecular Biology, The University of Alabama in Birmingham, Birmingham, AL, USA
- Belle Moyers
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Rashmi Jain
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- William E Bunney
  - Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Blynn G Bunney
  - Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Preston Cartagena
  - Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Adolfo Sequeira
  - Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Stanley J Watson
  - The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Huda Akil
  - The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Richard M Myers
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
10. Li T, Xu H, Teng S, Suo M, Bahitwa R, Xu M, Qian Y, Ramstein GP, Song B, Buckler ES, Wang H. Modeling 0.6 million genes for the rational design of functional cis-regulatory variants and de novo design of cis-regulatory sequences. Proc Natl Acad Sci U S A 2024; 121:e2319811121. [PMID: 38889146 PMCID: PMC11214048 DOI: 10.1073/pnas.2319811121] [Received: 11/12/2023] [Accepted: 05/14/2024] [Indexed: 06/20/2024]
Abstract
Rational design of plant cis-regulatory DNA sequences without expert intervention or prior domain knowledge is still a daunting task. Here, we developed PhytoExpr, a deep learning framework capable of predicting both mRNA abundance and plant species using the proximal regulatory sequence as the sole input. PhytoExpr was trained on 17 species representative of major clades of the plant kingdom to enhance its generalizability. Via input perturbation, quantitative functional annotation of the input sequence was achieved at single-nucleotide resolution, revealing an abundance of predicted high-impact nucleotides in conserved noncoding sequences and transcription factor binding sites. Evaluation of maize HapMap3 single-nucleotide polymorphisms (SNPs) by PhytoExpr demonstrates an enrichment of predicted high-impact SNPs in cis-eQTL. Additionally, we provide two algorithms that harness PhytoExpr for designing functional cis-regulatory variants and for the de novo creation of species-specific cis-regulatory sequences through in silico evolution of random DNA sequences. Our model represents a general and robust approach for functional variant discovery in population genetics and rational design of regulatory sequences for genome editing and synthetic biology.
Affiliation(s)
- Tianyi Li
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
- Hui Xu
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
- Shouzhen Teng
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
- Mingrui Suo
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
- Revocatus Bahitwa
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
  - Legumes Research Program, Research and Innovation Division, Tanzania Agricultural Research Institute, Ilonga, Kilosa, Morogoro 67410, Tanzania
- Mingchi Xu
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
- Yiheng Qian
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
- Guillaume P. Ramstein
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark
- Baoxing Song
  - National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong 261325, People’s Republic of China
  - Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, People’s Republic of China
- Edward S. Buckler
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853
  - Agricultural Research Service, United States Department of Agriculture, Ithaca, NY 14853
- Hai Wang
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
  - Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, People’s Republic of China
  - Sanya Institute of China Agricultural University, Sanya 572025, People’s Republic of China
11
Choi J, Lee EA. Analysis of REST binding sites with canonical and non-canonical motifs in human cell lines. BMC Med Genomics 2024; 17:92. [PMID: 38632583 PMCID: PMC11025195 DOI: 10.1186/s12920-024-01860-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 03/28/2024] [Indexed: 04/19/2024]
Abstract
BACKGROUND Repressor element 1 (RE1) silencing transcription factor (REST) is a transcriptional repressor abundantly expressed in aging human brains. It is known to regulate genes associated with oxidative stress, inflammation, and neurological disorders by binding to a canonical form of sequence motif and its non-canonical variations. Although analysis of genomic sequence motifs is crucial to understand transcriptional regulation by transcription factors (TFs), a comprehensive characterization of various forms of RE1 motifs in human cell lines has not been performed. RESULTS Here, we analyzed 23 ENCODE REST ChIP-seq datasets from diverse human cell lines and identified a non-redundant set of 68,975 loci with ChIP-seq peaks. Our systematic characterization of these binding sites revealed that the canonical form of REST binding motif was found primarily in ChIP-seq peaks shared across multiple cell lines, while non-canonical forms of motifs were identified in both cell-line-specific binding sites and those shared across cell lines. Remarkably, we observed a notable prevalence of non-canonical motifs that corresponded to half segments of the canonical motif. Furthermore, our analysis unveiled the presence of cell-line-specific REST binding patterns, as evidenced by the clustering of ChIP-seq experiments according to their respective cell lines. This observation underscores the cell-line specificity of REST binding at certain genomic loci, implying intricate cell-line-specific regulatory mechanisms. CONCLUSIONS Overall, our study provides a comprehensive characterization of REST binding motifs in human cell lines and genome-wide RE1 motif profiles. These findings contribute to a deeper understanding of REST-mediated transcriptional regulation and highlight the importance of considering cell-line-specific effects in future investigations.
Affiliation(s)
- Jaejoon Choi
- Division of Genetics and Genomics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Eunjung Alice Lee
- Division of Genetics and Genomics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- The Broad Institute of Harvard and MIT, Cambridge, MA, USA
12
Xu J, Gao J, Ni P, Gerstein M. Less-is-more: selecting transcription factor binding regions informative for motif inference. Nucleic Acids Res 2024; 52:e20. [PMID: 38214231 PMCID: PMC10899791 DOI: 10.1093/nar/gkad1240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 12/06/2023] [Accepted: 12/17/2023] [Indexed: 01/13/2024]
Abstract
Numerous statistical methods have emerged for inferring DNA motifs for transcription factors (TFs) from genomic regions. However, the process of selecting informative regions for motif inference remains understudied. Current approaches select regions with strong ChIP-seq signal for a given TF, assuming that such strong signal primarily results from specific interactions between the TF and its motif. Additionally, these selection approaches do not account for non-target motifs, i.e. motifs of other TFs; they presume that non-target motifs occur infrequently compared with the target motif and therefore interfere minimally with its identification. Leveraging extensive ChIP-seq datasets, we introduced the concept of TF signal 'crowdedness', referred to as C-score, for each genomic region. The C-score helps in highlighting TF signals arising from non-specific interactions. Moreover, by considering the C-score (and adjusting for the length of genomic regions), we can effectively mitigate interference of non-target motifs. Using these tools, we find that in many instances, strong ChIP-seq signal stems mainly from non-specific interactions, and the occurrence of non-target motifs significantly impacts the accurate inference of the target motif. Prioritizing genomic regions with reduced crowdedness and short length markedly improves motif inference. This 'less-is-more' effect suggests that ChIP-seq region selection warrants more attention.
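One way to make the region-selection idea concrete is sketched below in Python. The abstract does not give the exact C-score formula, so `crowdedness` here is an assumed, simplified proxy: the number of distinct non-target TFs with at least one peak overlapping a target region. Regions are then ranked by low crowdedness first and short length second, which is a simplification of the length adjustment the authors describe. `Region`, `crowdedness`, and `select_regions` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    @property
    def length(self) -> int:
        return self.end - self.start

def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def crowdedness(region: Region, other_tf_peaks: Dict[str, List[Region]]) -> int:
    """Assumed proxy for crowdedness: how many distinct non-target TFs
    have at least one peak overlapping the region."""
    return sum(
        1 for peaks in other_tf_peaks.values()
        if any(overlaps(region, p) for p in peaks)
    )

def select_regions(target_peaks: List[Region],
                   other_tf_peaks: Dict[str, List[Region]],
                   n: int) -> List[Region]:
    """Keep the n regions with the lowest crowdedness, breaking ties by length."""
    keyed = [(crowdedness(r, other_tf_peaks), r.length, r) for r in target_peaks]
    keyed.sort(key=lambda t: (t[0], t[1]))
    return [r for _, _, r in keyed[:n]]

if __name__ == "__main__":
    target = [Region("chr1", 100, 300), Region("chr1", 1000, 1100), Region("chr1", 5000, 5050)]
    others = {
        "TF_A": [Region("chr1", 150, 250)],
        "TF_B": [Region("chr1", 200, 400), Region("chr1", 1050, 1060)],
    }
    print([crowdedness(r, others) for r in target])  # [2, 1, 0]
    print(select_regions(target, others, n=2))
```

The selected regions would then be passed to a motif-discovery tool in place of the strongest-signal peaks.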
Affiliation(s)
- Jinrui Xu
- Department of Biology, Howard University, Washington, DC 20059, USA
- Center for Applied Data Science and Analytics, Howard University, Washington, DC 20059, USA
- Jiahao Gao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Pengyu Ni
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
13
Kudron M, Gevirtzman L, Victorsen A, Lear BC, Gao J, Xu J, Samanta S, Frink E, Tran-Pearson A, Huynh C, Vafeados D, Hammonds A, Fisher W, Wall M, Wesseling G, Hernandez V, Lin Z, Kasparian M, White K, Allada R, Gerstein M, Hillier L, Celniker SE, Reinke V, Waterston RH. Binding profiles for 954 Drosophila and C. elegans transcription factors reveal tissue specific regulatory relationships. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.18.576242. [PMID: 38293065 PMCID: PMC10827215 DOI: 10.1101/2024.01.18.576242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here we present the culmination of the modERN (model organism Encyclopedia of Regulatory Networks) consortium that systematically assayed TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). We describe key features of these datasets, comprising 604 TFs identifying 3.6M sites in the fly and 350 TFs identifying 0.9M sites in the worm. Applying a machine learning model to these data identifies sets of TFs with a prominent role in promoting target gene expression in specific cell types. TF binding data are available through the ENCODE Data Coordinating Center and at https://epic.gs.washington.edu/modERNresource, which provides access to processed and summary data, as well as widgets to probe cell type-specific TF-target relationships. These data are a rich resource that should fuel investigations into TF function during development.
Affiliation(s)
- Michelle Kudron
- Department of Genetics, Yale University, New Haven, Connecticut 06520
- Louis Gevirtzman
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Alec Victorsen
- Department of Laboratory Medicine & Pathology, University of Minnesota, Minneapolis, MN 55455
- Bridget C. Lear
- Department of Neurobiology, Northwestern University, Evanston, IL 60208
- Jiahao Gao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
- Jinrui Xu
- Department of Biology, Howard University, Washington, District of Columbia 20059, USA
- Center for Applied Data Science and Analytics, Howard University, Washington, District of Columbia 20059, USA
- Swapna Samanta
- Department of Genetics, Yale University, New Haven, Connecticut 06520
- Emily Frink
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Adri Tran-Pearson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Chau Huynh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Dionne Vafeados
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Ann Hammonds
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
- William Fisher
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
- Martha Wall
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Greg Wesseling
- Department of Neurobiology, Northwestern University, Evanston, IL 60208
- Vanessa Hernandez
- Department of Neurobiology, Northwestern University, Evanston, IL 60208
- Zhichun Lin
- Department of Neurobiology, Northwestern University, Evanston, IL 60208
- Mary Kasparian
- Department of Neurobiology, Northwestern University, Evanston, IL 60208
- Kevin White
- Department of Biochemistry and Precision Medicine Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597
- Ravi Allada
- Department of Neurobiology, Northwestern University, Evanston, IL 60208
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
- Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06520, USA
- LaDeana Hillier
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Susan E. Celniker
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
- Valerie Reinke
- Department of Genetics, Yale University, New Haven, Connecticut 06520
- Robert H. Waterston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
14
Moyers BA, Partridge EC, Mackiewicz M, Betti MJ, Darji R, Meadows SK, Newberry KM, Brandsmeier LA, Wold BJ, Mendenhall EM, Myers RM. Characterization of human transcription factor function and patterns of gene regulation in HepG2 cells. Genome Res 2023; 33:1879-1892. [PMID: 37852782 PMCID: PMC10760452 DOI: 10.1101/gr.278205.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 10/13/2023] [Indexed: 10/20/2023]
Abstract
Transcription factors (TFs) are trans-acting proteins that bind cis-regulatory elements (CREs) in DNA to control gene expression. Here, we analyzed the genomic localization profiles of 529 sequence-specific TFs and 151 cofactors and chromatin regulators in the human cancer cell line HepG2, for a total of 680 broadly termed DNA-associated proteins (DAPs). We used this deep collection to model each TF's impact on gene expression, and identified a cohort of 26 candidate transcriptional repressors. We examine high occupancy target (HOT) sites in the context of three-dimensional genome organization and show biased motif placement in distal-promoter connections involving HOT sites. We also found a substantial number of closed chromatin regions with multiple DAPs bound, and explored their properties, finding that a MAFF/MAFK TF pair correlates with transcriptional repression. Altogether, these analyses provide novel insights into the regulatory logic of the human cell line HepG2 genome and show the usefulness of large genomic analyses for elucidation of individual TF functions.
Affiliation(s)
- Belle A Moyers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Mark Mackiewicz
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Michael J Betti
- Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
- Roshan Darji
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Sarah K Meadows
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Barbara J Wold
- Merkin Institute for Translational Research, California Institute of Technology, Pasadena, California 91125, USA
- Eric M Mendenhall
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
15
Cascianelli S, Ceddia G, Marchesi A, Masseroli M. Identification of transcription factor high accumulation DNA zones. BMC Bioinformatics 2023; 24:395. [PMID: 37864168 PMCID: PMC10590011 DOI: 10.1186/s12859-023-05528-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 10/10/2023] [Indexed: 10/22/2023]
Abstract
BACKGROUND Transcription factors (TF) play a crucial role in the regulation of gene transcription; alterations of their activity and binding to DNA areas are strongly involved in cancer and other disease onset and development. For proper biomedical investigation, it is hence essential to correctly trace TF dense DNA areas, having multiple bindings of distinct factors, and select DNA high occupancy target (HOT) zones, showing the highest accumulation of such bindings. Indeed, systematic and replicable analysis of HOT zones in a large variety of cells and tissues would allow further understanding of their characteristics and could clarify their functional role. RESULTS Here, we propose, thoroughly explain and discuss a full computational procedure to study in-depth DNA dense areas of transcription factor accumulation and identify HOT zones. This methodology, developed as a computationally efficient parametric algorithm implemented in an R/Bioconductor package, uses a systematic approach with two alternative methods to examine transcription factor bindings and provide comparative and fully-reproducible assessments. It offers different resolutions by introducing three distinct types of accumulation, which can analyze DNA from single-base to region-oriented levels, and a moving window, which can estimate the influence of the neighborhood for each DNA base under exam. CONCLUSIONS We quantitatively assessed the full procedure by using our implemented software package, named TFHAZ, in two example applications of biological interest, proving its full reliability and relevance.
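As a rough illustration of a TF-based accumulation with a moving window, the Python sketch below counts, for each base, how many distinct TFs have a binding interval that intersects a window of half-width `w` around that base. It then flags bases at or above a threshold as candidate HOT positions. This is not the TFHAZ API (TFHAZ is an R/Bioconductor package). The function names, the half-open interval convention, and the fixed-threshold rule are assumptions made for the example.

```python
from typing import List, Tuple

# (tf_name, start, end) with half-open coordinates [start, end)
Binding = Tuple[str, int, int]

def tf_accumulation(bindings: List[Binding], chrom_len: int, w: int) -> List[int]:
    """For each base i, count distinct TFs whose binding interval
    intersects the window [i - w, i + w]."""
    acc = []
    for i in range(chrom_len):
        lo, hi = i - w, i + w + 1  # the window as half-open [lo, hi)
        tfs = {tf for tf, s, e in bindings if s < hi and lo < e}
        acc.append(len(tfs))
    return acc

def hot_positions(acc: List[int], threshold: int) -> List[int]:
    """Bases whose accumulation reaches the threshold."""
    return [i for i, a in enumerate(acc) if a >= threshold]

if __name__ == "__main__":
    demo = [("TF1", 2, 4), ("TF2", 3, 6), ("TF1", 8, 9)]
    acc = tf_accumulation(demo, chrom_len=10, w=0)
    print(acc)                    # [0, 0, 1, 2, 1, 1, 0, 0, 1, 0]
    print(hot_positions(acc, 2))  # [3]
```

Setting `w = 0` gives a single-base view, while larger `w` smooths in the neighborhood. The nested scan is O(length × bindings) and is only meant to show the definition. A genome-scale version would use sorted intervals or cumulative sums instead.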
Affiliation(s)
- Silvia Cascianelli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy
- Gaia Ceddia
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Alberto Marchesi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy
- Marco Masseroli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy
16
Zhu I, Landsman D. Clustered and diverse transcription factor binding underlies cell type specificity of enhancers for housekeeping genes. Genome Res 2023; 33:1662-1672. [PMID: 37884340 PMCID: PMC10691539 DOI: 10.1101/gr.278130.123] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2023] [Accepted: 09/12/2023] [Indexed: 10/28/2023]
Abstract
Housekeeping genes are considered to be regulated by common enhancers across different tissues. Here we report that most of the commonly expressed mouse or human genes across different cell types, including more than half of the previously identified housekeeping genes, are associated with cell type-specific enhancers. Furthermore, the binding of most transcription factors (TFs) is cell type-specific. We reason that these cell type specificities are causally related to the collective TF recruitment at regulatory sites, as TFs tend to bind to regions associated with many other TFs and each cell type has a unique repertoire of expressed TFs. Based on binding profiles of hundreds of TFs from HepG2, K562, and GM12878 cells, we show that 80% of all TF peaks overlapping H3K27ac signals are in the top 20,000-23,000 most TF-enriched H3K27ac peak regions, and approximately 12,000-15,000 of these peaks are enhancers (nonpromoters). Those enhancers are mainly cell type-specific and include those linked to the majority of commonly expressed genes. Moreover, we show that the top 15,000 most TF-enriched regulatory sites in HepG2 cells, associated with about 200 TFs, can be predicted largely from the binding profile of as few as 30 TFs. Through motif analysis, we show that major enhancers harbor diverse and clustered motifs from a combination of available TFs uniquely present in each cell type. We propose a mechanism that explains how the highly focused TF binding at regulatory sites results in cell type specificity of enhancers for housekeeping and commonly expressed genes.
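A statistic such as "80% of all TF peaks fall in the top N regions" can be computed by ranking regions by their TF peak counts and walking down the cumulative sum. The Python sketch below assumes the number of TF peaks overlapping each H3K27ac peak region has already been counted. The input dictionary, the function names, and the toy numbers are hypothetical and are not data from the study.

```python
from typing import Dict

def top_k_peak_fraction(peaks_per_region: Dict[str, int], top_k: int) -> float:
    """Fraction of all TF peaks that fall in the top_k most TF-enriched regions."""
    counts = sorted(peaks_per_region.values(), reverse=True)
    total = sum(counts)
    return sum(counts[:top_k]) / total if total else 0.0

def regions_needed(peaks_per_region: Dict[str, int], fraction: float) -> int:
    """Smallest number of top-ranked regions that together hold at least
    `fraction` of all TF peaks."""
    counts = sorted(peaks_per_region.values(), reverse=True)
    total = sum(counts)
    running = 0
    for k, c in enumerate(counts, start=1):
        running += c
        if running >= fraction * total:
            return k
    return len(counts)

if __name__ == "__main__":
    # Toy counts for illustration only.
    demo = {"r1": 10, "r2": 5, "r3": 3, "r4": 2}
    print(top_k_peak_fraction(demo, 2))  # 0.75
    print(regions_needed(demo, 0.8))     # 3
```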
Affiliation(s)
- Iris Zhu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- David Landsman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
17
Hu Qian S, Shi MW, Wang DY, Fear JM, Chen L, Tu YX, Liu HS, Zhang Y, Zhang SJ, Yu SS, Oliver B, Chen ZX. Integrating massive RNA-seq data to elucidate transcriptome dynamics in Drosophila melanogaster. Brief Bioinform 2023; 24:bbad177. [PMID: 37232385 PMCID: PMC10505420 DOI: 10.1093/bib/bbad177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 04/19/2023] [Accepted: 04/20/2023] [Indexed: 05/27/2023]
Abstract
The volume of ribonucleic acid (RNA)-seq data has increased exponentially, providing numerous new insights into various biological processes. However, due to significant practical challenges, such as data heterogeneity, it is still difficult to ensure the quality of these data when integrated. Although some quality control methods have been developed, sample consistency is rarely considered and these methods are susceptible to artificial factors. Here, we developed MassiveQC, an unsupervised machine learning-based approach, to automatically download and filter large-scale high-throughput data. In addition to the read quality used in other tools, MassiveQC also uses the alignment and expression quality as model features. Meanwhile, it is user-friendly since the cutoff is generated from self-reporting and is applicable to multimodal data. To explore its value, we applied MassiveQC to Drosophila RNA-seq data and generated a comprehensive transcriptome atlas across 28 tissues from embryogenesis to adulthood. We systematically characterized fly gene expression dynamics and found that genes with high expression dynamics were likely to be evolutionarily young and expressed at late developmental stages, exhibiting high nonsynonymous substitution rates and low phenotypic severity, and they were involved in simple regulatory programs. We also discovered that human and Drosophila had strong positive correlations in gene expression in orthologous organs, revealing the great potential of the Drosophila system for studying human development and disease.
Affiliation(s)
- Sheng Hu Qian
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Meng-Wei Shi
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Dan-Yang Wang
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Justin M Fear
- Section of Developmental Genomics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Lu Chen
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Yi-Xuan Tu
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Hong-Shan Liu
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Yuan Zhang
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Shuai-Jie Zhang
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Shan-Shan Yu
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Brian Oliver
- Section of Developmental Genomics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Zhen-Xia Chen
- Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, China
- Section of Developmental Genomics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen 518000, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China
18
Petrie MV, He Y, Gan Y, Ostrow AZ, Aparicio OM. Broadly Applicable Control Approaches Improve Accuracy of ChIP-Seq Data. Int J Mol Sci 2023; 24:9271. [PMID: 37298223 PMCID: PMC10252487 DOI: 10.3390/ijms24119271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 05/18/2023] [Accepted: 05/23/2023] [Indexed: 06/12/2023]
Abstract
Chromatin ImmunoPrecipitation (ChIP) is a widely used method for the analysis of protein-DNA interactions in vivo; however, ChIP has pitfalls, particularly false-positive signal enrichment that permeates the data. We have developed a new approach to control for non-specific enrichment in ChIP that involves the expression of a non-genome-binding protein targeted in the IP alongside the experimental target protein due to the sharing of epitope tags. ChIP of the protein provides a "sensor" for non-specific enrichment that can be used for the normalization of the experimental data, thereby correcting for non-specific signals and improving data quality as validated against known binding sites for several proteins that we tested, including Fkh1, Orc1, Mcm4, and Sir2. We also tested a DNA-binding mutant approach and showed that, when feasible, ChIP of a site-specific DNA-binding mutant of the target protein is likely an ideal control. These methods vastly improve our ChIP-seq results in S. cerevisiae and should be applicable in other systems.
Affiliation(s)
- Oscar M. Aparicio
- Molecular and Computational Biology Section, University of Southern California, Los Angeles, CA 90089, USA; (M.V.P.); (Y.H.); (Y.G.); (A.Z.O.)
19
Morin A, Chu ECP, Sharma A, Adrian-Hamazaki A, Pavlidis P. Characterizing the targets of transcription regulators by aggregating ChIP-seq and perturbation expression data sets. Genome Res 2023; 33:763-778. [PMID: 37308292 PMCID: PMC10317128 DOI: 10.1101/gr.277273.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 04/26/2023] [Indexed: 06/14/2023]
Abstract
Mapping the gene targets of chromatin-associated transcription regulators (TRs) is a major goal of genomics research. ChIP-seq of TRs and experiments that perturb a TR and measure the differential abundance of gene transcripts are a primary means by which direct relationships are tested on a genomic scale. It has been reported that there is a poor overlap in the evidence across gene regulation strategies, emphasizing the need for integrating results from multiple experiments. Although research consortia interested in gene regulation have produced a valuable trove of high-quality data, there is an even greater volume of TR-specific data throughout the literature. In this study, we show a workflow for the identification, uniform processing, and aggregation of ChIP-seq and TR perturbation experiments for the ultimate purpose of ranking human and mouse TR-target interactions. Focusing on an initial set of eight regulators (ASCL1, HES1, MECP2, MEF2C, NEUROD1, PAX6, RUNX1, and TCF4), we identified 497 experiments suitable for analysis. We used this corpus to examine data concordance, to identify systematic patterns of the two data types, and to identify putative orthologous interactions between human and mouse. We build upon commonly used strategies to forward a procedure for aggregating and combining these two genomic methodologies, assessing these rankings against independent literature-curated evidence. Beyond a framework extensible to other TRs, our work also provides empirically ranked TR-target listings, as well as transparent experiment-level gene summaries for community use.
Affiliation(s)
- Alexander Morin
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Eric Ching-Pan Chu
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Aman Sharma
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Alex Adrian-Hamazaki
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Paul Pavlidis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
20
Vora M, Pyonteck SM, Popovitchenko T, Matlack TL, Prashar A, Kane NS, Favate J, Shah P, Rongo C. The hypoxia response pathway promotes PEP carboxykinase and gluconeogenesis in C. elegans. Nat Commun 2022; 13:6168. [PMID: 36257965 PMCID: PMC9579151 DOI: 10.1038/s41467-022-33849-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/12/2021] [Accepted: 10/05/2022] [Indexed: 12/31/2022]
Abstract
Actively dividing cells, including some cancers, rely on aerobic glycolysis rather than oxidative phosphorylation to generate energy, a phenomenon termed the Warburg effect. Constitutive activation of the Hypoxia Inducible Factor (HIF-1), a transcription factor known for mediating an adaptive response to oxygen deprivation (hypoxia), is a hallmark of the Warburg effect. HIF-1 is thought to promote glycolysis and suppress oxidative phosphorylation. Here, we instead show that HIF-1 can promote gluconeogenesis. Using a multiomics approach, we reveal the genomic, transcriptomic, and metabolomic landscapes regulated by constitutively active HIF-1 in C. elegans. We use RNA-seq and ChIP-seq under aerobic conditions to analyze mutants lacking EGL-9, a key negative regulator of HIF-1. We integrate these approaches to identify over two hundred genes directly and functionally upregulated by HIF-1, including the PEP carboxykinase PCK-1, a rate-limiting mediator of gluconeogenesis. This activation of PCK-1 by HIF-1 promotes survival in response to both oxidative and hypoxic stress. Our work identifies functional direct targets of HIF-1 in vivo, comprehensively describing the metabolome induced by HIF-1 activation in an organism.
Affiliation(s)
- Mehul Vora
- The Waksman Institute, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA
- Stephanie M Pyonteck
- The Waksman Institute, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA
- Tatiana Popovitchenko
- The Waksman Institute, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA
- Tarmie L Matlack
- The Waksman Institute, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA
- Aparna Prashar
- The Department of Genetics, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA
- Nanci S Kane
- The Waksman Institute, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA
- John Favate
- The Department of Genetics, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA
- Premal Shah
- The Department of Genetics, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA
- Christopher Rongo
- The Waksman Institute, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA; The Department of Genetics, Rutgers The State University of New Jersey, Piscataway, NJ, 08854, USA.
21
Liu J, Zhou D. Minimum Functional Length Analysis of K-Mer Based on BPNN. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2920-2925. [PMID: 34310316 DOI: 10.1109/tcbb.2021.3098512]
Abstract
A BP neural network (BPNN), as a multilayer feed-forward network, can achieve deep cognition of target data and high accuracy in its output. However, no research had yet applied BPNN to k-mers. In the present study, BPNN was used to train and test binary classification data for each classification mode. All k-mers were divided into two categories, either according to their X + Y content or in a completely random mode. The results showed that 1) for the X + Y content classification mode, k-mer classification accuracy was 100 percent whether k ≤ 6 or k ≥ 7; and 2) for the completely random classification mode, classification accuracy was 100 percent for k-mers with k ≤ 6, but for k-mers with k ≥ 7 it was below 100 percent and gradually decreased toward 50 percent as k increased. The k-mers with k ≥ 7 should be the basic functional fragments of nucleic acids, performing basic nucleic acid functions in DNA sequences, whereas the k-mers with k ≤ 6 should be the basic component fragments of nucleic acids and no longer perform such functions.
22
Abstract
The nematode Caenorhabditis elegans has shed light on many aspects of eukaryotic biology, including genetics, development, cell biology, and genomics. A major factor in the success of C. elegans as a model organism has been the availability, since the late 1990s, of an essentially gap-free and well-annotated nuclear genome sequence, divided among 6 chromosomes. In this review, we discuss the structure, function, and biology of C. elegans chromosomes and then provide a general perspective on chromosome biology in other diverse nematode species. We highlight malleable chromosome features including centromeres, telomeres, and repetitive elements, as well as the remarkable process of programmed DNA elimination (historically described as chromatin diminution) that induces loss of portions of the genome in somatic cells of a handful of nematode species. An exciting future prospect is that nematode species may enable experimental approaches to study chromosome features and to test models of chromosome evolution. In the long term, fundamental insights regarding how speciation is integrated with chromosome biology may be revealed.
Affiliation(s)
- Peter M Carlton
- Graduate School of Biostudies, Kyoto University, Kyoto 606-8501, Japan
- Richard E Davis
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO 80045, USA; RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Shawn Ahmed
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
23
Salbert G, Sérandour AA, Staels B, Lefebvre P, Eeckhoute J. The conundrum of the functional relationship between transcription factors and chromatin. Epigenomics 2022; 14:223-225. [PMID: 35034474 DOI: 10.2217/epi-2021-0509]
Affiliation(s)
- Gilles Salbert
- Université de Rennes 1, UMR6290 CNRS, Institut de Génétique et Développement de Rennes, Campus de Beaulieu, 35042, Rennes Cedex, France
- Bart Staels
- Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, F-59000, Lille, France
- Philippe Lefebvre
- Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, F-59000, Lille, France
- Jérôme Eeckhoute
- Univ. Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1011-EGID, F-59000, Lille, France
24
White SM, Snyder MP, Yi C. Master lineage transcription factors anchor trans mega transcriptional complexes at highly accessible enhancer sites to promote long-range chromatin clustering and transcription of distal target genes. Nucleic Acids Res 2021; 49:12196-12210. [PMID: 34850122 PMCID: PMC8643643 DOI: 10.1093/nar/gkab1105] [Received: 09/03/2021] [Revised: 10/09/2021] [Accepted: 11/15/2021]
Abstract
The term 'super enhancers' (SEs) has been widely used to describe stretches of closely localized enhancers that are occupied collectively by large numbers of transcription factors (TFs) and co-factors and that control the transcription of highly expressed genes. Through integrated analysis of >600 DNase-seq, ChIP-seq, GRO-seq, STARR-seq, RNA-seq, Hi-C and ChIA-PET datasets in five human cancer cell lines, we identified a new class of autonomous SEs (aSEs) that are excluded from classic SE calls by the widely used Rank Ordering of Super-Enhancers (ROSE) method. TF footprint analysis revealed that, compared with classic SEs and regular enhancers, aSEs are tightly bound by a dense array of master lineage TFs, which serve as anchors to recruit additional TFs and co-factors in trans. In addition, aSEs are preferentially enriched for cohesins, which are likely involved in stabilizing long-distance interactions between aSEs and their distal target genes. Finally, we showed that aSEs can be reliably predicted using a single DNase-seq dataset, alone or combined with Mediator and/or P300 ChIP-seq. Overall, our study demonstrates that aSEs represent a unique class of functionally important enhancer elements that distally regulate the transcription of highly expressed genes.
Affiliation(s)
- Shannon M White
- Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Chunling Yi
- Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
25
Li M, Liu J, Zhou J, Liu A, Chen E, Yang Q. DNA adduct formation and reduced EIF4A3 expression contributes to benzo[a]pyrene-induced DNA damage in human bronchial epithelial BEAS-2B cells. Toxicol Lett 2021; 351:53-64. [PMID: 34454013 DOI: 10.1016/j.toxlet.2021.08.010] [Received: 08/17/2020] [Revised: 07/26/2021] [Accepted: 08/23/2021]
Abstract
Benzo[a]pyrene (B[a]P) is a known human carcinogen. The ability of B[a]P to form stable DNA adducts has been repeatedly demonstrated. However, the relationship between DNA adduct formation and cell damage, and its underlying molecular mechanisms, are less well understood. In this study, we determined the cytotoxicity of benzo[a]pyrene diol epoxide (BPDE), a metabolite of B[a]P, in human bronchial epithelial cells (BEAS-2B). The formation of BPDE-DNA adducts was quantified using a dot blot. DNA damage resulting from BPDE-DNA adduct formation was detected by chromatin immunoprecipitation sequencing (ChIP-Seq), with minor modifications, using specific antibodies against BPDE. In total, 1846 differentially expressed gene loci were detected between the treatment and control groups. The distribution of the BPDE-bound regions indicated that BPDE could covalently bind both coding and non-coding regions to cause DNA damage; however, the majority of binding occurred at protein-coding genes. Furthermore, among the BPDE-bound genes, we found 16 protein-coding genes related to DNA damage repair. We explored the response to BPDE exposure at the transcriptional level using qRT-PCR and observed a strong inhibition of EIF4A3. We then established an EIF4A3-overexpressing cell model and performed comet assays, which revealed that DNA damage levels in EIF4A3-overexpressing cells were lower than those in normal cells following BPDE exposure. This suggests that the BPDE-DNA adduct-induced reduction in EIF4A3 expression contributed to the DNA damage induced by BPDE exposure in BEAS-2B cells. These novel findings indicate that ChIP-Seq combined with a BPDE-specific antibody may be used to explore the mechanisms underlying DNA adduct-induced genomic damage.
Affiliation(s)
- Mengcheng Li
- The Institute for Chemical Carcinogenesis, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China
- Jiayu Liu
- The Institute for Chemical Carcinogenesis, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China
- Jiazhen Zhou
- The Institute for Chemical Carcinogenesis, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China
- Anfei Liu
- The Institute for Chemical Carcinogenesis, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China
- Enzhao Chen
- The Institute for Chemical Carcinogenesis, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China
- Qiaoyuan Yang
- The Institute for Chemical Carcinogenesis, Guangzhou Medical University, Xinzao, Panyu District, Guangzhou, 511436, China; The State Key Lab of Respiratory Disease, The First Affiliated Hospital of Guangzhou Medical University, No. 151 Yanjiang Road, Yuexiu District, Guangzhou, 510120, China.
26
Blanco E, González-Ramírez M, Di Croce L. Productive visualization of high-throughput sequencing data using the SeqCode open portable platform. Sci Rep 2021; 11:19545. [PMID: 34599234 PMCID: PMC8486768 DOI: 10.1038/s41598-021-98889-7] [Received: 03/14/2021] [Accepted: 08/20/2021]
Abstract
Large-scale sequencing techniques for charting genomes are now well established. Stable computational methods for primary tasks such as quality control, read mapping, peak calling, and counting are likewise available. However, uniform standards for graphical data mining, which is also of central importance, are lacking. To fill this gap, we developed SeqCode, an open suite of applications that analyzes sequencing data in an elegant and efficient manner. Our software is a portable resource written in ANSI C that can be expected to work for almost all genomes in any computational configuration. Furthermore, we offer a user-friendly front-end web server that integrates SeqCode functions with other graphical analysis tools. Our analysis and visualization toolkit represents a significant improvement in performance and usability compared with other existing programs. Thus, SeqCode has the potential to become a key multipurpose instrument for high-throughput professional analysis; further, it provides an extremely useful open educational platform for the worldwide scientific community. The SeqCode website is hosted at http://ldicrocelab.crg.eu, and the source code is freely distributed at https://github.com/eblancoga/seqcode.
Affiliation(s)
- Enrique Blanco
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003, Barcelona, Spain.
- Mar González-Ramírez
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003, Barcelona, Spain
- Luciano Di Croce
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; ICREA, Passeig Lluis Companys 23, 08010, Barcelona, Spain.
27
Novakovsky G, Saraswat M, Fornes O, Mostafavi S, Wasserman WW. Biologically relevant transfer learning improves transcription factor binding prediction. Genome Biol 2021; 22:280. [PMID: 34579793 PMCID: PMC8474956 DOI: 10.1186/s13059-021-02499-5] [Received: 12/21/2020] [Accepted: 09/15/2021]
Abstract
BACKGROUND Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task. RESULTS We assess a transfer learning strategy for TF binding prediction consisting of a pre-training step, wherein we train a multi-task model with multiple TFs, and a fine-tuning step, wherein we initialize single-task models for individual TFs with the weights learned by the multi-task model, after which the single-task models are trained at a lower learning rate. We corroborate that transfer learning improves model performance, especially if in the pre-training step the multi-task model is trained with biologically relevant TFs. We show the effectiveness of transfer learning for TFs with ~500 ChIP-seq peak regions. Using model interpretation techniques, we demonstrate that the features learned in the pre-training step are refined in the fine-tuning step to resemble the binding motif of the target TF (i.e., the recipient of transfer learning in the fine-tuning step). Moreover, pre-training with biologically relevant TFs allows single-task models in the fine-tuning step to learn useful features other than the motif of the target TF. CONCLUSIONS Our results confirm that transfer learning is a powerful technique for TF binding prediction.
Affiliation(s)
- Gherman Novakovsky
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Manu Saraswat
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada.
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada.
- Sara Mostafavi
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Canadian Institute for Advanced Research, CIFAR AI Chair, and Child and Brain Development, Toronto, ON, M5G 1M1, Canada
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada.
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada.
28
The Genome-Wide Binding Profile for Human RE1 Silencing Transcription Factor Unveils a Unique Genetic Circuitry in Hippocampus. J Neurosci 2021; 41:6582-6595. [PMID: 34210779 DOI: 10.1523/jneurosci.2059-20.2021] [Received: 08/05/2020] [Revised: 05/12/2021] [Accepted: 06/16/2021]
Abstract
Early studies in mouse neurodevelopment led to the discovery of the RE1 Silencing Transcription Factor (REST) and its role as a master repressor of neuronal gene expression. Recently, REST was reported to also repress neuronal genes in the human adult brain. These genes were found to be involved in pro-apoptotic pathways, and their repression, associated with increased REST levels during aging, was found to be neuroprotective and conserved across species. However, direct genome-wide binding profiles for REST in adult brain have not been identified for any species. Here, we apply genome-wide REST binding analysis to mouse and human hippocampus. We find an expansion of REST binding sites in the human hippocampus that are lacking in both mouse hippocampus and other human non-neuronal cell types. The unique human REST binding sites are associated with genes involved in innate immunity processes and inflammation signaling; histology and recent public transcriptomic analyses suggest that these new target genes are repressed in glia. We propose that the increases in REST expression in mid-adulthood presage the beginning of brain aging, and that human REST function has evolved to protect the longevity and function of both neurons and glia in human brain. SIGNIFICANCE STATEMENT The RE1 Silencing Transcription Factor (REST) repressor has served historically as a model for gene regulation during mouse neurogenesis. Recent studies of REST have also suggested a conserved role for REST repressor function across lower species during aging. However, direct genome-wide studies for REST have been lacking for human brain. Here, we perform the first genome-wide analysis of REST binding in both human and mouse hippocampus. The majority of REST-occupied genes in human hippocampus are distinct from those in mouse. Further, the REST-associated genes unique to human hippocampus represent a new set related to innate immunity and inflammation, whose dysregulation has been implicated in aging-related neuropathology, such as Alzheimer's disease.
29
Spiegel J, Cuesta SM, Adhikari S, Hänsel-Hertsch R, Tannahill D, Balasubramanian S. G-quadruplexes are transcription factor binding hubs in human chromatin. Genome Biol 2021; 22:117. [PMID: 33892767 PMCID: PMC8063395 DOI: 10.1186/s13059-021-02324-z] [Received: 09/30/2020] [Accepted: 03/24/2021]
Abstract
BACKGROUND The binding of transcription factors (TFs) to genomic targets is critical in the regulation of gene expression. Short, double-stranded DNA sequence motifs are routinely implicated in TF recruitment, but many questions remain about how binding site specificity is governed. RESULTS Herein, we reveal a previously unappreciated role for DNA secondary structures as key features for TF recruitment. In a systematic, genome-wide study, we discover that endogenous G-quadruplex secondary structures (G4s) are prevalent TF binding sites in human chromatin. Certain TFs bind G4s with affinities comparable to double-stranded DNA targets. We demonstrate that, in a chromatin context, this binding interaction is competed out with a small molecule. Notably, endogenous G4s are prominent binding sites for a large number of TFs, particularly at promoters of highly expressed genes. CONCLUSIONS Our results reveal a novel non-canonical mechanism for TF binding whereby G4s operate as common binding hubs for many different TFs to promote increased transcription.
Affiliation(s)
- Jochen Spiegel
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Sergio Martínez Cuesta
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Present Address: Data Sciences and Quantitative Biology, Discovery Sciences, AstraZeneca, Cambridge, UK
- Santosh Adhikari
- Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Robert Hänsel-Hertsch
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Present Address: Center for Molecular Medicine Cologne, University of Cologne, 50931, Cologne, Germany
- David Tannahill
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Shankar Balasubramanian
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
- Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.
- School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK.
30
Nakato R, Sakata T. Methods for ChIP-seq analysis: A practical workflow and advanced applications. Methods 2021; 187:44-53. [PMID: 32240773 DOI: 10.1016/j.ymeth.2020.03.005] [Received: 02/18/2020] [Revised: 03/17/2020] [Accepted: 03/18/2020]
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a central method in epigenomic research. Genome-wide analysis of histone modifications, such as enhancer analysis and genome-wide chromatin state annotation, enables systematic analysis of how the epigenomic landscape contributes to cell identity, development, lineage specification, and disease. In this review, we first present a typical ChIP-seq analysis workflow, from quality assessment to chromatin-state annotation. We focus on practical, rather than theoretical, approaches for biological studies. Next, we outline various advanced ChIP-seq applications and introduce several state-of-the-art methods, including prediction of gene expression level and chromatin loops from epigenome data and data imputation. Finally, we discuss recently developed single-cell ChIP-seq analysis methodologies that elucidate the cellular diversity within complex tissues and cancers.
Affiliation(s)
- Ryuichiro Nakato
- Laboratory of Computational Genomics, Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.
- Toyonori Sakata
- Laboratory of Genome Structure and Function, Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.
31
Zhou M, Li H, Wang X, Guan Y. Evidence of widespread, independent sequence signature for transcription factor cobinding. Genome Res 2021; 31:265-278. [PMID: 33303494 PMCID: PMC7849410 DOI: 10.1101/gr.267310.120] [Received: 06/12/2020] [Accepted: 12/03/2020]
Abstract
Transcription factors (TFs) are the vocabulary that genomes use to regulate gene expression and phenotypes. The interactions among TFs enrich this vocabulary and orchestrate diverse biological processes. Although simple models identify open chromatin and the presence of TF motifs as the two major contributors to TF binding patterns, what shapes the in vivo TF cobinding landscape remains elusive. In this study, we developed a machine learning algorithm to explore the contributors to cobinding patterns. The algorithm substantially outperforms state-of-the-field models for TF cobinding prediction. Game theory-based feature importance analysis reveals that, for most of the TF pairs we studied, independent motif sequences contribute to the cobinding patterns of one or both of the TFs under investigation. Such independent motif sequences include, but are not limited to, motifs of transcription initiation-related proteins and known TF complexes. We found that the motif sequence signatures and the TFs are rarely mutual, corroborating a hierarchical and directional organization of the regulatory network and refuting the possibility of artifacts caused by shared sequence similarity with the TFs under investigation. We modeled this regulatory language with directed graphs, which reveal shared, global factors that are related to many binding and cobinding patterns.
Affiliation(s)
- Manqi Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xueqing Wang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
32
Srivastava D, Aydin B, Mazzoni EO, Mahony S. An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced transcription factor binding. Genome Biol 2021; 22:20. [PMID: 33413545 PMCID: PMC7788824 DOI: 10.1186/s13059-020-02218-6] [Received: 06/18/2019] [Accepted: 12/03/2020]
Abstract
BACKGROUND Transcription factor (TF) binding specificity is determined via a complex interplay between the transcription factor's DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with transcription factor binding in a given cell type have been well characterized. For instance, the binding sites for a majority of transcription factors display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the transcription factor itself and thus provide limited insight into how genome-wide TF-DNA binding patterns became established in the first place. To understand the determinants of transcription factor binding specificity, we therefore need to examine how newly activated transcription factors interact with sequence and preexisting chromatin landscapes. RESULTS Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of transcription factors that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced transcription factors. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding; incorporating preexisting chromatin features improves our ability to explain the binding specificity of some transcription factors substantially, but not others. Furthermore, by analyzing site-level predictors, we show that transcription factor binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences. CONCLUSIONS Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.
Affiliation(s)
- Divyanshi Srivastava
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Begüm Aydin
- Department of Biology, New York University, New York, NY, USA
- Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA, USA.
33
Massa AT, Mousel MR, Herndon MK, Herndon DR, Murdoch BM, White SN. Genome-Wide Histone Modifications and CTCF Enrichment Predict Gene Expression in Sheep Macrophages. Front Genet 2021; 11:612031. [PMID: 33488675 PMCID: PMC7817998 DOI: 10.3389/fgene.2020.612031] [Received: 09/30/2020] [Accepted: 11/30/2020]
Abstract
Alveolar macrophages function in innate and adaptive immunity, wound healing, and homeostasis in the lungs, dependent on tissue-specific gene expression under epigenetic regulation. The functional diversity of tissue-resident macrophages, despite their common myeloid lineage, highlights the need to study tissue-specific regulatory elements that control gene expression. Increasing evidence supports the hypothesis that subtle genetic changes alter the sheep macrophage response to important production pathogens and zoonoses, for example, viruses like small ruminant lentiviruses and bacteria like Coxiella burnetii. Annotation of transcriptional regulatory elements will aid researchers in identifying genetic mutations of immunological consequence. Here we report the first genome-wide survey of regulatory elements in any sheep immune cell, utilizing alveolar macrophages. We assayed histone modifications and CTCF enrichment by chromatin immunoprecipitation with deep sequencing (ChIP-seq) in two sheep to determine cis-regulatory DNA elements and chromatin domain boundaries that control immunity-related gene expression. Histone modifications included H3K4me3 (denoting active promoters), H3K27ac (active enhancers), H3K4me1 (primed and distal enhancers), and H3K27me3 (broad silencers). In total, we identified 248,674 reproducible regulatory elements, which allowed assignment of putative biological function in macrophages to 12% of the sheep genome. Data exceeded the FAANG and ENCODE standards of 20 million and 45 million usable fragments for narrow and broad marks, respectively. Active elements showed consensus with RNA-seq data and were predictive of gene expression in alveolar macrophages from the publicly available Sheep Gene Expression Atlas. Silencer elements were not enriched for expressed genes, but rather for repressed developmental genes. CTCF enrichment enabled identification of 11,000 chromatin domains with a mean size of 258 kb. To our knowledge, this is the first report to use immunoprecipitated CTCF to determine putative topological domains in sheep immune cells. Furthermore, these data will empower phenotype-associated mutation discovery, since most causal variants are within regulatory elements.
Affiliation(s)
- Alisha T Massa
- Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
- Michelle R Mousel
- Animal Disease Research Unit, Agricultural Research Service, United States Department of Agriculture, Pullman, WA, United States; Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States
- Maria K Herndon
- Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
- David R Herndon
- Animal Disease Research Unit, Agricultural Research Service, United States Department of Agriculture, Pullman, WA, United States
- Brenda M Murdoch
- Department of Animal and Veterinary Science, University of Idaho, Moscow, ID, United States; Center for Reproductive Biology, Washington State University, Pullman, WA, United States
- Stephen N White
- Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States; Animal Disease Research Unit, Agricultural Research Service, United States Department of Agriculture, Pullman, WA, United States; Center for Reproductive Biology, Washington State University, Pullman, WA, United States
34
Luo KL, Underwood RS, Greenwald I. Positive autoregulation of lag-1 in response to LIN-12 activation in cell fate decisions during C. elegans reproductive system development. Development 2020; 147:dev.193482. [PMID: 32839181 DOI: 10.1242/dev.193482] [Received: 06/03/2020] [Accepted: 08/11/2020]
Abstract
During animal development, ligand binding triggers proteolytic cleavage of LIN-12/Notch, releasing its intracellular domain to translocate to the nucleus, where it associates with the DNA-binding protein LAG-1/CSL to activate target gene transcription. We investigated the spatiotemporal regulation of LAG-1/CSL expression in Caenorhabditis elegans and observed that an increase in endogenous LAG-1 levels correlates with LIN-12/Notch activation in different cell contexts during reproductive system development. By creating a synthetic endogenous operon, we show that this increase occurs via transcriptional upregulation, and we identify an enhancer region that contains multiple LAG-1 binding sites (LBSs) embedded in a more extensively conserved high occupancy target (HOT) region. We show that these LBSs are necessary for upregulation in response to LIN-12/Notch activity, indicating that lag-1 engages in direct positive autoregulation. Deletion of the HOT region from endogenous lag-1 reduced LAG-1 levels and abrogated positive autoregulation, but did not cause the hallmark cell fate transformations associated with loss of lin-12/Notch or lag-1 activity. Instead, later somatic reproductive system defects suggest that proper transcriptional regulation of lag-1 confers robustness to somatic reproductive system development.
Affiliation(s)
- Katherine Leisan Luo
- Integrated Program in Cellular, Molecular and Biophysical Studies, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA
- Ryan S Underwood
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Iva Greenwald
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
35
Andreani T, Albrecht S, Fontaine JF, Andrade-Navarro MA. Computational identification of cell-specific variable regions in ChIP-seq data. Nucleic Acids Res 2020; 48:e53. [PMID: 32187374 PMCID: PMC7229859 DOI: 10.1093/nar/gkaa180] [Received: 10/01/2019] [Revised: 02/04/2020] [Accepted: 03/10/2020]
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates are usually interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate ChIP-seq variability, we developed a reproducibility score and a method that identifies cell-specific variable regions in ChIP-seq data by integrating replicated ChIP-seq experiments for multiple protein targets in a particular cell type. Using our method, we found variable regions in the human cell lines K562, GM12878, HepG2 and MCF-7 and in mouse embryonic stem cells (mESCs). These variable-occupancy target regions (VOTs) are rich in CG dinucleotides and show enrichment at promoters and R-loops. They overlap significantly with HOT regions, but are not blacklisted regions that produce non-specific ChIP-seq peaks. Furthermore, in mESCs, VOTs are conserved among placental species, suggesting that they could have a function important for this taxon. Our method can help point to such regions along the genome in a given cell type of interest, improving downstream interpretative analysis before follow-up experiments.
Affiliation(s)
- Tommaso Andreani
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany; Institute of Molecular Biology (IMB), 55128 Mainz, Germany
- Steffen Albrecht
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Jean-Fred Fontaine
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
36
Abstract
Several decades elapsed between the first descriptions of G-quadruplex nucleic acid structures (G4s) assembled in vitro and the emergence of experimental findings indicating that such structures can form and function in living systems. A large body of evidence now supports roles for G4s in many aspects of nucleic acid biology, spanning transcription and chromatin structure, mRNA processing, protein translation, DNA replication and genome stability, and telomere and mitochondrial function. Nonetheless, it must be acknowledged that some of this evidence is tentative, which is not surprising given the technical challenges associated with demonstrating G4s in biology. Here I provide an overview of evidence for G4 biology, focusing particularly on the many potential pitfalls that can be encountered in its investigation, and briefly discuss some of the broader biological processes that may be impacted by G4s, including cancer.
Affiliation(s)
- F. Brad Johnson
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, United States
37
Osmala M, Lähdesmäki H. Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns. BMC Bioinformatics 2020; 21:317. [PMID: 32689977 PMCID: PMC7370432 DOI: 10.1186/s12859-020-03621-3] [Received: 10/10/2019] [Accepted: 06/19/2020]
Abstract
Background: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data have been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, current methods predict different numbers and different sets of enhancers for the same cell type and do not make efficient use of the pattern of the ChIP-seq coverage profiles.
Results: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity between genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are used to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated against transcriptional regulatory protein binding sites and compared with the predictions obtained by state-of-the-art methods.
Conclusion: PREPRINT performs favorably compared with state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not used for training, and often performs better than the previous methods. PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.
Affiliation(s)
- Maria Osmala
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02150, Finland
- Harri Lähdesmäki
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02150, Finland
38
Dissecting the regulatory activity and sequence content of loci with exceptional numbers of transcription factor associations. Genome Res 2020; 30:939-950. [PMID: 32616518 PMCID: PMC7397867 DOI: 10.1101/gr.260463.119] [Received: 12/26/2019] [Accepted: 06/24/2020]
Abstract
DNA-associated proteins (DAPs) classically regulate gene expression by binding to regulatory loci such as enhancers or promoters. As expanding catalogs of genome-wide DAP binding maps reveal thousands of loci that, unlike the majority of conventional enhancers and promoters, associate with dozens of different DAPs with apparently little regard for motif preference, an understanding of DAP association and coordination at such regulatory loci is essential to deciphering how these regions contribute to normal development and disease. In this study, we aggregated publicly available ChIP-seq data from 469 human DAPs assayed in three cell lines and integrated these data with an orthogonal data set of 352 nonredundant, in vitro–derived motifs mapped to the genome within DNase I hypersensitivity footprints to characterize regions with high numbers of DAP associations. We establish a generalizable definition for high occupancy target (HOT) loci and identify putative driver DAP motifs in HepG2 cells, including HNF4A, SP1, SP5, and ETV4, that are highly prevalent and show sequence conservation at HOT loci. The number of different DAPs associated with an element is positively associated with evidence of regulatory activity, and by systematically mutating 245 HOT loci with a massively parallel mutagenesis assay, we localized regulatory activity to a central core region that depends on the motif sequences of our previously nominated driver DAPs. In sum, this work leverages the increasingly large number of DAP motif and ChIP-seq data publicly available to explore how DAP associations contribute to genome-wide transcriptional regulation.
39
Partridge EC, Chhetri SB, Prokop JW, Ramaker RC, Jansen CS, Goh ST, Mackiewicz M, Newberry KM, Brandsmeier LA, Meadows SK, Messer CL, Hardigan AA, Coppola CJ, Dean EC, Jiang S, Savic D, Mortazavi A, Wold BJ, Myers RM, Mendenhall EM. Occupancy maps of 208 chromatin-associated proteins in one human cell type. Nature 2020; 583:720-728. [PMID: 32728244 PMCID: PMC7398277 DOI: 10.1038/s41586-020-2023-4] [Received: 10/04/2017] [Accepted: 01/09/2020]
Abstract
Transcription factors are DNA-binding proteins that have key roles in gene regulation [1,2]. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes [3-6]. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP-seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium.
Affiliation(s)
- Surya B Chhetri
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Department of Biological Sciences, The University of Alabama in Huntsville, Huntsville, AL, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jeremy W Prokop
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, Grand Rapids, MI, USA
- Ryne C Ramaker
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, USA
- Camden S Jansen
- Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA, USA
- Say-Tar Goh
- Division of Biology, California Institute of Technology, Pasadena, CA, USA
- Mark Mackiewicz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Sarah K Meadows
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- C Luke Messer
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Andrew A Hardigan
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, USA
- Candice J Coppola
- Department of Biological Sciences, The University of Alabama in Huntsville, Huntsville, AL, USA
- Emma C Dean
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA
- Shan Jiang
- Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA, USA
- Daniel Savic
- Pharmaceutical Sciences Department, St Jude Children's Research Hospital, Memphis, TN, USA
- Ali Mortazavi
- Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA, USA
- Barbara J Wold
- Division of Biology, California Institute of Technology, Pasadena, CA, USA
- Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Eric M Mendenhall
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Department of Biological Sciences, The University of Alabama in Huntsville, Huntsville, AL, USA
40
Fosslie M, Manaf A, Lerdrup M, Hansen K, Gilfillan GD, Dahl JA. Going low to reach high: Small-scale ChIP-seq maps new terrain. Wiley Interdiscip Rev Syst Biol Med 2019; 12:e1465. [PMID: 31478357 DOI: 10.1002/wsbm.1465] [Received: 04/15/2019] [Revised: 07/02/2019] [Accepted: 07/25/2019]
Abstract
Chromatin immunoprecipitation (ChIP) enables mapping of specific histone modifications or chromatin-associated factors in the genome and represents a powerful tool in the study of chromatin and genome regulation. Recent technological advances that couple ChIP with whole-genome high-throughput sequencing (ChIP-seq) now allow chromatin factors to be mapped throughout the genome. However, the requirement for large amounts of ChIP-seq input material has long made it challenging to assess chromatin profiles of cell types that are available only in limited numbers. For many cell types, it is not feasible to collect large numbers as homogeneous cell populations in vivo, yet working with pure cell populations is an advantage for reaching robust biological conclusions. Here, we review (a) how ChIP protocols have been scaled down for use with as few as a few hundred cells; (b) what to consider when preparing small-scale ChIP-seq experiments and analyzing the data; and (c) the potential of small-scale ChIP-seq datasets for elucidating chromatin dynamics in various biological systems, with examples such as oocyte maturation and preimplantation embryo development. This article is categorized under: Laboratory Methods and Technologies > Genetic/Genomic Methods; Developmental Biology > Developmental Processes in Health and Disease; Biological Mechanisms > Cell Fates.
Affiliation(s)
- Adeel Manaf
- Department of Microbiology, Oslo University Hospital, Oslo, Norway
- Mads Lerdrup
- The Biotech Research and Innovation Centre, University of Copenhagen, Copenhagen, Denmark; Centre for Epigenetics, University of Copenhagen, Copenhagen, Denmark
- Klaus Hansen
- The Biotech Research and Innovation Centre, University of Copenhagen, Copenhagen, Denmark; Centre for Epigenetics, University of Copenhagen, Copenhagen, Denmark
- Gregor D Gilfillan
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- John Arne Dahl
- Department of Microbiology, Oslo University Hospital, Oslo, Norway