1
Povolotskii M, Yehezkehely M, Ram O, Lukatsky D. Bimodal specificity of TF-DNA recognition in embryonic stem cells. Nucleic Acids Res 2025; 53:gkaf333. [PMID: 40287827 PMCID: PMC12034040 DOI: 10.1093/nar/gkaf333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 03/11/2025] [Accepted: 04/17/2025] [Indexed: 04/29/2025] Open
Abstract
Transcription factors (TFs) bind genomic DNA regulating gene expression and developmental programs in embryonic stem cells (ESCs). Even though comprehensive genome-wide molecular maps for TF-DNA binding are experimentally available for key pluripotency-associated TFs, the understanding of molecular design principles responsible for TF-DNA recognition remains incomplete. Here, we show that binding preferences of key pluripotency TFs, such as Pou5f1 (Oct4), Smad1, Otx2, Srf, and Nanog, exhibit bimodality in the local GC-content distribution. Sequence-dependent binding specificity of these TFs is distributed across three major contributions. First, local GC-content is dominant in high-GC-content regions. Second, recognition of specific k-mers is predominant in low-GC-content regions. Third, short tandem repeats (STRs) are highly predictive in both low- and high-GC-content regions. In sharp contrast, the binding preferences of c-Myc are exclusively dominated by local GC-content and STRs in high-GC-content genomic regions. We demonstrate that the transition in the TF-DNA binding landscape upon ESC differentiation is regulated by the concentration of c-Myc, which forms a bivalent c-Myc-Max heterotetramer upon promoter binding, competing with key pluripotency factors such as Smad1. Finally, a direct interaction between c-Myc and key pluripotency factors is not required to achieve this transition.
Affiliation(s)
- Michael Povolotskii
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, 8410501, Israel
- Maor Yehezkehely
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, 8410501, Israel
- Oren Ram
  - Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- David B Lukatsky
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, 8410501, Israel
2
Schaepe JM, Fries T, Doughty BR, Crocker OJ, Hinks MM, Marklund E, Greenleaf WJ. Thermodynamic principles link in vitro transcription factor affinities to single-molecule chromatin states in cells. bioRxiv 2025:2025.01.27.635162. [PMID: 39975040 PMCID: PMC11838358 DOI: 10.1101/2025.01.27.635162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
The molecular details governing transcription factor (TF) binding and the formation of accessible chromatin are not yet quantitatively understood - including how sequence context modulates affinity, how TFs search DNA, the kinetics of TF occupancy, and how motif grammars coordinate binding. To resolve these questions for a human TF, erythroid Krüppel-like factor (eKLF/KLF1), we quantitatively compare, in high throughput, in vitro TF binding rates and affinities with in vivo single molecule TF and nucleosome occupancies across engineered DNA sequences. We find that 40-fold flanking sequence effects on affinity are consistent with distal flanks tuning TF search parameters and captured by a linear energy model. Motif recognition probability, rather than time in the bound state, drives affinity changes, and in vitro and in nuclei measurements exhibit consistent, minutes-long TF residence times. Finally, pairing in vitro biophysical parameters with thermodynamic models accurately predicts in vivo single-molecule chromatin states for unseen motif grammars.
Affiliation(s)
- Julia M Schaepe
  - Bioengineering Department, Stanford University, Stanford, CA 94305, USA
- Torbjörn Fries
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Olivia J Crocker
  - Genetics Department, Stanford University, Stanford, CA 94305, USA
- Michaela M Hinks
  - Bioengineering Department, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- William J Greenleaf
  - Genetics Department, Stanford University, Stanford, CA 94305, USA
  - Department of Applied Physics, Stanford University, Stanford, CA 94205, USA
3
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 78] [Impact Index Per Article: 39.0] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
Affiliation(s)
- Connor A Horton
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Amr M Alexandari
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael G B Hayes
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Arjun K Aditham
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Nilay Shah
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Peter H Suzuki
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ariel Afek
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Raluca Gordân
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Julia Zeitlinger
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
  - The University of Kansas Medical Center, Kansas City, KS 66103, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Polly M Fordyce
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94110, USA
4
Samee MAH. Noncanonical binding of transcription factors: time to revisit specificity? Mol Biol Cell 2023; 34:pe4. [PMID: 37486893 PMCID: PMC10398899 DOI: 10.1091/mbc.e22-08-0325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/26/2023] Open
Abstract
Transcription factors (TFs) are one of the most studied classes of DNA-binding proteins that have a direct functional impact on gene transcription and thus, on human physiology and disease. The mechanisms that TFs use for recognizing target DNA binding sites have been studied for nearly five decades, yet they remain poorly understood. It is classically assumed that a TF recognizes a specific sequence pattern, or motif, as its binding sites. However, recent studies are consistently finding examples of noncanonical binding, that is, TFs binding at sites that do not resemble their sequence motifs. Here we review the current literature on four major types of noncanonical TF binding, namely binding based on DNA shape readout, at Guanine-quadruplex structures, at repeat sequences, and bispecific binding. These examples point to a critical need for studies to unify our current observations, many of which are at odds with the "one TF, one motif" view, into a more comprehensive definition of the DNA-binding specificity of TFs.
5
Kalakoti Y, Clarancia Peter S, Gawande S, Sundar D. Modulation of DNA-protein interactions by proximal genetic elements as uncovered by interpretable deep learning. J Mol Biol 2023; 435:168121. [PMID: 37100167 DOI: 10.1016/j.jmb.2023.168121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 04/14/2023] [Accepted: 04/19/2023] [Indexed: 04/28/2023]
Abstract
Transcription factors (TF) recognize specific motifs in the genome that are typically 6-12 bp long to regulate various aspects of the cellular machinery. Presence of binding motifs and favorable genome accessibility are key drivers for a consistent TF-DNA interaction. Although these pre-requisites may occur thousands of times in the genome, there seems to be a high degree of selectivity for the sites that are actually bound. Here, we present a deep-learning framework that identifies and characterizes the upstream and downstream genetic elements to the binding motif, for their role in enforcing the mentioned selectivity. The proposed framework is based on an interpretable recurrent neural network architecture that enables for the relative analysis of sequence context features. We apply the framework to model twenty-six transcription factors and score the TF-DNA binding at a base-pair resolution. We find significant differences in activations of DNA context features for bound and unbound sequences. In addition to standardized evaluation protocols, we offer outstanding interpretability that enables us to identify and annotate DNA sequence with possible elements that modulate TF-DNA binding. Also, differences in data processing have a huge influence on the overall model performance. Overall, the proposed framework allows for novel insights on the non-coding genetic elements and their role in facilitating a stable TF-DNA interaction.
Affiliation(s)
- Yogesh Kalakoti
  - Department of Biochemical Engineering & Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi - 110016, India.
- Swaraj Gawande
  - Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Delhi - 110016, India.
- Durai Sundar
  - Department of Biochemical Engineering & Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi - 110016, India; Yardi School of Artificial Intelligence, Indian Institute of Technology (IIT) Delhi, New Delhi - 110016, India.
6
Mellul M, Lahav S, Imashimizu M, Tokunaga Y, Lukatsky DB, Ram O. Repetitive DNA symmetry elements negatively regulate gene expression in embryonic stem cells. Biophys J 2022; 121:3126-3135. [PMID: 35810331 PMCID: PMC9463640 DOI: 10.1016/j.bpj.2022.07.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 06/13/2022] [Accepted: 07/07/2022] [Indexed: 11/30/2022] Open
Abstract
Transcription factor (TF) binding to genomic DNA elements constitutes one of the key mechanisms that regulates gene expression program in cells. Both consensus and nonconsensus DNA sequence elements influence the recognition specificity of TFs. Based on the analysis of experimentally determined c-Myc binding preferences to genomic DNA, here we statistically predict that certain repetitive, nonconsensus DNA symmetry elements can relatively reduce TF-DNA binding preferences. This is in contrast to a different set of repetitive, nonconsensus symmetry elements that can increase the strength of TF-DNA binding. Using c-Myc enhancer reporter system containing consensus motif flanked by nonconsensus sequences in embryonic stem cells, we directly demonstrate that the enrichment in such negatively regulating repetitive symmetry elements is sufficient to reduce the gene expression level compared with native genomic sequences. Negatively regulating repetitive symmetry elements around consensus c-Myc motif and DNA sequences containing consensus c-Myc motif flanked by entirely randomized sequences show similar expression baseline. A possible explanation for this observation is that rather than complete repression, negatively regulating repetitive symmetry elements play a regulatory role in fine-tuning the reduction of gene expression, most probably by binding TFs other than c-Myc.
Affiliation(s)
- Meir Mellul
  - Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Jerusalem, Israel
- Shlomtzion Lahav
  - Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Jerusalem, Israel
- Masahiko Imashimizu
  - Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
- Yuji Tokunaga
  - Graduate School of Pharmaceutical Sciences, the University of Tokyo, Tokyo, Japan
- David B Lukatsky
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
- Oren Ram
  - Department of Biological Chemistry, The Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Jerusalem, Israel.
7
Morrow A, Hughes J, Singh J, Joseph A, Yosef N. Epitome: predicting epigenetic events in novel cell types with multi-cell deep ensemble learning. Nucleic Acids Res 2021; 49:e110. [PMID: 34379786 PMCID: PMC8565335 DOI: 10.1093/nar/gkab676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2021] [Revised: 07/19/2021] [Accepted: 07/25/2021] [Indexed: 01/04/2023] Open
Abstract
The accumulation of large epigenomics data consortiums provides us with the opportunity to extrapolate existing knowledge to new cell types and conditions. We propose Epitome, a deep neural network that learns similarities of chromatin accessibility between well characterized reference cell types and a query cellular context, and copies over signal of transcription factor binding and modification of histones from reference cell types when chromatin profiles are similar to the query. Epitome achieves state-of-the-art accuracy when predicting transcription factor binding sites on novel cellular contexts and can further improve predictions as more epigenetic signals are collected from both reference cell types and the query cellular context of interest.
Affiliation(s)
- Alyssa Kramer Morrow
  - Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
- John Weston Hughes
  - Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
  - Computer Science Department, Stanford University, 353 Serra Mall, Stanford, CA 94305, USA
- Jahnavi Singh
  - Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
- Anthony Douglas Joseph
  - Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
  - Center for Computational Biology, University of California-Berkeley 108 Stanley Hall, Berkeley, CA 94720-3220, USA
  - Unite Genomics, Inc., 1301 Marina Village Pkwy, Suite 320, Alameda, CA 94501, USA
- Nir Yosef
  - Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
  - Center for Computational Biology, University of California-Berkeley 108 Stanley Hall, Berkeley, CA 94720-3220, USA
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University, Boston, MA, 02139, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
8
Lukatsky DB. Understanding the Robustness of Protein Diffusion on DNA and Microtubules. Biophys J 2020; 118:2870-2871. [PMID: 32470323 DOI: 10.1016/j.bpj.2020.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/30/2020] [Accepted: 05/07/2020] [Indexed: 11/19/2022] Open
Affiliation(s)
- David B Lukatsky
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
9
Teif VB. Soft Power of Nonconsensus Protein-DNA Binding. Biophys J 2020; 118:1797-1798. [PMID: 32187530 DOI: 10.1016/j.bpj.2020.02.026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/25/2020] [Accepted: 02/26/2020] [Indexed: 11/19/2022] Open
Affiliation(s)
- Vladimir B Teif
  - School of Life Sciences, University of Essex, Colchester, United Kingdom.