1
Martin V, Zhuang F, Zhang Y, Pinheiro K, Gordân R. High-throughput data and modeling reveal insights into the mechanisms of cooperative DNA-binding by transcription factor proteins. Nucleic Acids Res 2023; 51:11600-11612. [PMID: 37889068 PMCID: PMC10681739 DOI: 10.1093/nar/gkad872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 09/21/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023] Open
Abstract
Cooperative DNA-binding by transcription factor (TF) proteins is critical for eukaryotic gene regulation. In the human genome, many regulatory regions contain TF-binding sites in close proximity to each other, which can facilitate cooperative interactions. However, binding site proximity does not necessarily imply cooperative binding, as TFs can also bind independently to each of their neighboring target sites. Currently, the rules that drive cooperative TF binding are not well understood. In addition, it is oftentimes difficult to infer direct TF-TF cooperativity from existing DNA-binding data. Here, we show that in vitro binding assays using DNA libraries of a few thousand genomic sequences with putative cooperative TF-binding events can be used to develop accurate models of cooperativity and to gain insights into cooperative binding mechanisms. Using factors ETS1 and RUNX1 as our case study, we show that the distance and orientation between ETS1 sites are critical determinants of cooperative ETS1-ETS1 binding, while cooperative ETS1-RUNX1 interactions show more flexibility in distance and orientation and can be accurately predicted based on the affinity and sequence/shape features of the binding sites. The approach described here, combining custom experimental design with machine-learning modeling, can be easily applied to study the cooperative DNA-binding patterns of any TFs.
Affiliation(s)
- Vincentius Martin
  - Department of Computer Science, Durham, NC 27708, USA
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Farica Zhuang
  - Department of Computer Science, Durham, NC 27708, USA
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Yuning Zhang
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
  - Program in Computational Biology & Bioinformatics, Durham, NC 27708, USA
- Kyle Pinheiro
  - Department of Computer Science, Durham, NC 27708, USA
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Raluca Gordân
  - Department of Computer Science, Durham, NC 27708, USA
  - Center for Genomic & Computational Biology, Durham, NC 27708, USA
  - Department of Biostatistics & Bioinformatics, Department of Molecular Genetics and Microbiology, Department of Cell Biology, Duke University, Durham, NC 27708, USA
2
Hill SL, Rogan PK, Wang YX, Knoll JHM. Differentially accessible, single copy sequences form contiguous domains along metaphase chromosomes that are conserved among multiple tissues. Mol Cytogenet 2021; 14:49. [PMID: 34670606 PMCID: PMC8527651 DOI: 10.1186/s13039-021-00567-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2021] [Accepted: 09/08/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND: During mitosis, chromatin engages in a dynamic cycle of condensation and decondensation. Condensation into distinct units to ensure high fidelity segregation is followed by rapid and reproducible decondensation to produce functional daughter cells. Factors contributing to the reproducibility of chromatin structure between cell generations are not well understood. We investigated local metaphase chromosome condensation along mitotic chromosomes within genomic intervals showing differential accessibility (DA) between homologs. DA was originally identified using short sequence-defined single copy (sc) DNA probes of < 5 kb in length by fluorescence in situ hybridization (scFISH) in peripheral lymphocytes. These structural differences between metaphase homologs are non-random, stable, and heritable epigenetic marks which have led to the proposed function of DA as a marker of chromatin memory. Here, we characterize the organization of DA intervals into chromosomal domains by identifying multiple DA loci in close proximity to each other and examine the conservation of DA between tissues.

RESULTS: We evaluated multiple adjacent scFISH probes at 6 different DA loci from chromosomal regions 2p23, 3p24, 12p12, 15q22, 15q24 and 20q13 within peripheral blood T-lymphocytes. DA was organized within domains that extend beyond the defined boundaries of individual scFISH probes. Based on hybridizations of 2 to 4 scFISH probes per domain, domains ranged in length from 16.0 kb to 129.6 kb. Transcriptionally inert chromosomal DA regions in T-lymphocytes also demonstrated conservation of DA in bone marrow and fibroblast cells.

CONCLUSIONS: We identified novel chromosomal regions with allelic differences in metaphase chromosome accessibility and demonstrated that these accessibility differences appear to be aggregated into contiguous domains extending beyond individual scFISH probes. These domains are encompassed by previously established topologically associated domain (TAD) boundaries. DA appears to be a conserved feature of human metaphase chromosomes across different stages of lymphocyte differentiation and germ cell origin, consistent with its proposed role in maintenance of intergenerational cellular chromosome memory.
Affiliation(s)
- Seana L Hill
  - Department of Pathology & Laboratory Medicine, Schulich School of Medicine & Dentistry, University of Western Ontario, London, Canada
- Peter K Rogan
  - Departments of Biochemistry and Oncology, Schulich School of Medicine & Dentistry, University of Western Ontario, London, Canada
  - Cytognomix Inc., London, ON, Canada
- Yi Xuan Wang
  - Department of Pathology & Laboratory Medicine, Schulich School of Medicine & Dentistry, University of Western Ontario, London, Canada
- Joan H M Knoll
  - Department of Pathology & Laboratory Medicine, Schulich School of Medicine & Dentistry, University of Western Ontario, London, Canada
  - Cytognomix Inc., London, ON, Canada
3
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
  - School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
4
Floc'hlay S, Wong ES, Zhao B, Viales RR, Thomas-Chollier M, Thieffry D, Garfield DA, Furlong EEM. Cis-acting variation is common across regulatory layers but is often buffered during embryonic development. Genome Res 2021; 31:211-224. [PMID: 33310749 PMCID: PMC7849415 DOI: 10.1101/gr.266338.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/21/2020] [Accepted: 12/09/2020] [Indexed: 12/14/2022]
Abstract
Precise patterns of gene expression are driven by interactions between transcription factors, regulatory DNA sequences, and chromatin. How DNA mutations affecting any one of these regulatory "layers" are buffered or propagated to gene expression remains unclear. To address this, we quantified allele-specific changes in chromatin accessibility, histone modifications, and gene expression in F1 embryos generated from eight Drosophila crosses at three embryonic stages, yielding a comprehensive data set of 240 samples spanning multiple regulatory layers. Genetic variation (allelic imbalance) impacts gene expression more frequently than chromatin features, with metabolic and environmental response genes being most often affected. Allelic imbalance in cis-regulatory elements (enhancers) is common and highly heritable, yet its functional impact does not generally propagate to gene expression. When it does, genetic variation impacts RNA levels through two alternative mechanisms involving either H3K4me3 or chromatin accessibility and H3K27ac. Changes in RNA are more predictive of variation in H3K4me3 than vice versa, suggesting a role for H3K4me3 downstream from transcription. The impact of a substantial proportion of genetic variation is consistent across embryonic stages, with 50% of allelic imbalanced features at one stage being also imbalanced at subsequent developmental stages. Crucially, buffering, as well as the magnitude and evolutionary impact of genetic variants, is influenced by regulatory complexity (i.e., number of enhancers regulating a gene), with transcription factors being most robust to cis-acting, but most influenced by trans-acting, variation.
Affiliation(s)
- Swann Floc'hlay
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Emily S Wong
  - Molecular, Structural and Computational Biology Division, Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales 2010, Australia
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, New South Wales 2052, Australia
- Bingqing Zhao
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Rebecca R Viales
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Morgane Thomas-Chollier
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
  - Institut Universitaire de France (IUF), 75005 Paris, France
- Denis Thieffry
  - Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- David A Garfield
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Eileen E M Furlong
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany