1
de Martin X, Oliva B, Santpere G. Recruitment of homodimeric proneural factors by conserved CAT-CAT E-boxes drives major epigenetic reconfiguration in cortical neurogenesis. Nucleic Acids Res 2024; 52:12895-12917. [PMID: 39494521 PMCID: PMC11602148 DOI: 10.1093/nar/gkae950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 10/03/2024] [Accepted: 10/09/2024] [Indexed: 11/05/2024]
Abstract
Proneural factors of the basic helix-loop-helix family coordinate neurogenesis and neurodifferentiation. Among them, NEUROG2 and NEUROD2 subsequently act to specify neurons of the glutamatergic lineage. Disruption of these factors, their target genes and binding DNA motifs has been linked to various neuropsychiatric disorders. Proneural factors bind to specific DNA motifs called E-boxes (hexanucleotides of the form CANNTG, composed of two CAN half sites on opposed strands). While corticogenesis heavily relies on E-box activity, the collaboration of proneural factors on different E-box types and their chromatin remodeling mechanisms remain largely unknown. Here, we conducted a comprehensive analysis using chromatin immunoprecipitation followed by sequencing (ChIP-seq) data for NEUROG2 and NEUROD2, along with time-matched single-cell RNA-seq, ATAC-seq and DNA methylation data from the developing mouse cortex. Our findings show that these factors are highly enriched in transiently active genomic regions during intermediate stages of neuronal differentiation. Although they primarily bind CAG-containing E-boxes, their binding in dynamic regions is notably enriched in CAT-CAT E-boxes (i.e. CATATG, denoted as 5'3' half sites for dimers), which undergo significant DNA demethylation and exhibit the highest levels of evolutionary constraint. Aided by HT-SELEX data reanalysis, structural modeling and DNA footprinting, we propose that these proneural factors exert maximal chromatin remodeling influence during intermediate stages of neurogenesis by binding as homodimers to CAT-CAT motifs. This study provides an in-depth integrative analysis of the dynamic regulation of E-boxes during neuronal development, enhancing our understanding of the mechanisms underlying the binding specificity of critical proneural factors.
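The E-box notation used in this abstract can be illustrated with a minimal Python sketch. It assumes that each half site is read 5'→3' on its own strand, which is consistent with the abstract's statement that CATATG is a CAT-CAT E-box. The function names (`classify_ebox`, `find_eboxes`) are hypothetical and are not taken from the paper.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_ebox(hexamer):
    """Name a CANNTG E-box by its two CAN half sites.

    Assumption: each half site is read 5'->3' on its own strand, so the
    second half site is the first three bases of the reverse complement.
    The two names are sorted so that both strands give the same label.
    """
    hexamer = hexamer.upper()
    if not re.fullmatch(r"CA[ACGT]{2}TG", hexamer):
        raise ValueError(f"not a CANNTG E-box: {hexamer}")
    first_half = hexamer[:3]
    second_half = reverse_complement(hexamer)[:3]
    return "-".join(sorted((first_half, second_half)))


def find_eboxes(seq):
    """Return (position, hexamer, label) for every E-box in seq.

    A lookahead is used so that overlapping E-boxes are also reported.
    """
    seq = seq.upper()
    return [
        (m.start(), m.group(1), classify_ebox(m.group(1)))
        for m in re.finditer(r"(?=(CA[ACGT]{2}TG))", seq)
    ]


if __name__ == "__main__":
    for hit in find_eboxes("GGCATATGAACAGCTGTT"):
        print(hit)
```

Sorting the two half-site names is a design choice of this sketch: it makes the label independent of which strand the hexamer was read from, so CAGATG and CATCTG both map to CAG-CAT.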
Affiliation(s)
- Xabier de Martin
  - Neurogenomics Group, Hospital del Mar Research Institute, Parc de Recerca Biomèdica de Barcelona (PRBB), Dr. Aiguader, 88, Barcelona 08003, Catalonia, Spain
- Baldomero Oliva
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Dr. Aiguader, 88, Barcelona 08003, Catalonia, Spain
- Gabriel Santpere
  - Neurogenomics Group, Hospital del Mar Research Institute, Parc de Recerca Biomèdica de Barcelona (PRBB), Dr. Aiguader, 88, Barcelona 08003, Catalonia, Spain
  - Department of Neuroscience, Yale School of Medicine, 333 Cedar St., New Haven, CT 06510, USA
2
Mitra R, Li J, Sagendorf JM, Jiang Y, Cohen AS, Chiu TP, Glasscock CJ, Rohs R. Geometric deep learning of protein-DNA binding specificity. Nat Methods 2024; 21:1674-1683. [PMID: 39103447 PMCID: PMC11399107 DOI: 10.1038/s41592-024-02372-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 06/14/2024] [Indexed: 08/07/2024]
Abstract
Predicting protein-DNA binding specificity is a challenging yet essential task for understanding gene regulation. Protein-DNA complexes usually exhibit binding to a selected DNA target site, whereas a protein binds, with varying degrees of binding specificity, to a wide range of DNA sequences. This information is not directly accessible in a single structure. Here, to access this information, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity from protein-DNA structure. DeepPBS can be applied to experimental or predicted structures. Interpretable protein heavy atom importance scores for interface residues can be extracted. When aggregated at the protein residue level, these scores are validated through mutagenesis experiments. Applied to designed proteins targeting specific DNA sequences, DeepPBS was demonstrated to predict experimentally measured binding specificity. DeepPBS offers a foundation for machine-aided studies that advance our understanding of molecular interactions and guide experimental designs and synthetic biology.
Affiliation(s)
- Raktim Mitra
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Jinsen Li
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Jared M Sagendorf
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Yibei Jiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Ari S Cohen
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Tsu-Pei Chiu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Cameron J Glasscock
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA, USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA
3
Oriol F, Alberto M, Joachim AP, Patrick G, M BP, Ruben MF, Jaume B, Altair CH, Ferran P, Oriol G, Narcis FF, Baldo O. Structure-based learning to predict and model protein-DNA interactions and transcription-factor co-operativity in cis-regulatory elements. NAR Genom Bioinform 2024; 6:lqae068. [PMID: 38867914 PMCID: PMC11167492 DOI: 10.1093/nargab/lqae068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/18/2024] [Accepted: 05/23/2024] [Indexed: 06/14/2024]
Abstract
Transcription factor (TF) binding is a key component of genomic regulation. There are numerous high-throughput experimental methods to characterize TF-DNA binding specificities. Their application, however, is both laborious and expensive, which makes profiling all TFs challenging. For instance, the binding preferences of ∼25% human TFs remain unknown; they neither have been determined experimentally nor inferred computationally. We introduce a structure-based learning approach to predict the binding preferences of TFs and the automated modelling of TF regulatory complexes. We show the advantage of using our approach over the classical nearest-neighbor prediction in the limits of remote homology. Starting from a TF sequence or structure, we predict binding preferences in the form of motifs that are then used to scan a DNA sequence for occurrences. The best matches are either profiled with a binding score or collected for their subsequent modeling into a higher-order regulatory complex with DNA. Co-operativity is modelled by: (i) the co-localization of TFs and (ii) the structural modeling of protein-protein interactions between TFs and with co-factors. We have applied our approach to automatically model the interferon-β enhanceosome and the pioneering complexes of OCT4, SOX2 (or SOX11) and KLF4 with a nucleosome, which are compared with the experimentally known structures.
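The step described above, in which predicted motifs are used to scan a DNA sequence for occurrences and the best matches are scored, can be sketched in Python as follows. This is a generic log-odds position weight matrix scan, not the authors' structure-based method. The function names, the toy count matrix in the usage example, the pseudocount and the uniform background are all assumptions made for illustration.

```python
import math


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores.

    counts is a list of dicts (one per motif position) mapping a base to
    its count. A uniform background and a flat pseudocount are assumed.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return matrix


def scan(seq, matrix):
    """Score every window of seq that contains only A, C, G and T."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= set("ACGT"):
            score = sum(col[base] for col, base in zip(matrix, window))
            hits.append((start, window, score))
    return hits


def best_hit(seq, matrix):
    """Return the highest-scoring (start, window, score) tuple."""
    return max(scan(seq, matrix), key=lambda hit: hit[2])


if __name__ == "__main__":
    # Hypothetical toy motif that strongly prefers TATA.
    toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
    pwm = log_odds_matrix(toy_counts)
    print(best_hit("GGTATACC", pwm))
```

Only the forward strand is scanned here; a fuller scan would also score the reverse complement of each window.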
Affiliation(s)
- Fornes Oriol
  - Centre for Molecular Medicine and Therapeutics. BC Children's Hospital Research Institute. Department of Medical Genetics. University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Meseguer Alberto
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Gohl Patrick
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Bota Patricia M
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Molina-Fernández Ruben
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Bonet Jaume
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
  - Laboratory of Protein Design & Immunoengineering. School of Engineering. Ecole Polytechnique Federale de Lausanne. Lausanne 1015, Vaud, Switzerland
- Chinchilla-Hernandez Altair
  - Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Pegenaute Ferran
  - Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Gallego Oriol
  - Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Fernandez-Fuentes Narcis
  - Institute of Biological, Environmental and Rural Science. Aberystwyth University, SY23 3DA Aberystwyth, UK
- Oliva Baldo
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
4
Mitra R, Li J, Sagendorf JM, Jiang Y, Chiu TP, Rohs R. DeepPBS: Geometric deep learning for interpretable prediction of protein-DNA binding specificity. bioRxiv 2023:2023.12.15.571942. [PMID: 38293168 PMCID: PMC10827229 DOI: 10.1101/2023.12.15.571942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Predicting specificity in protein-DNA interactions is a challenging yet essential task for understanding gene regulation. Here, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity across protein families based on protein-DNA structures. The DeepPBS architecture allows investigation of different family-specific recognition patterns. DeepPBS can be applied to predicted structures, and can aid in the modeling of protein-DNA complexes. DeepPBS is interpretable and can be used to calculate protein heavy atom-level importance scores, demonstrated as a case-study on p53-DNA interface. When aggregated at the protein residue level, these scores conform well with alanine scanning mutagenesis experimental data. The inference time for DeepPBS is sufficiently fast for analyzing simulation trajectories, as demonstrated on a molecular-dynamics simulation of a Drosophila Hox-DNA tertiary complex with its cofactor. DeepPBS and its corresponding data resources offer a foundation for machine-aided protein-DNA interaction studies, guiding experimental choices and complex design, as well as advancing our understanding of molecular interactions.
5
Leong RZL, Lim LH, Chew YL, Teo SS. de novo transcriptome assembly for discovering gene expressed in Holothuria leucospilota with exposed to copper. Anim Biotechnol 2023; 34:4474-4487. [PMID: 36576030 DOI: 10.1080/10495398.2022.2158094] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]
Abstract
Sea cucumber is a bioremediator as it can composite organic matter and excrete inorganic matter. Sea cucumbers have the potential to serve as bioindicators in marine habitats as they provide an integrated insight into the status of their environment over long periods. Sea cucumbers are sensitive to the organic concentration in the marine environment and can effectively provide an early warning system for any organic contamination that can negatively impact the ecosystem. The availability of a reference transcriptome for sea cucumber would constitute an essential tool for identifying genes involved in crucial steps of the defence pathway. De novo assembly of RNA-seq data enables researchers to study transcriptomes without needing a genome sequence. In this study, sea cucumbers fed with Kappaphycus alvarezii powder were treated with a copper concentration of 0.20 mg/L, yielding comprehensive transcriptome data containing 75,149 unigenes with a total length of 20,460,032 bp. A total of 8820 genes were predicted from the unigenes, annotated, and functionally categorized into 25 functional groups, with approximately 20% clustering in signal transduction mechanisms. The reference transcriptome presented and validated in this study is meaningful for identifying a wide range of gene(s) related to the bioindication of sea cucumber in a high copper environment.
Affiliation(s)
- Lai Huat Lim
  - Faculty of Applied Sciences, UCSI University, W. P. Kuala Lumpur, Malaysia
- Yik Ling Chew
  - Faculty of Pharmaceutical Sciences, UCSI University, W. P. Kuala Lumpur, Malaysia
- Swee Sen Teo
  - Faculty of Applied Sciences, UCSI University, W. P. Kuala Lumpur, Malaysia
  - Centre of Research for Advanced Aquaculture (CORAA), UCSI University, Kuala Lumpur, Malaysia
6
Li P, Yu A, Sun R, Liu A. Function and Evolution of C1-2i Subclass of C2H2-Type Zinc Finger Transcription Factors in POPLAR. Genes (Basel) 2022; 13:genes13101843. [PMID: 36292728 PMCID: PMC9602059 DOI: 10.3390/genes13101843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 09/24/2022] [Accepted: 10/11/2022] [Indexed: 11/16/2022]
Abstract
C2H2 zinc finger (C2H2-ZF) transcription factors participate in various aspects of normal plant growth regulation and stress responses. C1-2i C2H2-ZFs are a special subclass of conserved proteins that contain two ZnF-C2H2 domains. Some C1-2i C2H2-ZFs in Arabidopsis (ZAT) are involved in stress resistance and other functions. However, there is limited information on C1-2i C2H2-ZFs in Populus trichocarpa (PtriZATs). To analyze the function and evolution of C1-2i C2H2-ZFs, eleven PtriZATs were identified in P. trichocarpa, which can be classified into two subgroups. The protein structure, conserved ZnF-C2H2 domains and QALGGH motifs, showed high conservation during the evolution of PtriZATs in P. trichocarpa. The spacing between two ZnF-C2H2 domains, chromosomal locations and cis-elements implied the original proteins and function of PtriZATs. Furthermore, the gene expression of different tissues and stress treatment showed the functional differentiation of PtriZATs subgroups and their stress response function. The analysis of C1-2i C2H2-ZFs in different Populus species and plants implied their evolution and differentiation, especially in terms of stress resistance. Cis-elements and expression pattern analysis of interaction proteins implied the function of PtriZATs through binding with stress-related genes, which are involved in gene regulation via epigenetic modification through histone regulation, DNA methylation, ubiquitination, etc. Our results for the origin and evolution of PtriZATs will contribute to understanding the functional differentiation of C1-2i C2H2-ZFs in P. trichocarpa. The interaction and expression results will lay a foundation for the further functional investigation of their roles and biological processes in Populus.
7
Find and cut-and-transfer (FiCAT) mammalian genome engineering. Nat Commun 2021; 12:7071. [PMID: 34862378 PMCID: PMC8642419 DOI: 10.1038/s41467-021-27183-x] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 10/30/2020] [Accepted: 11/09/2021] [Indexed: 12/26/2022]
Abstract
While multiple technologies for small allele genome editing exist, robust technologies for targeted integration of large DNA fragments in mammalian genomes are still missing. Here we develop a gene delivery tool (FiCAT) combining the precision of CRISPR-Cas9 (find module) and the payload transfer efficiency of an engineered piggyBac transposase (cut-and-transfer module). FiCAT combines the functionality of Cas9 DNA scanning and targeting DNA with piggyBac donor DNA processing and transfer capacity. PiggyBac functional domains are engineered, providing increased on-target integration while reducing off-target events. We demonstrate efficient delivery and programmable insertion of small and large payloads in cellulo (human (Hek293T, K-562) and mouse (C2C12)) and in vivo in mouse liver. Finally, we evolve more efficient versions of FiCAT by generating a targeted diversity of 394,000 variants and undergoing 4 rounds of evolution. In this work, we develop a precise and efficient targeted insertion of multi-kilobase DNA fragments in mammalian genomes. Mammalian genome engineering has advanced tremendously over the last decade; however, there is still a need for robust gene writing with size scaling capacity. Here the authors present Find Cut-and-Transfer (FiCAT) technology to deliver large targeted payload insertions in cell lines and in vivo in mouse models.