1
Zhai J, Zhang Y, Zhang C, Yin X, Song M, Tang C, Ding P, Li Z, Ma C. deepTFBS: Improving within- and Cross-Species Prediction of Transcription Factor Binding Using Deep Multi-Task and Transfer Learning. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2025:e03135. [PMID: 40411397 DOI: 10.1002/advs.202503135] [Received: 02/18/2025] [Revised: 04/24/2025] [Indexed: 05/26/2025]
Abstract
The precise prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. This study presents deepTFBS, a comprehensive deep learning (DL) framework that builds a robust DNA language model of TF binding grammar for accurately predicting TFBSs within and across plant species. Taking advantage of multi-task DL and transfer learning, deepTFBS leverages the knowledge learned from large-scale TF binding profiles to enhance the prediction of TFBSs in small-sample training and cross-species prediction tasks. When tested using available information on 359 Arabidopsis TFs, deepTFBS outperformed previously described prediction strategies, including position weight matrix, deepSEA, and DanQ, with improvements of 244.49%, 49.15%, and 23.32% in the area under the precision-recall curve (PRAUC), respectively. Further cross-species prediction of TFBSs in wheat showed that deepTFBS yielded a significant PRAUC improvement of 30.6% over these three baseline models. deepTFBS can also utilize information from gene conservation and binding motifs, enabling efficient TFBS prediction in species where experimental data availability is limited. A case study focusing on the WUSCHEL (WUS) transcription factor illustrated the potential use of deepTFBS in cross-species applications, in this example between Arabidopsis and wheat. deepTFBS is publicly available at https://github.com/cma2015/deepTFBS.
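To make the reported metric concrete, the following is a minimal sketch of how an area under the precision-recall curve and a relative improvement percentage can be computed. It is not taken from deepTFBS: the function names and the toy inputs are hypothetical, and the step-wise "average precision" estimator is only one common convention for this area, assumed here for illustration.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 0/1 ground truth (1 = bound site); scores: predicted scores.
    Ties in scores keep their input order (a simplification of this sketch).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            area += hits / rank  # precision at the rank of each positive
    return area / total_pos


def relative_improvement(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical toy data, not results from the paper.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(toy_labels, toy_scores))
print(relative_improvement(0.5, 0.4))
```

Under this definition, a relative improvement above 100% means the new score is more than double the baseline score.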
Affiliation(s)
- Jingjing Zhai
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Yuzhou Zhang
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Chujun Zhang
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Xiaotong Yin
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Minggui Song
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Chenglong Tang
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Pengjun Ding
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Zenglin Li
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Chuang Ma
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China
2
Wang M, Yuan Y, Zhao Y, Hu Z, Zhang S, Luo J, Jiang CZ, Zhang Y, Sun D. PhWRKY30 activates salicylic acid biosynthesis to positively regulate antiviral defense response in petunia. HORTICULTURE RESEARCH 2025; 12:uhaf013. [PMID: 40190442 PMCID: PMC11966387 DOI: 10.1093/hr/uhaf013] [Received: 10/08/2024] [Accepted: 01/07/2025] [Indexed: 04/09/2025]
Abstract
Petunia (Petunia hybrida) plants are threatened by a diversity of viruses, which cause substantial damage to ornamental quality and seed yield. However, the regulatory mechanism of virus resistance in petunia is largely unknown. Here, we revealed that a member of the petunia WRKY transcription factors, PhWRKY30, was dramatically up-regulated following Tobacco rattle virus (TRV) infection. Down-regulation of PhWRKY30 through TRV-based virus-induced gene silencing increased green fluorescent protein (GFP)-marked TRV RNA accumulation and exacerbated symptom severity. In comparison with wild-type (WT) plants, PhWRKY30-RNAi transgenic petunia plants exhibited compromised resistance to TRV infection, whereas enhanced resistance was observed in PhWRKY30-overexpressing (OE) transgenic plants. PhWRKY30 affected salicylic acid (SA) production and the expression of arogenate dehydratase 1 (PhADT1), phenylalanine ammonia-lyase 1 (PhPAL1), PhPAL2b, nonexpressor of pathogenesis-related proteins 1 (PhNPR1), and PhPR1 in the SA biosynthesis and signaling pathways. SA treatment restored the reduced TRV resistance to WT levels in PhWRKY30-RNAi plants, and application of the SA biosynthesis inhibitor 2-aminoindan-2-phosphonic acid suppressed the enhanced resistance in PhWRKY30-OE plants. Protein-DNA binding assays showed that PhWRKY30 specifically bound to the promoter of PhPAL2b. RNAi silencing and overexpression of PhPAL2b led to decreased and increased TRV resistance, respectively. The transcription of a number of reactive oxygen species- and RNA silencing-associated genes was changed in PhWRKY30 and PhPAL2b transgenic lines. PhWRKY30 and PhPAL2b were further shown to be involved in resistance to Tobacco mosaic virus (TMV) invasion. Our findings demonstrate that PhWRKY30 positively regulates antiviral defense against TRV and TMV infections by modulating SA content.
Affiliation(s)
- Meiling Wang
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yanping Yuan
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yike Zhao
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
- Zhuo Hu
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
- Shasha Zhang
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jianrang Luo
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
- Cai-Zhong Jiang
- Department of Plant Sciences, University of California, Davis, Davis, CA 95616, USA
- Crops Pathology and Genetics Research Unit, USDA-ARS, Davis, CA 95616, USA
- Yanlong Zhang
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
- Daoyang Sun
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
3
Muñoz V, Goluguri RR, Ghosh C, Tanielian B, Sadqi M. Mechanisms for DNA Interplay in Eukaryotic Transcription Factors. Annu Rev Biophys 2025; 54:121-139. [PMID: 39879549 DOI: 10.1146/annurev-biophys-071524-111008] [Indexed: 01/31/2025]
Abstract
Like their prokaryotic counterparts, eukaryotic transcription factors must recognize specific DNA sites, search for them efficiently, and bind to them to help recruit or block the transcription machinery. For eukaryotic factors, however, the genetic signals are extremely complex and scattered over vast, multichromosome genomes, while the DNA interplay occurs in a varying landscape defined by chromatin remodeling events and epigenetic modifications. Eukaryotic factors are rich in intrinsically disordered regions and are also distinct in their recognition of short DNA motifs and utilization of open DNA interaction interfaces as ways to gain access to DNA on nucleosomes. Recent findings are revealing the profound, unforeseen implications of such characteristics for the mechanisms of DNA interplay. In this review we discuss these implications and how they are shaping the eukaryotic transcription control paradigm into one of promiscuous signal recognition, highly dynamic interactions, heterogeneous DNA scanning, and multiprong conformational control.
Affiliation(s)
- Victor Muñoz
- CREST Center for Cellular and Biomolecular Machines, University of California, Merced, California, USA
- Department of Bioengineering, University of California, Merced, California, USA
- Rama Reddy Goluguri
- CREST Center for Cellular and Biomolecular Machines, University of California, Merced, California, USA
- Department of Bioengineering, University of California, Merced, California, USA
- Department of Biochemistry, Stanford University, Palo Alto, California, USA
- Catherine Ghosh
- CREST Center for Cellular and Biomolecular Machines, University of California, Merced, California, USA
- Department of Bioengineering, University of California, Merced, California, USA
- Benjamin Tanielian
- CREST Center for Cellular and Biomolecular Machines, University of California, Merced, California, USA
- Chemistry and Biochemistry Graduate Program, University of California, Merced, California, USA
- Mourad Sadqi
- CREST Center for Cellular and Biomolecular Machines, University of California, Merced, California, USA
- Department of Bioengineering, University of California, Merced, California, USA
4
Al Masri C, Vilseck JZ, Yu J, Hayes RL. Multisite λ-Dynamics for Protein-DNA Binding Affinity Prediction. J Chem Theory Comput 2025; 21:3536-3544. [PMID: 40123340 PMCID: PMC11983716 DOI: 10.1021/acs.jctc.4c01408] [Received: 10/19/2024] [Revised: 02/24/2025] [Accepted: 03/10/2025] [Indexed: 03/25/2025]
Abstract
Transcription factors (TFs) regulate gene expression by binding to specific DNA sequences, playing critical roles in cellular processes and disease pathways. Computational methods, particularly λ-Dynamics, offer a promising approach for predicting TF relative binding affinities. This study evaluates the effectiveness of different λ-Dynamics perturbation schemes in determining binding free energy changes (ΔΔGb) of the WRKY transcription factor upon mutating its W-box binding site (GGTCAA) to a nonspecific sequence (GATAAA). Among the schemes tested, the single λ per base pair protocol demonstrated the fastest convergence and highest precision. Extending this protocol to additional mutants (GGTCCG and GGACAA) yielded ΔΔGb values that successfully ranked binding affinities, showcasing its strong potential for high-throughput screening of DNA binding sites.
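As a rough illustration of how relative binding free energies translate into a ranking, the sketch below converts a ΔΔGb value into a fold change in dissociation constant using the standard relation ΔΔGb = RT ln(Kd,mut / Kd,ref) and sorts sites from tightest to weakest binding. The function names, the site labels, and the numerical values are hypothetical placeholders, not results from the study, and the temperature of 298.15 K is an assumption.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def kd_fold_change(ddg_kcal, temperature_k=298.15):
    """Fold change in dissociation constant implied by a binding ddG.

    Uses ddG = RT * ln(Kd_mut / Kd_ref); a positive ddG means weaker binding.
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


def rank_by_affinity(ddg_by_site):
    """Return site labels sorted from tightest to weakest binding."""
    return sorted(ddg_by_site, key=ddg_by_site.get)


# Hypothetical ddG values in kcal/mol relative to a reference site.
example = {"reference": 0.0, "mutant_1": 1.0, "mutant_2": 2.5}
for site in rank_by_affinity(example):
    print(site, round(kd_fold_change(example[site]), 1))
```

Because the ranking depends only on the ordering of the ΔΔGb values, it is unaffected by the assumed temperature; only the fold changes are.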
Affiliation(s)
- Carmen Al Masri
- Department of Physics and Astronomy, University of California, Irvine, California 92697, United States
- Jonah Z. Vilseck
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Jin Yu
- Department of Physics and Astronomy, Department of Chemistry, University of California, Irvine, California 92697, United States
- Ryan L. Hayes
- Department of Chemical and Biomolecular Engineering, Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
5
Mhlanga MM, Fanucchi S, Ozturk M, Divangahi M. Cellular and Molecular Mechanisms of Innate Memory Responses. Annu Rev Immunol 2025; 43:615-640. [PMID: 40279311 DOI: 10.1146/annurev-immunol-101721-035114] [Indexed: 04/27/2025]
Abstract
There has been an increasing effort to understand the memory responses that arise from a complex interplay among innate, adaptive, and structural cells in peripheral organs and bone marrow. Trained immunity is the term coined for the de facto memory of innate immune cells and their progenitors. These cells acquire epigenetic modifications and shift their metabolism, imprinting a signature that sustains a persistent, fast-responsive functional state. Recent studies highlight the contribution of noncoding RNAs and modulation of chromatin structures in establishing this epigenetic readiness for potential immune perturbations. In this review, we discuss recent studies that highlight trained immunity-mediated memory responses emerging intrinsically in innate immune cells and as a complex interplay with other cells at the organ level. Lastly, we survey epigenetic contributors to trained immunity phenotypes, specifically a recently discovered regulatory circuit coordinating the regulation of a key driver of trained immunity.
Affiliation(s)
- Musa M Mhlanga
- Epigenomics & Single Cell Biophysics Group, Department of Cell Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University, Nijmegen, The Netherlands
- Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Mumin Ozturk
- Epigenomics & Single Cell Biophysics Group, Department of Cell Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University, Nijmegen, The Netherlands
- Department of Internal Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Maziar Divangahi
- Departments of Medicine, Pathology, and Microbiology & Immunology, McGill University, Montreal, Quebec, Canada
- McGill University Health Centre, McGill International TB Centre, and Meakins-Christie Laboratories, McGill University, Montreal, Quebec, Canada
6
Vanhaeren T, Troncoso-García ADR, Torres Maldonado JF, Divina F, Martínez-García PM. Application of XAI to the prediction of CTCF binding sites. RESULTS IN ENGINEERING 2025; 25:103776. [DOI: 10.1016/j.rineng.2024.103776] [Indexed: 01/03/2025]
7
Mota LC, Silva EC, Quinde CA, Cieza B, Basu A, Rodrigues LMR, Vila MMDC, Balcão VM. Potential of a newly isolated lytic bacteriophage to control Pseudomonas coronafaciens pv. garcae in coffee plants: Molecular characterization with in vitro and ex vivo experiments. Enzyme Microb Technol 2025; 184:110573. [PMID: 39700746 DOI: 10.1016/j.enzmictec.2024.110573] [Received: 08/18/2024] [Revised: 12/03/2024] [Accepted: 12/13/2024] [Indexed: 12/21/2024]
Abstract
Traditionally, control of coffee plant bacterial halo blight (BHB) caused by the phytopathogen Pseudomonas coronafaciens pv. garcae (Pcg) involves frequent spraying of coffee plantations with copper products, which are not environmentally friendly and may promote bacterial resistance, or with kasugamycin hydrochloride. In this study we report a leap forward in the quest for a new ecofriendly approach, characterizing (both physicochemically and biologically) and testing both in vitro and ex vivo a new lytic phage for Pcg. An in-depth molecular characterization of the phage (genomic and DNA structural features) was also undertaken. Phage PcgS01F belongs to the class Caudoviricetes, family Drexlerviridae, and genus Guelphvirus, and presents a siphovirus-like morphotype. Phage PcgS01F showed a latency period of 40 min and a burst size of 46 PFU/host cell, indicating that it replicates well in Pcg IBSBF-158. At a multiplicity of infection (MOI, the ratio of phage to bacteria) of 1000, phage PcgS01F performed much better than at MOI 10, promoting increasing bacterial reductions until the end of the in vitro inactivation assays and stabilizing at a significant 82% bacterial load reduction. Phage PcgS01F infected and killed Pcg cells ex vivo in artificially contaminated coffee plant leaves, with a maximum Pcg inactivation of 7.66 log CFU/mL at MOI 1000 after 36 h of incubation. This study provides evidence that the isolated phage is a promising candidate against the causative agent of BHB in coffee plants.
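Because the abstract reports reductions both as a percentage and in log units, a short sketch of the arithmetic linking the two may help. The helper names below are hypothetical, and the conversion assumes the usual definition of a log reduction as log10 of the ratio of initial to surviving bacterial load; the phage and bacterial counts in the MOI example are made-up inputs.

```python
import math


def moi(phage_pfu, bacteria_cfu):
    """Multiplicity of infection: ratio of phage particles to bacteria."""
    return phage_pfu / bacteria_cfu


def log_reduction_from_percent(percent):
    """Convert a percent reduction in bacterial load to a log10 reduction."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    return -math.log10(1.0 - percent / 100.0)


def percent_from_log_reduction(logs):
    """Convert a log10 reduction back to a percent reduction."""
    return 100.0 * (1.0 - 10.0 ** (-logs))


print(moi(1e9, 1e6))                      # made-up counts
print(log_reduction_from_percent(82.0))   # the 82% figure in log units
print(percent_from_log_reduction(2.0))
```

On this scale each additional log unit removes a further 90% of the remaining bacteria, which is why large reductions are usually quoted in log units rather than percentages.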
Affiliation(s)
- Luan C Mota
- VBlab - Laboratory of Bacterial Viruses, University of Sorocaba, Sorocaba, SP 18023-000, Brazil.
- Erica C Silva
- VBlab - Laboratory of Bacterial Viruses, University of Sorocaba, Sorocaba, SP 18023-000, Brazil.
- Carlos A Quinde
- Department of Biological Sciences, University of South Carolina, Columbia, SC, USA.
- Basilio Cieza
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins University, Baltimore, MD, USA.
- Aakash Basu
- Department of Biosciences, Durham University, Durham, United Kingdom.
- Lucas M R Rodrigues
- VBlab - Laboratory of Bacterial Viruses, University of Sorocaba, Sorocaba, SP 18023-000, Brazil; Agronomic Institute of Campinas (IAC), Centro de Café Alcides Carvalho, Campinas, SP 13075-630, Brazil.
- Marta M D C Vila
- VBlab - Laboratory of Bacterial Viruses, University of Sorocaba, Sorocaba, SP 18023-000, Brazil.
- Victor M Balcão
- VBlab - Laboratory of Bacterial Viruses, University of Sorocaba, Sorocaba, SP 18023-000, Brazil; Department of Biology and CESAM, University of Aveiro, Campus Universitário de Santiago, Aveiro P-3810-193, Portugal.
8
Wan B, Yu J. Protein target search diffusion-association/dissociation free energy landscape around DNA binding site with flanking sequences. Biophys J 2025; 124:677-692. [PMID: 39818622 PMCID: PMC11900189 DOI: 10.1016/j.bpj.2025.01.005] [Received: 09/14/2024] [Revised: 12/05/2024] [Accepted: 01/13/2025] [Indexed: 01/18/2025]
Abstract
In this work we present a minimal structure-based model of protein diffusional search along local DNA amid protein binding and unbinding events on the DNA, taking into account protein-DNA electrostatic interactions and hydrogen-bonding (HB) interactions or contacts at the interface. We accordingly constructed the protein diffusion-association/dissociation free energy surface and mapped it to 1D as the protein slides along DNA, maintaining the protein-DNA interfacial HB contacts that presumably dictate the detection of DNA sequence information. Upon DNA helical path correction, the protein 1D diffusion rates along local DNA can be physically derived and are consistent with experimental measurements. We also show that the sequence-dependent protein sliding or stepping patterns along DNA are regulated by collective interfacial HB dynamics, which also determine the ruggedness of the protein diffusion free energy landscape on the local DNA. In comparison, protein association or binding with DNA is generically dictated by the protein-DNA electrostatic interactions, with an interaction zone of nanometers around DNA. Extra degrees of freedom (DOFs) of the protein, such as rotations and conformational fluctuations, can be well accommodated within the protein-DNA electrostatic interaction zone. As such, we demonstrate that the protein binding or association free energy profile along DNA is smoother than the 1D diffusion free energy landscape, which leads to population variations of an order of magnitude upon a marginal free energetic smoothing around the specific or consensus sites. We further show that protein unbinding or dissociation from a comparatively high-affinity DNA site is dominated by lateral diffusion to the flanking low-affinity sites.
The results predict that experimental characterizations on the relative protein-DNA binding affinities or population profiling on the DNA are systematically and physically impacted by the extra DOFs of protein motions aside from 1D translation or helical tracking, as well as from flanking DNA sequences due to protein 1D diffusion and nonspecific binding/unbinding.
Affiliation(s)
- Biao Wan
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Jin Yu
- Department of Physics and Astronomy, Department of Chemistry, NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, California.
9
Schroeder JW, Wolfe MB, Freddolino L. ShapeME: A tool and web front-end for de novo discovery of structural motifs underpinning protein-DNA interactions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.28.635290. [PMID: 39975017 PMCID: PMC11838363 DOI: 10.1101/2025.01.28.635290] [Indexed: 02/21/2025]
Abstract
Determining where transcriptional regulators bind within a genome is paramount to understanding how gene expression is regulated. Historically, position weight matrices (PWMs) have been used to define the binding preferences of DNA binding proteins [1]. However, PWMs treat the identity of each base in a sequence as an independent and additive measure of binding preference, which can limit their utility [2]. Models that consider higher order interactions between nearby bases yield greater success in predicting proteins' binding to DNA, but for many proteins there is still substantial room for improvement in predicting and understanding the determinants of proteins' binding to DNA [3]. In addition to DNA sequence motifs, structural motifs (e.g., a narrow minor groove width) are important determinants of binding for some DNA-binding proteins [4]. Despite the initial success of algorithms using structural features of DNA to predict binding properties of proteins from either ChIP-seq or SELEX data [5-8], there remains a need for a de novo structural motif discovery framework which can be applied to data from a variety of experimental designs. Here, we present a unified workflow, capable of utilizing virtually any type of data representing sequence coverage or enrichment (e.g., ChIP-seq, RNA-seq, SELEX, etc.), to discover short structural motifs with explanatory power for a protein's DNA binding preference. We couple the DNAshapeR algorithm [9] with our own information-theoretic approach to de novo motif discovery, and wrap shape and sequence motif inference and model selection into a single tool called ShapeME. Application of our structural motif discovery algorithm to proteins with ChIP-seq data in ENCODE datasets reveals a subset of proteins where short structural motifs outperform the best PWM for that protein as determined from the JASPAR database, or as identified by the sequence motif elicitation tool STREME.
Our approach offers a powerful and versatile framework for inferring structural DNA binding motifs, and will complement current sequence-based motif elicitation tools in discovery of protein-DNA interaction principles. A web-based interface to ShapeME is available at https://seq2fun.dcmb.med.umich.edu/shapeme, with full source code available at https://github.com/freddolino-lab/ShapeME.
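The statement that PWMs treat each base as an independent, additive contribution can be illustrated with a small sketch. This is a generic log-odds PWM built from hypothetical counts, not ShapeME or any tool named in the abstract; the function names, the pseudocount of 1, and the uniform background of 0.25 are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from per-position base counts.

    counts: list of dicts, one per motif position, mapping base -> count.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score_site(pwm, site):
    """Additive score: each position contributes independently."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, offset) of the best-scoring window in `sequence`.

    Assumes the sequence is at least as long as the motif.
    """
    width = len(pwm)
    return max(
        (score_site(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical counts for a three-base motif resembling "TGA".
toy_pwm = pwm_from_counts([{"T": 8}, {"G": 8}, {"A": 8}])
print(best_hit(toy_pwm, "CCTGACC"))
```

Because the score is a plain sum over positions, this kind of model cannot express a preference that depends on two bases jointly, which is the limitation the abstract attributes to PWMs.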
Affiliation(s)
- Jeremy W. Schroeder
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Michael B. Wolfe
- Department of Biochemistry, University of Wisconsin - Madison, Madison, WI 53706, USA
- Lydia Freddolino
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
10
Butt W, Lai B, Chiu TP, Bhattarai M, Qian S, Bishop AR, Duan J, Alexandrov BS, Rohs R, He X. Contribution of DNA breathing to physical interactions with transcription factors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.20.633840. [PMID: 39896490 PMCID: PMC11785057 DOI: 10.1101/2025.01.20.633840] [Indexed: 02/04/2025]
Abstract
Interaction between transcription factors (TFs) and DNA plays a key role in regulating gene expression. It is generally believed that these interactions are controlled through recognition of DNA core motifs by TFs. Nevertheless, several studies have pointed out the limitations of this view; in particular, DNA sequence variants influencing TF binding are often located outside of core motifs. One possible explanation is that the physical properties of DNA may play a role in TF-DNA interactions. Recent studies have supported the importance of DNA shape features, especially in flanking regions of core motifs. Another important physical property of DNA is DNA breathing, the spontaneous opening of double-stranded DNA through thermal motions. However, there have been few genomic studies of the role of DNA breathing in TF-DNA interactions. In this work, we analyzed in vitro TF-DNA binding data of three TFs and found that DNA breathing features inside or near core motifs are correlated with binding affinity. This suggests that these TFs may prefer locally and temporally melted DNA formed through breathing. We extended the analysis to 44 TFs with in vivo ChIP-seq binding data. We found that for a large proportion of TFs, breathing features in or near their core motifs are associated with binding, but the sign and magnitude of these associations vary substantially across TF families. Altogether, our study supports the hypothesis that DNA breathing features near binding motifs contribute to TF-DNA interactions.
Affiliation(s)
- Waqaas Butt
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Ben Lai
- Toyota Technology Institute of Chicago, Chicago, Illinois, United States of America
- Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Manish Bhattarai
- Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Sheng Qian
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Alan R. Bishop
- Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Jubao Duan
- Center for Psychiatric Genetics, NorthShore University HealthSystem Research Institute, Chicago, Illinois, United States of America
- Boian S. Alexandrov
- Theoretical Division, Los Alamos National Lab, Los Alamos, New Mexico, United States of America
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Departments of Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, California, United States of America
- Xin He
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
11
Liu L, Han L, Han K, Zhang Z, Zhang H, Zhang L. Identification of co-localised transcription factors based on paired motifs analysis. IET Syst Biol 2024; 18:238-249. [PMID: 39588827 DOI: 10.1049/syb2.12104] [Received: 09/08/2024] [Revised: 10/02/2024] [Accepted: 10/24/2024] [Indexed: 11/27/2024]
Abstract
The interaction of transcription factors (TFs) with DNA precisely regulates gene transcription. In mammalian cells, thousands of TFs often interact with DNA cis-regulatory elements in a combinatorial manner rather than acting alone. The identification of cooperativity between TFs can help to explore the mechanism of transcriptional regulation. However, little is known about the cooperative patterns of TFs in the genome. To identify which TFs prefer co-localisation, the authors conducted a paired motif analysis in the accessible regions of the human genome based on a Poisson background model. In particular, the authors distinguish cooperatively binding TFs from competitively binding TFs according to the distance between TF motifs. In the K562 cell line, the authors find that TFs from the same family, such as the FOS_JUN family, always compete for the same binding sites, whereas KLF family TFs show significant cooperative binding in adjacent regions. Furthermore, the comparative analysis across 16 human cell lines indicates that most TF combination patterns are conserved, but there are still some cell-line-specific patterns. Finally, in human prostate cancer cells (PC-3) and human prostate normal cells (RWPE-2), the authors investigate the TF combination patterns specific to the disease cells and the normal cells. The results show that the cooperative binding TF pairs shared by PC-3 and RWPE-2 account for over 90%. The authors also identify 26 specific TF combination pairs in PC-3 cancer cells.
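The abstract does not spell out the Poisson background model, so the following is only a generic sketch of how such a test is often set up: the expected number of regions containing both motifs is estimated under independence, and the upper tail of a Poisson distribution gives a p-value for the observed co-occurrence count. The function names, the independence assumption, and the example numbers are all hypothetical and are not taken from the paper.

```python
import math


def expected_cooccurrence(n_regions, frac_a, frac_b):
    """Expected number of regions with both motifs if they occur independently."""
    return n_regions * frac_a * frac_b


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Sketch only: exp(-expected) underflows for very large `expected`,
    so a library routine is preferable for real data.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for k in range(observed):   # accumulate P(X = 0) ... P(X = observed - 1)
        cdf += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - cdf)


# Hypothetical example: 1000 accessible regions, motif A in 10%, motif B in 20%.
lam = expected_cooccurrence(1000, 0.10, 0.20)
print(lam, poisson_upper_tail(40, lam))
```

A small upper-tail probability would suggest that the pair co-occurs more often than the background predicts; a lower-tail test could be used in the same way to look for depletion.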
Affiliation(s)
- Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Lu Han
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Kaiyuan Han
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zheng Zhang
- Computer Science and Information Systems, Murray State University, Murray, USA
- Haojiang Zhang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Lirong Zhang
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
| |
|
12
|
Fatemi Abhari SH, Di Felice R. Probing Electrostatic Interactions in DNA-Bound CRISPR/Cas9 Complexes by Molecular Dynamics Simulations. ACS Omega 2024; 9:44974-44988. [PMID: 39554421 PMCID: PMC11561601 DOI: 10.1021/acsomega.4c04359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 10/16/2024] [Accepted: 10/21/2024] [Indexed: 11/19/2024]
Abstract
Engineered protein mutations may be exploited to tune molecular interactions in the cellular environment. Here, we have explored the structural consequences of different Cas9 mutations in genome-editing CRISPR/Cas9 systems by means of Molecular Dynamics simulations. We have characterized mutation-induced structural changes and their implications for changes in protein-DNA, DNA-RNA, and DNA-DNA interactions. We present the analysis of multiple trajectories over the cumulative time scale of 7.7 μs, focusing on triple mutations that have been associated with enhancement of genome editing specificity, as well as control mutations. We find that the structural changes induced by the protein mutations are consistent with decreasing the strength of the interaction between Cas9 and the nontarget DNA strand. We discuss the implications of this finding for genome editing specificity.
Affiliation(s)
- Seyedeh Hoda Fatemi Abhari
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
| | - Rosa Di Felice
- Departments of Physics and Astronomy and Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
- CNR Institute of Nanoscience, Modena 41125, Italy
| |
|
13
|
Kabir A, Bhattarai M, Peterson S, Najman-Licht Y, Rasmussen K, Shehu A, Bishop A, Alexandrov B, Usheva A. DNA breathing integration with deep learning foundational model advances genome-wide binding prediction of human transcription factors. Nucleic Acids Res 2024; 52:e91. [PMID: 39271116 PMCID: PMC11514457 DOI: 10.1093/nar/gkae783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 08/21/2024] [Accepted: 08/29/2024] [Indexed: 09/15/2024] Open
Abstract
It was previously shown that DNA breathing, thermodynamic stability, as well as transcriptional activity and transcription factor (TF) binding are functionally correlated. To ascertain the precise relationship between TF binding and DNA breathing, we developed the multi-modal deep learning model EPBDxDNABERT-2, which is based on the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA dynamics model. To train our EPBDxDNABERT-2, we used chromatin immunoprecipitation sequencing (ChIP-Seq) data comprising 690 ChIP-seq experimental results encompassing 161 distinct TFs and 91 human cell types. EPBDxDNABERT-2 significantly improves the prediction of over 660 TF-DNA, with an increase in the area under the receiver operating characteristic (AUROC) metric of up to 9.6% when compared to the baseline model that does not leverage DNA biophysical properties. We expanded our analysis to an in vitro high-throughput Systematic Evolution of Ligands by Exponential enrichment (HT-SELEX) dataset of 215 TFs from 27 families, comparing EPBD with established frameworks. The integration of DNA breathing features with the DNABERT-2 foundational model greatly enhanced TF-binding predictions. Notably, EPBDxDNABERT-2, trained on large-scale multi-species genomes with a cross-attention mechanism, improved predictive power, shedding light on the mechanisms underlying disease-related non-coding variants discovered in genome-wide association studies.
Affiliation(s)
- Anowarul Kabir
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544 NM, USA
- Department of Computer Science, George Mason University, 4400 University Dr, 22030 VA, USA
| | - Manish Bhattarai
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544 NM, USA
| | - Selma Peterson
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544 NM, USA
| | | | - Kim Ø Rasmussen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544 NM, USA
| | - Amarda Shehu
- Department of Computer Science, George Mason University, 4400 University Dr, 22030 VA, USA
| | - Alan R Bishop
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544 NM, USA
| | - Boian Alexandrov
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, 87544 NM, USA
| | - Anny Usheva
- Department of Surgery, Brown University, 69 Brown St Box 1822, 02912 RI, USA
| |
|
14
|
Mitra R, Li J, Sagendorf JM, Jiang Y, Cohen AS, Chiu TP, Glasscock CJ, Rohs R. Geometric deep learning of protein-DNA binding specificity. Nat Methods 2024; 21:1674-1683. [PMID: 39103447 PMCID: PMC11399107 DOI: 10.1038/s41592-024-02372-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 06/14/2024] [Indexed: 08/07/2024]
Abstract
Predicting protein-DNA binding specificity is a challenging yet essential task for understanding gene regulation. Protein-DNA complexes usually exhibit binding to a selected DNA target site, whereas a protein binds, with varying degrees of binding specificity, to a wide range of DNA sequences. This information is not directly accessible in a single structure. Here, to access this information, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity from protein-DNA structure. DeepPBS can be applied to experimental or predicted structures. Interpretable protein heavy atom importance scores for interface residues can be extracted. When aggregated at the protein residue level, these scores are validated through mutagenesis experiments. Applied to designed proteins targeting specific DNA sequences, DeepPBS was demonstrated to predict experimentally measured binding specificity. DeepPBS offers a foundation for machine-aided studies that advance our understanding of molecular interactions and guide experimental designs and synthetic biology.
Affiliation(s)
- Raktim Mitra
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Jared M Sagendorf
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Yibei Jiang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Ari S Cohen
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Cameron J Glasscock
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA.
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA.
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA.
| |
|
15
|
Oka H, Kojima T, Kato R, Ihara K, Nakano H. Construction of transcript regulation mechanism prediction models based on binding motif environment of transcription factor AoXlnR in Aspergillus oryzae. J Bioinform Comput Biol 2024; 22:2450017. [PMID: 39051143 DOI: 10.1142/s0219720024500173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
DNA-binding transcription factors (TFs) play a central role in transcriptional regulation mechanisms, mainly through their specific binding to target sites on the genome and regulation of the expression of downstream genes. Therefore, a comprehensive analysis of the function of these TFs will lead to the understanding of various biological mechanisms. However, the functions of TFs in vivo are diverse and complicated, and the identified binding sites on the genome are not necessarily involved in the regulation of downstream gene expression. In this study, we investigated whether DNA structural information around the binding site of TFs can be used to predict the involvement of the binding site in the regulation of the expression of genes located downstream of the binding site. Specifically, we calculated the structural parameters based on the DNA shape around the DNA binding motif located upstream of the gene whose expression is directly regulated by one TF AoXlnR from Aspergillus oryzae, and showed that the presence or absence of expression regulation can be predicted from the sequence information with high accuracy ([Formula: see text]-1.0) by machine learning incorporating these parameters.
Affiliation(s)
- Hiroya Oka
- Department of Applied Biosciences, Graduate School of Bioagricultural Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Takaaki Kojima
- Department of Applied Biosciences, Graduate School of Bioagricultural Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
- Department of Agrobiological Resources, Faculty of Agriculture, Meijo University, Shiogamaguchi, Tempaku Nagoya 468-8502, Japan
| | - Ryuji Kato
- Department of Basic Medicinal Sciences, Graduate School of Pharmaceutical Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
| | - Kunio Ihara
- Center for Gene Research, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8602, Japan
| | - Hideo Nakano
- Department of Applied Biosciences, Graduate School of Bioagricultural Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
| |
|
16
|
Vora DS, Bhandari SM, Sundar D. DNA shape features improve prediction of CRISPR/Cas9 activity. Methods 2024; 226:120-126. [PMID: 38641083 DOI: 10.1016/j.ymeth.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/27/2024] [Accepted: 04/10/2024] [Indexed: 04/21/2024] Open
Abstract
The CRISPR/Cas9 genome editing technology has transformed basic and translational research in biology and medicine. However, the advances are hindered by off-target effects and a paucity of knowledge about the mechanism of the Cas9 protein. Machine learning models have been proposed for the prediction of Cas9 activity at unintended sites, yet feature engineering plays a major role in the outcome of the predictors. This study evaluates the improvement in the performance of similar predictors upon inclusion of epigenetic and DNA shape feature groups in the conventionally used sequence-based Cas9 target and off-target datasets. The approach involved the utilization of neural networks trained on a diverse range of parameters, allowing us to systematically assess the performance increase for the meticulously designed datasets: (i) sequence only, (ii) sequence and epigenetic features, and (iii) sequence, epigenetic and DNA shape features. The addition of DNA shape information significantly improved predictive performance, evaluated by Akaike and Bayesian information criteria. The evaluation of individual feature importance by permutation and LIME-based methods also indicates that not only sequence features like mismatches and nucleotide composition, but also base pairing parameters like opening and stretch, which are indicative of distortion in the DNA-RNA hybrid in the presence of mismatches, influence model outcomes.
Affiliation(s)
- Dhvani Sandip Vora
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India.
| | - Sakshi Manoj Bhandari
- Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India.
| | - Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India; School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India.
| |
|
17
|
Zhuang J, Huang X, Liu S, Gao W, Su R, Feng K. MulTFBS: A Spatial-Temporal Network with Multichannels for Predicting Transcription Factor Binding Sites. J Chem Inf Model 2024; 64:4322-4333. [PMID: 38733561 DOI: 10.1021/acs.jcim.3c02088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2024]
Abstract
Revealing the mechanisms that influence transcription factor binding specificity is the key to understanding gene regulation. In previous studies, DNA double helix structure and one-hot embedding have been used successfully to design computational methods for predicting transcription factor binding sites (TFBSs). However, although DNA sequence can be viewed as a kind of biological language, word embedding representations from natural language processing have not been properly considered in TFBS prediction models. In our work, we integrate different types of DNA sequence features to design a multichannel deep learning framework, MulTFBS, in which one-hot encoding, word embedding encoding (which can incorporate contextual information and extract the global features of the sequences), and double helix three-dimensional structural features are trained in separate channels. To extract high-level sequence information effectively, our framework uses a spatial-temporal network that combines convolutional neural networks and bidirectional long short-term memory networks with an attention mechanism. Compared with six state-of-the-art methods on 66 universal protein-binding microarray data sets of different transcription factors, MulTFBS performs best on all data sets in the regression tasks, with an average R² of 0.698 and an average PCC of 0.833, which are 5.4% and 3.2% higher, respectively, than the second-best method, CRPTS. In addition, we evaluate the classification performance of MulTFBS for distinguishing bound or unbound regions on TF ChIP-seq data. The results show that our framework also performs well in the TFBS classification tasks.
Affiliation(s)
- Jujuan Zhuang
- The School of Science, Dalian Maritime University, Dalian 116026, China
| | - Xinru Huang
- The School of Science, Dalian Maritime University, Dalian 116026, China
| | - Shuhan Liu
- The School of Science, Dalian Maritime University, Dalian 116026, China
| | - Wanquan Gao
- The School of Science, Dalian Maritime University, Dalian 116026, China
| | - Rui Su
- The School of Science, Dalian Maritime University, Dalian 116026, China
| | - Kexin Feng
- The School of Science, Dalian Maritime University, Dalian 116026, China
| |
|
18
|
Yuan Y, Zeng L, Kong D, Mao Y, Xu Y, Wang M, Zhao Y, Jiang CZ, Zhang Y, Sun D. Abscisic acid-induced transcription factor PsMYB306 negatively regulates tree peony bud dormancy release. Plant Physiology 2024; 194:2449-2471. [PMID: 38206196 PMCID: PMC10980420 DOI: 10.1093/plphys/kiae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 11/08/2023] [Accepted: 12/02/2023] [Indexed: 01/12/2024]
Abstract
Bud dormancy is a crucial strategy for perennial plants to withstand adverse winter conditions. However, the regulatory mechanism of bud dormancy in tree peony (Paeonia suffruticosa) remains largely unknown. Here, we observed dramatically reduced and increased accumulation of abscisic acid (ABA) and bioactive gibberellins (GAs) GA1 and GA3, respectively, during bud endodormancy release of tree peony under prolonged chilling treatment. An Illumina RNA sequencing study was performed to identify potential genes involved in the bud endodormancy regulation in tree peony. Correlation matrix, principal component, and interaction network analyses identified a downregulated MYB transcription factor gene, PsMYB306, the expression of which positively correlated with 9-CIS-EPOXYCAROTENOID DIOXYGENASE 3 (PsNCED3) expression. Protein modeling analysis revealed 4 residues within the R2R3 domain of PsMYB306 to possess DNA binding capability. Transcription of PsMYB306 was increased by ABA treatment. Overexpression of PsMYB306 in petunia (Petunia hybrida) inhibited seed germination and plant growth, concomitant with elevated ABA and decreased GA contents. Silencing of PsMYB306 accelerated cold-triggered tree peony bud burst and influenced the production of ABA and GAs and the expression of their biosynthetic genes. ABA application reduced bud dormancy release and transcription of ENT-KAURENOIC ACID OXIDASE 1 (PsKAO1), GA20-OXIDASE 1 (PsGA20ox1), and GA3-OXIDASE 1 (PsGA3ox1) associated with GA biosynthesis in PsMYB306-silenced buds. In vivo and in vitro binding assays confirmed that PsMYB306 specifically transactivated the promoter of PsNCED3. Silencing of PsNCED3 also promoted bud break and growth. Altogether, our findings suggest that PsMYB306 negatively modulates cold-induced bud endodormancy release by regulating ABA production.
Affiliation(s)
- Yanping Yuan
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Lingling Zeng
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Derong Kong
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yanxiang Mao
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yingru Xu
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Meiling Wang
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yike Zhao
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Cai-Zhong Jiang
- Department of Plant Sciences, University of California, Davis, Davis, CA 95616, USA
- Crops Pathology and Genetics Research Unit, USDA-ARS, Davis, CA 95616, USA
| | - Yanlong Zhang
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Daoyang Sun
- College of Landscape Architecture and Arts, Northwest A&F University, Yangling, Shaanxi 712100, China
| |
|
19
|
Li J, Chiu TP, Rohs R. Predicting DNA structure using a deep learning method. Nat Commun 2024; 15:1243. [PMID: 38336958 PMCID: PMC10858265 DOI: 10.1038/s41467-024-45191-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 04/25/2023] [Accepted: 01/17/2024] [Indexed: 02/12/2024] Open
Abstract
Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing an understanding of the effects of flanking regions on DNA structure in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
Affiliation(s)
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
| | - Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
| | - Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA.
- Department of Chemistry, University of Southern California, Los Angeles, CA, 90089, USA.
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, 90089, USA.
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA.
| |
|
20
|
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
Massively parallel reporter assays measure the transcriptional activities of various cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational model framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using the framework, we investigated the molecular mechanisms of cis-regulatory mutations of a synthetic enhancer that cause abnormal gene expression. We found that, in a human cell line, competitive binding between family transcription factors (TFs) with slightly different binding preferences significantly increases the accuracy of recapitulating the transcriptional effects of thousands of single- or multi-mutations. We also discovered that even if various harmful mutations occurred in an activator binding site, a CRM could stably maintain or even increase gene expression through a certain form of competitive binding between family TFs. These findings enhance understanding of the effect of SNPs and indels on CRMs and would help in building robust custom-designed CRMs for biologics production and gene therapy.
Affiliation(s)
- Chan-Koo Kang
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
| | - Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
| |
|
21
|
Silva EC, Quinde CA, Cieza B, Basu A, Vila MMDC, Balcão VM. Molecular Characterization and Genome Mechanical Features of Two Newly Isolated Polyvalent Bacteriophages Infecting Pseudomonas syringae pv. garcae. Genes (Basel) 2024; 15:113. [PMID: 38255005 PMCID: PMC10815195 DOI: 10.3390/genes15010113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 01/06/2024] [Accepted: 01/16/2024] [Indexed: 01/24/2024] Open
Abstract
Coffee plants have been targeted by a devastating bacterial disease, a condition known as bacterial blight, caused by the phytopathogen Pseudomonas syringae pv. garcae (Psg). Conventional treatments of coffee plantations affected by the disease involve frequent spraying with copper- and kasugamycin-derived compounds, but they are both highly toxic to the environment and stimulate the appearance of bacterial resistance. Herein, we report the molecular characterization and mechanical features of the genome of two newly isolated (putative polyvalent) lytic phages for Psg. The isolated phages belong to class Caudoviricetes and present a myovirus-like morphotype belonging to the genera Tequatrovirus (PsgM02F) and Phapecoctavirus (PsgM04F) of the subfamilies Straboviridae (PsgM02F) and Stephanstirmvirinae (PsgM04F), according to recent bacterial viruses' taxonomy, based on their complete genome sequences. The 165,282 bp (PsgM02F) and 151,205 bp (PsgM04F) genomes do not feature any lysogenic-related (integrase) genes and, hence, can safely be assumed to follow a lytic lifestyle. While phage PsgM02F produced a morphogenesis yield of 124 virions per host cell, phage PsgM04F produced only 12 virions per host cell, indicating that they replicate well in Psg with a 50 min latency period. Genome mechanical analyses established a relationship between genome bendability and virion morphogenesis yield within infected host cells.
Affiliation(s)
- Erica C. Silva
- VBlab—Laboratory of Bacterial Viruses, University of Sorocaba, Sorocaba 18023-000, SP, Brazil
| | - Carlos A. Quinde
- Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA
| | - Basilio Cieza
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Aakash Basu
- Department of Biosciences, Durham University, Durham DH1 3LE, UK
| | - Marta M. D. C. Vila
- VBlab—Laboratory of Bacterial Viruses, University of Sorocaba, Sorocaba 18023-000, SP, Brazil
| | - Victor M. Balcão
- VBlab—Laboratory of Bacterial Viruses, University of Sorocaba, Sorocaba 18023-000, SP, Brazil
- Department of Biology and CESAM, University of Aveiro, Campus Universitário de Santiago, P-3810-193 Aveiro, Portugal
| |
|
22
|
Kalsan M, Jabeen A, Ahmad S. Incorporating Sequence-Dependent DNA Shape and Dynamics into Transcriptome Data Analysis. Methods Mol Biol 2024; 2812:317-343. [PMID: 39068371 DOI: 10.1007/978-1-0716-3886-6_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Differentially expressed genes in a cellular context may be co-regulated by the same transcription factor. However, in the absence of concurrent transcription factor binding data, such interactions are difficult to detect, especially at the single-cell expression level. Motif enrichments in such genes can be used to gain insight into differential expression caused by shared upstream TFs. However, it is now established that many genes are co-regulated by the same TF due to a shared DNA shape or sequence-dependent conformational dynamics instead of a sequence motif. In this work, we demonstrate how, starting from gene expression data, such DNA shape and dynamics signatures can potentially be detected using publicly available tools, including DynaSeq, developed in our group for predicting the sequence-dependent components of these DNA shape features.
Affiliation(s)
- Manisha Kalsan
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Almas Jabeen
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.
| |
|
23
|
Kratochvilová L, Vojsovič M, Valková N, Šislerová L, El Rashed Z, Inga A, Monti P, Brázda V. The presence of a G-quadruplex prone sequence upstream of a minimal promoter increases transcriptional activity in the yeast Saccharomyces cerevisiae. Biosci Rep 2023; 43:BSR20231348. [PMID: 38112096 PMCID: PMC10730334 DOI: 10.1042/bsr20231348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 11/07/2023] [Accepted: 11/21/2023] [Indexed: 12/20/2023] Open
Abstract
Non-canonical secondary structures in DNA are increasingly being revealed as critical players in DNA metabolism, including modulating the accessibility and activity of promoters. These structures comprise the so-called G-quadruplexes (G4s) that are formed from sequences rich in guanine bases. Using a well-defined transcriptional reporter system, we sought to systematically investigate the impact of the presence of G4 structures on transcription in yeast Saccharomyces cerevisiae. To this aim, different G4 prone sequences were modeled to vary the chance of intramolecular G4 formation, analyzed in vitro by Thioflavin T binding test and circular dichroism and then placed at the yeast ADE2 locus on chromosome XV, downstream and adjacent to a P53 response element (RE) and upstream from a minimal CYC1 promoter and Luciferase 1 (LUC1) reporter gene in isogenic strains. While the minimal CYC1 promoter provides basal reporter activity, the P53 RE enables LUC1 transactivation under the control of P53 family proteins expressed under the inducible GAL1 promoter. Thus, the impact of the different G4 prone sequences on both basal and P53 family protein-dependent expression was measured after shifting cells onto galactose containing medium. The results showed that the presence of G4 prone sequences upstream of a yeast minimal promoter increased its basal activity proportionally to their potential to form intramolecular G4 structures; consequently, this feature, when present near the target binding site of P53 family transcription factors, can be exploited to regulate the transcriptional activity of P53, P63 and P73 proteins.
Affiliation(s)
- Libuše Kratochvilová
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 61200 Brno, Czech Republic
- Department of Food Chemistry and Biotechnology, Faculty of Chemistry, Brno University of Technology, Purkyňova 118, 61200 Brno, Czech Republic
| | - Matúš Vojsovič
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 61200 Brno, Czech Republic
- Department of Food Chemistry and Biotechnology, Faculty of Chemistry, Brno University of Technology, Purkyňova 118, 61200 Brno, Czech Republic
| | - Natália Valková
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 61200 Brno, Czech Republic
| | - Lucie Šislerová
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 61200 Brno, Czech Republic
- Department of Food Chemistry and Biotechnology, Faculty of Chemistry, Brno University of Technology, Purkyňova 118, 61200 Brno, Czech Republic
| | - Zeinab El Rashed
- Gene Expression Regulation SSD, IRCCS Ospedale Policlinico San Martino, 16132 Genoa, Italy
| | - Alberto Inga
- Laboratory of Transcriptional Networks, Department of Cellular, Computational and Integrative Biology, CIBIO, University of Trento, via Sommarive 9, 38123 Trento, Italy
| | - Paola Monti
- Mutagenesis and Cancer Prevention UO, IRCCS Ospedale Policlinico San Martino, 16132 Genoa, Italy
| | - Václav Brázda
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 61200 Brno, Czech Republic
- Department of Food Chemistry and Biotechnology, Faculty of Chemistry, Brno University of Technology, Purkyňova 118, 61200 Brno, Czech Republic
| |
Collapse
|
24
|
Mitra R, Li J, Sagendorf JM, Jiang Y, Chiu TP, Rohs R. DeepPBS: Geometric deep learning for interpretable prediction of protein-DNA binding specificity. bioRxiv 2023:2023.12.15.571942. [PMID: 38293168 PMCID: PMC10827229 DOI: 10.1101/2023.12.15.571942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Predicting specificity in protein-DNA interactions is a challenging yet essential task for understanding gene regulation. Here, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity across protein families based on protein-DNA structures. The DeepPBS architecture allows investigation of different family-specific recognition patterns. DeepPBS can be applied to predicted structures, and can aid in the modeling of protein-DNA complexes. DeepPBS is interpretable and can be used to calculate protein heavy atom-level importance scores, as demonstrated in a case study on the p53-DNA interface. When aggregated at the protein residue level, these scores conform well with alanine scanning mutagenesis experimental data. The inference time for DeepPBS is sufficiently fast for analyzing simulation trajectories, as demonstrated on a molecular-dynamics simulation of a Drosophila Hox-DNA tertiary complex with its cofactor. DeepPBS and its corresponding data resources offer a foundation for machine-aided protein-DNA interaction studies, guiding experimental choices and complex design, as well as advancing our understanding of molecular interactions.
|
25
|
Li J, Chiu TP, Rohs R. Deep DNAshape: Predicting DNA shape considering extended flanking regions using a deep learning method. bioRxiv 2023:2023.10.22.563383. [PMID: 37961633 PMCID: PMC10634709 DOI: 10.1101/2023.10.22.563383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2023]
Abstract
Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA shape plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer-based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, refined DNA shape features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing a deeper understanding of the effects of flanking regions on DNA shape in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
|
26
|
Zhang P, Wang H, Xu H, Wei L, Liu L, Hu Z, Wang X. Deep flanking sequence engineering for efficient promoter design using DeepSEED. Nat Commun 2023; 14:6309. [PMID: 37813854 PMCID: PMC10562447 DOI: 10.1038/s41467-023-41899-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 04/22/2023] [Accepted: 09/20/2023] [Indexed: 10/11/2023]
Abstract
Designing promoters with desirable properties is essential in synthetic biology. Human experts are skilled at identifying strong explicit patterns in small samples, while deep learning models excel at detecting implicit weak patterns in large datasets. Biologists have described the sequence patterns of promoters via transcription factor binding sites (TFBSs). However, the flanking sequences of cis-regulatory elements have long been overlooked and are often arbitrarily decided in promoter design. To address this limitation, we introduce DeepSEED, an AI-aided framework that efficiently designs synthetic promoters by combining expert knowledge with deep learning techniques. DeepSEED has demonstrated success in improving the properties of Escherichia coli constitutive, IPTG-inducible, and mammalian cell doxycycline (Dox)-inducible promoters. Furthermore, our results show that DeepSEED captures the implicit features in flanking sequences, such as k-mer frequencies and DNA shape features, which are crucial for determining promoter properties.
Affiliation(s)
- Pengcheng Zhang
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Haochen Wang
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Hanwen Xu
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Lei Wei
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Liyang Liu
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
- Zhirui Hu
  - Center for Statistical Science, Tsinghua University, Beijing, China
- Xiaowo Wang
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, China
|
27
|
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 78] [Impact Index Per Article: 39.0] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
Affiliation(s)
- Connor A Horton
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Amr M Alexandari
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael G B Hayes
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Arjun K Aditham
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Nilay Shah
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Peter H Suzuki
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ariel Afek
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Raluca Gordân
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Julia Zeitlinger
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
  - The University of Kansas Medical Center, Kansas City, KS 66103, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Polly M Fordyce
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94110, USA
|
28
|
Liu Z, Samee MAH. Structural underpinnings of mutation rate variations in the human genome. Nucleic Acids Res 2023; 51:7184-7197. [PMID: 37395403 PMCID: PMC10415140 DOI: 10.1093/nar/gkad551] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/19/2022] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023]
Abstract
Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Importantly, the rates vary substantially across the genome and the principles underlying such variations remain poorly understood. A recent model explained much of this variation by considering higher-order nucleotide interactions in the 7-mer sequence context around mutated nucleotides. This model's success implicates a connection between DNA shape and mutation rates. DNA shape, i.e. structural properties like helical twist and tilt, is known to capture interactions between nucleotides within a local context. Thus, we hypothesized that changes in DNA shape features at and around mutated positions can explain mutation rate variations in the human genome. Indeed, DNA shape-based models of mutation rates showed similar or improved performance over current nucleotide sequence-based models. These models accurately characterized mutation hotspots in the human genome and revealed the shape features whose interactions underlie mutation rate variations. DNA shape also impacts mutation rates within putative functional regions like transcription factor binding sites where we find a strong association between DNA shape and position-specific mutation rates. This work demonstrates the structural underpinnings of nucleotide mutations in the human genome and lays the groundwork for future models of genetic variations to incorporate DNA shape.
Affiliation(s)
- Zian Liu
  - Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
- Md Abul Hassan Samee
  - Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
|
29
|
Samee MAH. Noncanonical binding of transcription factors: time to revisit specificity? Mol Biol Cell 2023; 34:pe4. [PMID: 37486893 PMCID: PMC10398899 DOI: 10.1091/mbc.e22-08-0325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/26/2023]
Abstract
Transcription factors (TFs) are one of the most studied classes of DNA-binding proteins that have a direct functional impact on gene transcription and thus, on human physiology and disease. The mechanisms that TFs use for recognizing target DNA binding sites have been studied for nearly five decades, yet they remain poorly understood. It is classically assumed that a TF recognizes a specific sequence pattern, or motif, as its binding sites. However, recent studies are consistently finding examples of noncanonical binding, that is, TFs binding at sites that do not resemble their sequence motifs. Here we review the current literature on four major types of noncanonical TF binding, namely binding based on DNA shape readout, at Guanine-quadruplex structures, at repeat sequences, and bispecific binding. These examples point to a critical need for studies to unify our current observations, many of which are at odds with the "one TF, one motif" view, into a more comprehensive definition of the DNA-binding specificity of TFs.
|
30
|
Cooper BH, Dantas Machado AC, Gan Y, Aparicio OM, Rohs R. DNA binding specificity of all four Saccharomyces cerevisiae forkhead transcription factors. Nucleic Acids Res 2023; 51:5621-5633. [PMID: 37177995 PMCID: PMC10287902 DOI: 10.1093/nar/gkad372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 04/19/2023] [Accepted: 04/27/2023] [Indexed: 05/15/2023]
Abstract
Quantifying the nucleotide preferences of DNA binding proteins is essential to understanding how transcription factors (TFs) interact with their targets in the genome. High-throughput in vitro binding assays have been used to identify the inherent DNA binding preferences of TFs in a controlled environment isolated from confounding factors such as genome accessibility, DNA methylation, and TF binding cooperativity. Unfortunately, many of the most common approaches for measuring binding preferences are not sensitive enough for the study of moderate-to-low affinity binding sites, and are unable to detect small-scale differences between closely related homologs. The Forkhead box (FOX) family of TFs is known to play a crucial role in regulating a variety of key processes from proliferation and development to tumor suppression and aging. By using the high-sequencing depth SELEX-seq approach to study all four FOX homologs in Saccharomyces cerevisiae, we have been able to precisely quantify the contribution and importance of nucleotide positions all along an extended binding site. Essential to this process was the alignment of our SELEX-seq reads to a set of candidate core sequences determined using a recently developed tool for the alignment of enriched k-mers and a newly developed approach for the reprioritization of candidate cores.
Affiliation(s)
- Brendon H Cooper
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Ana Carolina Dantas Machado
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Yan Gan
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
  - Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Oscar M Aparicio
  - Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
  - Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
  - Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
  - Departments of Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
|
31
|
Dey U, Olymon K, Banik A, Abbas E, Yella VR, Kumar A. DNA structural properties of DNA binding sites for 21 transcription factors in the mycobacterial genome. Front Cell Infect Microbiol 2023; 13:1147544. [PMID: 37396305 PMCID: PMC10312376 DOI: 10.3389/fcimb.2023.1147544] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2023] [Accepted: 05/19/2023] [Indexed: 07/04/2023]
Abstract
Mycobacterium tuberculosis, the causative agent of tuberculosis, has evolved over time into a multidrug-resistant strain that poses a serious global pandemic health threat. The ability to survive and remain dormant within the host macrophage relies on multiple transcription factors contributing to virulence. To date, very limited structural insights from crystallographic and NMR studies are available for TFs and TF-DNA binding events. Understanding the role of DNA structure in TF binding is critical to deciphering MTB pathogenicity and has yet to be resolved at the genome scale. In this work, we analyzed the compositional and conformational preferences of 21 mycobacterial TFs, evident at their DNA binding sites, at local and global scales. Results suggest that most TFs prefer binding to genomic regions characterized by unique DNA structural signatures, namely, high electrostatic potential, narrow minor grooves, high propeller twist, helical twist, intrinsic curvature, and DNA rigidity compared to the flanking sequences. Additionally, a preference for specific trinucleotide motifs, with clear periodic signals of tetranucleotide motifs, is observed in the vicinity of the TF-DNA interactions. Altogether, our study reports nuanced DNA shape and structural preferences of 21 TFs.
Affiliation(s)
- Upalabdha Dey
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
- Kaushika Olymon
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
- Anikesh Banik
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
- Eshan Abbas
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
- Venkata Rajesh Yella
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, India
- Aditya Kumar
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
|
32
|
Marri D, Filipovic D, Kana O, Tischkau S, Bhattacharya S. Prediction of mammalian tissue-specific CLOCK-BMAL1 binding to E-box DNA motifs. Sci Rep 2023; 13:7742. [PMID: 37173345 PMCID: PMC10182026 DOI: 10.1038/s41598-023-34115-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Accepted: 04/25/2023] [Indexed: 05/15/2023]
Abstract
The Brain and Muscle ARNTL-Like 1 protein (BMAL1) forms a heterodimer with either Circadian Locomotor Output Cycles Kaput (CLOCK) or Neuronal PAS domain protein 2 (NPAS2) to act as a master regulator of the mammalian circadian clock gene network. The dimer binds to E-box gene regulatory elements on DNA, activating downstream transcription of clock genes. Identification of transcription factor binding sites and genomic features that correlate to DNA binding by BMAL1 is a challenging problem, given that CLOCK-BMAL1 or NPAS2-BMAL1 bind to several distinct binding motifs (CANNTG) on DNA. Using three different types of tissue-specific machine learning models with features based on (1) DNA sequence, (2) DNA sequence plus DNA shape, and (3) DNA sequence and shape plus histone modifications, we developed an interpretable predictive model of genome-wide BMAL1 binding to E-box motifs and dissected the mechanisms underlying BMAL1-DNA binding. Our results indicated that histone modifications, the local shape of the DNA, and the flanking sequence of the E-box motif are sufficient predictive features for BMAL1-DNA binding. Our models also provide mechanistic insights into tissue specificity of DNA binding by BMAL1.
Affiliation(s)
- Daniel Marri
  - Department of Biomedical Engineering, Michigan State University, East Lansing, MI, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- David Filipovic
  - Department of Biomedical Engineering, Michigan State University, East Lansing, MI, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
- Omar Kana
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
  - Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI, USA
  - Institute for Integrative Toxicology, Michigan State University, East Lansing, MI, USA
- Shelley Tischkau
  - Department of Pharmacology, Southern Illinois University School of Medicine, Springfield, IL, USA
- Sudin Bhattacharya
  - Department of Biomedical Engineering, Michigan State University, East Lansing, MI, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
  - Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI, USA
  - Institute for Integrative Toxicology, Michigan State University, East Lansing, MI, USA
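The E-box consensus CANNTG named in the Marri et al. abstract can be located with a simple pattern scan. The following is a minimal sketch, not the authors' pipeline; the function names `find_eboxes` and `flank` and the default flank width are assumptions made for illustration. Because CANNTG is its own reverse complement as a pattern, scanning one strand is enough to find sites on both.

```python
import re

# E-box consensus CANNTG, where N is any nucleotide.
# The lookahead lets overlapping matches be reported as well.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq):
    """Return (0-based start, hexamer) for every E-box match in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]


def flank(seq, start, width=5):
    """Return the (left, right) flanking sequence around the hexamer at start.

    The width is a hypothetical choice; flanks are truncated at sequence ends.
    """
    seq = seq.upper()
    left = seq[max(0, start - width):start]
    right = seq[start + 6:start + 6 + width]
    return left, right
```

Flanking windows extracted this way could serve as input to sequence or shape features of the kind the abstract describes, although the feature definitions used in the paper are not given here.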
|
33
|
Kaur A, Chauhan APS, Aggarwal AK. Prediction of Enhancers in DNA Sequence Data using a Hybrid CNN-DLSTM Model. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1327-1336. [PMID: 35417351 DOI: 10.1109/tcbb.2022.3167090] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 05/04/2023]
Abstract
An enhancer, a distal cis-regulatory element, controls gene expression. Experimental prediction of enhancer elements is time-consuming and expensive. Consequently, various inexpensive and fast deep learning-based methods have been developed for predicting enhancers and determining their strength. In this paper, we propose a two-stage deep learning-based framework leveraging DNA structural features, natural language processing, a convolutional neural network, and long short-term memory to accurately predict enhancer elements in genomics data. In the first stage, we extracted features from DNA sequence data using three feature representation techniques, viz., k-mer-based feature extraction along with word2vec-based interpretation of underlying patterns, one-hot encoding, and the DNAshape technique. In the second stage, the strength of enhancers is predicted from the extracted features using a hybrid deep learning model. The method is capable of adapting itself to varying sizes of datasets. Also, as the proposed model can capture long-range sequence patterns, the robustness of the method remains unaffected by minor variations in the genomic sequence. The method outperforms other state-of-the-art methods at both stages in terms of prediction accuracy, specificity, Matthews correlation coefficient, and area under the ROC curve. In summary, the proposed method is a reliable method for enhancer prediction.
|
34
|
Quan L, Chu X, Sun X, Wu T, Lyu Q. How deepBICS Quantifies Intensities of Transcription Factor-DNA Binding and Facilitates Prediction of Single Nucleotide Variant Pathogenicity With a Deep Learning Model Trained on ChIP-Seq Data Sets. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1594-1599. [PMID: 35471887 DOI: 10.1109/tcbb.2022.3170343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The binding of DNA sequences to cell type-specific transcription factors is essential for regulating gene expression in all organisms. Many variants occurring in these binding regions play crucial roles in human disease by disrupting the cis-regulation of gene expression. We first implemented a sequence-based deep learning model called deepBICS to quantify the intensity of transcription factor-DNA binding. The experimental results not only showed the superiority of deepBICS on ChIP-seq data sets but also suggested that deepBICS, as a language model, could help classify disease-related and neutral variants. We then built a language model-based method called deepBICS4SNV to predict the pathogenicity of single nucleotide variants. The good performance of deepBICS4SNV on 2 tests related to Mendelian disorders and viral diseases shows that the sequence contextual information derived from language models can improve prediction accuracy and generalization capability.
|
35
|
Physicochemical models of protein-DNA binding with standard and modified base pairs. Proc Natl Acad Sci U S A 2023; 120:e2205796120. [PMID: 36656856 PMCID: PMC9942898 DOI: 10.1073/pnas.2205796120] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/20/2023]
Abstract
DNA-binding proteins play important roles in various cellular processes, but the mechanisms by which proteins recognize genomic target sites remain incompletely understood. Functional groups at the edges of the base pairs (bp) exposed in the DNA grooves represent physicochemical signatures. As these signatures enable proteins to form specific contacts between protein residues and bp, their study can provide mechanistic insights into protein-DNA binding. Existing experimental methods, such as X-ray crystallography, can reveal such mechanisms based on physicochemical interactions between proteins and their DNA target sites. However, the low throughput of structural biology methods limits mechanistic insights for selection of many genomic sites. High-throughput binding assays enable prediction of potential target sites by determining relative binding affinities of a protein to massive numbers of DNA sequences. Many currently available computational methods are based on the sequence of standard Watson-Crick bp. They assume that the contribution to overall binding affinity is independent for each base pair, or alternatively include dinucleotides or short k-mers. These methods cannot directly expand to physicochemical contacts, and they are not suitable to apply to DNA modifications or non-Watson-Crick bp. These variations include DNA methylation, and synthetic or mismatched bp. The proposed method, DeepRec, can predict relative binding affinities as a function of physicochemical signatures and the effect of DNA methylation or other chemical modifications on binding. Sequence-based modeling methods are in comparison a coarse-grained description and cannot achieve such insights. Our chemistry-based modeling framework provides a path towards understanding genome function at a mechanistic level.
|
36
|
Li Y, Kong F, Cui H, Wang F, Li C, Ma J. SENIES: DNA Shape Enhanced Two-Layer Deep Learning Predictor for the Identification of Enhancers and Their Strength. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:637-645. [PMID: 35015646 DOI: 10.1109/tcbb.2022.3142019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Identifying enhancers is a critical task in bioinformatics due to their primary role in regulating gene expression. For this reason, various computational algorithms devoted to enhancer identification have been put forward over the years. More features are extracted from the single DNA sequences to boost the performance. Nevertheless, DNA structural information is neglected, which is an essential factor affecting the binding preferences of transcription factors to regulatory elements like enhancers. Here, we propose SENIES, a DNA shape enhanced deep learning predictor, to identify enhancers and their strength. The predictor consists of two layers where the first layer is for enhancer and non-enhancer identification, and the second layer is for predicting the strength of enhancers. Apart from two common sequence-derived features (i.e., one-hot and k-mer), DNA shape is introduced to describe the 3D structures of DNA sequences. Performance comparison with state-of-the-art methods conducted on public datasets demonstrates the effectiveness and robustness of our predictor. The code implementation of SENIES is publicly available at https://github.com/hlju-liye/SENIES.
|
37
|
Sghaier N, Essemine J, Ayed RB, Gorai M, Ben Marzoug R, Rebai A, Qu M. An Evidence Theory and Fuzzy Logic Combined Approach for the Prediction of Potential ARF-Regulated Genes in Quinoa. PLANTS (BASEL, SWITZERLAND) 2022; 12:71. [PMID: 36616201 PMCID: PMC9824623 DOI: 10.3390/plants12010071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Accepted: 11/26/2022] [Indexed: 06/17/2023]
Abstract
Quinoa is among the plants tolerant to challenging and harmful abiotic environmental factors. It has been selected as one of the model crops for bio-saline agriculture that could contribute to staple food security for an ever-growing worldwide population under various climate change scenarios. The auxin response factors (ARFs) are main contributors to plant adaptation to severe environmental conditions. Thus, determining the ARF-binding sites is a major step that could provide promising insights to support plant breeding programs and improve agronomic traits. However, determining the ARF-binding sites is a challenging task, particularly in species with large genome sizes. In this report, we present a data fusion approach based on Dempster-Shafer evidence theory and fuzzy set theory to predict the ARF-binding sites. We then performed an in silico identification of the ARF-binding sites in Chenopodium quinoa. The characterization of some known pathways implicated in auxin signaling in other higher plants confirms the reliability of our predictions. Furthermore, several pathways with little or no available information about their functions were identified as playing important roles in the adaptation of quinoa to environmental conditions. The predicted auxin response genes associated with the detected ARF-binding sites may help to explore the biological roles of some unknown genes newly identified in quinoa.
Affiliation(s)
- Nesrine Sghaier
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, China
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Laboratory of Advanced Technology and Intelligent Systems, National Engineering School of Sousse, Sousse 4023, Tunisia
| | - Jemaa Essemine
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Rayda Ben Ayed
- Department of Agronomy and Plant Biotechnology, National Institute of Agronomy of Tunisia (INAT), 43 Avenue Charles Nicolle, 1082 El Mahrajène, University of Carthage-Tunis, Tunis 1082, Tunisia
- Laboratory of Extremophile Plants, Centre of Biotechnology of Borj-Cédria, B.P. 901, Hammam Lif 2050, Tunisia
| | - Mustapha Gorai
- Higher Institute of Applied Biology Medenine, University of Gabes, Medenine 4119, Tunisia
| | - Riadh Ben Marzoug
- Laboratory of Molecular and Cellular Screening Processes, Sfax Biotechnology Center, B.P 1177, Sfax 3018, Tunisia
| | - Ahmed Rebai
- Laboratory of Molecular and Cellular Screening Processes, Sfax Biotechnology Center, B.P 1177, Sfax 3018, Tunisia
| | - Mingnan Qu
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, China
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
|
38
|
Basu A, Bobrovnikov DG, Cieza B, Arcon JP, Qureshi Z, Orozco M, Ha T. Deciphering the mechanical code of the genome and epigenome. Nat Struct Mol Biol 2022; 29:1178-1187. [PMID: 36471057 PMCID: PMC10142808 DOI: 10.1038/s41594-022-00877-6] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2021] [Accepted: 10/18/2022] [Indexed: 12/12/2022]
Abstract
Diverse DNA-deforming processes are impacted by the local mechanical and structural properties of DNA, which in turn depend on local sequence and epigenetic modifications. Deciphering this mechanical code (that is, this dependence) has been challenging due to the lack of high-throughput experimental methods. Here we present a comprehensive characterization of the mechanical code. Utilizing high-throughput measurements of DNA bendability via loop-seq, we quantitatively established how the occurrence and spatial distribution of dinucleotides, tetranucleotides and methylated CpG impact DNA bendability. We used our measurements to develop a physical model for the sequence and methylation dependence of DNA bendability. We validated the model by performing loop-seq on mouse genomic sequences around transcription start sites and CTCF-binding sites. We applied our model to test the predictions of all-atom molecular dynamics simulations and to demonstrate that sequence and epigenetic modifications can mechanically encode regulatory information in diverse contexts.
Affiliation(s)
- Aakash Basu
- Department of Biosciences, Durham University, Durham, UK; Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Dmitriy G Bobrovnikov
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Basilio Cieza
- Department of Biophysics, Johns Hopkins University, Baltimore, MD, USA
| | - Juan Pablo Arcon
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Zan Qureshi
- Department of Biophysics, Johns Hopkins University, Baltimore, MD, USA
| | - Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain; Department of Biochemistry and Biomedicine, Universitat de Barcelona, Barcelona, Spain
| | - Taekjip Ha
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Department of Biophysics, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Howard Hughes Medical Institute, Baltimore, MD, USA
|
39
|
Nosaki S, Mitsuda N, Sakamoto S, Kusubayashi K, Yamagami A, Xu Y, Bui TBC, Terada T, Miura K, Nakano T, Tanokura M, Miyakawa T. Brassinosteroid-induced gene repression requires specific and tight promoter binding of BIL1/BZR1 via DNA shape readout. NATURE PLANTS 2022; 8:1440-1452. [PMID: 36522451 DOI: 10.1038/s41477-022-01289-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/26/2021] [Accepted: 10/26/2022] [Indexed: 05/12/2023]
Abstract
BRZ-INSENSITIVE-LONG 1 (BIL1)/BRASSINAZOLE-RESISTANT 1 (BZR1) and its homologues are plant-specific transcription factors that convert the signalling of the phytohormones brassinosteroids (BRs) to transcriptional responses, thus controlling various physiological processes in plants. Although BIL1/BZR1 upregulates some BR-responsive genes and downregulates others, the molecular mechanism underlying the dual roles of BIL1/BZR1 is still poorly understood. Here we show that BR-responsive transcriptional repression by BIL1/BZR1 requires the tight binding of BIL1/BZR1 alone to the 10 bp elements of DNA fragments containing the known 6 bp core-binding motifs at the centre. Furthermore, biochemical and structural evidence demonstrates that the selectivity for two nucleobases flanking the core motifs is realized by the DNA shape readout of BIL1/BZR1 without direct recognition of the nucleobases. These results elucidate the molecular and structural basis of transcriptional repression by BIL1/BZR1 and contribute to further understanding of the dual roles of BIL1/BZR1 in BR-responsive gene regulation.
Affiliation(s)
- Shohei Nosaki
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Tsukuba Plant-Innovation Research Center (T-PIRC), University of Tsukuba, Tsukuba, Ibaraki, Japan
| | - Nobutaka Mitsuda
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Global Zero Emission Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
| | - Shingo Sakamoto
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Global Zero Emission Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
| | - Kazuki Kusubayashi
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Ayumi Yamagami
- Graduate School of Biostudies, Kyoto University, Sakyo-ku, Kyoto, Japan
- Gene Discovery Research Group, RIKEN CSRS, Wako, Saitama, Japan
| | - Yuqun Xu
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Thi Bao Chau Bui
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Tohru Terada
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
| | - Kenji Miura
- Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Tsukuba Plant-Innovation Research Center (T-PIRC), University of Tsukuba, Tsukuba, Ibaraki, Japan
| | - Takeshi Nakano
- Graduate School of Biostudies, Kyoto University, Sakyo-ku, Kyoto, Japan
- Gene Discovery Research Group, RIKEN CSRS, Wako, Saitama, Japan
| | - Masaru Tanokura
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
| | - Takuya Miyakawa
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
- Graduate School of Biostudies, Kyoto University, Sakyo-ku, Kyoto, Japan.
|
40
|
Yan W, Li Z, Pian C, Wu Y. PlantBind: an attention-based multi-label neural network for predicting plant transcription factor binding sites. Brief Bioinform 2022; 23:6713513. [PMID: 36155619 DOI: 10.1093/bib/bbac425] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2022] [Revised: 08/29/2022] [Accepted: 08/31/2022] [Indexed: 12/14/2022] Open
Abstract
Identification of transcription factor binding sites (TFBSs) is essential to understanding of gene regulation. Designing computational models for accurate prediction of TFBSs is crucial because it is not feasible to experimentally assay all transcription factors (TFs) in all sequenced eukaryotic genomes. Although many methods have been proposed for the identification of TFBSs in humans, methods designed for plants are comparatively underdeveloped. Here, we present PlantBind, a method for integrated prediction and interpretation of TFBSs based on DNA sequences and DNA shape profiles. Built on an attention-based multi-label deep learning framework, PlantBind not only simultaneously predicts the potential binding sites of 315 TFs, but also identifies the motifs bound by transcription factors. During the training process, this model revealed a strong similarity among TF family members with respect to target binding sequences. Trans-species prediction performance using four Zea mays TFs demonstrated the suitability of this model for transfer learning. Overall, this study provides an effective solution for identifying plant TFBSs, which will promote greater understanding of transcriptional regulatory mechanisms in plants.
Affiliation(s)
| | - Zutan Li
- Nanjing Agricultural University
| | - Cong Pian
- College of Sciences at Nanjing Agricultural University
| | - Yufeng Wu
- State Key Laboratory for Crop Genetics and Germplasm Enhancement, Bioinformatics Center, College of Agriculture, Academy for Advanced Interdisciplinary Studies at Nanjing Agricultural University
|
41
|
Zhang Q, Zhang Y, Wang S, Chen ZH, Gribova V, Filaretov VF, Huang DS. Predicting In-Vitro DNA-Protein Binding With a Spatially Aligned Fusion of Sequence and Shape. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3144-3153. [PMID: 34882561 DOI: 10.1109/tcbb.2021.3133869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Discovery of transcription factor binding sites (TFBSs) is of primary importance for understanding the underlying binding mechanism and gene regulation process. Growing evidence indicates that apart from the primary DNA sequences, DNA shape landscape has a significant influence on transcription factor binding preference. To effectively model the co-influence of sequence and shape features, we emphasize the importance of position information of sequence motif and shape pattern. In this paper, we propose a novel deep learning-based architecture, named hybridShape eDeepCNN, for TFBS prediction which integrates DNA sequence and shape information in a spatially aligned manner. Our model utilizes the power of the multi-layer convolutional neural network and constructs an independent subnetwork to adapt to the distinct data distribution of heterogeneous features. Besides, we explore the usage of continuous embedding vectors as the representation of DNA sequences. Based on the experiments on 20 in-vitro datasets derived from universal protein binding microarrays (uPBMs), we demonstrate the superiority of our proposed method and validate the underlying design logic.
|
42
|
Wetzel JL, Zhang K, Singh M. Learning probabilistic protein-DNA recognition codes from DNA-binding specificities using structural mappings. Genome Res 2022; 32:1776-1786. [PMID: 36123148 PMCID: PMC9528988 DOI: 10.1101/gr.276606.122] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Accepted: 07/30/2022] [Indexed: 11/25/2022]
Abstract
Knowledge of how proteins interact with DNA is essential for understanding gene regulation. Although DNA-binding specificities for thousands of transcription factors (TFs) have been determined, the specific amino acid-base interactions comprising their structural interfaces are largely unknown. This lack of resolution hampers attempts to leverage these data in order to predict specificities for uncharacterized TFs or TFs mutated in disease. Here we introduce recognition code learning via automated mapping of protein-DNA structural interfaces (rCLAMPS), a probabilistic approach that uses DNA-binding specificities for TFs from the same structural family to simultaneously infer both which nucleotide positions are contacted by particular amino acids within the TF as well as a recognition code that relates each base-contacting amino acid to nucleotide preferences at the DNA positions it contacts. We apply rCLAMPS to homeodomains, the second largest family of TFs in metazoans and show that it learns a highly effective recognition code that can predict de novo DNA-binding specificities for TFs. Furthermore, we show that the inferred amino acid-nucleotide contacts reveal whether and how nucleotide preferences at individual binding site positions are altered by mutations within TFs. Our approach is an important step toward automatically uncovering the determinants of protein-DNA specificity from large compendia of DNA-binding specificities and inferring the altered functionalities of TFs mutated in disease.
Affiliation(s)
- Joshua L Wetzel
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
| | - Kaiqian Zhang
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
| | - Mona Singh
- Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
|
43
|
Liu J, Zhou D. Minimum Functional Length Analysis of K-Mer Based on BPNN. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2920-2925. [PMID: 34310316 DOI: 10.1109/tcbb.2021.3098512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
The BP neural network (BPNN), a multilayer feed-forward network, can achieve deep cognition of target data and highly accurate outputs. However, no BPNN-based study of k-mers has yet been reported. In the present study, BPNN was used to train and test binary classification data under each classification mode. All k-mers were divided into two categories according to either the X + Y content or a completely random mode. Results showed that 1) for the X + Y content classification mode, the accuracy of k-mer classification was 100 percent, whether k ≤ 6 or k ≥ 7; 2) for the completely random classification mode, the accuracy was 100 percent for k-mers of k ≤ 6, but for k-mers of k ≥ 7 the accuracy was less than 100 percent and gradually decreased toward 50 percent as k increased. The k-mers of k ≥ 7 should be the basic functional fragments of nucleic acid, performing basic nucleic acid functions in the DNA sequence. The k-mers of k ≤ 6 should be the basic component fragments of nucleic acid, which no longer perform basic nucleic acid functions.
|
44
|
Controlling gene expression with deep generative design of regulatory DNA. Nat Commun 2022; 13:5099. [PMID: 36042233 PMCID: PMC9427793 DOI: 10.1038/s41467-022-32818-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Accepted: 08/18/2022] [Indexed: 11/25/2022] Open
Abstract
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Using mutagenesis typically requires screening sizable random DNA libraries, which limits the designs to span merely a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GAN) by learning directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels spanning the whole gene regulatory structure including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly-expressed synthetic sequences surpass the expression levels of highly-expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue. Here the authors present ExpressionGAN, a generative adversarial network that uses genomic and transcriptomic data to generate regulatory sequences.
|
45
|
Barissi S, Sala A, Wieczór M, Battistini F, Orozco M. DNAffinity: a machine-learning approach to predict DNA binding affinities of transcription factors. Nucleic Acids Res 2022; 50:9105-9114. [PMID: 36018808 PMCID: PMC9458447 DOI: 10.1093/nar/gkac708] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2022] [Revised: 07/21/2022] [Accepted: 08/08/2022] [Indexed: 12/24/2022] Open
Abstract
We present a physics-based machine learning approach to predict in vitro transcription factor binding affinities from structural and mechanical DNA properties directly derived from atomistic molecular dynamics simulations. The method is able to predict affinities obtained with techniques as different as uPBM, gcPBM and HT-SELEX with an excellent performance, much better than existing algorithms. Due to its nature, the method can be extended to epigenetic variants, mismatches, mutations, or any non-coding nucleobases. When complemented with chromatin structure information, our in vitro trained method also provides good estimates of in vivo binding sites in yeast.
Affiliation(s)
| | | | - Miłosz Wieczór
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10–12, 08028 Barcelona, Spain; Department of Physical Chemistry, Gdansk University of Technology, 80-233 Gdańsk, Poland
| | | | - Modesto Orozco
- Correspondence may also be addressed to Modesto Orozco. Tel: +34 934 037 156;
|
46
|
Abstract
The human genome carries a vast amount of information within its DNA sequences. The chemical bases A, T, C, and G are the basic units of information content, that are arranged into patterns and codes. Expansive areas of the genome contain codes that are not yet well understood. To decipher these, mathematical and computational tools are applied here to study genomic signatures or general designs of sequences. A novel binary components analysis is devised and utilized. This seeks to isolate the physical and chemical properties of DNA bases, which reveals sequence design and function. Here, information theory tools break down the information content within DNA bases, in order to study them in isolation for their genomic signatures and non-random properties. In this way, the RY (purine/pyrimidine), WS (weak/strong), and KM (keto/amino) general designs are observed in the sequences. The results show that RY, KM, and WS components have a similar and stable overall profile across all human chromosomes. It reveals that the RY property of a sequence is most distant from randomness in the human genome with respect to the genomic signatures. This is true across all human chromosomes. It is concluded that there exists a widespread potential RY code, and furthermore, that this is likely a structural code. Ascertaining this feature of general design, and potential RY structural code has far-reaching implications. This is because it aids in the understanding of cell biology, growth, and development, as well as downstream in the study of human disease and potential drug design.
|
47
|
Antikainen AA, Heinonen M, Lähdesmäki H. Modeling binding specificities of transcription factor pairs with random forests. BMC Bioinformatics 2022; 23:212. [PMID: 35659235 PMCID: PMC9166390 DOI: 10.1186/s12859-022-04734-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2022] [Accepted: 05/12/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Transcription factors (TFs) bind regulatory DNA regions with sequence specificity, form complexes and regulate gene expression. In cooperative TF-TF binding, two transcription factors bind onto a shared DNA binding site as a pair. Previous work has demonstrated pairwise TF-TF-DNA interactions with position weight matrices (PWMs), which may however not sufficiently take into account the complexity and flexibility of pairwise binding.
Results
We propose two random forest (RF) methods for joint TF-TF binding site prediction. We train models with previously published large-scale CAP-SELEX DNA libraries, which comprise DNA sequences enriched for binding of a selected TF pair. The first method builds a random forest with sub-sequences selected from CAP-SELEX DNA reads with a previously proposed pairwise PWM. It outperforms (area under receiver operating characteristics curve, AUROC, 0.75) the current state-of-the-art method, i.e. orientation and spacing specific pairwise PWMs (AUROC 0.59). Thus, the first method may be utilized to improve prediction accuracy for pre-determined binding preferences. However, pairwise TF binding is currently considered flexible; a pair may bind DNA with different orientations and amounts of dinucleotide gaps or overlap between the two motifs. Thus, we developed the second method, which utilizes random forests by considering simultaneously multiple orientations and spacings of the two factors. This approach outperforms (AUROC 0.78) PWMs, as well as the first method (p<0.00195). The second method provides an approach for predicting TF-TF binding sites without prior knowledge on pairwise binding preferences. However, more research is needed to assess eligibility for practical applications.
Conclusions
Random forest is well suited for modeling pairwise TF-TF-DNA binding specificities, and provides an improvement to pairwise binding site prediction accuracy.
|
48
|
Krieger G, Lupo O, Wittkopp P, Barkai N. Evolution of transcription factor binding through sequence variations and turnover of binding sites. Genome Res 2022; 32:1099-1111. [PMID: 35618416 PMCID: PMC9248875 DOI: 10.1101/gr.276715.122] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2022] [Accepted: 05/20/2022] [Indexed: 01/08/2023]
Abstract
Variations in noncoding regulatory sequences play a central role in evolution. Interpreting such variations, however, remains difficult even in the context of defined attributes such as transcription factor (TF) binding sites. Here, we systematically link variations in cis-regulatory sequences to TF binding by profiling the allele-specific binding of 27 TFs expressed in a yeast hybrid, in which two related genomes are present within the same nucleus. TFs localize preferentially to sites containing their known consensus motifs but occupy only a small fraction of the motif-containing sites available within the genomes. Differential binding of TFs to the orthologous alleles was well explained by variations that alter motif sequence, whereas differences in chromatin accessibility between alleles were of little apparent effect. Motif variations that abolished binding when present in only one allele were still bound when present in both alleles, suggesting evolutionary compensation, with a potential role for sequence conservation at the motif's vicinity. At the level of the full promoter, we identify cases of binding-site turnover, in which binding sites are reciprocally gained and lost, yet most interspecific differences remained uncompensated. Our results show the flexibility of TFs to bind imprecise motifs and the fast evolution of TF binding sites between related species.
Affiliation(s)
- Gat Krieger
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Offir Lupo
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Patricia Wittkopp
- Department of Ecology and Evolutionary Biology, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
|
49
|
Quan L, Sun X, Wu J, Mei J, Huang L, He R, Nie L, Chen Y, Lyu Q. Learning Useful Representations of DNA Sequences From ChIP-Seq Datasets for Exploring Transcription Factor Binding Specificities. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:998-1008. [PMID: 32976105 DOI: 10.1109/tcbb.2020.3026787] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Deep learning has been successfully applied to surprisingly different domains. Researchers and practitioners are employing trained deep learning models to enrich our knowledge. Transcription factors (TFs) are essential for regulating gene expression in all organisms by binding to specific DNA sequences. Here, we designed a deep learning model named SemanticCS (Semantic ChIP-seq) to predict TF binding specificities. We trained our learning model on an ensemble of ChIP-seq datasets (Multi-TF-cell) to learn useful intermediate features across multiple TFs and cells. To interpret these feature vectors, visualization analysis was used. Our results indicate that these learned representations can be used to train shallow machines for other tasks. Using diverse experimental data and evaluation metrics, we show that SemanticCS outperforms other popular methods. In addition, from experimental data, SemanticCS can help to identify the substitutions that cause regulatory abnormalities and to evaluate the effect of substitutions on the binding affinity for the RXR transcription factor. The online server for SemanticCS is freely available at http://qianglab.scst.suda.edu.cn/semanticCS/.
|
50
|
Zhang Y, Wang Z, Zeng Y, Liu Y, Xiong S, Wang M, Zhou J, Zou Q. A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape. Brief Bioinform 2021; 23:6470969. [PMID: 34929739 DOI: 10.1093/bib/bbab525] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Revised: 10/28/2021] [Accepted: 11/13/2021] [Indexed: 12/17/2022] Open
Abstract
The discovery of putative transcription factor binding sites (TFBSs) is important for understanding the underlying binding mechanism and cellular functions. Recently, many computational methods have been proposed to jointly account for DNA sequence and shape properties in TFBS prediction. However, these methods fail to fully utilize the latent features derived from both sequence and shape profiles and have limitations in interpretability and knowledge discovery. To this end, we present a novel Deep Convolution Attention network combining Sequence and Shape, dubbed D-SSCA, for precisely predicting putative TFBSs. Experiments conducted on 165 ENCODE ChIP-seq datasets reveal that D-SSCA significantly outperforms several state-of-the-art methods in predicting TFBSs, and justify the utility of the channel attention module for feature refinement. Besides, a thorough analysis of the contribution of five shapes to TFBS prediction demonstrates that shape features can improve the predictive power for transcription factor-DNA binding. Furthermore, D-SSCA can realize the cross-cell line prediction of TFBSs, indicating the presence of common interplay patterns concerning both sequence and shape across various cell lines. The source code of D-SSCA can be found at https://github.com/MoonLord0525/.
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, 611731, Chengdu, China
| | - Zixuan Wang
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
| | - Yuanqi Zeng
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
| | - Yuhang Liu
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
| | - Shuwen Xiong
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
| | - Maocheng Wang
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
| | - Jiliu Zhou
- School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610054, Chengdu, China
|