1
Gondalia N, Quiroz LF, Lai L, Singh AK, Khan M, Brychkova G, McKeown PC, Chatterjee M, Spillane C. Harnessing promoter elements to enhance gene editing in plants: perspectives and advances. PLANT BIOTECHNOLOGY JOURNAL 2025; 23:1375-1395. [PMID: 40013512 DOI: 10.1111/pbi.14533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 10/20/2024] [Accepted: 11/16/2024] [Indexed: 02/28/2025]
Abstract
Genome-edited plants, endowed with climate-smart traits, have been promoted as tools for strengthening resilience against climate change. Successful plant gene editing (GE) requires precise regulation of the GE machinery, a process controlled by promoters, which drive its transcription through interactions with transcription factors (TFs) and RNA polymerase. While constitutive promoters are extensively used in GE constructs, their limitations highlight the need for alternative approaches. This review emphasizes the promise of tissue/organ-specific as well as inducible promoters, which enable targeted GE in a spatiotemporal manner with no effects on other tissues. Advances in synthetic biology have paved the way for the creation of synthetic promoters, offering refined control over gene expression and augmenting the potential of plant GE. The integration of these novel promoters with synthetic systems presents significant opportunities for precise and conditional genome editing. Moreover, the advent of bioinformatic tools and artificial intelligence is revolutionizing the characterization of regulatory elements, enhancing our understanding of their roles in plants. Thus, this review provides novel insights into the strategic use of promoters and promoter editing to enhance the precision, efficiency and specificity of plant GE, setting the stage for innovative crop improvement strategies.
Affiliation(s)
- Nikita Gondalia
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Luis Felipe Quiroz
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Linyi Lai
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Avinash Kumar Singh
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Moman Khan
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Galina Brychkova
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Peter C McKeown
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Manash Chatterjee
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Viridian Seeds Ltd., Cambridge, UK
- Charles Spillane
- Agriculture, Food Systems and Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
2
Nayak N, Mehrotra S, Karamchandani AN, Santelia D, Mehrotra R. Recent advances in designing synthetic plant regulatory modules. FRONTIERS IN PLANT SCIENCE 2025; 16:1567659. [PMID: 40241826 PMCID: PMC11999978 DOI: 10.3389/fpls.2025.1567659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2025] [Accepted: 03/17/2025] [Indexed: 04/18/2025]
Abstract
Introducing novel functions in plants through synthetic multigene circuits requires strict transcriptional regulation. Currently, the use of natural regulatory modules in synthetic circuits is hindered by our limited knowledge of complex plant regulatory mechanisms, the paucity of characterized promoters, and the possibility of crosstalk with endogenous circuits. Synthetic regulatory modules can overcome these limitations. This article introduces an integrative de novo approach for designing plant synthetic promoters by utilizing the available online tools and databases. The recent achievements in designing and validating synthetic plant promoters, enhancers, transcription factors, and the challenges of establishing synthetic circuits in plants are also discussed.
Affiliation(s)
- Namitha Nayak
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
- Sandhya Mehrotra
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
- Diana Santelia
- Institute of Integrative Biology, ETH Zürich Universitätstrasse, Zürich, Switzerland
- Rajesh Mehrotra
- Department of Biological Sciences, Birla Institute of Technology and Sciences Pilani, Goa, India
3
Gummadi ASC, Muppa DK, Yella VR. Dissecting non-B DNA structural motifs in untranslated regions of eukaryotic genomes. Genomics Inform 2024; 22:25. [PMID: 39605082 PMCID: PMC11603647 DOI: 10.1186/s44342-024-00028-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Accepted: 11/01/2024] [Indexed: 11/29/2024]
Abstract
The untranslated regions (UTRs) of genes significantly impact various biological processes, including transcription, posttranscriptional control, mRNA stability, localization, and translation efficiency. In functional areas of genomes, non-B DNA structures such as cruciform, curved, triplex, G-quadruplex, and Z-DNA structures are common and have an impact on cellular physiology. Although the role of these structures in cis-regulatory regions such as promoters is well established in eukaryotic genomes, their prevalence within UTRs across different eukaryotic classes has not been extensively documented. Our study investigated the prevalence of various non-B DNA motifs within the 5' and 3' UTRs across diverse eukaryotic species. Our comparative analysis encompassed the 5'-UTRs and 3'UTRs of 360 species representing diverse eukaryotic domains of life, including Arthropoda (Diptera, Hemiptera, and Hymenoptera), Chordata (Artiodactyla, Carnivora, Galliformes, Passeriformes, Primates, Rodentia, Squamata, Testudines), Magnoliophyta (Brassicales), Fabales (Poales), and Nematoda (Rhabditida), on the basis of datasets derived from the UTRdb. We observed that species belonging to taxonomic orders such as Rhabditida, Diptera, Brassicales, and Hemiptera present a prevalence of curved DNA motifs in their UTRs, whereas orders such as Testudines, Galliformes, and Rodentia present a preponderance of G-quadruplexes in both UTRs. The distribution of motifs is conserved across different taxonomic classes, although species-specific variations in motif preferences were also observed. Our research unequivocally illuminates the prevalence and potential functional implications of non-B DNA motifs, offering invaluable insights into the evolutionary and biological significance of these structures.
Affiliation(s)
- Aruna Sesha Chandrika Gummadi
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522302, India
- Divya Kumari Muppa
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522302, India
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522302, India.
4
Zhang H, Cao D, Chen Z, Zhang X, Chen Y, Sessions C, Cruchaga C, Payne P, Li G, Province M, Li F. mosGraphGen: a novel tool to generate multi-omics signaling graphs to facilitate integrative and interpretable graph AI model development. BIOINFORMATICS ADVANCES 2024; 4:vbae151. [PMID: 39506989 PMCID: PMC11540438 DOI: 10.1093/bioadv/vbae151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Revised: 08/22/2024] [Accepted: 10/04/2024] [Indexed: 11/08/2024]
Abstract
Motivation: Multi-omics data, i.e. genomics, epigenomics, transcriptomics, proteomics, characterize cellular complex signaling systems from multi-level and multi-view and provide a holistic view of complex cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data for mining critical biomarkers. Graph AI models have been widely used to analyze graph-structure datasets, and are ideal for integrative multi-omics data analysis because they can naturally integrate and represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret multi-omics data via graph node and edge ranking analysis. Nevertheless, it is nontrivial for graph-AI model developers to pre-analyze multi-omics data and convert the data into biologically meaningful graphs, which can be directly fed into graph-AI models.
Results: To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), generating Multi-omics Signaling graphs (mos-graph) of individual samples by mapping multi-omics data onto a biologically meaningful multi-level background signaling network with data normalization by aggregating measurements and aligning to the reference genome. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. In the results, mosGraphGen was used and illustrated using two widely used multi-omics datasets of The Cancer Genome Atlas (TCGA) and Alzheimer's disease (AD) samples.
Availability and implementation: The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/FuhaiLiAiLab/mosGraphGen.
Affiliation(s)
- Heming Zhang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Dekang Cao
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Zirui Chen
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Xiuyuan Zhang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Yixin Chen
- Department of Computer Science and Engineering, Washington University in St. Louis, Saint Louis, MO 63130, United States
- Cole Sessions
- Department of Pediatrics, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- NeuroGenomics and Informatics Center, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Philip Payne
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Guangfu Li
- Department of Surgery, School of Medicine, University of Connecticut, Farmington, CT 06030, United States
- Michael Province
- Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Fuhai Li
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- Department of Pediatrics, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
- NeuroGenomics and Informatics Center, Washington University School of Medicine, Saint Louis, MO 63110-1010, United States
5
Zhang H, Cao D, Chen Z, Zhang X, Chen Y, Sessions C, Cruchaga C, Payne P, Li G, Province M, Li F. mosGraphGen: a novel tool to generate multi-omics signaling graphs to facilitate integrative and interpretable graph AI model development. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.15.594360. [PMID: 38798349 PMCID: PMC11118290 DOI: 10.1101/2024.05.15.594360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Multi-omics data, i.e., genomics, epigenomics, transcriptomics, proteomics, characterize cellular complex signaling systems from multi-level and multi-view and provide a holistic view of complex cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data for mining key disease targets and signaling pathways. Graph AI models have been widely used to analyze graph-structure datasets, and are ideal for integrative multi-omics data analysis because they can naturally integrate and represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret multi-omics data via graph node and edge ranking analysis. However, it is non-trivial for graph-AI model developers to pre-analyze multi-omics data and convert the data into biologically meaningful graphs, which can be directly fed into graph-AI models. To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), generating Multi-omics Signaling graphs (mos-graph) of individual samples by mapping multi-omics data onto a biologically meaningful multi-level background signaling network with data normalization by aggregating measurements and aligning to the reference genome. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. In the results, mosGraphGen was used and illustrated using two widely used multi-omics datasets of TCGA and Alzheimer's disease (AD) samples. The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/FuhaiLiAiLab/mosGraphGen.
Affiliation(s)
- Heming Zhang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Dekang Cao
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Zirui Chen
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Xiuyuan Zhang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Yixin Chen
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Cole Sessions
- Department of Pediatrics, Washington University in St. Louis, St. Louis, MO, USA
- Carlos Cruchaga
- Department of Psychiatry, Washington University in St. Louis, St. Louis, MO, USA
- Hope Center for Neurological Disorders, Washington University in St. Louis, St. Louis, MO, USA
- NeuroGenomics and Informatics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Philip Payne
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Guangfu Li
- Department of Surgery, School of Medicine, University of Connecticut, CT, 06032, USA
- Michael Province
- Division of Statistical Genomics, Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA
- Fuhai Li
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Department of Pediatrics, Washington University in St. Louis, St. Louis, MO, USA
- NeuroGenomics and Informatics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
6
Uemura K, Ohyama T. Distinctive physical properties of DNA shared by RNA polymerase II gene promoters and 5'-flanking regions of tRNA genes. J Biochem 2024; 175:395-404. [PMID: 38102732 PMCID: PMC11005993 DOI: 10.1093/jb/mvad111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 10/30/2023] [Accepted: 11/26/2023] [Indexed: 12/17/2023]
Abstract
Numerous noncoding (nc)RNAs have been identified. Similar to the transcription of protein-coding (mRNA) genes, long noncoding (lnc)RNA genes and most of micro (mi)RNA genes are transcribed by RNA polymerase II (Pol II). In the transcription of mRNA genes, core promoters play an indispensable role; they support the assembly of the preinitiation complex (PIC). However, the structural and/or physical properties of the core promoters of lncRNA and miRNA genes remain largely unexplored, in contrast with those of mRNA genes. Using the core promoters of human genes, we analyzed the repertoire and population ratios of residing core promoter elements (CPEs) and calculated the following five DNA physical properties (DPPs): duplex DNA free energy, base stacking energy, protein-induced deformability, rigidity and stabilizing energy of Z-DNA. Here, we show that their CPE and DPP profiles are similar to those of mRNA gene promoters. Importantly, the core promoters of these three classes of genes have two highly distinctive sites in their DPP profiles around the TSS and position -27. Similar characteristics in DPPs are also found in the 5'-flanking regions of tRNA genes, indicating their common essential roles in transcription initiation over the kingdom of RNA polymerases.
Affiliation(s)
- Kohei Uemura
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Takashi Ohyama
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Department of Biology, Faculty of Education and Integrated Arts and Sciences, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
7
Manu YA, Abduljalal A, Rabiu MB, Lawal RD, Saleh J, Safiyanu M. Identification of putative promoter elements for epsilon glutathione s-transferases genes associated with resistance to DDT in the malaria vector mosquito anopheles arabiensis. SCIENTIFIC AFRICAN 2024; 23:None. [PMID: 38445294 PMCID: PMC10911095 DOI: 10.1016/j.sciaf.2023.e02047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 11/27/2023] [Accepted: 12/20/2023] [Indexed: 03/07/2024]
Abstract
The purpose of this study was to identify the putative regulatory elements in the promoter region of An. arabiensis strains that differed in susceptibility to DDT and to compare them with those identified in its sibling species An. gambiae. The basal expression level of the Epsilon class GST (glutathione S-transferase) gene GSTe1 was 0.512 - 0.658 (95% CI) and that of GSTe2 was 0.672 - 1.204 (95% CI) in adults of the DDT-resistant KGB strain, compared to 0.031 - 0.04 (95% CI) and 0.148 - 0.199 (95% CI), respectively, in the susceptible MAT strain of An. arabiensis. Induced mean expression of GSTe2 in larvae exposed to DDT for one hour was 0.901 - 1.172 (95% CI) in the KGB strain and 0.475 - 0.724 (95% CI) in the MAT strain. In the present work, strain-specific primers were used to amplify and sequence the promoter regions of GSTe1 and GSTe2 in the KGB, MAT and field specimens. Computational analysis revealed the presence of the classical arthropod initiator sequence TCAGT and putative core promoter elements (GC, CAAT and TATA boxes). A typical TATA box was identified 35 bp upstream of the transcription start site (TSS) in GSTe1 but was absent in GSTe2. Several binding sites for regulatory elements downstream and multiple polymorphic sites were identified between strains. The role of these regulatory elements in transcription of these genes has not been determined. However, on comparison, the 2 bp adenosine indel (insertion/deletion) that was essential in driving promoter activity in An. gambiae was identified only in the DDT-resistant KGB strain.
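As an illustration of the kind of motif scan this abstract describes, the following is a minimal sketch, not the authors' pipeline. It reports where the initiator sequence TCAGT and a TATA-like motif occur relative to a known TSS. The function name `scan_promoter`, the simplified pattern `TATA[AT]A`, and the toy sequence are all assumptions made for illustration.

```python
import re

# Initiator sequence named in the abstract.
INR = re.compile(r"TCAGT")
# Simplified TATA-like pattern; an assumption, not the study's definition.
TATA = re.compile(r"TATA[AT]A")


def scan_promoter(seq, tss_index):
    """Return (motif_name, offset) pairs sorted by offset.

    offset is the motif start position minus tss_index, so negative
    values lie upstream of the TSS. Only the given strand is scanned.
    """
    seq = seq.upper()
    hits = []
    for name, pattern in (("Inr", INR), ("TATA", TATA)):
        for match in pattern.finditer(seq):
            hits.append((name, match.start() - tss_index))
    return sorted(hits, key=lambda hit: hit[1])


# Toy sequence (hypothetical): a TATA-like motif placed 35 bp upstream
# of an initiator whose first base is taken as the TSS.
toy = "G" * 10 + "TATAAA" + "C" * 29 + "TCAGT" + "G" * 10
print(scan_promoter(toy, tss_index=45))
```

A real analysis would also scan the reverse strand and use position weight matrices rather than exact patterns; the sketch only shows the coordinate bookkeeping.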
Affiliation(s)
- Ado Abduljalal
- Centre for Infectious Disease Research, Bayero University, Kano
- Mahmud Safiyanu
- Department of Biochemistry, Yusuf Maitama Sule University, Kano
8
Uemura K, Ohyama T. Physical Peculiarity of Two Sites in Human Promoters: Universality and Diverse Usage in Gene Function. Int J Mol Sci 2024; 25:1487. [PMID: 38338773 PMCID: PMC10855393 DOI: 10.3390/ijms25031487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/15/2024] [Accepted: 01/18/2024] [Indexed: 02/12/2024]
Abstract
Since the discovery of physical peculiarities around transcription start sites (TSSs) and a site corresponding to the TATA box, research has revealed only the average features of these sites. Unsettled enigmas include the individual genes with these features and whether they relate to gene function. Herein, using 10 physical properties of DNA, including duplex DNA free energy, base stacking energy, protein-induced deformability, and stabilizing energy of Z-DNA, we clarified for the first time that approximately 97% of the promoters of 21,056 human protein-coding genes have distinctive physical properties around the TSS and/or position -27; of these, nearly 65% exhibited such properties at both sites. Furthermore, about 55% of the 21,056 genes had a minimum value of regional duplex DNA free energy within TSS-centered ±300 bp regions. Notably, distinctive physical properties within the promoters and free energies of the surrounding regions separated human protein-coding genes into five groups; each contained specific gene ontology (GO) terms. The group represented by immune response genes differed distinctly from the other four regarding the parameter of the free energies of the surrounding regions. A vital suggestion from this study is that physical-feature-based analyses of genomes may reveal new aspects of the organization and regulation of genes.
Affiliation(s)
- Kohei Uemura
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Takashi Ohyama
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Department of Biology, Faculty of Education and Integrated Arts and Sciences, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
9
Yella VR, Vanaja A. Computational analysis on the dissemination of non-B DNA structural motifs in promoter regions of 1180 cellular genomes. Biochimie 2023; 214:101-111. [PMID: 37311475 DOI: 10.1016/j.biochi.2023.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 05/05/2023] [Accepted: 06/05/2023] [Indexed: 06/15/2023]
Abstract
The promoter regions of gene regulation are under evolutionary constraints and earlier studies uncovered that they are characterized by enrichment of functional non-B DNA structural signatures like curved DNA, cruciform DNA, G-quadruplex, triple-helical DNA, slipped DNA structures, and Z-DNA. However, these studies are restricted to a few model organisms, single non-B DNA motif types, or whole genomic sequences, and their comparative accumulation in promoter regions of different domains of life has not been reported comprehensively. In this study, for the first time, we investigated the preponderance of non-B DNA-prone motifs in promoter regions in 1180 genomes belonging to 28 taxonomic groups using the non-B DNA Motif Search Tool (nBMST). The trends suggest that they are predominant in promoters compared to the upstream and downstream regions of all three domains of life and variably linked to taxonomic groups. Cruciform DNA motif is the most abundant form of non-B DNA, spanning from archaea to lower eukaryotes. Curved DNA motifs are prominent in host-associated bacteria, and suppressed in mammals. Triplex-DNA and slipped DNA structure repeats are discretely dispersed in all lineages. G-quadruplex motifs are significantly enriched in mammals. We also observed that the unique enrichment of non-B DNA in promoters is strongly linked to genome GC, size, evolutionary time divergence, and ecological adaptations. Overall, our work systematically reports the unique non-B DNA structural landscape of cellular organisms from the perspective of the cis-regulatory code of genomes.
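To make "G-quadruplex motif" concrete, here is a minimal sketch of a sequence-pattern search. It uses a widely used simplified definition (four runs of at least three guanines separated by loops of 1 to 7 nucleotides), which is an assumption here and is not claimed to be the rule implemented by nBMST. The function name `find_g4_motifs` is hypothetical.

```python
import re

# Simplified G4 pattern: four G-runs (>= 3 G) joined by loops of 1-7 nt.
# The C-rich pattern catches the same motif on the opposite strand.
G4_PLUS = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
G4_MINUS = re.compile(r"(?:C{3,}[ACGT]{1,7}){3}C{3,}")


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples for putative G4 motifs.

    Coordinates are 0-based, end-exclusive. Matches are non-overlapping
    within each strand, as produced by re.finditer.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


# Toy example: a telomere-like repeat flanked by two bases on each side.
print(find_g4_motifs("AT" + "GGGTTAGGGTTAGGGTTAGGG" + "AT"))
```

Counting such hits in promoter windows versus flanking windows would give the kind of enrichment comparison the abstract reports, although the other motif classes (curved, cruciform, triplex, slipped, Z-DNA) each need their own definitions.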
Affiliation(s)
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, 522302, Andhra Pradesh, India.
- Akkinepally Vanaja
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, 522302, Andhra Pradesh, India; KL College of Pharmacy, Koneru Lakshmaiah Education Foundation, Guntur, 522302, Andhra Pradesh, India
10
Bubnova AN, Yakovleva IV, Korotkov EV, Kamionskaya AM. In Silico Verification of Predicted Potential Promoter Sequences in the Rice (Oryza sativa) Genome. PLANTS (BASEL, SWITZERLAND) 2023; 12:3573. [PMID: 37896036 PMCID: PMC10609952 DOI: 10.3390/plants12203573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 10/09/2023] [Accepted: 10/12/2023] [Indexed: 10/29/2023]
Abstract
The exact identification of promoter sequences remains a serious problem in computational biology, as the promoter prediction algorithms under development continue to produce false-positive results. Therefore, to fully assess the validity of predicted sequences, it is necessary to perform a comprehensive test of their properties, such as the presence of downstream transcribed DNA regions behind them, or chromatin accessibility for transcription factor binding. In this paper, we examined the promoter sequences of chromosome 1 of the rice Oryza sativa genome from the Database of Potential Promoter Sequences predicted using a mathematical algorithm based on the derivation and calculation of statistically significant promoter classes. In this paper TATA motifs and cis-regulatory elements were identified in the predicted promoter sequences. We also verified the presence of potential transcription start sites near the predicted promoters by analyzing CAGE-seq data. We searched for unannotated transcripts behind the predicted sequences by de novo assembling transcripts from RNA-seq data. We also examined chromatin accessibility in the region of the predicted promoters by analyzing ATAC-seq data. As a result of this work, we identified the predicted sequences that are most likely to be promoters for further experimental validation in an in vivo or in vitro system.
Affiliation(s)
- Anastasiya N. Bubnova
- Federal State Institution Federal Research Centre «Fundamentals of Biotechnology», Russian Academy of Sciences, 119071 Moscow, Russia (A.M.K.)
11
Dey U, Olymon K, Banik A, Abbas E, Yella VR, Kumar A. DNA structural properties of DNA binding sites for 21 transcription factors in the mycobacterial genome. Front Cell Infect Microbiol 2023; 13:1147544. [PMID: 37396305 PMCID: PMC10312376 DOI: 10.3389/fcimb.2023.1147544] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2023] [Accepted: 05/19/2023] [Indexed: 07/04/2023]
Abstract
Mycobacterium tuberculosis, the causative agent of tuberculosis, has evolved over time into a multidrug resistance strain that poses a serious global pandemic health threat. The ability to survive and remain dormant within the host macrophage relies on multiple transcription factors contributing to virulence. To date, very limited structural insights from crystallographic and NMR studies are available for TFs and TF-DNA binding events. Understanding the role of DNA structure in TF binding is critical to deciphering MTB pathogenicity and has yet to be resolved at the genome scale. In this work, we analyzed the compositional and conformational preference of 21 mycobacterial TFs, evident at their DNA binding sites, in local and global scales. Results suggest that most TFs prefer binding to genomic regions characterized by unique DNA structural signatures, namely, high electrostatic potential, narrow minor grooves, high propeller twist, helical twist, intrinsic curvature, and DNA rigidity compared to the flanking sequences. Additionally, preference for specific trinucleotide motifs, with clear periodic signals of tetranucleotide motifs, are observed in the vicinity of the TF-DNA interactions. Altogether, our study reports nuanced DNA shape and structural preferences of 21 TFs.
Affiliation(s)
- Upalabdha Dey
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
- Kaushika Olymon
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
- Anikesh Banik
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
- Eshan Abbas
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, India
- Aditya Kumar
- Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, India
12
Abstract
Repetitive elements in the human genome, once considered 'junk DNA', are now known to adopt more than a dozen alternative (that is, non-B) DNA structures, such as self-annealed hairpins, left-handed Z-DNA, three-stranded triplexes (H-DNA) or four-stranded guanine quadruplex structures (G4 DNA). These dynamic conformations can act as functional genomic elements involved in DNA replication and transcription, chromatin organization and genome stability. In addition, recent studies have revealed a role for these alternative structures in triggering error-generating DNA repair processes, thereby actively enabling genome plasticity. As a driving force for genetic variation, non-B DNA structures thus contribute to both disease aetiology and evolution.
Collapse
Affiliation(s)
- Guliang Wang
  - Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Paediatric Research Institute, Austin, TX, USA
- Karen M Vasquez
  - Division of Pharmacology and Toxicology, College of Pharmacy, The University of Texas at Austin, Dell Paediatric Research Institute, Austin, TX, USA
|
13
|
Karagyaur M, Primak A, Efimenko A, Skryabina M, Tkachuk V. The Power of Gene Technologies: 1001 Ways to Create a Cell Model. Cells 2022; 11:cells11203235. [PMID: 36291103 PMCID: PMC9599997 DOI: 10.3390/cells11203235] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/05/2022] [Revised: 10/01/2022] [Accepted: 10/12/2022] [Indexed: 12/04/2022]
Abstract
Modern society faces many biomedical challenges that require urgent solutions. Two of the most important include the elucidation of mechanisms of socially significant diseases and the development of prospective drug treatments for these diseases. Experimental cell models are a convenient tool for addressing many of these problems. The power of cell models is further enhanced when combined with gene technologies, which allows the examination of even more subtle changes within the structure of the genome and permits testing of proteins in a native environment. The list and possibilities of these recently emerging technologies are truly colossal, which requires a rethink of a number of approaches for obtaining experimental cell models. In this review, we analyze the possibilities and limitations of promising gene technologies for obtaining cell models, and also give recommendations on the development and creation of relevant models. In our opinion, this review will be useful for novice cell biologists, as it provides some reference points in the rapidly growing universe of gene and cell technologies.
Affiliation(s)
- Maxim Karagyaur
  - Institute for Regenerative Medicine, Medical Research and Education Center, Lomonosov Moscow State University, 27/10, Lomonosovsky Ave., 119192 Moscow, Russia
  - Faculty of Medicine, Lomonosov Moscow State University, 27/1, Lomonosovsky Ave., 119192 Moscow, Russia
- Alexandra Primak
  - Faculty of Medicine, Lomonosov Moscow State University, 27/1, Lomonosovsky Ave., 119192 Moscow, Russia
- Anastasia Efimenko
  - Institute for Regenerative Medicine, Medical Research and Education Center, Lomonosov Moscow State University, 27/10, Lomonosovsky Ave., 119192 Moscow, Russia
  - Faculty of Medicine, Lomonosov Moscow State University, 27/1, Lomonosovsky Ave., 119192 Moscow, Russia
- Mariya Skryabina
  - Faculty of Medicine, Lomonosov Moscow State University, 27/1, Lomonosovsky Ave., 119192 Moscow, Russia
- Vsevolod Tkachuk
  - Institute for Regenerative Medicine, Medical Research and Education Center, Lomonosov Moscow State University, 27/10, Lomonosovsky Ave., 119192 Moscow, Russia
  - Faculty of Medicine, Lomonosov Moscow State University, 27/1, Lomonosovsky Ave., 119192 Moscow, Russia
|
14
|
iProm-Zea: A two-layer model to identify plant promoters and their types using convolutional neural network. Genomics 2022; 114:110384. [PMID: 35533969 DOI: 10.1016/j.ygeno.2022.110384] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/14/2022] [Revised: 04/18/2022] [Accepted: 05/02/2022] [Indexed: 01/14/2023]
Abstract
A promoter is a short DNA sequence near the start codon, responsible for initiating the transcription of a specific gene in the genome. The accurate recognition of promoters is important for achieving a better understanding of transcriptional regulation. Because of their importance in transcriptional regulation, there is an urgent need to develop in silico tools that identify promoters and their types in a timely and accurate manner. A number of prediction methods have been developed in this regard; however, almost all of them are used only for identifying promoters and their strength or sigma types. The TATA box region in TATA promoters influences post-transcriptional processes; therefore, in the current study, we developed a two-layer predictor called "iProm-Zea" using a convolutional neural network (CNN) to identify TATA and TATA-less promoters. The first layer identifies a given DNA sequence as a promoter or non-promoter. The second layer identifies whether the recognized promoter is a TATA promoter. To find an optimal feature encoding scheme and model, we tested four feature encoding schemes with different machine learning and CNN algorithms, and based on the evaluation results, we selected a one-hot encoding scheme and a CNN model for iProm-Zea. The 5-fold cross-validation results demonstrated that the constructed predictor showed great potential for identifying promoters and classifying them as TATA and TATA-less promoters. Furthermore, we performed cross-species analysis of iProm-Zea to evaluate its performance in other species. Moreover, to make it easier for experimental scientists to obtain the results they need, we established a freely accessible and user-friendly web server at http://nsclbio.jbnu.ac.kr/tools/iProm-Zea/.
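The one-hot encoding and the two-layer decision flow described in this abstract can be sketched as follows. This is not the iProm-Zea implementation: the channel order `ACGT`, the function names, and the two toy classifiers standing in for the trained CNNs are all assumptions made for illustration.

```python
from typing import Callable, List

Matrix = List[List[int]]

# Channel order is an assumption; the abstract does not specify it.
BASES = "ACGT"


def one_hot(seq: str) -> Matrix:
    """Encode a DNA sequence as a len(seq) x 4 matrix.

    Bases outside A/C/G/T (e.g. N) are encoded as an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def two_layer_predict(seq: str,
                      is_promoter: Callable[[Matrix], bool],
                      is_tata: Callable[[Matrix], bool]) -> str:
    """Layer 1 decides promoter vs non-promoter; layer 2 runs only on promoters."""
    x = one_hot(seq)
    if not is_promoter(x):
        return "non-promoter"
    return "TATA promoter" if is_tata(x) else "TATA-less promoter"


# Toy stand-ins for the two trained models (hypothetical, not real classifiers).
def toy_is_promoter(x: Matrix) -> bool:
    return len(x) >= 8


def toy_is_tata(x: Matrix) -> bool:
    tata = one_hot("TATA")
    return any(x[i:i + 4] == tata for i in range(len(x) - 3))


if __name__ == "__main__":
    for s in ("GGCTATAAAGG", "GGCGCGCGCC", "ACG"):
        print(s, "->", two_layer_predict(s, toy_is_promoter, toy_is_tata))
```

Passing the classifiers in as callables keeps the cascade logic separate from the models, so a trained network could replace either toy function without changing `two_layer_predict`.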
|