1
Farias P, Francisco R, Maccario L, Herschend J, Sørensen SJ, Morais PV. Metabolic response of tellurite resistant Bacillus altitudinis strain 3W19 highlights the potential as a model organism for bioremediation. Sci Rep 2025; 15:12745. [PMID: 40222993 PMCID: PMC11994780 DOI: 10.1038/s41598-025-95321-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2024] [Accepted: 03/20/2025] [Indexed: 04/15/2025] Open
Abstract
Contaminated environments can pose new challenges when new contaminants appear and can select organisms with new genetic and metabolic strategies. The increased presence of Te(IV) in the environment is becoming more important. This highlights how underexplored the molecular response of bacteria to less common environmental contaminants, such as tellurite, is when compared to other metals/metalloids. Understanding what tools an organism uses from its genetic pool when responding to a new contaminant requires a multiple-technique approach, such as metabolic tests and differential omics analysis. These analyses provide a full metabolic and phenotypical map of stress response that can include new resistance mechanisms, whether specific or not. This study aimed to determine if Bacillus altitudinis strain 3W19, isolated from a Te(IV) contaminated site, presents specific changes at the proteomic level when exposed to the metalloid. In strain 3W19, growth in the presence of Te(IV) upregulated pathways of amino acid metabolism and membrane transport and downregulated pathways of carbohydrate metabolism. Growth in the presence of Te(IV) also induced the formation of reactive oxygen species and lowered the metabolic activity of the strain. This metal led to the overexpression of the proteins of the ter gene cluster. When compared with other strains, the ter system identified in this strain differed in genomic organization from related Bacillus sp. strains. Together, these strain-specificities can contribute to understanding its Te(IV) resistance phenotype.
Affiliation(s)
- Pedro Farias
- University of Coimbra, CEMMPRE, ARISE, Department of Life Sciences, 3000-456, Coimbra, Portugal
- Romeu Francisco
- University of Coimbra, CEMMPRE, ARISE, Department of Life Sciences, 3000-456, Coimbra, Portugal
- Lorrie Maccario
- Department of Biology, Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Jakob Herschend
- Department of Biology, Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Søren J Sørensen
- Department of Biology, Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Paula V Morais
- University of Coimbra, CEMMPRE, ARISE, Department of Life Sciences, 3000-456, Coimbra, Portugal
2
Okay S. Fine-Tuning Gene Expression in Bacteria by Synthetic Promoters. Methods Mol Biol 2024; 2844:179-195. [PMID: 39068340 DOI: 10.1007/978-1-0716-4063-0_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Promoters are key genetic elements in the initiation and regulation of gene expression. A limited number of natural promoters has been described for the control of gene expression in synthetic biology applications. Therefore, synthetic promoters have been developed to fine-tune the transcription for the desired amount of gene product. Mostly, synthetic promoters are characterized using promoter libraries that are constructed via mutagenesis of promoter sequences. The strength of promoters in the library is determined according to the expression of a reporter gene such as gfp encoding green fluorescent protein. Gene expression can be controlled using inducers. The majority of the studies on gram-negative bacteria are conducted using the expression system of the model organism Escherichia coli while that of the model organism Bacillus subtilis is mostly used in the studies on gram-positive bacteria. Additionally, synthetic promoters for the cyanobacteria, which are phototrophic microorganisms, are evaluated, especially using the model cyanobacterium Synechocystis sp. PCC 6803. Moreover, a variety of algorithms based on machine learning methods were developed to characterize the features of promoter elements. Some of these in silico models were verified using in vitro or in vivo experiments. Identification of novel synthetic promoters with improved features compared to natural ones contributes much to the synthetic biology approaches in terms of fine-tuning gene expression.
Affiliation(s)
- Sezer Okay
- Department of Vaccine Technology, Vaccine Institute, Hacettepe University, Ankara, Türkiye
3
Malhotra H, Saha BK, Phale PS. Development of efficient modules for recombinant protein expression and periplasmic localisation in Pseudomonas bharatica CSV86T. Protein Expr Purif 2023; 210:106310. [PMID: 37211150 DOI: 10.1016/j.pep.2023.106310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 05/17/2023] [Accepted: 05/18/2023] [Indexed: 05/23/2023]
Abstract
Escherichia coli has been widely employed as a host for heterologous protein expression. However, due to certain limitations, alternative hosts like Pseudomonas, Lactococcus and Bacillus are being explored. Pseudomonas bharatica CSV86T, a novel soil isolate, preferentially degrades a wide range of aromatics over simple carbon sources like glucose and glycerol. The strain also possesses advantageous eco-physiological traits, making it an ideal host for engineering xenobiotic degradation pathways, which necessitates the development of heterologous expression systems. Based on the efficient growth, short lag-phase and rapid metabolism of naphthalene, Pnah and Psal promoters (regulated by NahR) were selected for expression. Pnah was found to be strong and leaky as compared to Psal, using 1-naphthol 2-hydroxylase (1NH, ∼66 kDa) as reporter gene in strain CSV86T. The carbaryl hydrolase (CH, ∼72 kDa) from Pseudomonas sp. C5pp was expressed under Pnah in strain CSV86T and could successfully be translocated to the periplasm due to the presence of the Tmd + Sp sequence. The recombinant CH was purified from the periplasmic fraction and the kinetic characteristics were found to be similar to the native protein from strain C5pp. These results potentiate the suitability of P. bharatica CSV86T as a desirable host, while Pnah and the Tmd + Sp can be employed for overexpression and periplasmic localisation, respectively. Such tools find application in heterologous protein expression and metabolic engineering applications.
Affiliation(s)
- Harshit Malhotra
- Department of Biosciences and Bioengineering, Indian Institute of Technology-Bombay, Powai, Mumbai, 400076, India
- Braja Kishor Saha
- Department of Biosciences and Bioengineering, Indian Institute of Technology-Bombay, Powai, Mumbai, 400076, India
- Prashant S Phale
- Department of Biosciences and Bioengineering, Indian Institute of Technology-Bombay, Powai, Mumbai, 400076, India
4
Sharma D, Sharma K, Mishra A, Siwach P, Mittal A, Jayaram B. Molecular dynamics simulation-based trinucleotide and tetranucleotide level structural and energy characterization of the functional units of genomic DNA. Phys Chem Chem Phys 2023; 25:7323-7337. [PMID: 36825435 DOI: 10.1039/d2cp04820e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Genomes of most organisms on earth are written in a universal language of life, made up of four units - adenine (A), thymine (T), guanine (G), and cytosine (C), and understanding the way they are put together has been a great challenge to date. Multiple efforts have been made to annotate this wonderfully engineered string of DNA using different methods but they lack a universal character. In this article, we have investigated the structural and energetic profiles of both prokaryotes and eukaryotes by considering two essential genomic sites, viz., the transcription start sites (TSS) and exon-intron boundaries. We have characterized these sites by mapping the structural and energy features of DNA obtained from molecular dynamics simulations, which considers all possible trinucleotide and tetranucleotide steps. For DNA, these physicochemical properties show distinct signatures at the TSS and intron-exon boundaries. Our results firmly convey the idea that DNA uses the same dialect for prokaryotes and eukaryotes and that it is worth going beyond sequence-level analyses to physicochemical space to determine the functional destiny of DNA sequences.
Affiliation(s)
- Dinesh Sharma
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Kopal Sharma
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- Priyanka Siwach
- Department of Biotechnology, Chaudhary Devi Lal University, Sirsa, Haryana, India
- Aditya Mittal
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India
- B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, India; Department of Chemistry, Indian Institute of Technology, Delhi, India
5
Mai DHA, Nguyen LT, Lee EY. TSSNote-CyaPromBERT: Development of an integrated platform for highly accurate promoter prediction and visualization of Synechococcus sp. and Synechocystis sp. through a state-of-the-art natural language processing model BERT. Front Genet 2022; 13:1067562. [PMID: 36523764 PMCID: PMC9745317 DOI: 10.3389/fgene.2022.1067562] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/12/2022] [Accepted: 11/17/2022] [Indexed: 07/30/2023] Open
Abstract
Since the introduction of the first transformer model with a unique self-attention mechanism, natural language processing (NLP) models have attained state-of-the-art (SOTA) performance on various tasks. As DNA is the blueprint of life, it can be viewed as an unusual language, with its characteristic lexicon and grammar. Therefore, NLP models may provide insights into the meaning of the sequential structure of DNA. In the current study, we employed and compared the performance of popular SOTA NLP models (i.e., XLNET, BERT, and a variant DNABERT trained on the human genome) to predict and analyze the promoters in freshwater cyanobacterium Synechocystis sp. PCC 6803 and the fastest growing cyanobacterium Synechococcus elongatus sp. UTEX 2973. These freshwater cyanobacteria are promising hosts for phototrophically producing value-added compounds from CO2. Through a custom pipeline, promoters and non-promoters from Synechococcus elongatus sp. UTEX 2973 were used to train the model. The trained model achieved an AUROC score of 0.97 and F1 score of 0.92. During cross-validation with promoters from Synechocystis sp. PCC 6803, the model achieved an AUROC score of 0.96 and F1 score of 0.91. To increase accessibility, we developed an integrated platform (TSSNote-CyaPromBERT) to facilitate large dataset extraction, model training, and promoter prediction from public dRNA-seq datasets. Furthermore, various visualization tools have been incorporated to address the "black box" issue of deep learning and feature analysis. The learning transfer ability of large language models may help identify and analyze promoter regions for newly isolated strains with similar lineages.
6
Zhou S, Zheng J, Jia C. SPREAD: An ensemble predictor based on DNA autoencoder framework for discriminating promoters in Pseudomonas aeruginosa. Math Biosci Eng 2022; 19:13294-13305. [PMID: 36654047 DOI: 10.3934/mbe.2022622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Regulatory elements in DNA sequences, such as promoters, enhancers, terminators and so on, are essential for gene expression in physiological and pathological processes. A promoter is the specific DNA sequence that is located upstream of the coding gene and acts as the "switch" for gene transcriptional regulation. Lots of promoter predictors have been developed for different bacterial species, but only a few are designed for Pseudomonas aeruginosa, a widespread Gram-negative conditional pathogen in nature. In this work, an ensemble model named SPREAD is proposed for the recognition of promoters in Pseudomonas aeruginosa. In SPREAD, the DNA sequence autoencoder model LSTM is employed to extract potential sequence information, and the mean output probability value of CNN and RF is applied as the final prediction. Compared with G4PromFinder, the only state-of-the-art classifier for promoters in Pseudomonas aeruginosa, SPREAD improves the prediction performance significantly, with an accuracy of 0.98, recall of 0.98, precision of 0.98, specificity of 0.97 and F1-score of 0.98.
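The averaging step and the reported metrics can be illustrated with a minimal Python sketch. This is not the SPREAD implementation: the LSTM autoencoder features, the CNN and the RF are not modelled here, the probabilities are toy values, the function names `ensemble_mean` and `classification_metrics` are hypothetical, and the 0.5 decision threshold is an assumption that the abstract does not state.

```python
from typing import Dict, List, Sequence


def ensemble_mean(p_cnn: Sequence[float], p_rf: Sequence[float]) -> List[float]:
    """Average two classifiers' promoter probabilities, sample by sample."""
    if len(p_cnn) != len(p_rf):
        raise ValueError("both probability vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(p_cnn, p_rf)]


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy, recall, precision, specificity and F1 from probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0  # assumed threshold, see lead-in
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    f1_den = precision + recall
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": recall,
        "precision": precision,
        "specificity": ratio(tn, tn + fp),
        "f1": 2 * precision * recall / f1_den if f1_den else 0.0,
    }


if __name__ == "__main__":
    # Toy probabilities standing in for the outputs of the two base models.
    probs = ensemble_mean([0.9, 0.2, 0.6, 0.4], [0.7, 0.4, 0.2, 0.8])
    print(classification_metrics([1, 0, 1, 0], probs))
```

Averaging the two probability vectors before thresholding means a confident prediction from one base model can be offset by the other, which is the usual motivation for this kind of soft-voting ensemble.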
Affiliation(s)
- Shengming Zhou
- School of Science, Dalian Maritime University, Dalian 116026, China
- Jia Zheng
- School of Science, Dalian Maritime University, Dalian 116026, China
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
7
Genome-Wide Prediction of Transcription Start Sites in Conifers. Int J Mol Sci 2022; 23:1735. [PMID: 35163661 PMCID: PMC8836283 DOI: 10.3390/ijms23031735] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/30/2021] [Revised: 01/30/2022] [Accepted: 02/01/2022] [Indexed: 02/04/2023] Open
Abstract
The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Instead, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have been left out of the limelight and, therefore, have been poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in the four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some of the features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and the study of the regulatory regions to understand gene regulation in gymnosperms.
8
Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T, Webb GI, Zou Q, Coin LJM, Song J. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction. Brief Bioinform 2022; 23:6502561. [PMID: 35021193 PMCID: PMC8921625 DOI: 10.1093/bib/bbab551] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 11/30/2021] [Indexed: 01/13/2023] Open
Abstract
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date, benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to assist the research community to assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoter, 61 for eukaryotic promoter, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.
Affiliation(s)
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China (corresponding author)
- Geoffrey I Webb
- Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China (corresponding author)
- Lachlan J M Coin
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia (corresponding author)
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia (corresponding author)
9
Chevez-Guardado R, Peña-Castillo L. Promotech: a general tool for bacterial promoter recognition. Genome Biol 2021; 22:318. [PMID: 34789306 PMCID: PMC8597233 DOI: 10.1186/s13059-021-02514-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/31/2020] [Accepted: 10/11/2021] [Indexed: 12/14/2022] Open
Abstract
Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or a few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech's performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.
Affiliation(s)
- Ruben Chevez-Guardado
- Department of Computer Science, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada
- Lourdes Peña-Castillo
- Department of Computer Science, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada; Department of Biology, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada
10
Cazier AP, Blazeck J. Advances in promoter engineering: novel applications and predefined transcriptional control. Biotechnol J 2021; 16:e2100239. [PMID: 34351706 DOI: 10.1002/biot.202100239] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 05/06/2021] [Revised: 07/30/2021] [Accepted: 08/03/2021] [Indexed: 11/08/2022]
Abstract
Synthetic biology continues to progress by relying on more robust tools for transcriptional control, of which promoters are the most fundamental component. Numerous studies have sought to characterize promoter function, determine principles to guide their engineering, and create promoters with stronger expression or tailored inducible control. In this review, we will summarize promoter architecture and highlight recent advances in the field, focusing on the novel applications of inducible promoter design and engineering towards metabolic engineering and cellular therapeutic development. Additionally, we will highlight how the expansion of new machine learning techniques for modeling and engineering promoter sequences is enabling more accurate prediction of promoter characteristics.
Affiliation(s)
- Andrew P Cazier
- School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, 311 Ferst St. NW, Atlanta, Georgia, 30332, USA
- John Blazeck
- School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, 311 Ferst St. NW, Atlanta, Georgia, 30332, USA
11
Mishra A, Dhanda S, Siwach P, Aggarwal S, Jayaram B. A novel method SEProm for prokaryotic promoter prediction based on DNA structure and energetics. Bioinformatics 2020; 36:2375-2384. [PMID: 31909789 DOI: 10.1093/bioinformatics/btz941] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/24/2019] [Revised: 11/08/2019] [Accepted: 01/02/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Despite conservation in general architecture of promoters and protein-DNA interaction interface of RNA polymerases among various prokaryotes, identification of promoter regions in the whole genome sequences remains a daunting challenge. The available tools for promoter prediction do not seem to address the problem satisfactorily, apparently because the biochemical nature of promoter signals is yet to be understood fully. Using 28 structural and 3 energetic parameters, we found that prokaryotic promoter regions have a unique structural and energy state, quite distinct from that of coding regions and the information for this signature state is in-built in their sequences. We developed a novel promoter prediction tool from these 31 parameters using various statistical techniques. RESULTS: Here, we introduce SEProm, a novel tool that is developed by studying and utilizing the in-built structural and energy information of DNA sequences, which is applicable to all prokaryotes including archaea. Compared to five most recent, diverged and current best available tools, SEProm performs much better, predicting promoters with an 'F-value' of 82.04 and 'Precision' of 81.08. The next best 'F-value' was obtained with PromPredict (72.14) followed by BProm (68.37). On the basis of 'Precision' value, the next best 'Precision' was observed for Pepper (75.39) followed by PromPredict (72.01). SEProm maintained the lead even when comparison was done on two test organisms (not involved in training for SEProm). AVAILABILITY AND IMPLEMENTATION: The software is freely available with easy to follow instructions (www.scfbio-iitd.res.in/software/TSS_Predict.jsp). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Akhilesh Mishra
- Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India
- Sahil Dhanda
- Supercomputing Facility for Bioinformatics & Computational Biology
- Priyanka Siwach
- Supercomputing Facility for Bioinformatics & Computational Biology; Department of Biotechnology, Chaudhary Devi Lal University, Sirsa 125055, India
- Shruti Aggarwal
- Supercomputing Facility for Bioinformatics & Computational Biology
- B Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology; Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi 110016, India; Department of Chemistry, Indian Institute of Technology, New Delhi 110016, India
12
Romsdahl J, Blachowicz A, Chiang YM, Venkateswaran K, Wang CCC. Metabolomic Analysis of Aspergillus niger Isolated From the International Space Station Reveals Enhanced Production Levels of the Antioxidant Pyranonigrin A. Front Microbiol 2020; 11:931. [PMID: 32670208 PMCID: PMC7326050 DOI: 10.3389/fmicb.2020.00931] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/23/2020] [Accepted: 04/20/2020] [Indexed: 11/13/2022] Open
Abstract
Secondary metabolite (SM) production in Aspergillus niger JSC-093350089, isolated from the International Space Station (ISS), is reported, along with a comparison to the experimentally established strain ATCC 1015. The analysis revealed enhanced production levels of naphtho-γ-pyrones and therapeutically relevant SMs, including bicoumanigrin A, aurasperones A and B, and the antioxidant pyranonigrin A. Genetic variants that may be responsible for increased SM production levels in JSC-093350089 were identified. These findings include INDELs within the predicted promoter region of flbA, which encodes a developmental regulator that modulates pyranonigrin A production via regulation of Fum21. The pyranonigrin A biosynthetic gene cluster was confirmed in A. niger, which revealed the involvement of a previously undescribed gene, pyrE, in its biosynthesis. UVC sensitivity assays enabled characterization of pyranonigrin A as a UV resistance agent in the ISS isolate.
Affiliation(s)
- Jillian Romsdahl
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, United States
- Adriana Blachowicz
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, United States; Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, United States
- Yi-Ming Chiang
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, United States
- Kasthuri Venkateswaran
- Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, United States
- Clay C C Wang
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, United States; Department of Chemistry, Dornsife College of Letters, Arts, and Sciences, University of Southern California, Los Angeles, CA, United States
13
Freed E, Fenster J, Smolinski SL, Walker J, Henard CA, Gill R, Eckert CA. Building a genome engineering toolbox in nonmodel prokaryotic microbes. Biotechnol Bioeng 2018; 115:2120-2138. [PMID: 29750332 DOI: 10.1002/bit.26727] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/20/2017] [Revised: 04/02/2018] [Accepted: 03/10/2018] [Indexed: 12/26/2022]
Abstract
The realization of a sustainable bioeconomy requires our ability to understand and engineer complex design principles for the development of platform organisms capable of efficient conversion of cheap and sustainable feedstocks (e.g., sunlight, CO2, and nonfood biomass) into biofuels and bioproducts at sufficient titers and costs. For model microbes, such as Escherichia coli, advances in DNA reading and writing technologies are driving the adoption of new paradigms for engineering biological systems. Unfortunately, microbes with properties of interest for the utilization of cheap and renewable feedstocks, such as photosynthesis, autotrophic growth, and cellulose degradation, have very few, if any, genetic tools for metabolic engineering. Therefore, it is important to develop "design rules" for building a genetic toolbox for novel microbes. Here, we present an overview of our current understanding of these rules for the genetic manipulation of prokaryotic microbes and the available genetic tools to expand our ability to genetically engineer nonmodel systems.
Affiliation(s)
- Emily Freed
- National Renewable Energy Laboratory, Biosciences Center, Golden, CO; Renewable and Sustainable Energy Institute, University of Colorado, Boulder, CO
- Jacob Fenster
- Renewable and Sustainable Energy Institute, University of Colorado, Boulder, CO; Chemical and Biological Engineering, University of Colorado, Boulder, CO
- Julie Walker
- Renewable and Sustainable Energy Institute, University of Colorado, Boulder, CO
- Calvin A Henard
- National Renewable Energy Laboratory, National Bioenergy Center, Golden, CO
- Ryan Gill
- National Renewable Energy Laboratory, Biosciences Center, Golden, CO; Renewable and Sustainable Energy Institute, University of Colorado, Boulder, CO; Chemical and Biological Engineering, University of Colorado, Boulder, CO
- Carrie A Eckert
- National Renewable Energy Laboratory, Biosciences Center, Golden, CO; Renewable and Sustainable Energy Institute, University of Colorado, Boulder, CO

14
Ryasik A, Orlov M, Zykova E, Ermak T, Sorokin A. Bacterial promoter prediction: Selection of dynamic and static physical properties of DNA for reliable sequence classification. J Bioinform Comput Biol 2018; 16:1840003. [DOI: 10.1142/s0219720018400036] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
Predicting promoter activity of DNA fragment is an important task for computational biology. Approaches using physical properties of DNA to predict bacterial promoters have recently gained a lot of attention. To select an adequate set of physical properties for training a classifier, various characteristics of DNA molecule should be taken into consideration. Here, we present a systematic approach that allows us to select less correlated properties for classification by means of both correlation and cophenetic coefficients as well as concordance matrices. To prove this concept, we have developed the first classifier that uses not only sequence and static physical properties of DNA fragment, but also dynamic properties of DNA open states. Therefore, the best performing models with accuracy values up to 90% for all types of sequences were obtained. Furthermore, we have demonstrated that the classifier can serve as a reliable tool enabling promoter DNA fragments to be distinguished from promoter islands despite the similarity of their nucleotide sequences.
Affiliation(s)
- Artem Ryasik
- Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
- Mikhail Orlov
- Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
- Evgenia Zykova
- Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia
- Department of Applied Research Informatization, State Institute of Information Technologies and Telecommunications (SIIT&T Informika), per. Brusov 21 st.2, Moscow, 125009, Russia
- Timofei Ermak
- Laboratory of Molecular Genetics Systems, Institute of Cytology and Genetics, pr. Akademika Lavrentyeva 10, Novosibirsk 630090, Russia
- Anatoly Sorokin
- Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics, ul. Institutskaya 3, Pushchino 142290, Russia

15
Abstract
Transcription is an intricate mechanism and is orchestrated at the promoter region. The cognate motifs in the promoters are observed in only a subset of total genes across different domains of life. Hence, sequence-motif based promoter prediction may not be a holistic approach for whole genomes. Conversely, the DNA structural property, duplex stability is a characteristic of promoters and can be used to delineate them from other genomic sequences. In this study, we have used a DNA duplex stability based algorithm ‘PromPredict’ for promoter prediction in a broad range of eukaryotes, representing various species of yeast, worm, fly, fish, and mammal. Efficiency of the software has been tested in promoter regions of 48 eukaryotic systems. PromPredict achieves recall values, which range from 68 to 92% in various eukaryotes. PromPredict performs well in mammals, although their core promoter regions are GC rich. ‘PromPredict’ has also been tested for its ability to predict promoter regions for various transcript classes (coding and non-coding), TATA-containing and TATA-less promoters as well as on promoter sequences belonging to different gene expression variability categories. The results support the idea that differential DNA duplex stability is a potential predictor of promoter regions in various genomes.

16
Muñoz-Villagrán CM, Mendez KN, Cornejo F, Figueroa M, Undabarrena A, Morales EH, Arenas-Salinas M, Arenas FA, Castro-Nallar E, Vásquez CC. Comparative genomic analysis of a new tellurite-resistant Psychrobacter strain isolated from the Antarctic Peninsula. PeerJ 2018; 6:e4402. [PMID: 29479501 PMCID: PMC5822837 DOI: 10.7717/peerj.4402] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/12/2017] [Accepted: 02/01/2018] [Indexed: 01/05/2023] Open
Abstract
The Psychrobacter genus is a cosmopolitan and diverse group of aerobic, cold-adapted, Gram-negative bacteria exhibiting biotechnological potential for low-temperature applications including bioremediation. Here, we present the draft genome sequence of a bacterium from the Psychrobacter genus isolated from a sediment sample from King George Island, Antarctica (3,490,622 bp; 18 scaffolds; G + C = 42.76%). Using phylogenetic analysis, biochemical properties and scanning electron microscopy the bacterium was identified as Psychrobacter glacincola BNF20, making it the first genome sequence reported for this species. P. glacincola BNF20 showed high tellurite (MIC 2.3 mM) and chromate (MIC 6.0 mM) resistance, respectively. Genome-wide nucleotide identity comparisons revealed that P. glacincola BNF20 is highly similar (>90%) to other uncharacterized Psychrobacter spp. such as JCM18903, JCM18902, and P11F6. Bayesian multi-locus phylogenetic analysis showed that P. glacincola BNF20 belongs to a polyphyletic clade with other bacteria isolated from polar regions. A high number of genes related to metal(loid) resistance were found, including tellurite resistance genetic determinants located in two contigs: Contig LIQB01000002.1 exhibited five ter genes, each showing putative promoter sequences (terACDEZ), whereas contig LIQB1000003.2 showed a variant of the terZ gene. Finally, investigating the presence and taxonomic distribution of ter genes in the NCBI’s RefSeq bacterial database (5,398 genomes, as January 2017), revealed that 2,623 (48.59%) genomes showed at least one ter gene. At the family level, most (68.7%) genomes harbored one ter gene and 15.6% exhibited five (including P. glacincola BNF20). Overall, our results highlight the diverse nature (genetic and geographic diversity) of the Psychrobacter genus, provide insights into potential mechanisms of metal resistance, and exemplify the benefits of sampling remote locations for prospecting new molecular determinants.
Affiliation(s)
- Claudia Melissa Muñoz-Villagrán
- Laboratorio de Microbiología Molecular, Departamento de Biología, Universidad de Santiago de Chile, Santiago, Chile; Departamento de Ciencias Básicas, Facultad de Ciencia, Universidad Santo Tomas Sede Santiago, Santiago, Chile
- Katterinne N Mendez
- Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Universidad Andrés Bello, Santiago, Chile
- Fabian Cornejo
- Laboratorio de Microbiología Molecular, Departamento de Biología, Universidad de Santiago de Chile, Santiago, Chile
- Maximiliano Figueroa
- Laboratorio de Microbiología Molecular, Departamento de Biología, Universidad de Santiago de Chile, Santiago, Chile
- Agustina Undabarrena
- Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Departamento de Química & Centro de Biotecnología Daniel Alkalay Lowitt, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Eduardo Hugo Morales
- Laboratorio de Microbiología Molecular, Departamento de Biología, Universidad de Santiago de Chile, Santiago, Chile
- Felipe Alejandro Arenas
- Laboratorio de Microbiología Molecular, Departamento de Biología, Universidad de Santiago de Chile, Santiago, Chile
- Eduardo Castro-Nallar
- Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Universidad Andrés Bello, Santiago, Chile
- Claudio Christian Vásquez
- Laboratorio de Microbiología Molecular, Departamento de Biología, Universidad de Santiago de Chile, Santiago, Chile

17
Di Salvo M, Pinatel E, Talà A, Fondi M, Peano C, Alifano P. G4PromFinder: an algorithm for predicting transcription promoters in GC-rich bacterial genomes based on AT-rich elements and G-quadruplex motifs. BMC Bioinformatics 2018; 19:36. [PMID: 29409441 PMCID: PMC5801747 DOI: 10.1186/s12859-018-2049-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/27/2017] [Accepted: 01/29/2018] [Indexed: 11/10/2022] Open
Abstract
Background: Over the last few decades, computational genomics has tremendously contributed to decipher biology from genome sequences and related data. Considerable effort has been devoted to the prediction of transcription promoter and terminator sites that represent the essential “punctuation marks” for DNA transcription. Computational prediction of promoters in prokaryotes is a problem whose solution is far from being determined in computational genomics. The majority of published bacterial promoter prediction tools are based on a consensus-sequences search and they were designed specifically for vegetative σ70 promoters and, therefore, not suitable for promoter prediction in bacteria encoding a lot of σ factors, like actinomycetes.
Results: In this study we investigated the possibility to identify putative promoters in prokaryotes based on evolutionarily conserved motifs, and focused our attention on GC-rich bacteria in which promoter prediction with conventional, consensus-based algorithms is often not-exhaustive. Here, we introduce G4PromFinder, a novel algorithm that predicts putative promoters based on AT-rich elements and G-quadruplex DNA motifs. We tested its performances by using available genomic and transcriptomic data of the model microorganisms Streptomyces coelicolor A3(2) and Pseudomonas aeruginosa PA14. We compared our results with those obtained by three currently available promoter predicting algorithms: the σ70 consensus-based PePPER, the σ factors consensus-based bTSSfinder, and PromPredict which is based on double-helix DNA stability. Our results demonstrated that G4PromFinder is more suitable than the three reference tools for both the genomes. In fact our algorithm achieved the higher accuracy (F1-scores 0.61 and 0.53 in the two genomes) as compared to the next best tool that is PromPredict (F1-scores 0.46 and 0.48). Consensus-based algorithms produced lower performances with the analyzed GC-rich genomes.
Conclusions: Our analysis shows that G4PromFinder is a powerful tool for promoter search in GC-rich bacteria, especially for bacteria coding for a lot of σ factors, such as the model microorganism S. coelicolor A3(2). Moreover consensus-based tools and, in general, tools that are based on specific features of bacterial σ factors seem to be less performing for promoter prediction in these types of bacterial genomes.
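Note on the F1-scores quoted in this abstract: the F1-score is the harmonic mean of precision and recall. The sketch below shows only the standard definition; the function name `f1_score` and the counts in the usage note are hypothetical illustrations, not data or code from the cited article.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted = true_positives + false_positives  # all predicted promoters
    actual = true_positives + false_negatives     # all known promoters
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With made-up counts, `f1_score(50, 50, 50)` gives a precision and a recall of 0.5 each, and therefore an F1-score of 0.5.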
Affiliation(s)
- Marco Di Salvo
- Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy
- Eva Pinatel
- Institute of Biomedical Technologies National Research Council, Milan, Segrate, Italy
- Adelfia Talà
- Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy
- Marco Fondi
- Department of Biology, University of Florence, Florence, Italy
- Clelia Peano
- Institute of Genetic and Biomedical Research (IRGB), UOS of Milan, National Research Council, Milan, Italy; Humanitas Clinical and Research Center, Milan, Rozzano, Italy
- Pietro Alifano
- Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy.

18
Chechetkin VR, Lobzin VV. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique. J Theor Biol 2017; 426:162-179. [PMID: 28552553 DOI: 10.1016/j.jtbi.2017.05.033] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/08/2017] [Revised: 04/23/2017] [Accepted: 05/23/2017] [Indexed: 12/15/2022]
Abstract
Using state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools leaded to the significant progress in detailing chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on the chromosome folding and the large-scale genome organization. Could a part of information on the chromosome folding be obtained directly from underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to the replication and transcription and can also be used for the assessment of temperature or supercoiling effects on the chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, the study of large-scale genome organization for bacteriophage PHIX174 and bacterium Caulobacter crescentus was also added. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge on the chromosome architecture and genome organization.
Affiliation(s)
- V R Chechetkin
- Engelhardt Institute of Molecular Biology of Russian Academy of Sciences, Vavilov str., 32, Moscow 119334, Russia; Theoretical Department of Division for Perspective Investigations, Troitsk Institute of Innovation and Thermonuclear Investigations (TRINITI), Moscow, Troitsk District 108840, Russia.
- V V Lobzin
- School of Physics, University of Sydney, Sydney, NSW 2006, Australia.

19
Kumar A, Bansal M. Unveiling DNA structural features of promoters associated with various types of TSSs in prokaryotic transcriptomes and their role in gene expression. DNA Res 2017; 24:25-35. [PMID: 27803028 PMCID: PMC5381344 DOI: 10.1093/dnares/dsw045] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 05/12/2016] [Accepted: 09/23/2016] [Indexed: 01/28/2023] Open
Abstract
Next-generation sequencing studies have revealed that a variety of transcripts are present in the prokaryotic transcriptome and a significant fraction of them are functional, being involved in various regulatory activities apart from coding for proteins. Identification of promoters associated with different transcripts is necessary for characterization of the transcriptome. Promoter regions have been shown to have unique structural features as compared with their flanking region, in organisms covering all domains of life. Here we report an in silico analysis of DNA sequence dependent structural properties like stability, bendability and curvature in the promoter region of six different prokaryotic transcriptomes. Using these structural features, we predicted promoters associated with different categories of transcripts (mRNA, internal, antisense and non-coding), which constitute the transcriptome. Promoter annotation using structural features is fairly accurate and reliable with about 50% of the primary promoters being characterized by all three structural properties while at least one property identifies 95%. We also studied the relative differences of these structural features in terms of gene expression and found that the features, viz. lower stability, lesser bendability and higher curvature are more prominent in the promoter regions which are associated with high gene expression as compared with low expression genes. Hence, promoters, which are associated with higher gene expression, get annotated well using DNA structural features as compared with those, which are linked to lower gene expression.
Affiliation(s)
- Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012 Karnataka, India

20
Kumar A, Manivelan V, Bansal M. Structural features of DNA are conserved in the promoter region of orthologous genes across different strains of Helicobacter pylori. FEMS Microbiol Lett 2016; 363:fnw207. [DOI: 10.1093/femsle/fnw207] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Accepted: 08/25/2016] [Indexed: 12/19/2022] Open

21
Nigatu D, Henkel W, Sobetzko P, Muskhelishvili G. Relationship between digital information and thermodynamic stability in bacterial genomes. EURASIP J Bioinform Syst Biol 2016; 2016:4. [PMID: 26877724 PMCID: PMC4740571 DOI: 10.1186/s13637-016-0037-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 09/14/2015] [Accepted: 01/19/2016] [Indexed: 02/06/2023]
Abstract
Ever since the introduction of the Watson-Crick model, numerous efforts have been made to fully characterize the digital information content of the DNA. However, it became increasingly evident that variations of DNA configuration also provide an “analog” type of information related to the physicochemical properties of the DNA, such as thermodynamic stability and supercoiling. Hence, the parallel investigation of the digital information contained in the base sequence with associated analog parameters is very important for understanding the coding capacity of the DNA. In this paper, we represented analog information by its thermodynamic stability and compare it with digital information using Shannon and Gibbs entropy measures on the complete genome sequences of several bacteria, including Escherichia coli (E. coli), Bacillus subtilis (B. subtilis), Streptomyces coelicolor (S. coelicolor), and Salmonella typhimurium (S. typhimurium). Furthermore, the link to the broader classes of functional gene groups (anabolic and catabolic) is examined. Obtained results demonstrate the couplings between thermodynamic stability and digital sequence organization in the bacterial genomes. In addition, our data suggest a determinative role of the genome-wide distribution of DNA thermodynamic stability in the spatial organization of functional gene groups.
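Note on the "digital information" measure mentioned in this abstract: Shannon entropy over a base or k-mer distribution is one common way to quantify sequence information content. The sketch below is a minimal illustration only. The function names, the overlapping k-mer counting, and the sliding-window scheme are assumptions made here, not the procedure of the cited article, and the Gibbs entropy and thermodynamic stability parts are not covered.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy in bits of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0  # sequence shorter than k: no k-mers to count
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def windowed_entropy(seq: str, window: int, step: int, k: int = 1) -> list[float]:
    """Entropy of each full-length window along seq (hypothetical windowing)."""
    return [
        shannon_entropy(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, step)
    ]
```

For single bases (k = 1) the value ranges from 0 bits for a homopolymer to 2 bits when A, C, G and T are equally frequent, so `shannon_entropy("ACGT")` is 2 bits and `shannon_entropy("AAAA")` is 0 bits.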
Affiliation(s)
- Dawit Nigatu
- Transmission Systems Group, School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, Bremen, 28759 Germany
- Werner Henkel
- Transmission Systems Group, School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, Bremen, 28759 Germany
- Patrick Sobetzko
- Philipps-Universität Marburg, LOEWE-Zentrum für Synthetische Mikrobiologie, Hans-Meerwein-Straße, Mehrzweckgebäude, Marburg, 35043 Germany
- Georgi Muskhelishvili
- Microbiologie, Adaptation, Pathogénie, UMR5240 CNRS-UCBL-INSA-BayerCropScience, Lyon, France; Jacobs University Bremen, Campus Ring 1, Bremen, 28759 Germany

22
Phylogenomic identification of regulatory sequences in bacteria: an analysis of statistical power and an application to Borrelia burgdorferi sensu lato. mBio 2015; 6:mBio.00011-15. [PMID: 25873371 PMCID: PMC4453575 DOI: 10.1128/mbio.00011-15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023] Open
Abstract
Phylogenomic footprinting is an approach for ab initio identification of genome-wide regulatory elements in bacterial species based on sequence conservation. The statistical power of the phylogenomic approach depends on the degree of sequence conservation, the length of regulatory elements, and the level of phylogenetic divergence among genomes. Building on an earlier model, we propose a binomial model that uses synonymous tree lengths as neutral expectations for determining the statistical significance of conserved intergenic spacer (IGS) sequences. Simulations show that the binomial model is robust to variations in the value of evolutionary parameters, including base frequencies and the transition-to-transversion ratio. We used the model to search for regulatory sequences in the Lyme disease species group (Borrelia burgdorferi sensu lato) using 23 genomes. The model indicates that the currently available set of Borrelia genomes would not yield regulatory sequences shorter than five bases, suggesting that genome sequences of additional B. burgdorferi sensu lato species are needed. Nevertheless, we show that previously known regulatory elements are indeed strongly conserved in sequence or structure across these Borrelia species. Further, we predict with sufficient confidence two new RpoS binding sites, 39 promoters, 19 transcription terminators, 28 noncoding RNAs, and four sets of coregulated genes. These putative cis- and trans-regulatory elements suggest novel, Borrelia-specific mechanisms regulating the transition between the tick and host environments, a key adaptation and virulence mechanism of B. burgdorferi. Alignments of IGS sequences are available on BorreliaBase.org, an online database of orthologous open reading frame (ORF) and IGS sequences in Borrelia.
IMPORTANCE: While bacterial genomes contain mostly protein-coding genes, they also house DNA sequences regulating the expression of these genes. Gene regulatory sequences tend to be conserved during evolution. By sequencing and comparing related genomes, one can therefore identify regulatory sequences in bacteria based on sequence conservation. Here, we describe a statistical framework by which one may determine how many genomes need to be sequenced and at what level of evolutionary relatedness in order to achieve a high level of statistical significance. We applied the framework to Borrelia burgdorferi, the Lyme disease agent, and identified a large number of candidate regulatory sequences, many of which are known to be involved in regulating the phase transition between the tick vector and mammalian hosts.

23
Lloréns-Rico V, Lluch-Senar M, Serrano L. Distinguishing between productive and abortive promoters using a random forest classifier in Mycoplasma pneumoniae. Nucleic Acids Res 2015; 43:3442-53. [PMID: 25779052 PMCID: PMC4402517 DOI: 10.1093/nar/gkv170] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 11/26/2014] [Accepted: 02/22/2015] [Indexed: 12/01/2022] Open
Abstract
Distinguishing between promoter-like sequences in bacteria that belong to true or abortive promoters, or to those that do not initiate transcription at all, is one of the important challenges in transcriptomics. To address this problem, we have studied the genome-reduced bacterium Mycoplasma pneumoniae, for which the RNAs associated with transcriptional start sites have been recently experimentally identified. We determined the contribution to transcription events of different genomic features: the –10, extended –10 and –35 boxes, the UP element, the bases surrounding the –10 box and the nearest-neighbor free energy of the promoter region. Using a random forest classifier and the aforementioned features transformed into scores, we could distinguish between true, abortive promoters and non-promoters with good –10 box sequences. The methods used in this characterization of promoters can be extended to other bacteria and have important applications for promoter design in bacterial genome engineering.
Affiliation(s)
- Verónica Lloréns-Rico
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Maria Lluch-Senar
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Luis Serrano
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain

24
A statistical thermodynamic model for investigating the stability of DNA sequences from oligonucleotides to genomes. Biophys J 2014; 106:2465-73. [PMID: 24896126 DOI: 10.1016/j.bpj.2014.04.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/01/2013] [Revised: 03/20/2014] [Accepted: 04/17/2014] [Indexed: 12/12/2022] Open
Abstract
We describe the development and testing of a simple statistical mechanics methodology for duplex DNA applicable to sequences of any composition and extensible to genomes. The microstates of a DNA sequence are modeled in terms of blocks of basepairs that are assumed to be fully closed (paired) or open. This approach generates an ensemble of bubblelike microstates that are used to calculate the corresponding partition function. The energies of the microstates are calculated as additive contributions from hydrogen bonding, basepair stacking, and solvation terms parameterized from a comprehensive series of molecular dynamics simulations including solvent and ions. Thermodynamic properties and nucleotide stability constants for DNA sequences follow directly from the partition function. The methodology was tested by comparing computed free energies per basepair with the experimental melting temperatures of 60 oligonucleotides, yielding a correlation coefficient of -0.96. The thermodynamic stability of genic/nongenic regions was tested in terms of nucleotide stability constants versus sequence for the Escherichia coli K-12 genome. It showed clear differentiation of the genes from promoters and captures genic regions with a sensitivity of 0.94. The statistical thermodynamic model presented here provides a seemingly new handle on the challenging problem of interpreting genomic sequences.

25

26
Yella VR, Bansal M. In silico Identification of Eukaryotic Promoters. Systems and Synthetic Biology 2015. [DOI: 10.1007/978-94-017-9514-2_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]

27
Bansal M, Kumar A, Yella VR. Role of DNA sequence based structural features of promoters in transcription initiation and gene expression. Curr Opin Struct Biol 2014; 25:77-85. [PMID: 24503515 DOI: 10.1016/j.sbi.2014.01.007] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 12/05/2013] [Accepted: 01/07/2014] [Indexed: 11/18/2022]
Abstract
Regulatory information for transcription initiation is present in a stretch of genomic DNA, called the promoter region that is located upstream of the transcription start site (TSS) of the gene. The promoter region interacts with different transcription factors and RNA polymerase to initiate transcription and contains short stretches of transcription factor binding sites (TFBSs), as well as structurally unique elements. Recent experimental and computational analyses of promoter sequences show that they often have non-B-DNA structural motifs, as well as some conserved structural properties, such as stability, bendability, nucleosome positioning preference and curvature, across a class of organisms. Here, we briefly describe these structural features, the differences observed in various organisms and their possible role in regulation of gene expression.
Affiliation(s)
- Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
- Aditya Kumar
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India

28
Huang WL, Tung CW, Liaw C, Huang HL, Ho SY. Rule-based knowledge acquisition method for promoter prediction in human and Drosophila species. ScientificWorldJournal 2014; 2014:327306. [PMID: 24955394 PMCID: PMC3927563 DOI: 10.1155/2014/327306] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/31/2013] [Accepted: 10/10/2013] [Indexed: 01/08/2023] Open
Abstract
The rapid and reliable identification of promoter regions is important as the number of genomes to be sequenced increases rapidly. Various methods have been developed, but few investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method (named PromHD) based on if-then rules for promoter prediction in human and Drosophila species. PromHD utilizes an effective feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets with 99 and 74 DNASDs and yields test accuracies of 96.4% and 97.5% in human and Drosophila species, respectively. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules by using the decision tree mechanism for promoter prediction. The top-ranked informative rules with high certainty grades reveal that the global sequence descriptor, the length of nucleotide A at the first position of the sequence, and two physicochemical properties, absorption maxima and molecular weight, are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.
Affiliation(s)
- Wen-Lin Huang
- Department of Management Information System, Asia Pacific Institute of Creativity, Miaoli 351, Taiwan
- Chun-Wei Tung
- School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Chyn Liaw
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Hui-Ling Huang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
- Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
|
29
|
Yella VR, Bansal M. DNA structural features and architecture of promoter regions play a role in gene responsiveness of S. cerevisiae. J Bioinform Comput Biol 2013; 11:1343001. [DOI: 10.1142/s0219720013430014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Gene expression is the most fundamental biological process, which is essential for phenotypic variation. It is regulated by various external (environment and evolution) and internal (genetic) factors. The level of gene expression depends on promoter architecture, along with other external factors. The presence of sequence motifs, such as transcription factor binding sites (TFBSs) and the TATA-box, or DNA methylation in vertebrates, has been implicated in the regulation of expression of some genes in eukaryotes, but a large number of genes lack these sequences. On the other hand, several experimental and computational studies have shown that promoter sequences possess some special structural properties, such as low stability, less bendability, low nucleosome occupancy, and more curvature, which are prevalent across all organisms. These structural features may play a role in transcription initiation and regulation of gene expression. We have studied the relationship between the structural features of promoter DNA, promoter directionality and gene expression variability in S. cerevisiae. This relationship has been analyzed for seven different measures of gene expression variability, along with two different regulatory effect measures. We find that a few of the variability measures of gene expression are linked to DNA structural properties, nucleosome occupancy, TATA-box presence, and bidirectionality of promoter regions. Interestingly, gene responsiveness is most intimately correlated with DNA structural features and promoter architecture.
Affiliation(s)
- Venkata Rajesh Yella
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
- Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
|
30
|
Kocíncová D, Lam JS. A deletion in the wapB promoter in many serotypes of Pseudomonas aeruginosa accounts for the lack of a terminal glucose residue in the core oligosaccharide and resistance to killing by R3-pyocin. Mol Microbiol 2013; 89:464-78. [PMID: 23750877 DOI: 10.1111/mmi.12289] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Accepted: 06/04/2013] [Indexed: 01/16/2023]
Abstract
Pseudomonas aeruginosa is an opportunistic human pathogen producing a variety of virulence factors. One of them is lipopolysaccharide, consisting of endotoxic lipid A and long-chain O-antigen polysaccharide, which are connected through a short linker region called the core oligosaccharide. Chemical structures of the core oligosaccharide are well conserved, with one exception, in that certain strains of P. aeruginosa add a terminal glucose residue (Glc(IV)) to the core by a transferase reaction, due to the activity of a glucosyltransferase, WapB. Here, we investigated the regulation of wapB expression. Our results showed that while the majority of analysed genomes of P. aeruginosa contain wapB, many of these have a conserved identical 5-nucleotide deletion in the upstream region that inactivated the promoter. This deletion is within the -10 hexamer that is recognized by a principal sigma factor (RpoD, or σ70), as proven by data from an electromobility shift assay. These results provide the molecular basis of why the LPS core of many P. aeruginosa strains lacks Glc(IV). In addition, we show that the absence of Glc(IV) due to an inactive wapB promoter confers resistance to killing by R3-pyocin, a phage tail-like bacteriocin of P. aeruginosa.
Affiliation(s)
- Dana Kocíncová
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
|
31
|
Liman R, Facey PD, van Keulen G, Dyson PJ, Del Sol R. A laterally acquired galactose oxidase-like gene is required for aerial development during osmotic stress in Streptomyces coelicolor. PLoS One 2013; 8:e54112. [PMID: 23326581 PMCID: PMC3543389 DOI: 10.1371/journal.pone.0054112] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 08/21/2012] [Accepted: 12/10/2012] [Indexed: 12/25/2022] Open
Abstract
Phylogenetic reconstruction revealed that most Actinobacterial orthologs of S. coelicolor SCO2837, encoding a metal-dependent galactose oxidase-like protein, are found within Streptomyces and were probably acquired by horizontal gene transfer from fungi. Disruption of SCO2837 (glxA) caused a conditional bld phenotype that could not be reversed by extracellular complementation. Studies aimed at characterising the regulation of expression of glxA showed that it is not a target for other bld genes. We provide evidence that glxA is required for osmotic adaptation, although independently from the known osmotic stress response element SigB. glxA has been predicted to be part of an operon with the transcription unit comprising the upstream cslA gene and glxA. However, both phenotypic and expression studies indicate that it is also expressed from an independent promoter region internal to cslA. GlxA displays an in situ localisation pattern similar to that observed for CslA at hyphal tips, but localisation of the former is independent of the latter. The functional role of GlxA in relation to CslA is discussed.
Affiliation(s)
- Recep Liman
- Faculty of Science, Department of Genetics, Usak University, Usak, Turkey
- Paul D. Facey
- Institute of Life Science, College of Medicine, Swansea University, Singleton Park, Swansea, United Kingdom
- Geertje van Keulen
- Institute of Life Science, College of Medicine, Swansea University, Singleton Park, Swansea, United Kingdom
- Paul J. Dyson
- Institute of Life Science, College of Medicine, Swansea University, Singleton Park, Swansea, United Kingdom
- Ricardo Del Sol
- Institute of Life Science, College of Medicine, Swansea University, Singleton Park, Swansea, United Kingdom
|
32
|
Cobb RE, Luo Y, Freestone T, Zhao H. Drug Discovery and Development via Synthetic Biology. Synth Biol (Oxf) 2013. [DOI: 10.1016/b978-0-12-394430-6.00010-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022] Open
|
33
|
Meysman P, Marchal K, Engelen K. DNA structural properties in the classification of genomic transcription regulation elements. Bioinform Biol Insights 2012; 6:155-68. [PMID: 22837642 PMCID: PMC3399529 DOI: 10.4137/bbi.s9426] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022] Open
Abstract
It has long been known that DNA molecules encode information at various levels. The most basic level comprises the base sequence itself and is primarily important for the encoding of proteins and direct base recognition by DNA-binding proteins. A more elusive level consists of the local structural properties of the DNA molecule, wherein the DNA sequence plays only an indirect supportive role. These properties are nevertheless an important factor in a large number of biomolecular processes and can be considered as informative signals for the presence of a variety of genomic features. Several recent studies have unequivocally shown the benefit of relying on such DNA properties for modeling and predicting genomic features as diverse as transcription start sites, transcription factor binding sites, or nucleosome occupancy. This review is meant to provide an overview of the key aspects of these DNA conformational and physicochemical properties. To illustrate their potential added value compared to relying solely on the nucleotide sequence in genomics studies, we discuss their application in research on transcription regulation mechanisms as representative cases.
Affiliation(s)
- Pieter Meysman
- Department of Molecular and Microbial Systems, KULeuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
|
34
|
Kumar A, Bansal M. Characterization of structural and free energy properties of promoters associated with Primary and Operon TSS in Helicobacter pylori genome and their orthologs. J Biosci 2012; 37:423-31. [DOI: 10.1007/s12038-012-9214-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/27/2023]
|
35
|
Khandelwal G, Jayaram B. DNA-water interactions distinguish messenger RNA genes from transfer RNA genes. J Am Chem Soc 2012; 134:8814-6. [PMID: 22551381 DOI: 10.1021/ja3020956] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
The use of physicochemical properties of DNA sequences as a guide to developing insights into genome organization has received little attention. Here, we utilize the energetics of DNA to further advance the knowledge on its language at a molecular level. Specifically, we ask whether the physicochemical properties of different functional units on genomes differ. We extract intramolecular and solvation energies of different DNA base pair steps from a comprehensive set of molecular dynamics simulations. We then investigate the solvation behavior of DNA sequences coding for mRNAs and tRNAs. Distinguishing mRNA genes from tRNA genes is a tricky problem in genome annotation without assumptions on the length of DNA and the secondary structure of the product of transcription. We find that the solvation energetics of DNA is an extremely efficient property for discriminating 2,063,537 genes coding for mRNAs from 56,251 genes coding for tRNAs in all (~1500) completely sequenced prokaryotic genomes.
Affiliation(s)
- Garima Khandelwal
- Department of Chemistry, Indian Institute of Technology Delhi, Hauz Khas, New Delhi-110016, India
|
36
|
Rangannan V, Bansal M. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes. BMC Res Notes 2011; 4:257. [PMID: 21781326 PMCID: PMC3160392 DOI: 10.1186/1756-0500-4-257] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 01/27/2011] [Accepted: 07/22/2011] [Indexed: 12/19/2022] Open
Abstract
Background: As more and more genomes are being sequenced, an overview of their genomic features and annotation of their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings: Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content, the relative stability based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and that of the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion: PromBase is a database that contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics studies and help experimentalists to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible for academic and non-academic users via the World Wide Web at http://nucleix.mbu.iisc.ernet.in/prombase/.
Affiliation(s)
- Vetriselvi Rangannan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560 012, India.
|
37
|
Morey C, Mookherjee S, Rajasekaran G, Bansal M. DNA free energy-based promoter prediction and comparative analysis of Arabidopsis and rice genomes. Plant Physiol 2011; 156:1300-15. [PMID: 21531900 PMCID: PMC3135951 DOI: 10.1104/pp.110.167809] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 10/19/2010] [Accepted: 04/21/2011] [Indexed: 05/06/2023]
Abstract
The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, out of which 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method shows improved prediction capability.
Affiliation(s)
- Manju Bansal
- Indian Institute of Science, Bangalore 560 012, India
|
38
|
Using single-nucleotide polymorphisms to discriminate disease-associated from carried genomes of Neisseria meningitidis. J Bacteriol 2011; 193:3633-41. [PMID: 21622743 DOI: 10.1128/jb.01198-10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/17/2022] Open
Abstract
Neisseria meningitidis is one of the main agents of bacterial meningitis, causing substantial morbidity and mortality worldwide. However, most of the time N. meningitidis is carried as a commensal not associated with invasive disease. The genomic basis of the difference between disease-associated and carried isolates of N. meningitidis may provide critical insight into mechanisms of virulence, yet it has remained elusive. Here, we have taken a comparative genomics approach to interrogate the difference between disease-associated and carried isolates of N. meningitidis at the level of individual nucleotide variations (i.e., single nucleotide polymorphisms [SNPs]). We aligned complete genome sequences of 8 disease-associated and 4 carried isolates of N. meningitidis to search for SNPs that show mutually exclusive patterns of variation between the two groups. We found 63 SNPs that distinguish the 8 disease-associated genomes from the 4 carried genomes of N. meningitidis, which is far more than can be expected by chance alone given the level of nucleotide variation among the genomes. The putative list of SNPs that discriminate between disease-associated and carriage genomes may be expected to change with increased sampling or changes in the identities of the isolates being compared. Nevertheless, we show that these discriminating SNPs are more likely to reflect phenotypic differences than shared evolutionary history. Discriminating SNPs were mapped to genes, and the functions of the genes were evaluated for possible connections to virulence mechanisms. A number of overrepresented functional categories related to virulence were uncovered among SNP-associated genes, including genes related to the category "symbiosis, encompassing mutualism through parasitism."
|
39
|
Sato M. GC Wave Analysis in Promoter Regions via Wavelet Analysis and Support Vector Machine. Procedia Comput Sci 2011. [DOI: 10.1016/j.procs.2011.08.053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
|