1. Chen DD, Zhang LL, Zhang JH, Ban WT, Li Q, Wu JC. Comparative genomic analysis of metal-tolerant bacteria reveals significant differences in metal adaptation strategies. Microbiol Spectr 2025:e0168024. [PMID: 40272196 DOI: 10.1128/spectrum.01680-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 12/18/2024] [Indexed: 04/25/2025] Open
Abstract
Metal-tolerant bacteria are used commercially in wastewater treatment, biofertilizers, and soil remediation. However, the mechanisms underlying their actions are not yet fully understood. We isolated metal-tolerant bacteria from rhizosphere soil samples using metal-enriched media containing Cu, Fe, or Mn, sequenced and compared their genomes, and analyzed their metal adaptation strategies at the genomic level to better understand their mechanisms of action. In total, 32 metal-tolerant isolates were identified and classified into 12 genera based on phylogenetic analysis. Determination of the maximum tolerance concentration and of the effect of metal ions on the isolates indicated that Serratia marcescens X1 (CuSO4: 1,000 mg/L, FeSO4: 1,000 mg/L, and MnSO4·4H2O: 2,000 mg/L), Mammaliicoccus sciuri X26 (FeSO4: 600 mg/L and MnSO4·4H2O: 2,000 mg/L), and Rummeliibacillus pycnus X33 (CuSO4: 400 mg/L, FeSO4: 1,000 mg/L, and MnSO4·4H2O: 800 mg/L) differed significantly from the other isolates in their tolerance to Cu, Fe, and Mn. They possess quite different genomic features that enable them to adapt to various metal ions. S. marcescens X1 possesses abundant genes required for Cu, Fe, and Mn homeostasis. M. sciuri X26 has a number of genes involved in Mn and Zn homeostasis but no genes responsible for Cu and Ca transport. R. pycnus X33 is rich in Fe, Zn, and Mg transport systems but poor in Cu and Mn transport systems. It is thus inferred that using them in combination would compensate for their differences and enhance their ability to accumulate a wider range of heavy metals, promoting their applications in industry, agriculture, and ecology.
IMPORTANCE Metal-tolerant bacteria have wide applications in environmental, agricultural, and ecological fields, but their action strategies are not yet fully understood. We isolated 32 metal-tolerant bacteria from rhizosphere soil samples. Among them, Serratia marcescens X1, Mammaliicoccus sciuri X26, and Rummeliibacillus pycnus X33 differed significantly from the other isolates in their tolerance to Cu, Fe, and Mn. Comparative genomic analysis revealed that they have abundant and distinct genomic features for adapting to various metal ions. It is thus inferred that using them in combination would compensate for their differences and enhance their ability to accumulate heavy metal ions, widening their applications in industry, agriculture, and ecology.
Affiliation(s)
- Dai Di Chen
  - Guangdong Engineering Technology Research Center of Enzyme and Biocatalysis, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, China
- Liu Lian Zhang
  - Guangdong Engineering Technology Research Center of Enzyme and Biocatalysis, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, China
- Jiu Hua Zhang
  - Guangdong Engineering Technology Research Center of Enzyme and Biocatalysis, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, China
- Wen Ting Ban
  - Guangdong Engineering Technology Research Center of Enzyme and Biocatalysis, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, China
- Qingxin Li
  - Guangdong Engineering Technology Research Center of Enzyme and Biocatalysis, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, China
- Jin Chuan Wu
  - Guangdong Engineering Technology Research Center of Enzyme and Biocatalysis, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, China
2. Benvenuti JL, Casa PL, Pessi de Abreu F, Martinez GS, de Avila E Silva S. From straight to curved: A historical perspective of DNA shape. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2024; 193:46-54. [PMID: 39260792 DOI: 10.1016/j.pbiomolbio.2024.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 07/30/2024] [Accepted: 09/04/2024] [Indexed: 09/13/2024]
Abstract
DNA is the macromolecule responsible for storing the genetic information of a cell, and it has intrinsic properties such as deformability, stability, and curvature. DNA curvature plays an important role in gene transcription and, consequently, in the subsequent production of proteins, a fundamental cellular process. With recent advances in bioinformatics and theoretical biology, it has become possible to analyze and understand the role of DNA curvature as a discriminatory characteristic of promoter regions. These regions act as sites where RNA polymerase (RNAP) binds to initiate transcription. This review aims to describe how curvature forms and to highlight its importance in predicting promoters. Furthermore, this article discusses the potential of DNA curvature as a distinguishing feature for promoter prediction tools and outlines the calculation procedures that have been described by other researchers. This work may support further studies directed towards the enhancement of promoter prediction software.
Affiliation(s)
- Jean Lucas Benvenuti
  - Universidade de Caxias do Sul. Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil
- Pedro Lenz Casa
  - Universidade de Caxias do Sul. Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil
- Fernanda Pessi de Abreu
  - Universidade de Caxias do Sul. Petrópolis, Caxias do Sul, Rio Grande do Sul, Brazil; Instituto de Biociências, Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
3. Zhao S, Xu Z, Wang J. Stenotrophomonas pavanii MY01 induces phosphate precipitation of Cu(II) and Zn(II) by degrading glyphosate: performance, pathway and possible genes involved. Front Microbiol 2024; 15:1479902. [PMID: 39507330 PMCID: PMC11538021 DOI: 10.3389/fmicb.2024.1479902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 09/30/2024] [Indexed: 11/08/2024] Open
Abstract
Microbial bioremediation is an advanced technique for removing herbicides and heavy metals from agricultural soil. In this study, the strain Stenotrophomonas pavanii MY01 was used for its ability to degrade glyphosate, a phosphorus-containing organic compound, producing PO₄³⁻ as a byproduct. PO₄³⁻ is known to form stable precipitates with heavy metals, indicating that strain MY01 could potentially remove heavy metals by degrading glyphosate. Therefore, the present experiment induced phosphate precipitation of Cu(II) (hereinafter Cu²⁺) and Zn(II) (hereinafter Zn²⁺) by degrading glyphosate with strain MY01. In addition, the whole genome of strain MY01 was mined for its glyphosate degradation mechanism and its heavy metal removal mechanism. The results showed that the strain degraded glyphosate best at 34°C, pH 7.7, and an inoculum of 0.7%, reaching 72.98% degradation within 3 d. The highest removal rates of Cu²⁺ and Zn²⁺ in the test were 75.95% and 68.54%, respectively. A comparison of strain MY01's genome with glyphosate degradation genes showed that protein sequences GE000474 and GE002603 had strong similarity to glyphosate oxidoreductase and C-P lyase. This suggests that these sequences may be key to the strain's ability to degrade glyphosate. The GE001435 sequence appears to be related to the phosphate pathway, which could enable phosphate excretion into the environment, where it forms stable coordination complexes with heavy metals.
Affiliation(s)
- Shengchen Zhao
  - College of Resource and Environmental Science, Jilin Agricultural University, Changchun, Jilin, China
- Zitong Xu
  - Key Laboratory of Straw Biology and Utilization, Ministry of Education, Jilin Agricultural University, Changchun, Jilin, China
- Jihong Wang
  - College of Resource and Environmental Science, Jilin Agricultural University, Changchun, Jilin, China
4. Zhao S, Wang J. Biodegradation of atrazine and nicosulfuron by Streptomyces nigra LM01: Performance, degradative pathway, and possible genes involved. JOURNAL OF HAZARDOUS MATERIALS 2024; 471:134336. [PMID: 38640665 DOI: 10.1016/j.jhazmat.2024.134336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 04/21/2024]
Abstract
Microbial herbicide degradation is an efficient bioremediation method. In this study, a strain of Streptomyces nigra, LM01, which efficiently degrades atrazine and nicosulfuron, was isolated from a corn field using a direct isolation method. The degradation effects of the identified strain on the two herbicides were investigated and optimized using an artificial neural network. The maximum degradation rates of S. nigra LM01 were 58.09% and 42.97% for atrazine and nicosulfuron, respectively. The degradation rate of atrazine in the soil reached 67.94% when the concentration was 10⁸ CFU/g after 5 d and was less effective than that of nicosulfuron. Whole genome sequencing of strain LM01 helped elucidate the possible degradation pathways of atrazine and nicosulfuron. The protein sequences of strain LM01 were aligned with the sequences of proteins known to degrade the two herbicides by using the National Center for Biotechnology Information platform. The sequences with the highest query cover (GE005358, GE001556, GE004212, GE005218, GE004846, GE002487) were retained and docked with the small-molecule ligands of the herbicides. The results revealed a binding energy of -6.23 kcal/mol between GE005358 and the atrazine ligand and -6.66 kcal/mol between GE002487 and the nicosulfuron ligand.
Affiliation(s)
- Shengchen Zhao
  - College of Resource and Environmental Science, Jilin Agricultural University, Changchun 130118, Jilin, China
- Jihong Wang
  - College of Resource and Environmental Science, Jilin Agricultural University, Changchun 130118, Jilin, China
5. Lei R, Jia J, Qin L, Wei X. iPro2L-DG: Hybrid network based on improved densenet and global attention mechanism for identifying promoter sequences. Heliyon 2024; 10:e27364. [PMID: 38510021 PMCID: PMC10950492 DOI: 10.1016/j.heliyon.2024.e27364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/24/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024] Open
Abstract
The promoter is a key DNA sequence whose primary function is to control when gene transcription is initiated and how strongly the gene is expressed. Accurate identification of promoters is essential for understanding gene expression. Traditional sequencing techniques for identifying promoters are costly and time-consuming. Therefore, the development of computational methods to identify promoters has become critical. Since deep learning methods show great potential in identifying promoters, this study proposes a new promoter prediction model, called iPro2L-DG. The iPro2L-DG predictor is based on an improved Densely Connected Convolutional Network (DenseNet) and a Global Attention Mechanism (GAM). Promoter sequences are represented by a combined feature encoding that uses C2 encoding and nucleotide chemical property (NCP) encoding. The improved DenseNet extracts advanced feature information from the combined feature encoding. GAM evaluates the importance of the advanced feature information in the channel and spatial dimensions, and a fully connected neural network (FNN) then derives the prediction probabilities. The experimental results showed that the accuracy of iPro2L-DG in the first layer (promoter identification) was 94.10%, with a Matthews correlation coefficient of 0.8833. In the second layer (promoter strength prediction), the accuracy was 89.42%, with a Matthews correlation coefficient of 0.7915. The iPro2L-DG predictor significantly outperforms other existing predictors in promoter identification and promoter strength prediction. Therefore, our proposed model iPro2L-DG is the most advanced promoter prediction tool. The source code of the iPro2L-DG model can be found at https://github.com/leirufeng/iPro2L-DG.
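The abstract above reports accuracy and the Matthews correlation coefficient (MCC) for each prediction layer. As a reminder of how these two metrics relate to a binary confusion matrix, here is a minimal sketch in Python. The function names and the example counts are hypothetical and are not taken from the paper or its repository.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    tp, tn, fp, fn = 90, 95, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are unbalanced.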
Affiliation(s)
- Rufeng Lei
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Jianhua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Lulu Qin
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Xin Wei
  - Business School, Jiangxi Institute of Fashion Technology, Nanchang, 330044, China
6. Meier D, Rauch C, Wagner M, Klemm P, Blumenkamp P, Müller R, Ellenberger E, Karia KM, Vecchione S, Serrania J, Lechner M, Fritz G, Goesmann A, Becker A. A MoClo-Compatible Toolbox of ECF Sigma Factor-Based Regulatory Switches for Proteobacterial Chassis. BIODESIGN RESEARCH 2024; 6:0025. [PMID: 38384496 PMCID: PMC10880074 DOI: 10.34133/bdr.0025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 12/04/2023] [Indexed: 02/23/2024] Open
Abstract
The construction of complex synthetic gene circuits with predetermined and reliable output depends on orthogonal regulatory parts that do not inadvertently interfere with the host machinery or with other circuit components. Previously, extracytoplasmic function sigma factors (ECFs), a diverse group of alternative sigma factors with distinct promoter specificities, were shown to have great potential as context-independent regulators, but so far, they have only been used in a few model species. Here, we show that the alphaproteobacterium Sinorhizobium meliloti, which has been proposed as a plant-associated bacterial chassis for synthetic biology, has a similar phylogenetic ECF acceptance range as the gammaproteobacterium Escherichia coli. A common set of orthogonal ECF-based regulators that can be used in both bacterial hosts was identified and used to create 2-step delay circuits. The genetic circuits were implemented in single copy in E. coli by chromosomal integration using an established method that utilizes bacteriophage integrases. In S. meliloti, we demonstrated the usability of single-copy pABC plasmids as equivalent carriers of the synthetic circuits. The circuits were either implemented on a single pABC or modularly distributed on 3 such plasmids. In addition, we provide a toolbox containing pABC plasmids compatible with the Golden Gate (MoClo) cloning standard and a library of basic parts that enable the construction of ECF-based circuits in S. meliloti and in E. coli. This work contributes to building a context-independent and species-overarching ECF-based toolbox for synthetic biology applications.
Affiliation(s)
- Doreen Meier
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Christian Rauch
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Marcel Wagner
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Paul Klemm
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Patrick Blumenkamp
  - Bioinformatics and Systems Biology, Justus-Liebig-Universität Giessen, Giessen, Germany
- Raphael Müller
  - Bioinformatics and Systems Biology, Justus-Liebig-Universität Giessen, Giessen, Germany
- Eric Ellenberger
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Kinnari M. Karia
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Stefano Vecchione
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Javier Serrania
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Marcus Lechner
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
- Georg Fritz
  - The University of Western Australia, School of Molecular Sciences, Perth, Australia
- Alexander Goesmann
  - Bioinformatics and Systems Biology, Justus-Liebig-Universität Giessen, Giessen, Germany
- Anke Becker
  - Center for Synthetic Microbiology (SYNMIKRO) and Department of Biology, Philipps-Universität Marburg, Marburg, Germany
7. Wang J, Liu H, Raheem A, Ma Q, Liang X, Guo Y, Lu D. Exploring Mycoplasma ovipneumoniae NXNK2203 infection in sheep: insights from histopathology and whole genome sequencing. BMC Vet Res 2024; 20:20. [PMID: 38200549 PMCID: PMC10777581 DOI: 10.1186/s12917-023-03866-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 12/23/2023] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND Mycoplasma ovipneumoniae (M. ovipneumoniae) is a significant pathogen causing respiratory infections in goats and sheep. This study investigates the vulnerability of Hu sheep to M. ovipneumoniae infection in the cold weather of late spring through a detailed autopsy of a severely affected Hu sheep and whole genome sequencing of M. ovipneumoniae. RESULTS The autopsy of the deceased sheep revealed severe pulmonary damage, with lesions concentrated in the trachea and lungs. Histopathological analysis showed tissue degeneration, mucus accumulation, alveolar septum thickening, and cellular necrosis. Immunohistochemistry analysis indicated that M. ovipneumoniae was more abundant in the bronchi than in the trachea. Genome analysis of M. ovipneumoniae identified a 1,014,835 bp genome with 686 coding sequences, 3 rRNAs, 30 tRNAs, 6 CRISPRs, 11 genomic islands, 4 prophages, 73 virulence factors, and 20 secreted proteins. CONCLUSION This study investigates the vulnerability of Hu sheep to M. ovipneumoniae infection during the cold weather of late spring. Autopsy findings showed severe pulmonary injury in the affected sheep, and whole genome sequencing identified genetic elements associated with the pathogenicity and virulence factors of M. ovipneumoniae.
Affiliation(s)
- Jiandong Wang
  - NingXia Academy of Agricultural and Forestry Sciences, Yinchuan, 750002, China
- Hongyan Liu
  - NingXia Academy of Agricultural and Forestry Sciences, Yinchuan, 750002, China
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
- Abdul Raheem
  - National Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, 430070, China
- Qing Ma
  - NingXia Academy of Agricultural and Forestry Sciences, Yinchuan, 750002, China
- Xiaojun Liang
  - NingXia Academy of Agricultural and Forestry Sciences, Yinchuan, 750002, China
- Yanan Guo
  - School of Agriculture, Ningxia University, Yinchuan, 750021, China
- Doukun Lu
  - National Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, 430070, China
8. DLm6Am: A Deep-Learning-Based Tool for Identifying N6,2′-O-Dimethyladenosine Sites in RNA Sequences. Int J Mol Sci 2022; 23:ijms231911026. [PMID: 36232325 PMCID: PMC9570463 DOI: 10.3390/ijms231911026] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/08/2022] [Revised: 09/10/2022] [Accepted: 09/15/2022] [Indexed: 11/25/2022] Open
Abstract
N6,2′-O-dimethyladenosine (m6Am) is a post-transcriptional modification that may be associated with regulatory roles in the control of cellular functions. Therefore, it is crucial to accurately identify transcriptome-wide m6Am sites to understand underlying m6Am-dependent mRNA regulation mechanisms and biological functions. Here, we used three sequence-based feature-encoding schemes, including one-hot, nucleotide chemical property (NCP), and nucleotide density (ND), to represent RNA sequence samples. Additionally, we proposed an ensemble deep learning framework, named DLm6Am, to identify m6Am sites. DLm6Am consists of three similar base classifiers, each of which contains a multi-head attention module, an embedding module with two parallel deep learning sub-modules, a convolutional neural network (CNN) and a Bi-directional long short-term memory (BiLSTM), and a prediction module. To demonstrate the superior performance of our model’s architecture, we compared multiple model frameworks with our method by analyzing the training data and independent testing data. Additionally, we compared our model with the existing state-of-the-art computational methods, m6AmPred and MultiRM. The accuracy (ACC) for the DLm6Am model was improved by 6.45% and 8.42% compared to that of m6AmPred and MultiRM on independent testing data, respectively, while the area under receiver operating characteristic curve (AUROC) for the DLm6Am model was increased by 4.28% and 5.75%, respectively. All the results indicate that DLm6Am achieved the best prediction performance in terms of ACC, Matthews correlation coefficient (MCC), AUROC, and the area under precision and recall curves (AUPR). To further assess the generalization performance of our proposed model, we implemented chromosome-level leave-out cross-validation, and found that the obtained AUROC values were greater than 0.83, indicating that our proposed method is robust and can accurately predict m6Am sites.
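The three sequence encodings named in the abstract above (one-hot, NCP, and ND) can be illustrated with a short Python sketch. The NCP triplets follow the commonly used ring structure, functional group, and hydrogen-bond convention. The one-hot column order, the concatenation into a single 8-value row per position, and all function names are assumptions made for illustration and are not taken from the DLm6Am code.

```python
# One-hot columns ordered A, C, G, U (this ordering is an assumption).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# NCP: (purine ring?, amino group?, weak hydrogen bonding?)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def nucleotide_density(seq):
    """For each position i (1-based), the frequency of that position's
    nucleotide within the prefix seq[:i]."""
    counts = {}
    densities = []
    for i, nt in enumerate(seq, start=1):
        counts[nt] = counts.get(nt, 0) + 1
        densities.append(counts[nt] / i)
    return densities


def encode(seq):
    """Return one row per nucleotide: 4 one-hot + 3 NCP + 1 ND values."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    nd = nucleotide_density(seq)
    return [list(ONE_HOT[nt]) + list(NCP[nt]) + [d] for nt, d in zip(seq, nd)]


if __name__ == "__main__":
    for row in encode("AUGA"):  # toy sequence, illustration only
        print(row)
```

Each row has eight values, so a sequence of length L becomes an L x 8 matrix that a convolutional or recurrent layer could consume. Whether the published model concatenates the encodings this way is not stated in the abstract.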
9. Fan G, Song W, Guan Z, Zhang W, Lu X. Some novel features of strong promoters discovered in Cytophaga hutchinsonii. Appl Microbiol Biotechnol 2022; 106:2529-2540. [PMID: 35318522 DOI: 10.1007/s00253-022-11869-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/10/2021] [Revised: 02/25/2022] [Accepted: 03/05/2022] [Indexed: 11/28/2022]
Abstract
Cytophaga hutchinsonii is an important Gram-negative bacterium belonging to the phylum Bacteroidetes that can efficiently degrade cellulose. However, the promoter that mediates the initiation of gene transcription has long been unknown. In this study, we determined the transcription start site (TSS) of C. hutchinsonii by 5' rapid amplification of cDNA ends (5' RACE). The promoter structure was first identified as TAAT and TATTG, which are located at -5 and -31 bp relative to the TSS, respectively. The function of the -5 and -31 regions and the spacer length of the promoter Pchu_1284 were explored by site-directed ligase-independent mutagenesis (SLIM). The results showed that promoter activity decreased sharply when the TTG motif was mutated into guanine (G) or cytosine (C). Interestingly, we found that the strong promoter was accompanied by many TTTG motifs, which could enhance promoter activity within a certain range of copy numbers. These characteristics differed from those of other promoters of Bacteroides species. Furthermore, we carried out genome scanning analysis for C. hutchinsonii and another Bacteroides species using Perl 6.0. The results indicated that the promoter structure of C. hutchinsonii possessed more unique features than that of other species. In addition, the screened inducible promoter Pchu_2268 was used to overexpress protein CHU_2196, with a molecular weight of 120 kDa, in C. hutchinsonii. The present study enriched knowledge of the promoter structure of Bacteroidetes species and also provided a novel method for high-level expression of a large protein (cellulase) in vivo, which is helpful for elucidating the unique cellulose degradation mechanism of C. hutchinsonii.
Key points
• The conserved structure of the strong promoter of C. hutchinsonii was elucidated.
• Two novel regulation motifs, TTTG and AATTATG, were discovered in the promoter.
• A new method for induced expression of cellulase in vivo was established.
• Helpful for explaining the unique cellulose degradation mechanism of C. hutchinsonii.
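The abstract above mentions scanning genomes for promoter motifs. The authors report using Perl, and their script is not reproduced here. The following is only a rough Python sketch of the general idea, counting possibly overlapping motif occurrences in an upstream region. The function names and the example sequence are hypothetical.

```python
def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping)
    occurrence of motif in sequence, ignoring case."""
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one so that overlapping matches are also found.
        start = sequence.find(motif, start + 1)
    return positions


def motif_counts(upstream, motifs=("TTTG", "TAAT", "TATTG")):
    """Count how often each motif occurs in an upstream region."""
    return {m: len(find_motif(upstream, m)) for m in motifs}


if __name__ == "__main__":
    # Made-up upstream sequence, for illustration only.
    upstream = "ATTTGCATTTGTTTGACTATTGCCTAATGG"
    for motif, count in motif_counts(upstream).items():
        print(motif, count, find_motif(upstream, motif))
```

A real scan would also need to handle the reverse strand and to restrict the search to a window at a fixed distance from each annotated TSS, which this sketch leaves out.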
Affiliation(s)
- Guoqing Fan
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, 266200, China
- Wenxia Song
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, 266200, China
- Zhiwei Guan
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, 266200, China
  - School of Life Science, Qilu Normal University, Jinan, 250200, China
- Weican Zhang
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, 266200, China
- Xuemei Lu
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, 266200, China
10. Qiao H, Zhang S, Xue T, Wang J, Wang B. iPro-GAN: A novel model based on generative adversarial learning for identifying promoters and their strength. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 215:106625. [PMID: 35038653 DOI: 10.1016/j.cmpb.2022.106625] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/11/2021] [Revised: 12/13/2021] [Accepted: 01/06/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND AND OBJECTIVE A promoter is a component of a gene that binds specifically to RNA polymerase and determines both where transcription starts and how efficiently the gene is transcribed. Promoters can be divided into strong and weak promoters because their structures and interaction time intervals are quite different. Functional variation of the promoter can lead to a variety of diseases. Therefore, identifying promoters and their strength is necessary and has important biological significance. A novel and promising model based on deep learning is proposed to achieve this. METHODS In this work, we build a powerful model named iPro-GAN for the identification of promoters and their strength. First, we collect benchmark datasets and independent datasets for training and testing. Then, a Moran-based spatial auto-cross correlation method is used for feature extraction. Finally, a deep convolutional generative adversarial network with 10-fold cross-validation is applied for classification. The first layer of the model is used to identify the promoter and the second layer is used to determine its type. RESULTS On the benchmark dataset, the accuracy of the first-layer predictor is 93.15%, and the accuracy of the second-layer predictor is 92.30%. On the independent dataset, the accuracy of the first-layer predictor is 86.77%, and the accuracy of the second-layer predictor is 91.66%. In particular, breakthrough progress has been made in the identification of promoter strength. CONCLUSIONS These results are far higher than those of the best existing predictor, which indicates that our model is serviceable and practicable for identifying promoters and their strength. Furthermore, the datasets and source code are available at https://github.com/Bovbene/iPro-GAN.
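The abstract above evaluates the classifier with 10-fold cross-validation. As a generic illustration of that evaluation scheme only (not of the iPro-GAN model or its repository code), here is a small standard-library Python sketch that produces the train and test index sets for each fold. The function name, the seed, and the sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are shuffled once, then dealt into k folds whose sizes
    differ by at most one; every sample is tested exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    # 25 hypothetical samples split into 10 folds.
    for fold, (train_idx, test_idx) in enumerate(k_fold_indices(25, k=10), start=1):
        print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test")
```

In practice the model would be trained on `train_idx` and scored on `test_idx` in every fold, and the reported accuracy would be the average over the ten folds.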
Affiliation(s)
- Huijuan Qiao
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Tian Xue
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Jinyue Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Bowei Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
11. Casa PL, de Abreu FP, Benvenuti JL, Martinez GS, de Avila e Silva S. Beyond consensual motifs: an analysis of DNA curvature within Escherichia coli promoters. Biologia (Bratisl) 2022. [DOI: 10.1007/s11756-021-00999-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
12. Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T, Webb GI, Zou Q, Coin LJM, Song J. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction. Brief Bioinform 2022; 23:6502561. [PMID: 35021193 PMCID: PMC8921625 DOI: 10.1093/bib/bbab551] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 11/30/2021] [Indexed: 01/13/2023] Open
Abstract
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date, benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to assist the research community to assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoter, 61 for eukaryotic promoter, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.
Affiliation(s)
- Cangzhi Jia (corresponding author)
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Geoffrey I Webb
  - Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Quan Zou (corresponding author)
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Lachlan J M Coin (corresponding author)
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia
- Jiangning Song (corresponding author)
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| |
Collapse
|
13
|
Deep N-terminomics of Mycobacterium tuberculosis H37Rv extensively correct annotated encoding genes. Genomics 2021; 114:292-304. [PMID: 34915127 DOI: 10.1016/j.ygeno.2021.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/27/2021] [Revised: 11/28/2021] [Accepted: 12/09/2021] [Indexed: 11/24/2022]
Abstract
Mycobacterium tuberculosis (MTB) is the causative agent of tuberculosis (TB). Although H37Rv, the type strain of M. tuberculosis, was sequenced in 1998, annotation errors in its protein-coding genes have been reported in hundreds of papers. The problem is particularly severe at the 5' end of genes. Here, we applied TMPP [(N-Succinimidyloxycarbonylmethyl) tris (2,4,6-trimethoxyphenyl) phosphonium bromide] labeling combined with a StageTip separation strategy to M. tuberculosis H37Rv to characterize the N-terminal start sites of its annotated protein-coding genes. In total, 1047 proteins were identified with 2058 TMPP-labeled N-terminal peptides from all 2625 proteins sequenced by mass spectrometry (MS). Comparative genomics analysis allowed the re-annotation of the N-termini of 43 proteins in H37Rv and 762 proteins in Mycobacteriaceae. All revised N-terminal start sites were located in the 5'-UTR of annotated genes, owing to over-annotation of the previous N-terminal initiation codon, especially ATG. In addition, we identified and verified a novel gene, Rv1078A, in the +3 frame, distinct from the annotated gene Rv1078 in the +2 frame. Altogether, our findings contribute to a better understanding of the N-termini of H37Rv and other Mycobacteriaceae species and can assist future biological studies.
14
Chevez-Guardado R, Peña-Castillo L. Promotech: a general tool for bacterial promoter recognition. Genome Biol 2021; 22:318. [PMID: 34789306 PMCID: PMC8597233 DOI: 10.1186/s13059-021-02514-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/31/2020] [Accepted: 10/11/2021] [Indexed: 12/14/2022] Open
Abstract
Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech's performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech .
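For readers unfamiliar with the metric named in this abstract, the following is a minimal Python sketch of one common way to estimate the area under the precision-recall curve (the step-wise "average precision" estimate). It is not taken from Promotech; the function name `average_precision` and the toy labels and scores are illustrative assumptions, and tied scores are not given any special treatment.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: iterable of 0/1 (1 = true promoter); scores: predicted scores.
    Hypothetical helper for illustration only.
    """
    labels = list(labels)
    scores = list(scores)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            # Precision at the rank where this positive is recovered.
            ap += true_pos / rank
    return ap / total_pos


# Toy example: two promoters, two non-promoters.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(toy_labels, toy_scores), 4))
```

Because this estimate rewards ranking true promoters ahead of non-promoters, it tends to be more informative than accuracy when non-promoter sequences greatly outnumber promoters.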
Affiliation(s)
- Ruben Chevez-Guardado
- Department of Computer Science, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada
- Lourdes Peña-Castillo
- Department of Computer Science, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada; Department of Biology, Memorial University of Newfoundland, 230 Elizabeth Ave, St. John's, Newfoundland, A1C 5S7, Canada.
15
Al Jaseem MAJ, Abdullah KM, Qais FA, Shamsi A, Naseem I. Mechanistic insight into glycation inhibition of human serum albumin by vitamin B9: Multispectroscopic and molecular docking approach. Int J Biol Macromol 2021; 181:426-434. [PMID: 33775768 DOI: 10.1016/j.ijbiomac.2021.03.153] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/16/2021] [Revised: 03/09/2021] [Accepted: 03/23/2021] [Indexed: 10/21/2022]
Abstract
Advanced glycation end products (AGEs) formation produces free radicals that play a role in diabetes mellitus; hence inhibition of glycation plays a part in minimizing diabetes-related complications. This study was intended to examine the AGEs formation of HSA upon prolonged incubation of 28 days at 37 °C and further investigate the antiglycation potential of folic acid (FA). FA shows a significant binding affinity to the HSA with a binding constant (K) of 10⁴ M⁻¹. The evaluation of enthalpy change (ΔH°) and entropy change (ΔS°) implied that the HSA-FA complex is stabilized primarily by hydrophobic interaction and hydrogen bonding. Molecular docking analysis depicted that FA binds with HSA in subdomain IIA (Sudlow's site I) with a binding energy of −7.0 kcal mol⁻¹. AGEs were characterized by free lysine and thiol groups, carbonyl content, and AGEs specific fluorescence. The presence of FA significantly decreased glycation from free lysine and carbonyl content estimation and AGEs specific fluorescence. Multispectroscopic observations and molecular docking and examination of various biomarkers demonstrate the antiglycation activity of FA and its capacity to prevent disease progression in diabetes.
Affiliation(s)
- K M Abdullah
- Department of Biochemistry, Jain University, Bengaluru, India
- Faizan Abul Qais
- Department of Agricultural Microbiology, Aligarh Muslim University, India
- Anas Shamsi
- Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, United Arab Emirates; Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
- Imrana Naseem
- Department of Biochemistry, F/O Life Sciences, Aligarh Muslim University, India.
16
Coppens L, Lavigne R. SAPPHIRE: a neural network based classifier for σ70 promoter prediction in Pseudomonas. BMC Bioinformatics 2020; 21:415. [PMID: 32962628 PMCID: PMC7510298 DOI: 10.1186/s12859-020-03730-z] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 02/11/2020] [Accepted: 08/31/2020] [Indexed: 11/23/2022] Open
Abstract
Background: In silico promoter prediction represents an important challenge in bioinformatics as it provides a first-line approach to identifying regulatory elements to support wet-lab experiments. Historically, available promoter prediction software has focused on sigma factor-associated promoters in the model organism E. coli. As a consequence, traditional promoter predictors yield suboptimal predictions when applied to other prokaryotic genera, such as Pseudomonas, a Gram-negative bacterium of crucial medical and biotechnological importance. Results: We developed SAPPHIRE, a promoter predictor for σ70 promoters in Pseudomonas. This promoter prediction relies on an artificial neural network that evaluates sequences on their similarity to the −35 and −10 boxes of σ70 promoters found experimentally in P. aeruginosa and P. putida. SAPPHIRE currently outperforms established predictive software when classifying Pseudomonas σ70 promoters and was built to allow further expansion in the future. Conclusions: SAPPHIRE is the first predictive tool for bacterial σ70 promoters in Pseudomonas. SAPPHIRE is free, publicly available and can be accessed online at www.biosapphire.com. Alternatively, users can download the tool as a Python 3 script for local application from this site.
Affiliation(s)
- Lucas Coppens
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21, Box 2462, 3001, Leuven, Belgium
- Rob Lavigne
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21, Box 2462, 3001, Leuven, Belgium.
17
Chen YL, Guo DH, Li QZ. An energy model for recognizing the prokaryotic promoters based on molecular structure. Genomics 2019; 112:2072-2079. [PMID: 31809797 DOI: 10.1016/j.ygeno.2019.12.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/27/2019] [Revised: 11/06/2019] [Accepted: 12/01/2019] [Indexed: 11/19/2022]
Abstract
The promoter is an important functional element of DNA sequences and is responsible for initiating gene transcription. Recognizing promoters helps in understanding the related biological phenomena. Based on the concept that a promoter is mainly determined by its sequence and structure, a novel statistical physics model for predicting promoters in Escherichia coli K-12 is proposed. The total energy of the DNA local structure of sequence segments, the sole prediction parameter, is calculated for three benchmark promoter sequence datasets using principles from statistical physics and information theory. Improved prediction results are obtained. A web server, PhysMPrePro, for predicting promoters is available at http://202.207.14.87:8032/bioinformation/PhysMPrePro/index.asp, so that other scientists can easily obtain their desired results.
Affiliation(s)
- Ying-Li Chen
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China; The State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot 010070, China.
- Dong-Hua Guo
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Qian-Zhong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China; The State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot 010070, China.
18
Silva JCF, Teixeira RM, Silva FF, Brommonschenkel SH, Fontes EPB. Machine learning approaches and their current application in plant molecular biology: A systematic review. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2019; 284:37-47. [PMID: 31084877 DOI: 10.1016/j.plantsci.2019.03.020] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 12/18/2018] [Revised: 02/28/2019] [Accepted: 03/26/2019] [Indexed: 05/19/2023]
Abstract
Machine learning (ML) is a field of artificial intelligence that has rapidly emerged in molecular biology, thus allowing the exploitation of Big Data concepts in plant genomics. In this context, the main challenges are given in terms of how to analyze massive datasets and extract new knowledge in all levels of cellular systems research. In summary, ML techniques allow complex interactions to be inferred in several biological systems. Despite its potential, ML has been underused due to complex computational algorithms and definition terms. Therefore, a systematic review to disentangle ML approaches is relevant for plant scientists and has been considered in this study. We presented the main steps for ML development (from data selection to evaluation of classification/prediction models) with a respective discussion approaching functional genomics mainly in terms of pathogen effector genes in plant immunity. Additionally, we also considered how to access public source databases under an ML framework towards advancing plant molecular biology and introduced novel powerful tools, such as deep learning.
Affiliation(s)
- Jose Cleydson F Silva
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Ruan M Teixeira
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Fabyano F Silva
- Department of Animal Science, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Sergio H Brommonschenkel
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Plant Pathology Department /Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Elizabeth P B Fontes
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil.
19
Xiao X, Xu ZC, Qiu WR, Wang P, Ge HT, Chou KC. iPSW(2L)-PseKNC: A two-layer predictor for identifying promoters and their strength by hybrid features via pseudo K-tuple nucleotide composition. Genomics 2018; 111:1785-1793. [PMID: 30529532 DOI: 10.1016/j.ygeno.2018.12.001] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 10/31/2018] [Revised: 11/20/2018] [Accepted: 12/04/2018] [Indexed: 12/20/2022]
Abstract
The promoter is a regulatory DNA region about 81-1000 base pairs long, usually located upstream of a given gene near the transcription start site (TSS). By combining with proteins called transcription factors, the promoter provides the starting point for regulated gene transcription, and hence plays a vitally important role in gene transcriptional regulation. With the explosive growth of DNA sequences in the post-genomic age, it has become an urgent challenge to develop computational methods for effectively identifying promoters because the information thus obtained is very useful for both basic research and drug development. Although some prediction methods were developed in this regard, most of them were limited to merely identifying whether a query DNA sequence is a promoter or not. However, based on their distinct strength levels for transcriptional activation and expression, promoters should be divided into two categories: strong and weak types. Here a new two-layer predictor, called "iPSW(2L)-PseKNC", was developed by fusing the physicochemical properties of nucleotides and their nucleotide density into PseKNC (pseudo K-tuple nucleotide composition). Its 1st layer serves to predict whether a query DNA sequence sample is a promoter or not, while its 2nd layer is able to predict the strength of promoters. It has been observed through rigorous cross-validations that the 1st-layer sub-predictor is remarkably superior to the existing state-of-the-art predictors in identifying the promoters and non-promoters, and that the 2nd-layer sub-predictor can do what is beyond the reach of the existing predictors. Moreover, the web-server for iPSW(2L)-PseKNC has been established at http://www.jci-bioinfo.cn/iPSW(2L)-PseKNC, by which the majority of experimental scientists can easily get the results they need.
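As a rough illustration of the "K-tuple nucleotide composition" part of the feature named in this abstract, the sketch below computes normalized k-mer frequencies for a DNA sequence in Python. It covers only the plain composition vector; the pseudo components that fold in physicochemical properties and nucleotide density are not reproduced here. The function name `ktuple_composition` and the example sequence are assumptions made for illustration, not the authors' code.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return the normalized frequency of every k-mer over A/C/G/T.

    The vector has 4**k entries in lexicographic k-mer order (AA, AC, ...).
    Windows containing other characters (e.g. N) are skipped.
    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            windows += 1

    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / windows for kmer in kmers]


# Toy example: a dinucleotide (k = 2) composition vector with 16 entries.
features = ktuple_composition("TTGACAATTAATCATCGAACTAGTTAACT", k=2)
print(len(features))
```

A fixed-length vector of this kind can be passed to any standard classifier, which is why composition features are a common starting point for sequence-based promoter predictors.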
Affiliation(s)
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA.
- Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.
- Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA
- Peng Wang
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Hui-Ting Ge
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Kuo-Chen Chou
- The Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
20
Di Salvo M, Pinatel E, Talà A, Fondi M, Peano C, Alifano P. G4PromFinder: an algorithm for predicting transcription promoters in GC-rich bacterial genomes based on AT-rich elements and G-quadruplex motifs. BMC Bioinformatics 2018; 19:36. [PMID: 29409441 PMCID: PMC5801747 DOI: 10.1186/s12859-018-2049-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/27/2017] [Accepted: 01/29/2018] [Indexed: 11/10/2022] Open
Abstract
Background: Over the last few decades, computational genomics has contributed tremendously to deciphering biology from genome sequences and related data. Considerable effort has been devoted to the prediction of transcription promoter and terminator sites, which represent the essential “punctuation marks” for DNA transcription. Computational prediction of promoters in prokaryotes remains a far from solved problem in computational genomics. The majority of published bacterial promoter prediction tools are based on a consensus-sequence search and were designed specifically for vegetative σ70 promoters; they are therefore not suitable for promoter prediction in bacteria encoding many σ factors, such as actinomycetes. Results: In this study we investigated the possibility of identifying putative promoters in prokaryotes based on evolutionarily conserved motifs, and focused our attention on GC-rich bacteria, in which promoter prediction with conventional, consensus-based algorithms is often not exhaustive. Here, we introduce G4PromFinder, a novel algorithm that predicts putative promoters based on AT-rich elements and G-quadruplex DNA motifs. We tested its performance using available genomic and transcriptomic data of the model microorganisms Streptomyces coelicolor A3(2) and Pseudomonas aeruginosa PA14. We compared our results with those obtained by three currently available promoter prediction algorithms: the σ70 consensus-based PePPER, the σ factor consensus-based bTSSfinder, and PromPredict, which is based on double-helix DNA stability. Our results demonstrated that G4PromFinder is more suitable than the three reference tools for both genomes. Our algorithm achieved higher accuracy (F1-scores 0.61 and 0.53 in the two genomes) than the next best tool, PromPredict (F1-scores 0.46 and 0.48). Consensus-based algorithms performed less well on the analyzed GC-rich genomes.
Conclusions: Our analysis shows that G4PromFinder is a powerful tool for promoter search in GC-rich bacteria, especially for bacteria encoding many σ factors, such as the model microorganism S. coelicolor A3(2). Moreover, consensus-based tools and, in general, tools that are based on specific features of bacterial σ factors seem to perform less well for promoter prediction in these types of bacterial genomes. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2049-x) contains supplementary material, which is available to authorized users.
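The two sequence signals named in this abstract, AT-rich elements and G-quadruplex motifs, can be illustrated with a short Python sketch. This is not the G4PromFinder algorithm: the motif pattern (four runs of 2-4 guanines separated by loops of 1-10 bases), the window size, the A+T threshold and all function names are assumptions chosen for illustration, and only the given strand is scanned.

```python
import re

# Assumed putative G-quadruplex pattern: four G-runs of length 2-4
# separated by loops of 1-10 bases. The published tool may use a
# different definition.
G4_PATTERN = re.compile(r"G{2,4}(?:[ACGT]{1,10}G{2,4}){3}")


def at_fraction(window):
    """Fraction of A and T bases in a sequence window."""
    window = window.upper()
    if not window:
        return 0.0
    return sum(base in "AT" for base in window) / len(window)


def at_rich_windows(seq, size=25, min_at=0.7):
    """Start positions of windows whose A+T fraction is at least min_at.

    size and min_at are illustrative defaults, not published parameters.
    """
    seq = seq.upper()
    return [
        start
        for start in range(len(seq) - size + 1)
        if at_fraction(seq[start:start + size]) >= min_at
    ]


def g4_motifs(seq):
    """(start, end) spans of non-overlapping putative G-quadruplex motifs."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(seq.upper())]


# Toy example on a made-up sequence.
toy = "TTGGGAGGGAGGGAGGGTT" + "ATATTTAATATTTAAATATATTATA"
print(g4_motifs(toy))
print(at_rich_windows(toy, size=25, min_at=0.9))
```

A promoter finder built on these signals would still need a rule for combining nearby hits into a single prediction; that step is left out because the abstract does not describe it.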
Affiliation(s)
- Marco Di Salvo
- Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy
- Eva Pinatel
- Institute of Biomedical Technologies National Research Council, Milan, Segrate, Italy
- Adelfia Talà
- Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy
- Marco Fondi
- Department of Biology, University of Florence, Florence, Italy
- Clelia Peano
- Institute of Genetic and Biomedical Research (IRGB), UOS of Milan, National Research Council, Milan, Italy; Humanitas Clinical and Research Center, Milan, Rozzano, Italy
- Pietro Alifano
- Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy.
21
Shahmuradov IA, Mohamad Razali R, Bougouffa S, Radovanovic A, Bajic VB. bTSSfinder: a novel tool for the prediction of promoters in cyanobacteria and Escherichia coli. Bioinformatics 2017; 33:334-340. [PMID: 27694198 PMCID: PMC5408793 DOI: 10.1093/bioinformatics/btw629] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 07/21/2016] [Accepted: 09/27/2016] [Indexed: 12/01/2022] Open
Abstract
Motivation: The computational search for promoters in prokaryotes remains an attractive problem in bioinformatics. Despite the attention it has received for many years, the problem has not been addressed satisfactorily. In any bacterial genome, the transcription start site is chosen mostly by the sigma (σ) factor proteins, which control the gene activation. The majority of published bacterial promoter prediction tools target σ70 promoters in Escherichia coli. Moreover, no σ-specific classification of promoters is available for prokaryotes other than E. coli. Results: Here, we introduce bTSSfinder, a novel tool that predicts putative promoters for five classes of σ factors in Cyanobacteria (σA, σC, σH, σG and σF) and for five classes of sigma factors in E. coli (σ70, σ38, σ32, σ28 and σ24). Compared with currently available tools, bTSSfinder achieves higher accuracy (MCC = 0.86, F1-score = 0.93, versus MCC = 0.59, F1-score = 0.79 for the next best tool) and covers multiple classes of promoters. Availability and Implementation: bTSSfinder is available standalone and online at http://www.cbrc.kaust.edu.sa/btssfinder. Supplementary information: Supplementary data are available at Bioinformatics online.
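The two accuracy measures quoted in this abstract, the Matthews correlation coefficient (MCC) and the F1-score, follow directly from the counts of a binary confusion matrix. The Python sketch below uses the standard definitions; the function names and the example counts are illustrative assumptions and are not taken from the bTSSfinder evaluation.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up counts: 90 promoters found, 10 missed, 15 false alarms,
# 85 non-promoters correctly rejected.
print(round(f1_score(tp=90, fp=15, fn=10), 3))
print(round(mcc(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike F1, MCC also takes true negatives into account, which is one reason the two measures are often reported together for promoter predictors.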
Affiliation(s)
- Ilham Ayub Shahmuradov
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Rozaimi Mohamad Razali
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Salim Bougouffa
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Aleksandar Radovanovic
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Vladimir B Bajic
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), 4700 King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
22
Shahmuradov IA, Umarov RK, Solovyev VV. TSSPlant: a new tool for prediction of plant Pol II promoters. Nucleic Acids Res 2017; 45:e65. [PMID: 28082394 PMCID: PMC5416875 DOI: 10.1093/nar/gkw1353] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 09/18/2016] [Revised: 12/16/2016] [Accepted: 12/27/2016] [Indexed: 11/22/2022] Open
Abstract
Our current knowledge of eukaryotic promoters indicates that their architecture is complex and often composed of numerous functional motifs. Most known promoters include multiple and in some cases mutually exclusive transcription start sites (TSSs). Moreover, TSS selection depends on cell/tissue type, developmental stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences of a wide spectrum of plant genomes. The tool was developed by using large promoter collections from ppdb and PlantProm DB. It utilizes eighteen significant compositional and signal features of plant promoter sequences selected in this study, which feed an artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy compared to the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80, F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.
Affiliation(s)
- Ilham A. Shahmuradov
- King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Institute of Molecular Biology and Biotechnologies, ANAS, 2 Matbuat strasse, Baku AZ1073, Azerbaijan
- Ramzan Kh. Umarov
- King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
23
Kumar A, Bansal M. Unveiling DNA structural features of promoters associated with various types of TSSs in prokaryotic transcriptomes and their role in gene expression. DNA Res 2017; 24:25-35. [PMID: 27803028 PMCID: PMC5381344 DOI: 10.1093/dnares/dsw045] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 05/12/2016] [Accepted: 09/23/2016] [Indexed: 01/28/2023] Open
Abstract
Next-generation sequencing studies have revealed that a variety of transcripts are present in the prokaryotic transcriptome and a significant fraction of them are functional, being involved in various regulatory activities apart from coding for proteins. Identification of promoters associated with different transcripts is necessary for characterization of the transcriptome. Promoter regions have been shown to have unique structural features as compared with their flanking region, in organisms covering all domains of life. Here we report an in silico analysis of DNA sequence dependent structural properties like stability, bendability and curvature in the promoter region of six different prokaryotic transcriptomes. Using these structural features, we predicted promoters associated with different categories of transcripts (mRNA, internal, antisense and non-coding), which constitute the transcriptome. Promoter annotation using structural features is fairly accurate and reliable with about 50% of the primary promoters being characterized by all three structural properties while at least one property identifies 95%. We also studied the relative differences of these structural features in terms of gene expression and found that the features, viz. lower stability, lesser bendability and higher curvature are more prominent in the promoter regions which are associated with high gene expression as compared with low expression genes. Hence, promoters, which are associated with higher gene expression, get annotated well using DNA structural features as compared with those, which are linked to lower gene expression.
Affiliation(s)
- Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012 Karnataka, India
24
Lloréns-Rico V, Lluch-Senar M, Serrano L. Distinguishing between productive and abortive promoters using a random forest classifier in Mycoplasma pneumoniae. Nucleic Acids Res 2015; 43:3442-53. [PMID: 25779052 PMCID: PMC4402517 DOI: 10.1093/nar/gkv170] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 11/26/2014] [Accepted: 02/22/2015] [Indexed: 12/01/2022] Open
Abstract
Distinguishing between promoter-like sequences in bacteria that belong to true or abortive promoters, or to those that do not initiate transcription at all, is one of the important challenges in transcriptomics. To address this problem, we have studied the genome-reduced bacterium Mycoplasma pneumoniae, for which the RNAs associated with transcriptional start sites have been recently experimentally identified. We determined the contribution to transcription events of different genomic features: the –10, extended –10 and –35 boxes, the UP element, the bases surrounding the –10 box and the nearest-neighbor free energy of the promoter region. Using a random forest classifier and the aforementioned features transformed into scores, we could distinguish between true, abortive promoters and non-promoters with good –10 box sequences. The methods used in this characterization of promoters can be extended to other bacteria and have important applications for promoter design in bacterial genome engineering.
Affiliation(s)
- Verónica Lloréns-Rico
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Maria Lluch-Senar
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain
- Luis Serrano
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain
25
Soltani S, Askari H, Ejlali N, Aghdam R. The structural properties of DNA regulate gene expression. Mol Biosyst 2014; 10:273-80. [PMID: 24281302 DOI: 10.1039/c3mb70311h] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
Abstract
Regulatory sequences such as promoters not only contain cis-regulatory elements as switches of transcription, but also exhibit particular topological features. In this paper, we introduce a systematic genome-scale approach to characterize the roles of the structural conformation and stability profile of promoter sequences in gene expression. The average nearest-neighbor stacking free energies of promoter dinucleotides are analyzed with statistical hidden Markov models to reveal how the constraints and properties of promoter structure function in transcription. When applied to the 1000 bp 5' upstream sequences of genes, the proposed model, by assessing free energy profiles, identified co-expressed genes of Arabidopsis thaliana that respond to the auxin hormone. The dynamic network that mediates transcription regulation makes it very difficult to conceive how DNA conformation interacts with cis-regulatory elements, chromatin structure and many other factors. This study draws out the complexity of the promoter's regulatory behavior from sequence beyond former studies and suggests a new hypothesis to be validated experimentally.
Affiliation(s)
- Sattar Soltani
  - Department of Biotechnology, Faculty of New Technologies Engineering, Shahid Beheshti University, G. C., Tehran, Iran
|
26
|
Meysman P, Collado-Vides J, Morett E, Viola R, Engelen K, Laukens K. Structural properties of prokaryotic promoter regions correlate with functional features. PLoS One 2014; 9:e88717. [PMID: 24516674 PMCID: PMC3918002 DOI: 10.1371/journal.pone.0088717] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/11/2013] [Accepted: 01/10/2014] [Indexed: 12/31/2022] Open
Abstract
The structural properties of the DNA molecule are known to play a critical role in transcription. In this paper, the structural profiles of promoter regions were studied within the context of their diversity and their function for eleven prokaryotic species: Escherichia coli, Klebsiella pneumoniae, Salmonella Typhimurium, Pseudomonas aeruginosa, Geobacter sulfurreducens, Helicobacter pylori, Chlamydophila pneumoniae, Synechocystis sp., Synechococcus elongatus, Bacillus anthracis, and the archaeon Sulfolobus solfataricus. The main anchor points for these promoter regions were transcription start sites identified through high-throughput experiments or collected within large curated databases. Prokaryotic promoter regions were found to be less stable and less flexible than the genomic mean across all studied species. However, direct comparison between species revealed differences in their structural profiles that cannot be explained solely by the difference in genomic GC content. In addition, comparison with functional data revealed that there are patterns in the promoter structural profiles that can be linked to specific functional loci, such as sigma factor regulation or transcription factor binding. Interestingly, a novel structural element clearly visible near the transcription start site was found in genes associated with essential cellular functions and growth in several species. Our analyses reveal the great diversity in promoter structural profiles both between and within prokaryotic species. We observed relationships between structural diversity and functional features that are interesting prospects for further research into as yet uncharacterized functional loci defined by DNA structural properties.
Affiliation(s)
- Pieter Meysman
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Julio Collado-Vides
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Enrique Morett
  - Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
  - Instituto Nacional de Medicina Genómica, Mexico City, Mexico
- Roberto Viola
  - Department of Computational Biology, Fondazione Edmund Mach, San Michele all’Adige, Trento, Italy
- Kristof Engelen
  - Department of Computational Biology, Fondazione Edmund Mach, San Michele all’Adige, Trento, Italy
  - * Corresponding author
- Kris Laukens
  - Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
  - * Corresponding author
|
27
|
Bansal M, Kumar A, Yella VR. Role of DNA sequence based structural features of promoters in transcription initiation and gene expression. Curr Opin Struct Biol 2014; 25:77-85. [PMID: 24503515 DOI: 10.1016/j.sbi.2014.01.007] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 12/05/2013] [Accepted: 01/07/2014] [Indexed: 11/18/2022]
Abstract
Regulatory information for transcription initiation is present in a stretch of genomic DNA, called the promoter region that is located upstream of the transcription start site (TSS) of the gene. The promoter region interacts with different transcription factors and RNA polymerase to initiate transcription and contains short stretches of transcription factor binding sites (TFBSs), as well as structurally unique elements. Recent experimental and computational analyses of promoter sequences show that they often have non-B-DNA structural motifs, as well as some conserved structural properties, such as stability, bendability, nucleosome positioning preference and curvature, across a class of organisms. Here, we briefly describe these structural features, the differences observed in various organisms and their possible role in regulation of gene expression.
Affiliation(s)
- Manju Bansal
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
- Aditya Kumar
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
28
|
Kumari S, Ware D. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots. PLoS One 2013; 8:e79011. [PMID: 24205361 PMCID: PMC3812177 DOI: 10.1371/journal.pone.0079011] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/11/2013] [Accepted: 09/18/2013] [Indexed: 01/22/2023] Open
Abstract
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to transcription start site (TSS) exhibited positional conservation within monocots and dicots with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from that of the non-regulatory genome sequence. It also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed to better understand regulatory mechanisms of transcription.
Affiliation(s)
- Sunita Kumari
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen Ware
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
  - United States Department of Agriculture-Agriculture Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, New York, United States of America
|
29
|
Thomas M, Lange-Grünweller K, Hartmann D, Golde L, Schlereth J, Streng D, Aigner A, Grünweller A, Hartmann RK. Analysis of transcriptional regulation of the human miR-17-92 cluster; evidence for involvement of Pim-1. Int J Mol Sci 2013; 14:12273-96. [PMID: 23749113 PMCID: PMC3709785 DOI: 10.3390/ijms140612273] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 03/29/2013] [Revised: 05/14/2013] [Accepted: 05/22/2013] [Indexed: 01/07/2023] Open
Abstract
The human polycistronic miRNA cluster miR-17-92 is frequently overexpressed in hematopoietic malignancies and cancers. Its transcription is in part controlled by an E2F-regulated host gene promoter. An intronic A/T-rich region directly upstream of the miRNA coding region also contributes to cluster expression. Our deletion analysis of the A/T-rich region revealed a strong dependence on c-Myc binding to the functional E3 site. Yet, constructs lacking the 5′-proximal ~1.3 kb or 3′-distal ~0.1 kb of the 1.5 kb A/T-rich region still retained residual specific promoter activity, suggesting multiple transcription start sites (TSS) in this region. Furthermore, the protooncogenic kinase, Pim-1, its phosphorylation target HP1γ and c-Myc colocalize to the E3 region, as inferred from chromatin immunoprecipitation. Analysis of pri-miR-17-92 expression levels in K562 and HeLa cells revealed that silencing of E2F3, c-Myc or Pim-1 negatively affects cluster expression, with a synergistic effect caused by c-Myc/Pim-1 double knockdown in HeLa cells. Thus, we show, for the first time, that the protooncogene Pim-1 is part of the network that regulates transcription of the human miR-17-92 cluster.
Affiliation(s)
- Maren Thomas
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany
- Kerstin Lange-Grünweller
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany
- Dorothee Hartmann
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany
- Lara Golde
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany
- Julia Schlereth
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany
- Dennis Streng
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany
- Achim Aigner
  - Medizinische Fakultät, Rudolf-Boehm-Institut für Pharmakologie und Toxikologie, Klinische Pharmakologie, Universität Leipzig, 04107 Leipzig, Germany
- Arnold Grünweller
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany
  - Authors to whom correspondence should be addressed (A.G. and R.K.H.); Tel.: +49-6421-28-25553 (R.K.H.); Fax: +49-6421-28-25854 (R.K.H.)
- Roland K. Hartmann
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany
  - Authors to whom correspondence should be addressed (A.G. and R.K.H.); Tel.: +49-6421-28-25553 (R.K.H.); Fax: +49-6421-28-25854 (R.K.H.)
|
30
|
Meysman P, Marchal K, Engelen K. DNA structural properties in the classification of genomic transcription regulation elements. Bioinform Biol Insights 2012; 6:155-68. [PMID: 22837642 PMCID: PMC3399529 DOI: 10.4137/bbi.s9426] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022] Open
Abstract
It has been long known that DNA molecules encode information at various levels. The most basic level comprises the base sequence itself and is primarily important for the encoding of proteins and direct base recognition by DNA-binding proteins. A more elusive level consists of the local structural properties of the DNA molecule wherein the DNA sequence only plays an indirect supportive role. These properties are nevertheless an important factor in a large number of biomolecular processes and can be considered as informative signals for the presence of a variety of genomic features. Several recent studies have unequivocally shown the benefit of relying on such DNA properties for modeling and predicting genomic features as diverse as transcription start sites, transcription factor binding sites, or nucleosome occupancy. This review is meant to provide an overview of the key aspects of these DNA conformational and physicochemical properties. To illustrate their potential added value compared to relying solely on the nucleotide sequence in genomics studies, we discuss their application in research on transcription regulation mechanisms as representative cases.
Affiliation(s)
- Pieter Meysman
  - Department of Molecular and Microbial Systems, KULeuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
|
31
|
Rangannan V, Bansal M. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes. BMC Res Notes 2011; 4:257. [PMID: 21781326 PMCID: PMC3160392 DOI: 10.1186/1756-0500-4-257] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 01/27/2011] [Accepted: 07/22/2011] [Indexed: 12/19/2022] Open
Abstract
Background: As more and more genomes are being sequenced, providing an overview of their genomic features and annotating their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings: Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content, the relative-stability-based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and that of the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion: PromBase is a database that contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics study and help the experimentalist to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible for academic and non-academic users via the World Wide Web at http://nucleix.mbu.iisc.ernet.in/prombase/.
Affiliation(s)
- Vetriselvi Rangannan
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560 012, India
|
32
|
Morey C, Mookherjee S, Rajasekaran G, Bansal M. DNA free energy-based promoter prediction and comparative analysis of Arabidopsis and rice genomes. Plant Physiol 2011; 156:1300-15. [PMID: 21531900 PMCID: PMC3135951 DOI: 10.1104/pp.110.167809] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 10/19/2010] [Accepted: 04/21/2011] [Indexed: 05/06/2023]
Abstract
The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, out of which 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method shows improved prediction capability.
Affiliation(s)
- Manju Bansal
  - Indian Institute of Science, Bangalore 560 012, India
|
33
|
Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theory Biosci 2010; 130:91-100. [DOI: 10.1007/s12064-010-0114-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 08/04/2010] [Accepted: 10/23/2010] [Indexed: 12/27/2022]
|
34
|
Rangannan V, Bansal M. High-quality annotation of promoter regions for 913 bacterial genomes. Bioinformatics 2010; 26:3043-50. [PMID: 20956245 DOI: 10.1093/bioinformatics/btq577] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 01/04/2023]
Abstract
MOTIVATION: The number of bacterial genomes being sequenced is increasing very rapidly and hence, it is crucial to have procedures for rapid and reliable annotation of their functional elements such as promoter regions, which control the expression of each gene or each transcription unit of the genome. The present work addresses this requirement and presents a generic method applicable across organisms. RESULTS: Relative stability of the DNA double helical sequences has been used to discriminate promoter regions from non-promoter regions. Based on the difference in stability between neighboring regions, an algorithm has been implemented to predict promoter regions on a large scale over 913 microbial genome sequences. The average free energy values for the promoter regions as well as their downstream regions are found to differ, depending on their GC content. Threshold values to identify promoter regions have been derived using sequences flanking a subset of translation start sites from all microbial genomes and then used to predict promoters over the complete genome sequences. An average recall value of 72% (which indicates the percentage of protein and RNA coding genes with predicted promoter regions assigned to them) and precision of 56% is achieved over the 913 microbial genome dataset. AVAILABILITY: The binary executable for the 'PromPredict' algorithm (implemented in PERL and supported on Linux and MS Windows) and the predicted promoter data for all 913 microbial genomes are available at http://nucleix.mbu.iisc.ernet.in/prombase/.
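The idea in the Rangannan and Bansal abstract above, flagging positions where the upstream window is less stable than the downstream window, can be sketched roughly as follows. This is a minimal illustration and not the PromPredict implementation: the function names, the 100 nt window, the 0.3 kcal/mol threshold, and the dinucleotide energy table are all assumptions made for this example (the energies are illustrative values in the style of commonly cited nearest-neighbor parameters and should be treated as placeholders). The abstract also states that thresholds depend on GC content, which this sketch ignores.

```python
from typing import List

# ASSUMPTION: illustrative stacking free energies (kcal/mol) per dinucleotide
# step. More negative means more stable. These are placeholders, not the
# parameter set used by the cited authors.
STACKING_ENERGY = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def average_free_energy(seq: str) -> float:
    """Mean stacking energy over all overlapping dinucleotide steps.

    Expects an A/C/G/T string of length >= 2; other characters raise KeyError.
    """
    seq = seq.upper()
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return sum(STACKING_ENERGY[step] for step in steps) / len(steps)


def candidate_promoter_positions(
    genome: str, window: int = 100, threshold: float = 0.3
) -> List[int]:
    """Return positions whose upstream window is less stable than downstream.

    A position is reported when the upstream average energy exceeds the
    downstream average by at least `threshold` (hypothetical cutoff).
    """
    hits = []
    for pos in range(window, len(genome) - window + 1):
        upstream = average_free_energy(genome[pos - window:pos])
        downstream = average_free_energy(genome[pos:pos + window])
        if upstream - downstream >= threshold:
            hits.append(pos)
    return hits
```

With a toy sequence such as an AT-rich stretch followed by a GC-rich stretch, the boundary between the two is the only position reported when each stretch is exactly one window long; a real scan would need per-GC-class thresholds and merging of adjacent hits.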
|
35
|
Bland C, Newsome AS, Markovets AA. Promoter prediction in E. coli based on SIDD profiles and Artificial Neural Networks. BMC Bioinformatics 2010; 11 Suppl 6:S17. [PMID: 20946600 PMCID: PMC3026364 DOI: 10.1186/1471-2105-11-s6-s17] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: One of the major challenges in biology is the correct identification of promoter regions. Computational methods based on motif searching have been the traditional approach taken. Recent studies have shown that DNA structural properties, such as curvature, stacking energy, and stress-induced duplex destabilization (SIDD), are useful in promoter prediction as well. In this paper, the currently used SIDD energy threshold method is compared to the proposed artificial neural network (ANN) approach for finding promoters based on SIDD profile data. RESULTS: When compared to the SIDD threshold prediction method, artificial neural networks showed noticeable improvements for precision, recall, and F-score over a range of values. The maximal F-score was 62.3 for the ANN classifier and 56.8 for the threshold-based classifier. CONCLUSIONS: Artificial neural networks were used to predict promoters based on SIDD profile data. Results using this technique were an improvement over the previous SIDD threshold approach. Over a wide range of precision-recall values, artificial neural networks were more capable of identifying distinctive characteristics of promoter regions than threshold-based methods.
Affiliation(s)
- Charles Bland
  - Department Natural Sciences and Environmental Health, Mississippi Valley State University, 14000 Hwy 82 West, Itta Bena, Mississippi 38941, USA
|