1
Zaidi SSA, Kayani MUR, Zhang X, Ouyang Y, Shamsi IH. Prediction and analysis of metagenomic operons via MetaRon: a pipeline for prediction of Metagenome and whole-genome opeRons. BMC Genomics 2021; 22:60. [PMID: 33468056 PMCID: PMC7814594 DOI: 10.1186/s12864-020-07357-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2020] [Accepted: 12/27/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Efficient regulation of bacterial genes in response to environmental stimuli results in unique gene clusters known as operons. The lack of complete operonic reference and functional information makes the prediction of metagenomic operons a challenging task; solving it opens new perspectives on the interpretation of host-microbe interactions.
Results: In this work, we identified whole-genome and metagenomic operons via MetaRon (Metagenome and whole-genome opeRon prediction pipeline). MetaRon identifies operons without any experimental or functional information. MetaRon was applied to datasets with different levels of complexity and information: a whole genome, a simulated mixture of three whole genomes (E. coli MG1655, Mycobacterium tuberculosis H37Rv and Bacillus subtilis str. 16), the E. coli c20 draft genome extracted from chicken gut and, finally, 145 whole-metagenome data samples from human gut. MetaRon consistently achieved high operon prediction sensitivity, specificity and accuracy across the E. coli whole genome (97.8, 94.1 and 92.4%), the simulated genome (93.7, 75.5 and 88.1%) and E. coli c20 (87, 91 and 88%), respectively. Finally, we identified 1,232,407 unique operons from 145 paired-end human gut metagenome samples. We also report a strong association of type 2 diabetes with maltose phosphorylase (K00691), 3-deoxy-D-glycero-D-galacto-nononate 9-phosphate synthase (K21279) and an uncharacterized protein (K07101).
Conclusion: With MetaRon, we were able to remove two notable limitations of existing whole-genome operon prediction methods: (1) generalizability (the ability to predict operons in unrelated bacterial genomes), and (2) whole-genome and metagenomic data management. We also demonstrate the use of operons as a subset to represent the trends of secondary metabolites in whole-metagenome data and the role of secondary metabolites in the occurrence of disease conditions. Using operonic data from metagenomes to study secondary metabolic trends will significantly reduce the data volume to a more precise subset. Furthermore, the identification of metabolic pathways associated with the occurrence of type 2 diabetes (T2D) presents another dimension for analyzing the human gut metagenome. Presumably, this study is the first organized effort to predict metagenomic operons and perform a detailed analysis in association with a disease, in this case type 2 diabetes. Applying MetaRon to metagenomic data at diverse scales will be beneficial for understanding gene regulation and for therapeutic metagenomics.
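The sensitivity, specificity and accuracy values quoted in this abstract are standard confusion-matrix metrics. As a rough sketch of how such values could be computed when adjacent gene pairs are labelled "same operon" versus "operon boundary", the snippet below uses a hypothetical function name and made-up counts; it is not taken from the MetaRon code.

```python
def operon_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp/fn: true operonic gene pairs predicted as operonic / non-operonic.
    tn/fp: true boundary gene pairs predicted as boundary / operonic.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Illustrative counts only (an assumption, not data from the paper).
sens, spec, acc = operon_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"{sens:.3f} {spec:.3f} {acc:.3f}")  # prints "0.900 0.950 0.925"
```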
Affiliation(s)
- Syed Shujaat Ali Zaidi
  - Bioinformatics Division, Beijing National Research Institute for Information Science and Technology (BNRIST), Department of Automation, Tsinghua University, Beijing, 100084, People's Republic of China
  - Bioscience Department, COMSATS Institute of Information Technology, Islamabad, 44000, Pakistan
  - Center for Innovation in Brain Science, University of Arizona, Tucson, 85719, USA
- Masood Ur Rehman Kayani
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai, 2000025, People's Republic of China
- Xuegong Zhang
  - Bioinformatics Division, Beijing National Research Institute for Information Science and Technology (BNRIST), Department of Automation, Tsinghua University, Beijing, 100084, People's Republic of China
- Younan Ouyang
  - China National Rice Research Institute (CNRRI), 28 Shuidaosuo rd, Fuyang, Hangzhou, 311400, People's Republic of China
- Imran Haider Shamsi
  - Department of Agronomy, College of Agriculture and Biotechnology, Key Laboratory of Crop Germplasm Resource, Zhejiang University, Hangzhou, 310058, People's Republic of China
2
Zaidi SSA, Zhang X. Computational operon prediction in whole-genomes and metagenomes. Brief Funct Genomics 2018; 16:181-193. [PMID: 27659221 DOI: 10.1093/bfgp/elw034] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022] Open
Abstract
Microbial diversity in unique environmental settings enables abrupt responses catalysed by altering gene regulation and forming gene clusters called operons. Operons increase bacterial adaptability, which in turn increases their survival. This review article presents the emergence of computational operon prediction methods for whole microbial genomes and metagenomes, and discusses their strengths and limitations. Most whole-genome operon prediction methods struggle to generalize to unrelated genomes. The applicability of universal whole-genome operon prediction methods to metagenomic data is an interesting yet less investigated question. We have evaluated the potential of various operon prediction features for genomic and metagenomic data. Most operon prediction methods with high accuracy have been compiled into databases. Despite the high predictive performance, the data among many databases are not completely consistent for similar species. We performed a correlation analysis between the computationally predicted operon databases and experimentally validated data for Escherichia coli, Bacillus subtilis and Mycobacterium tuberculosis. Operon predictions for most of the less characterized microbes cannot be verified due to the absence of experimentally validated operons. The generation of validated information for other microbes would test the authenticity of operon databases for other less annotated microbes as well. Advances in sequencing technologies and the development of better analysis methods will help researchers overcome technological hurdles (such as long sequencing reads and improved contig size), further improve operon predictions and better utilize operonic information.
3
Mao X, Ma Q, Liu B, Chen X, Zhang H, Xu Y. Revisiting operons: an analysis of the landscape of transcriptional units in E. coli. BMC Bioinformatics 2015; 16:356. [PMID: 26538447 PMCID: PMC4634151 DOI: 10.1186/s12859-015-0805-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 07/22/2015] [Accepted: 10/29/2015] [Indexed: 11/21/2022] Open
Abstract
Background: Bacterial operons are considerably more complex than previously thought. At the very least, their components are dynamically defined, not statically defined as previously assumed. Here we present a computational study of the landscape of the transcriptional units (TUs) of E. coli K12, as revealed by the available genomic and transcriptomic data, providing a new understanding of the complexity of the TUs encoded in the E. coli K12 genome as a whole.
Results and conclusion: Our main findings are that (i) different TUs may overlap with each other by sharing common genes, giving rise to clusters of overlapped TUs (TUCs) along the genomic sequence; (ii) the intergenic regions in front of the first gene of each TU tend to have more conserved sequence motifs than those of the other genes inside the TU, suggesting that TUs each have their own promoters; (iii) the terminators associated with the 3’ ends of TUCs tend to be Rho-independent terminators, substantially more often than terminators of TUs that end inside a TUC; and (iv) the functional relatedness of adjacent gene pairs in individual TUs is higher than that in TUCs, suggesting that individual TUs are more basic functional units than TUCs.
Affiliation(s)
- Xizeng Mao
  - Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, USA
  - Present address: MD Anderson Cancer Center, Houston, TX, 77054, USA
- Qin Ma
  - Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, USA
  - BioEnergy Research Center (BESC), Athens, GA, USA
  - Present address: Department of Plant Science, South Dakota State University, Brookings, SD, 57006, USA
  - Present address: BioSNTR, Brookings, SD, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, Shandong, China
- Xin Chen
  - Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, USA
  - College of Computer Sciences and Technology, Changchun, Jilin, China
- Hanyuan Zhang
  - Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, USA
  - Present address: Systems Biology and Biomedical Informatics (SBBI) Laboratory, University of Nebraska-Lincoln, 122B/122C Avery Hall, 1144 T St, Lincoln, NE, 68588-0115, USA
- Ying Xu
  - Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, USA
  - BioEnergy Research Center (BESC), Athens, GA, USA
  - College of Computer Sciences and Technology, Changchun, Jilin, China
  - School of Public Health, Jilin University, Changchun, Jilin, China
4
Xiao Y, van Hijum SAFT, Abee T, Wells-Bennik MHJ. Genome-Wide Transcriptional Profiling of Clostridium perfringens SM101 during Sporulation Extends the Core of Putative Sporulation Genes and Genes Determining Spore Properties and Germination Characteristics. PLoS One 2015; 10:e0127036. [PMID: 25978838 PMCID: PMC4433262 DOI: 10.1371/journal.pone.0127036] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/27/2014] [Accepted: 04/11/2015] [Indexed: 11/19/2022] Open
Abstract
The formation of bacterial spores is a highly regulated process, and the ultimate properties of the spores are determined during sporulation and subsequent maturation. A wide variety of genes that are expressed during sporulation determine spore properties such as resistance to heat and other adverse environmental conditions, dormancy and germination responses. In this study we characterized the sporulation phases of C. perfringens enterotoxic strain SM101 based on morphological characteristics, biomass accumulation (OD600), the total viable counts of cells plus spores, the viable count of heat-resistant spores alone, the pH of the supernatant, enterotoxin production and dipicolinic acid accumulation. Subsequently, whole-genome expression profiling during key phases of the sporulation process was performed using DNA microarrays, and genes were clustered based on their time-course expression profiles during sporulation. The majority of previously characterized C. perfringens germination genes showed upregulated expression profiles over time during sporulation and belonged to two main clusters of genes. These clusters of upregulated genes contained a large number of C. perfringens genes that are homologs of Bacillus genes with roles in sporulation and germination; this study therefore suggests that those homologs are functional in C. perfringens. A comprehensive homology search revealed that approximately half of the upregulated genes in the two clusters are conserved within a broad range of spore-forming Firmicutes. Another 30% of upregulated genes in the two clusters were found only in Clostridium species, while the remaining 20% appeared to be specific to C. perfringens. These newly identified genes may add to the repertoire of genes with roles in sporulation and in determining spore properties, including germination behavior. Their exact roles remain to be elucidated in future studies.
Affiliation(s)
- Yinghua Xiao
  - NIZO food research, Ede, The Netherlands
  - Top Institute Food and Nutrition, Wageningen, The Netherlands
  - Laboratory of Food Microbiology, Wageningen University, Wageningen, The Netherlands
- Sacha A. F. T. van Hijum
  - NIZO food research, Ede, The Netherlands
  - Top Institute Food and Nutrition, Wageningen, The Netherlands
  - Center for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
- Tjakko Abee
  - Top Institute Food and Nutrition, Wageningen, The Netherlands
  - Laboratory of Food Microbiology, Wageningen University, Wageningen, The Netherlands
- Marjon H. J. Wells-Bennik
  - NIZO food research, Ede, The Netherlands
  - Top Institute Food and Nutrition, Wageningen, The Netherlands
5
Zhou C, Ma Q, Li G. Elucidation of operon structures across closely related bacterial genomes. PLoS One 2014; 9:e100999. [PMID: 24959722 PMCID: PMC4069176 DOI: 10.1371/journal.pone.0100999] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/01/2014] [Accepted: 06/01/2014] [Indexed: 11/30/2022] Open
Abstract
About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. As genomes evolve, operon structures undergo changes that could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG), and a pair of nodes is connected if any two genes, one from each of the corresponding OGGs, are located in the same operon as immediate neighbors in any of the considered genomes. By identifying the connected components of this graph, we found that genes in a connected component are likely to be functionally related and that these components tend to form tree-like topologies, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as the ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional role of the central node of the component, such as an ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens among the clostridia in our study, can be clearly classified into different topological groups on some connected components.
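The graph construction described in this abstract can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the input format (each operon as an ordered list of OGG identifiers), the function names, the identifiers `OGG1` to `OGG8` and the simple path/star classifier are all hypothetical and are not the authors' implementation.

```python
from collections import deque


def build_ogg_graph(operons):
    """Link two OGGs whenever their genes are immediate neighbors in an operon."""
    graph = {}
    for operon in operons:
        for ogg in operon:
            graph.setdefault(ogg, set())
        for a, b in zip(operon, operon[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components of the graph as a list of node sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search from the unvisited start node
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def classify(component, graph):
    """Label a component as 'path', 'star', 'tree' or 'other' (contains a cycle)."""
    degrees = [len(graph[node]) for node in component]
    n = len(component)
    if sum(degrees) // 2 != n - 1:  # a connected graph is a tree iff edges == n - 1
        return "other"
    if max(degrees) <= 2:
        return "path"
    if max(degrees) == n - 1:
        return "star"
    return "tree"


# Hypothetical operons pooled from several genomes.
operons = [
    ["OGG1", "OGG2", "OGG3"],
    ["OGG2", "OGG3", "OGG4"],
    ["OGG5", "OGG6"],
    ["OGG5", "OGG7"],
    ["OGG8", "OGG5"],
]
graph = build_ogg_graph(operons)
for component in connected_components(graph):
    print(sorted(component), classify(component, graph))
```

With these toy operons, the first four OGGs form a path and the remaining four form a star centered on `OGG5`.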
Affiliation(s)
- Chuan Zhou
  - School of Mathematics, Shandong University, Jinan, China
- Qin Ma
  - Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America
- Guojun Li
  - School of Mathematics, Shandong University, Jinan, China
6
Chuang LY, Yang CH, Tsai JH, Yang CH. Operon prediction using chaos embedded particle swarm optimization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1299-1309. [PMID: 24384714 DOI: 10.1109/tcbb.2013.63] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/03/2023]
Abstract
Operons contain valuable information for drug design and for determining protein functions. Genes within an operon are co-transcribed into a single-stranded mRNA and must be coregulated. The identification of operons is thus critical for a detailed understanding of gene regulation. However, currently used experimental methods for operon detection are generally difficult to implement and time consuming. In this paper, we propose a chaotic binary particle swarm optimization (CBPSO) to predict operons in bacterial genomes. The intergenic distance, participation in the same metabolic pathway and the cluster of orthologous groups (COG) properties of the Escherichia coli genome are used to design a fitness function. Furthermore, the Bacillus subtilis, Pseudomonas aeruginosa PA01, Staphylococcus aureus and Mycobacterium tuberculosis genomes are tested and evaluated for accuracy, sensitivity, and specificity. The computational results indicate that the proposed method effectively enhances operon prediction performance. The proposed method also achieved a good balance between sensitivity and specificity when compared to methods from the literature.
Affiliation(s)
- Cheng-Huei Yang
  - National Kaohsiung Institute of Marine Technology, Kaohsiung
- Jui-Hung Tsai
  - National Kaohsiung University of Applied Sciences, Kaohsiung
- Cheng-Hong Yang
  - National Kaohsiung University of Applied Sciences, Kaohsiung
7
Scott E, Dyer DW. Divergence of the SigB regulon and pathogenesis of the Bacillus cereus sensu lato group. BMC Genomics 2012; 13:564. [PMID: 23088190 PMCID: PMC3485630 DOI: 10.1186/1471-2164-13-564] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/25/2012] [Accepted: 10/10/2012] [Indexed: 12/31/2022] Open
Abstract
Background: The Bacillus cereus sensu lato group currently includes seven species (B. cereus, B. anthracis, B. mycoides, B. pseudomycoides, B. thuringiensis, B. weihenstephanensis and B. cytotoxicus) that recent phylogenetic and phylogenomic analyses suggest are likely a single species, despite their varied phenotypes. Although horizontal gene transfer and insertion-deletion events are clearly important for promoting divergence among these genomes, recent studies have demonstrated that a major basis for phenotypic diversity in these organisms may be differential regulation of the highly similar gene content that they share. To explore this hypothesis, we used an in silico approach to evaluate the relationship between pathogenic potential and the divergence of the SigB-dependent general stress response within the B. cereus sensu lato group, since SigB has been demonstrated to support pathogenesis in Bacillus, Listeria and Staphylococcus species.
Results: During the divergence of these organisms from a common “SigB-less” ancestor, the placement of SigB promoters at varied locations in the B. cereus sensu lato genomes predicts alternative structures for the SigB regulon in different organisms. Predicted promoter changes suggesting differential transcriptional control of a common gene pool predominate over evidence of indels or horizontal gene transfer in explaining SigB regulon divergence.
Conclusions: Four lineages of the SigB regulon have arisen that encompass different gene contents and suggest different strategies for supporting pathogenesis. This is consistent with the hypothesis that divergence within the B. cereus sensu lato group rests in part on alternative strategies for regulation of a common gene pool.
Affiliation(s)
- Edgar Scott
  - Department of Microbiology and Immunology, Oklahoma University Health Sciences Center, Oklahoma City, 73117, USA
8
Chuang LY, Chang HW, Tsai JH, Yang CH. Features for computational operon prediction in prokaryotes. Brief Funct Genomics 2012; 11:291-9. [PMID: 22753776 DOI: 10.1093/bfgp/els024] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
Accurate prediction of operons can improve the functional annotation and application of genes within operons in prokaryotes. Here, we review several features: (i) intergenic distance, (ii) metabolic pathways, (iii) homologous genes, (iv) promoters and terminators, (v) gene order conservation, (vi) microarray, (vii) clusters of orthologous groups, (viii) gene length ratio, (ix) phylogenetic profiles, (x) operon length/size and (xi) STRING database scores, as well as some other features, which have been applied in recent operon prediction methods in prokaryotes in the literature. Based on a comparison of the prediction performances of these features, we conclude that other, as yet undiscovered features, or feature selection with a receiver operating characteristic analysis before algorithm processing can improve operon prediction in prokaryotes.
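Of the features listed in this abstract, intergenic distance is the simplest to illustrate. The sketch below groups adjacent same-strand genes whose gap falls under a cutoff; the `Gene` record, the function names, the coordinates and the 50 bp threshold are assumptions chosen for illustration and do not come from any of the reviewed methods.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_distance(upstream, downstream):
    """Number of bases between two genes (negative if they overlap)."""
    return downstream.start - upstream.end - 1


def group_by_distance(genes, max_gap=50):
    """Group consecutive same-strand genes separated by at most max_gap bases."""
    groups = []
    for gene in sorted(genes, key=lambda g: g.start):
        if groups:
            previous = groups[-1][-1]
            same_strand = previous.strand == gene.strand
            if same_strand and intergenic_distance(previous, gene) <= max_gap:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return groups


# Hypothetical gene coordinates.
genes = [
    Gene("geneA", 1, 300, "+"),
    Gene("geneB", 310, 900, "+"),
    Gene("geneC", 1200, 1500, "+"),
    Gene("geneD", 1490, 2000, "-"),
    Gene("geneE", 2020, 2500, "-"),
]
candidate_operons = [[g.name for g in group] for group in group_by_distance(genes)]
print(candidate_operons)  # [['geneA', 'geneB'], ['geneC'], ['geneD', 'geneE']]
```

A distance cutoff alone is a crude predictor; the methods surveyed here typically combine it with other features from the list above.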
Affiliation(s)
- Li-Yeh Chuang
  - Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Taiwan
9
Febrer M, McLay K, Caccamo M, Twomey KB, Ryan RP. Advances in bacterial transcriptome and transposon insertion-site profiling using second-generation sequencing. Trends Biotechnol 2011; 29:586-94. [PMID: 21764162 DOI: 10.1016/j.tibtech.2011.06.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 02/09/2011] [Revised: 05/25/2011] [Accepted: 06/09/2011] [Indexed: 12/20/2022]
Abstract
The arrival of second-generation sequencing has revolutionized the study of bacteria within a short period. The sequence information generated from these platforms has helped in our understanding of bacterial development, adaptation and diversity and how bacteria cause disease. Furthermore, these technologies have quickly been adapted for high-throughput studies that were previously performed using DNA cloning or microarray-based applications. This has facilitated a more comprehensive study of bacterial transcriptomes through RNA sequencing (RNA-Seq) and the systematic determination of gene function by 'transposon monitoring'. In this review, we provide an outline of these powerful tools and the in silico analyses used in their application, and also highlight the biological questions being addressed in these approaches.
Affiliation(s)
- Melanie Febrer
  - The Genome Analysis Centre, Norwich Research Park, Colney Lane, Norwich NR4 7UH, UK
10
Toledo-Arana A, Solano C. Deciphering the physiological blueprint of a bacterial cell: revelations of unanticipated complexity in transcriptome and proteome. Bioessays 2010; 32:461-7. [PMID: 20486131 DOI: 10.1002/bies.201000020] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Abstract
During the last few months, several pioneer genome-wide transcriptomic, proteomic and metabolomic studies have revolutionised the understanding of bacterial biological processes, leading to a picture that resembles eukaryotic complexity. Technological advances such as next-generation high-throughput sequencing and high-density oligonucleotide microarrays have allowed the determination, in several bacteria, of the entire boundaries of all expressed transcripts. Consequently, novel RNA-mediated regulatory mechanisms have been discovered including multifunctional RNAs. Moreover, resolution of bacterial proteome organisation (interactome) and global protein localisation (localizome) have unveiled an unanticipated complexity that highlights the significance of protein multifunctionality and localisation in the cell. Also, analysis of a complete bacterial metabolic network has again revealed a high fraction of multifunctional enzymes and an unexpectedly high level of metabolic responses and adaptation. Altogether, these novel approaches have permitted the deciphering of the entire physiological landscape of one of the smallest bacteria, Mycoplasma pneumoniae. Here, we summarise and discuss recent findings aimed at defining the blueprint of any prokaryote.
Affiliation(s)
- Alejandro Toledo-Arana
  - Laboratory of Microbial Biofilms, Instituto de Agrobiotecnología, Universidad Pública de Navarra-CSIC-Gobierno de Navarra, Campus de Arrosadía, Pamplona, Spain
11
Abstract
An operon is a fundamental unit of transcription and contains specific functional genes for the construction and regulation of networks at the entire genome level. The correct prediction of operons is vital for understanding gene regulations and functions in newly sequenced genomes. As experimental methods for operon detection tend to be nontrivial and time consuming, various methods for operon prediction have been proposed in the literature. In this study, a binary particle swarm optimization is used for operon prediction in bacterial genomes. The intergenic distance, participation in the same metabolic pathway, the cluster of orthologous groups, the gene length ratio and the operon length are used to design a fitness function. We trained the proper values on the Escherichia coli genome, and used the above five properties to implement feature selection. Finally, our study used the intergenic distance, metabolic pathway and the gene length ratio property to predict operons. Experimental results show that the prediction accuracy of this method reached 92.1%, 93.3% and 95.9% on the Bacillus subtilis genome, the Pseudomonas aeruginosa PA01 genome and the Staphylococcus aureus genome, respectively. This method has enabled us to predict operons with high accuracy for these three genomes, for which only limited data on the properties of the operon structure exists.
Affiliation(s)
- Li-Yeh Chuang
  - Department of Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
12
Taboada B, Verde C, Merino E. High accuracy operon prediction method based on STRING database scores. Nucleic Acids Res 2010; 38:e130. [PMID: 20385580 PMCID: PMC2896540 DOI: 10.1093/nar/gkq254] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022] Open
Abstract
We present a simple and highly accurate computational method for operon prediction, based on intergenic distances and functional relationships between the protein products of contiguous genes, as defined by the STRING database (Jensen,L.J., Kuhn,M., Stark,M., Chaffron,S., Creevey,C., Muller,J., Doerks,T., Julien,P., Roth,A., Simonovic,M. et al. (2009) STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 37, D412–D416). These two parameters were used to train a neural network on a subset of experimentally characterized Escherichia coli and Bacillus subtilis operons. Our predictive model was successfully tested on the set of experimentally defined operons in E. coli and B. subtilis, with accuracies of 94.6 and 93.3%, respectively. As far as we know, these are the highest accuracies ever obtained for predicting bacterial operons. Furthermore, to evaluate the predictive accuracy of our model when one organism's data set is used for training and a different organism's data set for testing, we repeated the E. coli operon prediction analysis using a neural network trained with B. subtilis data, and the B. subtilis analysis using a neural network trained with E. coli data. Even in these cases, the accuracies reached with our method were outstandingly high, 91.5 and 93%, respectively. These results show the potential of our method for accurately predicting the operons of any other organism. Our operon predictions for fully sequenced genomes are available at http://operons.ibt.unam.mx/OperonPredictor/.
Affiliation(s)
- Blanca Taboada
  - Centro de Ciencias Aplicadas y Desarrollo Tecnológico, Universidad Nacional Autónoma de México, México, D.F., México