1
|
Grund ME, Choi SJ, McNitt DH, Barbier M, Hu G, LaSala PR, Cote CK, Berisio R, Lukomski S. Burkholderia collagen-like protein 8, Bucl8, is a unique outer membrane component of a putative tetrapartite efflux pump in Burkholderia pseudomallei and Burkholderia mallei. PLoS One 2020; 15:e0242593. [PMID: 33227031 PMCID: PMC7682875 DOI: 10.1371/journal.pone.0242593] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/18/2020] [Accepted: 11/06/2020] [Indexed: 12/19/2022] Open
Abstract
Bacterial efflux pumps are an important pathogenicity trait because they extrude a variety of xenobiotics. Our laboratory previously identified in silico Burkholderia collagen-like protein 8 (Bucl8) in the hazardous pathogens Burkholderia pseudomallei and Burkholderia mallei. We hypothesize that Bucl8, which contains two predicted tandem outer membrane efflux pump domains, is a component of a putative efflux pump. Unique to Bucl8, as compared to other outer membrane proteins, is the presence of an extended extracellular region containing a collagen-like (CL) domain and a non-collagenous C-terminus (Ct). Molecular modeling and circular dichroism spectroscopy with a recombinant protein, corresponding to this extracellular CL-Ct portion of Bucl8, demonstrated that it adopts a collagen triple helix, whereas functional assays screening for Bucl8 ligands identified binding to fibrinogen. Bioinformatic analysis of the bucl8 gene locus revealed it resembles a classical efflux-pump operon. The bucl8 gene is co-localized with downstream fusCDE genes encoding fusaric acid (FA) resistance, and with an upstream gene, designated as fusR, encoding a LysR-type transcriptional regulator. Using reverse transcriptase (RT)-qPCR, we defined the boundaries and transcriptional organization of the fusR-bucl8-fusCDE operon. We found exogenous FA induced bucl8 transcription over 80-fold in B. pseudomallei, while deletion of the entire bucl8 locus decreased the minimum inhibitory concentration of FA 4-fold in its isogenic mutant. Furthermore, we showed that the putative Bucl8-associated pump expressed in the heterologous Escherichia coli host confers FA resistance. In contrast, the Bucl8-associated pump did not confer resistance to a panel of clinically relevant antimicrobials in Burkholderia and E. coli. Finally, we demonstrated that deletion of the bucl8 locus drastically affects the growth of the mutant in L-broth. We determined that Bucl8 is a component of a novel tetrapartite efflux pump, which confers FA resistance, fibrinogen binding, and optimal growth.
Affiliation(s)
- Megan E. Grund
- Department of Microbiology, Immunology and Cell Biology, School of Medicine, West Virginia University, Morgantown, WV, United States of America
- Soo J. Choi
- Department of Microbiology, Immunology and Cell Biology, School of Medicine, West Virginia University, Morgantown, WV, United States of America
- Dudley H. McNitt
- Department of Microbiology, Immunology and Cell Biology, School of Medicine, West Virginia University, Morgantown, WV, United States of America
- Mariette Barbier
- Department of Microbiology, Immunology and Cell Biology, School of Medicine, West Virginia University, Morgantown, WV, United States of America
- Gangqing Hu
- Department of Microbiology, Immunology and Cell Biology, School of Medicine, West Virginia University, Morgantown, WV, United States of America
- Cancer Center, West Virginia University, Morgantown, WV, United States of America
- Bioinformatics Core, West Virginia University, Morgantown, WV, United States of America
- P. Rocco LaSala
- Department of Pathology, West Virginia University, Morgantown, WV, United States of America
- Christopher K. Cote
- Bacteriology Division, The United States Army Medical Research Institute of Infectious Diseases (USAMRIID), Frederick, MD, United States of America
- Rita Berisio
- Institute of Biostructures and Bioimaging, National Research Council, Naples, Italy
- Slawomir Lukomski
- Department of Microbiology, Immunology and Cell Biology, School of Medicine, West Virginia University, Morgantown, WV, United States of America
- Cancer Center, West Virginia University, Morgantown, WV, United States of America
|
2
|
Hua ZG, Lin Y, Yuan YZ, Yang DC, Wei W, Guo FB. ZCURVE 3.0: identify prokaryotic genes with higher accuracy as well as automatically and accurately select essential genes. Nucleic Acids Res 2015; 43:W85-90. [PMID: 25977299 PMCID: PMC4489317 DOI: 10.1093/nar/gkv491] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 02/23/2015] [Accepted: 05/02/2015] [Indexed: 01/09/2023] Open
Abstract
In 2003, we developed an ab initio program, ZCURVE 1.0, to find genes in bacterial and archaeal genomes. In this work, we present the updated version (i.e. ZCURVE 3.0). Using 422 prokaryotic genomes, the average accuracy was 93.7% with the updated version, compared with 88.7% with the original version. Such results also demonstrate that ZCURVE 3.0 is comparable with Glimmer 3.02 and may provide complementary predictions to it. In fact, the joint application of the two programs generated better results by correctly finding more annotated genes while also containing fewer false-positive predictions. As the exclusive function, ZCURVE 3.0 contains one post-processing program that can identify essential genes with high accuracy (generally >90%). We hope ZCURVE 3.0 will receive wide use with the web-based running mode. The updated ZCURVE can be freely accessed from http://cefg.uestc.edu.cn/zcurve/ or http://tubic.tju.edu.cn/zcurveb/ without any restrictions.
Affiliation(s)
- Zhi-Gang Hua
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yan Lin
- Department of Physics, Tianjin University, Tianjin 300072, China Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China
- Ya-Zhou Yuan
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
- De-Chang Yang
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wen Wei
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
- Feng-Biao Guo
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
|
3
|
Schüler W, Bunikis I, Weber-Lehman J, Comstedt P, Kutschan-Bunikis S, Stanek G, Huber J, Meinke A, Bergström S, Lundberg U. Complete genome sequence of Borrelia afzelii K78 and comparative genome analysis. PLoS One 2015; 10:e0120548. [PMID: 25798594 PMCID: PMC4370689 DOI: 10.1371/journal.pone.0120548] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 10/22/2014] [Accepted: 01/23/2015] [Indexed: 02/04/2023] Open
Abstract
The main Borrelia species causing Lyme borreliosis in Europe and Asia are Borrelia afzelii, B. garinii, B. burgdorferi and B. bavariensis. This is in contrast to the United States, where infections are exclusively caused by B. burgdorferi. To date, the genome sequences of four B. afzelii strains are available, of which only two include the numerous plasmids. In order to further assess the genetic diversity of B. afzelii, the most common species in Europe and responsible for the large variety of clinical manifestations of Lyme borreliosis, we have determined the full genome sequence of the B. afzelii strain K78, a clinical isolate from Austria. The K78 genome contains a linear chromosome (905,949 bp) and 13 plasmids (8 linear and 5 circular), together presenting 1,309 open reading frames, of which 496 are located on plasmids. With the exception of lp28-8, all linear replicons have been sequenced in their full length, including their telomeres. The comparison with the genomes of the four other B. afzelii strains, ACA-1, PKo, HLJ01 and Tom3107, as well as that of B. burgdorferi strain B31, confirmed a high degree of conservation within the linear chromosome of B. afzelii, whereas plasmid-encoded genes showed a much larger diversity. Since some plasmids present in B. burgdorferi are missing in the B. afzelii genomes, the corresponding virulence factors of B. burgdorferi are found in B. afzelii on other unrelated plasmids. In addition, we have identified a species-specific region in the circular plasmid, cp26, which could be used for species determination. Different non-coding RNAs have been located on the B. afzelii K78 genome, which have not previously been annotated in any of the published Borrelia genomes.
Affiliation(s)
- Ignas Bunikis
- Department of Molecular Biology, Umeå University, Umeå, Sweden
- Gerold Stanek
- Medical University of Vienna, Institute for Hygiene and Applied Immunology, Vienna, Austria
- Sven Bergström
- Department of Molecular Biology, Umeå University, Umeå, Sweden
|
4
|
Pérez-Rodríguez J, Arroyo-Peña AG, García-Pedrajas N. Improving translation initiation site and stop codon recognition by using more than two classes. Bioinformatics 2014; 30:2702-8. [PMID: 24903421 DOI: 10.1093/bioinformatics/btu369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The recognition of translation initiation sites and stop codons is a fundamental part of any gene recognition program. Currently, the most successful methods use powerful classifiers, such as support vector machines with various string kernels. These methods all use two classes, one of positive instances and another one of negative instances that are constructed using sequences from the whole genome. However, the features of the negative sequences differ depending on the position of the negative samples in the gene. There are differences depending on whether they are from exons, introns, intergenic regions or any other functional part of the genome. Thus, the positive class is fairly homogeneous, as all its sequences come from the same part of the gene, but the negative class is composed of different instances. The classifier suffers from this problem. In this article, we propose the training of different classifiers with different negative, more homogeneous, classes and the combination of these classifiers for improved accuracy. RESULTS The proposed method achieves better accuracy than the best state-of-the-art method, both in terms of the geometric mean of the specificity and sensitivity and the area under the receiver operating characteristic and precision recall curves. The method is tested on the whole human genome. The results for recognizing both translation initiation sites and stop codons indicated improvements in the rates of both false-negative results (FN) and false-positive results (FP). On average, for translation initiation site recognition, the FN ratio was reduced by 30.2% and the FP ratio decreased by 10.9%. For stop codon prediction, FP were reduced by 41.4% and FN by 31.7%. AVAILABILITY AND IMPLEMENTATION The source code is licensed under the General Public License and is thus freely available. The datasets and source code can be obtained from http://cib.uco.es/site-recognition. CONTACT npedrajas@uco.es.
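The evaluation quantities named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the confusion-count inputs are hypothetical, and it only shows how a geometric mean of sensitivity and specificity and a relative reduction in an error rate are typically computed.

```python
import math


def geometric_mean_score(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity from confusion counts.

    Hypothetical helper for illustration; inputs are raw counts.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def relative_reduction(before, after):
    """Percentage by which an error rate drops between two methods."""
    return 100.0 * (before - after) / before
```

Under this reading, a statement such as "the FN ratio was reduced by 30.2%" is a relative change between two error rates, not an absolute difference in percentage points.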
Affiliation(s)
- Javier Pérez-Rodríguez
- Department of Computing and Numerical Analysis, University of Córdoba, Campus Universitario de Rabanales, Edificio Einstein, Planta 3, 14071 Córdoba, Spain
- Alexis G Arroyo-Peña
- Department of Computing and Numerical Analysis, University of Córdoba, Campus Universitario de Rabanales, Edificio Einstein, Planta 3, 14071 Córdoba, Spain
- Nicolás García-Pedrajas
- Department of Computing and Numerical Analysis, University of Córdoba, Campus Universitario de Rabanales, Edificio Einstein, Planta 3, 14071 Córdoba, Spain
|
5
|
Liu Y, Guo J, Hu G, Zhu H. Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformatics 2013; 14 Suppl 5:S12. [PMID: 23735199 PMCID: PMC3622649 DOI: 10.1186/1471-2105-14-s5-s12] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 11/25/2022] Open
Abstract
Background Metagenomic sequencing is becoming a powerful technology for exploring microorganisms from various environments, such as the human body, without isolation and cultivation. Accurately identifying genes from metagenomic fragments is one of the most fundamental issues. Results In this article, we present a novel gene prediction method named MetaGUN for metagenomic fragments based on a machine learning approach of SVM. It implements a three-stage strategy to predict genes. Firstly, it classifies input fragments into phylogenetic groups by a k-mer based sequence binning method. Then, protein-coding sequences are identified for each group independently with SVM classifiers that integrate entropy density profiles (EDP) of codon usage, translation initiation site (TIS) scores and open reading frame (ORF) length as input patterns. Finally, the TISs are adjusted by employing a modified version of MetaTISA. To identify protein-coding sequences, MetaGUN builds the universal module and the novel module. The former is based on a set of representative species, while the latter is designed to find potential functionary DNA sequences with conserved domains. Conclusions Comparisons on artificial shotgun fragments with multiple current metagenomic gene finders show that MetaGUN predicts better results on both 3' and 5' ends of genes with fragments of various lengths. In particular, it makes the most reliable predictions among these methods. As an application, MetaGUN was used to predict genes for two samples of human gut microbiome. It identifies thousands of additional genes with significant evidence. Further analysis indicates that MetaGUN tends to predict more potential novel genes than other current metagenomic gene finders.
Affiliation(s)
- Yongchu Liu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, China
|
6
|
Klassen JL, Currie CR. ORFcor: identifying and accommodating ORF prediction inconsistencies for phylogenetic analysis. PLoS One 2013; 8:e58387. [PMID: 23484025 PMCID: PMC3590147 DOI: 10.1371/journal.pone.0058387] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 09/20/2012] [Accepted: 02/04/2013] [Indexed: 12/26/2022] Open
Abstract
The high-throughput annotation of open reading frames (ORFs) required by modern genome sequencing projects necessitates computational protocols that sometimes annotate orthologous ORFs inconsistently. Such inconsistencies hinder comparative analyses by non-uniformly extending or truncating 5′ and/or 3′ sequence ends, causing ORFs that are in fact identical to artificially diverge. Whereas strategies exist to correct such inconsistencies during whole-genome annotation, equivalent software designed to correct subsets of these data without genome reannotation is lacking. We therefore developed ORFcor, which corrects annotation inconsistencies using consensus start and stop positions derived from sets of closely related orthologs. ORFcor corrects inconsistent ORF annotations in diverse test datasets with specificities and sensitivities approaching 100% when sufficiently related orthologs (e.g., from the same taxonomic family) are available for comparison. The ORFcor package is implemented in Perl, multithreaded to handle large datasets, includes related scripts to facilitate high-throughput phylogenomic analyses, and is freely available at www.currielab.wisc.edu/downloads.html.
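The idea of deriving a consensus start position from closely related orthologs can be sketched in Python. This is only an illustration under stated assumptions, not ORFcor itself (which is a Perl package working from sequence comparisons): it assumes start positions are already expressed as columns of a shared alignment, and the function names and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def consensus_start(starts, min_fraction=0.5):
    """Return the most common start column if enough orthologs agree, else None.

    `starts` holds annotated start positions in shared alignment coordinates;
    `min_fraction` is an illustrative agreement threshold, not an ORFcor setting.
    """
    if not starts:
        return None
    position, count = Counter(starts).most_common(1)[0]
    return position if count / len(starts) >= min_fraction else None


def correct_starts(annotated, min_fraction=0.5):
    """Move every ortholog to the consensus start when one exists."""
    target = consensus_start(list(annotated.values()), min_fraction)
    if target is None:
        # No sufficiently supported consensus: leave annotations untouched.
        return dict(annotated)
    return {name: target for name in annotated}
```

For example, three orthologs annotated at columns 12, 12 and 30 would all be set to column 12, while a set with no majority start is left unchanged.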
Affiliation(s)
- Jonathan L Klassen
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA.
|
7
|
Rebustini IT, Hayashi T, Reynolds AD, Dillard ML, Carpenter EM, Hoffman MP. miR-200c regulates FGFR-dependent epithelial proliferation via Vldlr during submandibular gland branching morphogenesis. Development 2011; 139:191-202. [PMID: 22115756 DOI: 10.1242/dev.070151] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Indexed: 12/15/2022]
Abstract
The regulation of epithelial proliferation during organ morphogenesis is crucial for normal development, as dysregulation is associated with tumor formation. Non-coding microRNAs (miRNAs), such as miR-200c, are post-transcriptional regulators of genes involved in cancer. However, the role of miR-200c during normal development is unknown. We screened miRNAs expressed in the mouse developing submandibular gland (SMG) and found that miR-200c accumulates in the epithelial end buds. Using both loss- and gain-of-function, we demonstrated that miR-200c reduces epithelial proliferation during SMG morphogenesis. To identify the mechanism, we predicted miR-200c target genes and confirmed their expression during SMG development. We discovered that miR-200c targets the very low density lipoprotein receptor (Vldlr) and its ligand reelin, which unexpectedly regulate FGFR-dependent epithelial proliferation. Thus, we demonstrate that miR-200c influences FGFR-mediated epithelial proliferation during branching morphogenesis via a Vldlr-dependent mechanism. miR-200c and Vldlr may be novel targets for controlling epithelial morphogenesis during glandular repair or regeneration.
Affiliation(s)
- Ivan T Rebustini
- Matrix and Morphogenesis Section, Laboratory of Cell and Developmental Biology, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD 20892, USA
|
8
|
Zheng X, Hu GQ, She ZS, Zhu H. Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes. BMC Genomics 2011; 12:361. [PMID: 21749696 PMCID: PMC3160421 DOI: 10.1186/1471-2164-12-361] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 03/01/2011] [Accepted: 07/12/2011] [Indexed: 11/28/2022] Open
Abstract
Background The Shine-Dalgarno (SD) signal has long been viewed as the dominant translation initiation signal in prokaryotes. Recently, leaderless genes, which lack 5'-untranslated regions (5'-UTR) on their mRNAs, have been shown to be abundant in archaea. However, current large-scale in silico analyses of initiation mechanisms in bacteria are mainly based on the SD-led initiation pathway rather than the leaderless one. The study of leaderless genes in bacteria remains open, which leaves the understanding of translation initiation mechanisms in prokaryotes uncertain. Results Here, we study signals in translation initiation regions of all genes over 953 bacterial and 72 archaeal genomes, then make an effort to construct an evolutionary scenario in view of leaderless genes in bacteria. With an algorithm designed to identify multiple signals in the upstream regions of genes for a genome, we classify all genes into SD-led, TA-led and atypical genes according to the category of the most probable signal in their upstream sequences. In particular, the occurrence of TA-like signals about 10 bp upstream of the translation initiation site (TIS) in bacteria most probably indicates leaderless genes. Conclusions Our analysis reveals that leaderless genes are widespread, although not dominant, in a variety of bacteria. Especially for Actinobacteria and Deinococcus-Thermus, more than twenty percent of genes are leaderless. Analyzed in closely related bacterial genomes, our results imply that the change of translation initiation mechanisms, which happens between the genes deriving from a common ancestor, is linearly dependent on the phylogenetic relationship. Analysis of the macroevolution of leaderless genes further shows that the proportion of leaderless genes in bacteria has a decreasing trend in evolution.
Affiliation(s)
- Xiaobin Zheng
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, China
|
9
|
Angiuoli SV, Dunning Hotopp JC, Salzberg SL, Tettelin H. Improving pan-genome annotation using whole genome multiple alignment. BMC Bioinformatics 2011; 12:272. [PMID: 21718539 PMCID: PMC3142524 DOI: 10.1186/1471-2105-12-272] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 03/14/2011] [Accepted: 06/30/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. RESULTS We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. CONCLUSIONS Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.
Affiliation(s)
- Samuel V Angiuoli
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
|
10
|
Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 2010; 11:119. [PMID: 20211023 PMCID: PMC2848648 DOI: 10.1186/1471-2105-11-119] [Citation(s) in RCA: 6647] [Impact Index Per Article: 474.8] [Received: 07/20/2009] [Accepted: 03/08/2010] [Indexed: 11/10/2022] Open
Abstract
Background The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals. Results With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives. Conclusion We built a fast, lightweight, open source gene prediction program called Prodigal http://compbio.ornl.gov/prodigal/. Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.
Affiliation(s)
- Doug Hyatt
- Computational Biology and Bioinformatics Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA.
|
11
|
Luo C, Hu GQ, Zhu H. Genome reannotation of Escherichia coli CFT073 with new insights into virulence. BMC Genomics 2009; 10:552. [PMID: 19930606 PMCID: PMC2785843 DOI: 10.1186/1471-2164-10-552] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Received: 03/25/2009] [Accepted: 11/22/2009] [Indexed: 11/30/2022] Open
Abstract
Background The genome of the human pathogen uropathogenic Escherichia coli (UPEC) strain CFT073 was sequenced and published in 2002, which was significant in pathogenic bacterial genomics research. However, the current RefSeq annotation of this pathogen is now outdated to some degree, due to missing or misannotated essential genes associated with its virulence. We carried out a systematic reannotation by combining automated annotation tools with manual efforts to provide a comprehensive understanding of virulence for the CFT073 genome. Results The reannotation excluded 608 coding sequences from the RefSeq annotation. Meanwhile, a total of 299 coding sequences were newly added; about one third of them are found in genomic island (GI) regions, while more than one fifth are located in virulence-related regions, the pathogenicity islands (PAIs). Furthermore, a total of 341 genes had their translational initiation sites (TISs) relocated, which resulted in a high quality of gene start annotation. In addition, 94 pseudogenes annotated in RefSeq were thoroughly inspected and updated. The number of miscellaneous genes (sRNAs) has been updated from 6 in RefSeq to 46 in the reannotation. Based on the adjustments in the reannotation, subsequent analyses were conducted through both general and case studies on new virulence factors or new virulence-associated genes that are crucial during the urinary tract infection (UTI) process, including invasion, colonization, nutrient uptake and population density control. Furthermore, miscellaneous RNAs collected in the reannotation are believed to contribute to the virulence of strain CFT073. The reannotation, including the nucleotide data, the original RefSeq annotation, and all reannotated results, is freely available via http://mech.ctb.pku.edu.cn/CFT073/. Conclusion As a result, the reannotation presents a more comprehensive picture of the mechanisms of uropathogenicity of UPEC strain CFT073. The new genes change the view of its uropathogenicity in many respects, particularly through new genes in GI regions and new virulence-associated factors. The reannotation thus functions as an important source by providing new information about genomic structure and organization, and gene function. Moreover, we expect that the detailed analysis will facilitate the exploration of novel virulence mechanisms and help guide experimental design.
Affiliation(s)
- Chengwei Luo
- State Key Laboratory for Turbulence and Complex Systems, and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, China
|
12
|
Pallejà A, García-Vallvé S, Romeu A. Adaptation of the short intergenic spacers between co-directional genes to the Shine-Dalgarno motif among prokaryote genomes. BMC Genomics 2009; 10:537. [PMID: 19922619 PMCID: PMC2784483 DOI: 10.1186/1471-2164-10-537] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/23/2009] [Accepted: 11/18/2009] [Indexed: 11/30/2022] Open
Abstract
Background In prokaryote genomes most of the co-directional genes are in close proximity. Even the coding sequence or the stop codon of a gene can overlap with the Shine-Dalgarno (SD) sequence of the downstream co-directional gene. In this paper we analyze how the presence of SD may influence the stop codon usage or the spacing lengths between co-directional genes. Results The SD sequences for 530 prokaryote genomes have been predicted using computer calculations of the base-pairing free energy between translation initiation regions and the 16S rRNA 3' tail. Genomes with a large number of genes with the SD sequence concentrate this regulatory motif from 4 to 11 bps before the start codon. However, not all genes seem to have the SD sequence. Genes separated from 1 to 4 bps from a co-directional upstream gene show a high SD presence, though this regulatory signal is located towards the 3' end of the coding sequence of the upstream gene. Genes separated from 9 to 15 bps show the highest SD presence as they accommodate the SD sequence within an intergenic region. However, genes separated from around 5 to 8 bps have a lower percentage of SD presence and when the SD is present, the stop codon usage of the upstream gene changes to accommodate the overlap between the SD sequence and the stop codon. Conclusion The SD presence makes the intergenic lengths from 5 to 8 bps less frequent and causes an adaptation of the stop codon usage. Our results introduce new elements to the discussion of which factors affect the intergenic lengths, which cannot be totally explained by the pressure to compact the prokaryote genomes.
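The spacer-length ranges discussed in this abstract (1 to 4, 5 to 8, and 9 to 15 bp between co-directional genes) can be made concrete with a small Python sketch. It is an illustration only, not the authors' pipeline: it assumes 1-based inclusive gene coordinates on the same strand, and the function names and class labels are hypothetical.

```python
from collections import Counter


def spacer_length(upstream_end, downstream_start):
    """Number of bases strictly between two co-directional genes.

    Assumes 1-based inclusive coordinates; overlapping genes give a value <= 0.
    """
    return downstream_start - upstream_end - 1


def spacer_class(gap):
    """Assign a spacer length to the ranges named in the abstract."""
    if 1 <= gap <= 4:
        return "1-4 bp"    # SD lies within the 3' end of the upstream gene
    if 5 <= gap <= 8:
        return "5-8 bp"    # SD, when present, overlaps the upstream stop codon
    if 9 <= gap <= 15:
        return "9-15 bp"   # SD fits inside the intergenic region
    return "other"


def tally_spacers(gene_pairs):
    """Count spacer classes over (upstream_end, downstream_start) pairs."""
    return Counter(spacer_class(spacer_length(end, start)) for end, start in gene_pairs)
```

Tallying these classes over a genome annotation would show whether 5 to 8 bp spacers are under-represented, which is the pattern the abstract reports.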
Affiliation(s)
- Albert Pallejà
- Department of Biochemistry and Biotechnology, Rovira i Virgili University, Tarragona, Catalonia, Spain.
|
13
|
Hu GQ, Guo JT, Liu YC, Zhu H. MetaTISA: Metagenomic Translation Initiation Site Annotator for improving gene start prediction. Bioinformatics 2009; 25:1843-5. [PMID: 19389734 DOI: 10.1093/bioinformatics/btp272] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
SUMMARY We proposed a tool named MetaTISA with an aim to improve TIS prediction of current gene-finders for metagenomes. The method employs a two-step strategy to predict translation initiation sites (TISs) by first clustering metagenomic fragments into phylogenetic groups and then predicting TISs independently for each group in an unsupervised manner. As evaluated on experimentally verified TISs, MetaTISA greatly improves the accuracies of TIS prediction of current gene-finders. AVAILABILITY The C++ source code is freely available under the GNU GPL license via http://mech.ctb.pku.edu.cn/MetaTISA/.
Affiliation(s)
- Gang-Qing Hu
- State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering and Center for Theoretical Biology, Peking University, Beijing 100871, China
|