Reference Citation Analysis

For: Hu GQ, Zheng X, Yang YF, Ortet P, She ZS, Zhu H. ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes. Nucleic Acids Res 2007;36:D114-9. [PMID: 17942412 PMCID: PMC2238952 DOI: 10.1093/nar/gkm799] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022] Open

Number	Cited by Other Article(s)
1	Burkholderia collagen-like protein 8, Bucl8, is a unique outer membrane component of a putative tetrapartite efflux pump in Burkholderia pseudomallei and Burkholderia mallei. PLoS One 2020;15:e0242593. [PMID: 33227031 PMCID: PMC7682875 DOI: 10.1371/journal.pone.0242593] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/18/2020] [Accepted: 11/06/2020] [Indexed: 12/19/2022] Open. Abstract: Bacterial efflux pumps are an important pathogenicity trait because they extrude a variety of xenobiotics. Our laboratory previously identified in silico Burkholderia collagen-like protein 8 (Bucl8) in the hazardous pathogens Burkholderia pseudomallei and Burkholderia mallei. We hypothesize that Bucl8, which contains two predicted tandem outer membrane efflux pump domains, is a component of a putative efflux pump. Unique to Bucl8, as compared to other outer membrane proteins, is the presence of an extended extracellular region containing a collagen-like (CL) domain and a non-collagenous C-terminus (Ct). Molecular modeling and circular dichroism spectroscopy with a recombinant protein, corresponding to this extracellular CL-Ct portion of Bucl8, demonstrated that it adopts a collagen triple helix, whereas functional assays screening for Bucl8 ligands identified binding to fibrinogen. Bioinformatic analysis of the bucl8 gene locus revealed it resembles a classical efflux-pump operon. The bucl8 gene is co-localized with downstream fusCDE genes encoding fusaric acid (FA) resistance, and with an upstream gene, designated as fusR, encoding a LysR-type transcriptional regulator. Using reverse transcriptase (RT)-qPCR, we defined the boundaries and transcriptional organization of the fusR-bucl8-fusCDE operon. We found exogenous FA induced bucl8 transcription over 80-fold in B. pseudomallei, while deletion of the entire bucl8 locus decreased the minimum inhibitory concentration of FA 4-fold in its isogenic mutant. We furthermore showed that the putative Bucl8-associated pump expressed in the heterologous Escherichia coli host confers FA resistance. On the contrary, the Bucl8-associated pump did not confer resistance to a panel of clinically-relevant antimicrobials in Burkholderia and E. coli. We finally demonstrated that deletion of the bucl8-locus drastically affects the growth of the mutant in L-broth. We determined that Bucl8 is a component of a novel tetrapartite efflux pump, which confers FA resistance, fibrinogen binding, and optimal growth. MeSH Headings: Bacterial Outer Membrane Proteins/metabolism; Bacterial Outer Membrane Proteins/physiology; Burkholderia/genetics; Burkholderia/metabolism; Burkholderia mallei/genetics; Burkholderia mallei/metabolism; Burkholderia pseudomallei/genetics; Burkholderia pseudomallei/metabolism; Collagen/metabolism; Drug Resistance, Multiple, Bacterial/genetics; Escherichia coli/genetics; Escherichia coli Proteins/genetics; Genes, Bacterial/drug effects; Membrane Transport Proteins/metabolism; Operon/drug effects; Transcription Factors/metabolism. Grants: P20 GM103434 NIGMS NIH HHS; U54 GM104942 NIGMS NIH HHS; Vaccine Development Center at WVU-HSC; National Institute of General Medical Sciences; Italian MIUR.
2	Selection on start codons in prokaryotes and potential compensatory nucleotide substitutions. Sci Rep 2017;7:12422. [PMID: 28963504 PMCID: PMC5622118 DOI: 10.1038/s41598-017-12619-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 06/23/2017] [Accepted: 09/06/2017] [Indexed: 11/29/2022] Open. Abstract: Reconstruction of the evolution of start codons in 36 groups of closely related bacterial and archaeal genomes reveals purifying selection affecting AUG codons. The AUG starts are replaced by GUG and especially UUG significantly less frequently than expected under the neutral expectation derived from the frequencies of the respective nucleotide triplet substitutions in non-coding regions and in 4-fold degenerate sites. Thus, AUG is the optimal start codon that is actively maintained by purifying selection. However, purifying selection on start codons is significantly weaker than the selection on the same codons in coding sequences, although the switches between the codons result in conservative amino acid substitutions. The only exception is the AUG to UUG switch that is strongly selected against among start codons. Selection on start codons is most pronounced in evolutionarily conserved, highly expressed genes. Mutation of the start codon to a sub-optimal form (GUG or UUG) tends to be compensated by mutations in the Shine-Dalgarno sequence towards a stronger translation initiation signal. Together, all these findings indicate that in prokaryotes, translation start signals are subject to weak but significant selection for maximization of initiation rate and, consequently, protein production.
3	A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites. PLoS One 2015. [PMID: 26204119 PMCID: PMC4512697 DOI: 10.1371/journal.pone.0133691] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022] Open. Abstract: The identification of translation initiation sites (TISs) constitutes an important aspect of sequence-based genome analysis. An erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and also may flaw the determination of descent, for any particular gene. We have formulated a reference-free method to score the TIS annotation quality. The method is based on a comparison of the observed and expected distribution of all TISs in a particular genome given prior gene-calling. We have assessed the TIS annotations for all available NCBI RefSeq microbial genomes and found that approximately 87% are of appropriate quality, whereas 13% need substantial improvement. We have analyzed a number of factors that could affect TIS annotation quality such as GC-content, taxonomy, the fraction of genes with a Shine-Dalgarno sequence and the year of publication. The analysis showed that only the first factor has a clear effect. We have then formulated a straightforward Principal Component Analysis-based TIS identification strategy to self-organize and score potential TISs. The strategy is independent of reference data and a priori calculations. A representative set of 277 genomes was subjected to the analysis and we found a clear increase in TIS annotation quality for the genomes with a low quality score. The PCA-based annotation was also compared with annotation with the current tool of reference, Prodigal. The comparison for the model genome of Escherichia coli K12 showed that both methods supplement each other and that prediction agreement can be used as an indicator of a correct TIS annotation. Importantly, the data suggest that the addition of a PCA-based strategy to a Prodigal prediction can be used to ‘flag’ TIS annotations for re-evaluation and in addition can be used to evaluate a given annotation in case a Prodigal annotation is lacking.
4	Characterization of the cryptic AV3 promoter of ageratum yellow vein virus in prokaryotic and eukaryotic systems. PLoS One 2014;9:e108608. [PMID: 25268755 PMCID: PMC4182527 DOI: 10.1371/journal.pone.0108608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/08/2014] [Accepted: 08/25/2014] [Indexed: 11/19/2022] Open. Abstract: A cryptic prokaryotic promoter, designated AV3 promoter, has been previously identified in certain begomovirus genus, including ageratum yellow vein virus isolate NT (AYVV-NT). In this study, we demonstrated that the core nucleotides in the putative -10 and -35 boxes are necessary but not sufficient for promoter activity in Escherichia coli, and showed that AYVV-NT AV3 promoter could specifically interact with single-stranded DNA-binding protein and sigma 70 of E. coli involved in transcription. Several AYVV-NT-encoded proteins were found to increase the activity of AV3 promoter. The transcription start sites downstream to AV3 promoter were mapped to nucleotide positions 803 or 805 in E. coli, and 856 in Nicotiana benthamiana. The eukaryotic activity of AV3 promoter and the translatability of a short downstream open reading frame were further confirmed by using a green fluorescent protein reporter construct in yeast (Saccharomyces cerevisiae) cells. These results suggested that AV3 promoter might be a remnant of evolution that retained cryptic activity at present. MeSH Headings: Ageratum/virology; Agrobacterium/genetics; Base Sequence; Begomovirus/genetics; Begomovirus/metabolism; Biological Evolution; Escherichia coli/genetics; Escherichia coli/virology; Genome, Viral; Molecular Sequence Data; Open Reading Frames; Plant Diseases/virology; Promoter Regions, Genetic; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/virology; Nicotiana/virology; Transcription, Genetic; Viral Proteins/chemistry; Viral Proteins/genetics; Viral Proteins/metabolism.
5	N-terminomics and proteogenomics, getting off to a good start. Proteomics 2014;14:2637-46. [PMID: 25116052 DOI: 10.1002/pmic.201400157] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 04/23/2014] [Revised: 04/23/2014] [Accepted: 08/08/2014] [Indexed: 12/11/2022] Abstract: Proteogenomics consists of the annotation or reannotation of protein-coding nucleic acid sequences based on the empirical observation of their gene products. While functional annotation of predicted genes is increasingly feasible given the multiplicity of genomes available for many branches of the tree of life, the accurate annotation of the translational start sites is still a point of contention. Extensive coverage of the proteome, including specifically the N-termini, is now possible, thanks to next-generation mass spectrometers able to record data from thousands of proteins at once. Efforts to increase the peptide coverage of protein sequences and to detect low abundance proteins are important to make proteomic and proteogenomic studies more comprehensive. In this review, we present the panoply of N-terminus-oriented strategies that have been developed over the last decade. Key Words: Gene annotation; N-terminomics; Peptide enrichment; Peptide signal; Proteogenomics; Technology.
6	Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformatics 2013;14 Suppl 5:S12. [PMID: 23735199 PMCID: PMC3622649 DOI: 10.1186/1471-2105-14-s5-s12] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 11/25/2022] Open. Abstract: Background: Metagenomic sequencing is becoming a powerful technology for exploring microorganisms from various environments, such as the human body, without isolation and cultivation. Accurately identifying genes from metagenomic fragments is one of the most fundamental issues. Results: In this article, we present a novel gene prediction method named MetaGUN for metagenomic fragments based on a machine learning approach of SVM. It implements a three-stage strategy to predict genes. Firstly, it classifies input fragments into phylogenetic groups by a k-mer based sequence binning method. Then, protein-coding sequences are identified for each group independently with SVM classifiers that integrate entropy density profiles (EDP) of codon usage, translation initiation site (TIS) scores and open reading frame (ORF) length as input patterns. Finally, the TISs are adjusted by employing a modified version of MetaTISA. To identify protein-coding sequences, MetaGUN builds the universal module and the novel module. The former is based on a set of representative species, while the latter is designed to find potential functionary DNA sequences with conserved domains. Conclusions: Comparisons on artificial shotgun fragments with multiple current metagenomic gene finders show that MetaGUN predicts better results on both 3' and 5' ends of genes with fragments of various lengths. Especially, it makes the most reliable predictions among these methods. As an application, MetaGUN was used to predict genes for two samples of human gut microbiome. It identifies thousands of additional genes with significant evidence. Further analysis indicates that MetaGUN tends to predict more potential novel genes than other current metagenomic gene finders.
7	ClubSub-P: Cluster-Based Subcellular Localization Prediction for Gram-Negative Bacteria and Archaea. Front Microbiol 2011;2:218. [PMID: 22073040 PMCID: PMC3210502 DOI: 10.3389/fmicb.2011.00218] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 09/05/2011] [Accepted: 10/12/2011] [Indexed: 12/17/2022] Open. Abstract: The subcellular localization (SCL) of proteins provides important clues to their function in a cell. In our efforts to predict useful vaccine targets against Gram-negative bacteria, we noticed that misannotated start codons frequently lead to wrongly assigned SCLs. This and other problems in SCL prediction, such as the relatively high false-positive and false-negative rates of some tools, can be avoided by applying multiple prediction tools to groups of homologous proteins. Here we present ClubSub-P, an online database that combines existing SCL prediction tools into a consensus pipeline from more than 600 proteomes of fully sequenced microorganisms. On top of the consensus prediction at the level of single sequences, the tool uses clusters of homologous proteins from Gram-negative bacteria and from Archaea to eliminate false-positive and false-negative predictions. ClubSub-P can assign the SCL of proteins from Gram-negative bacteria and Archaea with high precision. The database is searchable, and can easily be expanded using either new bacterial genomes or new prediction tools as they become available. This will further improve the performance of the SCL prediction, as well as the detection of misannotated start codons and other annotation errors. ClubSub-P is available online at http://toolkit.tuebingen.mpg.de/clubsubp/ Key Words: clustering; protein homology; signal peptide; start codon prediction; subcellular localization prediction.
8	Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes. BMC Genomics 2011;12:361. [PMID: 21749696 PMCID: PMC3160421 DOI: 10.1186/1471-2164-12-361] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 03/01/2011] [Accepted: 07/12/2011] [Indexed: 11/28/2022] Open. Abstract: Background: Shine-Dalgarno (SD) signal has long been viewed as the dominant translation initiation signal in prokaryotes. Recently, leaderless genes, which lack 5'-untranslated regions (5'-UTR) on their mRNAs, have been shown abundant in archaea. However, current large-scale in silico analyses on initiation mechanisms in bacteria are mainly based on the SD-led initiation way, other than the leaderless one. The study of leaderless genes in bacteria remains open, which causes uncertain understanding of translation initiation mechanisms for prokaryotes. Results: Here, we study signals in translation initiation regions of all genes over 953 bacterial and 72 archaeal genomes, then make an effort to construct an evolutionary scenario in view of leaderless genes in bacteria. With an algorithm designed to identify multi-signal in upstream regions of genes for a genome, we classify all genes into SD-led, TA-led and atypical genes according to the category of the most probable signal in their upstream sequences. Particularly, occurrence of TA-like signals about 10 bp upstream to translation initiation site (TIS) in bacteria most probably means leaderless genes. Conclusions: Our analysis reveals that leaderless genes are totally widespread, although not dominant, in a variety of bacteria. Especially for Actinobacteria and Deinococcus-Thermus, more than twenty percent of genes are leaderless. Analyzed in closely related bacterial genomes, our results imply that the change of translation initiation mechanisms, which happens between the genes deriving from a common ancestor, is linearly dependent on the phylogenetic relationship. Analysis on the macroevolution of leaderless genes further shows that the proportion of leaderless genes in bacteria has a decreasing trend in evolution.
9	Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 2010;11:119. [PMID: 20211023 PMCID: PMC2848648 DOI: 10.1186/1471-2105-11-119] [Citation(s) in RCA: 6242] [Impact Index Per Article: 445.9] [Received: 07/20/2009] [Accepted: 03/08/2010] [Indexed: 11/10/2022] Open. Abstract: Background: The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals. Results: With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives. Conclusion: We built a fast, lightweight, open source gene prediction program called Prodigal (http://compbio.ornl.gov/prodigal/). Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.
10	DIGAP--a database of improved gene annotation for phytopathogens. BMC Genomics 2010;11:54. [PMID: 20089203 PMCID: PMC2825234 DOI: 10.1186/1471-2164-11-54] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/15/2009] [Accepted: 01/21/2010] [Indexed: 11/28/2022] Open. Abstract: Background: Bacterial plant pathogens are very harmful to their host plants and can cause devastating agricultural losses worldwide. With the development of microbial genome sequencing, many strains of phytopathogens have been sequenced. However, some misannotations exist in these phytopathogen genomes. Our objective is to improve these annotations and store them in a central database, DIGAP. Description: DIGAP includes the following improved information on phytopathogen genomes. (i) All the 'hypothetical proteins' were checked, and non-coding ORFs recognized by the Z curve method were removed. (ii) The translation initiation sites (TISs) of 20% ~ 25% of all the protein-coding genes have been corrected based on the NCBI RefSeq, ProTISA database and an ab initio program, GS-Finder. (iii) Potential functions of about 10% 'hypothetical proteins' have been predicted using sequence alignment tools. (iv) Two theoretical gene expression indices, the codon adaptation index (CAI) and the E(g) index, were calculated to predict the gene expression levels. (v) Potential agricultural bactericide targets and their homology-modeled 3D structures are provided in the database, which is of significance for agricultural antibiotic discovery. Conclusion: The results in DIGAP provide useful information for understanding the pathogenetic mechanisms of phytopathogens and for finding agricultural bactericides. DIGAP is freely available at http://ibi.hzau.edu.cn/digap/.
11	Genome reannotation of Escherichia coli CFT073 with new insights into virulence. BMC Genomics 2009;10:552. [PMID: 19930606 PMCID: PMC2785843 DOI: 10.1186/1471-2164-10-552] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 03/25/2009] [Accepted: 11/22/2009] [Indexed: 11/30/2022] Open. Abstract: Background: The genome of the human pathogen uropathogenic Escherichia coli strain CFT073 was sequenced and published in 2002, which was significant in pathogenetic bacterial genomics research. However, the current RefSeq annotation of this pathogen is now outdated to some degree, due to missing or misannotated essential genes associated with its virulence. We carried out a systematic reannotation by combining automated annotation tools with manual efforts to provide a comprehensive understanding of virulence for the CFT073 genome. Results: The reannotation excluded 608 coding sequences from the RefSeq annotation. Meanwhile, a total of 299 coding sequences were newly added; about one third of them are found in genomic island (GI) regions, while more than one fifth of them are located in virulence-related regions, pathogenicity islands (PAIs). Furthermore, a total of 341 genes had their translational initiation sites (TISs) relocated, which resulted in a high quality of gene start annotation. In addition, 94 pseudogenes annotated in RefSeq were thoroughly inspected and updated. The number of miscellaneous genes (sRNAs) has been updated from 6 in RefSeq to 46 in the reannotation. Based on the adjustment in the reannotation, subsequent analyses were conducted by both general and case studies on new virulence factors or new virulence-associated genes that are crucial during the urinary tract infection (UTI) process, including invasion, colonization, nutrient uptake and population density control. Furthermore, miscellaneous RNAs collected in the reannotation are believed to contribute to the virulence of strain CFT073. The reannotation including the nucleotide data, the original RefSeq annotation, and all reannotated results is freely available via http://mech.ctb.pku.edu.cn/CFT073/. Conclusion: As a result, the reannotation presents a more comprehensive picture of mechanisms of uropathogenicity of UPEC strain CFT073. The new genes change the view of its uropathogenicity in many respects, particularly by new genes in GI regions and new virulence-associated factors. The reannotation thus functions as an important source by providing new information about genomic structure and organization, and gene function. Moreover, we expect that the detailed analysis will facilitate the studies for exploration of novel virulence mechanisms and help guide experimental design.
12	PairWise Neighbours database: overlaps and spacers among prokaryote genomes. BMC Genomics 2009;10:281. [PMID: 19555467 PMCID: PMC2716372 DOI: 10.1186/1471-2164-10-281] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 01/14/2009] [Accepted: 06/25/2009] [Indexed: 05/25/2023] Open. Abstract: Background: Although prokaryotes live in a variety of habitats and possess different metabolic and genomic complexity, they have several genomic architectural features in common. Overlapping genes are a common feature of prokaryote genomes. The overlapping lengths tend to be short because, as the overlaps become longer, they carry more risk of deleterious mutations. The spacers between genes tend to be short too because of the tendency to reduce non-coding DNA among prokaryotes. However, they must be long enough to maintain essential regulatory signals such as the Shine-Dalgarno (SD) sequence, which is responsible for efficient translation. Description: PairWise Neighbours is an interactive and intuitive database used for retrieving information about the spacers and overlapping genes among bacterial and archaeal genomes. It contains 1,956,294 gene pairs from 678 fully sequenced prokaryote genomes and is freely available at the URL . This database provides information about the overlaps and their conservation across species. Furthermore, it allows wide analysis of the intergenic regions, providing useful information such as the location and strength of the SD sequence. Conclusion: There are experiments and bioinformatic analyses that rely on correct annotations of the initiation site. Therefore, a database that studies the overlaps and spacers among prokaryotes appears to be desirable. PairWise Neighbours database permits the reliability analysis of the overlapping structures and the study of the SD presence and location among the adjacent genes, which may help to check the annotation of the initiation sites.
13	Comparative genomic analysis of ten Streptococcus pneumoniae temperate bacteriophages. J Bacteriol 2009;191:4854-62. [PMID: 19502408 PMCID: PMC2715734 DOI: 10.1128/jb.01272-08] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Indexed: 11/20/2022] Open. Abstract: Streptococcus pneumoniae is an important human pathogen that often carries temperate bacteriophages. As part of a program to characterize the genetic makeup of prophages associated with clinical strains and to assess the potential roles that they play in the biology and pathogenesis in their host, we performed comparative genomic analysis of 10 temperate pneumococcal phages. All of the genomes are organized into five major gene clusters: lysogeny, replication, packaging, morphogenesis, and lysis clusters. All of the phage particles observed showed a Siphoviridae morphology. The only genes that are well conserved in all the genomes studied are those involved in the integration and the lysis of the host in addition to two genes, of unknown function, within the replication module. We observed that a high percentage of the open reading frames contained no similarities to any sequences catalogued in public databases; however, genes that were homologous to known phage virulence genes, including the pblB gene of Streptococcus mitis and the vapE gene of Dichelobacter nodosus, were also identified. Interestingly, bioinformatic tools showed the presence of a toxin-antitoxin system in the phage phiSpn_6, and this represents the first time that an addiction system in a pneumophage has been identified. Collectively, the temperate pneumophages contain a diverse set of genes with various levels of similarity among them.
14	MetaTISA: Metagenomic Translation Initiation Site Annotator for improving gene start prediction. Bioinformatics 2009;25:1843-5. [PMID: 19389734 DOI: 10.1093/bioinformatics/btp272] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Abstract: Summary: We proposed a tool named MetaTISA with an aim to improve TIS prediction of current gene-finders for metagenomes. The method employs a two-step strategy to predict translation initiation sites (TISs) by first clustering metagenomic fragments into phylogenetic groups and then predicting TISs independently for each group in an unsupervised manner. As evaluated on experimentally verified TISs, MetaTISA greatly improves the accuracies of TIS prediction of current gene-finders. Availability: The C++ source code is freely available under the GNU GPL license via http://mech.ctb.pku.edu.cn/MetaTISA/.
15	Prediction of translation initiation site for microbial genomes with TriTISA. Bioinformatics 2008;25:123-5. [PMID: 19015130 DOI: 10.1093/bioinformatics/btn576] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Abstract: We report a new and simple method, TriTISA, for accurate prediction of translation initiation site (TIS) of microbial genomes. TriTISA classifies all candidate TISs into three categories based on evolutionary properties, and characterizes them in terms of Markov models. Then, it employs a Bayesian methodology for the selection of true TIS with a non-supervised, iterative procedure. Assessment on experimentally verified TIS data shows that TriTISA is overall better than all other state-of-the-art methods for microbial genome TIS prediction. In particular, TriTISA is shown to have a robust accuracy independent of the quality of initial annotation. Availability: The C++ source code is freely available under the GNU GPL license via http://mech.ctb.pku.edu.cn/protisa/TriTISA.
16	Bioinformatics in China: a personal perspective. PLoS Comput Biol 2008;4:e1000020. [PMID: 18437216 PMCID: PMC2291564 DOI: 10.1371/journal.pcbi.1000020] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 12/26/2022] Open
17	Shuttle vector expression in Thermococcus kodakaraensis: contributions of cis elements to protein synthesis in a hyperthermophilic archaeon. Appl Environ Microbiol 2008;74:3099-104. [PMID: 18378640 DOI: 10.1128/aem.00305-08] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Indexed: 11/20/2022] Open Abstract Shuttle vectors that replicate stably and express selectable phenotypes in both Thermococcus kodakaraensis and Escherichia coli have been constructed. Plasmid pTN1 from Thermococcus nautilis was ligated to the commercial vector pCR2.1-TOPO, and selectable markers were added so that T. kodakaraensis transformants could be selected by DeltatrpE complementation and/or mevinolin resistance. Based on Western blot measurements, shuttle vector expression of RpoL-HA, a hemagglutinin (HA) epitope-tagged subunit of T. kodakaraensis RNA polymerase (RNAP), was approximately 8-fold higher than chromosome expression. An idealized ribosome binding sequence (5'-AGGTGG) was incorporated for RpoL-HA expression, and changes to this sequence reduced expression. Changing the translation initiation codon from AUG to GUG did not reduce RpoL-HA expression, but replacing AUG with UUG dramatically reduced RpoL-HA synthesis. When functioning as translation initiation codons, AUG, GUG, and UUG all directed the incorporation of methionine as the N-terminal residue of RpoL-HA synthesized in T. kodakaraensis. Affinity purification confirmed that an HA- plus six-histidine-tagged RpoL subunit (RpoL-HA-his(6)) synthesized ectopically from a shuttle vector was assembled in vivo into RNAP holoenzymes that were active and could be purified directly from T. kodakaraensis cell lysates by Ni(2+) binding and imidazole elution.
18	Computational evaluation of TIS annotation for prokaryotic genomes. BMC Bioinformatics 2008;9:160. [PMID: 18366730 PMCID: PMC2362131 DOI: 10.1186/1471-2105-9-160] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 11/12/2007] [Accepted: 03/25/2008] [Indexed: 11/10/2022] Open Abstract: Background: Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks. Results: Based on a homogeneity assumption that gene translation-related signals are uniformly distributed across a genome, we have established a computational method for a large-scale quantitative assessment of the reliability of TIS annotations for any prokaryotic genome. The method consists of modeling a positional weight matrix (PWM) of aligned sequences around predicted TISs in terms of a linear combination of three elementary PWMs, one for true TIS and the two others for false TISs. The three elementary PWMs are obtained using a reference set with highly reliable TIS predictions. A generalized least square estimator determines the weighting of the true TIS in the observed PWM, from which the accuracy of the prediction is derived. The validity of the method and the extent of the limitation of the assumptions are explicitly addressed by testing on experimentally verified TISs with variable accuracy of the reference sets. The method is applied to estimate the accuracy of TIS annotations that are provided on public databases such as RefSeq and ProTISA and by programs such as EasyGene, GeneMarkS, Glimmer 3 and TiCo. It is shown that RefSeq's TIS prediction is significantly less accurate than two recent predictors, TiCo and ProTISA. With convincing proofs, we show two general preferential biases in the RefSeq annotation, i.e. over-annotating the longest open reading frame (LORF) and under-annotating ATG start codon. Finally, we have established a new TIS database, SupTISA, based on the best prediction of all the predictors; SupTISA has achieved an average accuracy of 92% over all 532 complete genomes. Conclusion: Large-scale computational evaluation of TIS annotation has been achieved. A new TIS database much better than RefSeq has been constructed, and it provides a valuable resource for further TIS studies.