1
|
Comment on: Resistance gene naming and numbering: is it a new gene or not? J Antimicrob Chemother 2016; 71:2677-8. [PMID: 27261266 DOI: 10.1093/jac/dkw204] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 01/25/2023]
|
2
|
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 2015; 44:D733-45. [PMID: 26553804 PMCID: PMC4702849 DOI: 10.1093/nar/gkv1189] [Citation(s) in RCA: 3322] [Impact Index Per Article: 369.1] [Received: 09/01/2015] [Accepted: 10/24/2015] [Indexed: 12/12/2022]
Abstract
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
|
3
|
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.
|
4
|
|
5
|
Haloferax volcanii archaeosortase is required for motility, mating, and C-terminal processing of the S-layer glycoprotein. Mol Microbiol 2013; 88:1164-75. [PMID: 23651326 DOI: 10.1111/mmi.12248] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Accepted: 04/29/2013] [Indexed: 01/29/2023]
Abstract
Cell surfaces are decorated by a variety of proteins that facilitate interactions with their environments and support cell stability. These secreted proteins are anchored to the cell by mechanisms that are diverse, and, in archaea, poorly understood. Recently published in silico data suggest that in some species a subset of secreted euryarchaeal proteins, which includes the S-layer glycoprotein, is processed and covalently linked to the cell membrane by enzymes referred to as archaeosortases. In silico work led to the proposal that an independent, sortase-like system for proteolysis-coupled, carboxy-terminal lipid modification exists in bacteria (exosortase) and archaea (archaeosortase). Here, we provide the first in vivo characterization of an archaeosortase in the haloarchaeal model organism Haloferax volcanii. Deletion of the artA gene (HVO_0915) resulted in multiple biological phenotypes: (a) poor growth, especially under low-salt conditions, (b) alterations in cell shape and the S-layer, (c) impaired motility, suppressors of which still exhibit poor growth, and (d) impaired conjugation. We studied one of the ArtA substrates, the S-layer glycoprotein, using detailed proteomic analysis. While the carboxy-terminal region of S-layer glycoproteins, consisting of a putative threonine-rich O-glycosylated region followed by a hydrophobic transmembrane helix, has been notoriously resistant to any proteomic peptide identification, we were able to identify two overlapping peptides from the transmembrane domain present in the ΔartA strain but not in the wild-type strain. This clearly shows that ArtA is involved in carboxy-terminal post-translational processing of the S-layer glycoprotein. As it is known from previous studies that a lipid is covalently attached to the carboxy-terminal region of the S-layer glycoprotein, our data strongly support the conclusion that archaeosortase functions analogously to sortase, mediating proteolysis-coupled, covalent cell surface attachment.
|
6
|
Abstract
CharProtDB (http://www.jcvi.org/charprotdb/) is a curated database of biochemically characterized proteins. It provides a source of direct rather than transitive assignments of function, designed to support automated annotation pipelines. The initial data set in CharProtDB was collected through manual literature curation over the years by analysts at the J. Craig Venter Institute (JCVI) [formerly The Institute of Genomic Research (TIGR)] as part of their prokaryotic genome sequencing projects. The CharProtDB has been expanded by import of selected records from publicly available protein collections whose biocuration indicated direct rather than homology-based assignment of function. Annotations in CharProtDB include gene name, symbol and various controlled vocabulary terms, including Gene Ontology terms, Enzyme Commission number and TransportDB accession. Each annotation is referenced with the source; ideally a journal reference, or, if imported and lacking one, the original database source.
|
7
|
Abstract
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
|
8
|
SMURF: Genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol 2010; 47:736-41. [PMID: 20554054 PMCID: PMC2916752 DOI: 10.1016/j.fgb.2010.06.003] [Citation(s) in RCA: 513] [Impact Index Per Article: 36.6] [Received: 04/28/2009] [Revised: 05/25/2010] [Accepted: 06/02/2010] [Indexed: 01/07/2023]
Abstract
Fungi produce an impressive array of secondary metabolites (SMs) including mycotoxins, antibiotics and pharmaceuticals. The genes responsible for their biosynthesis, export, and transcriptional regulation are often found in contiguous gene clusters. To facilitate annotation of these clusters in sequenced fungal genomes, we developed the web-based software SMURF (www.jcvi.org/smurf/) to systematically predict clustered SM genes based on their genomic context and domain content. We applied SMURF to catalog putative clusters in 27 publicly available fungal genomes. Comparison with genetically characterized clusters from six fungal species showed that SMURF accurately recovered all clusters and detected additional potential clusters. Subsequent comparative analysis revealed the striking biosynthetic capacity and variability of the fungal SM pathways and the correlation between unicellularity and the absence of SMs. Further genetics studies are needed to experimentally confirm these clusters.
|
9
|
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total approximately 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
|
10
|
New developments in the InterPro database. Nucleic Acids Res 2007; 35:D224-8. [PMID: 17202162 PMCID: PMC1899100 DOI: 10.1093/nar/gkl841] [Citation(s) in RCA: 349] [Impact Index Per Article: 20.5] [Received: 09/05/2006] [Revised: 10/06/2006] [Accepted: 10/06/2006] [Indexed: 11/14/2022]
Abstract
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.
|
11
|
Abstract
The complete genome of Aeromonas hydrophila ATCC 7966(T) was sequenced. Aeromonas, a ubiquitous waterborne bacterium, has been placed by the Environmental Protection Agency on the Contaminant Candidate List because of its potential to cause human disease. The 4.7-Mb genome of this emerging pathogen shows a physiologically adroit organism with broad metabolic capabilities and considerable virulence potential. A large array of virulence genes, including some identified in clinical isolates of Aeromonas spp. or Vibrio spp., may confer upon this organism the ability to infect a wide range of hosts. However, two recognized virulence markers, a type III secretion system and a lateral flagellum, that are reported in other A. hydrophila strains are not identified in the sequenced isolate, ATCC 7966(T). Given the ubiquity and free-living lifestyle of this organism, there is relatively little evidence of fluidity in terms of mobile elements in the genome of this particular strain. Notable aspects of the metabolic repertoire of A. hydrophila include dissimilatory sulfate reduction and resistance mechanisms (such as thiopurine reductase, arsenate reductase, and phosphonate degradation enzymes) against toxic compounds encountered in polluted waters. These enzymes may have bioremediative as well as industrial potential. Thus, the A. hydrophila genome sequence provides valuable insights into its ability to flourish in both aquatic and host environments.
|
12
|
Abstract
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
|
13
|
Abstract
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
|
14
|
Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 2002; 419:498-511. [PMID: 12368864 PMCID: PMC3836256 DOI: 10.1038/nature01097] [Citation(s) in RCA: 3062] [Impact Index Per Article: 139.2] [Received: 07/31/2002] [Accepted: 09/02/2002] [Indexed: 11/08/2022]
Abstract
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.
|
15
|
Abstract
Virulence and immunity are poorly understood in Mycobacterium tuberculosis. We sequenced the complete genome of the M. tuberculosis clinical strain CDC1551 and performed a whole-genome comparison with the laboratory strain H37Rv in order to identify polymorphic sequences with potential relevance to disease pathogenesis, immunity, and evolution. We found large-sequence and single-nucleotide polymorphisms in numerous genes. Polymorphic loci included a phospholipase C, a membrane lipoprotein, members of an adenylate cyclase gene family, and members of the PE/PPE gene family, some of which have been implicated in virulence or the host immune response. Several gene families, including the PE/PPE gene family, also had significantly higher synonymous and nonsynonymous substitution frequencies compared to the genome as a whole. We tested a large sample of M. tuberculosis clinical isolates for a subset of the large-sequence and single-nucleotide polymorphisms and found widespread genetic variability at many of these loci. We performed phylogenetic and epidemiological analysis to investigate the evolutionary relationships among isolates and the origins of specific polymorphic loci. A number of these polymorphisms appear to have occurred multiple times as independent events, suggesting that these changes may be under selective pressure. Together, these results demonstrate that polymorphisms among M. tuberculosis strains are more extensive than initially anticipated, and genetic variation may have an important role in disease pathogenesis and immunity.
|
16
|
InterPro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform 2002; 3:225-35. [PMID: 12230031 DOI: 10.1093/bib/3.3.225] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.1] [Indexed: 11/15/2022]
Abstract
The exponential increase in the submission of nucleotide sequences to the nucleotide sequence database by genome sequencing centres has resulted in a need for rapid, automatic methods for classification of the resulting protein sequences. There are several signature and sequence cluster-based methods for protein classification, each resource having distinct areas of optimum application owing to the differences in the underlying analysis methods. In recognition of this, InterPro was developed as an integrated documentation resource for protein families, domains and functional sites, to rationalise the complementary efforts of the individual protein signature database projects. The member databases - PRINTS, PROSITE, Pfam, ProDom, SMART and TIGRFAMs - form the InterPro core. Related signatures from each member database are unified into single InterPro entries. Each InterPro entry includes a unique accession number, functional descriptions and literature references, and links are made back to the relevant member database(s). Release 4.0 of InterPro (November 2001) contains 4,691 entries, representing 3,532 families, 1,068 domains, 74 repeats and 15 sites of post-translational modification (PTMs) encoded by different regular expressions, profiles, fingerprints and hidden Markov models (HMMs). Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (2,141,621 InterPro hits from 586,124 SWISS-PROT and TrEMBL protein sequences). The database is freely accessible for text- and sequence-based searches.
|
17
|
A bacterial genome in flux: the twelve linear and nine circular extrachromosomal DNAs in an infectious isolate of the Lyme disease spirochete Borrelia burgdorferi. Mol Microbiol 2000; 35:490-516. [PMID: 10672174 DOI: 10.1046/j.1365-2958.2000.01698.x] [Citation(s) in RCA: 598] [Impact Index Per Article: 24.9] [Indexed: 11/20/2022]
Abstract
We have determined that Borrelia burgdorferi strain B31 MI carries 21 extrachromosomal DNA elements, the largest number known for any bacterium. Among these are 12 linear and nine circular plasmids, whose sequences total 610 694 bp. We report here the nucleotide sequence of three linear and seven circular plasmids (comprising 290 546 bp) in this infectious isolate. This completes the genome sequencing project for this organism; its genome size is 1 521 419 bp (plus about 2000 bp of undetermined telomeric sequences). Analysis of the sequence implies that there has been extensive and sometimes rather recent DNA rearrangement among a number of the linear plasmids. Many of these events appear to have been mediated by recombinational processes that formed duplications. These many regions of similarity are reflected in the fact that most plasmid genes are members of one of the genome's 161 paralogous gene families; 107 of these gene families, which vary in size from two to 41 members, contain at least one plasmid gene. These rearrangements appear to have contributed to a surprisingly large number of apparently non-functional pseudogenes, a very unusual feature for a prokaryotic genome. The presence of these damaged genes suggests that some of the plasmids may be in a period of rapid evolution. The sequence predicts 535 plasmid genes ≥300 bp in length that may be intact and 167 apparently mutationally damaged and/or unexpressed genes (pseudogenes). The large majority, over 90%, of genes on these plasmids have no convincing similarity to genes outside Borrelia, suggesting that they perform specialized functions.
|
18
|
|
19
|
Abstract
The intraocular expansion of perfluoromethane (CF4), perfluoroethane (C2F6), and perfluoropropane (C3F8) was determined by a direct method for measuring intravitreal gas. A bubble of CF4 was found to expand 1.9 times the volume initially injected; C2F6 expanded 3.3 times; C3F8 expanded four times. The expansion characteristics of the experimental gases were compared with those of sulfur hexafluoride (SF6) and octofluorocyclobutane (C4F8), two gases already in clinical use.
|