1
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2024; 52:D33-D43. [PMID: 37994677 PMCID: PMC10767890 DOI: 10.1093/nar/gkad1044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/20/2023] [Accepted: 10/23/2023] [Indexed: 11/24/2023] Open
Abstract
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
|
2
|
Putting everything in its place: using the INSDC compliant Pathogen Data Object Model to better structure genomic data submitted for public health applications. Microb Genom 2023; 9:001145. [PMID: 38085797 PMCID: PMC10763499 DOI: 10.1099/mgen.0.001145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 11/13/2023] [Indexed: 12/18/2023] Open
Abstract
Fast, efficient public health actions require well-organized and coordinated systems that can supply timely and accurate knowledge. Public databases of pathogen genomic data, such as the International Nucleotide Sequence Database Collaboration (INSDC), have become essential tools for efficient public health decisions. However, these international resources began primarily for academic purposes, rather than for surveillance or interventions. Now, queries need not only to access the whole genomes of multiple pathogens but also to make connections using robust contextual metadata to identify issues of public health relevance. Databases that over time developed a patchwork of submission formats and requirements need to be consistently organized and coordinated internationally to allow effective searches. To help resolve these issues, we propose a common pathogen data structure called the Pathogen Data Object Model (DOM) that will formalize the minimum pieces of sequence data and contextual data necessary for general public health uses, while recognizing that submitters will likely withhold a wide range of non-public contextual data. Further, we propose contributors use the Pathogen DOM for all pathogen submissions (bacterial, viral, fungal, and parasites), which will simplify data submissions and provide a consistent and transparent data structure for downstream data analyses. We also highlight how improved submission tools can support the Pathogen DOM, offering users additional easy-to-use methods to ensure this structure is followed.
|
3
|
Identification and Characterization of ten Escherichia coli Strains Encoding Novel Shiga Toxin 2 Subtypes, Stx2n as Well as Stx2j, Stx2m, and Stx2o, in the United States. Microorganisms 2023; 11:2561. [PMID: 37894219 PMCID: PMC10608928 DOI: 10.3390/microorganisms11102561] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/12/2023] [Revised: 10/10/2023] [Accepted: 10/12/2023] [Indexed: 10/29/2023] Open
Abstract
The sharing of genome sequences in online data repositories allows for large scale analyses of specific genes or gene families. This can result in the detection of novel gene subtypes as well as the development of improved detection methods. Here, we used publicly available WGS data to detect a novel Stx subtype, Stx2n, in two clinical E. coli strains isolated in the USA. During this process, additional Stx2 subtypes were detected: six Stx2j strains, one Stx2m strain, and one Stx2o strain were all analyzed for variability from the originally described subtypes. Complete genome sequences were assembled from short- or long-read sequencing and analyzed for serotype and sequence type (ST). The WGS data from Stx2n- and Stx2o-producing STEC strains were further analyzed for virulence genes, prophages, and phage insertion sites. Nucleotide and amino acid maximum parsimony trees showed expected clustering of the previously described subtypes and a clear separation of the novel Stx2n subtype. WGS data were used to design OMNI PCR primers for the detection of all known stx1 (283 bp amplicon), stx2 (400 bp amplicon), intimin encoded by eae (221 bp amplicon), and stx2f (438 bp amplicon) subtypes. These primers were tested in three different laboratories, using standard reference strains. An analysis of the complete genome sequence showed variability in serogroup, virulence genes, and ST, and Stx2 prophages showed variability in size, gene composition, and phage insertion sites. The strains with Stx2j, Stx2m, Stx2n, and Stx2o showed toxicity to Vero cells. The Stx2j-carrying strain 2012C-4221 was induced when grown with sub-inhibitory concentrations of ciprofloxacin, and toxicity was detected. Taken together, these data highlight the need to reinforce genomic surveillance to identify the emergence of potential new Stx2 or Stx1 variants. This surveillance is of paramount importance to public health.
Per our description in this study, we suggest that 2017C-4317 be designated as the Stx2n type-strain.
|
4
|
Database resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res 2023; 51:D29-D38. [PMID: 36370100 PMCID: PMC9825438 DOI: 10.1093/nar/gkac1032] [Citation(s) in RCA: 85] [Impact Index Per Article: 85.0] [Received: 09/17/2022] [Revised: 10/11/2022] [Accepted: 11/09/2022] [Indexed: 11/15/2022] Open
Abstract
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
|
5
|
Abstract
Antimicrobial resistance (AMR) is a significant public health threat. Low-cost whole-genome sequencing, which is often used in surveillance programmes, provides an opportunity to assess AMR gene content in these genomes using in silico approaches. A variety of bioinformatic tools have been developed to identify these genomic elements. Most of those tools rely on reference databases of nucleotide or protein sequences and collections of models and rules for analysis. While the tools are critical for the identification of AMR genes, the databases themselves also provide significant utility for researchers, for applications ranging from sequence analysis to information about AMR phenotypes. Additionally, these databases can be evaluated by domain experts and others to ensure their accuracy. Here we describe how we curate the genes, point mutations, BLAST rules, and hidden Markov models used in NCBI’s AMRFinderPlus, along with the quality-control steps we take to ensure database quality. We also describe the web interfaces that display the full structure of the database and their newly developed cross-browser relationships. Then, using the Reference Gene Catalog as an example, we detail how the databases, rules and models are made publicly available, as well as how to access the software. In addition, as part of the Pathogen Detection system, we have analysed over 1 million publicly available genomes using AMRFinderPlus and its databases. We discuss how the computed analyses generated by those tools can be accessed through a web interface. Finally, we conclude with NCBI’s plans to make these databases accessible over the long term.
|
6
|
Use of Whole Genome Sequencing by the Federal Interagency Collaboration for Genomics for Food and Feed Safety in the United States. J Food Prot 2022; 85:755-772. [PMID: 35259246 DOI: 10.4315/jfp-21-437] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 12/08/2021] [Accepted: 02/22/2022] [Indexed: 11/11/2022]
Abstract
This multiagency report developed by the Interagency Collaboration for Genomics for Food and Feed Safety provides an overview of the use of and transition to whole genome sequencing (WGS) technology for detection and characterization of pathogens transmitted commonly by food and for identification of their sources. We describe foodborne pathogen analysis, investigation, and harmonization efforts among the following federal agencies: National Institutes of Health; Department of Health and Human Services, Centers for Disease Control and Prevention (CDC) and U.S. Food and Drug Administration (FDA); and the U.S. Department of Agriculture, Food Safety and Inspection Service, Agricultural Research Service, and Animal and Plant Health Inspection Service. We describe single nucleotide polymorphism, core-genome, and whole genome multilocus sequence typing data analysis methods as used in the PulseNet (CDC) and GenomeTrakr (FDA) networks, underscoring the complementary nature of the results for linking genetically related foodborne pathogens during outbreak investigations while allowing flexibility to meet the specific needs of Interagency Collaboration partners. We highlight how we apply WGS to pathogen characterization (virulence and antimicrobial resistance profiles) and source attribution efforts and increase transparency by making the sequences and other data publicly available through the National Center for Biotechnology Information. We also highlight the impact of current trends in the use of culture-independent diagnostic tests for human diagnostic testing on analytical approaches related to food safety and what is next for the use of WGS in the area of food safety.
|
7
|
Abstract
Virulence is a complex mix of microbial traits and host susceptibility that could ultimately lead to disease. The increased prevalence of multidrug resistant infections complicates treatment options, augmenting the need for developing robust computational methods and pipelines that enable researchers and clinicians to rapidly identify the underlying mechanism(s) of virulence in any given sample/isolate. Consequently, the National Center for Biotechnology Information at the National Institutes of Health hosted an in-person hackathon in Bethesda, Maryland during July 2019 to assist with developing cloud-based methods to reduce reliance on local computational infrastructure. Groups of attendees were assigned tasks involving identifying relevant tools, constructing pipelines capable of identifying microbial virulence factors, and managing the associated data and metadata. Specifically, the assigned tasks consisted of the following: data indexing, metabolic functions, virulence factors, antimicrobial resistance, mobile elements in enterococci, and metatranscriptomics. The cloud-based framework established by this hackathon can be augmented and built upon by the research community to aid in the rapid identification of microbial virulence factors.
|
8
|
AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci Rep 2021; 11:12728. [PMID: 34135355 PMCID: PMC8208984 DOI: 10.1038/s41598-021-91456-0] [Citation(s) in RCA: 296] [Impact Index Per Article: 98.7] [Received: 02/17/2021] [Accepted: 05/19/2021] [Indexed: 12/26/2022] Open
Abstract
Antimicrobial resistance (AMR) is a significant public health threat. With the rise of affordable whole genome sequencing, in silico approaches to assessing AMR gene content can be used to detect known resistance mechanisms and potentially identify novel mechanisms. To enable accurate assessment of AMR gene content, as part of a multi-agency collaboration, NCBI developed a comprehensive AMR gene database, the Bacterial Antimicrobial Resistance Reference Gene Database, and the AMR gene detection tool AMRFinder. Here, we describe the expansion of the Reference Gene Database, now called the Reference Gene Catalog, to include putative acid, biocide, metal, and stress resistance genes, in addition to virulence genes and species-specific point mutations. Genes and point mutations are classified by broad functions, as well as more detailed functions. As we have expanded both the functional repertoire of identified genes and functionality, NCBI released a new version of AMRFinder, known as AMRFinderPlus. This new tool gives users the option to use only the core set of AMR elements or to include stress response and virulence genes as well. AMRFinderPlus can detect acquired genes and point mutations in both protein and nucleotide sequence. In addition, the evidence used to identify the gene has been expanded to include whether nucleotide or protein sequence was used, its location in the contig, and presence of an internal stop codon. These database improvements and functional expansions will enable increased precision in identifying AMR genes, linking AMR genotypes and phenotypes, and determining possible relationships between AMR, virulence, and stress response.
|
9
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2021; 49:D10-D17. [PMID: 33095870 PMCID: PMC7778943 DOI: 10.1093/nar/gkaa892] [Citation(s) in RCA: 410] [Impact Index Per Article: 136.7] [Received: 09/15/2020] [Revised: 09/25/2020] [Accepted: 10/08/2020] [Indexed: 11/14/2022] Open
Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 34 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface and NCBI datasets. Additional resources that were updated in the past year include PMC, Bookshelf, Genome Data Viewer, SRA, ClinVar, dbSNP, dbVar, Pathogen Detection, BLAST, Primer-BLAST, IgBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
|
10
|
FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science. Nat Commun 2019; 10:3313. [PMID: 31346170 PMCID: PMC6658474 DOI: 10.1038/s41467-019-11306-6] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 11/21/2018] [Accepted: 07/02/2019] [Indexed: 02/08/2023] Open
Abstract
FDA proactively invests in tools to support innovation of emerging technologies, such as infectious disease next generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS quality-controlled reference genomes as a public database for diagnostic purposes and demonstrate its utility with two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need for genome quality gap filling in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes compared to non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. The use of FDA-ARGOS as an in silico target sequence comparator tool combined with representative clinical testing could reduce the burden for completing ID-NGS clinical trials. To be able to use infectious disease next generation sequencing as a diagnostic tool, appropriate reference datasets are required. Here, Sichtig et al. describe FDA-ARGOS, a reference database for high-quality microbial reference genomes, and demonstrate its utility with two use cases.
|
11
|
Proposal for assignment of allele numbers for mobile colistin resistance (mcr) genes. J Antimicrob Chemother 2018; 73:2625-2630. [PMID: 30053115 PMCID: PMC6148208 DOI: 10.1093/jac/dky262] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Indexed: 01/12/2023] Open
Abstract
The initial report of the mcr-1 (mobile colistin resistance) gene has led to many reports of mcr-1 variants and other mcr genes from different bacterial species originating from human, animal and environmental samples in different geographical locations. Resistance gene nomenclature is complex and unfortunately problems such as different names being used for the same gene/protein or the same name being used for different genes/proteins are not uncommon. Registries exist for some families, such as bla (β-lactamase) genes, but there is as yet no agreed nomenclature scheme for mcr genes. The National Center for Biotechnology Information (NCBI) recently took over assigning bla allele numbers from the longstanding Lahey β-lactamase website and has agreed to do the same for mcr genes. Here, we propose a nomenclature scheme that we hope will be acceptable to researchers in this area and that will reduce future confusion.
|
12
|
Comment on: Resistance gene naming and numbering: is it a new gene or not? J Antimicrob Chemother 2016; 71:2677-8. [PMID: 27261266 DOI: 10.1093/jac/dkw204] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 01/25/2023] Open
|
13
|
Implementation of Nationwide Real-time Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and Investigation. Clin Infect Dis 2016; 63:380-6. [PMID: 27090985 DOI: 10.1093/cid/ciw242] [Citation(s) in RCA: 229] [Impact Index Per Article: 28.6] [Received: 02/09/2016] [Accepted: 04/07/2016] [Indexed: 12/15/2022] Open
Abstract
Listeria monocytogenes (Lm) causes severe foodborne illness (listeriosis). Previous molecular subtyping methods, such as pulsed-field gel electrophoresis (PFGE), were critical in detecting outbreaks that led to food safety improvements and declining incidence, but PFGE provides limited genetic resolution. A multiagency collaboration began performing real-time, whole-genome sequencing (WGS) on all US Lm isolates from patients, food, and the environment in September 2013, posting sequencing data into a public repository. Compared with the year before the project began, WGS, combined with epidemiologic and product trace-back data, detected more listeriosis clusters and solved more outbreaks (2 outbreaks in pre-WGS year, 5 in WGS year 1, and 9 in year 2). Whole-genome multilocus sequence typing and single nucleotide polymorphism analyses provided equivalent phylogenetic relationships relevant to investigations; results were most useful when interpreted in context of epidemiological data. WGS has transformed listeriosis outbreak surveillance and is being implemented for other foodborne pathogens.
|
14
|
Abstract
Many genomes are incorrectly identified at GenBank. We developed a plan to find and correct misidentified genomes using genomic comparison statistics together with a scaffold of reliably identified genomes from type. A workshop was organized with broad representation from the bacterial taxonomic community to review the proposal, the GenBank Microbial Genomic Taxonomy Workshop, Bethesda MD, May 12–13, 2015.
|
15
|
1338. Improving Detection and Investigation of Listeriosis Outbreaks Using Real-Time Whole-Genome Sequencing. Open Forum Infect Dis 2014. [DOI: 10.1093/ofid/ofu051.155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
|
16
|
|
17
|
Recent advances in biocuration: meeting report from the Fifth International Biocuration Conference. Database: The Journal of Biological Databases and Curation 2012; 2012:bas036. [PMID: 23110974 PMCID: PMC3483532 DOI: 10.1093/database/bas036] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/17/2023]
Abstract
The 5th International Biocuration Conference brought together over 300 scientists to exchange ideas about their work, as well as discuss issues relevant to the International Society for Biocuration's (ISB) mission. Recurring themes this year included the creation and promotion of gold standards, the need for more ontologies, and more formal interactions with journals. The conference is an essential part of the ISB's goal to support exchanges among members of the biocuration community. Next year's conference will be held in Cambridge, UK, from 7 to 10 April 2013. In the meantime, the ISB website provides information about the society's activities (http://biocurator.org), as well as related events of interest.
|
18
|
Abstract
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI, in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.
|
19
|
|
20
|
Abstract
Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to efficiently maintain and keep the deluge of data up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nt sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.
|
21
|
Abstract
NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10^6 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.
|
22
|
Toward an online repository of Standard Operating Procedures (SOPs) for (meta)genomic annotation. OMICS: A Journal of Integrative Biology 2008; 12:137-41. [PMID: 18416670 PMCID: PMC3196215 DOI: 10.1089/omi.2008.0017] [Citation(s) in RCA: 544] [Impact Index Per Article: 34.0] [Indexed: 11/13/2022]
Abstract
The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.
|
23
|
Comparative genomic analyses of seventeen Streptococcus pneumoniae strains: insights into the pneumococcal supragenome. J Bacteriol 2007; 189:8186-95. [PMID: 17675389 PMCID: PMC2168654 DOI: 10.1128/jb.00690-07] [Citation(s) in RCA: 203] [Impact Index Per Article: 11.9] [Indexed: 11/20/2022] Open
Abstract
The distributed-genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium and that these pathogens utilize genetic recombination and a large, noncore set of genes as a means of diversity generation. We sequenced the genomes of eight nasopharyngeal strains of Streptococcus pneumoniae isolated from pediatric patients with upper respiratory symptoms and performed quantitative genomic analyses among these and nine publicly available pneumococcal strains. Coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which 1,454 (46%) were conserved among all 17 strains. The majority of the gene clusters, 1,716 (54%), were not found in all strains. Genic differences per strain pair ranged from 35 to 629 orthologous clusters, with each strain's genome containing between 21 and 32% noncore genes. The distribution of the orthologous clusters per genome for the 17 strains was entered into the finite-supragenome model, which predicted that (i) the S. pneumoniae supragenome contains more than 5,000 orthologous clusters and (ii) 99% of the orthologous clusters (approximately 3,000) that are represented in the S. pneumoniae population at frequencies of ≥0.1 can be identified if 33 representative genomes are sequenced. These extensive genic diversity data support the DGH and provide a basis for understanding the great differences in clinical phenotype associated with various pneumococcal strains. When these findings are taken together with previous studies that demonstrated the presence of a supragenome for Streptococcus agalactiae and Haemophilus influenzae, it appears that the possession of a distributed genome is a common host interaction strategy.
|
24
|
Abstract
pED208 is a transfer-derepressed mutant of the IncFV plasmid, F(0)lac, which has an IS2 element inserted in its traY gene, resulting in constitutive overexpression of its transfer (tra) region. The pED208 transfer region, which encodes proteins responsible for pilus synthesis and conjugative plasmid transfer, was sequenced and found to be very similar to the F tra region in terms of its organization although most pED208 tra proteins share only about 45% amino acid identity. All the essential genes for F transfer had homologs within the pED208 transfer region with the exception of traQ, which encodes the chaperone for stable F-pilin expression. F(0)lac appears to have a fertility inhibition system different than the FinOP system of other F-like plasmids, and its transfer efficiency was increased in the presence of F or R100, suggesting that it could be mobilized by these plasmids. The F-like transfer systems specified by F, R100, and F(0)lac were highly specific for their cognate origins of transfer (oriT) as measured by their abilities to mobilize chimeric oriT-containing plasmids.
|
25
|
Die Bedeutung der Liquoruntersuchungen bei unklaren Rückenschmerzen [The significance of cerebrospinal fluid examinations in unexplained back pain]. Dtsch Med Wochenschr 1940. [DOI: 10.1055/s-0028-1122286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|