2
Cooley NP, Wright ES. Many purported pseudogenes in bacterial genomes are bona fide genes. BMC Genomics 2024; 25:365. [PMID: 38622536 PMCID: PMC11017572 DOI: 10.1186/s12864-024-10137-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 02/17/2024] [Indexed: 04/17/2024]
Abstract
BACKGROUND: Microbial genomes are largely comprised of protein coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during evolution but could also be technical artifacts of genome sequencing or assembly.
RESULTS: Using a combination of observational and experimental data, we show that many putative pseudogenes are attributable to errors that are incorporated into genomes during assembly. Within 126,564 publicly available genomes, we observed that nearly identical genomes often substantially differed in pseudogene counts. Causal inference implicated assembler, sequencing platform, and coverage as likely causative factors. Reassembly of genomes from raw reads confirmed that each variable affects the number of putative pseudogenes in an assembly. Furthermore, simulated sequencing reads corroborated our observations that the quality and quantity of raw data can significantly impact the number of pseudogenes in an assembler dependent fashion. The number of unexpected pseudogenes due to internal stops was highly correlated (R² = 0.96) with average nucleotide identity to the ground truth genome, implying relative pseudogene counts can be used as a proxy for overall assembly correctness. Applying our method to assemblies in RefSeq resulted in rejection of 3.6% of assemblies due to significantly elevated pseudogene counts. Reassembly from real reads obtained from high coverage genomes showed considerable variability in spurious pseudogenes beyond that observed with simulated reads, reinforcing the finding that high coverage is necessary to mitigate assembly errors.
CONCLUSIONS: Collectively, these results demonstrate that many pseudogenes in microbial genome assemblies are actually genes. Our results suggest that high read coverage is required for correct assembly and indicate an inflated number of pseudogenes due to internal stops is indicative of poor overall assembly quality.
Affiliation(s)
- Nicholas P Cooley
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Erik S Wright
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Center for Evolutionary Biology and Medicine, Pittsburgh, PA, USA
3
Chen J, Xu F. Application of Nanopore Sequencing in the Diagnosis and Treatment of Pulmonary Infections. Mol Diagn Ther 2023; 27:685-701. [PMID: 37563539 PMCID: PMC10590290 DOI: 10.1007/s40291-023-00669-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/18/2023] [Indexed: 08/12/2023]
Abstract
This review provides an in-depth discussion of the development, principles and utility of nanopore sequencing technology and its diverse applications in the identification of various pulmonary pathogens. We examined the emergence and advancements of nanopore sequencing as a significant player in this field. We illustrate the challenges faced in diagnosing mixed infections and further scrutinize the use of nanopore sequencing in the identification of single pathogens, including viruses (with a focus on its use in epidemiology, outbreak investigation, and viral resistance), bacteria (emphasizing 16S targeted sequencing, rare bacterial lung infections, and antimicrobial resistance studies), fungi (employing internal transcribed spacer sequencing), tuberculosis, and atypical pathogens. Furthermore, we discuss the role of nanopore sequencing in metagenomics and its potential for unbiased detection of all pathogens in a clinical setting, emphasizing its advantages in sequencing genome repeat areas and structural variant regions. We discuss the limitations in dealing with host DNA removal, the inherent high error rate of nanopore sequencing technology, along with the complexity of operation and processing, while acknowledging the possibilities provided by recent technological improvements. We compared nanopore sequencing with the BioFire system, a rapid molecular diagnostic system based on polymerase chain reaction. Although the BioFire system serves well for the rapid screening of known and common pathogens, it falls short in the identification of unknown or rare pathogens and in providing comprehensive genome analysis. As technological advancements continue, it is anticipated that the role of nanopore sequencing technology in diagnosing and treating lung infections will become increasingly significant.
Affiliation(s)
- Jie Chen
  - Department of Infectious Diseases, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310009, Zhejiang, China
- Feng Xu
  - Department of Infectious Diseases, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310009, Zhejiang, China
4
Brown DG, Wahlig TA, Ma A, Certain LK, Chalmers PN, Fisher MA, Leung DT. Genomic Characterization of 2 Cutibacterium acnes Isolates from a Surgical Site Infection Reveals Large Genomic Inversion. Pathog Immun 2023; 8:64-76. [PMID: 37830077 PMCID: PMC10566467 DOI: 10.20411/pai.v8i1.606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 08/08/2023] [Indexed: 10/14/2023]
Abstract
Background: Cutibacterium acnes is a common commensal of human skin but may also present as an opportunistic pathogen in prosthetic joint and wound infections. Unfortunately, few complete genomes of C. acnes are publicly available, and even fewer are of isolates associated with infection. Here we report the isolation, characterization, and complete genomes of 2 C. acnes isolates from a surgical site infection of an elbow.
Methods: We used standard microbiological methods for phenotypic characterization and performed whole genome sequencing on 2 C. acnes isolates using a combination of short-read and long-read sequencing.
Results: Antibiotic susceptibility testing showed beta-lactamase negative and low minimal inhibitory concentrations to all antibiotics tested, with the exception of metronidazole. We assembled complete genomes of the 2 isolates, which are approximately 2.5 megabases in length. The isolates belong to the single-locus sequence type (SLST) H1 and the multi-locus sequence type (MLST) IB. Both isolates have similar composition of known virulence genes, and we found no evidence of plasmids but did find phage-associated genes. Notably, the 2 genomes are 99.97% identical but contain a large genomic inversion encompassing approximately half of the genome.
Conclusions: This is the first characterization of this large-scale genomic inversion in nearly identical isolates from the same wound. This report adds to the limited numbers of publicly available infection-associated complete genomes of C. acnes.
Affiliation(s)
- D. Garrett Brown
  - Division of Infectious Diseases, University of Utah School of Medicine, Salt Lake City, Utah
- Taylor A. Wahlig
  - Division of Infectious Diseases, University of Utah School of Medicine, Salt Lake City, Utah
- Angela Ma
  - Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah
  - ARUP Laboratories, Salt Lake City, Utah
- Laura K. Certain
  - Division of Infectious Diseases, University of Utah School of Medicine, Salt Lake City, Utah
  - Department of Orthopaedic Surgery, University of Utah, Salt Lake City, Utah
- Peter N. Chalmers
  - Department of Orthopaedic Surgery, University of Utah, Salt Lake City, Utah
- Mark A. Fisher
  - Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah
  - ARUP Laboratories, Salt Lake City, Utah
- Daniel T. Leung
  - Division of Infectious Diseases, University of Utah School of Medicine, Salt Lake City, Utah
  - Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah
5
Dicks J, Fazal MA, Oliver K, Grayson NE, Turnbull JD, Bane E, Burnett E, Deheer-Graham A, Holroyd N, Kaushal D, Keane J, Langridge G, Lomax J, McGregor H, Picton S, Quail M, Singh D, Tracey A, Korlach J, Russell JE, Alexander S, Parkhill J. NCTC3000: a century of bacterial strain collecting leads to a rich genomic data resource. Microb Genom 2023; 9:mgen000976. [PMID: 37194944 PMCID: PMC10272881 DOI: 10.1099/mgen.0.000976] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/13/2023] [Accepted: 02/07/2023] [Indexed: 05/18/2023]
Abstract
The National Collection of Type Cultures (NCTC) was founded on 1 January 1920 in order to fulfil a recognized need for a centralized repository for bacterial and fungal strains within the UK. It is among the longest-established collections of its kind anywhere in the world and today holds approximately 6000 type and reference bacterial strains - many of medical, scientific and veterinary importance - available to academic, health, food and veterinary institutions worldwide. Recently, a collaboration between NCTC, Pacific Biosciences and the Wellcome Sanger Institute established the NCTC3000 project to long-read sequence and assemble the genomes of up to 3000 NCTC strains. Here, at the beginning of the collection's second century, we introduce the resulting NCTC3000 sequence read datasets, genome assemblies and annotations as a unique, historically and scientifically relevant resource for the benefit of the international bacterial research community.
Affiliation(s)
- Jo Dicks
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Mohammed-Abbas Fazal
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Karen Oliver
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Nicholas E. Grayson
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
  - Present address: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK, OX3 9DU, UK
- Jake D. Turnbull
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Evangeline Bane
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Edward Burnett
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Ana Deheer-Graham
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Nancy Holroyd
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Dorota Kaushal
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Jacqueline Keane
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Gemma Langridge
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
  - Present address: Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Jane Lomax
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Hannah McGregor
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Steve Picton
  - Pacific Biosciences, 1305 O’Brien Drive, Menlo Park, CA, USA
- Michael Quail
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Deepak Singh
  - Pacific Biosciences, 1305 O’Brien Drive, Menlo Park, CA, USA
- Alan Tracey
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Jonas Korlach
  - Pacific Biosciences, 1305 O’Brien Drive, Menlo Park, CA, USA
- Julie E. Russell
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Sarah Alexander
  - Culture Collections, UK Health Security Agency, 61 Colindale Avenue, London, NW9 5EQ, UK
- Julian Parkhill
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
  - Present address: Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, UK