1
Alvarez RV, Landsman D. GTax: improving de novo transcriptome assembly by removing foreign RNA contamination. Genome Biol 2024; 25:12. [PMID: 38191464 PMCID: PMC10773103 DOI: 10.1186/s13059-023-03141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 12/08/2023] [Indexed: 01/10/2024]
Abstract
The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
Affiliation(s)
- Roberto Vera Alvarez
- Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
- David Landsman
- Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
2
Schlanderer J, Hoffmann H, Lüddecke J, Golubov A, Grasse W, Kindler EV, Kohl TA, Merker M, Metzger C, Mohr V, Niemann S, Pilloni C, Plesnik S, Raya B, Shresta B, Utpatel C, Zengerle R, Beutler M, Paust N. Two-stage tuberculosis diagnostics: combining centrifugal microfluidics to detect TB infection and Inh and Rif resistance at the point of care with subsequent antibiotic resistance profiling by targeted NGS. Lab Chip 2023; 24:74-84. [PMID: 37999937 DOI: 10.1039/d3lc00783a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2023]
Abstract
Globally, tuberculosis (TB) remains the deadliest bacterial infectious disease, and spreading antibiotic resistances is the biggest challenge for combatting the disease. Rapid and comprehensive diagnostics including drug susceptibility testing (DST) would assure early treatment, reduction of morbidity and the interruption of transmission chains. To date, rapid genetic resistance testing addresses only one to four drug groups while complete DST is done phenotypically and takes several weeks. To overcome these limitations, we developed a two-stage workflow for rapid TB diagnostics including DST from a single sputum sample that can be completed within three days. The first stage is qPCR detection of M. tuberculosis complex (MTBC) including antibiotic resistance testing against the first-line antibiotics, isoniazid (Inh) and rifampicin (Rif). The test is automated by centrifugal microfluidics and designed for point of care (PoC). Furthermore, enriched MTBC DNA is provided in a detachable sample tube to enable the second stage: if the PCR detects MTBC and resistance to either Inh or Rif, the MTBC DNA is shipped to specialized facilities and analyzed by targeted next generation sequencing (tNGS) to assess the complete resistance profile. Proof-of-concept testing of the PoC test revealed an analytical sensitivity of 44.2 CFU ml⁻¹, a diagnostic sensitivity of 96%, and a diagnostic specificity of 100% for MTBC detection. Coupled tNGS successfully provided resistance profiles, demonstrated for samples from 17 patients. To the best of our knowledge, the presented combination of PoC qPCR with tNGS allows for the fastest comprehensive TB diagnostics comprising decentralized pathogen detection with subsequent resistance profiling in a facility specialized in tNGS.
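The stage-two trigger described in this abstract can be read as a simple decision rule. The following is a minimal Python sketch of that rule only, not the authors' software; the `QpcrResult` type, its field names, and the `needs_tngs` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QpcrResult:
    """Hypothetical container for the stage-one (PoC qPCR) readout."""
    mtbc_detected: bool   # M. tuberculosis complex found in the sample
    inh_resistant: bool   # resistance to isoniazid indicated
    rif_resistant: bool   # resistance to rifampicin indicated


def needs_tngs(result: QpcrResult) -> bool:
    """Return True when the abstract's stage-two condition holds:
    MTBC is detected AND resistance to Inh or Rif is indicated."""
    return result.mtbc_detected and (result.inh_resistant or result.rif_resistant)


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the study.
    sample = QpcrResult(mtbc_detected=True, inh_resistant=False, rif_resistant=True)
    print("ship enriched DNA for tNGS:", needs_tngs(sample))
```

Under this reading, a sample with MTBC detected but no Inh or Rif resistance would not be forwarded for tNGS.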
Affiliation(s)
- Harald Hoffmann
- SYNLAB Gauting SYNLAB Human Genetics Munich, 82131 Gauting, Germany
- Jan Lüddecke
- Hahn-Schickard, 79110 Freiburg, Germany
- Laboratory for MEMS Applications, IMTEK - Department of Microsystems Engineering, University of Freiburg, 79110 Freiburg, Germany
- Andrey Golubov
- WHO supranational Tuberculosis Reference Laboratory, IML red, 82131 Gauting, Germany
- Thomas A Kohl
- Molecular and Experimental Mycobacteriology, Forschungszentrum Borstel, 23845 Borstel, Germany
- Matthias Merker
- Molecular and Experimental Mycobacteriology, Forschungszentrum Borstel, 23845 Borstel, Germany
- Vanessa Mohr
- Molecular and Experimental Mycobacteriology, Forschungszentrum Borstel, 23845 Borstel, Germany
- Stefan Niemann
- Molecular and Experimental Mycobacteriology, Forschungszentrum Borstel, 23845 Borstel, Germany
- Claudia Pilloni
- WHO supranational Tuberculosis Reference Laboratory, IML red, 82131 Gauting, Germany
- Sara Plesnik
- WHO supranational Tuberculosis Reference Laboratory, IML red, 82131 Gauting, Germany
- Bijendra Raya
- German Nepal Tuberculosis Project (GENETUP), Nepal Anti-Tuberculosis Association (NATA), Kalimati, Nepal
- Bhawana Shresta
- German Nepal Tuberculosis Project (GENETUP), Nepal Anti-Tuberculosis Association (NATA), Kalimati, Nepal
- Christian Utpatel
- Molecular and Experimental Mycobacteriology, Forschungszentrum Borstel, 23845 Borstel, Germany
- Roland Zengerle
- Hahn-Schickard, 79110 Freiburg, Germany
- Laboratory for MEMS Applications, IMTEK - Department of Microsystems Engineering, University of Freiburg, 79110 Freiburg, Germany
- Markus Beutler
- WHO supranational Tuberculosis Reference Laboratory, IML red, 82131 Gauting, Germany
- Nils Paust
- Hahn-Schickard, 79110 Freiburg, Germany
- Laboratory for MEMS Applications, IMTEK - Department of Microsystems Engineering, University of Freiburg, 79110 Freiburg, Germany
3
Camacho C, Boratyn GM, Joukov V, Vera Alvarez R, Madden TL. ElasticBLAST: accelerating sequence search via cloud computing. BMC Bioinformatics 2023; 24:117. [PMID: 36967390 PMCID: PMC10040096 DOI: 10.1186/s12859-023-05245-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/04/2023] [Accepted: 03/21/2023] [Indexed: 03/28/2023]
Abstract
BACKGROUND: Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform. RESULTS: We present ElasticBLAST, a cloud native application to perform BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and run the searches on thousands of virtual CPUs (if desired), deleting resources when it is done. It uses cloud native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are user provided or from the National Center for Biotechnology Information. CONCLUSION: We show that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud, demonstrating that with two examples. At the same time, it hides much of the complexity of working in the cloud, lowering the threshold to move work to the cloud.
Affiliation(s)
- Christiam Camacho
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA (grid.280285.5; ISNI 0000 0004 0507 7840)
- Grzegorz M. Boratyn
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA (grid.280285.5; ISNI 0000 0004 0507 7840)
- Victor Joukov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA (grid.280285.5; ISNI 0000 0004 0507 7840)
- Roberto Vera Alvarez
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA (grid.280285.5; ISNI 0000 0004 0507 7840)
- Thomas L. Madden
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA (grid.280285.5; ISNI 0000 0004 0507 7840)
4
Camacho C, Boratyn GM, Joukov V, Alvarez RV, Madden TL. ElasticBLAST: Accelerating Sequence Search via Cloud Computing. bioRxiv 2023:2023.01.04.522777. [PMID: 36789435 PMCID: PMC9928022 DOI: 10.1101/2023.01.04.522777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Background: Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform. Results: We present ElasticBLAST, a cloud native application to perform BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and run the searches on thousands of virtual CPUs (if desired), deleting resources when it is done. It uses cloud native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are user provided or from the National Center for Biotechnology Information. Conclusion: We show that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud, demonstrating that with two examples. At the same time, it hides much of the complexity of working in the cloud, lowering the threshold to move work to the cloud.
Affiliation(s)
- Christiam Camacho
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
- Grzegorz M. Boratyn
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
- Victor Joukov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
- Roberto Vera Alvarez
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
5
Govender KN, Eyre DW. Benchmarking taxonomic classifiers with Illumina and Nanopore sequence data for clinical metagenomic diagnostic applications. Microb Genom 2022; 8. [PMID: 36269282 PMCID: PMC9676057 DOI: 10.1099/mgen.0.000886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Culture-independent metagenomic detection of microbial species has the potential to provide rapid and precise real-time diagnostic results. However, it is potentially limited by sequencing and taxonomic classification errors. We use simulated and real-world data to benchmark rates of species misclassification using 100 reference genomes for each of the ten common bloodstream pathogens and six frequent blood-culture contaminants (n=1568, only 68 genomes were available for Micrococcus luteus). Simulating both with and without sequencing error for both the Illumina and Oxford Nanopore platforms, we evaluated commonly used classification tools including Kraken2, Bracken and Centrifuge, utilizing mini (8 GB) and standard (30–50 GB) databases. Bracken with the standard database performed best, the median percentage of reads across both sequencing platforms identified correctly to the species level was 97.8% (IQR 92.7:99.0) [range 5:100]. For Kraken2 with a mini database, a commonly used combination, median species-level identification was 86.4% (IQR 50.5:93.7) [range 4.3:100]. Classification performance varied by species, with Escherichia coli being more challenging to classify correctly (probability of reads being assigned to the correct species: 56.1–96.0%, varying by tool used). Human read misclassification was negligible. By filtering out shorter Nanopore reads we found performance similar or superior to Illumina sequencing, despite higher sequencing error rates. Misclassification was more common when the misclassified species had a higher average nucleotide identity to the true species. Our findings highlight taxonomic misclassification of sequencing data occurs and varies by sequencing and analysis workflow. To account for ‘bioinformatic contamination’ we present a contamination catalogue that can be used in metagenomic pipelines to ensure accurate results that can support clinical decision making.
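The summary statistics quoted in this abstract (median, IQR and range of the percentage of reads assigned to the correct species) can be computed along the following lines. This is a minimal Python sketch under stated assumptions: the `summarize` helper is hypothetical, the input values are illustrative rather than taken from the study, and the abstract does not say which quartile method the authors used, so the "inclusive" method here is an assumption.

```python
import statistics


def summarize(percent_correct):
    """Summarize per-genome percentages of correctly classified reads.

    Returns the median, the interquartile range as (Q1, Q3), and the
    (min, max) range. Quartiles use the 'inclusive' method, which is an
    assumption; the source does not specify one.
    """
    values = sorted(percent_correct)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (values[0], values[-1]),
    }


if __name__ == "__main__":
    # Illustrative percentages only; not data from the benchmark.
    print(summarize([50.0, 80.0, 90.0, 95.0, 100.0]))
```

Reporting the IQR and range alongside the median matters here because, as the abstract notes, performance varies widely by species and tool.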
Affiliation(s)
- Kumeren N Govender
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- David W Eyre
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK; Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK
6
Abstract
The decreasing cost of sequencing and concomitant augmentation of publicly available genomes have created an acute need for automated software to assess genomic contamination. During the last 6 years, 18 programs have been published, each with its own strengths and weaknesses. Deciding which tools to use becomes more and more difficult without an understanding of the underlying algorithms. We review these programs, benchmarking six of them, and present their main operating principles. This article is intended to guide researchers in the selection of appropriate tools for specific applications. Finally, we present future challenges in the developing field of contamination detection.
Affiliation(s)
- Luc Cornet
- BCCM/IHEM, Mycology and Aerobiology, Sciensano, Bruxelles, Belgium
- Denis Baurain
- InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium