1
Liu Y, Li Y, Chen E, Xu J, Zhang W, Zeng X, Luo X. Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat. Commun Biol 2024; 7:1678. [PMID: 39702496 DOI: 10.1038/s42003-024-07376-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/28/2024] [Accepted: 12/05/2024] [Indexed: 12/21/2024] Open
Abstract
Error self-correction is crucial for analyzing long-read sequencing data, but existing methods often struggle with noisy data or are tailored to technologies like PacBio HiFi. There is a gap in methods optimized for Nanopore R10 simplex reads, which typically have error rates below 2%. We introduce DeChat, a novel approach designed specifically for these reads. DeChat enables repeat- and haplotype-aware error correction by combining the strengths of de Bruijn graphs and variant-aware multiple sequence alignment. This approach avoids read overcorrection, ensuring that variants in repeats and haplotypes are preserved while sequencing errors are accurately corrected. Benchmarking on simulated and real datasets shows that DeChat-corrected reads have significantly fewer errors (up to two orders of magnitude fewer) than reads corrected by other methods, without losing read information. Furthermore, DeChat-corrected reads clearly improve genome assembly and taxonomic classification.
Affiliation(s)
- Yuansheng Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yichen Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Enlian Chen
  - College of Biology, Hunan University, Changsha, China
- Jialu Xu
  - College of Biology, Hunan University, Changsha, China
- Wenhai Zhang
  - College of Biology, Hunan University, Changsha, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China
2
Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 PMCID: PMC11955098 DOI: 10.1038/s41592-024-02262-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are due not only to improvements in sequencing accuracy but also to rapidly changing analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field, such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Affiliation(s)
- Daniel P Agustinho
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Vipin K Menon
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
- Ginger A Metcalf
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
3
Wattanasombat S, Tongjai S. Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline. F1000Res 2024; 13:556. [PMID: 38984017 PMCID: PMC11231628 DOI: 10.12688/f1000research.149577.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/14/2024] [Indexed: 07/11/2024] Open
Abstract
Background: Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources. Methods: We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, utilizing QUAST and BLASTN for quality assessment. Results: Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime. Conclusions: The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.
Affiliation(s)
- Sara Wattanasombat
  - Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
- Siripong Tongjai
  - Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
4
Jaudou S, Deneke C, Tran ML, Salzinger C, Vorimore F, Goehler A, Schuh E, Malorny B, Fach P, Grützke J, Delannoy S. Exploring Long-Read Metagenomics for Full Characterization of Shiga Toxin-Producing Escherichia coli in Presence of Commensal E. coli. Microorganisms 2023; 11:2043. [PMID: 37630603 PMCID: PMC10458860 DOI: 10.3390/microorganisms11082043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 07/26/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023] Open
Abstract
The characterization of Shiga toxin-producing Escherichia coli (STEC) is necessary to assess their pathogenic potential, but isolation of the strain from complex matrices such as milk remains challenging. In previous work, we have shown the potential of long-read metagenomics to characterize eae-positive STEC from artificially contaminated raw milk without isolating the strain. The presence of multiple E. coli strains in the sample was shown to potentially hinder the correct characterization of the STEC strain. Here, we aimed to determine the STEC:commensal ratio that would prevent the characterization of the STEC. We artificially contaminated pasteurized milk with different ratios of an eae-positive STEC and a commensal E. coli and applied the previously developed method. Results showed that the STEC strain grew better than the commensal E. coli after enrichment in acriflavine-supplemented BPW. The STEC was successfully characterized in all samples containing at least 10 times more STEC than commensal E. coli post-enrichment. However, the presence of equivalent proportions of STEC and commensal E. coli prevented the full characterization of the STEC strain. This study confirms the potential of long-read metagenomics for STEC characterization in an isolation-free manner while refining its limit regarding the presence of background E. coli strains.
Affiliation(s)
- Sandra Jaudou
  - COLiPATH Unit, Laboratory for Food Safety, ANSES, 94700 Maisons-Alfort, France
  - National Study Center for Sequencing in Risk Assessment, Department of Biological Safety, German Federal Institute for Risk Assessment, 12277 Berlin, Germany
- Carlus Deneke
  - National Study Center for Sequencing in Risk Assessment, Department of Biological Safety, German Federal Institute for Risk Assessment, 12277 Berlin, Germany
- Mai-Lan Tran
  - COLiPATH Unit, Laboratory for Food Safety, ANSES, 94700 Maisons-Alfort, France
  - Genomics Platform IdentyPath, Laboratory for Food Safety, ANSES, 94700 Maisons-Alfort, France
- Carina Salzinger
  - National Reference Laboratory for Escherichia coli Including VTEC, Department of Biological Safety, German Federal Institute for Risk Assessment, 12277 Berlin, Germany
- Fabien Vorimore
  - Genomics Platform IdentyPath, Laboratory for Food Safety, ANSES, 94700 Maisons-Alfort, France
- André Goehler
  - National Reference Laboratory for Escherichia coli Including VTEC, Department of Biological Safety, German Federal Institute for Risk Assessment, 12277 Berlin, Germany
- Elisabeth Schuh
  - National Reference Laboratory for Escherichia coli Including VTEC, Department of Biological Safety, German Federal Institute for Risk Assessment, 12277 Berlin, Germany
- Burkhard Malorny
  - National Study Center for Sequencing in Risk Assessment, Department of Biological Safety, German Federal Institute for Risk Assessment, 12277 Berlin, Germany
- Patrick Fach
  - COLiPATH Unit, Laboratory for Food Safety, ANSES, 94700 Maisons-Alfort, France
  - Genomics Platform IdentyPath, Laboratory for Food Safety, ANSES, 94700 Maisons-Alfort, France
- Josephine Grützke
  - National Study Center for Sequencing in Risk Assessment, Department of Biological Safety, German Federal Institute for Risk Assessment, 12277 Berlin, Germany
- Sabine Delannoy
  - COLiPATH Unit, Laboratory for Food Safety, ANSES, 94700 Maisons-Alfort, France
  - Genomics Platform IdentyPath, Laboratory for Food Safety, ANSES, 94700 Maisons-Alfort, France
5
Eco-evolutionary implications of helminth microbiomes. J Helminthol 2023; 97:e22. [PMID: 36790127 DOI: 10.1017/s0022149x23000056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
The evolution of helminth parasites has long been seen as an interplay between host resistance to infection and the parasite's capacity to bypass such resistance. However, there has recently been an increasing appreciation of the role of symbiotic microbes in the interaction of helminth parasites and their hosts. It is now clear that helminths have a different microbiome from the organisms they parasitize, and sometimes, amid large variability, components of the microbiome are shared among different life stages or among populations of the parasite. Helminths have been shown to acquire microbes from their parent generations (vertical transmission) and from their surroundings (horizontal transmission). In this latter case, natural selection has been strongly linked to the fact that helminth-associated microbiota is not simply a random assemblage of the pool of microbes available from their organismal hosts or environments. Indeed, some helminth parasites and specific microbial taxa have evolved complex ecological relationships, ranging from obligate mutualism to reproductive manipulation of the helminth by associated microbes. However, our understanding is still very elementary regarding the net effect of all microbiome components in the eco-evolution of helminths and their interaction with hosts. In this non-exhaustive review, we focus on the bacterial microbiome associated with helminths (as opposed to the microbiome of their hosts) and highlight relevant concepts and key findings in bacterial transmission, ecological associations, and taxonomic and functional diversity of the bacteriome. We integrate the microbiome dimension in a discussion of the evolution of helminth parasites and identify fundamental knowledge gaps, finally suggesting research avenues for understanding the eco-evolutionary impacts of the microbiome in host-parasite interactions in light of new technological developments.
6
Metagenomic nanopore sequencing versus conventional diagnosis for identification of the dieback pathogens of mango trees. Biotechniques 2022; 73:261-272. [DOI: 10.2144/btn-2022-0087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
Dieback is one of the most dangerous fungal diseases affecting mango trees. In this study, nanopore metagenome sequencing of the root-soil samples and infected plant tissues was conducted to identify the fungal pathogens present. Soil analysis of the infected mango trees showed an abundance of the Dikarya subkingdom (59%), including Lasiodiplodia theobromae (15%), Alternaria alternata (6%), Ceratocystis huliohia and Colletotrichum gloeosporioides. Analysis of the infected plant tissues revealed the presence of A. alternata (34%). The data were deposited in the National Center for Biotechnology Information (PRJNA767267). In conclusion, nanopore metagenome sequencing analysis was a valuable tool to rapidly identify dieback-associated fungal pathogens.
7
VeChat: correcting errors in long reads using variation graphs. Nat Commun 2022; 13:6657. [PMID: 36333324 PMCID: PMC9636371 DOI: 10.1038/s41467-022-34381-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/04/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022] Open
Abstract
Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes present in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat as an approach to implement this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use open-source tool and publicly available at https://github.com/HaploKit/vechat.