Liu Y, Li Y, Chen E, Xu J, Zhang W, Zeng X, Luo X. Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat.
Commun Biol 2024;7:1678. [PMID: 39702496 DOI: 10.1038/s42003-024-07376-y]
[Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/28/2024] [Accepted: 12/05/2024] [Indexed: 12/21/2024]
Abstract
Error self-correction is crucial for analyzing long-read sequencing data, but existing methods often struggle with noisy data or are tailored to technologies such as PacBio HiFi. Methods optimized for Nanopore R10 simplex reads, which typically have error rates below 2%, are lacking. We introduce DeChat, a novel approach designed specifically for these reads. DeChat performs repeat- and haplotype-aware error correction by combining the strengths of de Bruijn graphs and variant-aware multiple sequence alignment. This combination avoids read overcorrection: variants in repeats and haplotypes are preserved while sequencing errors are accurately corrected. Benchmarking on simulated and real datasets shows that DeChat-corrected reads have significantly fewer errors than reads corrected by other methods (up to two orders of magnitude fewer), without losing read information. Furthermore, DeChat-corrected reads clearly improve genome assembly and taxonomic classification.
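The overcorrection problem described in the abstract can be illustrated with a toy pileup. The sketch below is not DeChat's algorithm. It is a hypothetical, simplified consensus rule that assumes the other reads are already aligned gap-free to the target read. The function names and the `min_support` threshold are illustrative assumptions. A plain majority vote overwrites a genuine haplotype variant whenever the other haplotype has more reads. The variant-aware rule instead keeps the read's own base whenever enough reads share it, and replaces the base only when its support is too low to be anything but a sequencing error.

```python
from collections import Counter
from typing import List


def majority_vote(target: str, aligned: List[str]) -> str:
    """Naive correction: replace every base with the column majority."""
    out = []
    for i, base in enumerate(target):
        column = Counter(read[i] for read in aligned)
        column[base] += 1  # the target read votes for its own base
        out.append(column.most_common(1)[0][0])
    return "".join(out)


def correct_read(target: str, aligned: List[str], min_support: int = 3) -> str:
    """Hypothetical variant-aware rule (illustrative threshold, not DeChat's).

    Assumes all reads are gap-free and as long as `target`; a real
    multiple sequence alignment would also have to handle indels.
    """
    out = []
    for i, base in enumerate(target):
        column = Counter(read[i] for read in aligned)
        column[base] += 1  # support for the target's base includes the target itself
        if column[base] >= min_support:
            # Enough reads share this base: treat it as a real variant
            # (haplotype or repeat copy) and keep it even if it is a minority.
            out.append(base)
        else:
            # Too little support: treat it as a sequencing error.
            out.append(column.most_common(1)[0][0])
    return "".join(out)


target = "ACGTT"  # haplotype-A read with a sequencing error at the last base
aligned = ["ACGTA"] * 3 + ["ACCTA"] * 5  # 3 reads from haplotype A, 5 from haplotype B
print(majority_vote(target, aligned))  # ACCTA: haplotype-A variant at position 2 overwritten
print(correct_read(target, aligned))   # ACGTA: variant kept, sequencing error fixed
```

In this toy case the majority vote "corrects" the read into the other haplotype, whereas the support-threshold rule fixes only the isolated error. A fixed threshold is a deliberate simplification, since real tools must account for coverage depth and error profiles.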