Guo L, Huo H. An efficient Burrows-Wheeler transform-based aligner for short read mapping.
Comput Biol Chem 2024;
110:108050. [PMID:
38447272 DOI:
10.1016/j.compbiolchem.2024.108050]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2023] [Revised: 02/15/2024] [Accepted: 03/01/2024] [Indexed: 03/08/2024]
Abstract
Read mapping as the foundation of computational biology is a bottleneck task under the pressure of sequencing throughput explodes. In this work, we present an efficient Burrows-Wheeler transform-based aligner for next-generation sequencing (NGS) short read. Firstly, we propose a difference-aware classification strategy to assign specific reads to the computationally more economical search modes, and present some acceleration techniques, such as a seed pruning method based on the property of maximum coverage interval to reduce the redundant locating for candidate regions, redesigning LF calculation to support fast query. Then, we propose a heuristic verification to determine the best mapping from amounts of flanking sequences. Incorporated with low-distortion string embedding, most dissimilar sequences are filtered out cheaply, and the highly similar sequences left are just right for the wavefront alignment algorithm's preference. We provide a full spectrum benchmark with different read lengths, the results show that our method is 1.3-1.4 times faster than state-of-the-art Burrows-Wheeler transform-based methods (including bowtie2, bwa-MEM, and hisat2) over 101bp reads and has a speedup with 1.5-13 times faster over 750bp to 1000bp reads; meanwhile, our method has comparable memory usage and accuracy. However, hash-based methods (including Strobealign, Minimap2, and Accel-Align) are significantly faster, in part because Burrows-Wheeler transform-based methods calculate on the compressed space. The source code is available: https://github.com/Lilu-guo/Effaln.
Collapse