Reference Citation Analysis

For: Rossi M, Oliva M, Langmead B, Gagie T, Boucher C. MONI: A Pangenomic Index for Finding Maximal Exact Matches. J Comput Biol 2022;29:169-187. [PMID: 35041495 PMCID: PMC8892979 DOI: 10.1089/cmb.2021.0290] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/03/2023]

Cited by Other Article(s)

Depuydt L, Renders L, Van de Vyver S, Veys L, Gagie T, Fostier J. b-move: faster bidirectional character extensions in a run-length compressed index. bioRxiv 2024:2024.05.30.596587. [PMID: 38854079 PMCID: PMC11160816 DOI: 10.1101/2024.05.30.596587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]

Abstract

Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices, such as Gagie et al.'s r-index and Nishimoto and Tabei's move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.'s br-index introduced complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns. We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient bidirectional character extensions in run-length compressed space. It achieves bidirectional character extensions up to 8 times faster than the br-index, closing the performance gap with FM-index-based alternatives, while maintaining the br-index's favorable memory characteristics. For example, all available complete E. coli genomes in NCBI's RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop. Thus, b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.
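The contrast the abstract draws between the FM-index and run-length compressed indices can be made concrete with a toy example. The Python sketch below is an illustration under stated assumptions, not b-move, the move structure, or the r-index: it builds the BWT with a naive suffix sort, keeps only the BWT's runs (r pairs instead of n characters), and counts pattern occurrences with ordinary unidirectional backward search. The helper names (`bwt`, `run_length`, `c_array`, `rank`, `count`) are hypothetical, and `rank` simply scans the runs linearly, whereas practical run-length indices add sampled structures so that each step is fast.

```python
from itertools import groupby


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via a naive suffix sort (toy inputs only)."""
    assert text.endswith("$"), "text must end with a unique sentinel '$'"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # For i == 0, text[-1] is the sentinel '$'.
    return "".join(text[i - 1] for i in sa)


def run_length(b: str) -> list[tuple[str, int]]:
    """Store the BWT only as (character, run length) pairs."""
    return [(c, len(list(g))) for c, g in groupby(b)]


def c_array(runs: list[tuple[str, int]]) -> dict[str, int]:
    """C[c] = number of characters in the text that are smaller than c."""
    totals: dict[str, int] = {}
    for c, n in runs:
        totals[c] = totals.get(c, 0) + n
    C, acc = {}, 0
    for c in sorted(totals):
        C[c] = acc
        acc += totals[c]
    return C


def rank(runs: list[tuple[str, int]], c: str, i: int) -> int:
    """Occurrences of c in BWT[0:i], computed from the runs alone (linear scan)."""
    count, pos = 0, 0  # pos = start position of the current run
    for rc, n in runs:
        if pos >= i:
            break
        if rc == c:
            count += min(n, i - pos)
        pos += n
    return count


def count(runs: list[tuple[str, int]], C: dict[str, int], pattern: str) -> int:
    """Backward search: number of occurrences of pattern in the text."""
    lo, hi = 0, sum(n for _, n in runs)  # half-open suffix-array interval
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(runs, c, lo)
        hi = C[c] + rank(runs, c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    text = "ACGTACGTACGT$"
    runs = run_length(bwt(text))
    C = c_array(runs)
    print(len(text), len(runs))  # 13 5  (n characters vs. r runs)
    print(count(runs, C, "ACG"))  # 3
```

On the repetitive toy text `ACGTACGTACGT$` (n = 13), the BWT `TTT$AAACCCGGG` collapses to 5 runs. The gap between n and r widens as a collection becomes more repetitive, which is why run-length compressed indices suit pan-genomes.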

Funding

Lore Depuydt: PhD Fellowship FR (1117322N), Research Foundation - Flanders (FWO)
Luca Renders: PhD Fellowship SB (1SE7822N), Research Foundation - Flanders (FWO)
Travis Gagie: NSERC Discovery Grant RGPIN-07185-2020 to Travis Gagie and NIH grant R01HG011392 to Ben Langmead

Hwang S, Brown NK, Ahmed OY, Jenike KM, Kovaka S, Schatz MC, Langmead B. MEM-based pangenome indexing for k-mer queries. bioRxiv 2024:2024.05.20.595044. [PMID: 38826299 PMCID: PMC11142109 DOI: 10.1101/2024.05.20.595044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]

Zakeri M, Brown NK, Ahmed OY, Gagie T, Langmead B. Movi: a fast and cache-efficient full-text pangenome index. bioRxiv 2024:2023.11.04.565615. [PMID: 37961660 PMCID: PMC10635132 DOI: 10.1101/2023.11.04.565615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]

Cozzi D, Rossi M, Rubinacci S, Gagie T, Köppl D, Boucher C, Bonizzoni P. μ-PBWT: a lightweight r-indexing of the PBWT for storing and querying UK Biobank data. Bioinformatics 2023;39:btad552. [PMID: 37688560 PMCID: PMC10502237 DOI: 10.1093/bioinformatics/btad552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 07/07/2023] [Accepted: 09/07/2023] [Indexed: 09/11/2023]

Shivakumar VS, Ahmed OY, Kovaka S, Zakeri M, Langmead B. Sigmoni: classification of nanopore signal with a compressed pangenome index. bioRxiv 2023:2023.08.15.553308. [PMID: 37645873 PMCID: PMC10462034 DOI: 10.1101/2023.08.15.553308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]

Ma J, Cáceres M, Salmela L, Mäkinen V, Tomescu AI. Chaining for accurate alignment of erroneous long reads to acyclic variation graphs. Bioinformatics 2023;39:btad460. [PMID: 37494467 PMCID: PMC10423031 DOI: 10.1093/bioinformatics/btad460] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/20/2022] [Revised: 06/08/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023]

Abstract

MOTIVATION

Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit [Garrison et al. (Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9)] is a popular aligner of short reads, GraphAligner [Rautiainen and Marschall (GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253-28)] is the state-of-the-art aligner of erroneous long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds.

RESULTS

We present a new algorithm to co-linearly chain a set of seeds in a string-labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm in a new aligner of erroneous long reads to acyclic variation graphs, GraphChainer. We run experiments aligning real and simulated PacBio CLR reads with average error rates of 15% and 5%. Compared to GraphAligner, GraphChainer aligns 12-17% more reads, and 21-28% more total read length, on real PacBio CLR reads from human chromosomes 1 and 22 and the whole human pangenome. On both simulated and real data, GraphChainer aligns between 95% and 99% of all reads and of total read length. We also show that minigraph [Li et al. (The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265-19.)] and minichain [Chandra and Jain (Sequence to graph alignment using gap-sensitive co-linear chaining. In: Proceedings of the 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2023). Springer, 2023, 58-73.)] obtain an accuracy of <60% in this setting.

AVAILABILITY AND IMPLEMENTATION

GraphChainer is freely available at https://github.com/algbio/GraphChainer. The datasets and evaluation pipeline are accessible from the same address.
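The co-linear chaining idea described in the Motivation and Results sections can be illustrated on the simpler case of a linear reference. The sketch below is an assumption-laden illustration, not GraphChainer's algorithm, which operates on string-labeled acyclic graphs: it uses the textbook O(n^2) dynamic program that selects non-overlapping, co-linear exact-match anchors with maximum total length, with no gap costs. The `Anchor` type and `chain` function are hypothetical names.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    read_pos: int  # start of the exact match in the read
    ref_pos: int  # start of the exact match in the (linear) reference
    length: int  # match length, assumed >= 1


def chain(anchors: list[Anchor]) -> tuple[int, list[Anchor]]:
    """Return (score, anchors) for the max-weight co-linear chain.

    The score is the sum of anchor lengths; anchor a may precede anchor b only if
    a ends before b starts in both the read and the reference. No gap penalty.
    """
    if not anchors:
        return 0, []
    # Sorting by read position guarantees every valid predecessor comes first.
    order = sorted(anchors, key=lambda a: (a.read_pos, a.ref_pos))
    best = [a.length for a in order]  # best chain score ending at each anchor
    prev = [-1] * len(order)  # back-pointers for recovering the chain
    for j, b in enumerate(order):
        for i in range(j):
            a = order[i]
            if a.read_pos + a.length <= b.read_pos and a.ref_pos + a.length <= b.ref_pos:
                if best[i] + b.length > best[j]:
                    best[j] = best[i] + b.length
                    prev[j] = i
    j = max(range(len(order)), key=best.__getitem__)
    score, path = best[j], []
    while j != -1:
        path.append(order[j])
        j = prev[j]
    return score, path[::-1]


if __name__ == "__main__":
    seeds = [Anchor(0, 100, 10), Anchor(12, 113, 8), Anchor(15, 500, 12), Anchor(25, 125, 10)]
    score, best_chain = chain(seeds)
    print(score, [a.ref_pos for a in best_chain])  # 28 [100, 113, 125]
```

In this toy input the longest single anchor (length 12 at reference position 500) is not co-linear with the others. An approach that extended only that one seed would start from a locus the remaining seeds do not support, whereas chaining selects the three anchors near reference position 100 with a combined length of 28.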

Ahmed O, Rossi M, Boucher C, Langmead B. Efficient taxa identification using a pangenome index. Genome Res 2023;33:1069-1077. [PMID: 37258301 PMCID: PMC10538492 DOI: 10.1101/gr.277642.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 05/22/2023] [Indexed: 06/02/2023]

Leonard AS, Crysnanto D, Mapel XM, Bhati M, Pausch H. Graph construction method impacts variation representation and analyses in a bovine super-pangenome. Genome Biol 2023;24:124. [PMID: 37217946 PMCID: PMC10204317 DOI: 10.1186/s13059-023-02969-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/17/2022] [Accepted: 05/10/2023] [Indexed: 05/24/2023]

Ahmed OY, Rossi M, Gagie T, Boucher C, Langmead B. SPUMONI 2: improved classification using a pangenome index of minimizer digests. Genome Biol 2023;24:122. [PMID: 37202771 PMCID: PMC10197461 DOI: 10.1186/s13059-023-02958-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 05/03/2023] [Indexed: 05/20/2023]

Mun T, Vaddadi NSK, Langmead B. Pangenomic genotyping with the marker array. Algorithms Mol Biol 2023;18:2. [PMID: 37147657 PMCID: PMC10161648 DOI: 10.1186/s13015-023-00225-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/31/2023] [Accepted: 04/22/2023] [Indexed: 05/07/2023]

Martínez-Guardiola C, Brown NK, Silva-Coira F, Köppl D, Gagie T, Ladra S. Augmented Thresholds for MONI. Proceedings. Data Compression Conference 2023;2023:268-277. [PMID: 38818281 PMCID: PMC11138128 DOI: 10.1109/dcc55655.2023.00035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]

Mun T, Vaddadi NSK, Langmead B. Pangenomic Genotyping with the Marker Array. Algorithms in Bioinformatics: ... International Workshop, WABI ..., Proceedings. WABI (Workshop) 2022;242:19. [PMID: 36409181 PMCID: PMC9674407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]

Rossi M, Oliva M, Bonizzoni P, Langmead B, Gagie T, Boucher C. Finding Maximal Exact Matches Using the r-Index. J Comput Biol 2022;29:188-194. [PMID: 35041518 PMCID: PMC8902461 DOI: 10.1089/cmb.2021.0445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]