Reference Citation Analysis

For: Rautiainen M, Mäkinen V, Marschall T. Bit-parallel sequence-to-graph alignment. Bioinformatics 2020;35:3599-3607. [PMID: 30851095 PMCID: PMC6761980 DOI: 10.1093/bioinformatics/btz162] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 10/23/2018] [Revised: 01/19/2019] [Accepted: 03/07/2019] [Indexed: 01/16/2023] Open

Cited by Other Article(s)

Cui Y, Peng C, Xia Z, Yang C, Guo Y. A survey of sequence-to-graph mapping algorithms in the pangenome era. Genome Biol 2025;26:138. [PMID: 40405275 PMCID: PMC12096488 DOI: 10.1186/s13059-025-03606-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Accepted: 05/06/2025] [Indexed: 05/24/2025] Open

Chandra G, Gibney D, Jain C. Haplotype-aware sequence alignment to pangenome graphs. Genome Res 2024;34:1265-1275. [PMID: 39013594 PMCID: PMC11529843 DOI: 10.1101/gr.279143.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 06/24/2024] [Indexed: 07/18/2024]

Abstract

Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing alignments that are consistent with the known haplotypes improves genotyping accuracy. However, the existing rigorous formulations for colinear chaining and alignment problems do not consider the haplotype paths in a pangenome graph. This often leads to spurious read alignments to those paths that are unlikely recombinations of the known haplotypes. In this paper, we develop novel formulations and algorithms for sequence-to-graph alignment and chaining problems. Inspired by the genotype imputation models, we assume that a query sequence is an imperfect mosaic of reference haplotypes. Accordingly, we introduce a recombination penalty in the scoring functions for each haplotype switch. First, we solve haplotype-aware sequence-to-graph alignment in [Formula: see text] time, where Q is the query sequence, E is the set of edges, and H is the set of haplotypes represented in the graph. To complement our solution, we prove that an algorithm significantly faster than [Formula: see text] is impossible under the strong exponential time hypothesis (SETH). Second, we propose a haplotype-aware chaining algorithm that runs in [Formula: see text] time after graph preprocessing, where N is the count of input anchors. We then establish that a chaining algorithm significantly faster than [Formula: see text] is impossible under SETH. As a proof-of-concept, we implemented our chaining algorithm in the Minichain aligner. By aligning sequences sampled from the human major histocompatibility complex (MHC) to a pangenome graph of 60 MHC haplotypes, we demonstrate that our algorithm achieves better consistency with ground-truth recombinations compared with a haplotype-agnostic algorithm.
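The recombination penalty described in this abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it assumes the haplotypes are equal-length, pre-aligned strings rather than paths in a graph, allows only substitutions (no indels), and the function name `mosaic_cost` and its parameters are hypothetical. It scores a query as a mosaic of haplotypes, charging one unit per mismatch and `switch_penalty` per haplotype switch.

```python
def mosaic_cost(query, haplotypes, switch_penalty):
    """Minimum cost of explaining `query` as a mosaic of `haplotypes`.

    Simplifying assumptions (not from the paper): all haplotypes have the
    same length as the query, only substitutions are allowed, each mismatch
    costs 1 and each switch between haplotypes costs `switch_penalty`.
    """
    if not haplotypes or any(len(h) != len(query) for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and match the query length")
    if switch_penalty < 0:
        raise ValueError("switch_penalty must be non-negative")

    # cost[h] = best cost of the query prefix read so far, ending on haplotype h
    cost = [0] * len(haplotypes)
    for j, q in enumerate(query):
        best_prev = min(cost)  # cheapest state to switch away from
        cost = [
            min(cost[h], best_prev + switch_penalty) + (haplotypes[h][j] != q)
            for h in range(len(haplotypes))
        ]
    return min(cost)


if __name__ == "__main__":
    haps = ["AAAA", "CCCC"]
    print(mosaic_cost("AACC", haps, switch_penalty=1))  # one switch is cheaper
    print(mosaic_cost("AACC", haps, switch_penalty=5))  # two mismatches are cheaper
```

Taking the minimum over the previous column once per position keeps the work proportional to the query length times the number of haplotypes, which is the same kind of saving that imputation-style models rely on.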

Bernardini G, Gabory E, Pissis SP, Stougie L, Sweering M, Zuba W. Elastic-Degenerate String Matching with 1 Error or Mismatch. Theory of Computing Systems 2024;68:1442-1467. [DOI: 10.1007/s00224-024-10194-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/06/2024] [Indexed: 01/05/2025]

Abstract

An elastic-degenerate (ED) string is a sequence of n finite sets of strings of total length N, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention by the combinatorial pattern matching community, culminating in an $\tilde{\mathcal{O}}(nm^{\omega-1})+\mathcal{O}(N)$-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where $\omega$ denotes the matrix multiplication exponent and the $\tilde{\mathcal{O}}(\cdot)$ notation suppresses polylog factors. In the k-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most k errors. k-EDSM can be solved in $\mathcal{O}(k^2mG+kN)$ time, under edit distance, or $\mathcal{O}(kmG+kN)$ time, under Hamming distance, where G denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, G is only bounded by N, and so even for $k=1$, the existing algorithms run in $\Omega(mN)$ time in the worst case. In this paper we make progress in this direction. We show that 1-EDSM can be solved in $\mathcal{O}((nm^2+N)\log m)$ or $\mathcal{O}(nm^3+N)$ time under edit distance. For the decision version of the problem, we present a faster $\mathcal{O}(nm^2\sqrt{\log m}+N\log\log m)$-time algorithm. We also show that 1-EDSM can be solved in $\mathcal{O}(nm^2+N\log m)$ time under Hamming distance. Our algorithms for edit distance rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or 2d range emptiness), which we show how to solve efficiently. In order to obtain an even faster algorithm for Hamming distance, we rely on employing and adapting the k-errata trees for indexing with errors [Cole et al., STOC 2004]. This is an extended version of a paper presented at LATIN 2022.
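To make the ED string matching problem concrete, here is a naive sketch of exact EDSM (zero errors), not the 1-error algorithms of the paper. It assumes the ED text is given as a list of lists of strings, where an empty string stands for a skippable segment, and it reports the 0-based indices of the segments in which an occurrence of the pattern ends. The function name `edsm_exact` is hypothetical.

```python
def edsm_exact(ed_text, pattern):
    """Return 0-based indices of ED-text segments where an occurrence ends.

    Naive baseline for exact matching: it tracks which proper prefixes of the
    pattern are "alive" at each segment boundary and tries to extend them.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    active = set()  # lengths k (0 < k < m) of pattern prefixes alive at the boundary
    ends = []
    for i, segment in enumerate(ed_text):
        new_active = set()
        found = False
        for s in segment:
            # Case 1: the whole pattern lies inside one string of the segment.
            if pattern in s:
                found = True
            # Case 2: a suffix of s starts a new partial match.
            for k in range(1, min(len(s), m - 1) + 1):
                if s[len(s) - k:] == pattern[:k]:
                    new_active.add(k)
            # Case 3: extend partial matches carried over from earlier segments.
            for k in active:
                rest = pattern[k:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True
                elif rest.startswith(s):
                    new_active.add(k + len(s))  # s is consumed entirely
        if found:
            ends.append(i)
        active = new_active
    return ends


if __name__ == "__main__":
    text = [["A", "C"], ["AC", "", "G"], ["T"]]
    print(edsm_exact(text, "AT"))   # "A" + "" + "T"
    print(edsm_exact(text, "CAC"))  # "C" + "AC"
```

This baseline touches every string once per active prefix, so it can take time proportional to m times N in the worst case, which is the kind of behaviour the bounds quoted in the abstract improve on.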

Gabory E, Mwaniki MN, Pisanti N, Pissis SP, Radoszewski J, Sweering M, Zuba W. Pangenome comparison via ED strings. Frontiers in Bioinformatics 2024;4:1397036. [PMID: 39391331 PMCID: PMC11464492 DOI: 10.3389/fbinf.2024.1397036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 08/23/2024] [Indexed: 10/12/2024] Open

Xue Z, Zhou A, Zhu X, Li L, Zhu H, Jin X, Wang J. NIPT-PG: empowering non-invasive prenatal testing to learn from population genomics through an incremental pan-genomic approach. Brief Bioinform 2024;25:bbae266. [PMID: 38836702 DOI: 10.1093/bib/bbae266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 05/03/2024] [Accepted: 05/21/2024] [Indexed: 06/06/2024] Open

Avila Cartes J, Bonizzoni P, Ciccolella S, Della Vedova G, Denti L, Didelot X, Monti DC, Pirola Y. RecGraph: recombination-aware alignment of sequences to variation graphs. Bioinformatics 2024;40:btae292. [PMID: 38676570 PMCID: PMC11256948 DOI: 10.1093/bioinformatics/btae292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 02/23/2024] [Accepted: 04/25/2024] [Indexed: 04/29/2024] Open

Dabbaghie F, Srikakulam SK, Marschall T, Kalinina OV. PanPA: generation and alignment of panproteome graphs. Bioinformatics Advances 2023;3:vbad167. [PMID: 38145107 PMCID: PMC10748787 DOI: 10.1093/bioadv/vbad167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/26/2023]

Ma J, Cáceres M, Salmela L, Mäkinen V, Tomescu AI. Chaining for accurate alignment of erroneous long reads to acyclic variation graphs. Bioinformatics 2023;39:btad460. [PMID: 37494467 PMCID: PMC10423031 DOI: 10.1093/bioinformatics/btad460] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/20/2022] [Revised: 06/08/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023] Open

Abstract

MOTIVATION

Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit [Garrison et al. (Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9)] is a popular aligner of short reads, GraphAligner [Rautiainen and Marschall (GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253-28)] is the state-of-the-art aligner of erroneous long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds.

RESULTS

We present a new algorithm to co-linearly chain a set of seeds in a string labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm into a new aligner of erroneous long reads to acyclic variation graphs, GraphChainer. We run experiments aligning real and simulated PacBio CLR reads with average error rates 15% and 5%. Compared to GraphAligner, GraphChainer aligns 12-17% more reads, and 21-28% more total read length, on real PacBio CLR reads from human chromosomes 1, 22, and the whole human pangenome. On both simulated and real data, GraphChainer aligns between 95% and 99% of all reads, and of total read length. We also show that minigraph [Li et al. (The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265-19.)] and minichain [Chandra and Jain (Sequence to graph alignment using gap-sensitive co-linear chaining. In: Proceedings of the 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2023). Springer, 2023, 58-73.)] obtain an accuracy of <60% on this setting.

AVAILABILITY AND IMPLEMENTATION

GraphChainer is freely available at https://github.com/algbio/GraphChainer. The datasets and evaluation pipeline can be reached from the previous address.
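As a rough illustration of what co-linear chaining means, the following sketch chains seeds against a linear reference with a quadratic dynamic program. It is not GraphChainer's algorithm, which works on acyclic graphs. The anchor format `(q_start, q_end, r_start, r_end)` with half-open, positive-length intervals, the scoring by total anchor length, and the name `chain_anchors` are all assumptions made for this example.

```python
def chain_anchors(anchors):
    """Pick a highest-scoring co-linear chain of non-overlapping anchors.

    Each anchor is (q_start, q_end, r_start, r_end), half-open and of
    positive length. Anchor a may precede anchor b when a ends before b
    starts on both the query and the reference. The score of a chain is
    the sum of its anchors' query lengths.
    """
    order = sorted(anchors, key=lambda a: (a[0], a[2]))
    n = len(order)
    if n == 0:
        return 0, []

    score = [a[1] - a[0] for a in order]  # best chain score ending at anchor i
    back = [-1] * n                       # predecessor of anchor i in that chain
    for i in range(n):
        q_start, q_end, r_start, _ = order[i]
        for j in range(i):
            _, prev_q_end, _, prev_r_end = order[j]
            if prev_q_end <= q_start and prev_r_end <= r_start:
                candidate = score[j] + (q_end - q_start)
                if candidate > score[i]:
                    score[i] = candidate
                    back[i] = j

    # Trace the best chain back from its last anchor.
    best = max(range(n), key=score.__getitem__)
    chain = []
    i = best
    while i != -1:
        chain.append(order[i])
        i = back[i]
    chain.reverse()
    return score[best], chain


if __name__ == "__main__":
    seeds = [(0, 5, 100, 105), (6, 10, 106, 110), (4, 9, 50, 55), (11, 15, 111, 115)]
    print(chain_anchors(seeds))
```

In the usage example, the seed at reference position 50 is left out of the chain because it is out of order with respect to the others, which is the situation where chaining helps over extending the single best seed.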

Shi J, Tian Z, Lai J, Huang X. Plant pan-genomics and its applications. Molecular Plant 2023;16:168-186. [PMID: 36523157 DOI: 10.1016/j.molp.2022.12.009] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 09/22/2022] [Revised: 12/07/2022] [Accepted: 12/12/2022] [Indexed: 06/17/2023]

Baaijens JA, Bonizzoni P, Boucher C, Della Vedova G, Pirola Y, Rizzi R, Sirén J. Computational graph pangenomics: a tutorial on data structures and their applications. Natural Computing 2022;21:81-108. [PMID: 36969737 PMCID: PMC10038355 DOI: 10.1007/s11047-022-09882-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 02/14/2022] [Indexed: 05/08/2023]

Abstract

Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations-thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.

Pandey P, Gao Y, Kingsford C. VariantStore: an index for large-scale genomic variant search. Genome Biol 2021;22:231. [PMID: 34412679 PMCID: PMC8375130 DOI: 10.1186/s13059-021-02442-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/19/2020] [Accepted: 07/27/2021] [Indexed: 11/18/2022] Open

Quince C, Nurk S, Raguideau S, James R, Soyer OS, Summers JK, Limasset A, Eren AM, Chikhi R, Darling AE. STRONG: metagenomics strain resolution on assembly graphs. Genome Biol 2021;22:214. [PMID: 34311761 PMCID: PMC8311964 DOI: 10.1186/s13059-021-02419-7] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 08/22/2020] [Accepted: 06/29/2021] [Indexed: 12/30/2022] Open

Lu TY, Chaisson MJP. Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs. Nat Commun 2021;12:4250. [PMID: 34253730 PMCID: PMC8275641 DOI: 10.1038/s41467-021-24378-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 12/24/2020] [Accepted: 06/10/2021] [Indexed: 12/11/2022] Open

Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021;10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/23/2021] [Indexed: 11/20/2022] Open

Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021;10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 03/04/2021] [Indexed: 11/08/2023] Open

Schulz T, Wittler R, Rahmann S, Hach F, Stoye J. Detecting High Scoring Local Alignments in Pangenome Graphs. Bioinformatics 2021;37:2266-2274. [PMID: 33532821 PMCID: PMC8388040 DOI: 10.1093/bioinformatics/btab077] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/24/2020] [Revised: 12/02/2020] [Accepted: 01/29/2021] [Indexed: 11/23/2022] Open

Dilthey AT. State-of-the-art genome inference in the human MHC. Int J Biochem Cell Biol 2021;131:105882. [PMID: 33189874 DOI: 10.1016/j.biocel.2020.105882] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/09/2020] [Revised: 10/29/2020] [Accepted: 11/04/2020] [Indexed: 12/20/2022]

Darby CA, Gaddipati R, Schatz MC, Langmead B. Vargas: heuristic-free alignment for assessing linear and graph read aligners. Bioinformatics 2020;36:3712-3718. [PMID: 32321164 PMCID: PMC7320598 DOI: 10.1093/bioinformatics/btaa265] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/23/2019] [Revised: 03/19/2020] [Accepted: 04/15/2020] [Indexed: 12/31/2022] Open

Li H, Feng X, Chu C. The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265. [PMID: 33066802 PMCID: PMC7568353 DOI: 10.1186/s13059-020-02168-z] [Citation(s) in RCA: 219] [Impact Index Per Article: 43.8] [Received: 03/12/2020] [Accepted: 09/23/2020] [Indexed: 12/21/2022] Open

Garg S, Aach J, Li H, Sebenius I, Durbin R, Church G. A haplotype-aware de novo assembly of related individuals using pedigree sequence graph. Bioinformatics 2020;36:2385-2392. [PMID: 31860070 DOI: 10.1093/bioinformatics/btz942] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 06/19/2019] [Revised: 11/23/2019] [Accepted: 12/18/2019] [Indexed: 01/11/2023] Open

Rautiainen M, Marschall T. GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253. [PMID: 32972461 PMCID: PMC7513500 DOI: 10.1186/s13059-020-02157-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 10/18/2019] [Accepted: 08/26/2020] [Indexed: 02/07/2023] Open

Dvorkina T, Antipov D, Korobeynikov A, Nurk S. SPAligner: alignment of long diverged molecular sequences to assembly graphs. BMC Bioinformatics 2020;21:306. [PMID: 32703258 PMCID: PMC7379835 DOI: 10.1186/s12859-020-03590-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/02/2020] [Accepted: 06/08/2020] [Indexed: 12/27/2022] Open

Guyomar C, Delage W, Legeai F, Mougel C, Simon JC, Lemaitre C. MinYS: mine your symbiont by targeted genome assembly in symbiotic communities. NAR Genom Bioinform 2020;2:lqaa047. [PMID: 33575599 PMCID: PMC7671366 DOI: 10.1093/nargab/lqaa047] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/24/2020] [Revised: 05/20/2020] [Accepted: 06/17/2020] [Indexed: 12/17/2022] Open

Chang X, Eizenga J, Novak AM, Sirén J, Paten B. Distance indexing and seed clustering in sequence graphs. Bioinformatics 2020;36:i146-i153. [PMID: 32657356 PMCID: PMC7355256 DOI: 10.1093/bioinformatics/btaa446] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/08/2023] Open

Ghaffaari A, Marschall T. Fully-sensitive seed finding in sequence graphs using a hybrid index. Bioinformatics 2020;35:i81-i89. [PMID: 31510650 PMCID: PMC6612829 DOI: 10.1093/bioinformatics/btz341] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/07/2023] Open

Eizenga JM, Novak AM, Sibbesen JA, Heumos S, Ghaffaari A, Hickey G, Chang X, Seaman JD, Rounthwaite R, Ebler J, Rautiainen M, Garg S, Paten B, Marschall T, Sirén J, Garrison E. Pangenome Graphs. Annu Rev Genomics Hum Genet 2020;21:139-162. [PMID: 32453966 DOI: 10.1146/annurev-genom-120219-080406] [Citation(s) in RCA: 136] [Impact Index Per Article: 27.2] [Indexed: 12/12/2022]

Affiliation(s)

Jordan M Eizenga: Genomics Institute, University of California, Santa Cruz, California 95064, USA
Adam M Novak: Genomics Institute, University of California, Santa Cruz, California 95064, USA
Jonas A Sibbesen: Genomics Institute, University of California, Santa Cruz, California 95064, USA
Simon Heumos: Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
Ali Ghaffaari: Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Glenn Hickey: Genomics Institute, University of California, Santa Cruz, California 95064, USA
Xian Chang: Genomics Institute, University of California, Santa Cruz, California 95064, USA
Josiah D Seaman: Royal Botanic Gardens, Kew, Richmond TW9 3AB, United Kingdom; School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
Robin Rounthwaite: Genomics Institute, University of California, Santa Cruz, California 95064, USA
Jana Ebler: Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Mikko Rautiainen: Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Shilpa Garg: Departments of Genetics and Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02215, USA; Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
Benedict Paten: Genomics Institute, University of California, Santa Cruz, California 95064, USA
Tobias Marschall: Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
Jouni Sirén: Genomics Institute, University of California, Santa Cruz, California 95064, USA
Erik Garrison: Genomics Institute, University of California, Santa Cruz, California 95064, USA

Sherman RM, Salzberg SL. Pan-genomics in the human genome era. Nat Rev Genet 2020;21:243-254. [PMID: 32034321 PMCID: PMC7752153 DOI: 10.1038/s41576-020-0210-7] [Citation(s) in RCA: 180] [Impact Index Per Article: 36.0] [Accepted: 01/02/2020] [Indexed: 12/25/2022]

Morgulis A, Agarwala R. SRPRISM (Single Read Paired Read Indel Substitution Minimizer): an efficient aligner for assemblies with explicit guarantees. Gigascience 2020;9:giaa023. [PMID: 32315028 PMCID: PMC7172022 DOI: 10.1093/gigascience/giaa023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/13/2018] [Revised: 08/15/2019] [Indexed: 11/12/2022] Open

Dilthey AT, Meyer SA, Kaasch AJ. Ultraplexing: increasing the efficiency of long-read sequencing for hybrid assembly with k-mer-based multiplexing. Genome Biol 2020;21:68. [PMID: 32171299 PMCID: PMC7071681 DOI: 10.1186/s13059-020-01974-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/19/2019] [Accepted: 02/24/2020] [Indexed: 01/10/2023] Open

Mokveld T, Linthorst J, Al-Ars Z, Holstege H, Reinders M. CHOP: haplotype-aware path indexing in population graphs. Genome Biol 2020;21:65. [PMID: 32160922 PMCID: PMC7066762 DOI: 10.1186/s13059-020-01963-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/06/2019] [Accepted: 02/18/2020] [Indexed: 12/20/2022] Open

Yokoyama TT, Sakamoto Y, Seki M, Suzuki Y, Kasahara M. MoMI-G: modular multi-scale integrated genome graph browser. BMC Bioinformatics 2019;20:548. [PMID: 31690272 PMCID: PMC6833150 DOI: 10.1186/s12859-019-3145-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/11/2019] [Accepted: 10/09/2019] [Indexed: 01/30/2023] Open

Abstract

Background

Genome graph is an emerging approach for representing structural variants on genomes with branches. For example, representing structural variants of cancer genomes as a genome graph is more natural than representing such genomes as differences from the linear reference genome. While more and more structural variants are being identified by long-read sequencing, many of them are difficult to visualize using existing structural variants visualization tools. To this end, visualization method for large genome graphs such as human cancer genome graphs is demanded.

Results

We developed MOdular Multi-scale Integrated Genome graph browser, MoMI-G, a web-based genome graph browser that can visualize genome graphs with structural variants and supporting evidences such as read alignments, read depth, and annotations. This browser allows more intuitive recognition of large, nested, and potentially more complex structural variations. MoMI-G has view modules for different scales, which allow users to view the whole genome down to nucleotide-level alignments of long reads. Alignments spanning reference alleles and those spanning alternative alleles are shown in the same view. Users can customize the view, if they are not satisfied with the preset views. In addition, MoMI-G has Interval Card Deck, a feature for rapid manual inspection of hundreds of structural variants. Herein, we describe the utility of MoMI-G by using representative examples of large and nested structural variations found in two cell lines, LC-2/ad and CHM1.

Conclusions

Users can inspect complex and large structural variations found by long-read analysis in large genomes such as human genomes more smoothly and more intuitively. In addition, users can easily filter out false positives by manually inspecting hundreds of identified structural variants with supporting long-read alignments and annotations in a short time.

Software availability

MoMI-G is freely available at https://github.com/MoMI-G/MoMI-G under the MIT license.

Yanes L, Garcia Accinelli G, Wright J, Ward BJ, Clavijo BJ. A Sequence Distance Graph framework for genome assembly and analysis. F1000Res 2019;8:1490. [PMID: 31723420 PMCID: PMC6833988 DOI: 10.12688/f1000research.20233.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Accepted: 08/12/2019] [Indexed: 11/20/2022] Open