Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Marco-Sola S, Moure JC, Moreto M, Espinosa A. Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics 2021;37:456-463. [PMID: 32915952 PMCID: PMC8355039 DOI: 10.1093/bioinformatics/btaa777] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 04/08/2020] [Revised: 07/22/2020] [Accepted: 09/01/2020] [Indexed: 12/30/2022] Open

Cited by Other Article(s)

Guo L, Huo H. An efficient Burrows-Wheeler transform-based aligner for short read mapping. Comput Biol Chem 2024;110:108050. [PMID: 38447272 DOI: 10.1016/j.compbiolchem.2024.108050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 02/15/2024] [Accepted: 03/01/2024] [Indexed: 03/08/2024]

Avila Cartes J, Bonizzoni P, Ciccolella S, Della Vedova G, Denti L, Didelot X, Monti DC, Pirola Y. RecGraph: recombination-aware alignment of sequences to variation graphs. BIOINFORMATICS (OXFORD, ENGLAND) 2024;40:btae292. [PMID: 38676570 DOI: 10.1093/bioinformatics/btae292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 02/23/2024] [Accepted: 04/25/2024] [Indexed: 04/29/2024]

Duchen D, Clipman SJ, Vergara C, Thio CL, Thomas DL, Duggal P, Wojcik GL. A hepatitis B virus (HBV) sequence variation graph improves alignment and sample-specific consensus sequence construction. PLoS One 2024;19:e0301069. [PMID: 38669259 PMCID: PMC11051683 DOI: 10.1371/journal.pone.0301069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 03/09/2024] [Indexed: 04/28/2024] Open

Schloissnig S, Pani S, Rodriguez-Martin B, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M, Hunt S, Rausch T, Marschall T, Korbel JO. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.18.590093. [PMID: 38659906 PMCID: PMC11042266 DOI: 10.1101/2024.04.18.590093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]

Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson Z, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.05.24303792. [PMID: 38496498 PMCID: PMC10942501 DOI: 10.1101/2024.03.05.24303792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]

Affiliation(s)

Jonas A. Gustafson Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
Sophia B. Gibson Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA Department of Genome Sciences, University of Washington, Seattle, WA, USA
Nikhita Damaraju Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
Miranda PG Zalusky Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
Kendra Hoekzema Department of Genome Sciences, University of Washington, Seattle, WA, USA
David Twesigomwe Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
Lei Yang Pacific Northwest Research Institute, Seattle, WA, USA
Anthony A. Snead Department of Biology, New York University, New York, NY, USA
Phillip A. Richmond Alamya Health, Baton Rouge, LA, USA
Wouter De Coster Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
Nathan D. Olson Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Andrea Guarracino Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA Human Technopole, Milan, Italy
Qiuhui Li Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Angela L. Miller Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
Joy Goffena Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
Zachery Anderson Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
Sophie HR Storz Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
Sydney A. Ward Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
Maisha Sinha Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
Claudia Gonzaga-Jauregui International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México
Wayne E. Clarke New York Genome Center, New York, NY, USA Outlier Informatics Inc., Saskatoon, SK, Canada
Anna O. Basile New York Genome Center, New York, NY, USA
André Corvelo New York Genome Center, New York, NY, USA
Catherine Reeves New York Genome Center, New York, NY, USA
Adrienne Helland New York Genome Center, New York, NY, USA
Rajeeva Lochan Musunuri New York Genome Center, New York, NY, USA
Mahler Revsine Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Karynne E. Patterson Department of Genome Sciences, University of Washington, Seattle, WA, USA
Cate R. Paschal Department of Laboratories, Seattle Children’s Hospital, Seattle, WA, USA Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
Christina Zakarian Department of Genome Sciences, University of Washington, Seattle, WA, USA
Sara Goodwin Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
Tanner D. Jensen Department of Genetics, Stanford University, Stanford, CA, USA
Esther Robb Department of Computer Science, Stanford University, Stanford, CA, USA
The 1000 Genomes ONT Sequencing Consortium
University of Washington Center for Rare Disease Research (UW-CRDR)
Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium
W. Richard McCombie Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
Fritz J. Sedlazeck Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA Department of Computer Science, Rice University, Houston, TX, USA
Justin M. Zook Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Stephen B. Montgomery Department of Genetics, Stanford University, Stanford, CA, USA
Erik Garrison Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Mikhail Kolmogorov Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
Michael C. Schatz Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Richard N. McLaughlin Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA Pacific Northwest Research Institute, Seattle, WA, USA
Harriet Dashnow Department of Human Genetics, University of Utah, Salt Lake City, UT, USA Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
Michael C. Zody New York Genome Center, New York, NY, USA
Matt Loose Deep Seq, School of Life Sciences, University of Nottingham, Nottingham, England
Miten Jain Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA
Evan E. Eichler Department of Genome Sciences, University of Washington, Seattle, WA, USA Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
Danny E. Miller Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA

de Oliveira Martins L, Mather AE, Page AJ. Scalable neighbour search and alignment with uvaia. PeerJ 2024;12:e16890. [PMID: 38464752 PMCID: PMC10924453 DOI: 10.7717/peerj.16890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 01/15/2024] [Indexed: 03/12/2024] Open

Groot Koerkamp R, Ivanov P. Exact global alignment using A* with chaining seed heuristic and match pruning. Bioinformatics 2024;40:btae032. [PMID: 38265119 PMCID: PMC10932610 DOI: 10.1093/bioinformatics/btae032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 11/14/2023] [Accepted: 01/20/2024] [Indexed: 01/25/2024] Open

Song B, Buckler ES, Stitzer MC. New whole-genome alignment tools are needed for tapping into plant diversity. TRENDS IN PLANT SCIENCE 2024;29:355-369. [PMID: 37749022 DOI: 10.1016/j.tplants.2023.08.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/19/2023] [Accepted: 08/23/2023] [Indexed: 09/27/2023]

Holt JM, Saunders CT, Rowell WJ, Kronenberg Z, Wenger AM, Eberle M. HiPhase: jointly phasing small, structural, and tandem repeat variants from HiFi sequencing. Bioinformatics 2024;40:btae042. [PMID: 38269623 PMCID: PMC10868326 DOI: 10.1093/bioinformatics/btae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 12/13/2023] [Accepted: 01/22/2024] [Indexed: 01/26/2024] Open

Benoit G, Raguideau S, James R, Phillippy AM, Chikhi R, Quince C. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol 2024:10.1038/s41587-023-01983-6. [PMID: 38168989 DOI: 10.1038/s41587-023-01983-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/08/2023] [Indexed: 01/05/2024]

LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023;11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023] Open

Abstract

Background

The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation.

Results

For this study, we leveraged publicly available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state-of-the-art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. The tools disagreed widely on which reads to leave unaligned, affecting the end genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genome Variants (DGV) for sample NA12878 or the truthset for NA24385.

Conclusions

These results suggest a combined approach is needed for LRS alignments for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. It should be best practice to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2 as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data for our cluster, but PacBio results were promising in that those computations completed faster than Minimap2. Due to its significant burden on computational resources and slow run time, GraphMap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.

Dunn T, Narayanasamy S. vcfdist: accurately benchmarking phased small variant calls in human genomes. Nat Commun 2023;14:8149. [PMID: 38071244 PMCID: PMC10710436 DOI: 10.1038/s41467-023-43876-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2023] [Accepted: 11/22/2023] [Indexed: 12/18/2023] Open

Aguado-Puig Q, Doblas M, Matzoros C, Espinosa A, Moure JC, Marco-Sola S, Moreto M. WFA-GPU: gap-affine pairwise read-alignment using GPUs. Bioinformatics 2023;39:btad701. [PMID: 37975878 PMCID: PMC10697739 DOI: 10.1093/bioinformatics/btad701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 11/09/2023] [Accepted: 11/16/2023] [Indexed: 11/19/2023] Open

Abstract

MOTIVATION

Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements to align long and noisy sequences like those produced by PacBio and Nanopore technologies. The recently proposed wavefront alignment (WFA) algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous methods. However, high-performance computing (HPC) platforms require efficient parallel algorithms and tools to exploit the computing resources available on modern accelerator-based architectures.
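As a point of reference for the classical DP methods mentioned above, the following is a minimal sketch of a gap-affine global alignment score computed with a textbook Gotoh-style recurrence. It is not the WFA algorithm and not the WFA-GPU implementation; the function name and the default penalties (mismatch 4, gap open 6, gap extend 2, match 0) are assumptions chosen only for illustration. The nested loops make the O(n·m) cost that motivates faster methods visible.

```python
def gap_affine_score(a, b, mismatch=4, gap_open=6, gap_extend=2):
    """Minimum gap-affine penalty to globally align strings a and b.

    Classical O(len(a) * len(b)) dynamic programming (Gotoh-style), shown
    only as a baseline. Penalty values are illustrative assumptions.
    A gap of length k costs gap_open + k * gap_extend; a match costs 0.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    # prev_best[j]: best penalty for aligning a[:i-1] with b[:j] (any end state)
    prev_best = [0] + [gap_open + gap_extend * j for j in range(1, m + 1)]
    # prev_del[j]: best penalty ending in a gap that consumes characters of a
    prev_del = [INF] * (m + 1)
    for i in range(1, n + 1):
        first_col = gap_open + gap_extend * i
        cur_best = [first_col] + [INF] * m
        cur_del = [first_col] + [INF] * m
        cur_ins = INF  # best penalty ending in a gap that consumes characters of b
        for j in range(1, m + 1):
            # Open a new gap from the best state, or extend an existing one.
            cur_ins = min(cur_best[j - 1] + gap_open + gap_extend,
                          cur_ins + gap_extend)
            cur_del[j] = min(prev_best[j] + gap_open + gap_extend,
                             prev_del[j] + gap_extend)
            diagonal = prev_best[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            cur_best[j] = min(diagonal, cur_ins, cur_del[j])
        prev_best, prev_del = cur_best, cur_del
    return prev_best[m]


if __name__ == "__main__":
    print(gap_affine_score("ACGT", "ACGT"))
    print(gap_affine_score("ACGT", "AGGT"))
    print(gap_affine_score("ACGT", "ACT"))
```

Only two rows of each matrix are kept, so memory is linear in the length of `b`, but the running time remains quadratic regardless of how similar the sequences are.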

RESULTS

This paper presents WFA-GPU, a GPU (graphics processing unit)-accelerated tool to compute exact gap-affine alignments based on the WFA algorithm. We present the algorithmic adaptations and performance optimizations that allow exploiting the massively parallel capabilities of modern GPU devices to accelerate the alignment computations. In particular, we propose a CPU-GPU co-design capable of performing inter-sequence and intra-sequence parallel sequence alignment, combining a succinct WFA-data representation with an efficient GPU implementation. As a result, we demonstrate that our implementation outperforms the original multi-threaded WFA implementation by up to 4.3× and up to 18.2× when using heuristic methods on long and noisy sequences. Compared to other state-of-the-art tools and libraries, the WFA-GPU is up to 29× faster than other GPU implementations and up to four orders of magnitude faster than other CPU implementations. Furthermore, WFA-GPU is the only GPU solution capable of correctly aligning long reads using a commodity GPU.

AVAILABILITY AND IMPLEMENTATION

WFA-GPU code and documentation are publicly available at https://github.com/quim0/WFA-GPU.

Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, Blackmon H, Charles M, Cheng HH, Fedrigo O, Fiddaman SR, Formenti G, Frantz LAF, Gilbert MTP, Hearn CJ, Jarvis ED, Klopp C, Marcos S, Mason AS, Velez-Irizarry D, Xu L, Warren WC. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol 2023;21:267. [PMID: 37993882 PMCID: PMC10664547 DOI: 10.1186/s12915-023-01758-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 11/02/2023] [Indexed: 11/24/2023] Open

Affiliation(s)

Edward S Rice Bond Life Sciences Center, University of Missouri, Columbia, MO, USA Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
Antton Alberdi Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
James Alfieri Department of Ecology & Evolutionary Biology, Texas A&M University, College Station, TX, USA
Giridhar Athrey Department of Poultry Science, Texas A&M University, College Station, TX, USA
Jennifer R Balacco Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Philippe Bardou Sigenae, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, 31326, France
Heath Blackmon Department of Biology, Texas A&M University, College Station, TX, USA
Mathieu Charles University Paris-Saclay, INRAE, AgroParisTech, GABI, Sigenae, Jouy-en-Josas, France
Hans H Cheng Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
Olivier Fedrigo Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Steven R Fiddaman Department of Biology, University of Oxford, Oxford, OX1 3SZ, UK
Giulio Formenti Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Laurent A F Frantz Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany School of Biological and Behavioural Sciences, Queen Mary University of London, London, E1 4DQ, UK
M Thomas P Gilbert Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
Cari J Hearn Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
Erich D Jarvis Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA The Howard Hughes Medical Institute, Chevy Chase, MD, USA
Christophe Klopp Sigenae, Genotoul Bioinfo, MIAT UR875, INRAE, Castanet Tolosan, France
Sofia Marcos Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
Andrew S Mason Department of Biology, The University of York, York, UK
Deborah Velez-Irizarry Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
Luohao Xu Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing, 400715, China
Wesley C Warren Department of Animal Sciences, University of Missouri, Columbia, MO, USA.

Kille B, Garrison E, Treangen TJ, Phillippy AM. Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation. Bioinformatics 2023;39:btad512. [PMID: 37603771 PMCID: PMC10505501 DOI: 10.1093/bioinformatics/btad512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/19/2023] [Accepted: 08/18/2023] [Indexed: 08/23/2023] Open

Ayad LAK, Chikhi R, Pissis SP. Seedability: optimizing alignment parameters for sensitive sequence comparison. BIOINFORMATICS ADVANCES 2023;3:vbad108. [PMID: 37621456 PMCID: PMC10444664 DOI: 10.1093/bioadv/vbad108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 08/02/2023] [Accepted: 08/10/2023] [Indexed: 08/26/2023]

Liu D, Steinegger M. Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices. Bioinformatics 2023;39:btad487. [PMID: 37535681 PMCID: PMC10457662 DOI: 10.1093/bioinformatics/btad487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 06/10/2023] [Accepted: 08/01/2023] [Indexed: 08/05/2023] Open

Benoit G, Raguideau S, James R, Phillippy AM, Chikhi R, Quince C. Efficient High-Quality Metagenome Assembly from Long Accurate Reads using Minimizer-space de Bruijn Graphs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.07.548136. [PMID: 37786716 PMCID: PMC10541625 DOI: 10.1101/2023.07.07.548136] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/04/2023]

Shaw J, Yu YW. Proving sequence aligners can guarantee accuracy in almost O(m log n) time through an average-case analysis of the seed-chain-extend heuristic. Genome Res 2023;33:1175-1187. [PMID: 36990779 PMCID: PMC10538486 DOI: 10.1101/gr.277637.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 03/16/2023] [Indexed: 03/31/2023]

Santus L, Garriga E, Deorowicz S, Gudyś A, Notredame C. Towards the accurate alignment of over a million protein sequences: Current state of the art. Curr Opin Struct Biol 2023;80:102577. [PMID: 37012200 DOI: 10.1016/j.sbi.2023.102577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/21/2023] [Accepted: 02/27/2023] [Indexed: 04/04/2023]

Sahlin K, Baudeau T, Cazaux B, Marchet C. A survey of mapping algorithms in the long-reads era. Genome Biol 2023;24:133. [PMID: 37264447 PMCID: PMC10236595 DOI: 10.1186/s13059-023-02972-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/03/2022] [Accepted: 05/12/2023] [Indexed: 06/03/2023] Open

Kille B, Garrison E, Treangen TJ, Phillippy AM. Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.16.540882. [PMID: 37325780 PMCID: PMC10268037 DOI: 10.1101/2023.05.16.540882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]

Garrison E, Guarracino A, Heumos S, Villani F, Bao Z, Tattini L, Hagmann J, Vorbrugg S, Marco-Sola S, Kubica C, Ashbrook DG, Thorell K, Rusholme-Pilcher RL, Liti G, Rudbeck E, Nahnsen S, Yang Z, Moses MN, Nobrega FL, Wu Y, Chen H, de Ligt J, Sudmant PH, Soranzo N, Colonna V, Williams RW, Prins P. Building pangenome graphs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.05.535718. [PMID: 37066137 PMCID: PMC10104075 DOI: 10.1101/2023.04.05.535718] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Indexed: 04/18/2023]

Marco-Sola S, Eizenga JM, Guarracino A, Paten B, Garrison E, Moreto M. Optimal gap-affine alignment in O(s) space. Bioinformatics 2023;39:7030690. [PMID: 36749013 PMCID: PMC9940620 DOI: 10.1093/bioinformatics/btad074] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Revised: 01/02/2023] [Indexed: 02/08/2023] Open

Duchen D, Clipman S, Vergara C, Thio CL, Thomas DL, Duggal P, Wojcik GL. A hepatitis B virus (HBV) sequence variation graph improves sequence alignment and sample-specific consensus sequence construction for genetic analysis of HBV. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.11.523611. [PMID: 36711598 PMCID: PMC9882026 DOI: 10.1101/2023.01.11.523611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/14/2023]

Kovaka S, Ou S, Jenike KM, Schatz MC. Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing. Nat Methods 2023;20:12-16. [PMID: 36635537 PMCID: PMC10068675 DOI: 10.1038/s41592-022-01716-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 01/13/2023]

Garrison E, Guarracino A. Unbiased pangenome graphs. Bioinformatics 2023;39:6854971. [PMID: 36448683 PMCID: PMC9805579 DOI: 10.1093/bioinformatics/btac743] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/14/2022] [Revised: 09/14/2022] [Indexed: 12/03/2022] Open

Sahlin K. Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 2022;23:260. [PMID: 36522758 PMCID: PMC9753264 DOI: 10.1186/s13059-022-02831-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 12/02/2022] [Indexed: 12/23/2022] Open

Chromosome-scale haplotype-resolved pangenomics. Trends Genet 2022;38:1103-1107. [PMID: 35817620 DOI: 10.1016/j.tig.2022.06.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/21/2022] [Revised: 06/14/2022] [Accepted: 06/16/2022] [Indexed: 01/24/2023]

Lightweight Pattern Matching Method for DNA Sequencing in Internet of Medical Things. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:6980335. [PMID: 36120669 PMCID: PMC9477578 DOI: 10.1155/2022/6980335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 06/28/2022] [Accepted: 07/29/2022] [Indexed: 11/18/2022]

Abstract

DNA sequencing is an area of medical science that is gaining prominence. Genetic mutations responsible for disease have been detected using DNA sequencing. This research focuses on pattern identification methodologies for DNA-sequencing problems arising in various applications. A few examples of such problems are alignment and assembly of short reads from next generation sequencing (NGS), comparing DNA sequences, and determining the frequency of a pattern in a sequence. Because DNA sequences are often subject to mutation, approximate matching is as well suited to many applications as exact matching. Consequently, recognizing pattern similarity becomes necessary. Furthermore, it can also be used in virtually every application that calls for pattern matching, for example, spell-checking, spam filtering, and search engines. With the traditional approach, finding a similar pattern when the sequence length is l_s and the pattern length is l_p takes O(l_s ∗ l_p) time. This heavy processing is caused by comparing every character of the sequence repeatedly with the pattern. The research aims to reduce the time complexity of pattern matching by introducing an approach named “optimized pattern similarity identification” (OPSI). This methodology constructs a table, entitled “shift beyond for avoiding redundant comparison” (SBARC), to bypass the characters in the text that have already been compared with the pattern. The table holds the character distance to be skipped during matching. OPSI discovers the spots where similar patterns occur in the sequence (ignoring at most è mismatches). The experiment resulted in a time complexity of O(l_s · è). In comparison to the size of the pattern, the allowed number of mismatches will be much smaller. Aspects such as scalability, generalizability, and performance of the OPSI algorithm are discussed. In comparison with the Hamming distance-based approximate pattern matching algorithm, the proposed algorithm is found to be 69% more efficient.
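The traditional approach that this abstract uses as its baseline can be sketched as follows. This is a minimal illustration of naive Hamming-distance matching with a mismatch budget, not the OPSI algorithm or its SBARC table, neither of which is specified in enough detail here to reproduce; the function and parameter names are hypothetical.

```python
def hamming_matches(sequence, pattern, max_mismatches):
    """Return start positions where pattern matches sequence with at most
    max_mismatches substitutions (Hamming distance).

    Naive baseline: every window is compared character by character, so the
    worst case is O(len(sequence) * len(pattern)) comparisons.
    """
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        mismatches = 0
        for offset, ch in enumerate(pattern):
            if sequence[start + offset] != ch:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window cannot match; move to the next start
        else:
            # Loop finished without exceeding the mismatch budget.
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(hamming_matches("ACGTACGA", "ACGT", 1))
```

The early exit reduces work on windows that differ quickly, but each window is still rescanned from its first character, which is the redundant comparison the abstract says a skip table is meant to avoid.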

Jain C, Rhie A, Hansen NF, Koren S, Phillippy AM. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat Methods 2022;19:705-710. [PMID: 35365778 PMCID: PMC10510034 DOI: 10.1038/s41592-022-01457-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 05/22/2021] [Accepted: 03/17/2022] [Indexed: 01/10/2023]

Song B, Marco-Sola S, Moreto M, Johnson L, Buckler ES, Stitzer MC. AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication. Proc Natl Acad Sci U S A 2022;119:e2113075119. [PMID: 34934012 PMCID: PMC8740769 DOI: 10.1073/pnas.2113075119] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Accepted: 11/15/2021] [Indexed: 12/04/2022] Open

Alser M, Lindegger J, Firtina C, Almadhoun N, Mao H, Singh G, Gomez-Luna J, Mutlu O. From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures. Comput Struct Biotechnol J 2022;20:4579-4599. [PMID: 36090814 PMCID: PMC9436709 DOI: 10.1016/j.csbj.2022.08.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/12/2022] [Revised: 08/08/2022] [Accepted: 08/08/2022] [Indexed: 02/01/2023] Open

Abu‐Hashem M, Gutub A. Efficient computation of Hash Hirschberg protein alignment utilizing hyper threading multi‐core sharing technology. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2021. [DOI: 10.1049/cit2.12070] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022] Open

Yan Y, Chaturvedi N, Appuswamy R. Accel-Align: a fast sequence mapper and aligner based on the seed-embed-extend method. BMC Bioinformatics 2021;22:257. [PMID: 34016035 PMCID: PMC8139006 DOI: 10.1186/s12859-021-04162-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/31/2020] [Accepted: 05/04/2021] [Indexed: 12/30/2022] Open

Abstract

Background

Improvements in sequencing technology continue to drive sequencing cost towards $100 per genome. However, mapping sequenced data to a reference genome remains a computationally intensive task due to the dependence on edit distance for dealing with INDELs and mismatches introduced by sequencing. All modern aligners use the seed–filter–extend methodology and rely on filtration heuristics to reduce the overhead of edit distance computation. However, filtering has inherent performance–accuracy trade-offs that limit its effectiveness.

Results

Motivated by algorithmic advances in randomized low-distortion embedding, we introduce seed–embed–extend (SEE), a new methodology for developing sequence mappers and aligners. While seed–filter–extend (SFE) focuses on eliminating sub-optimal candidates, SEE focuses instead on identifying optimal candidates. To do so, SEE transforms the read and reference strings from the edit distance regime to the Hamming regime by embedding them using a randomized algorithm, and uses Hamming distance over the embedded set to identify optimal candidates. To show that SEE performs well in practice, we present Accel-Align, an SEE-based short-read sequence mapper and aligner that is 3–12× faster than state-of-the-art aligners on commodity CPUs, without any special-purpose hardware, while providing comparable accuracy.
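As a rough illustration of the embedding step, the sketch below assumes a random-walk embedding in the style of Chakraborty, Goldenberg and Koucký (CGK); the abstract does not name the randomized algorithm, so that choice is an assumption. The helper names (`make_random_bits`, `embed`, `hamming`) and the pad character are hypothetical and are not Accel-Align's code.

```python
import random


def make_random_bits(out_len: int, alphabet: str, seed: int) -> list[dict[str, int]]:
    """One random bit per (output position, character). The same table must be
    used for every string, otherwise the embeddings are not comparable."""
    rng = random.Random(seed)
    return [{c: rng.randint(0, 1) for c in alphabet} for _ in range(out_len)]


def embed(s: str, bits: list[dict[str, int]], pad: str = "#") -> str:
    """Assumed CGK-style embedding: copy the current input character to the
    output, then advance the input pointer only if the random bit is 1."""
    out = []
    i = 0
    for step in bits:
        if i < len(s):
            out.append(s[i])
            i += step[s[i]]  # stay on the same character (0) or move on (1)
        else:
            out.append(pad)  # input exhausted: fill with padding
    return "".join(out)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))


read, ref = "ACGTTGCA", "ACGTGCAA"  # differ by one deletion and one insertion
bits = make_random_bits(3 * len(read), "ACGT", seed=7)
print(hamming(embed(read, bits), embed(ref, bits)))
```

Because both strings share the random bits, an insertion or deletion tends to desynchronize the two walks only temporarily, so the Hamming distance between the embeddings can be used as a cheap proxy for edit distance when ranking candidates.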

Conclusions

As sequencing technologies continue to increase read length while improving throughput and accuracy, we believe that randomized embeddings open up new avenues for optimization that cannot be achieved by using edit distance. Thus, the techniques presented in this paper have a much broader scope as they can be used for other applications like graph alignment, multiple sequence alignment, and sequence assembly.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04162-z.


Silvestre-Ryan J, Holmes I. Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing. Genome Biol 2021;22:38. [PMID: 33468205 PMCID: PMC7814537 DOI: 10.1186/s13059-020-02255-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/15/2020] [Accepted: 12/20/2020] [Indexed: 01/09/2023] Open