Reference Citation Analysis

For: Jain C, Dilthey A, Koren S, Aluru S, Phillippy AM. A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases. J Comput Biol 2018;25:766-779. [PMID: 29708767 PMCID: PMC6067103 DOI: 10.1089/cmb.2018.0036] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 11/13/2022]

Cited by Other Article(s)

Sweeten AP, Schatz MC, Phillippy AM. ModDotPlot-Rapid and interactive visualization of complex repeats. bioRxiv 2024:2024.04.15.589623. [PMID: 38712106 PMCID: PMC11071298 DOI: 10.1101/2024.04.15.589623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]

Zheng H, Marçais G, Kingsford C. Creating and Using Minimizer Sketches in Computational Genomics. J Comput Biol 2023;30:1251-1276. [PMID: 37646787 PMCID: PMC11082048 DOI: 10.1089/cmb.2023.0094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]

Greenberg G, Ravi AN, Shomorony I. LexicHash: sequence similarity estimation via lexicographic comparison of hashes. Bioinformatics 2023;39:btad652. [PMID: 37878809 PMCID: PMC10628434 DOI: 10.1093/bioinformatics/btad652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 10/11/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023]

Geli-Cruz OJ, Santos-Flores CJ, Cafaro MJ, Ropelewski A, Van Dam AR. Benchmarking assembly free nanopore read mappers to classify complex millipede gut microbiota via Oxford Nanopore Sequencing Technology. J Biol Methods 2023;10:e99010003. [PMID: 37937256 PMCID: PMC10627078 DOI: 10.14440/jbm.2023.376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2021] [Revised: 03/13/2023] [Accepted: 04/27/2023] [Indexed: 11/09/2023]

Ekim B, Sahlin K, Medvedev P, Berger B, Chikhi R. Efficient mapping of accurate long reads in minimizer space with mapquik. Genome Res 2023;33:1188-1197. [PMID: 37399256 PMCID: PMC10538364 DOI: 10.1101/gr.277679.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Accepted: 06/26/2023] [Indexed: 07/05/2023]

Diesh C, Stevens GJ, Xie P, De Jesus Martinez T, Hershberg EA, Leung A, Guo E, Dider S, Zhang J, Bridge C, Hogue G, Duncan A, Morgan M, Flores T, Bimber BN, Haw R, Cain S, Buels RM, Stein LD, Holmes IH. JBrowse 2: a modular genome browser with views of synteny and structural variation. Genome Biol 2023;24:74. [PMID: 37069644 PMCID: PMC10108523 DOI: 10.1186/s13059-023-02914-z] [Citation(s) in RCA: 43] [Impact Index Per Article: 43.0] [Received: 08/24/2022] [Accepted: 03/20/2023] [Indexed: 04/19/2023]

Affiliation(s)

Colin Diesh Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Garrett J Stevens Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Peter Xie Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Teresa De Jesus Martinez Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Elliot A. Hershberg Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Angel Leung Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Emma Guo Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Shihab Dider Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Junjun Zhang Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3 Canada
Caroline Bridge Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3 Canada
Gregory Hogue Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3 Canada
Andrew Duncan Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3 Canada
Matthew Morgan Center for Applied Systems and Software, 224 Milne Computer Center, 1800 SW Campus Way, Oregon State University, Corvallis, OR 97331 USA
Tia Flores Center for Applied Systems and Software, 224 Milne Computer Center, 1800 SW Campus Way, Oregon State University, Corvallis, OR 97331 USA
Benjamin N. Bimber Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR 97006 USA
Robin Haw Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3 Canada
Scott Cain Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3 Canada
Robert M. Buels Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA
Lincoln D. Stein Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3 Canada
Ian H. Holmes Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720 USA

Piña JS, Orozco-Arias S, Tobón-Orozco N, Camargo-Forero L, Tabares-Soto R, Guyot R. G-SAIP: Graphical Sequence Alignment Through Parallel Programming in the Post-Genomic Era. Evol Bioinform Online 2023;19:11769343221150585. [PMID: 36703866 PMCID: PMC9871978 DOI: 10.1177/11769343221150585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 12/23/2022] [Indexed: 01/22/2023]

Das A, Schatz MC. Sketching and sampling approaches for fast and accurate long read classification. BMC Bioinformatics 2022;23:452. [PMID: 36316646 PMCID: PMC9624007 DOI: 10.1186/s12859-022-05014-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 10/27/2022] [Indexed: 11/05/2022]

Abstract

BACKGROUND

In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read.

RESULTS

Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a "screen") of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy.

CONCLUSIONS

The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment, index and sketching-based tools for read classification, and demonstrate how such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching.
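The screen-then-classify idea summarized in this abstract can be illustrated with a minimal Python sketch. It assumes a bottom-s MinHash sketch per genome and scores a read by the number of k-mer hashes it shares with each sketch. The function names (`build_screen`, `classify`), the toy genomes, and the values of `k` and `sketch_size` are all hypothetical choices for illustration; this is not the authors' reference implementation.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_screen(genomes, k=8, sketch_size=64):
    """Keep only the sketch_size smallest distinct k-mer hashes per genome."""
    screen = {}
    for name, seq in genomes.items():
        hashes = sorted({kmer_hash(km) for km in kmers(seq, k)})
        screen[name] = set(hashes[:sketch_size])
    return screen


def classify(read, screen, k=8):
    """Predict the source genome as the sketch sharing the most hashes.

    Returns (best_name_or_None, per-genome scores); None means no hash matched.
    """
    read_hashes = {kmer_hash(km) for km in kmers(read, k)}
    scores = {name: len(read_hashes & sketch) for name, sketch in screen.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


# Toy data (hypothetical): two short "genomes" and a read sliced from the first.
toy_genomes = {
    "A": "ACCACAACCCACACAACCAACACCCA",
    "B": "GTTGTGGTTTGTGTGGTTGGTGTTTG",
}
toy_screen = build_screen(toy_genomes, k=4, sketch_size=64)
prediction, shared_counts = classify(toy_genomes["A"][3:15], toy_screen, k=4)
print(prediction, shared_counts)
```

Only the sketches are kept in memory, so the screen grows with `sketch_size` times the number of genomes rather than with total genome length. A smaller `sketch_size` saves space but makes it more likely that a short or error-rich read shares no hash with its true source.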

Bray JE, Correia A, Varga M, Jolley KA, Maiden MCJ, Rodrigues CMC. Ribosomal MLST nucleotide identity (rMLST-NI), a rapid bacterial species identification method: application to Klebsiella and Raoultella genomic species validation. Microb Genom 2022;8. [PMID: 36098501 PMCID: PMC9676034 DOI: 10.1099/mgen.0.000849] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/04/2022]

Kille B, Balaji A, Sedlazeck FJ, Nute M, Treangen TJ. Multiple genome alignment in the telomere-to-telomere assembly era. Genome Biol 2022;23:182. [PMID: 36038949 PMCID: PMC9421119 DOI: 10.1186/s13059-022-02735-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/20/2021] [Accepted: 07/21/2022] [Indexed: 01/22/2023]

Jain C, Rhie A, Hansen NF, Koren S, Phillippy AM. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat Methods 2022;19:705-710. [PMID: 35365778 PMCID: PMC10510034 DOI: 10.1038/s41592-022-01457-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 05/22/2021] [Accepted: 03/17/2022] [Indexed: 01/10/2023]

Deng Z, Xia X, Deng Y, Zhao M, Gu C, Geng Y, Wang J, Yang Q, He M, Xiao Q, Xiao W, He L, Liang S, Xu H, Lü M, Yu Z. ANI analysis of poxvirus genomes reveals its potential application to viral species rank demarcation. Virus Evol 2022;8:veac031. [PMID: 35646390 PMCID: PMC9071573 DOI: 10.1093/ve/veac031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 03/25/2022] [Accepted: 04/28/2022] [Indexed: 11/12/2022]

Affiliation(s)

Zhaobin Deng
Xuyang Xia State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, No. 8 Linyin Street, Wuhou District, Chengdu 610000, P. R. China
Yiqi Deng State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, No. 8 Linyin Street, Wuhou District, Chengdu 610000, P. R. China
Mingde Zhao Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China
Congwei Gu Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China
Yi Geng College of Veterinary Medicine, Sichuan Agricultural University, No. 211 Huimin Road, Wenjiang District, Chengdu 610000, P. R. China
Jun Wang Key Laboratory of Sichuan Province for Fishes Conservation and Utilization in the Upper Reaches of the Yangtze River, No. 1124 Dongtong Road, Neijiang 641100, P. R. China
Qian Yang Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China
Manli He Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China
Qihai Xiao Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China
Wudian Xiao Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China
Lvqin He Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China
Sicheng Liang Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, No. 25 Taiping Street, Jiangyang District, Luzhou 646000, P. R. China
Heng Xu State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, No. 8 Linyin Street, Wuhou District, Chengdu 610000, P. R. China
Muhan Lü Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, No. 25 Taiping Street, Jiangyang District, Luzhou 646000, P. R. China Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China Department of Anatomy and Embryology, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8575, Japan School of Comprehensive Human Sciences, Doctoral Program in Biomedical Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8575, Japan
Zehui Yu Laboratory Animal Center, Southwest Medical University, No. 1, Section 1, Xianglin Road, Longmatan District, Luzhou 64600, P. R. China Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, No. 25 Taiping Street, Jiangyang District, Luzhou 646000, P. R. China School of Basic Medical Sciences, Zhejiang University, No. 866 Yuhangtang Road, Xihu District, Hangzhou 310000, P. R. China

Sahlin K. Effective sequence similarity detection with strobemers. Genome Res 2021;31:2080-2094. [PMID: 34667119 PMCID: PMC8559714 DOI: 10.1101/gr.275648.121] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/13/2021] [Accepted: 08/20/2021] [Indexed: 01/08/2023]

Fu Y, Mahmoud M, Muraliraman VV, Sedlazeck FJ, Treangen TJ. Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment. Gigascience 2021;10:6375129. [PMID: 34561697 PMCID: PMC8463296 DOI: 10.1093/gigascience/giab063] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/02/2021] [Revised: 07/22/2021] [Accepted: 08/29/2021] [Indexed: 01/23/2023]

Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Technology dictates algorithms: recent developments in read alignment. Genome Biol 2021;22:249. [PMID: 34446078 PMCID: PMC8390189 DOI: 10.1186/s13059-021-02443-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/15/2020] [Accepted: 07/28/2021] [Indexed: 01/08/2023]

Affiliation(s)

Mohammed Alser Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
Jeremy Rotman Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
Dhrithi Deshpande Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
Kodi Taraszka Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
Huwenbo Shi Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
Pelin Icer Baykal Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
Harry Taegyun Yang Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA Bioinformatics Interdepartmental Ph.D. Program, University of California Los Angeles, Los Angeles, CA, 90095, USA
Victor Xue Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
Sergey Knyazev Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
Benjamin D Singer Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA Department of Biochemistry & Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, USA Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
Brunilda Balliu Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
David Koslicki Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16801, USA Biology Department, Pennsylvania State University, University Park, PA, 16801, USA The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16801, USA
Pavel Skums Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
Alex Zelikovsky Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
Can Alkan Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey
Onur Mutlu Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
Serghei Mangul Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA.

Jones-Freeman B, Chonwerawong M, Marcelino VR, Deshpande AV, Forster SC, Starkey MR. The microbiome and host mucosal interactions in urinary tract diseases. Mucosal Immunol 2021;14:779-792. [PMID: 33542492 DOI: 10.1038/s41385-020-00372-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/30/2020] [Accepted: 12/03/2020] [Indexed: 02/06/2023]

Almodaresi F, Zakeri M, Patro R. Puffaligner: A Fast, Efficient, and Accurate Aligner Based on the Pufferfish Index. Bioinformatics 2021;37:4048-4055. [PMID: 34117875 PMCID: PMC9502150 DOI: 10.1093/bioinformatics/btab408] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/08/2020] [Revised: 04/30/2021] [Accepted: 06/11/2021] [Indexed: 12/22/2022]

Tian L, Mazloom R, Heath LS, Vinatzer BA. LINflow: a computational pipeline that combines an alignment-free with an alignment-based method to accelerate generation of similarity matrices for prokaryotic genomes. PeerJ 2021;9:e10906. [PMID: 33828908 PMCID: PMC8000461 DOI: 10.7717/peerj.10906] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/24/2020] [Accepted: 01/14/2021] [Indexed: 01/21/2023]

Abstract

Background

Computing genomic similarity between strains is a prerequisite for genome-based prokaryotic classification and identification. Genomic similarity was first computed as Average Nucleotide Identity (ANI) values based on the alignment of genomic fragments. Since this is computationally expensive, faster and computationally cheaper alignment-free methods have been developed to estimate ANI. However, these methods do not reach the level of accuracy of alignment-based methods.

Methods

Here we introduce LINflow, a computational pipeline that infers pairwise genomic similarity in a set of genomes. LINflow takes advantage of the speed of the alignment-free sourmash tool to identify the genome in a dataset that is most similar to a query genome and the precision of the alignment-based pyani software to precisely compute ANI between the query genome and the most similar genome identified by sourmash. This is repeated for each new genome that is added to a dataset. The sequentially computed ANI values are stored as Life Identification Numbers (LINs), which are then used to infer all other pairwise ANI values in the set. We tested LINflow on four sets, 484 genomes in total, and compared the needed time and the generated similarity matrices with other tools.

Results

LINflow is up to 150 times faster than pyani and pairwise ANI values generated by LINflow are highly correlated with those computed by pyani. However, because LINflow infers most pairwise ANI values instead of computing them directly, ANI values occasionally depart from the ANI values computed by pyani. In conclusion, LINflow is a fast and memory-efficient pipeline to infer similarity among a large set of prokaryotic genomes. Its ability to quickly add new genome sequences to an already computed similarity matrix makes LINflow particularly useful for projects when new genome sequences need to be regularly added to an existing dataset.

Fan J, Huang S, Chorlton SD. BugSeq: a highly accurate cloud platform for long-read metagenomic analyses. BMC Bioinformatics 2021;22:160. [PMID: 33765910 PMCID: PMC7993542 DOI: 10.1186/s12859-021-04089-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 10/13/2020] [Accepted: 03/18/2021] [Indexed: 12/21/2022]

Gwak HJ, Lee SJ, Rho M. Application of computational approaches to analyze metagenomic data. J Microbiol 2021;59:233-241. [DOI: 10.1007/s12275-021-0632-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/02/2020] [Revised: 01/18/2021] [Accepted: 01/19/2021] [Indexed: 01/04/2023]

Criscuolo A. On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference. F1000Res 2020;9:1309. [PMID: 33335719 PMCID: PMC7713896 DOI: 10.12688/f1000research.26930.1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 10/12/2020] [Indexed: 12/29/2022]

Mikheenko A, Bzikadze AV, Gurevich A, Miga KH, Pevzner PA. TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 2020;36:i75-i83. [PMID: 32657355 PMCID: PMC7355294 DOI: 10.1093/bioinformatics/btaa440] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/31/2022]

Jain C, Rhie A, Zhang H, Chu C, Walenz BP, Koren S, Phillippy AM. Weighted minimizer sampling improves long read mapping. Bioinformatics 2020;36:i111-i118. [PMID: 32657365 PMCID: PMC7355284 DOI: 10.1093/bioinformatics/btaa435] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Indexed: 11/13/2022]

Abstract

MOTIVATION

In this era of exponential data growth, minimizer sampling has become a standard algorithmic technique for rapid genome sequence comparison. This technique yields a sub-linear representation of sequences, enabling their comparison in reduced space and time. A key property of the minimizer technique is that if two sequences share a substring of a specified length, then they can be guaranteed to have a matching minimizer. However, because the k-mer distribution in eukaryotic genomes is highly uneven, minimizer-based tools (e.g. Minimap2, Mashmap) opt to discard the most frequently occurring minimizers from the genome to avoid excessive false positives. By doing so, the underlying guarantee is lost and accuracy is reduced in repetitive genomic regions.

RESULTS

We introduce a novel weighted-minimizer sampling algorithm. A unique feature of the proposed algorithm is that it performs minimizer sampling while considering a weight for each k-mer; i.e. the higher the weight of a k-mer, the more likely it is to be selected. By down-weighting frequently occurring k-mers, we are able to meet both objectives: (i) avoid excessive false-positive matches and (ii) maintain the minimizer match guarantee. We tested our algorithm, Winnowmap, using both simulated and real long-read data and compared it to a state-of-the-art long read mapper, Minimap2. Our results demonstrate a reduction in the mapping error-rate from 0.14% to 0.06% in the recently finished human X chromosome (154.3 Mbp), and from 3.6% to 0% within the highly repetitive X centromere (3.1 Mbp). Winnowmap improves mapping accuracy within repeats and achieves these results with sparser sampling, leading to better index compression and competitive runtimes.

AVAILABILITY AND IMPLEMENTATION

Winnowmap is built on top of the Minimap2 codebase and is available at https://github.com/marbl/winnowmap.
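The match guarantee mentioned in this abstract can be illustrated with a minimal Python sketch of standard, unweighted window minimizers. It is not Winnowmap's weighted scheme and not code from the Minimap2 or Winnowmap codebases. The function name `minimizers`, the hash function, and the parameters `k` (k-mer length) and `w` (number of consecutive k-mers per window) are assumptions for illustration. Under this definition, two sequences that share a substring of length at least w + k - 1 share a full window and therefore select at least one common k-mer.

```python
import hashlib


def kmer_hash(kmer):
    """Deterministic 64-bit hash used to order k-mers (an assumed choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq, k, w):
    """Return the set of (k-mer, position) pairs selected as minimizers.

    Each window of w consecutive k-mers contributes the k-mer with the
    smallest hash; ties are broken by taking the leftmost position.
    """
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (hashes[i], i))
        picked.add((seq[best:best + k], best))
    return picked


# Toy sequences (hypothetical) sharing an 8-base substring, 8 = w + k - 1.
k, w = 5, 4
shared = "GATTACAG"
seq_a = "CCCCCCCC" + shared + "TTTTTTT"
seq_b = "AGAGAGAG" + shared + "CACACACA"

kmers_a = {kmer for kmer, _ in minimizers(seq_a, k, w)}
kmers_b = {kmer for kmer, _ in minimizers(seq_b, k, w)}
print(sorted(kmers_a & kmers_b))
```

Discarding the most frequent minimizers after this step, as the abstract describes for existing tools, can remove exactly the k-mer that a shared window selected, which is how the guarantee is lost in repetitive regions.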

Hafezqorani S, Yang C, Lo T, Nip KM, Warren RL, Birol I. Trans-NanoSim characterizes and simulates nanopore RNA-sequencing data. Gigascience 2020;9:5855462. [PMID: 32520350 PMCID: PMC7285873 DOI: 10.1093/gigascience/giaa061] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/11/2020] [Revised: 04/14/2020] [Accepted: 05/12/2020] [Indexed: 01/08/2023]

Elworth RAL, Wang Q, Kota PK, Barberan CJ, Coleman B, Balaji A, Gupta G, Baraniuk RG, Shrivastava A, Treangen T. To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. Nucleic Acids Res 2020;48:5217-5234. [PMID: 32338745 PMCID: PMC7261164 DOI: 10.1093/nar/gkaa265] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/18/2019] [Revised: 03/20/2020] [Accepted: 04/04/2020] [Indexed: 02/01/2023]

Rice ES, Koren S, Rhie A, Heaton MP, Kalbfleisch TS, Hardy T, Hackett PH, Bickhart DM, Rosen BD, Ley BV, Maurer NW, Green RE, Phillippy AM, Petersen JL, Smith TPL. Continuous chromosome-scale haplotypes assembled from a single interspecies F1 hybrid of yak and cattle. Gigascience 2020;9:giaa029. [PMID: 32242610 PMCID: PMC7118895 DOI: 10.1093/gigascience/giaa029] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 10/01/2019] [Revised: 01/08/2020] [Accepted: 03/10/2020] [Indexed: 12/30/2022]

Affiliation(s)

Edward S Rice Department of Animal Science, University of Nebraska–Lincoln, C203 ANSC, Lincoln, NE 68583, USA Bond Life Sciences Center, University of Missouri, 1201 Rollins Street, Columbia, MO 65201, USA
Sergey Koren Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, 9000 Rockville Pike, Bethesda, MD 20892, USA
Arang Rhie Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, 9000 Rockville Pike, Bethesda, MD 20892, USA
Michael P Heaton US Meat Animal Research Center, US Department of Agriculture, State Spur 18D, Clay Center, NE 68933, USA
Theodore S Kalbfleisch Gluck Equine Research Center, University of Kentucky, 1400 Nicholasville Rd., Lexington, KY 40546, USA
Timothy Hardy USYAKS, Livermore, CO 80536, USA
Peter H Hackett USYAKS, Livermore, CO 80536, USA
Derek M Bickhart Dairy Forage Research Center, 1925 Linden Drive, ARS USDA, Madison, WI 53706, USA
Benjamin D Rosen Animal Genomics and Improvement Laboratory, 10300 Baltimore Ave., ARS USDA, Beltsville, MD 20705, USA
Brian Vander Ley Great Plains Veterinary Educational Center, School of Veterinary Medicine and Biomedical Sciences, University of Nebraska–Lincoln, 820 Road 313, Clay Center, NE 68933, USA
Nicholas W Maurer Department of Biomolecular Engineering, University of California, 1156 High St., Santa Cruz, CA 95064, USA
Richard E Green Department of Biomolecular Engineering, University of California, 1156 High St., Santa Cruz, CA 95064, USA
Adam M Phillippy Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, 9000 Rockville Pike, Bethesda, MD 20892, USA
Jessica L Petersen Department of Animal Science, University of Nebraska–Lincoln, C203 ANSC, Lincoln, NE 68583, USA
Timothy P L Smith US Meat Animal Research Center, US Department of Agriculture, State Spur 18D, Clay Center, NE 68933, USA

Rowe WPM. When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data. Genome Biol 2019;20:199. [PMID: 31519212 PMCID: PMC6744645 DOI: 10.1186/s13059-019-1809-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/12/2019] [Accepted: 09/02/2019] [Indexed: 01/21/2023]

Dilthey AT, Jain C, Koren S, Phillippy AM. Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps. Nat Commun 2019;10:3066. [PMID: 31296857 PMCID: PMC6624308 DOI: 10.1038/s41467-019-10934-2] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 12/07/2018] [Accepted: 06/11/2019] [Indexed: 12/20/2022]

Armstrong J, Fiddes IT, Diekhans M, Paten B. Whole-Genome Alignment and Comparative Annotation. Annu Rev Anim Biosci 2019;7:41-64. [PMID: 30379572 PMCID: PMC6450745 DOI: 10.1146/annurev-animal-020518-115005] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 12/12/2022]