Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Salmela L, Walve R, Rivals E, Ukkonen E. Accurate self-correction of errors in long reads using de Bruijn graphs. Bioinformatics 2017;33:799-806. [PMID: 27273673 PMCID: PMC5351550 DOI: 10.1093/bioinformatics/btw321] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 03/19/2016] [Revised: 05/03/2016] [Accepted: 05/16/2016] [Indexed: 12/04/2022] Open

Cited by Other Article(s)

Liu Y, Li Y, Chen E, Xu J, Zhang W, Zeng X, Luo X. Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat. Commun Biol 2024;7:1678. [PMID: 39702496 DOI: 10.1038/s42003-024-07376-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/28/2024] [Accepted: 12/05/2024] [Indexed: 12/21/2024] Open

Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet 2024;25:658-670. [PMID: 38649458 DOI: 10.1038/s41576-024-00718-w] [Citation(s) in RCA: 50] [Impact Index Per Article: 50.0] [Accepted: 02/27/2024] [Indexed: 04/25/2024]

Tang T, Liu Y, Zheng B, Li R, Zhang X, Liu Y. Integration of hybrid and self-correction method improves the quality of long-read sequencing data. Brief Funct Genomics 2024;23:249-255. [PMID: 37340778 DOI: 10.1093/bfgp/elad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 06/04/2023] [Accepted: 06/05/2023] [Indexed: 06/22/2023] Open

Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023;24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023] Open

Pourmohammadi R, Abouei J, Anpalagan A. Error analysis of the PacBio sequencing CCS reads. Int J Biostat 2023;19:439-453. [PMID: 37155831 DOI: 10.1515/ijb-2021-0091] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/23/2021] [Accepted: 09/07/2022] [Indexed: 05/10/2023]

Mastrorosa FK, Miller DE, Eichler EE. Applications of long-read sequencing to Mendelian genetics. Genome Med 2023;15:42. [PMID: 37316925 PMCID: PMC10266321 DOI: 10.1186/s13073-023-01194-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 12/08/2022] [Accepted: 05/18/2023] [Indexed: 06/16/2023] Open

Zhu W, Liao X. LCAT: an isoform-sensitive error correction for transcriptome sequencing long reads. Front Genet 2023;14:1166975. [PMID: 37292144 PMCID: PMC10245045 DOI: 10.3389/fgene.2023.1166975] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Accepted: 05/04/2023] [Indexed: 06/10/2023] Open

Prudnikow L, Pannicke B, Wünschiers R. A primer on pollen assignment by nanopore-based DNA sequencing. Front Ecol Evol 2023. [DOI: 10.3389/fevo.2023.1112929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/15/2023] Open

Becker D, Popp D, Bonk F, Kleinsteuber S, Harms H, Centler F. Metagenomic Analysis of Anaerobic Microbial Communities Degrading Short-Chain Fatty Acids as Sole Carbon Sources. Microorganisms 2023;11:microorganisms11020420. [PMID: 36838385 PMCID: PMC9959488 DOI: 10.3390/microorganisms11020420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/17/2023] [Accepted: 02/04/2023] [Indexed: 02/11/2023] Open

Abstract

Analyzing microbial communities using metagenomes is a powerful approach to understand compositional structures and functional connections in anaerobic digestion (AD) microbiomes. Whereas short-read sequencing approaches based on the Illumina platform result in highly fragmented metagenomes, long-read sequencing leads to more contiguous assemblies. To evaluate the performance of a hybrid approach combining these two sequencing approaches, we compared the metagenome-assembled genomes (MAGs) resulting from five AD microbiome samples. The samples were taken from reactors fed with short-chain fatty acids at different feeding regimes (continuous and discontinuous) and organic loading rates (OLR). Methanothrix showed a high relative abundance at all feeding regimes but was strongly reduced in abundance at higher OLR, when Methanosarcina took over. The bacterial community composition differed strongly between reactors of different feeding regimes and OLRs. However, the functional potential was similar regardless of feeding regime and OLR. The hybrid sequencing approach using Nanopore long reads and Illumina MiSeq reads improved assembly statistics, including an increase of the N50 value (on average from 32 to 1740 kbp) and an increased length of the longest contig (on average from 94 to 1898 kbp). The hybrid approach also led to a higher share of high-quality MAGs and generated five potentially circular genomes, while none were generated using MiSeq-based contigs only. Finally, 27 hybrid MAGs were reconstructed, of which 18 represent potentially new species, 15 of them bacterial. During pathway analysis, selected MAGs revealed similar gene patterns of butyrate degradation and might represent new butyrate-degrading bacteria. The demonstrated advantages of adding long reads to metagenomic analyses make the hybrid approach the preferable option when dealing with complex microbiomes.

Muñoz-Barrera A, Rubio-Rodríguez LA, Díaz-de Usera A, Jáspez D, Lorenzo-Salazar JM, González-Montelongo R, García-Olivares V, Flores C. From Samples to Germline and Somatic Sequence Variation: A Focus on Next-Generation Sequencing in Melanoma Research. Life (Basel) 2022;12:1939. [PMID: 36431075 PMCID: PMC9695713 DOI: 10.3390/life12111939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 11/12/2022] [Accepted: 11/16/2022] [Indexed: 11/24/2022] Open

Rayamajhi N, Cheng CHC, Catchen JM. Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki. G3 (Bethesda, Md.) 2022;12:jkac192. [PMID: 35904764 PMCID: PMC9635638 DOI: 10.1093/g3journal/jkac192] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/21/2022] [Accepted: 07/18/2022] [Indexed: 11/16/2022]

Cai D, Shang J, Sun Y. HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization. Bioinformatics 2022;38:5360-5367. [PMID: 36308467 PMCID: PMC9750122 DOI: 10.1093/bioinformatics/btac708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 10/06/2022] [Accepted: 10/25/2022] [Indexed: 12/24/2022] Open

Genome sequence assembly algorithms and misassembly identification methods. Mol Biol Rep 2022;49:11133-11148. [PMID: 36151399 DOI: 10.1007/s11033-022-07919-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/20/2022] [Accepted: 09/05/2022] [Indexed: 10/14/2022]

Coulter M, Entizne JC, Guo W, Bayer M, Wonneberger R, Milne L, Schreiber M, Haaning A, Muehlbauer GJ, McCallum N, Fuller J, Simpson C, Stein N, Brown JWS, Waugh R, Zhang R. BaRTv2: a highly resolved barley reference transcriptome for accurate transcript-specific RNA-seq quantification. The Plant Journal: For Cell and Molecular Biology 2022;111:1183-1202. [PMID: 35704392 PMCID: PMC9546494 DOI: 10.1111/tpj.15871] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/18/2021] [Revised: 05/02/2022] [Accepted: 06/09/2022] [Indexed: 06/15/2023]

Affiliation(s)

Max Coulter Division of Plant Sciences, University of Dundee, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Juan Carlos Entizne Division of Plant Sciences, University of Dundee, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Wenbin Guo Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Micha Bayer Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Ronja Wonneberger Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, D‐06466 Stadt Seeland, Germany
Linda Milne Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Miriam Schreiber Division of Plant Sciences, University of Dundee, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Allison Haaning Department of Agronomy and Plant Genetics, University of Minnesota, 1991 Upper Buford Circle, 542 Borlaug Hall, St Paul, Minnesota 55108, USA
Gary J. Muehlbauer Department of Agronomy and Plant Genetics, University of Minnesota, 1991 Upper Buford Circle, 542 Borlaug Hall, St Paul, Minnesota 55108, USA
Nicola McCallum Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
John Fuller Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Craig Simpson Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Nils Stein Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, D‐06466 Stadt Seeland, Germany; Center for Integrated Breeding Research (CiBreed), Georg‐August‐University, Göttingen, Germany
John W. S. Brown Division of Plant Sciences, University of Dundee, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK; Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Robbie Waugh Division of Plant Sciences, University of Dundee, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK; Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK; School of Agriculture and Wine & Waite Research Institute, University of Adelaide, Waite Campus, Glen Osmond, South Australia 5064, Australia
Runxuan Zhang Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK

de la Rubia I, Srivastava A, Xue W, Indi JA, Carbonell-Sala S, Lagarde J, Albà MM, Eyras E. RATTLE: reference-free reconstruction and quantification of transcriptomes from Nanopore sequencing. Genome Biol 2022;23:153. [PMID: 35804393 PMCID: PMC9264490 DOI: 10.1186/s13059-022-02715-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/05/2021] [Accepted: 06/20/2022] [Indexed: 11/04/2022] Open

Zhang R, Kuo R, Coulter M, Calixto CPG, Entizne JC, Guo W, Marquez Y, Milne L, Riegler S, Matsui A, Tanaka M, Harvey S, Gao Y, Wießner-Kroh T, Paniagua A, Crespi M, Denby K, Hur AB, Huq E, Jantsch M, Jarmolowski A, Koester T, Laubinger S, Li QQ, Gu L, Seki M, Staiger D, Sunkar R, Szweykowska-Kulinska Z, Tu SL, Wachter A, Waugh R, Xiong L, Zhang XN, Conesa A, Reddy ASN, Barta A, Kalyna M, Brown JWS. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol 2022;23:149. [PMID: 35799267 PMCID: PMC9264592 DOI: 10.1186/s13059-022-02711-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 09/17/2021] [Accepted: 06/15/2022] [Indexed: 12/15/2022] Open

Abstract

BACKGROUND

Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis.

RESULTS

We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, including over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage.

CONCLUSIONS

AtRTD3 is the most comprehensive Arabidopsis transcriptome currently. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.

Affiliation(s)

Runxuan Zhang Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK.
Richard Kuo The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
Max Coulter Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Cristiane P G Calixto Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK Present address: Institute of Biosciences, University of São Paulo, São Paulo, 05508-090, Brazil
Juan Carlos Entizne Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
Wenbin Guo Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
Yamile Marquez Centre for Genomic Regulation, C/ Dr. Aiguader 88, 08003, Barcelona, Spain
Linda Milne Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
Stefan Riegler Institute of Molecular Plant Biology, Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences (BOKU), Muthgasse 18, 1190, Vienna, Austria Present address: Institute of Science and Technology Austria, Am Campus 1, 3400, Klosterneuburg, Austria
Akihiro Matsui Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
Maho Tanaka Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
Sarah Harvey Centre for Novel Agricultural Products (CNAP), Department of Biology, University of York Wentworth Way, York, YO10 5DD, UK
Yubang Gao College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
Theresa Wießner-Kroh Center for Plant Molecular Biology (ZMBP), University of Tübingen, Auf der Morgenstelle 32, 72076, Tübingen, Germany
Alejandro Paniagua Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna, Valencia, Spain
Martin Crespi French National Centre for Scientific Research | CNRS INRAE-Universities of Paris Saclay and Paris, Institute of Plant Sciences Paris Saclay IPS2, Rue de Noetzlin, 91192, Gif sur Yvette, France
Katherine Denby Centre for Novel Agricultural Products (CNAP), Department of Biology, University of York Wentworth Way, York, YO10 5DD, UK
Asa Ben Hur Department of Computer Science, Colorado State University, 1873 Campus Delivery, Fort Collins, CO, 80523-1873, USA
Enamul Huq Department of Molecular Biosciences, University of Texas at Austin, 100 East 24th St., Austin, TX, 78712-1095, USA
Michael Jantsch Department of Cell and Developmental Biology, Center for Anatomy and Cell Biology, Medical University of Vienna, Schwarzspanierstrasse 17 A-1090, Vienna, Austria
Artur Jarmolowski Department of Gene Expression, Adam Mickiewicz University, Poznań, Poland
Tino Koester RNA Biology and Molecular Physiology, Faculty for Biology, Bielefeld University, Universitaetsstrasse 25, 33615, Bielefeld, Germany
Sascha Laubinger Institut für Biologie und Umweltwissenschaften (IBU), Carl von Ossietzky Universität Oldenburg, Carl von Ossietzky-Str. 9-11, 26111, Oldenburg, Germany Institute of Biology, Department of Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
Qingshun Quinn Li Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, 361102, Fujian, China
Lianfeng Gu College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
Motoaki Seki Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
Dorothee Staiger RNA Biology and Molecular Physiology, Faculty for Biology, Bielefeld University, Universitaetsstrasse 25, 33615, Bielefeld, Germany
Ramanjulu Sunkar Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
Zofia Szweykowska-Kulinska Department of Gene Expression, Adam Mickiewicz University, Poznań, Poland
Shih-Long Tu Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
Andreas Wachter Center for Plant Molecular Biology (ZMBP), University of Tübingen, Auf der Morgenstelle 32, 72076, Tübingen, Germany Present address: Institute for Molecular Physiology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 17, 55128, Mainz, Germany
Robbie Waugh Cell and Molecular Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
Liming Xiong Department of Biology, Hong Kong Baptist University, Hong Kong, China
Xiao-Ning Zhang Biology Department, School of Arts and Sciences, St. Bonaventure University, 3261 West State Road, St. Bonaventure, NY, 14778, USA
Ana Conesa Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna, Valencia, Spain
Anireddy S N Reddy Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO, 80523, USA
Andrea Barta Max F. Perutz Laboratories, Medical University of Vienna, Center of Medical Biochemistry, Dr.-Bohr-Gasse 9/3, A-1030, Vienna, Austria
Maria Kalyna Institute of Molecular Plant Biology, Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences (BOKU), Muthgasse 18, 1190, Vienna, Austria
John W S Brown Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK Cell and Molecular Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK

Hoang MTV, Irinyi L, Hu Y, Schwessinger B, Meyer W. Long-Reads-Based Metagenomics in Clinical Diagnosis With a Special Focus on Fungal Infections. Front Microbiol 2022;12:708550. [PMID: 35069461 PMCID: PMC8770865 DOI: 10.3389/fmicb.2021.708550] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/12/2021] [Accepted: 12/03/2021] [Indexed: 12/12/2022] Open

Abstract

Identification of the causative infectious agent is essential in the management of infectious diseases, with the ideal diagnostic method being rapid, accurate, and informative, while remaining cost-effective. Traditional diagnostic techniques rely on culturing and cell propagation to isolate and identify the causative pathogen. These techniques are limited by the ability and the time required to grow or propagate an agent in vitro and by the fact that identification based on morphological traits is non-specific, insensitive, and reliant on technical expertise. The evolution of next-generation sequencing has revolutionized genomic studies to generate more data at a cheaper cost. These technologies are divided into short- and long-read sequencing, depending on the length of reads generated during sequencing runs. Long-read sequencing, also called third-generation sequencing, emerged commercially through the instruments released by Pacific Biosciences and Oxford Nanopore Technologies. Although they rely on different sequencing chemistries, with the first one being more accurate, both platforms can generate ultra-long sequence reads. Long-read sequencing is capable of entirely spanning previously established genomic identification regions or potentially small whole genomes, drastically improving the accuracy of the identification of pathogens directly from clinical samples. Long-read sequencing may also provide additional important clinical information, such as antimicrobial resistance profiles and epidemiological data from a single sequencing run. While initial applications of long-read sequencing in clinical diagnosis showed that it could be a promising diagnostic technique, they have also highlighted the need for further optimization. In this review, we show the potential long-read sequencing has in clinical diagnosis of fungal infections and discuss the pros and cons of its implementation.

Athanasopoulou K, Boti MA, Adamopoulos PG, Skourou PC, Scorilas A. Third-Generation Sequencing: The Spearhead towards the Radical Transformation of Modern Genomics. Life (Basel) 2021;12:life12010030. [PMID: 35054423 PMCID: PMC8780579 DOI: 10.3390/life12010030] [Citation(s) in RCA: 99] [Impact Index Per Article: 24.8] [Received: 11/15/2021] [Revised: 12/20/2021] [Accepted: 12/23/2021] [Indexed: 12/14/2022] Open

Chen Z, He X. Application of third-generation sequencing in cancer research. Medical Review (Berlin, Germany) 2021;1:150-171. [PMID: 37724303 PMCID: PMC10388785 DOI: 10.1515/mr-2021-0013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/14/2021] [Accepted: 08/09/2021] [Indexed: 09/20/2023]

Sacristán-Horcajada E, González-de la Fuente S, Peiró-Pastor R, Carrasco-Ramiro F, Amils R, Requena JM, Berenguer J, Aguado B. ARAMIS: From systematic errors of NGS long reads to accurate assemblies. Brief Bioinform 2021;22:bbab170. [PMID: 34013348 PMCID: PMC8574707 DOI: 10.1093/bib/bbab170] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/18/2020] [Revised: 03/31/2021] [Accepted: 04/11/2021] [Indexed: 01/23/2023] Open

Wang Y, Zhao Y, Bollas A, Wang Y, Au KF. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021;39:1348-1365. [PMID: 34750572 PMCID: PMC8988251 DOI: 10.1038/s41587-021-01108-x] [Citation(s) in RCA: 806] [Impact Index Per Article: 201.5] [Received: 12/09/2019] [Accepted: 09/22/2021] [Indexed: 12/13/2022]

Guo H, Fu Y, Gao Y, Li J, Wang Y, Liu B. deGSM: Memory Scalable Construction Of Large Scale de Bruijn Graph. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021;18:2157-2166. [PMID: 31056509 DOI: 10.1109/tcbb.2019.2913932] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/09/2023]

Sahlin K. Effective sequence similarity detection with strobemers. Genome Res 2021;31:2080-2094. [PMID: 34667119 PMCID: PMC8559714 DOI: 10.1101/gr.275648.121] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 04/13/2021] [Accepted: 08/20/2021] [Indexed: 01/08/2023]

Lima L, Marchet C, Caboche S, Da Silva C, Istace B, Aury JM, Touzet H, Chikhi R. Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data. Brief Bioinform 2021;21:1164-1181. [PMID: 31232449 DOI: 10.1093/bib/bbz058] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 12/28/2018] [Revised: 04/05/2019] [Accepted: 04/22/2019] [Indexed: 12/13/2022] Open

Comparative Analysis of PacBio and Oxford Nanopore Sequencing Technologies for Transcriptomic Landscape Identification of Penaeus monodon. Life (Basel) 2021;11:life11080862. [PMID: 34440606 PMCID: PMC8399832 DOI: 10.3390/life11080862] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/09/2021] [Revised: 08/07/2021] [Accepted: 08/17/2021] [Indexed: 12/16/2022] Open

Ito Y, Terao Y, Noma S, Tagami M, Yoshida E, Hayashizaki Y, Itoh M, Kawaji H. Nanopore sequencing reveals TACC2 locus complexity and diversity of isoforms transcribed from an intronic promoter. Sci Rep 2021;11:9355. [PMID: 33931666 PMCID: PMC8087818 DOI: 10.1038/s41598-021-88018-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/24/2020] [Accepted: 04/07/2021] [Indexed: 12/12/2022] Open

Affiliation(s)

Yosuke Ito Faculty of Medicine, Department of Obstetrics and Gynecology, Juntendo University, 2-1-1 Hongo, Bunkyo, Tokyo, 113-8421, Japan; Preventive Medicine and Applied Genomics Unit, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
Yasuhisa Terao Faculty of Medicine, Department of Obstetrics and Gynecology, Juntendo University, 2-1-1 Hongo, Bunkyo, Tokyo, 113-8421, Japan
Shohei Noma Laboratory for Comprehensive Genomic Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
Michihira Tagami Laboratory for Comprehensive Genomic Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
Emiko Yoshida Faculty of Medicine, Department of Obstetrics and Gynecology, Juntendo University, 2-1-1 Hongo, Bunkyo, Tokyo, 113-8421, Japan; RIKEN Center for Integrative Medical Sciences, Nucleic Acid Diagnostic System Development Unit, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan; Diagnostics and Therapeutics of Intractable Diseases, Intractable Disease Research Center, Juntendo University Graduate School of Medicine, Tokyo, Japan
Yoshihide Hayashizaki RIKEN Preventive Medicine and Diagnosis Innovation Program, 2-1 Hirosawa, Wako, Yokohama, Saitama, 351-0198, Japan
Masayoshi Itoh RIKEN Preventive Medicine and Diagnosis Innovation Program, 2-1 Hirosawa, Wako, Yokohama, Saitama, 351-0198, Japan; Laboratory for Advanced Genomics Circuit, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
Hideya Kawaji Preventive Medicine and Applied Genomics Unit, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan; RIKEN Preventive Medicine and Diagnosis Innovation Program, 2-1 Hirosawa, Wako, Yokohama, Saitama, 351-0198, Japan; Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, 2-1-6 Kamikitazawa, Setagaya-ku, Tokyo, 156-8506, Japan

Du N, Shang J, Sun Y. Improving protein domain classification for third-generation sequencing reads using deep learning. BMC Genomics 2021;22:251. [PMID: 33836667 PMCID: PMC8033682 DOI: 10.1186/s12864-021-07468-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/28/2020] [Accepted: 02/19/2021] [Indexed: 12/21/2022] Open

Oliva M, Milicchio F, King K, Benson G, Boucher C, Prosperi M. Portable nanopore analytics: are we there yet? Bioinformatics 2021;36:4399-4405. [PMID: 32277811 DOI: 10.1093/bioinformatics/btaa237] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/11/2019] [Revised: 02/07/2020] [Accepted: 04/06/2020] [Indexed: 01/23/2023] Open

Morisse P, Marchet C, Limasset A, Lecroq T, Lefebvre A. Scalable long read self-correction and assembly polishing with multiple sequence alignment. Sci Rep 2021;11:761. [PMID: 33436980 PMCID: PMC7804095 DOI: 10.1038/s41598-020-80757-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2020] [Accepted: 12/22/2020] [Indexed: 11/09/2022] Open

Sahlin K, Medvedev P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun 2021;12:2. [PMID: 33397972 PMCID: PMC7782715 DOI: 10.1038/s41467-020-20340-8] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2020] [Accepted: 11/25/2020] [Indexed: 01/24/2023] Open

Zhang H, Jain C, Aluru S. A comprehensive evaluation of long read error correction methods. BMC Genomics 2020;21:889. [PMID: 33349243 PMCID: PMC7751105 DOI: 10.1186/s12864-020-07227-0] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2020] [Accepted: 11/12/2020] [Indexed: 01/07/2023] Open

Abstract

BACKGROUND

Third-generation single-molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to assess these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used.

RESULTS

In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research.

CONCLUSIONS

Despite the high error rate of long reads, the state-of-the-art correction tools can achieve high correction quality. When short reads are available, the best hybrid methods outperform non-hybrid methods in terms of correction quality and computing resource usage. When choosing tools, practitioners are advised to be careful with the few correction tools that discard reads, and to check the effect of error correction tools on downstream analysis. Our evaluation code is available as open source at https://github.com/haowenz/LRECE.


Roe D, Williams J, Ivery K, Brouckaert J, Downey N, Locklear C, Kuang R, Maiers M. Efficient Sequencing, Assembly, and Annotation of Human KIR Haplotypes. Front Immunol 2020;11:582927. [PMID: 33162997 PMCID: PMC7581912 DOI: 10.3389/fimmu.2020.582927] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Accepted: 09/17/2020] [Indexed: 12/04/2022] Open

Prezza N, Pisanti N, Sciortino M, Rosone G. Variable-order reference-free variant discovery with the Burrows-Wheeler Transform. BMC Bioinformatics 2020;21:260. [PMID: 32938358 PMCID: PMC7493873 DOI: 10.1186/s12859-020-03586-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2020] [Accepted: 06/08/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves the sensitivity and precision of previous de Bruijn graph-based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs, and it was too slow and memory-consuming due to the use of additional heavy data structures (namely, the Suffix and LCP arrays) besides the BWT.

RESULTS

In this paper, we introduce a new algorithm and the corresponding tool ebwt2InDel that (i) extends the framework of [Prezza et al., AMB 2019] to also detect INDELs, and (ii) implements recent algorithmic findings that allow the whole analysis to be performed using just the BWT, thus reducing the working space by one order of magnitude and allowing the analysis of full genomes. Finally, we describe a simple strategy for effectively parallelizing our tool for SNP detection only. On a 24-core machine, the parallel version of our tool is one order of magnitude faster than the sequential one. The tool ebwt2InDel is available at github.com/nicolaprezza/ebwt2InDel.

CONCLUSIONS

Results on a synthetic dataset covered at 30x (Human chromosome 1) show that our tool is indeed able to find up to 83% of the SNPs and 72% of the existing INDELs. These percentages considerably improve on the 71% of SNPs and 51% of INDELs found by the state-of-the-art tool based on de Bruijn graphs. We furthermore report results on larger (real) Human whole-genome sequencing experiments. In these cases, too, our tool exhibits a much higher sensitivity than the state-of-the-art tool.


Langa J, Estonba A, Conklin D. EXFI: Exon and splice graph prediction without a reference genome. Ecol Evol 2020;10:8880-8893. [PMID: 32884664 PMCID: PMC7452765 DOI: 10.1002/ece3.6587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2020] [Revised: 06/03/2020] [Accepted: 06/08/2020] [Indexed: 11/19/2022] Open

Eizenga JM, Novak AM, Sibbesen JA, Heumos S, Ghaffaari A, Hickey G, Chang X, Seaman JD, Rounthwaite R, Ebler J, Rautiainen M, Garg S, Paten B, Marschall T, Sirén J, Garrison E. Pangenome Graphs. Annu Rev Genomics Hum Genet 2020;21:139-162. [PMID: 32453966 DOI: 10.1146/annurev-genom-120219-080406] [Citation(s) in RCA: 136] [Impact Index Per Article: 27.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Affiliation(s)

Jordan M Eizenga Genomics Institute, University of California, Santa Cruz, California 95064, USA
Adam M Novak Genomics Institute, University of California, Santa Cruz, California 95064, USA
Jonas A Sibbesen Genomics Institute, University of California, Santa Cruz, California 95064, USA
Simon Heumos Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
Ali Ghaffaari Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.,Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Glenn Hickey Genomics Institute, University of California, Santa Cruz, California 95064, USA
Xian Chang Genomics Institute, University of California, Santa Cruz, California 95064, USA
Josiah D Seaman Royal Botanic Gardens, Kew, Richmond TW9 3AB, United Kingdom.,School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
Robin Rounthwaite Genomics Institute, University of California, Santa Cruz, California 95064, USA
Jana Ebler Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.,Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Mikko Rautiainen Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.,Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Shilpa Garg Departments of Genetics and Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02215, USA.,Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
Benedict Paten Genomics Institute, University of California, Santa Cruz, California 95064, USA
Tobias Marschall Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
Jouni Sirén Genomics Institute, University of California, Santa Cruz, California 95064, USA
Erik Garrison Genomics Institute, University of California, Santa Cruz, California 95064, USA


Batista FM, Stapleton T, Lowther JA, Fonseca VG, Shaw R, Pond C, Walker DI, van Aerle R, Martinez-Urtaza J. Whole Genome Sequencing of Hepatitis A Virus Using a PCR-Free Single-Molecule Nanopore Sequencing Approach. Front Microbiol 2020;11:874. [PMID: 32523561 PMCID: PMC7261825 DOI: 10.3389/fmicb.2020.00874] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2019] [Accepted: 04/14/2020] [Indexed: 12/18/2022] Open

Olson ND, Treangen TJ, Hill CM, Cepeda-Espinoza V, Ghurye J, Koren S, Pop M. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform 2020;20:1140-1150. [PMID: 28968737 DOI: 10.1093/bib/bbx098] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2017] [Revised: 07/13/2017] [Indexed: 01/09/2023] Open

Siadjeu C, Pucker B, Viehöver P, Albach DC, Weisshaar B. High Contiguity De Novo Genome Sequence Assembly of Trifoliate Yam (Dioscorea dumetorum) Using Long Read Sequencing. Genes (Basel) 2020;11:E274. [PMID: 32143301 PMCID: PMC7140821 DOI: 10.3390/genes11030274] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2020] [Revised: 02/25/2020] [Accepted: 02/29/2020] [Indexed: 12/17/2022] Open

Das AK, Goswami S, Lee K, Park SJ. A hybrid and scalable error correction algorithm for indel and substitution errors of long reads. BMC Genomics 2019;20:948. [PMID: 31856721 PMCID: PMC6923905 DOI: 10.1186/s12864-019-6286-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open

Abstract

BACKGROUND

Long-read sequencing has shown promise in overcoming the short-length limitations of second-generation sequencing by providing more complete assemblies. However, computation on long sequencing reads is challenged by their higher error rates (e.g., 13% vs. 1%) and higher cost ($0.3 vs. $0.03 per Mbp) compared to short reads.

METHODS

In this paper, we present a new hybrid error correction tool, called ParLECH (Parallel Long-read Error Correction using Hybrid methodology). The error correction algorithm of ParLECH is distributed in nature and efficiently utilizes the k-mer coverage information of high-throughput Illumina short-read sequences to rectify the PacBio long-read sequences. ParLECH first constructs a de Bruijn graph from the short reads, and then replaces the indel error regions of the long reads with their corresponding widest path (or maximum min-coverage path) in the short read-based de Bruijn graph. ParLECH then utilizes the k-mer coverage information of the short reads to divide each long read into a sequence of low and high coverage regions, followed by majority voting to rectify each substituted error base.
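
The "widest path (or maximum min-coverage path)" mentioned above can be illustrated with a small Python sketch. This is not ParLECH's implementation, which the abstract describes as distributed; it is a single-machine illustration that assumes a toy graph format (a dict mapping each k-mer node to its successors and their coverage) and a hypothetical function name `widest_path`. It uses a Dijkstra-style search in which the quantity being maximized is the smallest edge coverage along the path.

```python
import heapq


def widest_path(graph, source, target):
    """Return (bottleneck, path) for the path whose minimum edge coverage is largest.

    graph is a hypothetical toy format: {node: {successor: coverage}}.
    Returns (0, []) when target cannot be reached from source.
    """
    best = {source: float("inf")}  # best known bottleneck per node
    prev = {}                      # back-pointers for path reconstruction
    heap = [(-float("inf"), source)]  # max-heap via negated bottleneck
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0):
            continue  # stale heap entry
        if node == target:
            break
        for succ, coverage in graph.get(node, {}).items():
            candidate = min(width, coverage)
            if candidate > best.get(succ, 0):
                best[succ] = candidate
                prev[succ] = node
                heapq.heappush(heap, (-candidate, succ))
    if target not in best:
        return 0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[target], path


# Toy de Bruijn-like graph on 3-mers; the numbers are made-up coverages.
toy_graph = {
    "ACG": {"CGT": 12, "CGA": 3},
    "CGT": {"GTA": 10},
    "GTA": {"TAC": 11},
    "CGA": {"GAT": 30},
    "GAT": {"ATA": 30},
    "ATA": {"TAC": 30},
}
bottleneck, path = widest_path(toy_graph, "ACG", "TAC")
print(bottleneck, path)
```

In the toy graph, the lower branch has high coverage on most edges but starts with an edge of coverage 3, so the search prefers the upper branch, whose weakest edge has coverage 10. That is the sense in which a widest path favors consistently supported k-mers over a path with a single weakly supported step.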

RESULTS

ParLECH outperforms the latest state-of-the-art hybrid error correction methods on real PacBio datasets. Our experimental evaluation results demonstrate that ParLECH can correct large-scale real-world datasets in an accurate and scalable manner. ParLECH can correct the indel errors of human genome PacBio long reads (312 GB) with Illumina short reads (452 GB) in less than 29 h using 128 compute nodes. ParLECH can align more than 92% of the bases of an E. coli PacBio dataset with the reference genome, demonstrating its accuracy.

CONCLUSION

ParLECH can scale to over terabytes of sequencing data using hundreds of computing nodes. The proposed hybrid error correction methodology is novel and rectifies both indel and substitution errors present in the original long reads or newly introduced by the short reads.


Multiplexed Non-barcoded Long-Read Sequencing and Assembling Genomes of Bacillus Strains in Error-Free Simulations. Curr Microbiol 2019;77:79-84. [PMID: 31722044 DOI: 10.1007/s00284-019-01808-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Accepted: 11/02/2019] [Indexed: 10/25/2022]

Gao Y, Liu B, Wang Y, Xing Y. TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain. Bioinformatics 2019;35:i200-i207. [PMID: 31510677 PMCID: PMC6612900 DOI: 10.1093/bioinformatics/btz376] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open

Firtina C, Bar-Joseph Z, Alkan C, Cicek AE. Hercules: a profile HMM-based hybrid error correction algorithm for long reads. Nucleic Acids Res 2019;46:e125. [PMID: 30124947 PMCID: PMC6265270 DOI: 10.1093/nar/gky724] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2018] [Accepted: 08/07/2018] [Indexed: 01/15/2023] Open

Babarinde IA, Li Y, Hutchins AP. Computational Methods for Mapping, Assembly and Quantification for Coding and Non-coding Transcripts. Comput Struct Biotechnol J 2019;17:628-637. [PMID: 31193391 PMCID: PMC6526290 DOI: 10.1016/j.csbj.2019.04.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2019] [Revised: 04/24/2019] [Accepted: 04/29/2019] [Indexed: 12/17/2022] Open

Kim HM, Weber JA, Lee N, Park SG, Cho YS, Bhak Y, Lee N, Jeon Y, Jeon S, Luria V, Karger A, Kirschner MW, Jo YJ, Woo S, Shin K, Chung O, Ryu JC, Yim HS, Lee JH, Edwards JS, Manica A, Bhak J, Yum S. The genome of the giant Nomura's jellyfish sheds light on the early evolution of active predation. BMC Biol 2019;17:28. [PMID: 30925871 PMCID: PMC6441219 DOI: 10.1186/s12915-019-0643-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Accepted: 02/28/2019] [Indexed: 01/08/2023] Open

Affiliation(s)

Hak-Min Kim Korean Genomics Industrialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea. Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea.
Jessica A Weber Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA. Department of Biology, University of New Mexico, Albuquerque, NM, 87131, USA.
Nayoung Lee Ecological Risk Research Division, Korea Institute of Ocean Science and Technology (KIOST), Geoje, 53201, Republic of Korea
Seung Gu Park Korean Genomics Industrialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
Yun Sung Cho Korean Genomics Industrialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea. Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea. Clinomics Inc., Ulsan, 44919, Republic of Korea.
Youngjune Bhak Korean Genomics Industrialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea. Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea.
Nayun Lee Ecological Risk Research Division, Korea Institute of Ocean Science and Technology (KIOST), Geoje, 53201, Republic of Korea
Yeonsu Jeon Korean Genomics Industrialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea. Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea.
Sungwon Jeon Korean Genomics Industrialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea. Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea.
Victor Luria Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
Amir Karger IT - Research Computing, Harvard Medical School, Boston, MA, 02115, USA
Marc W Kirschner Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
Ye Jin Jo Ecological Risk Research Division, Korea Institute of Ocean Science and Technology (KIOST), Geoje, 53201, Republic of Korea
Seonock Woo Faculty of Marine Environmental Science, University of Science and Technology (UST), Geoje, 53201, Republic of Korea. Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology (KIOST), Busan, 49111, Republic of Korea.
Kyoungsoon Shin Ballast Water Center, Korea Institute of Ocean Science and Technology (KIOST), Geoje, 53201, Republic of Korea
Oksung Chung Clinomics Inc., Ulsan, 44919, Republic of Korea. Personal Genomics Institute, Genome Research Foundation, Cheongju, 28160, Republic of Korea.
Jae-Chun Ryu Cellular and Molecular Toxicology Laboratory, Center for Environment, Health and Welfare Research, Korea Institute of Science and Technology (KIST), Seoul, 02792, Republic of Korea
Hyung-Soon Yim Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology (KIOST), Busan, 49111, Republic of Korea
Jung-Hyun Lee Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology (KIOST), Busan, 49111, Republic of Korea
Jeremy S Edwards Chemistry and Chemical Biology, UNM Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, 87131, USA
Andrea Manica Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
Jong Bhak Korean Genomics Industrialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea. Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea. Clinomics Inc., Ulsan, 44919, Republic of Korea. Personal Genomics Institute, Genome Research Foundation, Cheongju, 28160, Republic of Korea.
Seungshic Yum Ecological Risk Research Division, Korea Institute of Ocean Science and Technology (KIOST), Geoje, 53201, Republic of Korea. Faculty of Marine Environmental Science, University of Science and Technology (UST), Geoje, 53201, Republic of Korea.


Zhao L, Zhang H, Kohnen MV, Prasad KVSK, Gu L, Reddy ASN. Analysis of Transcriptome and Epitranscriptome in Plants Using PacBio Iso-Seq and Nanopore-Based Direct RNA Sequencing. Front Genet 2019;10:253. [PMID: 30949200 PMCID: PMC6438080 DOI: 10.3389/fgene.2019.00253] [Citation(s) in RCA: 98] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2018] [Accepted: 03/06/2019] [Indexed: 12/18/2022] Open

Fu S, Wang A, Au KF. A comparative evaluation of hybrid error correction methods for error-prone long reads. Genome Biol 2019;20:26. [PMID: 30717772 PMCID: PMC6362602 DOI: 10.1186/s13059-018-1605-z] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2018] [Accepted: 12/05/2018] [Indexed: 12/20/2022] Open

Khan M, Fadaie Z, Cornelis SS, Cremers FPM, Roosing S. Identification and Analysis of Genes Associated with Inherited Retinal Diseases. Methods Mol Biol 2019;1834:3-27. [PMID: 30324433 DOI: 10.1007/978-1-4939-8669-9_1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Bakhtiari M, Shleizer-Burko S, Gymrek M, Bansal V, Bafna V. Targeted genotyping of variable number tandem repeats with adVNTR. Genome Res 2018;28:1709-1719. [PMID: 30352806 PMCID: PMC6211647 DOI: 10.1101/gr.235119.118] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2018] [Accepted: 10/02/2018] [Indexed: 12/20/2022]

Wang JR, Holt J, McMillan L, Jones CD. FMLRC: Hybrid long read error correction using an FM-index. BMC Bioinformatics 2018;19:50. [PMID: 29426289 PMCID: PMC5807796 DOI: 10.1186/s12859-018-2051-3] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2017] [Accepted: 02/01/2018] [Indexed: 11/16/2022] Open

Liu Y, Lan C, Blumenstein M, Li J. Bi-level error correction for PacBio long reads. IEEE/ACM Trans Comput Biol Bioinform 2017;17:899-905. [PMID: 29990239 DOI: 10.1109/tcbb.2017.2780832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Abstract

The latest sequencing technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, can generate long reads thousands of bases in length, much longer than the reads of hundreds of bases generated by Illumina machines. However, these long reads are prone to much higher error rates, for example 15%, making downstream analysis and applications very difficult. Error correction is a process to improve the quality of sequencing data. Hybrid correction strategies have recently been proposed that use Illumina reads with low error rates to fix sequencing errors in the noisy long reads, with good performance. In this paper, we propose a new method named Bicolor, a bi-level framework of hybrid error correction for further improving the quality of PacBio long reads. At the first level, our method uses a de Bruijn graph-based error correction idea to search paths in pairs of solid k-mers iteratively with an increasing length of k-mer. At the second level, we combine the processed results under different parameters from the first level. In particular, a multiple sequence alignment algorithm is used to align those similar long reads, followed by a voting algorithm which determines the final base at each position of the reads. We compare the performance of Bicolor with three state-of-the-art methods on three real data sets. Results demonstrate that Bicolor always achieves the highest identity ratio. Bicolor also achieves a higher alignment ratio and a higher number of aligned reads than the current methods on two data sets. On the third data set, our method is closely competitive with the current methods in terms of number of aligned reads and genome coverage. The C++ source code of our algorithm is freely available at https://github.com/yuansliu/Bicolor.
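
The second-level step described above (align similar reads, then vote on the final base at each position) can be sketched in Python as follows. The abstract does not specify Bicolor's actual voting rule, so this is only a generic column-wise majority vote under stated assumptions: the multiple sequence alignment has already been computed, all rows have equal length, `-` marks a gap, and the function name `consensus_by_vote` is hypothetical.

```python
from collections import Counter


def consensus_by_vote(aligned_rows, gap="-"):
    """Column-wise majority vote over pre-aligned rows of equal length.

    Columns where the gap symbol wins are dropped from the consensus.
    Ties go to the symbol seen first in the column (Counter.most_common
    orders equal counts by first occurrence).
    """
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*aligned_rows):
        winner, _count = Counter(column).most_common(1)[0]
        if winner != gap:
            consensus.append(winner)
    return "".join(consensus)


# Three made-up corrected versions of the same read, already aligned.
rows = ["ACGT-A", "ACGTTA", "AC-TTA"]
print(consensus_by_vote(rows))
```

Each of the three toy rows has a different isolated error, yet the vote recovers a single sequence because every error is outvoted in its column by the other two rows.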
