Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	[Subscribe] [Scholar Register]

Number

Cited by Other Article(s)

Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024;23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open

Sami A, El-Metwally S, Rashad MZ. MAC-ErrorReads: machine learning-assisted classifier for filtering erroneous NGS reads. BMC Bioinformatics 2024;25:61. [PMID: 38321434 PMCID: PMC10848413 DOI: 10.1186/s12859-024-05681-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Accepted: 01/29/2024] [Indexed: 02/08/2024] Open

Abstract

BACKGROUND

The rapid advancement of next-generation sequencing (NGS) machines in terms of speed and affordability has led to the generation of a massive amount of biological data at the expense of data quality as errors become more prevalent. This introduces the need to utilize different approaches to detect and filtrate errors, and data quality assurance is moved from the hardware space to the software preprocessing stages.

RESULTS

We introduce MAC-ErrorReads, a novel Machine learning-Assisted Classifier designed for filtering Erroneous NGS Reads. MAC-ErrorReads transforms the erroneous NGS read filtration process into a robust binary classification task, employing five supervised machine learning algorithms. These models are trained on features extracted through the computation of Term Frequency-Inverse Document Frequency (TF_IDF) values from various datasets such as E. coli, GAGE S. aureus, H. Chr14, Arabidopsis thaliana Chr1 and Metriaclima zebra. Notably, Naive Bayes demonstrated robust performance across various datasets, displaying high accuracy, precision, recall, F1-score, MCC, and ROC values. The MAC-ErrorReads NB model accurately classified S. aureus reads, surpassing most error correction tools with a 38.69% alignment rate. For H. Chr14, tools like Lighter, Karect, CARE, Pollux, and MAC-ErrorReads showed rates above 99%. BFC and RECKONER exceeded 98%, while Fiona had 95.78%. For the Arabidopsis thaliana Chr1, Pollux, Karect, RECKONER, and MAC-ErrorReads demonstrated good alignment rates of 92.62%, 91.80%, 91.78%, and 90.87%, respectively. For the Metriaclima zebra, Pollux achieved a high alignment rate of 91.23%, despite having the lowest number of mapped reads. MAC-ErrorReads, Karect, and RECKONER demonstrated good alignment rates of 83.76%, 83.71%, and 83.67%, respectively, while also producing reasonable numbers of mapped reads to the reference genome.

CONCLUSIONS

This study demonstrates that machine learning approaches for filtering NGS reads effectively identify and retain the most accurate reads, significantly enhancing assembly quality and genomic coverage. The integration of genomics and artificial intelligence through machine learning algorithms holds promise for enhancing NGS data quality, advancing downstream data analysis accuracy, and opening new opportunities in genetics, genomics, and personalized medicine research.

Collapse

Beiki H, Murdoch BM, Park CA, Kern C, Kontechy D, Becker G, Rincon G, Jiang H, Zhou H, Thorne J, Koltes JE, Michal JJ, Davenport K, Rijnkels M, Ross PJ, Hu R, Corum S, McKay S, Smith TPL, Liu W, Ma W, Zhang X, Xu X, Han X, Jiang Z, Hu ZL, Reecy JM. Enhanced bovine genome annotation through integration of transcriptomics and epi-transcriptomics datasets facilitates genomic biology. Gigascience 2024;13:giae019. [PMID: 38626724 PMCID: PMC11020238 DOI: 10.1093/gigascience/giae019] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2023] [Revised: 07/29/2023] [Accepted: 03/27/2024] [Indexed: 04/18/2024] Open

Abstract

BACKGROUND

The accurate identification of the functional elements in the bovine genome is a fundamental requirement for high-quality analysis of data informing both genome biology and genomic selection. Functional annotation of the bovine genome was performed to identify a more complete catalog of transcript isoforms across bovine tissues.

RESULTS

A total of 160,820 unique transcripts (50% protein coding) representing 34,882 unique genes (60% protein coding) were identified across tissues. Among them, 118,563 transcripts (73% of the total) were structurally validated by independent datasets (PacBio isoform sequencing data, Oxford Nanopore Technologies sequencing data, de novo assembled transcripts from RNA sequencing data) and comparison with Ensembl and NCBI gene sets. In addition, all transcripts were supported by extensive data from different technologies such as whole transcriptome termini site sequencing, RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression, chromatin immunoprecipitation sequencing, and assay for transposase-accessible chromatin using sequencing. A large proportion of identified transcripts (69%) were unannotated, of which 86% were produced by annotated genes and 14% by unannotated genes. A median of two 5' untranslated regions were expressed per gene. Around 50% of protein-coding genes in each tissue were bifunctional and transcribed both coding and noncoding isoforms. Furthermore, we identified 3,744 genes that functioned as noncoding genes in fetal tissues but as protein-coding genes in adult tissues. Our new bovine genome annotation extended more than 11,000 annotated gene borders compared to Ensembl or NCBI annotations. The resulting bovine transcriptome was integrated with publicly available quantitative trait loci data to study tissue-tissue interconnection involved in different traits and construct the first bovine trait similarity network.

CONCLUSIONS

These validated results show significant improvement over current bovine genome annotations.

Collapse

Affiliation(s)

Hamid Beiki Department of Animal Science, Iowa State University, Ames, IA 50011, USA
Brenda M Murdoch Department of Animal and Veterinary and Food Science, University of Idaho, ID 83844, USA
Carissa A Park Department of Animal Science, Iowa State University, Ames, IA 50011, USA
Chandlar Kern Department of Animal Science, Pennsylvania State University, PA 16802, USA
Denise Kontechy Department of Animal and Veterinary and Food Science, University of Idaho, ID 83844, USA
Gabrielle Becker Department of Animal and Veterinary and Food Science, University of Idaho, ID 83844, USA
Gonzalo Rincon Zoetis, Parsippany-Troy Hills, NJ 07054, USA
Honglin Jiang Department of Animal and Poultry Sciences, Virginia Tech, VA 24060, USA
Huaijun Zhou Department of Animal Science, University of California, Davis, CA 95616, USA
Jacob Thorne Department of Animal and Veterinary and Food Science, University of Idaho, ID 83844, USA
James E Koltes Department of Animal Science, Iowa State University, Ames, IA 50011, USA
Jennifer J Michal Department of Animal Science, Washington State University, WA 99164, USA
Kimberly Davenport Department of Animal and Veterinary and Food Science, University of Idaho, ID 83844, USA
Monique Rijnkels Department of Veterinary Integrative Biosciences, Texas A&M University, TX 77843, USA
Pablo J Ross Department of Animal Science, University of California, Davis, CA 95616, USA
Rui Hu Department of Animal and Poultry Sciences, Virginia Tech, VA 24060, USA
Sarah Corum Zoetis, Parsippany-Troy Hills, NJ 07054, USA
Stephanie McKay University of Missouri, Columbia, MO 65211, USA
Timothy P L Smith USDA, ARS, USMARC, 68933, USA
Wansheng Liu Department of Animal Science, Pennsylvania State University, PA 16802, USA
Wenzhi Ma Department of Animal Science, Pennsylvania State University, PA 16802, USA
Xiaohui Zhang Department of Animal Science, Washington State University, WA 99164, USA
Xiaoqing Xu Department of Animal Science, University of California, Davis, CA 95616, USA
Xuelei Han Department of Animal Science, Washington State University, WA 99164, USA
Zhihua Jiang Department of Animal Science, Washington State University, WA 99164, USA
Zhi-Liang Hu Department of Animal Science, Iowa State University, Ames, IA 50011, USA
James M Reecy Department of Animal Science, Iowa State University, Ames, IA 50011, USA

Collapse

Yan L, Yin Z, Zhang H, Zhao Z, Wang M, Müller A, Kallenborn F, Wichmann A, Wei Y, Niu B, Schmidt B, Liu W. RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data. Methods 2023;216:39-50. [PMID: 37330158 DOI: 10.1016/j.ymeth.2023.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 05/26/2023] [Accepted: 06/12/2023] [Indexed: 06/19/2023] Open

Cheng C, Fei Z, Xiao P. Methods to improve the accuracy of next-generation sequencing. Front Bioeng Biotechnol 2023;11:982111. [PMID: 36741756 PMCID: PMC9895957 DOI: 10.3389/fbioe.2023.982111] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2022] [Accepted: 01/11/2023] [Indexed: 01/21/2023] Open

Genome sequence assembly algorithms and misassembly identification methods. Mol Biol Rep 2022;49:11133-11148. [PMID: 36151399 DOI: 10.1007/s11033-022-07919-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2022] [Accepted: 09/05/2022] [Indexed: 10/14/2022]

Lightweight Pattern Matching Method for DNA Sequencing in Internet of Medical Things. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:6980335. [PMID: 36120669 PMCID: PMC9477578 DOI: 10.1155/2022/6980335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/23/2022] [Revised: 06/28/2022] [Accepted: 07/29/2022] [Indexed: 11/18/2022]

Abstract

An area of medical science, that is, gaining prominence, is DNA sequencing. Genetic mutations responsible for the disease have been detected using DNA sequencing. The research is focusing on pattern identification methodologies for dealing with DNA-sequencing problems relating to various applications. A few examples of such problems are alignment and assembly of short reads from next generation sequencing (NGS), comparing DNA sequences, and determining the frequency of a pattern in a sequence. The approximate matching of DNA sequences is also well suited for many applications equivalent to the exact matching of the sequence since the DNA sequences are often subject to mutation. Consequently, recognizing pattern similarity becomes necessary. Furthermore, it can also be used in virtually every application that calls for pattern matching, for example, spell-checking, spam filtering, and search engines. According to the traditional approach, finding a similar pattern in the case where the sequence length is l_s and the pattern length is l_p occurs in O (l_s∗l_p). This heavy processing is caused by comparing every character of the sequence repeatedly with the pattern. The research intended to reduce the time complexity of the pattern matching by introducing an approach named “optimized pattern similarity identification” (OPSI). This methodology constructs a table, entitled “shift beyond for avoiding redundant comparison” (SBARC), to bypass the characters in the texts that are already compared with the pattern. The table pertains to the information about the character distance to be skipped in the matching. OPSI discovers at most spots of similar patterns occur in the sequence (by ignoring è mismatches). The experiment resulted in the time complexity identified as O (l_s. è). In comparison to the size of the pattern, the allowed number of mismatches will be much smaller. Aspects such as scalability, generalizability, and performance of the OPSI algorithm are discussed. In comparison with the hamming distance-based approximate pattern matching algorithm, the proposed algorithm is found to be 69% more efficient.

Collapse

Kallenborn F, Cascitti J, Schmidt B. CARE 2.0: reducing false-positive sequencing error corrections using machine learning. BMC Bioinformatics 2022;23:227. [PMID: 35698033 PMCID: PMC9195321 DOI: 10.1186/s12859-022-04754-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Accepted: 05/30/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Next-generation sequencing pipelines often perform error correction as a preprocessing step to obtain cleaned input data. State-of-the-art error correction programs are able to reliably detect and correct the majority of sequencing errors. However, they also introduce new errors by making false-positive corrections. These correction mistakes can have negative impact on downstream analysis, such as k-mer statistics, de-novo assembly, and variant calling. This motivates the need for more precise error correction tools.

RESULTS

We present CARE 2.0, a context-aware read error correction tool based on multiple sequence alignment targeting Illumina datasets. In addition to a number of newly introduced optimizations its most significant change is the replacement of CARE 1.0's hand-crafted correction conditions with a novel classifier based on random decision forests trained on Illumina data. This results in up to two orders-of-magnitude fewer false-positive corrections compared to other state-of-the-art error correction software. At the same time, CARE 2.0 is able to achieve high numbers of true-positive corrections comparable to its competitors. On a simulated full human dataset with 914M reads CARE 2.0 generates only 1.2M false positives (FPs) (and 801.4M true positives (TPs)) at a highly competitive runtime while the best corrections achieved by other state-of-the-art tools contain at least 3.9M FPs and at most 814.5M TPs. Better de-novo assembly and improved k-mer analysis show the applicability of CARE 2.0 to real-world data.

CONCLUSION

False-positive corrections can negatively influence down-stream analysis. The precision of CARE 2.0 greatly reduces the number of those corrections compared to other state-of-the-art programs including BFC, Karect, Musket, Bcool, SGA, and Lighter. Thus, higher-quality datasets are produced which improve k-mer analysis and de-novo assembly in real-world datasets which demonstrates the applicability of machine learning techniques in the context of sequencing read error correction. CARE 2.0 is written in C++/CUDA for Linux systems and can be run on the CPU as well as on CUDA-enabled GPUs. It is available at https://github.com/fkallen/CARE .

Collapse

Nogueira-Rodrigues J, Leite SC, Pinto-Costa R, Sousa SC, Luz LL, Sintra MA, Oliveira R, Monteiro AC, Pinheiro GG, Vitorino M, Silva JA, Simão S, Fernandes VE, Provazník J, Benes V, Cruz CD, Safronov BV, Magalhães A, Reis CA, Vieira J, Vieira CP, Tiscórnia G, Araújo IM, Sousa MM. Rewired glycosylation activity promotes scarless regeneration and functional recovery in spiny mice after complete spinal cord transection. Dev Cell 2021;57:440-450.e7. [PMID: 34986324 DOI: 10.1016/j.devcel.2021.12.008] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2021] [Revised: 11/26/2021] [Accepted: 12/08/2021] [Indexed: 12/11/2022]

Affiliation(s)

Joana Nogueira-Rodrigues Nerve Regeneration Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal; Graduate Program in Molecular and Cell Biology, Instituto de Ciências Biomédicas Abel Salazar (ICBAS), University of Porto, 4050-313 Porto, Portugal
Sérgio C Leite Nerve Regeneration Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal
Rita Pinto-Costa Nerve Regeneration Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal
Sara C Sousa Nerve Regeneration Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal; Graduate Program in Molecular and Cell Biology, Instituto de Ciências Biomédicas Abel Salazar (ICBAS), University of Porto, 4050-313 Porto, Portugal
Liliana L Luz Neuronal Networks Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal
Maria A Sintra Nerve Regeneration Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal
Raquel Oliveira Translational NeuroUrology Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal; Department of Biomedicine, Experimental Biology Unit, Faculty of Medicine of Porto, University of Porto, 4200-319 Porto, Portugal; Regeneration Group, Wolfson Centre for Age-Related Diseases, Institute of Psychiatry, Psychology and Neuroscience, King's College London WC2R 2LS, London, UK
Ana C Monteiro Nerve Regeneration Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal
Gonçalo G Pinheiro Molecular & Regenerative Medicine Laboratory, Centro de Ciências do Mar (CCMAR), University of Algarve, 8005-139 Faro, Portugal; Faculty of Medicine and Biomedical Sciences, University of Algarve, 8005-139 Faro, Portugal
Marta Vitorino Molecular & Regenerative Medicine Laboratory, Centro de Ciências do Mar (CCMAR), University of Algarve, 8005-139 Faro, Portugal; Faculty of Medicine and Biomedical Sciences, University of Algarve, 8005-139 Faro, Portugal
Joana A Silva Faculty of Medicine and Biomedical Sciences, University of Algarve, 8005-139 Faro, Portugal
Sónia Simão Faculty of Medicine and Biomedical Sciences, University of Algarve, 8005-139 Faro, Portugal; Algarve Biomedical Center Research Institute (ABC-RI), University of Algarve, 8005-139 Faro, Portugal
Vitor E Fernandes Faculty of Medicine and Biomedical Sciences, University of Algarve, 8005-139 Faro, Portugal; Algarve Biomedical Center Research Institute (ABC-RI), University of Algarve, 8005-139 Faro, Portugal
Jan Provazník Genomics Core Facility, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
Vladimir Benes Genomics Core Facility, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
Célia D Cruz Translational NeuroUrology Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal; Department of Biomedicine, Experimental Biology Unit, Faculty of Medicine of Porto, University of Porto, 4200-319 Porto, Portugal
Boris V Safronov Neuronal Networks Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal
Ana Magalhães Glycobiology in Cancer Group, Institute of Molecular Pathology and Immunology, IPATIMUP), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal; Department of Molecular Biology, Instituto de Ciências Biomédicas Abel Salazar (ICBAS), University of Porto, 4050-313 Porto, Portugal
Celso A Reis Glycobiology in Cancer Group, Institute of Molecular Pathology and Immunology, IPATIMUP), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal; Department of Molecular Biology, Instituto de Ciências Biomédicas Abel Salazar (ICBAS), University of Porto, 4050-313 Porto, Portugal; Department of Pathology, Faculty of Medicine of Porto, University of Porto, 4200-319 Porto, Portugal
Jorge Vieira Phenotypic Evolution Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal
Cristina P Vieira Phenotypic Evolution Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal
Gustavo Tiscórnia Molecular & Regenerative Medicine Laboratory, Centro de Ciências do Mar (CCMAR), University of Algarve, 8005-139 Faro, Portugal; Clinica Eugin, Research and Development, 08006 Barcelona, Spain
Inês M Araújo Faculty of Medicine and Biomedical Sciences, University of Algarve, 8005-139 Faro, Portugal; Algarve Biomedical Center Research Institute (ABC-RI), University of Algarve, 8005-139 Faro, Portugal; Champalimaud Research Program, Champalimaud Center for the Unknown, 1400-038 Lisbon, Portugal
Mónica M Sousa Nerve Regeneration Group, Instituto de Biologia Molecular e Celular (IBMC), Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, 4200-135 Porto, Portugal.

Collapse

Leinonen M, Salmela L. Extraction of long k-mers using spaced seeds. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;PP:1-1. [PMID: 34529572 DOI: 10.1109/tcbb.2021.3113131] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Zhang X, Ping P, Hutvagner G, Blumenstein M, Li J. Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach. Nucleic Acids Res 2021;49:e106. [PMID: 34291293 PMCID: PMC8631080 DOI: 10.1093/nar/gkab610] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 07/01/2021] [Accepted: 07/06/2021] [Indexed: 12/21/2022] Open

Kallenborn F, Hildebrandt A, Schmidt B. CARE: context-aware sequencing read error correction. Bioinformatics 2021;37:889-895. [PMID: 32818262 DOI: 10.1093/bioinformatics/btaa738] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2020] [Revised: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 11/14/2022] Open

He F, Steige KA, Kovacova V, Göbel U, Bouzid M, Keightley PD, Beyer A, de Meaux J. Cis-regulatory evolution spotlights species differences in the adaptive potential of gene expression plasticity. Nat Commun 2021;12:3376. [PMID: 34099660 PMCID: PMC8184852 DOI: 10.1038/s41467-021-23558-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2020] [Accepted: 04/29/2021] [Indexed: 11/09/2022] Open

Zhang X, Liu Y, Yu Z, Blumenstein M, Hutvagner G, Li J. Instance-based error correction for short reads of disease-associated genes. BMC Bioinformatics 2021;22:142. [PMID: 34078284 PMCID: PMC8170817 DOI: 10.1186/s12859-021-04058-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2021] [Accepted: 03/02/2021] [Indexed: 12/12/2022] Open

Heo Y, Manikandan G, Ramachandran A, Chen D. Comprehensive Evaluation of Error-Correction Methodologies for Genome Sequencing Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Li HD, Zhang W, Luo Y, Wang J. IsoDetect: Detection of Splice Isoforms from Third Generation Long Reads Based on Short Feature Sequences. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200316101205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Reis M, Wiegleb G, Claude J, Lata R, Horchler B, Ha NT, Reimer C, Vieira CP, Vieira J, Posnien N. Multiple loci linked to inversions are associated with eye size variation in species of the Drosophila virilis phylad. Sci Rep 2020;10:12832. [PMID: 32732947 PMCID: PMC7393161 DOI: 10.1038/s41598-020-69719-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2020] [Accepted: 07/14/2020] [Indexed: 11/26/2022] Open

Affiliation(s)

Micael Reis Department of Developmental Biology, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Gordon Wiegleb Department of Developmental Biology, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany.,International Max Planck Research School for Genome Science, Am Fassberg 11, 37077, Göttingen, Germany
Julien Claude Institut Des Sciences de l'Evolution de Montpellier, CNRS/UM2/IRD, 2 Place Eugène Bataillon, cc64, 34095, Montpellier Cedex 5, France
Rodrigo Lata Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal.,Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Porto, Portugal
Britta Horchler Department of Developmental Biology, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Ngoc-Thuy Ha Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany.,Center for Integrated Breeding Research, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
Christian Reimer Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany.,Center for Integrated Breeding Research, University of Goettingen, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
Cristina P Vieira Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal.,Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Porto, Portugal
Jorge Vieira Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal.,Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Porto, Portugal
Nico Posnien Department of Developmental Biology, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany.

Collapse

Mitchell K, Brito JJ, Mandric I, Wu Q, Knyazev S, Chang S, Martin LS, Karlsberg A, Gerasimov E, Littman R, Hill BL, Wu NC, Yang HT, Hsieh K, Chen L, Littman E, Shabani T, Enik G, Yao D, Sun R, Schroeder J, Eskin E, Zelikovsky A, Skums P, Pop M, Mangul S. Benchmarking of computational error-correction methods for next-generation sequencing data. Genome Biol 2020;21:71. [PMID: 32183840 PMCID: PMC7079412 DOI: 10.1186/s13059-020-01988-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2019] [Accepted: 03/06/2020] [Indexed: 12/16/2022] Open

Affiliation(s)

Keith Mitchell Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Jaqueline J Brito Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
Igor Mandric Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
Qiaozhen Wu Department of Mathematics, University of California Los Angeles, 520 Portola Plaza, Los Angeles, CA, 90095, USA
Sergey Knyazev Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
Sei Chang Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Lana S Martin Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
Aaron Karlsberg Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
Ekaterina Gerasimov Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
Russell Littman UCLA Bioinformatics, 621 Charles E Young Dr S, Los Angeles, CA, 90024, USA
Brian L Hill Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Nicholas C Wu Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
Harry Taegyun Yang Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Kevin Hsieh Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Linus Chen Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Eli Littman Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Taylor Shabani Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
German Enik Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Douglas Yao Department of Molecular, Cell, and Developmental Biology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
Ren Sun Department of Molecular and Medical Pharmacology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
Jan Schroeder Epigenetics & Reprogramming Laboratory, Monash University, 15 Innovation Walk, Melbourne, VIC, 3800, Australia
Eleazar Eskin Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
Alex Zelikovsky Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA The Laboratory of Bioinformatics, I.M, Sechenov First Moscow State Medical University, Moscow, Russia, 119991
Pavel Skums Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
Mihai Pop Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
Serghei Mangul Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.

Collapse

Das AK, Goswami S, Lee K, Park SJ. A hybrid and scalable error correction algorithm for indel and substitution errors of long reads. BMC Genomics 2019;20:948. [PMID: 31856721 PMCID: PMC6923905 DOI: 10.1186/s12864-019-6286-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open

Abstract

BACKGROUND

Long-read sequencing has shown the promises to overcome the short length limitations of second-generation sequencing by providing more complete assembly. However, the computation of the long sequencing reads is challenged by their higher error rates (e.g., 13% vs. 1%) and higher cost ($0.3 vs. $0.03 per Mbp) compared to the short reads.

METHODS

In this paper, we present a new hybrid error correction tool, called ParLECH (Parallel Long-read Error Correction using Hybrid methodology). The error correction algorithm of ParLECH is distributed in nature and efficiently utilizes the k-mer coverage information of high throughput Illumina short-read sequences to rectify the PacBio long-read sequences.ParLECH first constructs a de Bruijn graph from the short reads, and then replaces the indel error regions of the long reads with their corresponding widest path (or maximum min-coverage path) in the short read-based de Bruijn graph. ParLECH then utilizes the k-mer coverage information of the short reads to divide each long read into a sequence of low and high coverage regions, followed by a majority voting to rectify each substituted error base.

RESULTS

ParLECH outperforms latest state-of-the-art hybrid error correction methods on real PacBio datasets. Our experimental evaluation results demonstrate that ParLECH can correct large-scale real-world datasets in an accurate and scalable manner. ParLECH can correct the indel errors of human genome PacBio long reads (312 GB) with Illumina short reads (452 GB) in less than 29 h using 128 compute nodes. ParLECH can align more than 92% bases of an E. coli PacBio dataset with the reference genome, proving its accuracy.

CONCLUSION

ParLECH can scale to over terabytes of sequencing data using hundreds of computing nodes. The proposed hybrid error correction methodology is novel and rectifies both indel and substitution errors present in the original long reads or newly introduced by the short reads.

Collapse

Ahmed N, Lévy J, Ren S, Mushtaq H, Bertels K, Al-Ars Z. GASAL2: a GPU accelerated sequence alignment library for high-throughput NGS data. BMC Bioinformatics 2019;20:520. [PMID: 31653208 PMCID: PMC6815017 DOI: 10.1186/s12859-019-3086-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2019] [Accepted: 09/06/2019] [Indexed: 01/27/2023] Open

GAAP: A Genome Assembly + Annotation Pipeline. BIOMED RESEARCH INTERNATIONAL 2019;2019:4767354. [PMID: 31346518 PMCID: PMC6617929 DOI: 10.1155/2019/4767354] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 05/20/2019] [Accepted: 05/26/2019] [Indexed: 12/24/2022]

Current challenges and solutions of de novo assembly. QUANTITATIVE BIOLOGY 2019. [DOI: 10.1007/s40484-019-0166-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Manekar SC, Sathe SR. Estimating the k-mer Coverage Frequencies in Genomic Datasets: A Comparative Assessment of the State-of-the-art. Curr Genomics 2019;20:2-15. [PMID: 31015787 PMCID: PMC6446480 DOI: 10.2174/1389202919666181026101326] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2018] [Revised: 10/05/2018] [Accepted: 10/24/2018] [Indexed: 12/24/2022] Open

Limasset A, Flot JF, Peterlongo P. Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs. Bioinformatics 2019;36:1374-1381. [DOI: 10.1093/bioinformatics/btz102] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2018] [Revised: 01/07/2019] [Accepted: 02/18/2019] [Indexed: 12/25/2022] Open

Ershov V, Tarasov A, Lapidus A, Korobeynikov A. IonHammer: Homopolymer-Space Hamming Clustering for IonTorrent Read Error Correction. J Comput Biol 2019;26:124-127. [DOI: 10.1089/cmb.2018.0152] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Zhao L, Xie J, Bai L, Chen W, Wang M, Zhang Z, Wang Y, Zhao Z, Li J. Mining statistically-solid k-mers for accurate NGS error correction. BMC Genomics 2018;19:912. [PMID: 30598110 PMCID: PMC6311904 DOI: 10.1186/s12864-018-5272-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Manekar SC, Sathe SR. A benchmark study of k-mer counting methods for high-throughput sequencing. Gigascience 2018;7:5140149. [PMID: 30346548 PMCID: PMC6280066 DOI: 10.1093/gigascience/giy125] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2017] [Accepted: 10/16/2018] [Indexed: 11/25/2022] Open

Mukherjee K, Washimkar D, Muggli MD, Salmela L, Boucher C. Error correcting optical mapping data. Gigascience 2018;7:5005021. [PMID: 29846578 PMCID: PMC6007263 DOI: 10.1093/gigascience/giy061] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2017] [Accepted: 05/16/2018] [Indexed: 12/31/2022] Open

Yoon S, Kim D, Kang K, Park WJ. TraRECo: a greedy approach based de novo transcriptome assembler with read error correction using consensus matrix. BMC Genomics 2018;19:653. [PMID: 30180798 PMCID: PMC6123912 DOI: 10.1186/s12864-018-5034-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2017] [Accepted: 08/23/2018] [Indexed: 01/15/2023] Open

Abstract

BACKGROUND

The challenges when developing a good de novo transcriptome assembler include how to deal with read errors and sequence repeats. Almost all de novo assemblers utilize a de Bruijn graph, with which complexity grows linearly with data size while suffering from errors and repeats. Although one can correct the errors by inspecting the topological structure of the graph, this is not an easy task when there are too many branches. Two research directions are to improve either the graph reliability or the path search precision, and in this study, we focused on the former.

RESULTS

We present TraRECo, a greedy approach to de novo assembly employing error-aware graph construction. In the proposed approach, we built contigs by direct read alignment within a distance margin and performed a junction search to construct splicing graphs. While doing so, a contig of length l was represented by a 4 × l matrix (called a consensus matrix), in which each element was the base count of the aligned reads so far. A representative sequence was obtained by taking the majority in each column of the consensus matrix to be used for further read alignment. Once the splicing graphs had been obtained, we used IsoLasso to find paths with a noticeable read depth. The experiments using real and simulated reads show that the method provided considerable improvement in sensitivity and moderately better performance when comparing sensitivity and precision. This was achieved by the error-aware graph construction using the consensus matrix, with which the reads having errors were made usable for the graph construction (otherwise, they might have been eventually discarded). This improved the quality of the coverage depth information used in the subsequent path search step and finally the reliability of the graph.

CONCLUSIONS

De novo assembly is mainly used to explore undiscovered isoforms and must be able to represent as many reads as possible in an efficient way. In this sense, TraRECo provides us with a potential alternative for improving graph reliability even though the computational burden is much higher than the single k-mer in the de Bruijn graph approach.

Collapse

Xia C, Wang M, Cornejo OE, Jiwan DA, See DR, Chen X. Secretome Characterization and Correlation Analysis Reveal Putative Pathogenicity Mechanisms and Identify Candidate Avirulence Genes in the Wheat Stripe Rust Fungus Puccinia striiformis f. sp. tritici. Front Microbiol 2017;8:2394. [PMID: 29312156 PMCID: PMC5732408 DOI: 10.3389/fmicb.2017.02394] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2017] [Accepted: 11/20/2017] [Indexed: 12/30/2022] Open

Abstract

Stripe (yellow) rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat worldwide. Planting resistant cultivars is an effective way to control this disease, but race-specific resistance can be overcome quickly due to the rapid evolving Pst population. Studying the pathogenicity mechanisms is critical for understanding how Pst virulence changes and how to develop wheat cultivars with durable resistance to stripe rust. We re-sequenced 7 Pst isolates and included additional 7 previously sequenced isolates to represent balanced virulence/avirulence profiles for several avirulence loci in seretome analyses. We observed an uneven distribution of heterozygosity among the isolates. Secretome comparison of Pst with other rust fungi identified a large portion of species-specific secreted proteins, suggesting that they may have specific roles when interacting with the wheat host. Thirty-two effectors of Pst were identified from its secretome. We identified candidates for Avr genes corresponding to six Yr genes by correlating polymorphisms for effector genes to the virulence/avirulence profiles of the 14 Pst isolates. The putative AvYr76 was present in the avirulent isolates, but absent in the virulent isolates, suggesting that deleting the coding region of the candidate avirulence gene has produced races virulent to resistance gene Yr76. We conclude that incorporating avirulence/virulence phenotypes into correlation analysis with variations in genomic structure and secretome, particularly presence/absence polymorphisms of effectors, is an efficient way to identify candidate Avr genes in Pst. The candidate effector genes provide a rich resource for further studies to determine the evolutionary history of Pst populations and the co-evolutionary arms race between Pst and wheat. The Avr candidates identified in this study will lead to cloning avirulence genes in Pst, which will enable us to understand molecular mechanisms underlying Pst-wheat interactions, to determine the effectiveness of resistance genes and further to develop durable resistance to stripe rust.

Collapse

Huang YT, Huang YW. An efficient error correction algorithm using FM-index. BMC Bioinformatics 2017;18:524. [PMID: 29179672 PMCID: PMC5704532 DOI: 10.1186/s12859-017-1940-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2017] [Accepted: 11/14/2017] [Indexed: 11/10/2022] Open

Illingworth CJR, Roy S, Beale MA, Tutill H, Williams R, Breuer J. On the effective depth of viral sequence data. Virus Evol 2017;3:vex030. [PMID: 29250429 PMCID: PMC5724399 DOI: 10.1093/ve/vex030] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open

Savel D, LaFramboise T, Grama A, Koyuturk M. Pluribus-Exploring the Limits of Error Correction Using a Suffix Tree. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017;14:1378-1388. [PMID: 27362987 PMCID: PMC5754272 DOI: 10.1109/tcbb.2016.2586060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Paracchini V, Petrillo M, Lievens A, Puertas Gallardo A, Martinsohn JT, Hofherr J, Maquet A, Silva APB, Kagkli DM, Querci M, Patak A, Angers-Loustau A. Novel nuclear barcode regions for the identification of flatfish species. Food Control 2017;79:297-308. [PMID: 28867876 PMCID: PMC5446357 DOI: 10.1016/j.foodcont.2017.04.009] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2016] [Revised: 04/05/2017] [Accepted: 04/06/2017] [Indexed: 01/30/2023]

Song L, Huang W, Kang J, Huang Y, Ren H, Ding K. Comparison of error correction algorithms for Ion Torrent PGM data: application to hepatitis B virus. Sci Rep 2017;7:8106. [PMID: 28808243 PMCID: PMC5556038 DOI: 10.1038/s41598-017-08139-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2017] [Accepted: 07/05/2017] [Indexed: 01/26/2023] Open

Lee B, Moon T, Yoon S, Weissman T. DUDE-Seq: Fast, flexible, and robust denoising for targeted amplicon sequencing. PLoS One 2017;12:e0181463. [PMID: 28749987 PMCID: PMC5531809 DOI: 10.1371/journal.pone.0181463] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2017] [Accepted: 06/30/2017] [Indexed: 11/29/2022] Open

Ahola V, Wahlberg N, Frilander MJ. Butterfly Genomics: Insights from the Genome ofMelitaea cinxia. ANN ZOOL FENN 2017. [DOI: 10.5735/086.054.0123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

Salmela L, Walve R, Rivals E, Ukkonen E. Accurate self-correction of errors in long reads using de Bruijn graphs. Bioinformatics 2017;33:799-806. [PMID: 27273673 PMCID: PMC5351550 DOI: 10.1093/bioinformatics/btw321] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2016] [Revised: 05/03/2016] [Accepted: 05/16/2016] [Indexed: 12/04/2022] Open

Zhao L, Chen Q, Li W, Jiang P, Wong L, Li J. MapReduce for accurate error correction of next-generation sequencing data. Bioinformatics 2017;33:3844-3851. [PMID: 28205674 DOI: 10.1093/bioinformatics/btx089] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2016] [Accepted: 02/14/2017] [Indexed: 11/14/2022] Open

Jue NK, Batta-Lona PG, Trusiak S, Obergfell C, Bucklin A, O'Neill MJ, O'Neill RJ. Rapid Evolutionary Rates and Unique Genomic Signatures Discovered in the First Reference Genome for the Southern Ocean Salp, Salpa thompsoni (Urochordata, Thaliacea). Genome Biol Evol 2016;8:3171-3186. [PMID: 27624472 PMCID: PMC5174732 DOI: 10.1093/gbe/evw215] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open

Abstract

A preliminary genome sequence has been assembled for the Southern Ocean salp, Salpa thompsoni (Urochordata, Thaliacea). Despite the ecological importance of this species in Antarctic pelagic food webs and its potential role as an indicator of changing Southern Ocean ecosystems in response to climate change, no genomic resources are available for S. thompsoni or any closely related urochordate species. Using a multiple-platform, multiple-individual approach, we have produced a 318,767,936-bp genome sequence, covering >50% of the estimated 602 Mb (±173 Mb) genome size for S. thompsoni. Using a nonredundant set of predicted proteins, >50% (16,823) of sequences showed significant homology to known proteins and ∼38% (12,151) of the total protein predictions were associated with Gene Ontology functional information. We have generated 109,958 SNP variant and 9,782 indel predictions for this species, serving as a resource for future phylogenomic and population genetic studies. Comparing the salp genome to available assemblies for four other urochordates, Botryllus schlosseri, Ciona intestinalis, Ciona savignyi and Oikopleura dioica, we found that S. thompsoni shares the previously estimated rapid rates of evolution for these species. High mutation rates are thus independent of genome size, suggesting that rates of evolution >1.5 times that observed for vertebrates are a broad taxonomic characteristic of urochordates. Tests for positive selection implemented in PAML revealed a small number of genes with sites undergoing rapid evolution, including genes involved in ribosome biogenesis and metabolic and immune process that may be reflective of both adaptation to polar, planktonic environments as well as the complex life history of the salps. Finally, we performed an initial survey of small RNAs, revealing the presence of known, conserved miRNAs, as well as novel miRNA genes; unique piRNAs; and mature miRNA signatures for varying developmental stages. Collectively, these resources provide a genomic foundation supporting S. thompsoni as a model species for further examination of the exceptional rates and patterns of genomic evolution shown by urochordates. Additionally, genomic data will allow for the development of molecular indicators of key life history events and processes and afford new understandings and predictions of impacts of climate change on this key species of Antarctic pelagic ecosystems.

Collapse

From next-generation resequencing reads to a high-quality variant data set. Heredity (Edinb) 2016;118:111-124. [PMID: 27759079 DOI: 10.1038/hdy.2016.102] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2016] [Revised: 09/03/2016] [Accepted: 09/06/2016] [Indexed: 12/11/2022] Open

Improved Efficiency and Reliability of NGS Amplicon Sequencing Data Analysis for Genetic Diagnostic Procedures Using AGSA Software. BIOMED RESEARCH INTERNATIONAL 2016;2016:5623089. [PMID: 27656653 PMCID: PMC5021467 DOI: 10.1155/2016/5623089] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/25/2016] [Accepted: 06/28/2016] [Indexed: 11/23/2022]

Akogwu I, Wang N, Zhang C, Gong P. A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis. Hum Genomics 2016;10 Suppl 2:20. [PMID: 27461106 PMCID: PMC4965716 DOI: 10.1186/s40246-016-0068-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open

Abstract

BACKGROUND

Innumerable opportunities for new genomic research have been stimulated by advancement in high-throughput next-generation sequencing (NGS). However, the pitfall of NGS data abundance is the complication of distinction between true biological variants and sequence error alterations during downstream analysis. Many error correction methods have been developed to correct erroneous NGS reads before further analysis, but independent evaluation of the impact of such dataset features as read length, genome size, and coverage depth on their performance is lacking. This comparative study aims to investigate the strength and weakness as well as limitations of some newest k-spectrum-based methods and to provide recommendations for users in selecting suitable methods with respect to specific NGS datasets.

METHODS

Six k-spectrum-based methods, i.e., Reptile, Musket, Bless, Bloocoo, Lighter, and Trowel, were compared using six simulated sets of paired-end Illumina sequencing data. These NGS datasets varied in coverage depth (10× to 120×), read length (36 to 100 bp), and genome size (4.6 to 143 MB). Error Correction Evaluation Toolkit (ECET) was employed to derive a suite of metrics (i.e., true positives, false positive, false negative, recall, precision, gain, and F-score) for assessing the correction quality of each method.

RESULTS

Results from computational experiments indicate that Musket had the best overall performance across the spectra of examined variants reflected in the six datasets. The lowest accuracy of Musket (F-score = 0.81) occurred to a dataset with a medium read length (56 bp), a medium coverage (50×), and a small-sized genome (5.4 MB). The other five methods underperformed (F-score < 0.80) and/or failed to process one or more datasets.

CONCLUSIONS

This study demonstrates that various factors such as coverage depth, read length, and genome size may influence performance of individual k-spectrum-based error correction methods. Thus, efforts have to be paid in choosing appropriate methods for error correction of specific NGS datasets. Based on our comparative study, we recommend Musket as the top choice because of its consistently superior performance across all six testing datasets. Further extensive studies are warranted to assess these methods using experimental datasets generated by NGS platforms (e.g., 454, SOLiD, and Ion Torrent) under more diversified parameter settings (k-mer values and edit distances) and to compare them against other non-k-spectrum-based classes of error correction methods.

Collapse

Miclotte G, Heydari M, Demeester P, Rombauts S, Van de Peer Y, Audenaert P, Fostier J. Jabba: hybrid error correction for long sequencing reads. Algorithms Mol Biol 2016;11:10. [PMID: 27148393 PMCID: PMC4855726 DOI: 10.1186/s13015-016-0075-7] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2015] [Accepted: 04/25/2016] [Indexed: 11/13/2022] Open

Sameith K, Roscito JG, Hiller M. Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly. Brief Bioinform 2016;18:1-8. [PMID: 26868358 PMCID: PMC5221426 DOI: 10.1093/bib/bbw003] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2015] [Revised: 01/02/2016] [Indexed: 11/13/2022] Open

Alic AS, Tomas A, Medina I, Blanquer I. MuffinEc: Error correction for de Novo assembly via greedy partitioning and sequence alignment. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.09.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Tong L, Yang C, Wu PY, Wang MD. Evaluating the impact of sequencing error correction for RNA-seq data with ERCC RNA spike-in controls. ... IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS. IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS 2016;2016:74-77. [PMID: 27532064 DOI: 10.1109/bhi.2016.7455838] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Angers-Loustau A, Petrillo M, Paracchini V, Kagkli DM, Rischitor PE, Puertas Gallardo A, Patak A, Querci M, Kreysa J. Towards Plant Species Identification in Complex Samples: A Bioinformatics Pipeline for the Identification of Novel Nuclear Barcode Candidates. PLoS One 2016;11:e0147692. [PMID: 26807711 PMCID: PMC4725681 DOI: 10.1371/journal.pone.0147692] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2015] [Accepted: 01/07/2016] [Indexed: 11/25/2022] Open

Park SJ, Saito-Adachi M, Komiyama Y, Nakai K. Advances, practice, and clinical perspectives in high-throughput sequencing. Oral Dis 2016;22:353-64. [DOI: 10.1111/odi.12403] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2015] [Revised: 11/16/2015] [Accepted: 11/16/2015] [Indexed: 01/06/2023]

Alic AS, Ruzafa D, Dopazo J, Blanquer I. Objective review ofde novostand-alone error correction methods for NGS data. WILEY INTERDISCIPLINARY REVIEWS: COMPUTATIONAL MOLECULAR SCIENCE 2016. [DOI: 10.1002/wcms.1239] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]