Reference Citation Analysis

For: Kallenborn F, Cascitti J, Schmidt B. CARE 2.0: reducing false-positive sequencing error corrections using machine learning. BMC Bioinformatics 2022;23:227. [PMID: 35698033 PMCID: PMC9195321 DOI: 10.1186/s12859-022-04754-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/03/2022] [Accepted: 05/30/2022] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Moon Y, Hong CH, Kim YH, Kim JK, Ye SH, Kang EK, Choi HW, Cho H, Choi H, Lee DE, Choi Y, Kim TM, Heo SG, Han N, Hong KM. Enhancing Clinical Applications by Evaluation of Sensitivity and Specificity in Whole Exome Sequencing. Int J Mol Sci 2024;25:13250. [PMID: 39769013 PMCID: PMC11678496 DOI: 10.3390/ijms252413250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 12/04/2024] [Accepted: 12/06/2024] [Indexed: 01/11/2025] Open

Affiliation(s)

Youngbeen Moon: Bioinformatics Analysis Team, Research Core Center, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Chung Hwan Hong: Cancer Molecular Biology Branch, Division of Cancer Biology, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Young-Ho Kim: Diagnostic and Therapeutics Technology Branch, Division of Technology Convergence, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Jong-Kwang Kim: Bioinformatics Analysis Team, Research Core Center, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Seo-Hyeon Ye: Cancer Molecular Biology Branch, Division of Cancer Biology, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Eun-Kyung Kang: Cancer Molecular Biology Branch, Division of Cancer Biology, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Hye Won Choi: Cancer Molecular Biology Branch, Division of Cancer Biology, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Hyeri Cho: Diagnostic and Therapeutics Technology Branch, Division of Technology Convergence, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Hana Choi: Diagnostic and Therapeutics Technology Branch, Division of Technology Convergence, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Dong-eun Lee: Biostatistics Collaboration Team, Research Core Center, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea
Yongdoo Choi: Division of Technology Convergence, National Cancer Center, 323 Ilsan-ro, Goyang 10408, Gyeonggi-do, Republic of Korea
Tae-Min Kim: Department of Medical Informatics and Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Gyeonggi-do, Republic of Korea
Seong Gu Heo: Dana Farber Cancer Institute, Boston, MA 02215, USA; The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02115, USA
Namshik Han: Milner Therapeutics Institute, University of Cambridge, Cambridge CB2 0AW, UK; Cambridge Centre for AI in Medicine, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 0WA, UK; Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK
Kyeong-Man Hong: Bioinformatics Analysis Team, Research Core Center, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea; Cancer Molecular Biology Branch, Division of Cancer Biology, Research Institute, National Cancer Center, Goyang 10408, Gyeonggi-do, Republic of Korea

Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024;23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open

Schmidt B, Hildebrandt A. From GPUs to AI and quantum: three waves of acceleration in bioinformatics. Drug Discov Today 2024;29:103990. [PMID: 38663581 DOI: 10.1016/j.drudis.2024.103990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/05/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024]

Tang T, Liu Y, Zheng B, Li R, Zhang X, Liu Y. Integration of hybrid and self-correction method improves the quality of long-read sequencing data. Brief Funct Genomics 2024;23:249-255. [PMID: 37340778 DOI: 10.1093/bfgp/elad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 06/04/2023] [Accepted: 06/05/2023] [Indexed: 06/22/2023] Open

Kallenborn F, Schmidt B. CAREx: context-aware read extension of paired-end sequencing data. BMC Bioinformatics 2024;25:186. [PMID: 38730374 PMCID: PMC11088031 DOI: 10.1186/s12859-024-05802-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 05/03/2024] [Indexed: 05/12/2024] Open

Sami A, El-Metwally S, Rashad MZ. MAC-ErrorReads: machine learning-assisted classifier for filtering erroneous NGS reads. BMC Bioinformatics 2024;25:61. [PMID: 38321434 PMCID: PMC10848413 DOI: 10.1186/s12859-024-05681-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 01/29/2024] [Indexed: 02/08/2024] Open

Abstract

BACKGROUND

Rapid gains in the speed and affordability of next-generation sequencing (NGS) machines have produced massive amounts of biological data, but at the expense of data quality, as errors have become more prevalent. This creates a need for new approaches to detecting and filtering errors, and it shifts data quality assurance from the hardware to the software preprocessing stages.

RESULTS

We introduce MAC-ErrorReads, a novel Machine learning-Assisted Classifier designed for filtering Erroneous NGS Reads. MAC-ErrorReads recasts the filtering of erroneous NGS reads as a binary classification task and employs five supervised machine learning algorithms. The models are trained on features extracted by computing Term Frequency-Inverse Document Frequency (TF-IDF) values from several datasets: E. coli, GAGE S. aureus, H. Chr14, Arabidopsis thaliana Chr1, and Metriaclima zebra. Notably, Naive Bayes (NB) performed robustly across these datasets, with high accuracy, precision, recall, F1-score, MCC, and ROC values. The MAC-ErrorReads NB model accurately classified S. aureus reads, surpassing most error correction tools with a 38.69% alignment rate. For H. Chr14, Lighter, Karect, CARE, Pollux, and MAC-ErrorReads showed alignment rates above 99%, BFC and RECKONER exceeded 98%, and Fiona reached 95.78%. For Arabidopsis thaliana Chr1, Pollux, Karect, RECKONER, and MAC-ErrorReads achieved alignment rates of 92.62%, 91.80%, 91.78%, and 90.87%, respectively. For Metriaclima zebra, Pollux achieved a high alignment rate of 91.23% despite having the lowest number of mapped reads. MAC-ErrorReads, Karect, and RECKONER achieved alignment rates of 83.76%, 83.71%, and 83.67%, respectively, while also mapping reasonable numbers of reads to the reference genome.

CONCLUSIONS

This study demonstrates that machine learning approaches for filtering NGS reads effectively identify and retain the most accurate reads, significantly enhancing assembly quality and genomic coverage. The integration of genomics and artificial intelligence through machine learning algorithms holds promise for enhancing NGS data quality, advancing downstream data analysis accuracy, and opening new opportunities in genetics, genomics, and personalized medicine research.
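
The classification idea in this abstract (TF-IDF features fed to a Naive Bayes model that labels each read as correct or erroneous) can be illustrated with a minimal, standard-library sketch. Everything below is an assumption made for illustration: the abstract does not say how reads are tokenized, so overlapping 3-mers are used as the "terms"; the smoothed IDF formula, the multinomial Naive Bayes variant, the function and class names, and the eight toy reads are all hypothetical and are not taken from MAC-ErrorReads.

```python
import math
from collections import Counter

def kmers(read, k=3):
    """Split a read into overlapping k-mers (assumed here to be the 'terms')."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def fit_idf(reads, k=3):
    """Smoothed inverse document frequency for every k-mer seen in training."""
    n = len(reads)
    df = Counter()
    for read in reads:
        df.update(set(kmers(read, k)))  # count each k-mer once per read
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(read, idf, k=3):
    """Sparse TF-IDF vector of one read; k-mers unseen in training are dropped."""
    counts = Counter(kmers(read, k))
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

class NaiveBayes:
    """Multinomial Naive Bayes over non-negative TF-IDF weights."""

    def fit(self, vectors, labels, vocab, alpha=1.0):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            weight = Counter()
            for v in rows:
                weight.update(v)  # sums the TF-IDF weights per k-mer
            denom = sum(weight.values()) + alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((weight[t] + alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_like[c][t] for t, w in vector.items()
            )
        return max(self.classes, key=score)

# Toy training set (invented): "correct" reads follow an ACGT repeat,
# "erroneous" reads contain runs that break it.
train_reads = [
    "ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG",
    "ACGGGCGT", "CGTAAATA", "GTCCCTAC", "TAGGGACG",
]
train_labels = ["correct"] * 4 + ["erroneous"] * 4

idf = fit_idf(train_reads)
model = NaiveBayes().fit(
    [tfidf(r, idf) for r in train_reads], train_labels, vocab=set(idf)
)

def classify(read):
    """Label a read with the trained toy model."""
    return model.predict(tfidf(read, idf))

if __name__ == "__main__":
    for read in ["GTACGTACGT", "CGGGGCGT"]:
        print(read, classify(read))
```

A filter built this way would keep only the reads labeled correct and pass them on to alignment or assembly. In practice the k-mer length, the training labels, and the choice among classifiers would need to be tuned on real data.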

Długosz M, Deorowicz S. Illumina reads correction: evaluation and improvements. Sci Rep 2024;14:2232. [PMID: 38278837 PMCID: PMC11222498 DOI: 10.1038/s41598-024-52386-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 01/18/2024] [Indexed: 01/28/2024] Open

Yan L, Yin Z, Zhang H, Zhao Z, Wang M, Müller A, Kallenborn F, Wichmann A, Wei Y, Niu B, Schmidt B, Liu W. RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data. Methods 2023;216:39-50. [PMID: 37330158 DOI: 10.1016/j.ymeth.2023.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 05/26/2023] [Accepted: 06/12/2023] [Indexed: 06/19/2023] Open