1
Sharma P, Sana T, Khatoon S, Naikoo UM, Mosina, Malhotra N, Hasnain MS, Nayak AK, Narang J. Nanopores for DNA and biomolecule analysis: Diagnostic, genomic insights, applications in energy conversion and catalysis. Anal Biochem 2025; 701:115791. [PMID: 39894145 DOI: 10.1016/j.ab.2025.115791] [Received: 11/22/2024] [Revised: 01/21/2025] [Accepted: 01/27/2025] [Indexed: 02/04/2025]
Abstract
Recently, nanopores have emerged as highly significant structures with broad applications in diverse scientific and technological fields. They can occur naturally in biological membranes or be fabricated artificially using advanced techniques. Recent advances in nanopore technology have revolutionized genomics by offering unprecedented capabilities for deoxyribonucleic acid (DNA) sequencing and analysis. These tiny pores make individual molecules easier to detect, allowing real-time DNA analysis and providing new insights into genetics and diagnostics. By tracking alterations in electrical or ionic currents as biomolecules traverse the pore, nanopores enable real-time recognition of other biomolecules, such as proteins, nucleic acids, and small molecules, without the need for labeling. This label-free detection holds great promise in medical diagnostics, genotyping, environmental monitoring, and related fields. Nanopores have significantly improved DNA sequencing technology: read lengths have increased, enabling researchers to sequence entire genomic regions; accuracy can be improved; and recent updates have led to a reported increase in total DNA reads, demonstrating the technology's capacity for high-throughput applications. Sequencing works by trapping individual DNA strands and monitoring the variations in ionic current as each nucleotide passes through the pore. Finally, nanopore sequencing is recognized as a novel and highly flexible technique for DNA analysis that holds great promise in clinical diagnosis and genomics research. Hence, this review article comprehensively explains nanopores for the analysis of DNA and other biomolecules, their synthesis, and their diverse applications.
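The label-free detection principle described in this abstract (a molecule in the pore partially blocks the ionic current) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the simulated current levels, the noise level, the 20% drop threshold, and the function names `detect_events` and `simulate_trace` do not come from the reviewed article.

```python
import random
from statistics import median


def detect_events(trace, drop_fraction=0.2, min_len=3):
    """Return (start, end) index pairs, end exclusive, where the current stays
    below a threshold for at least min_len consecutive samples."""
    baseline = median(trace)                      # open-pore current estimate
    threshold = baseline * (1.0 - drop_fraction)  # hypothetical 20% drop
    events, start = [], None
    for i, value in enumerate(trace):
        if value < threshold:
            if start is None:
                start = i                         # blockade begins
        elif start is not None:
            if i - start >= min_len:
                events.append((start, i))         # blockade ends
            start = None
    if start is not None and len(trace) - start >= min_len:
        events.append((start, len(trace)))        # trace ended mid-blockade
    return events


def simulate_trace(n=300, open_pore=100.0, blockade=60.0, noise=1.0, seed=0):
    """Simulated current trace (arbitrary units) with two translocation events."""
    rng = random.Random(seed)
    trace = [open_pore + rng.gauss(0, noise) for _ in range(n)]
    for start, end in [(50, 70), (180, 210)]:
        for i in range(start, end):
            trace[i] = blockade + rng.gauss(0, noise)
    return trace


trace = simulate_trace()
events = detect_events(trace)
print(events)  # [(50, 70), (180, 210)]
```

Using the median as the baseline keeps the open-pore estimate robust as long as blockades occupy less than half of the trace; real recordings would also need drift correction and filtering, which this sketch omits.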
Affiliation(s)
- Pradakshina Sharma
  - Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Tasmiya Sana
  - Centre for Nanotechnology Research, Vellore Institute of Technology, Vellore, 632014, Tamil Nadu, India
- Shaheen Khatoon
  - Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Ubiad Mushtaq Naikoo
  - Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Mosina
  - Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Nitesh Malhotra
  - Department of Physiotherapy, School of Allied Health Sciences, Manav Rachna International Institute of Research and Studies, Faridabad, Haryana, 121003, India
- Md Saquib Hasnain
  - Department of Pharmacy, Marwadi University, Rajkot, 360003, Gujarat, India
- Amit Kumar Nayak
  - Department of Pharmaceutics, School of Pharmaceutical Sciences, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, 751003, Odisha, India
- Jagriti Narang
  - Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India

2
Straver R, Vermeulen C, Verity-Legg J, Pagès-Gallego M, Stoker DG, van Oudenaarden A, de Ridder J. ReQuant: improved base modification calling by k-mer value imputation. Nucleic Acids Res 2025; 53:gkaf323. [PMID: 40347136 PMCID: PMC12065109 DOI: 10.1093/nar/gkaf323] [Received: 08/12/2024] [Revised: 03/11/2025] [Accepted: 04/14/2025] [Indexed: 05/12/2025] Open
Abstract
Nanopore sequencing allows identification of base modifications, such as methylation, directly from raw current data. Prevailing approaches, including deep learning (DL) methods, require training data covering all possible sequence contexts. These data can be prohibitively expensive or impossible to obtain for some modifications. Hence, research into DNA modifications focuses on the most prevalent modification in human DNA: 5mC in a CpG context. Improved generalization is required to reach the technology's full potential: calling any modification from raw current values. We developed ReQuant, an algorithm to impute full, k-mer based, modification models from limited k-mer context training data. ReQuant is highly accurate for calling modifications (CpG/GpC methylation and CpG glucosylation) in Lambda Phage R9 data when fitting on ≤25% of all possible 6-mers with a modification and extends to human R10 data. The success of our approach shows that DNA modifications have a consistent and therefore predictable effect on Nanopore current levels, suggesting that interpretable rule-based imputation in unseen contexts is possible. Our approach circumvents the need for modification-specific DL tools and enables modification calling when not all sequence contexts can be obtained, opening a vast field of biological base modification research.
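To give a sense of scale for "≤25% of all possible 6-mers with a modification", the sketch below enumerates the 6-mers that contain a CpG dinucleotide and draws a 25% training subset. It illustrates only the context-coverage arithmetic under one reading of that phrase; it is not ReQuant's imputation algorithm, and the function names and random sampling strategy are hypothetical.

```python
import random
from itertools import product

BASES = "ACGT"


def kmers_with_motif(k=6, motif="CG"):
    """All k-mers over ACGT that contain the motif at least once."""
    all_kmers = ("".join(p) for p in product(BASES, repeat=k))
    return [kmer for kmer in all_kmers if motif in kmer]


def training_subset(kmers, fraction=0.25, seed=0):
    """Hypothetical random subset standing in for limited training contexts."""
    rng = random.Random(seed)
    return rng.sample(kmers, int(len(kmers) * fraction))


cpg_6mers = kmers_with_motif()
subset = training_subset(cpg_6mers)
print(len(cpg_6mers), len(subset))  # 1185 296
```

Under these assumptions, 1185 of the 4096 possible 6-mers contain CpG, so a 25% budget covers about 296 contexts and leaves the remainder to be imputed.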
Affiliation(s)
- Roy Straver
  - Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
  - Oncode Institute, 3521 AL Utrecht, The Netherlands
- Carlo Vermeulen
  - Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
  - Oncode Institute, 3521 AL Utrecht, The Netherlands
- Joe R Verity-Legg
  - Oncode Institute, 3521 AL Utrecht, The Netherlands
  - Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, The Netherlands
- Marc Pagès-Gallego
  - Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
  - Oncode Institute, 3521 AL Utrecht, The Netherlands
- Dieter G G Stoker
  - Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
  - Oncode Institute, 3521 AL Utrecht, The Netherlands
- Alexander van Oudenaarden
  - Oncode Institute, 3521 AL Utrecht, The Netherlands
  - Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), 3584 CT Utrecht, The Netherlands
- Jeroen de Ridder
  - Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
  - Oncode Institute, 3521 AL Utrecht, The Netherlands

3
Grech VS, Lotsaris K, Touma TE, Kefala V, Rallis E. The Role of Artificial Intelligence in Identifying NF1 Gene Variants and Improving Diagnosis. Genes (Basel) 2025; 16:560. [PMID: 40428382 PMCID: PMC12111457 DOI: 10.3390/genes16050560] [Received: 03/31/2025] [Revised: 05/04/2025] [Accepted: 05/05/2025] [Indexed: 05/29/2025] Open
Abstract
Neurofibromatosis type 1 (NF1) is an autosomal dominant disorder caused by mutations in the NF1 gene, typically diagnosed during early childhood and characterized by significant phenotypic heterogeneity. Despite advancements in next-generation sequencing (NGS), the diagnostic process remains challenging due to the gene's complexity, high mutational burden, and frequent identification of variants of uncertain significance (VUS). This review explores the emerging role of artificial intelligence (AI) in enhancing NF1 variant detection, classification, and interpretation. A systematic literature search was conducted across PubMed, IEEE Xplore, Google Scholar, and ResearchGate to identify recent studies applying AI technologies to NF1 genetic analysis, focusing on variant interpretation, structural modeling, tumor classification, and therapeutic prediction. The review highlights the application of AI-based tools such as VEST3, REVEL, ClinPred, and NF1-specific models like DITTO and RENOVO-NF1, which have demonstrated improved accuracy in classifying missense variants and reclassifying VUS. Structural modeling platforms like AlphaFold contribute further insights into the impact of NF1 mutations on neurofibromin structure and function. In addition, deep learning models, such as LTC neural networks, support tumor classification and therapeutic outcome prediction, particularly in NF1-associated complications like congenital pseudarthrosis of the tibia (CPT). The integration of AI methodologies offers substantial potential to improve diagnostic precision, enable early intervention, and support personalized medicine approaches. However, key challenges remain, including algorithmic bias, limited data diversity, and the need for functional validation. Ongoing refinement and clinical validation of these tools are essential to ensure their effective implementation and equitable use in NF1 diagnostics.
Affiliation(s)
- Vasiliki Sofia Grech
  - Department of Biomedical Sciences, School of Health and Care Sciences, University of West Attica, GR-12243 Athens, Greece
- Kleomenis Lotsaris
  - Department of Psychiatry, General Hospital of Athens: “Evaggelismos”, GR-10676 Athens, Greece
- Theano Eirini Touma
  - Child and Adolescent Psychiatrist, General Hospital “Asklepieio Voulas”, GR-16673 Voula, Greece
- Vassiliki Kefala
  - Department of Biomedical Sciences, School of Health and Care Sciences, University of West Attica, GR-12243 Athens, Greece
- Efstathios Rallis
  - Department of Biomedical Sciences, School of Health and Care Sciences, University of West Attica, GR-12243 Athens, Greece

4
Arima A. Recent advances in single-particle analysis with nanopore technology. Anal Sci 2025; 41:677-685. [PMID: 40186842 DOI: 10.1007/s44211-025-00757-1] [Received: 01/31/2025] [Accepted: 03/20/2025] [Indexed: 04/07/2025]
Abstract
Nanopore sensors have been used as ultrasensitive tools for single-particle detection based on ionic current measurement. This simple yet powerful technique allows researchers to acquire various physical properties of individual particles in a label-free manner. This mini-review describes the recent progress in nanopore technology demonstrated by our group. We first focus on the major advancements in nanopore architecture contributing to high spatial resolution, followed by the detection strategy designed for long-term analysis. Then, we summarize the application of nanopore technology in infection diagnosis using machine learning. Following that, we discuss its potential for gene therapy, facilitated by high spatial resolution. Finally, we highlight potential applications of next-generation nanopore technology that contribute to a healthier future.
Affiliation(s)
- Akihide Arima
  - Research Institute for Quantum and Chemical Innovation, Institutes of Innovation for Future Society, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan

5
Mears MC, Read QD, Bakre A. Comparison of direct RNA sequencing of Orthoavulavirus javaense using two different chemistries on the MinION platform. J Virol Methods 2025; 333:115103. [PMID: 39724954 DOI: 10.1016/j.jviromet.2024.115103] [Received: 08/22/2024] [Revised: 12/18/2024] [Accepted: 12/21/2024] [Indexed: 12/28/2024]
Abstract
Rapidly identifying and sequencing viral pathogens in poultry flocks can substantially reduce economic loss, especially during disease outbreaks. Current next-generation sequencing technologies require multi-step, laboratory-intensive workflows to generate sequence data, which precludes field adaptation. In this study, we hypothesized that direct RNA sequencing (DRS) using an Oxford Nanopore Technologies (ONT) MinION device would enable sequencing of the full-length viral RNA genome of Orthoavulavirus javaense (OAVJ), the causative agent of Newcastle disease, a major poultry challenge. The data demonstrate that a custom OAVJ-specific adapter paired with the ONT DRS kits enables capture and sequencing of OAVJ viral RNAs. Further, the new ONT SQK-RNA004 chemistry and flow cells, paired with the associated super-accurate basecalling workflow, improve read quality and length compared to the previous SQK-RNA002 chemistry. This is the first report of a method to sequence the near full-length viral RNA genome of a member of the Paramyxoviridae family. While additional improvements in DRS are needed before widespread adoption of this method for rapid field sequencing, DRS of OAVJ has the potential to enable further studies into the viral epitranscriptome and its role in infection and pathogenesis.
Affiliation(s)
- Megan C Mears
  - Exotic and Emerging Avian Viral Disease Research Unit, Southeast Poultry Research Laboratories, US National Poultry Research Center, 934 College Station Road, Athens, GA 30605, United States
- Quentin D Read
  - USDA-ARS Southeast Area, 840 Oval Drive, Raleigh, NC 27606, United States
- Abhijeet Bakre
  - Exotic and Emerging Avian Viral Disease Research Unit, Southeast Poultry Research Laboratories, US National Poultry Research Center, 934 College Station Road, Athens, GA 30605, United States

6
Baramidze V, Sella L, Japaridze T, Dzotsenidze N, Lamazoshvili D, Abashidze N, Basilidze M, Tomashvili G. A Barcoded ITS Primer-Based Nanopore Sequencing Protocol for Detection of Alternaria Species and Other Fungal Pathogens in Diverse Plant Hosts. J Fungi (Basel) 2025; 11:249. [PMID: 40278070 PMCID: PMC12027965 DOI: 10.3390/jof11040249] [Received: 11/29/2024] [Revised: 03/13/2025] [Accepted: 03/17/2025] [Indexed: 04/26/2025] Open
Abstract
Alternaria is a genus that contains several important plant pathogens affecting nearly 400 plant species worldwide, including economically important crops such as grapes, citrus, and ornamental plants. Rapid, scalable, and efficient methods of pathogen detection are crucial for managing plant diseases and ensuring agricultural productivity. Current amplicon sequencing protocols for Alternaria detection often require the enzymatic barcoding of amplicons, increasing hands-on time, cost, and contamination risk. We present a proof-of-concept study using custom barcoded primers, combining universal primers targeting ITS1 and ITS2 regions (600 bp) coupled with Oxford Nanopore Technologies (ONT) barcode sequences. Sequencing was performed on infected grapevine, mandarin orange, thuja, and maple tree samples. In total, we analyzed 38 samples using qPCR; 8 tested positive for Alternaria, which were sequenced using a newly developed protocol. As a result, we could identify Alternaria in every positive sample, and besides the pathogen of interest, we could identify the associated mycobiome. This protocol reduces hands-on time and cost, making a significant advancement over current sequencing methods. Future work will focus on optimizing our approach for high-throughput sequencing of up to 96 samples and determining the method's applicability for large-scale mycobiome analysis.
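Because the barcode is part of the primer, each amplicon read starts with its sample barcode, so reads can be binned by sample without an enzymatic barcoding step. The sketch below shows that idea with exact prefix matching. The barcode sequences, sample names, and the `demultiplex` function are made up for illustration; they are not ONT barcode sequences or the authors' pipeline, and real nanopore reads would need error-tolerant matching.

```python
def demultiplex(reads, barcodes):
    """Assign each read to the sample whose barcode is an exact prefix of the
    read, trimming the barcode; unmatched reads go to 'unassigned'."""
    bins = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return bins


# Hypothetical 4-nt barcodes, chosen only to keep the example short.
barcodes = {"grapevine": "AAGT", "mandarin": "CCTA"}
reads = ["AAGTTCCGTAGG", "CCTAGGAAGGAT", "GGGGTCCGTAGG", "AAGTGGAAGGAT"]

bins = demultiplex(reads, barcodes)
print({sample: len(seqs) for sample, seqs in bins.items()})
# {'grapevine': 2, 'mandarin': 1, 'unassigned': 1}
```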
Affiliation(s)
- Vladimer Baramidze
  - Department of Plant Protection, Agricultural University of Georgia, Kakha Bendukidze University Campus, Tbilisi 0159, Georgia
  - Microbiome Research Center, OxGEn Solutions, 14th km Natakhtari, Mtskheta 3308, Georgia
- Luca Sella
  - Department of Land, Environment, Agriculture and Forestry, University of Padua, 35020 Padova, Italy
- Tamar Japaridze
  - Department of Plant Protection, Agricultural University of Georgia, Kakha Bendukidze University Campus, Tbilisi 0159, Georgia
- Nino Dzotsenidze
  - Department of Plant Protection, Agricultural University of Georgia, Kakha Bendukidze University Campus, Tbilisi 0159, Georgia
- Daviti Lamazoshvili
  - Department of Plant Protection, Agricultural University of Georgia, Kakha Bendukidze University Campus, Tbilisi 0159, Georgia
- Nino Abashidze
  - Microbiome Research Center, OxGEn Solutions, 14th km Natakhtari, Mtskheta 3308, Georgia
- Maka Basilidze
  - Microbiome Research Center, OxGEn Solutions, 14th km Natakhtari, Mtskheta 3308, Georgia
- Giorgi Tomashvili
  - Department of Virology and Molecular Biology, National Center for Disease Control and Public Health (NCDC), Tbilisi 0198, Georgia

7
Tsui WHA, Ding SC, Jiang P, Lo YMD. Artificial intelligence and machine learning in cell-free-DNA-based diagnostics. Genome Res 2025; 35:1-19. [PMID: 39843210 PMCID: PMC11789496 DOI: 10.1101/gr.278413.123] [Indexed: 01/24/2025]
Abstract
The discovery of circulating fetal and tumor cell-free DNA (cfDNA) molecules in plasma has opened up tremendous opportunities in noninvasive diagnostics such as the detection of fetal chromosomal aneuploidies and cancers and in posttransplantation monitoring. The advent of high-throughput sequencing technologies makes it possible to scrutinize the characteristics of cfDNA molecules, opening up the fields of cfDNA genetics, epigenetics, transcriptomics, and fragmentomics, providing a plethora of biomarkers. Machine learning (ML) and/or artificial intelligence (AI) technologies that are known for their ability to integrate high-dimensional features have recently been applied to the field of liquid biopsy. In this review, we highlight various AI and ML approaches in cfDNA-based diagnostics. We first introduce the biology of cell-free DNA and basic concepts of ML and AI technologies. We then discuss selected examples of ML- or AI-based applications in noninvasive prenatal testing and cancer liquid biopsy. These applications include the deduction of fetal DNA fraction, plasma DNA tissue mapping, and cancer detection and localization. Finally, we offer perspectives on the future direction of using ML and AI technologies to leverage cfDNA fragmentation patterns in terms of methylomic and transcriptional investigations.
Affiliation(s)
- W H Adrian Tsui
  - Center for Novostics, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Spencer C Ding
  - Center for Novostics, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Peiyong Jiang
  - Center for Novostics, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
  - State Key Laboratory of Translational Oncology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Y M Dennis Lo
  - Center for Novostics, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong SAR, China
  - Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
  - State Key Laboratory of Translational Oncology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China

8
Li Q, Sun C, Wang D, Lou J. BaseNet: A transformer-based toolkit for nanopore sequencing signal decoding. Comput Struct Biotechnol J 2024; 23:3430-3444. [PMID: 39391372 PMCID: PMC11465205 DOI: 10.1016/j.csbj.2024.09.016] [Received: 06/18/2024] [Revised: 09/18/2024] [Accepted: 09/24/2024] [Indexed: 10/12/2024] Open
Abstract
Nanopore sequencing provides a rapid, convenient and high-throughput solution for nucleic acid sequencing. Accurate basecalling in nanopore sequencing is crucial for downstream analysis. Traditional approaches such as Hidden Markov Models (HMM), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN) have improved basecalling accuracy, but there is a continuous need for higher accuracy and reliability. In this study, we introduce BaseNet (https://github.com/liqingwen98/BaseNet), an open-source toolkit that utilizes transformer models for advanced signal decoding in nanopore sequencing. BaseNet incorporates both autoregressive and non-autoregressive transformer-based decoding mechanisms, offering state-of-the-art algorithms freely accessible for future improvement. Our research indicates that cross-attention weights effectively map the relationship between current signals and base sequences, that joint loss training with an added pair of forward and reverse decoders facilitates model convergence, and that large-scale pre-trained models achieve superior decoding accuracy. This study helps to advance the field of nanopore sequencing signal decoding, contributes to technological advancements, and provides novel concepts and tools for researchers and practitioners.
Affiliation(s)
- Qingwen Li
  - Key Laboratory of Epigenetic Regulation and Intervention, Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Chen Sun
  - Beijing Polyseq Biotech Co. Ltd., Beijing 100089, China
- Daqian Wang
  - Beijing Polyseq Biotech Co. Ltd., Beijing 100089, China
- Jizhong Lou
  - Key Laboratory of Epigenetic Regulation and Intervention, Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Beijing Polyseq Biotech Co. Ltd., Beijing 100089, China

9
Li Q, Sun C, Wang D, Lou J. GCRTcall: a transformer based basecaller for nanopore RNA sequencing enhanced by gated convolution and relative position embedding via joint loss training. Front Genet 2024; 15:1443532. [PMID: 39649096 PMCID: PMC11621211 DOI: 10.3389/fgene.2024.1443532] [Received: 06/04/2024] [Accepted: 11/04/2024] [Indexed: 12/10/2024] Open
Abstract
Nanopore sequencing, renowned for its ability to sequence DNA and RNA directly with read lengths extending to several hundred kilobases or even megabases, holds significant promise in fields like transcriptomics and other omics studies. Despite its potential, the technology's limited accuracy in base identification has restricted its widespread application. Although many algorithms have been developed to improve DNA decoding, advancements in RNA sequencing remain limited. Addressing this challenge, we introduce GCRTcall, a novel approach integrating Transformer architecture with gated convolutional networks and relative positional encoding for RNA sequencing signal decoding. Our evaluation demonstrates that GCRTcall achieves state-of-the-art performance in RNA basecalling.
Affiliation(s)
- Qingwen Li
  - Key Laboratory of Epigenetic Regulation and Intervention, Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Chen Sun
  - Beijing Polyseq Biotech Co., Ltd., Beijing, China
- Daqian Wang
  - Beijing Polyseq Biotech Co., Ltd., Beijing, China
- Jizhong Lou
  - Key Laboratory of Epigenetic Regulation and Intervention, Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
  - Beijing Polyseq Biotech Co., Ltd., Beijing, China

10
Diensthuber G, Pryszcz LP, Llovera L, Lucas MC, Delgado-Tejedor A, Cruciani S, Roignant JY, Begik O, Novoa EM. Enhanced detection of RNA modifications and read mapping with high-accuracy nanopore RNA basecalling models. Genome Res 2024; 34:1865-1877. [PMID: 39271295 PMCID: PMC11610583 DOI: 10.1101/gr.278849.123] [Received: 12/12/2023] [Accepted: 09/10/2024] [Indexed: 09/15/2024]
Abstract
In recent years, nanopore direct RNA sequencing (DRS) has become a valuable tool for studying the epitranscriptome, owing to its ability to detect multiple modifications within the same full-length native RNA molecules. Although RNA modifications can be identified in the form of systematic basecalling "errors" in DRS data sets, N6-methyladenosine (m6A) modifications produce relatively low "errors" compared with other RNA modifications, limiting the applicability of this approach to m6A sites that are modified at high stoichiometries. Here, we demonstrate that the use of alternative RNA basecalling models, trained with fully unmodified sequences, increases the "error" signal of m6A, leading to enhanced detection and improved sensitivity even at low stoichiometries. Moreover, we find that high-accuracy alternative RNA basecalling models can show up to 97% median basecalling accuracy, outperforming currently available RNA basecalling models, which show 91% median basecalling accuracy. Notably, the use of high-accuracy basecalling models is accompanied by a significant increase in the number of mapped reads, especially in shorter RNA fractions, and by increased basecalling error signatures at pseudouridine (Ψ)- and N1-methylpseudouridine (m1Ψ)-modified sites. Overall, our work demonstrates that alternative RNA basecalling models can be used to improve the detection of RNA modifications, read mappability, and basecalling accuracy in nanopore DRS data sets.
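The idea of reading a modification as a systematic basecalling "error" can be sketched as a per-position mismatch frequency over aligned reads. This is a simplified illustration, not the authors' pipeline: it assumes gap-free reads already aligned to the reference and of the same length, and the toy sequences and the function name `mismatch_frequency` are invented for the example.

```python
def mismatch_frequency(reference, reads):
    """Fraction of reads that disagree with the reference at each position.

    Assumes every read is already aligned to the reference without gaps."""
    if not reads:
        raise ValueError("at least one read is required")
    counts = [0] * len(reference)
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("reads must match the reference length")
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if ref_base != read_base:
                counts[i] += 1
    return [count / len(reads) for count in counts]


# Toy data: half of the reads are miscalled at the central A (index 2).
reference = "GGACT"
reads = ["GGACT", "GGGCT", "GGTCT", "GGACT"]

freqs = mismatch_frequency(reference, reads)
print(freqs)  # [0.0, 0.0, 0.5, 0.0, 0.0]
```

A position whose mismatch frequency is consistently higher in native RNA than in an unmodified control would be a candidate modified site; choosing that contrast and its threshold is outside the scope of this sketch.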
Affiliation(s)
- Gregor Diensthuber
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
  - Universitat Pompeu Fabra, Barcelona 08003, Spain
- Leszek P Pryszcz
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Laia Llovera
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Morghan C Lucas
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
  - Universitat Pompeu Fabra, Barcelona 08003, Spain
- Anna Delgado-Tejedor
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
  - Universitat Pompeu Fabra, Barcelona 08003, Spain
- Sonia Cruciani
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
  - Universitat Pompeu Fabra, Barcelona 08003, Spain
- Jean-Yves Roignant
  - Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, 1015 Lausanne, Switzerland
  - Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University Mainz, 55128 Mainz, Germany
- Oguzhan Begik
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
- Eva Maria Novoa
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
  - Universitat Pompeu Fabra, Barcelona 08003, Spain

11
Liu L, Liu Z, Xu X, Wang J, Tong Z. Solid-state nanochannels based on electro-optical dual signals for detection of analytes. Talanta 2024; 279:126615. [PMID: 39096787 DOI: 10.1016/j.talanta.2024.126615] [Received: 04/15/2024] [Revised: 07/09/2024] [Accepted: 07/23/2024] [Indexed: 08/05/2024]
Abstract
The sensitive detection of analytes of different sizes is of crucial significance for environmental protection, food safety and medical diagnostics. The confined space of nanochannels provides a location closest to the molecular reaction behaviors in real systems, thereby opening new opportunities for the precise detection of analytes. However, because the confined space of nanochannels is susceptible to external interference, the high sensitivity of the current signals through the nanochannels poses a problem for detection reliability. Combining highly sensitive optical signals with the sensitive current signals of solid-state nanochannels establishes a nanochannel detection platform based on electro-optical dual signals, potentially offering more sensitive, specific, and accurate detection of analytes. This review summarizes the last five years of applications of solid-state nanochannels based on electro-optical dual signals in analyte detection. First, the detection principles of solid-state nanochannels and the construction strategies of nanochannel electro-optical sensing platforms are discussed. Subsequently, the review comprehensively outlines the applications involving nanochannels with electrical signals combined with fluorescence signals, electrical signals combined with surface-enhanced Raman spectroscopy signals, and electrical signals combined with other optical signals in analyte detection. Additionally, the prospects and difficulties of nanochannels based on electro-optical dual signals are discussed.
Affiliation(s)
- Lingxiao Liu
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Zhiwei Liu
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Xinrui Xu
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Jiang Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Zhaoyang Tong
  - State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China

12
|
Bhandari BK, Goldman N. A generalized protein identification method for novel and diverse sequencing technologies. NAR Genom Bioinform 2024; 6:lqae126. [PMID: 39296929 PMCID: PMC11409062 DOI: 10.1093/nargab/lqae126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 08/01/2024] [Accepted: 09/03/2024] [Indexed: 09/21/2024] Open
Abstract
Protein sequencing is a rapidly evolving field with much progress towards the realization of a new generation of protein sequencers. The early devices, however, may not be able to reliably discriminate all 20 amino acids, resulting in a partial, noisy and possibly error-prone signature of a protein. Rather than achieving de novo sequencing, these devices may aim to identify target proteins by comparing such signatures to databases of known proteins. However, there are no broadly applicable methods for this identification problem. Here, we devise a hidden Markov model method to study the generalized problem of protein identification from noisy signature data. Based on a hypothetical sequencing device that can simulate several novel technologies, we show that on the human protein database (N = 20 181) our method has a good performance under many different operating conditions such as various levels of signal resolvability, different numbers of discriminated amino acids, sequence fragments, and insertion and deletion error rates. Our results demonstrate the possibility of protein identification with high accuracy on many early experimental devices. We anticipate our method to be applicable for a wide range of protein sequencing devices in the future.
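The identification idea in this abstract can be illustrated with a small sketch. It is not the authors' model: the three-class reduced alphabet, the error rates, the toy database, and the function names `signature`, `likelihood`, and `identify` are all assumptions made for illustration. The code scores a noisy reduced-alphabet read against each database protein with an HMM-style forward recursion that allows substitutions, insertions, and deletions, and returns the best-scoring entry.

```python
# Hypothetical reduced alphabet: the device is assumed to resolve only 3 classes.
CLASSES = {"hydrophobic": "AVLIMFWC", "polar": "STNQYGPH", "charged": "DEKR"}
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES.values()) for aa in group}
K = len(CLASSES)


def signature(protein):
    """Map an amino-acid string to its reduced-alphabet signature."""
    return [AA_TO_CLASS[aa] for aa in protein]


def likelihood(observed, protein, p_sub=0.1, p_ins=0.05, p_del=0.05):
    """Forward probability of the observed signature given a candidate protein.

    F[i][j] sums over all ways of explaining the first j observed symbols
    with the first i residues (match/substitution, insertion, deletion).
    The error rates are illustrative assumptions.
    """
    ref = signature(protein)
    p_match = 1.0 - p_ins - p_del
    n, m = len(ref), len(observed)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:  # residue read, possibly as the wrong class
                same = ref[i - 1] == observed[j - 1]
                emit = (1.0 - p_sub) if same else p_sub / (K - 1)
                total += F[i - 1][j - 1] * p_match * emit
            if i > 0:  # residue skipped by the device (deletion)
                total += F[i - 1][j] * p_del
            if j > 0:  # spurious symbol emitted (insertion)
                total += F[i][j - 1] * p_ins / K
            F[i][j] = total
    return F[n][m]


def identify(observed, database):
    """Return the name of the database protein with the highest likelihood."""
    return max(database, key=lambda name: likelihood(observed, database[name]))


# Toy database with made-up sequences.
database = {"P1": "MKVLAAGIDE", "P2": "DEKRDEKRST", "P3": "STNQYGPHAV"}
noisy = signature("MKVLAAGIDE")
noisy[3] = 1  # simulate one misread class
print(identify(noisy, database))
```

For realistic protein lengths the products of probabilities would underflow, so a practical version would work in log space.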
Affiliation(s)
- Bikash Kumar Bhandari
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
13
Wang R, Chen J. DeepCorr: a novel error correction method for 3GS long reads based on deep learning. PeerJ Comput Sci 2024; 10:e2160. [PMID: 39678285 PMCID: PMC11639150 DOI: 10.7717/peerj-cs.2160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 06/07/2024] [Indexed: 12/17/2024]
Abstract
Long reads generated by third-generation sequencing (3GS) technologies are used in many biological analyses and play a vital role because of their ultra-long read length. However, their high error rate affects downstream processing. DeepCorr, a novel deep learning-based error correction algorithm for data from both PacBio and ONT platforms, is proposed. The core algorithm adopts a recurrent neural network to capture the long-term dependencies in the long reads and converts the problem of long-read error correction into a multi-classification task. It first aligns the high-precision short reads to long reads to generate the corresponding feature vectors and labels, then feeds these vectors to the neural network, and finally trains the model for prediction and error correction. DeepCorr produces untrimmed corrected long reads and improves the alignment identity while maintaining the length advantage. It can capture and make full use of the dependencies to polish those bases that are not aligned by any short read. DeepCorr achieves better performance than state-of-the-art error correction methods on real-world PacBio and ONT benchmark data sets and consumes fewer computing resources. It is a comprehensive deep learning-based tool that enables accurate correction of long reads.
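The feature-generation step described above (short reads aligned to a long read, then turned into per-position vectors) might look roughly like the following. This is only a sketch under stated assumptions: the abstract does not specify the actual feature layout, so the nine-value vector (one-hot long-read base, short-read base frequencies, coverage flag), the gap-free `(offset, sequence)` alignment format, and the name `pileup_features` are hypothetical.

```python
BASES = "ACGT"


def pileup_features(long_read, short_alignments):
    """Build one feature vector per long-read position.

    short_alignments is a list of (offset, sequence) pairs, where offset is
    the 0-based start of a gap-free short-read alignment on the long read.
    This input format is an assumption made for the sketch.
    """
    counts = [[0] * len(BASES) for _ in long_read]
    for offset, seq in short_alignments:
        for k, base in enumerate(seq):
            pos = offset + k
            if 0 <= pos < len(long_read) and base in BASES:
                counts[pos][BASES.index(base)] += 1

    features = []
    for base, column in zip(long_read, counts):
        depth = sum(column)
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        freqs = [c / depth if depth else 0.0 for c in column]
        covered = 1.0 if depth else 0.0  # 0.0 marks bases no short read reaches
        features.append(one_hot + freqs + [covered])
    return features


# Toy example: two short reads disagree with the long read at position 2.
vectors = pileup_features("ACGT", [(1, "CT"), (1, "CT")])
for position, vector in enumerate(vectors):
    print(position, vector)
```

A sequence model such as a recurrent network would then consume these vectors in order, which is what would let it use neighbouring context for positions where `covered` is 0.0.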
Affiliation(s)
- Rongshu Wang
- Department of Electronic Engineering, Information School, Yunnan University, Kunming, Yunnan, China
- Jianhua Chen
- Department of Electronic Engineering, Information School, Yunnan University, Kunming, Yunnan, China
14
Wang R, Chen J. NmTHC: a hybrid error correction method based on a generative neural machine translation model with transfer learning. BMC Genomics 2024; 25:573. [PMID: 38849740 PMCID: PMC11157743 DOI: 10.1186/s12864-024-10446-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 05/22/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND The single-pass long reads generated by third-generation sequencing technology exhibit a higher error rate. However, circular consensus sequencing (CCS) produces shorter reads. Thus, it is effective to manage the error rate of long reads algorithmically with the help of homologous high-precision and low-cost short reads from Next Generation Sequencing (NGS) technology. METHODS In this work, a hybrid error correction method (NmTHC) based on a generative neural machine translation model is proposed to automatically capture discrepancies within the aligned regions of long reads and short reads, as well as the contextual relationships within the long reads themselves, for error correction. Akin to natural language sequences, the long read can be regarded as a special "genetic language" and be processed with the idea of generative neural networks. The algorithm builds a sequence-to-sequence (seq2seq) framework with a Recurrent Neural Network (RNN) as the core layer. The long reads before and after correction are regarded as sentences in the source and target languages of translation, and the alignment information of long reads with short reads is used to create the special corpus for training. The well-trained model can be used to predict the corrected long read. RESULTS NmTHC outperforms the latest mainstream hybrid error correction methods on real-world datasets from two mainstream platforms, PacBio and Nanopore. Our experimental evaluation results demonstrate that NmTHC can align more bases with the reference genome without any segmenting in the six benchmark datasets, showing that it enhances alignment identity without sacrificing any length advantages of long reads.
CONCLUSION Consequently, NmTHC reasonably adopts the generative Neural Machine Translation (NMT) model to transform hybrid error correction tasks into machine translation problems and provides a novel perspective for solving long-read error correction problems with the ideas of Natural Language Processing (NLP). More remarkably, the proposed methodology is sequencing-technology-independent and can produce more precise reads.
Affiliation(s)
- Rongshu Wang
- Department of Electronic Engineering, Information School, Yunnan University, Kunming, Yunnan, China
- Jianhua Chen
- Department of Electronic Engineering, Information School, Yunnan University, Kunming, Yunnan, China
15
Yao B, Hsu C, Goldner G, Michaeli Y, Ebenstein Y, Listgarten J. Effective training of nanopore callers for epigenetic marks with limited labelled data. Open Biol 2024; 14:230449. [PMID: 38862018 DOI: 10.1098/rsob.230449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 03/04/2024] [Indexed: 06/13/2024] Open
Abstract
Nanopore sequencing platforms combined with supervised machine learning (ML) have been effective at detecting base modifications in DNA such as 5-methylcytosine (5mC) and N6-methyladenine (6mA). These ML-based nanopore callers have typically been trained on data that span all modifications on all possible DNA k-mer backgrounds, that is, a complete training dataset. However, as nanopore technology is pushed to more and more epigenetic modifications, such complete training data will not be feasible to obtain. Nanopore calling has historically been performed with hidden Markov models (HMMs) that cannot make successful calls for k-mer contexts not seen during training because of their independent emission distributions. However, deep neural networks (DNNs), which share parameters across contexts, are increasingly being used as callers, often outperforming their HMM cousins. It stands to reason that a DNN approach should be able to better generalize to unseen k-mer contexts. Indeed, herein we demonstrate that a common DNN approach (DeepSignal) outperforms a common HMM approach (Nanopolish) in the incomplete data setting. Furthermore, we propose a novel hybrid HMM-DNN approach, amortized-HMM, that outperforms both the pure HMM and DNN approaches on 5mC calling when the training data are incomplete. This type of approach is expected to be useful for calling other base modifications such as 5-hydroxymethylcytosine and for the simultaneous calling of different modifications, settings in which complete training data are not likely to be available.
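The contrast between independent emission parameters and shared parameters can be made concrete with a toy sketch. It is neither Nanopolish nor DeepSignal: the additive ground-truth signal in `true_mean`, the 3-mer context size, and the small additive model fitted by gradient descent are all assumptions chosen so the example runs with the standard library. A per-context lookup table has nothing to say about a context it never saw, while a model whose parameters are shared across contexts can still predict it.

```python
from itertools import product

BASES = "ACGT"
K = 3  # context length, chosen arbitrarily for the toy example


def true_mean(kmer):
    """Hypothetical additive signal level, used only to generate toy data."""
    effect = {"A": 1.0, "C": -0.5, "G": 0.3, "T": -1.2}
    return 80.0 + sum((pos + 1) * effect[b] for pos, b in enumerate(kmer))


all_kmers = ["".join(p) for p in product(BASES, repeat=K)]
held_out = "GCT"  # context deliberately missing from the training data
train = {kmer: true_mean(kmer) for kmer in all_kmers if kmer != held_out}

# Independent parameters: one emission mean per context, as in a lookup table.
table_model = dict(train)


def fit_shared(data, epochs=500, lr=0.5):
    """Fit one weight per (position, base) by batch gradient descent."""
    weights = {(pos, b): 0.0 for pos in range(K) for b in BASES}
    n = len(data)
    for _ in range(epochs):
        grad = dict.fromkeys(weights, 0.0)
        for kmer, target in data.items():
            error = sum(weights[(pos, b)] for pos, b in enumerate(kmer)) - target
            for pos, b in enumerate(kmer):
                grad[(pos, b)] += error / n
        for key in weights:
            weights[key] -= lr * grad[key]
    return weights


def predict_shared(weights, kmer):
    return sum(weights[(pos, b)] for pos, b in enumerate(kmer))


shared = fit_shared(train)
print("table model knows held-out context:", held_out in table_model)
print("shared model prediction:", predict_shared(shared, held_out))
print("true value:", true_mean(held_out))
```

The shared model recovers the held-out value here only because the toy signal really is additive; real nanopore currents are not, which is why callers use far more expressive networks.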
Affiliation(s)
- Brian Yao
- Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720, USA
- Chloe Hsu
- Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720, USA
- Gal Goldner
- Department of Chemical Physics, Tel Aviv University, Tel Aviv-Yafo, Israel
- Yael Michaeli
- Department of Chemical Physics, Tel Aviv University, Tel Aviv-Yafo, Israel
- Yuval Ebenstein
- Department of Chemical Physics, Tel Aviv University, Tel Aviv-Yafo, Israel
- Edmond J. Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv-Yafo, Israel
- Jennifer Listgarten
- Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA 94720, USA
16
Gündüz HA, Mreches R, Moosbauer J, Robertson G, To XY, Franzosa EA, Huttenhower C, Rezaei M, McHardy AC, Bischl B, Münch PC, Binder M. Optimized model architectures for deep learning on genomic data. Commun Biol 2024; 7:516. [PMID: 38693292 PMCID: PMC11063068 DOI: 10.1038/s42003-024-06161-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 04/08/2024] [Indexed: 05/03/2024] Open
Abstract
The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.
Affiliation(s)
- Hüseyin Anil Gündüz
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- René Mreches
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Julia Moosbauer
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- Gary Robertson
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Xiao-Yin To
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Eric A Franzosa
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Mina Rezaei
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- Alice C McHardy
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- German Centre for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
- Bernd Bischl
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
- Philipp C Münch
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- German Centre for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
- Martin Binder
- Department of Statistics, LMU Munich, Munich, Germany
- Munich Center for Machine Learning, Munich, Germany
17
Cao B, Zheng Y, Shao Q, Liu Z, Xie L, Zhao Y, Wang B, Zhang Q, Wei X. Efficient data reconstruction: The bottleneck of large-scale application of DNA storage. Cell Rep 2024; 43:113699. [PMID: 38517891 DOI: 10.1016/j.celrep.2024.113699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/15/2023] [Accepted: 01/05/2024] [Indexed: 03/24/2024] Open
Abstract
Over the past decade, the rapid development of DNA synthesis and sequencing technologies has enabled preliminary use of DNA molecules for digital data storage, overcoming the capacity and persistence bottlenecks of silicon-based storage media. DNA storage has now been fully accomplished in the laboratory through existing biotechnology, which again demonstrates the viability of carbon-based storage media. However, the high cost and latency of data reconstruction pose challenges that hinder the practical implementation of DNA storage beyond the laboratory. In this article, we review existing advanced DNA storage methods, analyze the characteristics and performance of biotechnological approaches at various stages of data writing and reading, and discuss potential factors influencing DNA storage from the perspective of data reconstruction.
Affiliation(s)
- Ben Cao
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China; Centre for Frontier AI Research, Agency for Science, Technology, and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore
- Yanfen Zheng
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Qi Shao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Zhenlu Liu
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Lei Xie
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Yunzhu Zhao
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Bin Wang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Xuefu Street, Dalian, Liaoning 116622, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
- Xiaopeng Wei
- School of Computer Science and Technology, Dalian University of Technology, Lingshui Street, Dalian, Liaoning 116024, China
18
Chafai N, Bonizzi L, Botti S, Badaoui B. Emerging applications of machine learning in genomic medicine and healthcare. Crit Rev Clin Lab Sci 2024; 61:140-163. [PMID: 37815417 DOI: 10.1080/10408363.2023.2259466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 09/12/2023] [Indexed: 10/11/2023]
Abstract
The integration of artificial intelligence technologies has propelled the progress of clinical and genomic medicine in recent years. The significant increase in computing power has facilitated the ability of artificial intelligence models to analyze and extract features from extensive medical data and images, thereby contributing to the advancement of intelligent diagnostic tools. Artificial intelligence (AI) models have been utilized in the field of personalized medicine to integrate clinical data and genomic information of patients. This integration allows for the identification of customized treatment recommendations, ultimately leading to enhanced patient outcomes. Notwithstanding the notable advancements, the application of artificial intelligence (AI) in the field of medicine is impeded by various obstacles such as the limited availability of clinical and genomic data, the diversity of datasets, ethical implications, and the inconclusive interpretation of AI models' results. In this review, a comprehensive evaluation of multiple machine learning algorithms utilized in the fields of clinical and genomic medicine is conducted. Furthermore, we present an overview of the implementation of artificial intelligence (AI) in the fields of clinical medicine, drug discovery, and genomic medicine. Finally, a number of constraints pertaining to the implementation of artificial intelligence within the healthcare industry are examined.
Affiliation(s)
- Narjice Chafai
- Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
- Luigi Bonizzi
- Department of Biomedical, Surgical and Dental Science, University of Milan, Milan, Italy
- Sara Botti
- PTP Science Park, Via Einstein - Loc. Cascina Codazza, Lodi, Italy
- Bouabid Badaoui
- Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
- African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laâyoune, Morocco
19
Dorey A, Howorka S. Nanopore DNA sequencing technologies and their applications towards single-molecule proteomics. Nat Chem 2024; 16:314-334. [PMID: 38448507 DOI: 10.1038/s41557-023-01322-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 07/14/2023] [Indexed: 03/08/2024]
Abstract
Sequencing of nucleic acids with nanopores has emerged as a powerful tool offering rapid readout, high accuracy, low cost and portability. This label-free method for sequencing at the single-molecule level is an achievement on its own. However, nanopores also show promise for the technologically even more challenging sequencing of polypeptides, something that could considerably benefit biological discovery, clinical diagnostics and homeland security, as current techniques lack portability and speed. Here we survey the biochemical innovations underpinning commercial and academic nanopore DNA/RNA sequencing techniques, and explore how these advances can fuel developments in future protein sequencing with nanopores.
Affiliation(s)
- Adam Dorey
- Department of Chemistry & Institute of Structural Molecular Biology, University College London, London, UK
- Stefan Howorka
- Department of Chemistry & Institute of Structural Molecular Biology, University College London, London, UK
20
Mackinnon AC, Chandrashekar DS, Suster DI. Molecular pathology as basis for timely cancer diagnosis and therapy. Virchows Arch 2024; 484:155-168. [PMID: 38012424 DOI: 10.1007/s00428-023-03707-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/16/2023] [Accepted: 11/08/2023] [Indexed: 11/29/2023]
Abstract
Precision and personalized therapeutics have witnessed significant advancements in technology, revolutionizing the capabilities of laboratories to generate vast amounts of genetic data. Coupled with computational resources for analysis and interpretation, and integrated with various other types of data, including genomic data, electronic medical health (EMH) data, and clinical knowledge, these advancements support optimized health decisions. Among these technologies, next-generation sequencing (NGS) stands out as a transformative tool in the field of cancer treatment, playing a crucial role in precision oncology. NGS-based workflows are employed across a range of applications, including gene panels, exome sequencing, and whole-genome sequencing, supporting comprehensive analysis of the entire cancer genome, including mutations, copy number variations, gene expression profiles, and epigenetic modifications. By utilizing the power of NGS, these workflows contribute to enhancing our understanding of disease mechanisms, diagnosis confirmation, identifying therapeutic targets, and guiding personalized treatment decisions. This manuscript explores the diverse applications of NGS in cancer treatment, highlighting its significance in guiding diagnosis and treatment decisions, identifying therapeutic targets, monitoring disease progression, and improving patient outcomes.
Affiliation(s)
- A Craig Mackinnon
- Department of Pathology, University of Alabama at Birmingham, 619 19th Street South, Birmingham, AL, 35249, USA
- David I Suster
- Department of Pathology, Rutgers University New Jersey Medical School, 150 Bergen Street, Newark, NJ, 07103, USA
21
Choon YW, Choon YF, Nasarudin NA, Al Jasmi F, Remli MA, Alkayali MH, Mohamad MS. Artificial intelligence and database for NGS-based diagnosis in rare disease. Front Genet 2024; 14:1258083. [PMID: 38371307 PMCID: PMC10870236 DOI: 10.3389/fgene.2023.1258083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 11/24/2023] [Indexed: 02/20/2024] Open
Abstract
Rare diseases (RDs) are rare complex genetic diseases affecting a conservative estimate of 300 million people worldwide. Recent Next-Generation Sequencing (NGS) studies are unraveling the underlying genetic heterogeneity of this group of diseases. NGS-based methods used in RD studies have improved the diagnosis and management of RDs. Concomitantly, a suite of bioinformatics tools has been developed to sort through big data generated by NGS to understand RDs better. However, there are concerns regarding the lack of consistency among different methods, primarily linked to factors such as the lack of uniformity in input and output formats, the absence of a standardized measure for predictive accuracy, and the regularity of updates to the annotation database. Today, artificial intelligence (AI), particularly deep learning, is widely used in a variety of biological contexts, changing the healthcare system. AI has demonstrated promising capabilities in boosting variant calling precision, refining variant prediction, and enhancing the user-friendliness of electronic health record (EHR) systems in NGS-based diagnostics. This paper reviews the state of the art of AI in NGS-based genetics, and its future directions and challenges. It also compares several rare disease databases.
Affiliation(s)
- Yee Wen Choon
- Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
- Faculty of Data Science and Informatics, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
- Yee Fan Choon
- Faculty of Dentistry, Lincoln University College, Petaling Jaya, Selangor, Malaysia
- Nurul Athirah Nasarudin
- Health Data Science Lab, Department of Genetics and Genomics, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Fatma Al Jasmi
- Health Data Science Lab, Department of Genetics and Genomics, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Muhamad Akmal Remli
- Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
- Faculty of Data Science and Informatics, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
- Mohd Saberi Mohamad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
22
Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. Annu Rev Pathol 2024; 19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Indexed: 10/25/2023]
Abstract
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond. We explain limitations including the black-box character of conventional AI and describe solutions to make machine learning decisions more transparent with so-called explainable AI. The purpose of the review is to foster a mutual understanding of both the biomedical and the AI side. To that end, in addition to providing an overview of the relevant foundations in pathology and machine learning, we present worked-through examples for a better practical understanding of what AI can achieve and how it should be done.
Affiliation(s)
- Frederick Klauschen
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Jonas Dippel
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Philipp Keyl
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Philipp Jurmeister
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Michael Bockmayr
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Andreas Mock
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
- Oliver Buchstab
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
- Maximilian Alber
- Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Aignostics, Berlin, Germany
- Grégoire Montavon
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Klaus-Robert Müller
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Max Planck Institute for Informatics, Saarbrücken, Germany
23
Stuber A, Schlotter T, Hengsteler J, Nakatsuka N. Solid-State Nanopores for Biomolecular Analysis and Detection. Adv Biochem Eng Biotechnol 2024; 187:283-316. [PMID: 38273209 DOI: 10.1007/10_2023_240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024]
Abstract
Advances in nanopore technology and data processing have rendered DNA sequencing highly accessible, unlocking a new realm of biotechnological opportunities. Commercially available nanopores for DNA sequencing are of biological origin and have certain disadvantages such as having specific environmental requirements to retain functionality. Solid-state nanopores have received increased attention as modular systems with controllable characteristics that enable deployment in non-physiological milieu. Thus, we focus our review on summarizing recent innovations in the field of solid-state nanopores to envision the future of this technology for biomolecular analysis and detection. We begin by introducing the physical aspects of nanopore measurements ranging from interfacial interactions at pore and electrode surfaces to mass transport of analytes and data analysis of recorded signals. Then, developments in nanopore fabrication and post-processing techniques with the pros and cons of different methodologies are examined. Subsequently, progress to facilitate DNA sequencing using solid-state nanopores is described to assess how this platform is evolving to tackle the more complex challenge of protein sequencing. Beyond sequencing, we highlight the recent developments in biosensing of nucleic acids, proteins, and sugars and conclude with an outlook on the frontiers of nanopore technologies.
Affiliation(s)
- Annina Stuber
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich, Switzerland
- Tilman Schlotter
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich, Switzerland
- Julian Hengsteler
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich, Switzerland
- Nako Nakatsuka
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich, Switzerland.
24
Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023; 24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics poses unique challenges for deep learning, since we expect deep learning to act as a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Affiliation(s)
- Tianwei Yue
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yuanxin Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Longxiang Zhang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Chunming Gu
- Department of Biomedical Engineering, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Haoru Xue
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Wenping Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Qi Lyu
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yujie Dun
- School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China
25
Ashraf H, Ebler J, Marschall T. Allele detection using k-mer-based sequencing error profiles. Bioinform Adv 2023; 3:vbad149. [PMID: 37928341 PMCID: PMC10625474 DOI: 10.1093/bioadv/vbad149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 09/21/2023] [Accepted: 10/19/2023] [Indexed: 11/07/2023]
Abstract
Motivation: For genotype and haplotype inference, typically, sequencing reads aligned to a reference genome are used. The alignments identify the genomic origin of the reads and help to infer the absence or presence of sequence variants in the genome. Since long sequencing reads often come with high rates of systematic sequencing errors, single nucleotides in the reads are not always correctly aligned to the reference genome, which can thus lead to wrong conclusions about the allele carried by a sequencing read at the variant site. Thus, allele detection is not a trivial task, especially for single-nucleotide polymorphisms and indels. Results: To learn the characteristics of sequencing errors, we introduce a method to create an error model in non-variant regions of the genome. This information is later used to distinguish sequencing errors from alternative alleles in variant regions. We show that our method, k-merald, improves allele detection accuracy leading to better genotyping performance as compared to the existing WhatsHap implementation using edit-distance-based allele detection, with a decrease of 18% and 24% in error rate for high-coverage Oxford Nanopore and PacBio CLR sequencing reads for sample HG002, respectively. We additionally observed a prominent improvement in genotyping performance for sequencing data with low coverage. For 3× coverage Oxford Nanopore sequencing data, the genotyping error rate reduced from 34% to 31%, corresponding to a 9% decrease. Availability and implementation: https://github.com/whatshap/whatshap.
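The "9% decrease" quoted for the 3× coverage data is a relative reduction, not a difference in percentage points. A minimal sketch of that arithmetic follows; the function and variable names are illustrative assumptions and are not taken from the k-merald or WhatsHap code.

```python
def relative_decrease(before: float, after: float) -> float:
    """Return the decrease from `before` to `after` as a fraction of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Genotyping error rates quoted in the abstract for 3x Oxford Nanopore data.
baseline_error = 0.34  # edit-distance-based allele detection
kmerald_error = 0.31   # k-mer-based error model

absolute_drop = baseline_error - kmerald_error
relative_drop = relative_decrease(baseline_error, kmerald_error)

print(f"absolute drop: {absolute_drop * 100:.0f} percentage points")
print(f"relative drop: {relative_drop:.1%}")
```

The absolute drop is 3 percentage points, while the relative drop (3/34) rounds to the 9% stated in the abstract.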
Affiliation(s)
- Hufsah Ashraf
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
26
van Dijk EL, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C. Genomics in the long-read sequencing era. Trends Genet 2023; 39:649-671. [PMID: 37230864 DOI: 10.1016/j.tig.2023.04.006] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 02/10/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered from technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy, and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies, focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.
Affiliation(s)
- Erwin L van Dijk
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
- Delphine Naquin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Kévin Gorrichon
- National Center of Human Genomics Research (CNRGH), 91000 Évry-Courcouronnes, France
- Yan Jaszczyszyn
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Rania Ouazahrou
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Claude Thermes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Céline Hernandez
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
27
Kuśmirek W. Estimated Nucleotide Reconstruction Quality Symbols of Basecalling Tools for Oxford Nanopore Sequencing. Sensors (Basel) 2023; 23:6787. [PMID: 37571570 PMCID: PMC10422362 DOI: 10.3390/s23156787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 07/21/2023] [Accepted: 07/27/2023] [Indexed: 08/13/2023]
Abstract
Currently, one of the fastest-growing DNA sequencing technologies is nanopore sequencing. One of the key stages involved in processing sequencer data is the basecalling process, where the sequence of currents measured at the nanopores of the sequencer is translated into DNA sequences, called DNA reads. Many of the applications dedicated to basecalling provide, together with the DNA sequence, the estimated quality of the reconstruction of a given nucleotide (quality symbols are contained on every fourth line of the FASTQ file; each nucleotide in the FASTQ file corresponds to exactly one estimated nucleotide reconstruction quality symbol). Herein, we compare the estimated nucleotide reconstruction quality symbols (signs from every fourth line of the FASTQ file) reported by different basecallers. The conducted experiments consisted of basecalling the same raw datasets from the nanopore device with different basecallers and comparing the provided quality symbols, denoting the estimated quality of the nucleotide reconstruction. The results show that the estimated quality reported by different basecallers may vary, depending on the tool used, particularly in terms of range and distribution. Moreover, we mapped basecalled DNA reads to reference genomes and calculated matched and mismatched rates for groups of nucleotides with the same quality symbol. Finally, the presented paper shows that the estimated nucleotide reconstruction quality reported in the basecalling process is not used in any investigated tool for processing nanopore DNA reads.
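The one-symbol-per-nucleotide layout described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the common Phred+33 ASCII encoding (the abstract does not state the encoding), and the helper names `read_fastq`, `phred_scores`, and `error_probability` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _separator, qual = stripped[i:i + 4]
        # Every nucleotide must have exactly one quality symbol.
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], seq, qual


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode quality symbols to integer scores (assumes Phred+33)."""
    return [ord(symbol) - offset for symbol in qual]


def error_probability(q: int) -> float:
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Toy record: the fourth line holds one quality symbol per base.
example = ["@read1\n", "ACGT\n", "+\n", "!5?I\n"]
for read_id, seq, qual in read_fastq(example):
    for base, q in zip(seq, phred_scores(qual)):
        print(read_id, base, q, f"{error_probability(q):.4f}")
```

Comparing basecallers in the way the abstract describes would amount to decoding the same reads from each tool's FASTQ output and comparing the resulting score distributions.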
Affiliation(s)
- Wiktor Kuśmirek
- Institute of Computer Science, Warsaw University of Technology, 00-661 Warsaw, Poland
28
Pagès-Gallego M, de Ridder J. Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling. Genome Biol 2023; 24:71. [PMID: 37041647 PMCID: PMC10088207 DOI: 10.1186/s13059-023-02903-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 08/01/2022] [Accepted: 03/20/2023] [Indexed: 04/13/2023] Open
Abstract
BACKGROUND: Nanopore-based DNA sequencing relies on basecalling the electric current signal. Basecalling requires neural networks to achieve competitive accuracies. To improve sequencing accuracy further, new models are continuously proposed with new architectures. However, benchmarking is currently not standardized, and evaluation metrics and datasets used are defined on a per-publication basis, impeding progress in the field. This makes it impossible to distinguish data-driven from model-driven improvements. RESULTS: To standardize the process of benchmarking, we unified existing benchmarking datasets and defined a rigorous set of evaluation metrics. We benchmarked the latest seven basecaller models by recreating and analyzing their neural network architectures. Our results show that overall Bonito's architecture is the best for basecalling. We find, however, that species bias in training can have a large impact on performance. Our comprehensive evaluation of 90 novel architectures demonstrates that different models excel at reducing different types of errors, and that using recurrent neural networks (long short-term memory) and a conditional random field decoder are the main drivers of high-performing models. CONCLUSIONS: We believe that our work can facilitate the benchmarking of new basecaller tools and that the community can further expand on this work.
Affiliation(s)
- Marc Pagès-Gallego
- Center for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
29
Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023; 21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023] Open
Abstract
Single-cell sequencing technologies have revolutionised the life sciences and biomedical research. Single-cell sequencing provides high-resolution data on cell heterogeneity, allowing high-fidelity cell type identification, and lineage tracking. Computational algorithms and mathematical models have been developed to make sense of the data, compensate for errors and simulate the biological processes, which has led to breakthroughs in our understanding of cell differentiation, cell-fate determination and tissue cell composition. The development of long-read (a.k.a. third-generation) sequencing technologies has produced powerful tools for investigating alternative splicing, isoform expression (at the RNA level), genome assembly and the detection of complex structural variants (at the DNA level). In this review, we provide an overview of the recent advancements in single-cell and long-read sequencing technologies, with a particular focus on the computational algorithms that help in correcting, analysing, and interpreting the resulting data. Additionally, we review some mathematical models that use single-cell and long-read sequencing data to study cell-fate determination and alternative splicing, respectively. Moreover, we highlight the emerging opportunities in modelling cell-fate determination that result from the combination of single-cell and long-read sequencing technologies.
Affiliation(s)
- Siyuan Wu
- Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
- School of Mathematics, Monash University, Melbourne 3800, Victoria, Australia
- Ulf Schmitz
- Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
30
Crittenden CM, Lanzillotti MB, Chen B. Top-Down Mass Spectrometry of Synthetic Single Guide Ribonucleic Acids Enabled by Facile Sample Clean-Up. Anal Chem 2023; 95:3180-3186. [PMID: 36606446 DOI: 10.1021/acs.analchem.2c03030] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 01/07/2023]
Abstract
In recent years, CRISPR-Cas9 genome editing has become an important technology in biomedical research and has demonstrated tremendous therapeutic potential. With Cas9 endonuclease, the use of single guide ribonucleic acids (sgRNAs) allows for sequence-specific cutting on target double-stranded deoxyribonucleic acids. Therefore, the design and quality of sgRNAs can greatly affect the efficiency and specificity of genome editing. Mass spectrometry (MS) has been a powerful tool to detect molecular features and sequence a variety of biomolecules; however, as the sizes of oligonucleotides get larger, it becomes more challenging to desalt samples and achieve high-quality intact spectra with effective fragmentation. Here, we develop a simple but effective online column-based clean-up method (reversed-phase column in a size exclusion mode) that removes formulation salts and metal adducts from larger oligonucleotides upon entering the mass spectrometer in a consistent manner. Using the top-down approach without any nuclease digestion, we characterized and sequenced 100-nucleotide-long sgRNAs by higher-energy collision dissociation (HCD), collision-induced dissociation (CID), ultraviolet photodissociation (UVPD), and activated electron photodetachment (a-EPD). In a single 10 min liquid chromatography-tandem MS (LC-MS/MS) run, CID yielded the best sequence coverage, of 67%. When adding complementary UVPD and a-EPD runs, we achieved 80% overall sequence coverage and 100% cleavages for the variable sequence, the first 20 nucleotides from the 5' end. This LC-MS/MS platform provides a facile top-down workflow to analyze and sequence larger chemically modified oligonucleotides with no sample treatment.
Affiliation(s)
- Christopher M Crittenden
- Small Molecule Analytical Chemistry, Genentech Inc., South San Francisco, California 94080, United States
- Bifan Chen
- Small Molecule Analytical Chemistry, Genentech Inc., South San Francisco, California 94080, United States
31
Bergen DJM, Maurizi A, Formosa MM, McDonald GLK, El-Gazzar A, Hassan N, Brandi ML, Riancho JA, Rivadeneira F, Ntzani E, Duncan EL, Gregson CL, Kiel DP, Zillikens MC, Sangiorgi L, Högler W, Duran I, Mäkitie O, Van Hul W, Hendrickx G. High Bone Mass Disorders: New Insights From Connecting the Clinic and the Bench. J Bone Miner Res 2023; 38:229-247. [PMID: 36161343 PMCID: PMC10092806 DOI: 10.1002/jbmr.4715] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/25/2022] [Revised: 09/05/2022] [Accepted: 09/22/2022] [Indexed: 02/04/2023]
Abstract
Monogenic high bone mass (HBM) disorders are characterized by an increased amount of bone in general, or at specific sites in the skeleton. Here, we describe 59 HBM disorders with 50 known disease-causing genes from the literature, and we provide an overview of the signaling pathways and mechanisms involved in the pathogenesis of these disorders. Based on this, we classify the known HBM genes into HBM (sub)groups according to uniform Gene Ontology (GO) terminology. This classification system may aid in hypothesis generation, for both wet lab experimental design and clinical genetic screening strategies. We discuss how functional genomics can shape discovery of novel HBM genes and/or mechanisms in the future, through implementation of omics assessments in existing and future model systems. Finally, we address strategies to improve gene identification in unsolved HBM cases and highlight the importance of cross-laboratory collaborations encompassing multidisciplinary efforts to transfer knowledge generated at the bench to the clinic. © 2022 The Authors. Journal of Bone and Mineral Research published by Wiley Periodicals LLC on behalf of American Society for Bone and Mineral Research (ASBMR).
Affiliation(s)
- Dylan J M Bergen
- School of Physiology, Pharmacology, and Neuroscience, Faculty of Life Sciences, University of Bristol, Bristol, UK; Musculoskeletal Research Unit, Translational Health Sciences, Bristol Medical School, Faculty of Health Sciences, University of Bristol, Bristol, UK
- Antonio Maurizi
- Department of Biotechnological and Applied Clinical Sciences, University of L'Aquila, L'Aquila, Italy
- Melissa M Formosa
- Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta; Center for Molecular Medicine and Biobanking, University of Malta, Msida, Malta
- Georgina L K McDonald
- School of Physiology, Pharmacology, and Neuroscience, Faculty of Life Sciences, University of Bristol, Bristol, UK
- Ahmed El-Gazzar
- Department of Paediatrics and Adolescent Medicine, Johannes Kepler University Linz, Linz, Austria
- Neelam Hassan
- Musculoskeletal Research Unit, Translational Health Sciences, Bristol Medical School, Faculty of Health Sciences, University of Bristol, Bristol, UK
- José A Riancho
- Department of Internal Medicine, Hospital U M Valdecilla, University of Cantabria, IDIVAL, Santander, Spain
- Fernando Rivadeneira
- Department of Internal Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands
- Evangelia Ntzani
- Department of Hygiene and Epidemiology, Medical School, University of Ioannina, Ioannina, Greece; Center for Evidence Synthesis in Health, Policy and Practice, Center for Research Synthesis in Health, School of Public Health, Brown University, Providence, RI, USA; Institute of Biosciences, University Research Center of Ioannina, University of Ioannina, Ioannina, Greece
- Emma L Duncan
- Department of Twin Research & Genetic Epidemiology, School of Life Course Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK; Department of Endocrinology, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Celia L Gregson
- Musculoskeletal Research Unit, Translational Health Sciences, Bristol Medical School, Faculty of Health Sciences, University of Bristol, Bristol, UK
- Douglas P Kiel
- Marcus Institute for Aging Research, Hebrew SeniorLife and Department of Medicine Beth Israel Deaconess Medical Center and Harvard Medical School, Broad Institute of MIT & Harvard, Cambridge, MA, USA
- M Carola Zillikens
- Department of Internal Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands
- Luca Sangiorgi
- Department of Rare Skeletal Diseases, IRCCS Rizzoli Orthopaedic Institute, Bologna, Italy
- Wolfgang Högler
- Department of Paediatrics and Adolescent Medicine, Johannes Kepler University Linz, Linz, Austria; Institute of Metabolism and Systems Research, University of Birmingham, Birmingham, UK
- Outi Mäkitie
- Children's Hospital, University of Helsinki and Helsinki University Hospital, Helsinki, Finland; Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland; Folkhälsan Research Centre, Folkhälsan Institute of Genetics, Helsinki, Finland
- Wim Van Hul
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
32
Senanayake A, Gamaarachchi H, Herath D, Ragel R. DeepSelectNet: deep neural network based selective sequencing for oxford nanopore sequencing. BMC Bioinformatics 2023; 24:31. [PMID: 36709261 PMCID: PMC9883605 DOI: 10.1186/s12859-023-05151-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/24/2022] [Accepted: 01/17/2023] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND: Nanopore sequencing allows selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research; the classification of species from a pool of species is one example. Existing methods for selective sequencing for species classification are still immature, and their accuracy varies widely depending on the dataset. For the five datasets we tested, the accuracy of existing methods varied in the range of ~77 to 97% (average accuracy < 89%). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet utilizes novel data preprocessing techniques and improved neural network architecture for regularization. RESULTS: For the five datasets tested, DeepSelectNet's accuracy varied between ~91 and 99% (average accuracy ~95%). At its best performance, DeepSelectNet achieved a nearly 12% accuracy increase compared to its deep learning-based predecessor SquiggleNet. Furthermore, precision and recall evaluated for DeepSelectNet on average were always > 89% (average ~95%). In terms of execution performance, DeepSelectNet outperformed SquiggleNet by ~13% on average. Thus, DeepSelectNet is a practically viable method to improve the effectiveness of selective sequencing. CONCLUSIONS: Compared to base alignment and deep learning predecessors, DeepSelectNet can significantly improve the accuracy to enable real-time species classification using selective sequencing. The source code of DeepSelectNet is available at https://github.com/AnjanaSenanayake/DeepSelectNet.
Affiliation(s)
- Anjana Senanayake
- Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka.
- Hasindu Gamaarachchi
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
- School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
- Damayanthi Herath
- Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka
- Roshan Ragel
- Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka
33
Provost KL, Yang J, Carstens BC. The impacts of fine-tuning, phylogenetic distance, and sample size on big-data bioacoustics. PLoS One 2022; 17:e0278522. [PMID: 36477744 PMCID: PMC9728902 DOI: 10.1371/journal.pone.0278522] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022] Open
Abstract
Vocalizations in animals, particularly birds, are critically important behaviors that influence their reproductive fitness. While recordings of bioacoustic data have been captured and stored in collections for decades, the automated extraction of data from these recordings has only recently been facilitated by artificial intelligence methods. These have yet to be evaluated with respect to accuracy of different automation strategies and features. Here, we use a recently published machine learning framework to extract syllables from ten bird species ranging in their phylogenetic relatedness from 1 to 85 million years, to compare how phylogenetic relatedness influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance increases between taxa. However, we also find that the application of models trained on multiple distantly related species can improve the overall accuracy to levels near that of training and analyzing a model on the same species. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.
Affiliation(s)
- Kaiya L. Provost
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
- Jiaying Yang
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
- Bryan C. Carstens
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
34
Catacalos C, Krohannon A, Somalraju S, Meyer KD, Janga SC, Chakrabarti K. Epitranscriptomics in parasitic protists: Role of RNA chemical modifications in posttranscriptional gene regulation. PLoS Pathog 2022; 18:e1010972. [PMID: 36548245 PMCID: PMC9778586 DOI: 10.1371/journal.ppat.1010972] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022] Open
Abstract
"Epitranscriptomics" is the new RNA code that represents an ensemble of posttranscriptional RNA chemical modifications, which can precisely coordinate gene expression and biological processes. Several RNA base modifications, such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), and pseudouridine (Ψ), play pivotal roles in fine-tuning gene expression in almost all eukaryotes, and emerging evidence suggests that parasitic protists are no exception. In this review, we primarily focus on m6A, which is the most abundant epitranscriptomic mark and regulates numerous cellular processes, ranging from nuclear export and mRNA splicing to polyadenylation, stability, and translation. We highlight the universal features of spatiotemporal m6A RNA modifications in eukaryotic phylogeny, their homologs, and unique processes in 3 unicellular parasites (Plasmodium sp., Toxoplasma sp., and Trypanosoma sp.), as well as some technological advances in this rapidly developing research area that can significantly improve our understanding of gene expression regulation in parasites.
Affiliation(s)
- Cassandra Catacalos
- Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Alexander Krohannon
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), Indianapolis, Indiana, United States of America
- Sahiti Somalraju
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), Indianapolis, Indiana, United States of America
- Kate D. Meyer
- Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina, United States of America
- Sarath Chandra Janga
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), Indianapolis, Indiana, United States of America
- Kausik Chakrabarti
- Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
35
White LK, Hesselberth JR. Modification mapping by nanopore sequencing. Front Genet 2022; 13:1037134. [PMID: 36386798 PMCID: PMC9650216 DOI: 10.3389/fgene.2022.1037134] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/05/2022] [Accepted: 10/07/2022] [Indexed: 06/26/2024] Open
Abstract
Next generation sequencing (NGS) has provided biologists with an unprecedented view into biological processes and their regulation over the past 2 decades, fueling a wave of development of high throughput methods based on short read DNA and RNA sequencing. For nucleic acid modifications, NGS has been coupled with immunoprecipitation, chemical treatment, enzymatic treatment, and/or the use of reverse transcriptase enzymes with fortuitous activities to enrich for and to identify covalent modifications of RNA and DNA. However, the majority of nucleic acid modifications lack commercial monoclonal antibodies, and mapping techniques that rely on chemical or enzymatic treatments to manipulate modification signatures add additional technical complexities to library preparation. Moreover, such approaches tend to be specific to a single class of RNA or DNA modification, and generate only indirect readouts of modification status. Third generation sequencing technologies such as the commercially available "long read" platforms from Pacific Biosciences and Oxford Nanopore Technologies are an attractive alternative for high throughput detection of nucleic acid modifications. While the former can indirectly sense modified nucleotides through changes in the kinetics of reverse transcription reactions, nanopore sequencing can in principle directly detect any nucleic acid modification that produces a signal distortion as the nucleic acid passes through a nanopore sensor embedded within a charged membrane. To date, more than a dozen endogenous DNA and RNA modifications have been interrogated by nanopore sequencing, as well as a number of synthetic nucleic acid modifications used in metabolic labeling, structure probing, and other emerging applications. 
This review is intended to introduce the reader to nanopore sequencing and key principles underlying its use in direct detection of nucleic acid modifications in unamplified DNA or RNA samples, and outline current approaches for detecting and quantifying nucleic acid modifications by nanopore sequencing. As this technology matures, we anticipate advances in both sequencing chemistry and analysis methods will lead to rapid improvements in the identification and quantification of these epigenetic marks.
Affiliation(s)
- Jay R. Hesselberth
- Department of Biochemistry and Molecular Genetics, RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO, United States
36
|
Zhao X, Zhang Y, Hang D, Meng J, Wei Z. Detecting RNA modification using direct RNA sequencing: A systematic review. Comput Struct Biotechnol J 2022; 20:5740-5749. [PMID: 36382183 PMCID: PMC9619219 DOI: 10.1016/j.csbj.2022.10.023] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/10/2022] [Revised: 10/16/2022] [Accepted: 10/16/2022] [Indexed: 11/28/2022]
Abstract
Post-transcriptional RNA modifications are involved in a range of important cellular processes, including the regulation of gene expression and fine-tuning of the functions of RNA molecules. To decipher the context-specific functions of these post-transcriptional modifications, it is crucial to accurately determine their transcriptomic locations and modification levels under a given cellular condition. With the newly emerged sequencing technology, especially nanopore direct RNA sequencing, different RNA modifications can be detected simultaneously with a single molecular level resolution. Here we provide a systematic review of 15 published RNA modification prediction tools based on direct RNA sequencing data, including their computational models, input-output formats, supported modification types, and reported performances. Finally, we also discussed the potential challenges and future improvements of nanopore sequencing-based methods for RNA modification detection.
Affiliation(s)
- Xichen Zhao
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, 215123 Suzhou, Jiangsu, China
- Yuxin Zhang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, 215123 Suzhou, Jiangsu, China
- Institute of Systems, Molecular and Integrative Biology, L69 7ZB Liverpool, UK
- Daiyun Hang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, 215123 Suzhou, Jiangsu, China
- Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, UK
- Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, 215123 Suzhou, Jiangsu, China
- AI University Research Centre, Xi'an Jiaotong-Liverpool University, 215123 Suzhou, Jiangsu, China
- Institute of Systems, Molecular and Integrative Biology, L69 7ZB Liverpool, UK
- Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, 215123 Suzhou, Jiangsu, China
- Institute of Life Course and Medical Sciences, L69 7ZB Liverpool, UK
37
|
Song Z, Liang Y, Yang J. Nanopore Detection Assisted DNA Information Processing. Nanomaterials (Basel) 2022; 12:3135. [PMID: 36144924 PMCID: PMC9504103 DOI: 10.3390/nano12183135] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/06/2022] [Revised: 09/04/2022] [Accepted: 09/06/2022] [Indexed: 05/27/2023]
Abstract
The deoxyribonucleic acid (DNA) molecule is a stable carrier for large amounts of genetic information and provides an ideal storage medium for next-generation information processing technologies. Technologies that process DNA information, representing a cross-disciplinary integration of biology and computer techniques, have become attractive substitutes for technologies that process electronic information alone. The detailed applications of DNA technologies can be divided into three components: storage, computing, and self-assembly. The quality of DNA information processing relies on the accuracy of DNA reading. Nanopore detection allows researchers to accurately sequence nucleotides and is thus widely used to read DNA. In this paper, we introduce the principles and development history of nanopore detection and conduct a systematic review of recent developments and specific applications in DNA information processing involving nanopore detection and nanopore-based storage. We also discuss the potential of artificial intelligence in nanopore detection and DNA information processing. This work not only provides new avenues for future nanopore detection development, but also offers a foundation for the construction of more advanced DNA information processing technologies.
Affiliation(s)
- Zichen Song
- School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
- Yuan Liang
- Department of Computer Science and Technology, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
- Jing Yang
- School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
38
|
Yeh YM, Lu YC. MSRCall: a multi-scale deep neural network to basecall Oxford Nanopore sequences. Bioinformatics 2022; 38:3877-3884. [PMID: 35766808 DOI: 10.1093/bioinformatics/btac435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Revised: 05/05/2022] [Accepted: 06/27/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION MinION, a third-generation sequencer from Oxford Nanopore Technologies, is a portable device that can provide long-nucleotide read data in real-time. It primarily aims to deduce the makeup of nucleotide sequences from the ionic current signals generated when passing DNA/RNA fragments through nanopores charged with a voltage difference. To determine nucleotides from measured signals, a translation process known as basecalling is required. However, compared to NGS basecallers, the calling accuracy of MinION still needs to be improved. RESULTS In this work, a simple but powerful neural network architecture called multi-scale recurrent caller (MSRCall) is proposed. MSRCall comprises a multi-scale structure, recurrent layers, a fusion block and a connectionist temporal classification decoder. To better identify both short- and long-range dependencies, the recurrent layer is redesigned to capture various time-scale features with a multi-scale structure. The results show that MSRCall outperforms other basecallers in terms of both read and consensus accuracies. AVAILABILITY AND IMPLEMENTATION MSRCall is available at: https://github.com/d05943006/MSRCall. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
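The abstract above names a connectionist temporal classification (CTC) decoder as the last stage of the basecaller. As a rough illustration of what such a decoder does, the sketch below implements greedy (best-path) CTC decoding in Python. It is not taken from MSRCall: the alphabet, the blank index, the function name `ctc_greedy_decode` and the toy scores are all assumptions made for this example.

```python
from itertools import groupby

# Hypothetical alphabet for illustration: index 0 is the CTC blank, 1-4 are bases.
ALPHABET = "-ACGT"
BLANK = 0


def ctc_greedy_decode(frame_scores):
    """Turn per-frame class scores into a base string by best-path decoding."""
    # Step 1: take the highest-scoring class in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Step 2: merge consecutive repeats of the same class.
    collapsed = [label for label, _ in groupby(best_path)]
    # Step 3: drop blanks; what remains is the called sequence.
    return "".join(ALPHABET[label] for label in collapsed if label != BLANK)


# Made-up scores over (blank, A, C, G, T) for six signal frames.
toy_scores = [
    [0.10, 0.70, 0.10, 0.05, 0.05],  # A
    [0.10, 0.60, 0.10, 0.10, 0.10],  # A again, merged with the previous frame
    [0.80, 0.05, 0.05, 0.05, 0.05],  # blank separates two genuine A calls
    [0.10, 0.60, 0.10, 0.10, 0.10],  # A
    [0.10, 0.10, 0.10, 0.60, 0.10],  # G
    [0.10, 0.10, 0.10, 0.10, 0.60],  # T
]

print(ctc_greedy_decode(toy_scores))
```

The blank class is what lets a decoder of this kind emit the same base twice in a row: without the blank frame in the middle, the three A frames would collapse into a single call. Production basecallers often replace the greedy step with beam search.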
Affiliation(s)
- Yang-Ming Yeh
- Graduate Institute of Electronics Engineering, National Taiwan University, Taipei City 106319, Taiwan
- Yi-Chang Lu
- Graduate Institute of Electronics Engineering, National Taiwan University, Taipei City 106319, Taiwan
39
|
Lee Y, Ha U, Moon S. Ongoing endeavors to detect mobilization of transposable elements. BMB Rep 2022; 55:305-315. [PMID: 35725016 PMCID: PMC9340088 DOI: 10.5483/bmbrep.2022.55.7.088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 05/26/2022] [Accepted: 06/14/2022] [Indexed: 11/20/2022]
Abstract
Transposable elements (TEs) are DNA sequences capable of mobilization from one location to another in the genome. Since the discovery of the 'Dissociation (Ds) locus' by Barbara McClintock in maize (1), mounting evidence in the era of genomics indicates that a significant fraction of most eukaryotic genomes is composed of TE sequences, which are involved in various aspects of biological processes such as development, physiology, disease and evolution. Although technical advances in genomics have revealed numerous functional impacts of TEs across species, our understanding of TEs is still incomplete owing to challenges that result from the complexity and abundance of TEs in the genome. In this mini-review, we briefly summarize the biology of TEs and their impacts on the host genome, emphasizing the importance of understanding the TE landscape in the genome. Then, we introduce recent endeavors, especially in vivo retrotransposition assays and long-read sequencing technology for identifying de novo insertions and TE polymorphisms, which will broaden our knowledge of the extraordinary relationship between genomic cohabitants and their host.
Affiliation(s)
- Yujeong Lee
- Department of Biological Sciences, Kangwon National University, Chuncheon 24341, Korea
- Una Ha
- Department of Biological Sciences, Kangwon National University, Chuncheon 24341, Korea
- Sungjin Moon
- Department of Biological Sciences, Kangwon National University, Chuncheon 24341, Korea
40
|
Danilevsky A, Polsky AL, Shomron N. Adaptive sequencing using nanopores and deep learning of mitochondrial DNA. Brief Bioinform 2022; 23:6634223. [PMID: 35804265 DOI: 10.1093/bib/bbac251] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/16/2021] [Revised: 05/13/2022] [Accepted: 05/30/2022] [Indexed: 12/24/2022]
Abstract
Nanopore sequencing is an emerging technology that reads DNA by utilizing a unique method of detecting nucleic acid sequences and identifies the various chemical modifications they carry. Deep learning has increased in popularity as a useful technique to solve many complex computational tasks. 'Adaptive sequencing' is an implementation of selective sequencing, intended for use on the nanopore sequencing platform. In this study, we demonstrated an alternative method of software-based selective sequencing that is performed in real time by combining nanopore sequencing and deep learning. Our results showed the feasibility of using deep learning for classifying signals from only the first 200 nucleotides in a raw nanopore sequencing signal format. This was further demonstrated by comparing the accuracy of our deep learning classification model across data from several human cell lines and other eukaryotic organisms. We used custom deep learning models and a script that utilizes a 'Read Until' framework to target mitochondrial molecules in real time from a human cell line sample. This achieved a significant separation and enrichment ability of 2.3-fold. In a series of very short sequencing experiments (10, 30 and 120 min), we identified genomic and mitochondrial reads with accuracy above 90%, although mitochondrial DNA comprised only 0.1% of the total input material. The uniqueness of our method is the ability to distinguish two groups of DNA even without a labeled reference. This contrasts with studies that required a well-defined reference, whether of a DNA sequence or of another type of representation. Additionally, our method showed higher correlation to the theoretically possible enrichment factor, compared with other published methods. We believe that our results will lay the foundation for rapid and selective sequencing using nanopore technology and will pave the way for clinical applications that use nanopore sequencing data.
Affiliation(s)
- Artem Danilevsky
- Faculty of Medicine and Edmond J Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv 69978, Israel
- Avital Luba Polsky
- Faculty of Medicine and Edmond J Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv 69978, Israel
- Noam Shomron
- Faculty of Medicine and Edmond J Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv 69978, Israel
42
|
Quazi S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 2022; 39:120. [PMID: 35704152 PMCID: PMC9198206 DOI: 10.1007/s12032-022-01711-1] [Citation(s) in RCA: 91] [Impact Index Per Article: 30.3] [Received: 03/06/2022] [Accepted: 03/14/2022] [Indexed: 10/28/2022]
Abstract
The advancement of precision medicine in medical care has left behind the conventional symptom-driven treatment process by allowing early risk prediction of disease through improved diagnostics and customization of more effective treatments. It is necessary to scrutinize overall patient data alongside broad factors to observe and differentiate between ill and relatively healthy people to take the most appropriate path toward precision medicine, resulting in an improved vision of biological indicators that can signal health changes. Precision and genomic medicine combined with artificial intelligence have the potential to improve patient healthcare. Patients with less common therapeutic responses or unique healthcare demands are using genomic medicine technologies. AI provides insights through advanced computation and inference, enabling the system to reason and learn while enhancing physician decision making. Many cell characteristics, including gene up-regulation, proteins binding to nucleic acids, and splicing, can be measured at high throughput and used as training objectives for predictive models. Researchers can create a new era of effective genomic medicine with the improved availability of a broad range of datasets and modern computer techniques such as machine learning. This review article elucidates the contributions of ML algorithms in precision and genomic medicine.
Affiliation(s)
- Sameer Quazi
- GenLab Biosolutions Private Limited, Bangalore, Karnataka, 560043, India.
- Department of Biomedical Sciences, School of Life Sciences, Anglia Ruskin University, Cambridge, UK.
44
|
Senel E, Rajewsky N, Karaiskos N. Optocoder: computational decoding of spatially indexed bead arrays. NAR Genom Bioinform 2022; 4:lqac042. [PMID: 35685220 PMCID: PMC9172073 DOI: 10.1093/nargab/lqac042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/04/2022] [Revised: 04/27/2022] [Accepted: 05/16/2022] [Indexed: 12/18/2022]
Abstract
Advancing technologies that quantify gene expression in space are transforming contemporary biology research. A class of spatial transcriptomics methods uses barcoded bead arrays that are optically decoded via microscopy and are later matched to sequenced data from the respective libraries. To obtain a detailed representation of the tissue in space, robust and efficient computational pipelines are required to process microscopy images and accurately basecall the bead barcodes. Optocoder is a computational framework that processes microscopy images to decode bead barcodes in space. It efficiently aligns images, detects beads, and corrects for confounding factors of the fluorescence signal, such as crosstalk and phasing. Furthermore, Optocoder employs supervised machine learning to strongly increase the number of matches between optically decoded and sequenced barcodes. We benchmark Optocoder using data from an in-house spatial transcriptomics platform, as well as from Slide-Seq(V2), and we show that it efficiently processes all datasets without modification. Optocoder is publicly available, open-source and provided as a stand-alone Python package on GitHub: https://github.com/rajewsky-lab/optocoder.
Affiliation(s)
- Enes Senel
- Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Humboldt-Universität zu Berlin, Institut für Biologie, 10099 Berlin, Germany
- Nikolaus Rajewsky
- Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Humboldt-Universität zu Berlin, Institut für Biologie, 10099 Berlin, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
- Department of Pediatric Oncology, Universitätsmedizin Charité, Berlin, Germany
- Nikos Karaiskos
- Systems Biology of Gene Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
45
|
Borowiec ML, Dikow RB, Frandsen PB, McKeeken A, Valentini G, White AE. Deep learning as a tool for ecology and evolution. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13901] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Marek L. Borowiec
- Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, ID, USA
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, ID, USA
- Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Paul B. Frandsen
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
- Alexander McKeeken
- Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, ID, USA
- Alexander E. White
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
46
|
Bhat GR, Sethi I, Rah B, Kumar R, Afroze D. Innovative in Silico Approaches for Characterization of Genes and Proteins. Front Genet 2022; 13:865182. [PMID: 35664302 PMCID: PMC9159363 DOI: 10.3389/fgene.2022.865182] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/29/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022]
Abstract
Bioinformatics is an amalgamation of biology, mathematics and computer science. It is a science that gathers information from biology in terms of molecules and applies informatics techniques to that information to understand and organize the data in a useful manner. With the help of bioinformatics, the experimental data generated are stored in several databases available online, such as nucleotide databases, protein databases, GenBank and others. The data stored in these databases are used as a reference for experimental evaluation and validation. To date, several online tools have been developed to analyze genomic, transcriptomic, proteomic, epigenomic and metabolomic data. Some of them include Human Splicing Finder (HSF), Exonic Splicing Enhancer Mutation taster, and others. A number of SNPs are observed in non-coding, intronic regions and play a role in the regulation of genes, which may or may not directly affect protein expression. Many mutations are thought to influence the splicing mechanism by affecting existing splice sites or creating new sites. HSF was developed to predict the effect of a mutation (SNP) on the splicing mechanism or signal. Thus, the tool is helpful in predicting the effect of mutations on splicing signals and can provide data for a better understanding of intronic mutations, which can be further validated experimentally. Additionally, rapid advances in proteomics have steered researchers to organize the study of protein structure, function, relationships, and dynamics in space and time. The effective integration of all of these technologies will eventually lead to next-generation systems biology, which will provide valuable biological insights for research, diagnostics, therapeutics and the development of personalized medicine.
Affiliation(s)
- Gh. Rasool Bhat
- Advanced Centre for Human Genetics, Sher-I- Kashmir Institute of Medical Sciences, Soura, India
- Itty Sethi
- Institute of Human Genetics, University of Jammu, Jammu, India
- Bilal Rah
- Advanced Centre for Human Genetics, Sher-I- Kashmir Institute of Medical Sciences, Soura, India
- Rakesh Kumar
- School of Biotechnology, Shri Mata Vaishno Devi University, Katra, India
- Dil Afroze
- Advanced Centre for Human Genetics, Sher-I- Kashmir Institute of Medical Sciences, Soura, India
47
|
Neumann D, Reddy ASN, Ben-Hur A. RODAN: a fully convolutional architecture for basecalling nanopore RNA sequencing data. BMC Bioinformatics 2022; 23:142. [PMID: 35443610 PMCID: PMC9020074 DOI: 10.1186/s12859-022-04686-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/15/2021] [Accepted: 03/30/2022] [Indexed: 11/11/2022]
Abstract
Background Despite recent progress in basecalling of Oxford nanopore DNA sequencing data, its wide adoption is still being hampered by its relatively low accuracy compared to short read technologies. Furthermore, very little of the recent research was focused on basecalling of RNA data, which has different characteristics than its DNA counterpart. Results We fill this gap by benchmarking a fully convolutional deep learning basecalling architecture with improved performance compared to Oxford nanopore’s RNA basecallers. Availability The source code for our basecaller is available at: https://github.com/biodlab/RODAN. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04686-y.
Affiliation(s)
- Don Neumann
- Department of Computer Science, Colorado State University, 1873 Campus Delivery, Fort Collins, CO, 80523-1873, USA
- Anireddy S N Reddy
- Department of Biology, Colorado State University, 1878 Campus Delivery, Fort Collins, CO, 80523-1878, USA
- Asa Ben-Hur
- Department of Computer Science, Colorado State University, 1873 Campus Delivery, Fort Collins, CO, 80523-1873, USA.
48
|
Napieralski A, Nowak R. Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing. Sensors (Basel) 2022; 22:2275. [PMID: 35336445 PMCID: PMC8954548 DOI: 10.3390/s22062275] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/29/2022] [Revised: 03/13/2022] [Accepted: 03/14/2022] [Indexed: 06/14/2023]
Abstract
Third-generation DNA sequencers provided by Oxford Nanopore Technologies (ONT) produce a series of samples of an electrical current in the nanopore. Such a time series is used to detect the sequence of nucleotides. The task of translating current values into nucleotide symbols is called basecalling. Various solutions for basecalling have already been proposed. The earlier ones were based on Hidden Markov Models, but the best ones use neural networks or other machine learning models. Unfortunately, the accuracy achieved is still lower than that of competing sequencing techniques, such as Illumina's. Basecallers differ in their input data type: currently, most of them work on raw data straight from the sequencer (a time series of current). Still, the approach of using event data is also explored. Event data is obtained by preprocessing raw data and dividing it into segments, each described by several features computed from the raw data values within that segment. We propose a novel basecaller that uses joint processing of raw and event data. We define basecalling as a sequence-to-sequence translation, and we use a machine learning model based on an encoder-decoder architecture of recurrent neural networks. Our model incorporates twin encoders and an attention mechanism. We tested our solution on simulated and real datasets. We compare the full model accuracy results with its components: processing only raw or event data. We compare our solution with the existing ONT basecaller, Guppy. Results of numerical experiments show that joint raw and event data processing provides better basecalling accuracy than processing each data type separately. We implement an application called Ravvent, freely available under the MIT licence.
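The abstract above describes event data as raw current divided into segments, each summarised by a few features. The Python sketch below shows one simple way such a preprocessing step could look. It is an illustration only, not the method used by Ravvent or by ONT software: the jump-threshold rule, the function name `segment_events`, the default threshold and the chosen features (start, length, mean, standard deviation) are all assumptions.

```python
from statistics import mean, pstdev


def segment_events(raw_signal, jump_threshold=5.0):
    """Split a current trace into events wherever consecutive samples jump.

    Assumed rule: a new event starts when the absolute difference between
    two neighbouring samples exceeds jump_threshold (same units as the signal).
    """
    events = []
    start = 0
    for i in range(1, len(raw_signal) + 1):
        at_end = i == len(raw_signal)
        # Close the current segment at the end of the trace or at a large jump.
        if at_end or abs(raw_signal[i] - raw_signal[i - 1]) > jump_threshold:
            segment = raw_signal[start:i]
            events.append(
                {
                    "start": start,
                    "length": len(segment),
                    "mean": mean(segment),
                    "stdev": pstdev(segment),
                }
            )
            start = i
    return events


# Made-up current samples with two obvious level changes.
toy_signal = [80.0, 81.0, 80.5, 95.0, 94.0, 96.0, 70.0, 71.0]

for event in segment_events(toy_signal):
    print(event)
```

Each resulting dictionary is one event. A model that processes raw and event data jointly, as the abstract describes, would receive both the original samples and a feature sequence of this kind.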
Affiliation(s)
- Adam Napieralski
- Institute of Computer Science, Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland
49
|
Artificial Intelligence and Cardiovascular Genetics. Life (Basel) 2022; 12:279. [PMID: 35207566 PMCID: PMC8875522 DOI: 10.3390/life12020279] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/30/2021] [Revised: 01/26/2022] [Accepted: 02/09/2022] [Indexed: 12/13/2022]
Abstract
Polygenic diseases, which are genetic disorders caused by the combined action of multiple genes, pose unique and significant challenges for the diagnosis and management of affected patients. A major goal of cardiovascular medicine has been to understand how genetic variation leads to the clinical heterogeneity seen in polygenic cardiovascular diseases (CVDs). Recent advances and emerging technologies in artificial intelligence (AI), coupled with the ever-increasing availability of next generation sequencing (NGS) technologies, now provide researchers with unprecedented possibilities for dynamic and complex biological genomic analyses. Combining these technologies may lead to a deeper understanding of heterogeneous polygenic CVDs, better prognostic guidance, and, ultimately, greater personalized medicine. Advances will likely be achieved through increasingly frequent and robust genomic characterization of patients, as well as the integration of genomic data with other clinical data, such as cardiac imaging, coronary angiography, and clinical biomarkers. This review discusses the current opportunities and limitations of genomics; provides a brief overview of AI; and identifies the current applications, limitations, and future directions of AI in genomics.
50
|
Rabbi F, Dabbagh SR, Angin P, Yetisen AK, Tasoglu S. Deep Learning-Enabled Technologies for Bioimage Analysis. Micromachines 2022; 13:260. [PMID: 35208385 PMCID: PMC8880650 DOI: 10.3390/mi13020260] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/13/2022] [Revised: 01/31/2022] [Accepted: 02/03/2022] [Indexed: 02/05/2023]
Abstract
Deep learning (DL) is a subfield of machine learning (ML), which has recently demonstrated its potency to significantly improve the quantification and classification workflows in biomedical and clinical applications. Among the end applications profoundly benefitting from DL, cellular morphology quantification is one of the pioneers. Here, we first briefly explain fundamental concepts in DL and then we review some of the emerging DL-enabled applications in cell morphology quantification in the fields of embryology, point-of-care ovulation testing, as a predictive tool for fetal heart pregnancy, cancer diagnostics via classification of cancer histology images, autosomal polycystic kidney disease, and chronic kidney diseases.
Affiliation(s)
- Fazle Rabbi
- Department of Mechanical Engineering, Koç University, Sariyer, Istanbul 34450, Turkey
- Sajjad Rahmani Dabbagh
- Department of Mechanical Engineering, Koç University, Sariyer, Istanbul 34450, Turkey
- Koç University Arçelik Research Center for Creative Industries (KUAR), Koç University, Sariyer, Istanbul 34450, Turkey
- Koç University Is Bank Artificial Intelligence Lab (KUIS AILab), Koç University, Sariyer, Istanbul 34450, Turkey
- Pelin Angin
- Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
- Ali Kemal Yetisen
- Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
- Savas Tasoglu
- Department of Mechanical Engineering, Koç University, Sariyer, Istanbul 34450, Turkey
- Koç University Arçelik Research Center for Creative Industries (KUAR), Koç University, Sariyer, Istanbul 34450, Turkey
- Koç University Is Bank Artificial Intelligence Lab (KUIS AILab), Koç University, Sariyer, Istanbul 34450, Turkey
- Institute of Biomedical Engineering, Boğaziçi University, Çengelköy, Istanbul 34684, Turkey
- Physical Intelligence Department, Max Planck Institute for Intelligent Systems, 70569 Stuttgart, Germany