1. Herbert J, Thompson S, Beckett AH, Robson SC. Impact of microbiological molecular methodologies on adaptive sampling using nanopore sequencing in metagenomic studies. Environmental Microbiome 2025; 20:47. [PMID: 40325409 PMCID: PMC12054170 DOI: 10.1186/s40793-025-00704-7] [Received: 06/11/2024] [Accepted: 04/08/2025] [Indexed: 05/07/2025]
Abstract
INTRODUCTION: Metagenomics, the genomic analysis of all species present within a mixed population, is an important tool used for the exploration of microbiomes in clinical and environmental microbiology. Whilst the development of next-generation sequencing, and more recently third-generation long-read approaches such as nanopore sequencing, has greatly advanced the study of metagenomics, recovery of unbiased material from microbial populations remains challenging. One promising advancement in genomic sequencing from Oxford Nanopore Technologies (ONT) is adaptive sampling, which enables real-time enrichment or depletion of target sequences. As sequencing technologies continue to develop, and advances such as adaptive sampling become common techniques within the microbiological toolkit, it is essential to evaluate the benefits of such advancements to metagenomic studies, and the impact of methodological choices on research outcomes.
AIM AND METHODS: Given the rapid development of sequencing tools and chemistry, this study aimed to demonstrate the impacts of choice of DNA extraction kit and sequencing chemistry on downstream metagenomic analyses. We first explored the quality and accuracy of 16S rRNA amplicon sequencing for DNA extracted from the ZymoBIOMICS Microbial Community Standard, using a range of commercially available DNA extraction kits to understand the effects of different kit biases on assessment of microbiome composition. We next compared the quality and accuracy of metagenomic analyses for two nanopore-based ligation chemistry kits with differing levels of base-calling error: the older and more error-prone (~97% accuracy) LSK109 chemistry, and the newer, more accurate (~99% accuracy) LSK112 Q20+ chemistry. Finally, we assessed the impact of the nanopore sequencing chemistry version on the output of the novel adaptive sampling approach for real-time enrichment of the genome of the yeast Saccharomyces cerevisiae from the microbial community.
RESULTS: Firstly, DNA extraction kit methodology impacted the composition of the yield, with mechanical bead-beating methodologies providing the least biased picture due to efficient lysis of Gram-positive microbes present in the community standard, with differences in bead-beating methodologies also producing variation in composition. Secondly, whilst use of the Q20+ nanopore sequencing kit chemistry improved the base-calling data quality, the resulting metagenomic assemblies were not significantly improved based on common metrics and assembly statistics. Most importantly, we demonstrated the effective application of adaptive sampling for enriching a low-abundance genome within a metagenomic sample. This resulted in a 5- to 7-fold increase in target enrichment compared to non-adaptive sequencing, despite a reduction in overall sequencing throughput due to strand-rejection processes. Interestingly, no significant differences in adaptive sampling enrichment efficiency were observed between the older and newer ONT sequencing chemistries, suggesting that adaptive sampling performs consistently across different library preparation kits.
CONCLUSION: Our findings underscore the importance of selecting a DNA extraction methodology that minimises bias to ensure an accurate representation of microbial diversity in metagenomic studies. Additionally, despite the improved base-calling accuracy provided by newer Q20+ sequencing chemistry, we demonstrate that even older ONT sequencing chemistries can achieve reliable metagenomic sequencing results, enabling researchers to confidently use these approaches depending on their specific experimental needs. Critically, we highlight the significant potential of ONT's adaptive sampling technology for targeted enrichment of specific genomes within metagenomic samples. This approach offers broad applicability for enriching target organisms or genetic elements (e.g., pathogens or plasmids) or depleting unwanted DNA (e.g., host DNA) in diverse sample types from environmental and clinical studies. However, researchers should carefully weigh the benefits of adaptive sampling against the potential trade-offs in sequencing throughput, particularly for low-abundance targets, where strand rejection can lead to pore blocking. These results provide valuable guidance for optimising adaptive sampling in metagenomic workflows to achieve specific research objectives.
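Illustrative note (not from the paper): the abstract reports a 5- to 7-fold enrichment relative to non-adaptive sequencing but does not state the formula used. One common way to express such a figure is the ratio of the target's share of sequenced bases in the adaptive run to its share in the control run. The sketch below assumes that definition; the function name and the base counts are hypothetical.

```python
def enrichment_fold(target_adaptive, total_adaptive, target_control, total_control):
    """Ratio of the target's share of bases in the adaptive run to its share in the control run."""
    if total_adaptive <= 0 or total_control <= 0:
        raise ValueError("total base counts must be positive")
    share_adaptive = target_adaptive / total_adaptive
    share_control = target_control / total_control
    if share_control == 0:
        raise ValueError("target absent from control run; fold change is undefined")
    return share_adaptive / share_control


# Hypothetical base counts, chosen only to illustrate the arithmetic.
fold = enrichment_fold(
    target_adaptive=3_000_000, total_adaptive=8_000_000,
    target_control=1_000_000, total_control=16_000_000,
)
print(f"{fold:.1f}-fold enrichment")
```

Because the ratio is relative, it can rise even when the adaptive run produces fewer total bases than the control, which is the throughput trade-off the abstract describes.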
Affiliation(s)
- Josephine Herbert
  - Centre for Enzyme Innovation, University of Portsmouth, Portsmouth, Hampshire, PO1 2DT, UK
  - Institute of Life Sciences and Healthcare, University of Portsmouth, Portsmouth, Hampshire, PO1 2DT, UK
- Stanley Thompson
  - Institute of Life Sciences and Healthcare, University of Portsmouth, Portsmouth, Hampshire, PO1 2DT, UK
  - Department of Life Sciences, University of Bath, Bath, BA2 7AY, UK
- Angela H Beckett
  - Centre for Enzyme Innovation, University of Portsmouth, Portsmouth, Hampshire, PO1 2DT, UK
  - Institute of Life Sciences and Healthcare, University of Portsmouth, Portsmouth, Hampshire, PO1 2DT, UK
- Samuel C Robson
  - Centre for Enzyme Innovation, University of Portsmouth, Portsmouth, Hampshire, PO1 2DT, UK
  - Institute of Life Sciences and Healthcare, University of Portsmouth, Portsmouth, Hampshire, PO1 2DT, UK
2. Belinchon-Moreno J, Berard A, Canaguier A, Chovelon V, Cruaud C, Engelen S, Feriche-Linares R, Le-Clainche I, Marande W, Rittener-Ruff V, Lagnel J, Hinsinger D, Boissot N, Faivre-Rampant P. Nanopore adaptive sampling to identify the NLR gene family in melon (Cucumis melo L.). BMC Genomics 2025; 26:126. [PMID: 39930362 PMCID: PMC11808957 DOI: 10.1186/s12864-025-11295-5] [Received: 07/30/2024] [Accepted: 01/27/2025] [Indexed: 02/14/2025]
Abstract
BACKGROUND: Nanopore adaptive sampling (NAS) offers a promising approach for assessing genetic diversity in targeted genomic regions. Here we designed and validated an experiment to enrich a set of resistance genes in several melon cultivars as a proof of concept.
RESULTS: Using the same reference to guide read acceptance or rejection with NAS, we successfully and accurately reconstructed the 15 regions in two newly assembled ssp. melo genomes and in a third ssp. agrestis cultivar. We obtained fourfold enrichment regardless of the tested samples, but with some variations according to the enriched regions. The accuracy of our assembly was further confirmed by PCR in the agrestis cultivar. We discuss parameters that could influence the enrichment and accuracy of NAS-generated assemblies.
CONCLUSIONS: Overall, we demonstrated that NAS is a simple and efficient approach for exploring complex genomic regions, such as clusters of nucleotide-binding site leucine-rich repeat (NLR) resistance genes. These regions contain a high number of copy number variations, presence-absence polymorphisms and repetitive elements. These features make accurate assembly challenging, but the regions are crucial to study due to their central role in plant immunity and disease resistance. This approach facilitates resistance gene characterization in a large number of individuals, as required when breeding new cultivars suitable for the agroecological transition.
Affiliation(s)
- Javier Belinchon-Moreno
  - Université Paris-Saclay, Centre INRAE Île-de-France Versailles-Saclay, EPGV, Evry, 91057, France
  - INRAE, Génétique et Amélioration des Fruits et Légumes, Montfavet, 84143, France
- Aurélie Berard
  - Université Paris-Saclay, Centre INRAE Île-de-France Versailles-Saclay, EPGV, Evry, 91057, France
- Aurélie Canaguier
  - Université Paris-Saclay, Centre INRAE Île-de-France Versailles-Saclay, EPGV, Evry, 91057, France
- Véronique Chovelon
  - INRAE, Génétique et Amélioration des Fruits et Légumes, Montfavet, 84143, France
- Corinne Cruaud
  - Commissariat à l'Energie Atomique (CEA), Genoscope, Institut de Biologie François-Jacob, Université Paris-Saclay, 2 Rue Gaston Crémieux, Evry, 91057, France
- Stéfan Engelen
  - Génomique Métabolique, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), CNRS, Univ. Evry, Université Paris-Saclay, Genoscope, Evry, 91057, France
- Isabelle Le-Clainche
  - Université Paris-Saclay, Centre INRAE Île-de-France Versailles-Saclay, EPGV, Evry, 91057, France
- William Marande
  - INRAE, Centre National de Ressources Génomiques Végétales, Castanet-Tolosan, 31326, France
- Jacques Lagnel
  - INRAE, Génétique et Amélioration des Fruits et Légumes, Montfavet, 84143, France
- Damien Hinsinger
  - Université Paris-Saclay, Centre INRAE Île-de-France Versailles-Saclay, EPGV, Evry, 91057, France
- Nathalie Boissot
  - INRAE, Génétique et Amélioration des Fruits et Légumes, Montfavet, 84143, France
- Patricia Faivre-Rampant
  - Université Paris-Saclay, Centre INRAE Île-de-France Versailles-Saclay, EPGV, Evry, 91057, France
3. Hunter B, Cromwell T, Shim H. Nanopore sequencing of protozoa: Decoding biological information on a string of biochemical molecules into human-readable signals. Comput Struct Biotechnol J 2025; 27:440-450. [PMID: 39906158 PMCID: PMC11791290 DOI: 10.1016/j.csbj.2025.01.002] [Received: 08/08/2024] [Revised: 01/04/2025] [Accepted: 01/05/2025] [Indexed: 02/06/2025]
Abstract
Biological information is encoded in a sequence of biochemical molecules such as nucleic acids and amino acids, and nanopore sequencing is a long-read sequencing technology capable of directly decoding these molecules into human-readable signals. The long reads from nanopore sequencing offer the advantage of obtaining contiguous information, which is particularly beneficial for decoding complex or repetitive regions in a genome. In this study, we investigated the efficacy of nanopore sequencing in decoding biological information from distinctive genomes in metagenomic samples, which pose significant challenges for traditional short-read sequencing technologies. Specifically, we sequenced blood and fecal samples from mice infected with Trypanosoma brucei, a unicellular protozoan known for its hypervariable and dynamic regions that help it evade host immunity. Such characteristics are also prevalent in other host-dependent parasites, such as bacteriophages. The taxonomic classification results showed a high proportion of nanopore reads identified as T. brucei in the infected blood samples, with no significant identification in the control blood samples and fecal samples. Furthermore, metagenomic de novo assembly of these nanopore reads yielded contigs that mapped to the reference genome of T. brucei in the infected blood samples with over 96% accuracy. This exploratory work demonstrates the potential of nanopore sequencing for the challenging task of classifying and assembling hypervariable and dynamic genomes from metagenomic samples.
Affiliation(s)
- Branden Hunter
  - Department of Biology, California State University, 2555 East San Ramon Ave, Fresno, CA 93740, USA
- Timothy Cromwell
  - Department of Computer Science, California State University, 2576 East San Ramon Ave, Fresno, CA 93740, USA
- Hyunjin Shim
  - Department of Biology, California State University, 2555 East San Ramon Ave, Fresno, CA 93740, USA
  - Center for Biosystems and Biotech Data Science, Ghent University Global Campus, 119-5 Songdomunhwa-ro, Yeonsu-gu, Incheon 21985, South Korea
4. Zhang J, Hou L, Ma L, Cai Z, Ye S, Liu Y, Ji P, Zuo Z, Zhao F. Real-time and programmable transcriptome sequencing with PROFIT-seq. Nat Cell Biol 2024; 26:2183-2194. [PMID: 39443694 PMCID: PMC11628399 DOI: 10.1038/s41556-024-01537-1] [Received: 04/11/2024] [Accepted: 09/18/2024] [Indexed: 10/25/2024]
Abstract
The high diversity and complexity of the eukaryotic transcriptome make it difficult to effectively detect specific transcripts of interest. Current targeted RNA sequencing methods often require complex pre-sequencing enrichment steps, which can compromise the comprehensive characterization of the entire transcriptome. Here we describe programmable full-length isoform transcriptome sequencing (PROFIT-seq), a method that enriches target transcripts while maintaining unbiased quantification of the whole transcriptome. PROFIT-seq employs combinatorial reverse transcription to capture polyadenylated, non-polyadenylated and circular RNAs, coupled with a programmable control system that selectively enriches target transcripts during sequencing. This approach achieves over 3-fold increase in effective data yield and reduces the time required for detecting specific pathogens or key mutations by 75%. We applied PROFIT-seq to study colorectal polyp development, revealing the intricate relationship between host immune responses and bacterial infection. PROFIT-seq offers a powerful tool for accurate and efficient sequencing of target transcripts while preserving overall transcriptome quantification, with broad applications in clinical diagnostics and targeted enrichment scenarios.
Affiliation(s)
- Jinyang Zhang
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Lingling Hou
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Lianjun Ma
  - Endoscopy Center, China-Japan Union Hospital of Jilin University, Changchun, China
- Zhengyi Cai
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Shujun Ye
  - Endoscopy Center, China-Japan Union Hospital of Jilin University, Changchun, China
- Yang Liu
  - Endoscopy Center, China-Japan Union Hospital of Jilin University, Changchun, China
- Peifeng Ji
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Zhenqiang Zuo
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Fangqing Zhao
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
5. Islam Sajib MS, Brunker K, Oravcova K, Everest P, Murphy ME, Forde T. Advances in Host Depletion and Pathogen Enrichment Methods for Rapid Sequencing-Based Diagnosis of Bloodstream Infection. J Mol Diagn 2024; 26:741-753. [PMID: 38925458 DOI: 10.1016/j.jmoldx.2024.05.008] [Received: 12/20/2023] [Revised: 05/05/2024] [Accepted: 05/17/2024] [Indexed: 06/28/2024]
Abstract
Bloodstream infection is a major cause of morbidity and death worldwide. Timely and appropriate treatment can reduce mortality among critically ill patients. Current diagnostic methods are too slow to inform precise antibiotic choice, leading to the prescription of empirical antibiotics, which may fail to cover the resistance profile of the pathogen, risking poor patient outcomes. Additionally, overuse of broad-spectrum antibiotics may lead to more resistant organisms, putting further pressure on the dwindling pipeline of antibiotics, and risk transmission of these resistant organisms in the health care environment. Therefore, rapid diagnostics are urgently required to better inform antibiotic choice early in the course of treatment. Sequencing offers great promise in reducing time to microbiological diagnosis; however, the amount of host DNA compared with the pathogen in patient samples presents a significant obstacle. Various host-depletion and bacterial-enrichment strategies have been used in samples, such as saliva, urine, or tissue. However, these methods have yet to be collectively integrated and/or extensively explored for rapid bloodstream infection diagnosis. Although most of these workflows possess individual strengths, their lack of analytical/clinical sensitivity and/or comprehensiveness demands additional improvements or synergistic application. This review provides a distinctive classification system for various methods based on their working principles to guide future research, and discusses their strengths and limitations and explores potential avenues for improvement to assist the reader in workflow selection.
Affiliation(s)
- Mohammad S Islam Sajib
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, United Kingdom
- Kirstyn Brunker
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, United Kingdom; Medical Research Council-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
- Katarina Oravcova
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, United Kingdom
- Paul Everest
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, United Kingdom
- Michael E Murphy
  - Department of Microbiology, National Health Service Greater Glasgow and Clyde, Glasgow, United Kingdom; School of Medicine, Dentistry and Nursing, University of Glasgow, Glasgow, United Kingdom
- Taya Forde
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, United Kingdom
6. Firtina C, Soysal M, Lindegger J, Mutlu O. RawHash2: mapping raw nanopore signals using hash-based seeding and adaptive quantization. Bioinformatics 2024; 40:btae478. [PMID: 39078113 PMCID: PMC11333567 DOI: 10.1093/bioinformatics/btae478] [Received: 11/03/2023] [Revised: 07/04/2024] [Accepted: 07/29/2024] [Indexed: 07/31/2024]
Abstract
SUMMARY: Raw nanopore signals can be analyzed while they are being generated, a process known as real-time analysis. Real-time analysis of raw signals is essential to utilize the unique features that nanopore sequencing provides, enabling the early stopping of the sequencing of a read or the entire sequencing run based on the analysis. The state-of-the-art mechanism, RawHash, offers the first hash-based efficient and accurate similarity identification between raw signals and a reference genome by quickly matching their hash values. In this work, we introduce RawHash2, which provides major improvements over RawHash, including more sensitive quantization and chaining algorithms, weighted mapping decisions, frequency filters to reduce ambiguous seed hits, minimizers for hash-based sketching, and support for the R10.4 flow cell version and POD5 and SLOW5 file formats. RawHash2 provides better F1 accuracy (on average by 10.57% and up to 20.25%) and better throughput (on average by 4.0× and up to 9.9×) than RawHash.
AVAILABILITY AND IMPLEMENTATION: RawHash2 is available at https://github.com/CMU-SAFARI/RawHash. We also provide the scripts to fully reproduce our results on our GitHub page.
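Illustrative note (not from the paper): the abstract lists minimizers for hash-based sketching among RawHash2's additions. RawHash2 works on raw signals rather than bases, so the sketch below shows only the general (w, k) minimizer idea on a plain nucleotide string, ordering k-mers lexicographically instead of by hash value. The function name and the example sequence are hypothetical, and this is not RawHash2's implementation.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs: the smallest k-mer in each window of w consecutive k-mers."""
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        # Ties are broken by taking the leftmost occurrence in the window.
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        chosen.add((best, kmers[best]))
    return sorted(chosen)


# Hypothetical toy sequence; adjacent windows often share a minimizer,
# which is why the sketch is smaller than the full k-mer list.
for position, kmer in minimizers("ACGTACGT", k=3, w=2):
    print(position, kmer)
```

Keeping one representative per window shrinks the number of seeds to index and look up while guaranteeing that two sequences sharing a long enough exact match also share a selected seed.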
Affiliation(s)
- Can Firtina
  - Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8092, Switzerland
- Melina Soysal
  - Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8092, Switzerland
- Joël Lindegger
  - Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8092, Switzerland
- Onur Mutlu
  - Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8092, Switzerland
7. Holthöfer L, Diederich S, Haug V, Lehmann L, Hewel C, Paul NW, Schweiger S, Gerber S, Linke M. A case of an Angelman-syndrome caused by an intragenic duplication of UBE3A uncovered by adaptive nanopore sequencing. Clin Epigenetics 2024; 16:101. [PMID: 39095842 PMCID: PMC11297752 DOI: 10.1186/s13148-024-01711-0] [Received: 04/24/2024] [Accepted: 07/22/2024] [Indexed: 08/04/2024]
Abstract
Adaptive nanopore sequencing as a diagnostic method for imprinting disorders and episignature analysis revealed an intragenic duplication of exons 6 and 7 in UBE3A (NM_000462.5) in a patient with relatively mild Angelman-like syndrome. In an all-in-one nanopore sequencing analysis, DNA hypomethylation of the SNURF:TSS-DMR, known contributing deletions on the maternal allele and point mutations in UBE3A could be ruled out as disease drivers. In contrast, the breakpoints and orientation of the tandem duplication could clearly be defined. Segregation analysis in the family showed that the duplication arose de novo in the maternal grandfather. Our study shows the benefits of an all-in-one nanopore sequencing approach for the diagnostics of Angelman syndrome and other imprinting disorders.
Affiliation(s)
- Laura Holthöfer
  - Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Stefan Diederich
  - Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Verena Haug
  - Neuropediatrics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Lioba Lehmann
  - Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Charlotte Hewel
  - Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Norbert W Paul
  - Institute for History, Philosophy, and Ethics of Medicine, Johannes Gutenberg-University Medical Center Mainz, Mainz, Germany
- Susann Schweiger
  - Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Susanne Gerber
  - Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Matthias Linke
  - Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
8. Fan K, Li M, Zhang J, Xie Z, Jiang D, Bo X, Zhao D, Shi S, Ni M. ReadCurrent: a VDCNN-based tool for fast and accurate nanopore selective sequencing. Brief Bioinform 2024; 25:bbae435. [PMID: 39226890 PMCID: PMC11370629 DOI: 10.1093/bib/bbae435] [Received: 03/26/2024] [Revised: 07/20/2024] [Indexed: 09/05/2024]
Abstract
Nanopore selective sequencing allows the targeted sequencing of DNA of interest using computational approaches rather than experimental methods such as targeted multiplex polymerase chain reaction or hybridization capture. Compared to sequence-alignment strategies, deep learning (DL) models for classifying target and nontarget DNA provide large speed advantages. However, the relatively low accuracy of these DL-based tools hinders their application in nanopore selective sequencing. Here, we present a DL-based tool named ReadCurrent for nanopore selective sequencing, which takes electric currents as inputs. ReadCurrent employs a modified very deep convolutional neural network (VDCNN) architecture, enabling significantly lower computational costs for training and quicker inference compared to conventional VDCNN. We evaluated the performance of ReadCurrent across 10 nanopore sequencing datasets spanning human, yeasts, bacteria, and viruses. We observed that ReadCurrent achieved a mean accuracy of 98.57% for classification, outperforming four other DL-based selective sequencing methods. In experimental validation that selectively sequenced microbial DNA from human DNA, ReadCurrent achieved an enrichment ratio of 2.85, which was higher than the 2.7 ratio achieved by MinKNOW using the sequence-alignment strategy. In summary, ReadCurrent can rapidly classify target and nontarget DNA with high accuracy, providing an alternative in the toolbox for nanopore selective sequencing. ReadCurrent is available at https://github.com/Ming-Ni-Group/ReadCurrent.
Affiliation(s)
- Kechen Fan
  - Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
  - College of Information Science and Technology, Beijing University of Chemical Technology, No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029, China
- Mengfan Li
  - Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
  - Information Center, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
- Jiarong Zhang
  - Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
  - School of Forensic Medicine, Shanxi Medical University, No. 55 Wenhua Street, Yuci District, Jinzhong 030600, China
- Zihan Xie
  - Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
  - College of Life Science and Technology, Beijing University of Chemical Technology, No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029, China
- Daguang Jiang
  - College of Information Science and Technology, Beijing University of Chemical Technology, No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029, China
- Xiaochen Bo
  - Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
- Dongsheng Zhao
  - Information Center, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
- Shenghui Shi
  - College of Information Science and Technology, Beijing University of Chemical Technology, No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029, China
- Ming Ni
  - Advanced & Interdisciplinary Biotechnology, Academy of Military Medical Sciences, No. 27 Taiping Road, Haidian District, Beijing 100850, China
9. Ulrich JU, Renard BY. Fast and space-efficient taxonomic classification of long reads with hierarchical interleaved XOR filters. Genome Res 2024; 34:914-924. [PMID: 38886068 PMCID: PMC11293544 DOI: 10.1101/gr.278623.123] [Received: 10/10/2023] [Accepted: 05/23/2024] [Indexed: 06/20/2024]
Abstract
Metagenomic long-read sequencing is gaining popularity for various applications, including pathogen detection and microbiome studies. To analyze the large data created in those studies, software tools need to taxonomically classify the sequenced molecules and estimate the relative abundances of organisms in the sequenced sample. Because of the exponential growth of reference genome databases, the current taxonomic classification methods have large computational requirements. This issue motivated us to develop a new data structure for fast and memory-efficient querying of long reads. Here, we present Taxor as a new tool for long-read metagenomic classification using a hierarchical interleaved XOR filter data structure for indexing and querying large reference genome sets. Taxor implements several k-mer-based approaches, such as syncmers, for pseudoalignment to classify reads and an expectation-maximization algorithm for metagenomic profiling. Our results show that Taxor outperforms state-of-the-art tools regarding precision while having a similar recall for long-read taxonomic classification. Most notably, Taxor reduces the memory requirements and index size by >50% and is among the fastest tools regarding query times. This enables real-time metagenomics analysis with large reference databases on a small laptop in the field.
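Illustrative note (not from the paper): the abstract names syncmers as one of the k-mer-based approaches Taxor implements. The sketch below follows the usual definition of closed syncmers, in which a k-mer is selected when its smallest s-mer sits at the first or last position. Lexicographic order stands in for a hash, and ties are resolved in favour of selection; both are assumptions, the function names are hypothetical, and this is not Taxor's code.

```python
def is_closed_syncmer(kmer, s):
    """True if the smallest s-mer of `kmer` occurs at its first or last s-mer position."""
    if not 0 < s <= len(kmer):
        raise ValueError("s must satisfy 0 < s <= len(kmer)")
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    smallest = min(smers)
    return smers[0] == smallest or smers[-1] == smallest


def closed_syncmers(seq, k, s):
    """Return (position, k-mer) pairs for every k-mer of `seq` that is a closed syncmer."""
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if is_closed_syncmer(seq[i:i + k], s)
    ]


# Hypothetical toy sequence; selection depends only on each k-mer's own content,
# not on its neighbours, which is the property that distinguishes syncmers from minimizers.
print(closed_syncmers("CGACTTGAC", k=5, s=2))
```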
Affiliation(s)
- Jens-Uwe Ulrich
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
  - Phylogenomics Unit, Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, 15745 Wildau, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
10. Sneddon A, Ravindran A, Shanmuganandam S, Kanchi M, Hein N, Jiang S, Shirokikh N, Eyras E. Biochemical-free enrichment or depletion of RNA classes in real-time during direct RNA sequencing with RISER. Nat Commun 2024; 15:4422. [PMID: 38789440 PMCID: PMC11126589 DOI: 10.1038/s41467-024-48673-8] [Received: 02/29/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
The heterogeneous composition of cellular transcriptomes poses a major challenge for detecting weakly expressed RNA classes, as they can be obscured by abundant RNAs. Although biochemical protocols can enrich or deplete specified RNAs, they are time-consuming, expensive and can compromise RNA integrity. Here we introduce RISER, a biochemical-free technology for the real-time enrichment or depletion of RNA classes. RISER performs selective rejection of molecules during direct RNA sequencing by identifying RNA classes directly from nanopore signals with deep learning and communicating with the sequencing hardware in real time. By targeting the dominant messenger and mitochondrial RNA classes for depletion, RISER reduces their respective read counts by more than 85%, resulting in an increase in sequencing depth of 47% on average for long non-coding RNAs. We also apply RISER for the depletion of globin mRNA in whole blood, achieving a decrease in globin reads by more than 90% as well as an increase in non-globin reads by 16% on average. Furthermore, using a GPU or a CPU, RISER is faster than GPU-accelerated basecalling and mapping. RISER's modular and retrainable software and intuitive command-line interface allow easy adaptation to other RNA classes. RISER is available at https://github.com/comprna/riser.
Collapse
Affiliation(s)
- Alexandra Sneddon
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
- Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Agin Ravindran
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
- Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Somasundhari Shanmuganandam
- Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
- Madhu Kanchi
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Nadine Hein
- ACRF Department of Cancer Biology and Therapeutics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Simon Jiang
- Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
- Department of Renal Medicine, The Canberra Hospital, Canberra, ACT 2605, Australia
- Nikolay Shirokikh
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia.
- Eduardo Eyras
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia.
- Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia.
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia.
11
Mordig M, Rätsch G, Kahles A. SimReadUntil for benchmarking selective sequencing algorithms on ONT devices. Bioinformatics 2024; 40:btae199. [PMID: 38603597 PMCID: PMC11065473 DOI: 10.1093/bioinformatics/btae199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/02/2024] [Accepted: 04/09/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION The Oxford Nanopore Technologies (ONT) ReadUntil API enables selective sequencing, which aims to selectively favor interesting over uninteresting reads, e.g. to deplete or enrich certain genomic regions. The performance gain depends on the selective sequencing decision-making algorithm (SSDA) which decides whether to reject a read, stop receiving a read, or wait for more data. Since real runs are time-consuming and costly, simulating the ONT sequencer with support for the ReadUntil API is highly beneficial for comparing and optimizing new SSDAs. Existing software like MinKNOW and UNCALLED only return raw signal data, are memory-intensive, require huge and often unavailable multi-fast5 files (≥100GB) and are not clearly documented. RESULTS We present the ONT device simulator SimReadUntil that takes a set of full reads as input, distributes them to channels and plays them back in real time including mux scans, channel gaps and blockages, and allows reads to be rejected and data reception from them to be stopped. Our modified ReadUntil API provides the basecalled reads rather than the raw signal, reducing computational load and focusing on the SSDA rather than on basecalling. Tuning the parameters of tools like ReadFish and ReadBouncer becomes easier because a GPU for basecalling is no longer required. We offer various methods to extract simulation parameters from a sequencing summary file and adapt ReadFish to replicate one of their enrichment experiments. SimReadUntil's gRPC interface allows standardized interaction with a wide range of programming languages. AVAILABILITY AND IMPLEMENTATION Code and fully worked examples are available on GitHub (https://github.com/ratschlab/sim_read_until).
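The three-way decision that the abstract attributes to an SSDA (reject a read, stop receiving it, or wait for more data) can be illustrated with a minimal Python sketch. This is not the SimReadUntil or ReadUntil API: the names `Decision`, `kmers` and `decide`, the k-mer overlap heuristic, and every threshold are hypothetical choices made only for illustration, assuming an enrichment run that works on basecalled read prefixes.

```python
from enum import Enum


class Decision(Enum):
    REJECT = "reject"                  # eject the molecule from the pore
    STOP_RECEIVING = "stop_receiving"  # keep sequencing, stop streaming chunks
    WAIT = "wait"                      # too little sequence to decide yet


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def decide(read_prefix, target_kmers, k=8, min_len=24, hit_fraction=0.5):
    """Hypothetical enrichment rule based on k-mer overlap with the targets."""
    # Wait until the prefix is long enough to yield at least one k-mer.
    if len(read_prefix) < max(min_len, k):
        return Decision.WAIT
    read_kmers = kmers(read_prefix, k)
    shared = len(read_kmers & target_kmers) / len(read_kmers)
    # On-target reads are sequenced to completion; off-target reads are ejected.
    if shared >= hit_fraction:
        return Decision.STOP_RECEIVING
    return Decision.REJECT


if __name__ == "__main__":
    target = "ACGTACGTTAGCCGATAGCTAGGCTTAACGGATCC"  # made-up target sequence
    index = kmers(target, 8)
    for prefix in ("ACGTAC", target[:30], "T" * 30):
        print(len(prefix), decide(prefix, index).value)
```

Real tools replace the k-mer heuristic with read mapping or signal-level classification; the sketch only shows how the three outcomes relate to the amount of sequence observed so far.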
Affiliation(s)
- Maximilian Mordig
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
- Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, 72076, Germany
- Gunnar Rätsch
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, Zürich, 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Department of Biology, ETH Zurich, Zürich, 8092, Switzerland
- André Kahles
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, Zürich, 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
12
Hewel C, Schmidt H, Runkel S, Kohnen W, Schweiger-Seemann S, Michel A, Bikar SE, Lieb B, Plachter B, Hankeln T, Linke M, Gerber S. Nanopore adaptive sampling of a metagenomic sample derived from a human monkeypox case. J Med Virol 2024; 96:e29610. [PMID: 38654702 DOI: 10.1002/jmv.29610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 03/18/2024] [Accepted: 04/05/2024] [Indexed: 04/26/2024]
Abstract
In 2022, a series of human monkeypox cases in multiple countries led to the largest and most widespread outbreak outside the known endemic areas. Setup of proper genomic surveillance is of utmost importance to control such outbreaks. To this end, we performed Nanopore (PromethION P24) and Illumina (NextSeq 2000) Whole Genome Sequencing (WGS) of a monkeypox sample. Adaptive sampling was applied for in silico depletion of the human host genome, allowing for the enrichment of low abundance viral DNA without a priori knowledge of sample composition. Nanopore sequencing allowed for high viral genome coverage, tracking of sample composition during sequencing, strain determination, and preliminary assessment of mutational pattern. In addition to that, only Nanopore data allowed us to resolve the entire monkeypox virus genome, with respect to two structural variants belonging to the genes OPG015 and OPG208. These SVs in important host range genes seem stable throughout the outbreak and are frequently misassembled and/or misannotated due to the prevalence of short read sequencing or short read first assembly. Ideally, standalone standard Illumina sequencing should not be used for Monkeypox WGS and de novo assembly, since it will obfuscate the structure of the genome, which has an impact on the quality and completeness of the genomes deposited in public databases and thus possibly on the ability to evaluate the complete genetic reason for the host range change of monkeypox in the current pandemic.
Affiliation(s)
- Charlotte Hewel
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Hanno Schmidt
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Institute for Virology and Research Center for Immunotherapy (FZI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Stefan Runkel
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Transfusion Unit & Test Center, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Wolfgang Kohnen
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Department of Hygiene and Infection Prevention, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Susann Schweiger-Seemann
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- André Michel
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Medical Management Department, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Sven-Ernö Bikar
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- StarSEQ GmbH, Mainz, Germany
- Bodo Plachter
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Institute for Virology and Research Center for Immunotherapy (FZI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Thomas Hankeln
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Molecular Genetics & Genome Analysis, Johannes Gutenberg University of Mainz, Mainz, Germany
- Matthias Linke
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Susanne Gerber
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- SARS-CoV-2 Sequencing Consortium Mainz, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
13
Ulrich JU, Epping L, Pilz T, Walther B, Stingl K, Semmler T, Renard BY. Nanopore adaptive sampling effectively enriches bacterial plasmids. mSystems 2024; 9:e0094523. [PMID: 38376263 PMCID: PMC10949517 DOI: 10.1128/msystems.00945-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 01/23/2024] [Indexed: 02/21/2024]
Abstract
Bacterial plasmids play a major role in the spread of antibiotic resistance genes. However, their characterization via DNA sequencing suffers from the low abundance of plasmid DNA in those samples. Although sample preparation methods can enrich the proportion of plasmid DNA before sequencing, these methods are expensive and laborious, and they might introduce a bias by enriching only for specific plasmid DNA sequences. Nanopore adaptive sampling could overcome these issues by rejecting uninteresting DNA molecules during the sequencing process. In this study, we assess the application of adaptive sampling for the enrichment of low-abundant plasmids in known bacterial isolates using two different adaptive sampling tools. We show that a significant enrichment can be achieved even on expired flow cells. By applying adaptive sampling, we also improve the quality of de novo plasmid assemblies and reduce the sequencing time. However, our experiments also highlight issues with adaptive sampling if target and non-target sequences span similar regions. IMPORTANCE Antimicrobial resistance causes millions of deaths every year. Mobile genetic elements like bacterial plasmids are key drivers for the dissemination of antimicrobial resistance genes. This makes the characterization of plasmids via DNA sequencing an important tool for clinical microbiologists. Since plasmids are often underrepresented in bacterial samples, plasmid sequencing can be challenging and laborious. To accelerate the sequencing process, we evaluate nanopore adaptive sampling as an in silico method for the enrichment of low-abundant plasmids. Our results show the potential of this cost-efficient method for future plasmid research but also indicate issues that arise from using reference sequences.
Affiliation(s)
- Jens-Uwe Ulrich
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin, Germany
- Phylogenomics Unit, Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, Wildau, Germany
- Lennard Epping
- Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Tanja Pilz
- Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Birgit Walther
- Advanced Light and Electron Microscopy, Robert Koch Institute, Berlin, Germany
- Kerstin Stingl
- National Reference Laboratory for Campylobacter, Department of Biological Safety, German Federal Institute for Risk Assessment (BfR), Berlin, Germany
- Torsten Semmler
- Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Bernhard Y. Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
14
Ahsan MU, Gouru A, Chan J, Zhou W, Wang K. A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing. Nat Commun 2024; 15:1448. [PMID: 38365920 PMCID: PMC10873387 DOI: 10.1038/s41467-024-45778-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 05/25/2023] [Accepted: 02/04/2024] [Indexed: 02/18/2024]
Abstract
Oxford Nanopore sequencing can detect DNA methylations from ionic current signal of single molecules, offering a unique advantage over conventional methods. Additionally, adaptive sampling, a software-controlled enrichment method for targeted sequencing, allows reduced representation methylation sequencing that can be applied to CpG islands or imprinted regions. Here we present DeepMod2, a comprehensive deep-learning framework for methylation detection using ionic current signal from Nanopore sequencing. DeepMod2 implements both a bidirectional long short-term memory (BiLSTM) model and a Transformer model and can analyze POD5 and FAST5 signal files generated on R9 and R10 flowcells. Additionally, DeepMod2 can run efficiently on central processing unit (CPU) through model pruning and can infer epihaplotypes or haplotype-specific methylation calls from phased reads. We use multiple publicly available and newly generated datasets to evaluate the performance of DeepMod2 under varying scenarios. DeepMod2 has comparable performance to Guppy and Dorado, which are the current state-of-the-art methods from Oxford Nanopore Technologies that remain closed-source. Moreover, we show a high correlation (r = 0.96) between reduced representation and whole-genome Nanopore sequencing. In summary, DeepMod2 is an open-source tool that enables fast and accurate DNA methylation detection from whole-genome or adaptive sequencing data on a diverse range of flowcell types.
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Anagha Gouru
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Joe Chan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Wanding Zhou
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
15
Singh G, Alser M, Denolf K, Firtina C, Khodamoradi A, Cavlak MB, Corporaal H, Mutlu O. RUBICON: a framework for designing efficient deep learning-based genomic basecallers. Genome Biol 2024; 25:49. [PMID: 38365730 PMCID: PMC10870431 DOI: 10.1186/s13059-024-03181-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 02/02/2024] [Indexed: 02/18/2024]
Abstract
Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The performance of basecalling has critical implications for all later steps in genome analysis. Therefore, there is a need to reduce the computation and memory cost of basecalling while maintaining accuracy. We present RUBICON, a framework to develop efficient hardware-optimized basecallers. We demonstrate the effectiveness of RUBICON by developing RUBICALL, the first hardware-optimized mixed-precision basecaller that performs efficient basecalling, outperforming the state-of-the-art basecallers. We believe RUBICON offers a promising path to develop future hardware-optimized basecallers.
Affiliation(s)
- Gagandeep Singh
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Research and Advanced Development, AMD, Longmont, USA
- Mohammed Alser
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Can Firtina
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland.
- Meryem Banu Cavlak
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Henk Corporaal
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Onur Mutlu
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland.
16
Wang J, Yang L, Cheng A, Tham CY, Tan W, Darmawan J, de Sessions PF, Wan Y. Direct RNA sequencing coupled with adaptive sampling enriches RNAs of interest in the transcriptome. Nat Commun 2024; 15:481. [PMID: 38212309 PMCID: PMC10784512 DOI: 10.1038/s41467-023-44656-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 12/22/2023] [Indexed: 01/13/2024]
Abstract
Abundant cellular transcripts occupy most of the sequencing reads in the transcriptome, making it challenging to assay for low-abundant transcripts. Here, we utilize the adaptive sampling function of Oxford Nanopore sequencing to selectively deplete and enrich RNAs of interest without biochemical manipulation before sequencing. Adaptive sampling performed on a pool of in vitro transcribed RNAs resulted in a net increase of 22-30% in the proportion of transcripts of interest in the population. Enriching and depleting different proportions of the Candida albicans transcriptome also resulted in an 11-13.5% increase in the number of reads on target transcripts, with longer and more abundant transcripts being more efficiently depleted. Depleting all currently annotated Candida albicans transcripts did not result in an absolute enrichment of remaining transcripts, although we identified 26 previously unknown transcripts and isoforms, 17 of which are antisense to existing transcripts. Further improvements in the adaptive sampling of RNAs will allow the technology to be widely applied to study RNAs of interest in diverse transcriptomes.
Affiliation(s)
- Jiaxu Wang
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore
- Lin Yang
- Oxford Nanopore Technologies, Singapore, 138667, Singapore
- Anthony Cheng
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore
- Wenting Tan
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore
- Jefferson Darmawan
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore
- Yue Wan
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore.
- Department of Biochemistry, National University of Singapore, Singapore, 117596, Singapore.
17
Terrazos Miani MA, Borcard L, Gempeler S, Baumann C, Bittel P, Leib SL, Neuenschwander S, Ramette A. NASCarD (Nanopore Adaptive Sampling with Carrier DNA): A Rapid, PCR-Free Method for SARS-CoV-2 Whole-Genome Sequencing in Clinical Samples. Pathogens 2024; 13:61. [PMID: 38251368 PMCID: PMC10818518 DOI: 10.3390/pathogens13010061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 01/04/2024] [Accepted: 01/07/2024] [Indexed: 01/23/2024]
Abstract
Whole-genome sequencing (WGS) represents the main technology for SARS-CoV-2 lineage characterization in diagnostic laboratories worldwide. The rapid, near-full-length sequencing of the viral genome is commonly enabled by high-throughput sequencing of PCR amplicons derived from cDNA molecules. Here, we present a new approach called NASCarD (Nanopore Adaptive Sampling with Carrier DNA), which allows a low amount of nucleic acids to be sequenced while selectively enriching for sequences of interest, hence limiting the production of non-target sequences. Using COVID-19 positive samples available during the omicron wave, we demonstrate how the method may lead to >99% genome completeness of the SARS-CoV-2 genome sequences within 7 h of sequencing at a competitive cost. The new approach may have applications beyond SARS-CoV-2 sequencing for other DNA or RNA pathogens in clinical samples.
Affiliation(s)
- Alban Ramette
- Institute for Infectious Diseases, University of Bern, Friedbühlstrasse 25, 3001 Bern, Switzerland
18
Lin Y, Zhang Y, Sun H, Jiang H, Zhao X, Teng X, Lin J, Shu B, Sun H, Liao Y, Zhou J. NanoDeep: a deep learning framework for nanopore adaptive sampling on microbial sequencing. Brief Bioinform 2023; 25:bbad499. [PMID: 38189540 PMCID: PMC10772945 DOI: 10.1093/bib/bbad499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 11/21/2023] [Accepted: 12/11/2023] [Indexed: 01/09/2024]
Abstract
Nanopore sequencers can enrich or deplete the targeted DNA molecules in a library by reversing the voltage across individual nanopores. However, it requires substantial computational resources to achieve rapid operations in parallel at read-time sequencing. We present a deep learning framework, NanoDeep, to overcome these limitations by incorporating convolutional neural network and squeeze and excitation. We first showed that the raw squiggle derived from native DNA sequences determines the origin of microbial and human genomes. Then, we demonstrated that NanoDeep successfully classified bacterial reads from the pooled library with human sequence and showed enrichment for bacterial sequence compared with routine nanopore sequencing setting. Further, we showed that NanoDeep improves the sequencing efficiency and preserves the fidelity of bacterial genomes in the mock sample. In addition, NanoDeep performs well in the enrichment of metagenome sequences of gut samples, showing its potential applications in the enrichment of unknown microbiota. Our toolkit is available at https://github.com/lysovosyl/NanoDeep.
Affiliation(s)
- Yusen Lin
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Yongjun Zhang
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Hang Sun
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Hang Jiang
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Xing Zhao
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
- Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
- Xiaojuan Teng
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Jingxia Lin
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Bowen Shu
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Hao Sun
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
- Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
- Yuhui Liao
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Jiajian Zhou
- Dermatology Hospital, Southern Medical University, Guangzhou, China
19
Deserranno K, Tilleman L, Rubben K, Deforce D, Van Nieuwerburgh F. Targeted haplotyping in pharmacogenomics using Oxford Nanopore Technologies' adaptive sampling. Front Pharmacol 2023; 14:1286764. [PMID: 38026945 PMCID: PMC10679755 DOI: 10.3389/fphar.2023.1286764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023]
Abstract
Pharmacogenomics (PGx) studies the impact of interindividual genomic variation on drug response, allowing the opportunity to tailor the dosing regimen for each patient. Current targeted PGx testing platforms are mainly based on microarray, polymerase chain reaction, or short-read sequencing. Despite demonstrating great value for the identification of single nucleotide variants (SNVs) and insertion/deletions (INDELs), these assays do not permit identification of large structural variants, nor do they allow unambiguous haplotype phasing for star-allele assignment. Here, we used Oxford Nanopore Technologies' adaptive sampling to enrich a panel of 1,036 genes with well-documented PGx relevance extracted from the Pharmacogenomics Knowledge Base (PharmGKB). By evaluating concordance with existing truth sets, we demonstrate accurate variant and star-allele calling for five Genome in a Bottle reference samples. We show that up to three samples can be multiplexed on one PromethION flow cell without a significant drop in variant calling performance, resulting in 99.35% and 99.84% recall and precision for the targeted variants, respectively. This work advances the use of nanopore sequencing in clinical PGx settings.
Affiliation(s)
- Filip Van Nieuwerburgh
- Laboratory of Pharmaceutical Biotechnology, Faculty of Pharmaceutical Sciences, Ghent University, Ghent, Belgium
20
Hufsky F, Abecasis AB, Babaian A, Beck S, Brierley L, Dellicour S, Eggeling C, Elena SF, Gieraths U, Ha AD, Harvey W, Jones TC, Lamkiewicz K, Lovate GL, Lücking D, Machyna M, Nishimura L, Nocke MK, Renard BY, Sakaguchi S, Sakellaridi L, Spangenberg J, Tarradas-Alemany M, Triebel S, Vakulenko Y, Wijesekara RY, González-Candelas F, Krautwurst S, Pérez-Cataluña A, Randazzo W, Sánchez G, Marz M. The International Virus Bioinformatics Meeting 2023. Viruses 2023; 15:2031. [PMID: 37896809 PMCID: PMC10612056 DOI: 10.3390/v15102031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 09/08/2023] [Accepted: 09/14/2023] [Indexed: 10/29/2023]
Abstract
The 2023 International Virus Bioinformatics Meeting was held in Valencia, Spain, from 24-26 May 2023, attracting approximately 180 participants worldwide. The primary objective of the conference was to establish a dynamic scientific environment conducive to discussion, collaboration, and the generation of novel research ideas. As the first in-person event following the SARS-CoV-2 pandemic, the meeting facilitated highly interactive exchanges among attendees. It served as a pivotal gathering for gaining insights into the current status of virus bioinformatics research and engaging with leading researchers and emerging scientists. The event comprised eight invited talks, 19 contributed talks, and 74 poster presentations across eleven sessions spanning three days. Topics covered included machine learning, bacteriophages, virus discovery, virus classification, virus visualization, viral infection, viromics, molecular epidemiology, phylodynamic analysis, RNA viruses, viral sequence analysis, viral surveillance, and metagenomics. This report provides rewritten abstracts of the presentations, a summary of the key research findings, and highlights shared during the meeting.
Affiliation(s)
- Franziska Hufsky
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany;
- Ana B. Abecasis
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Global Health and Tropical Medicine, GHTM, Associate Laboratory in Translation and Innovation towards Global Health, LA-REAL, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Rua da Junqueira 100, 1349-008 Lisboa, Portugal
- Artem Babaian
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
- Donnelly Centre, University of Toronto, Toronto, ON M5S 1A1, Canada
- Sebastian Beck
- Leibniz Institute of Virology, Department Viral Zoonoses—One Health, 20251 Hamburg, Germany;
| | - Liam Brierley
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Department of Health Data Science, University of Liverpool, Liverpool L69 3GF, UK
| | - Simon Dellicour
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, CP160/12, 50 av. FD Roosevelt, 1050 Bruxelles, Belgium
- Laboratory for Clinical and Epidemiological Virology, Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, University of Leuven, 3000 Leuven, Belgium
| | - Christian Eggeling
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Institute of Applied Optics and Biophysics, Friedrich Schiller University Jena, Max-Wien-Platz 1, 07743 Jena, Germany
| | - Santiago F. Elena
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Institute for Integrative Systems Biology (I2SysBio), CSIC-Universitat de Valencia, Catedratico Agustin Escardino 9, 46980 Valencia, Spain
| | - Udo Gieraths
- Institute of Virology, Charité, Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
| | - Anh D. Ha
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
| | - Will Harvey
- The Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK
| | - Terry C. Jones
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Institute of Virology, Charité, Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
| | - Kevin Lamkiewicz
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany;
| | - Gabriel L. Lovate
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany;
| | - Dominik Lücking
- Max-Planck Institute for Marine Microbiology, Celsiusstraße 1, 28359 Bremen, Germany
| | - Martin Machyna
- Paul-Ehrlich-Institut, Host-Pathogen-Interactions, 63225 Langen, Germany
| | - Luca Nishimura
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan
- Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
| | - Maximilian K. Nocke
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Department for Molecular & Medical Virology, Ruhr University Bochum, 44801 Bochum, Germany
| | - Bernard Y. Renard
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Digital Engineering Faculty, Hasso Plattner Institute, University of Potsdam, 14482 Potsdam, Germany
| | - Shoichi Sakaguchi
- Department of Microbiology and Infection Control, Faculty of Medicine, Osaka Medical and Pharmaceutical University, Osaka 569-8686, Japan;
| | - Lygeri Sakellaridi
- Institute for Virology and Immunobiology, University of Würzburg, Versbacher Str. 7, 97078 Würzburg, Germany
| | - Jannes Spangenberg
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany;
| | - Maria Tarradas-Alemany
- Computational Genomics Lab., Department of Genetics, Microbiology and Statistics, Institut de Biomedicina UB (IBUB), Universitat de Barcelona (UB), 08028 Barcelona, Spain
| | - Sandra Triebel
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany;
| | - Yulia Vakulenko
- Martsinovsky Institute of Medical Parasitology, Tropical and Vector Borne Diseases, Sechenov First Moscow State Medical University, 119991 Moscow, Russia
| | - Rajitha Yasas Wijesekara
- Institute for Bioinformatics, University of Medicine Greifswald, Felix-Hausdorff-Str. 8, 17475 Greifswald, Germany
| | - Fernando González-Candelas
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- Institute for Integrative Systems Biology (I2SysBio), CSIC-Universitat de Valencia, Catedratico Agustin Escardino 9, 46980 Valencia, Spain
- Joint Research Unit “Infection and Public Health” FISABIO, University of Valencia, 46010 Valencia, Spain
| | - Sarah Krautwurst
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany;
| | - Alba Pérez-Cataluña
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- VISAFELab, Department of Preservation and Food Safety Technologies, Institute of Agrochemistry and Food Technology, IATA-CSIC, 46980 Valencia, Spain
| | - Walter Randazzo
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- VISAFELab, Department of Preservation and Food Safety Technologies, Institute of Agrochemistry and Food Technology, IATA-CSIC, 46980 Valencia, Spain
| | - Gloria Sánchez
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- VISAFELab, Department of Preservation and Food Safety Technologies, Institute of Agrochemistry and Food Technology, IATA-CSIC, 46980 Valencia, Spain
| | - Manja Marz
- European Virus Bioinformatics Center, 07743 Jena, Germany (A.B.A.); (L.B.); (S.D.); (C.E.); (S.F.E.); (T.C.J.); (K.L.); (G.L.L.); (M.K.N.); (B.Y.R.); (F.G.-C.); (A.P.-C.); (W.R.); (G.S.)
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany;
- German Center for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Michael Stifel Center Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany
- Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07745 Jena, Germany
- Leibniz Institute for Age Research—Fritz Lippman Institute, 07745 Jena, Germany
|
21
|
Hook PW, Timp W. Beyond assembly: the increasing flexibility of single-molecule sequencing technology. Nat Rev Genet 2023; 24:627-641. [PMID: 37161088 PMCID: PMC10169143 DOI: 10.1038/s41576-023-00600-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Accepted: 03/30/2023] [Indexed: 05/11/2023]
Abstract
The maturation of high-throughput short-read sequencing technology over the past two decades has shaped the way genomes are studied. Recently, single-molecule, long-read sequencing has emerged as an essential tool in deciphering genome structure and function, including filling gaps in the human reference genome, measuring the epigenome and characterizing splicing variants in the transcriptome. With recent technological developments, these single-molecule technologies have moved beyond genome assembly and are being used in a variety of ways, including to selectively sequence specific loci with long reads, measure chromatin state and protein-DNA binding in order to investigate the dynamics of gene regulation, and rapidly determine copy number variation. These increasingly flexible uses of single-molecule technologies highlight a young and fast-moving part of the field that is leading to a more accessible era of nucleic acid sequencing.
Affiliation(s)
- Paul W Hook
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
|
22
|
Firtina C, Mansouri Ghiasi N, Lindegger J, Singh G, Cavlak MB, Mao H, Mutlu O. RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes. Bioinformatics 2023; 39:i297-i307. [PMID: 37387139 PMCID: PMC10311405 DOI: 10.1093/bioinformatics/btad272] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 07/01/2023]
Abstract
Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either (i) require powerful computational resources that may not be available for portable sequencers or (ii) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to the same DNA content lead to the same hash value, regardless of the slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: (i) read mapping, (ii) relative abundance estimation, and (iii) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real-time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides (i) 25.8× and 3.4× better average throughput and (ii) significantly better accuracy for large genomes, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash.
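The quantize-then-hash idea described in this abstract can be illustrated with a minimal sketch. It is not the RawHash implementation: the function names, the bucket range (`lo`, `hi`), the number of levels, and the window length `k` are all assumptions chosen for illustration, and the input is assumed to be a list of already normalized signal values. The point it shows is that two slightly different signals land in the same buckets and therefore produce the same hash values, so an exact-match index lookup can find them.

```python
from collections import defaultdict


def quantize(values, lo=-3.0, hi=3.0, levels=8):
    """Map each normalized signal value to an integer bucket in [0, levels - 1]."""
    width = (hi - lo) / levels
    buckets = []
    for v in values:
        clipped = min(max(v, lo), hi)  # clamp outliers into the bucket range
        buckets.append(min(int((clipped - lo) / width), levels - 1))
    return buckets


def window_hashes(buckets, k=4, levels=8):
    """Pack every run of k consecutive buckets into one integer (base-`levels`)."""
    hashes = []
    for i in range(len(buckets) - k + 1):
        h = 0
        for b in buckets[i:i + k]:
            h = h * levels + b
        hashes.append((h, i))
    return hashes


def build_index(reference_values, k=4, levels=8):
    """Hash table from window hash to the reference positions where it occurs."""
    index = defaultdict(list)
    for h, pos in window_hashes(quantize(reference_values, levels=levels), k, levels):
        index[h].append(pos)
    return index


def seed_hits(index, query_values, k=4, levels=8):
    """Return (query_position, reference_position) pairs with identical hashes."""
    hits = []
    for h, qpos in window_hashes(quantize(query_values, levels=levels), k, levels):
        for rpos in index.get(h, []):
            hits.append((qpos, rpos))
    return hits
```

In this toy form, a query whose values differ from a stretch of the reference by less than the distance to the nearest bucket edge yields exactly the same hashes as that stretch. Values that sit close to a bucket edge can still fall into different buckets, which is one reason the choice of quantization matters.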
Affiliation(s)
- Can Firtina
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Nika Mansouri Ghiasi
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Joel Lindegger
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Gagandeep Singh
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Meryem Banu Cavlak
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Haiyu Mao
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Onur Mutlu
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
|
23
|
Yang J, DeVore AN, Fu DA, Spicer MM, Guo M, Thompson SG, Ahlers-Dannen KE, Polato F, Nussenzweig A, Fisher RA. Rapid and precise genotyping of transgene zygosity in mice using an allele-specific method. Life Sci Alliance 2023; 6:e202201729. [PMID: 37037594 PMCID: PMC10087101 DOI: 10.26508/lsa.202201729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 03/22/2023] [Accepted: 03/23/2023] [Indexed: 04/12/2023]
Abstract
Precise determination of transgene zygosity is essential for use of transgenic mice in research. Because integration loci of transgenes are usually unknown due to their random insertion, assessment of transgene zygosity remains a challenge. Current zygosity genotyping methods (progeny testing, qPCR, and NGS-computational biology analysis) are time-consuming, prone to error, or technically challenging. Here, we developed a novel method to determine transgene zygosity that requires no knowledge of transgene insertion loci. This method applies allele-specific restriction enzyme digestion of PCR products (RE/PCR) to rapidly and reliably quantify transgene zygosity. We demonstrate the applicability of this method to three transgenic strains of mice (Atm TgC3001L, Nes-Cre, and Syn1-Cre) harboring a unique restriction enzyme site on either the transgene or its homologous sequence in the mouse genome. This method is as accurate as the gold standard of progeny testing but requires 2 days instead of a month or more. It is also considerably more accurate than the most commonly used approach of qPCR quantification. Our novel method represents a significant technical advance in determining transgene zygosity in mice.
Affiliation(s)
- Jianqi Yang
- Departments of Neuroscience and Pharmacology, The University of Iowa, Iowa City, IA, USA
| | - Alison N DeVore
- Departments of Neuroscience and Pharmacology, The University of Iowa, Iowa City, IA, USA
| | - Daniel A Fu
- Departments of Neuroscience and Pharmacology, The University of Iowa, Iowa City, IA, USA
| | - Mackenzie M Spicer
- Departments of Neuroscience and Pharmacology, The University of Iowa, Iowa City, IA, USA
| | - Mengcheng Guo
- Departments of Neuroscience and Pharmacology, The University of Iowa, Iowa City, IA, USA
| | - Samantha G Thompson
- Departments of Neuroscience and Pharmacology, The University of Iowa, Iowa City, IA, USA
| | | | - Federica Polato
- Laboratory of Genome Integrity, National Institutes of Health, Centre for Cancer Research, Bethesda, MD, USA
| | - Andre Nussenzweig
- Laboratory of Genome Integrity, National Institutes of Health, Centre for Cancer Research, Bethesda, MD, USA
| | - Rory A Fisher
- Departments of Neuroscience and Pharmacology, The University of Iowa, Iowa City, IA, USA
- Roy J and Lucille A Carver College of Medicine, The University of Iowa, Iowa City, IA, USA
|
24
|
Spealman P, De T, Chuong JN, Gresham D. Best Practices in Microbial Experimental Evolution: Using Reporters and Long-Read Sequencing to Identify Copy Number Variation in Experimental Evolution. J Mol Evol 2023; 91:356-368. [PMID: 37012421 PMCID: PMC10275804 DOI: 10.1007/s00239-023-10102-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 02/21/2023] [Indexed: 04/05/2023]
Abstract
Copy number variants (CNVs), comprising gene amplifications and deletions, are a pervasive class of heritable variation. CNVs play a key role in rapid adaptation in both natural and experimental evolution. However, despite the advent of new DNA sequencing technologies, detection and quantification of CNVs in heterogeneous populations has remained challenging. Here, we summarize recent advances in the use of CNV reporters, which provide a facile means of quantifying de novo CNVs at a specific locus in the genome, and of nanopore sequencing, which resolves the often complex structures of CNVs. We provide guidance for the engineering and analysis of CNV reporters and practical guidelines for single-cell analysis of CNVs using flow cytometry. We summarize recent advances in nanopore sequencing, discuss the utility of this technology, and provide guidance for the bioinformatic analysis of these data to define the molecular structure of CNVs. The combination of reporter systems for tracking and isolating CNV lineages and long-read DNA sequencing for characterizing CNV structures enables unprecedented resolution of the mechanisms by which CNVs are generated and their evolutionary dynamics.
Affiliation(s)
- Pieter Spealman
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Titir De
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Julie N Chuong
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - David Gresham
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
|
25
|
Chen P, Sun Z, Wang J, Liu X, Bai Y, Chen J, Liu A, Qiao F, Chen Y, Yuan C, Sha J, Zhang J, Xu LQ, Li J. Portable nanopore-sequencing technology: Trends in development and applications. Front Microbiol 2023; 14:1043967. [PMID: 36819021 PMCID: PMC9929578 DOI: 10.3389/fmicb.2023.1043967] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/14/2022] [Accepted: 01/03/2023] [Indexed: 02/04/2023]
Abstract
Sequencing technology is the most commonly used technology in molecular biology research and an essential pillar for the development and applications of molecular biology. Since 1977, when the first generation of sequencing technology opened the door to interpreting the genetic code, sequencing technology has developed through three generations. It has applications in all aspects of life and scientific research, such as disease diagnosis, drug target discovery, pathological research, species protection, and SARS-CoV-2 detection. However, first- and second-generation sequencing technologies relied on fluorescence detection systems and DNA polymerase enzyme systems, which increased the cost of sequencing and limited its scope of application. Third-generation sequencing technology performs PCR-free, single-molecule sequencing, but it still depends on a fluorescence detection device. To break through these limitations, researchers have made arduous efforts to develop a new, advanced, portable sequencing technology represented by nanopore sequencing. Nanopore technology offers small size, convenient portability, independence from biochemical reagents, and direct reading by physical methods. This paper reviews the research and development of nanopore sequencing technology (NST) from the laboratory to commercially viable tools, and discusses the main types of nanopore sequencing technologies and their applications in solving a wide range of real-world problems. In addition, the paper collates the analysis tools necessary for performing different processing tasks in nanopore sequencing. Finally, we highlight the challenges of NST and its future research and application directions.
Affiliation(s)
- Pin Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Zepeng Sun
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
| | - Jiawei Wang
- School of Computer Science and Technology, Southeast University, Nanjing, China
| | - Xinlong Liu
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
| | - Yun Bai
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Jiang Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Anna Liu
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Feng Qiao
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
| | - Yang Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Chenyan Yuan
- Clinical Laboratory, Southeast University Zhongda Hospital, Nanjing, China
| | - Jingjie Sha
- School of Mechanical Engineering, Southeast University, Nanjing, China
| | - Jinghui Zhang
- School of Computer Science and Technology, Southeast University, Nanjing, China
| | - Li-Qun Xu
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China (corresponding author)
| | - Jian Li
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China (corresponding author)
|
26
|
Shih PJ, Saadat H, Parameswaran S, Gamaarachchi H. Efficient real-time selective genome sequencing on resource-constrained devices. Gigascience 2022; 12:giad046. [PMID: 37395631 PMCID: PMC10316692 DOI: 10.1093/gigascience/giad046] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/19/2022] [Revised: 04/11/2023] [Accepted: 06/02/2023] [Indexed: 07/04/2023]
Abstract
BACKGROUND Third-generation nanopore sequencers offer selective sequencing, or "Read Until," which allows genomic reads to be analyzed in real time and abandoned partway through if they do not belong to a genomic region of "interest." This selective sequencing opens the door to important applications such as rapid and low-cost genetic tests. For selective sequencing to be effective, analysis latency should be as low as possible so that unnecessary reads can be rejected as early as possible. However, existing methods that employ a subsequence dynamic time warping (sDTW) algorithm for this problem are so computationally intensive that a massive workstation with dozens of CPU cores still struggles to keep up with the data rate of a mobile phone-sized MinION sequencer. RESULTS In this article, we present Hardware Accelerated Read Until (HARU), a resource-efficient hardware-software codesign-based method that exploits a low-cost and portable heterogeneous multiprocessor system-on-chip platform with on-chip field-programmable gate arrays (FPGA) to accelerate the sDTW-based Read Until algorithm. Experimental results show that, for a SARS-CoV-2 dataset, HARU on a Xilinx FPGA embedded with a 4-core ARM processor is around 2.5× faster than a highly optimized multithreaded software version (and around 85× faster than the existing unoptimized multithreaded software) running on a sophisticated server with a 36-core Intel Xeon processor. The energy consumption of HARU is two orders of magnitude lower than that of the same application executing on the 36-core server. CONCLUSIONS HARU demonstrates that nanopore selective sequencing is possible on resource-constrained devices through rigorous hardware-software optimizations. The source code for the HARU sDTW module is available as open source at https://github.com/beebdev/HARU, and an example application that uses HARU is at https://github.com/beebdev/sigfish-haru.
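For readers unfamiliar with the sDTW step named in this abstract, the following is a minimal software sketch of textbook subsequence dynamic time warping. It is not the HARU hardware design or its optimized software baseline: the function name, the absolute-difference cost, and the use of plain Python lists are assumptions made for illustration. The query (a read prefix) may start and end anywhere in the reference, which is expressed by giving the first row of the cost table zero cost.

```python
def subsequence_dtw(query, reference):
    """Align `query` to the best-matching contiguous part of `reference`.

    Returns (cost, end_index), where end_index is the 0-based reference
    position at which the best alignment ends.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("query and reference must be non-empty")
    inf = float("inf")
    # Row 0 is all zeros: the alignment may start at any reference position.
    prev = [0.0] * (m + 1)
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)  # column 0 stays infinite once the query has begun
        for j in range(1, m + 1):
            d = abs(query[i - 1] - reference[j - 1])
            # Standard DTW recurrence: vertical, horizontal, or diagonal step.
            curr[j] = d + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    # The alignment may also end anywhere, so take the minimum of the last row.
    best_j = min(range(1, m + 1), key=lambda j: prev[j])
    return prev[best_j], best_j - 1
```

A selective-sequencing loop could compare the returned cost against a threshold (a hypothetical tuning parameter) to decide whether to keep or eject a read. The nested loops make the quadratic cost of the method visible, which is the computational load the abstract describes.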
Affiliation(s)
- Po Jui Shih
- School of Computer Science and Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
| | - Hassaan Saadat
- School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW 2052, Australia
| | - Sri Parameswaran
- School of Electrical and Information Engineering, University of Sydney, Sydney, NSW 2006, Australia
| | - Hasindu Gamaarachchi
- School of Computer Science and Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
- Genomics Pillar, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney 2010, Australia
|