1
Guo Z, Ni Y, Tan L, Shao Y, Ye L, Chen S, Li R. Nanopore Current Events Magnifier (nanoCEM): a novel tool for visualizing current events at modification sites of nanopore sequencing. NAR Genom Bioinform 2024; 6:lqae052. [PMID: 38774513 PMCID: PMC11106030 DOI: 10.1093/nargab/lqae052] [Received: 12/22/2023] [Revised: 04/23/2024] [Accepted: 05/05/2024] [Indexed: 05/24/2024]
Abstract
Nanopore sequencing technologies have enabled the direct detection of base modifications in DNA or RNA molecules. Despite these advancements, the tools for visualizing electrical current, essential for analyzing base modifications, are often lacking in clarity and compatibility with diverse nanopore pipelines. Here, we present Nanopore Current Events Magnifier (nanoCEM, https://github.com/lrslab/nanoCEM), a Python command-line tool designed to facilitate the identification of DNA/RNA modification sites through enhanced visualization and statistical analysis. Compatible with the four preprocessing methods including 'f5c resquiggle', 'f5c eventalign', 'Tombo' and 'move table', nanoCEM is applicable to RNA and DNA analysis across multiple flow cell types. By utilizing rescaling techniques and calculating various statistical features, nanoCEM provides more accurate and comparable visualization of current events, allowing researchers to effectively observe differences between samples and showcase the modified sites.
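The rescaling and per-event statistics mentioned in this abstract can be illustrated with a minimal sketch. It assumes median/MAD normalization, a common choice for nanopore current, and is not taken from nanoCEM's code; the function names `rescale_median_mad` and `event_features` are hypothetical.

```python
import statistics


def rescale_median_mad(signal):
    """Center a current trace on its median and scale by the median absolute deviation.

    Assumption: median/MAD scaling stands in for the 'rescaling techniques'
    named in the abstract; the tool itself may use a different scheme.
    """
    med = statistics.median(signal)
    mad = statistics.median(abs(x - med) for x in signal)
    if mad == 0:
        raise ValueError("signal has zero spread; cannot rescale")
    return [(x - med) / mad for x in signal]


def event_features(samples):
    """Summarize the current samples assigned to one reference position."""
    return {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.pstdev(samples),
        "dwell": len(samples),  # number of samples, a proxy for dwell time
    }


if __name__ == "__main__":
    # Hypothetical picoampere values, for illustration only.
    raw = [80.0, 82.0, 84.0, 86.0, 88.0]
    print(rescale_median_mad(raw))
    print(event_features(raw))
```

Rescaling each sample before computing features is what makes events from two runs comparable when the runs differ in baseline offset or gain.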
Affiliation(s)
- Zhihao Guo
  - Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China
- Ying Ni
  - Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong, China
- Lu Tan
  - Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China
- Yanwen Shao
  - Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China
- Lianwei Ye
  - Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China
- Sheng Chen
  - State Key Lab of Chemical Biology and Drug Discovery and the Department of Food Science and Nutrition, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR
- Runsheng Li
  - Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China
  - Tung Biomedical Sciences Centre, City University of Hong Kong, Hong Kong, China
  - Department of Precision Diagnostic and Therapeutic Technology, City University of Hong Kong Shenzhen Futian Research Institute, Shenzhen, Guangdong, China
2
Sneddon A, Ravindran A, Shanmuganandam S, Kanchi M, Hein N, Jiang S, Shirokikh N, Eyras E. Biochemical-free enrichment or depletion of RNA classes in real-time during direct RNA sequencing with RISER. Nat Commun 2024; 15:4422. [PMID: 38789440 PMCID: PMC11126589 DOI: 10.1038/s41467-024-48673-8] [Received: 02/29/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
The heterogeneous composition of cellular transcriptomes poses a major challenge for detecting weakly expressed RNA classes, as they can be obscured by abundant RNAs. Although biochemical protocols can enrich or deplete specified RNAs, they are time-consuming, expensive and can compromise RNA integrity. Here we introduce RISER, a biochemical-free technology for the real-time enrichment or depletion of RNA classes. RISER performs selective rejection of molecules during direct RNA sequencing by identifying RNA classes directly from nanopore signals with deep learning and communicating with the sequencing hardware in real time. By targeting the dominant messenger and mitochondrial RNA classes for depletion, RISER reduces their respective read counts by more than 85%, resulting in an increase in sequencing depth of 47% on average for long non-coding RNAs. We also apply RISER for the depletion of globin mRNA in whole blood, achieving a decrease in globin reads by more than 90% as well as an increase in non-globin reads by 16% on average. Furthermore, using a GPU or a CPU, RISER is faster than GPU-accelerated basecalling and mapping. RISER's modular and retrainable software and intuitive command-line interface allow easy adaptation to other RNA classes. RISER is available at https://github.com/comprna/riser .
Affiliation(s)
- Alexandra Sneddon
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
  - Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Agin Ravindran
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
  - Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Somasundhari Shanmuganandam
  - Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
  - Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
- Madhu Kanchi
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Nadine Hein
  - ACRF Department of Cancer Biology and Therapeutics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Simon Jiang
  - Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
  - Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
  - Department of Renal Medicine, The Canberra Hospital, Canberra, ACT 2605, Australia
- Nikolay Shirokikh
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Eduardo Eyras
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
  - Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
3
Mordig M, Rätsch G, Kahles A. SimReadUntil for benchmarking selective sequencing algorithms on ONT devices. Bioinformatics 2024; 40:btae199. [PMID: 38603597 PMCID: PMC11065473 DOI: 10.1093/bioinformatics/btae199] [Received: 11/02/2023] [Revised: 03/02/2024] [Accepted: 04/09/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION: The Oxford Nanopore Technologies (ONT) ReadUntil API enables selective sequencing, which aims to selectively favor interesting over uninteresting reads, e.g. to deplete or enrich certain genomic regions. The performance gain depends on the selective sequencing decision-making algorithm (SSDA), which decides whether to reject a read, stop receiving a read, or wait for more data. Since real runs are time-consuming and costly, simulating the ONT sequencer with support for the ReadUntil API is highly beneficial for comparing and optimizing new SSDAs. Existing software like MinKNOW and UNCALLED only return raw signal data, are memory-intensive, require huge and often unavailable multi-fast5 files (≥100 GB) and are not clearly documented. RESULTS: We present the ONT device simulator SimReadUntil, which takes a set of full reads as input, distributes them to channels and plays them back in real time, including mux scans, channel gaps and blockages, and supports rejecting reads as well as stopping the receipt of data from them. Our modified ReadUntil API provides the basecalled reads rather than the raw signal, reducing computational load and focusing on the SSDA rather than on basecalling. Tuning the parameters of tools like ReadFish and ReadBouncer becomes easier because a GPU for basecalling is no longer required. We offer various methods to extract simulation parameters from a sequencing summary file and adapt ReadFish to replicate one of their enrichment experiments. SimReadUntil's gRPC interface allows standardized interaction with a wide range of programming languages. AVAILABILITY AND IMPLEMENTATION: Code and fully worked examples are available on GitHub (https://github.com/ratschlab/sim_read_until).
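The three outcomes of a selective sequencing decision-making algorithm (reject, stop receiving, wait) can be sketched as a small function over the basecalled prefix of a read. This is a toy illustration only: it uses exact substring matching instead of a real aligner, it does not call the ReadUntil API or SimReadUntil, and the names `Decision`, `decide` and `min_len` are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    REJECT = "reject"                  # eject the molecule from the pore
    STOP_RECEIVING = "stop_receiving"  # keep sequencing, no further chunks needed
    WAIT = "wait"                      # not enough data yet to decide


def decide(prefix, targets, min_len=8):
    """Toy SSDA for an enrichment experiment.

    prefix  -- bases basecalled so far for the read in a channel
    targets -- reference sequences to enrich for
    min_len -- hypothetical minimum prefix length before deciding
    """
    if len(prefix) < min_len:
        return Decision.WAIT
    # Assumption: exact substring matching stands in for read mapping.
    if any(prefix in target for target in targets):
        return Decision.STOP_RECEIVING
    return Decision.REJECT


if __name__ == "__main__":
    targets = ["ACGTACGTTTGACCA"]
    for prefix in ("ACG", "ACGTACGTTT", "GGGGGGGGGG"):
        print(prefix, decide(prefix, targets).value)
```

Because the decision depends only on basecalled text, a function of this shape can be tuned against a simulator that serves basecalled reads, without a GPU for basecalling.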
Affiliation(s)
- Maximilian Mordig
  - Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
  - Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, 72076, Germany
- Gunnar Rätsch
  - Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
  - Biomedical Informatics Research, University Hospital Zurich, Zürich, 8091, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
  - Department of Biology, ETH Zurich, Zürich, 8092, Switzerland
- André Kahles
  - Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
  - Biomedical Informatics Research, University Hospital Zurich, Zürich, 8091, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
4
Su C, Chandradoss KR, Malachowski T, Boya R, Ryu HS, Brennand KJ, Phillips-Cremins JE. MASTR-seq: Multiplexed Analysis of Short Tandem Repeats with sequencing. bioRxiv 2024:2024.04.29.591790. [PMID: 38746155 PMCID: PMC11092654 DOI: 10.1101/2024.04.29.591790] [Indexed: 05/16/2024]
Abstract
More than 60 human disorders have been linked to unstable expansion of short tandem repeat (STR) tracts. STR length and the extent of DNA methylation are linked to disease pathology and can be mosaic in a cell type-specific manner in several repeat expansion disorders. Mosaic phenomena have been difficult to study to date due to technical bias intrinsic to repeat sequences and the need for multi-modal measurements at single-allele resolution. Nanopore long-read sequencing accurately measures STR length and DNA methylation in the same single molecule but is cost-prohibitive for studies assessing a target locus across multiple experimental conditions or patient samples. Here, we describe MASTR-seq, Multiplexed Analysis of Short Tandem Repeats, for cost-effective, high-throughput, accurate, multi-modal measurements of DNA methylation and STR genotype at single-allele resolution. MASTR-seq couples long-read sequencing, Cas9-mediated target enrichment, and PCR-free multiplexed barcoding to achieve a >ten-fold increase in on-target read mapping for 8-12 pooled samples in a single MinION flow cell. We provide a detailed experimental protocol and computational tools and present evidence that MASTR-seq quantifies tract length and DNA methylation status for CGG and CAG STR loci in normal-length and mutation-length human cell lines. The MASTR-seq protocol takes approximately eight days for experiments and one additional day for data processing and analyses.
Key points
- We provide a protocol for MASTR-seq: Multiplexed Analysis of Short Tandem Repeats using Cas9-mediated target enrichment and PCR-free, multiplexed nanopore sequencing.
- MASTR-seq achieves a >10-fold increase in on-target read proportion for highly repetitive, technically inaccessible regions of the genome relevant for human health and disease.
- MASTR-seq allows for high-throughput, efficient, accurate, and cost-effective measurement of STR length and DNA methylation in the same single allele for up to 8-12 samples in parallel in one Nanopore MinION flow cell.
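Measuring tract length from a single read can be illustrated with a minimal sketch that counts the longest uninterrupted run of a repeat unit such as CGG or CAG. This is not the MASTR-seq computational tooling: the function name `longest_repeat_tract` is hypothetical, and the exact-match approach ignores the sequencing errors and repeat interruptions that a real analysis has to tolerate.

```python
import re


def longest_repeat_tract(read, unit):
    """Return the copy number of `unit` in the longest uninterrupted run within `read`."""
    # Non-capturing group repeated one or more times, e.g. (?:CGG)+
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [match.group(0) for match in pattern.finditer(read)]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)


if __name__ == "__main__":
    # Hypothetical read fragment: three CGG copies, a break, then one more copy.
    read = "TTCGGCGGCGGAACGGTT"
    print(longest_repeat_tract(read, "CGG"))
```

Applying such a count to every on-target read of a sample yields a per-allele length distribution, which is the kind of single-molecule readout needed to see mosaicism.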
5
Vidal A, Wijekoon VB, Viterbo E. Concatenated Nanopore DNA Codes. IEEE Trans Nanobioscience 2024; 23:310-318. [PMID: 38546987 DOI: 10.1109/tnb.2024.3350001] [Indexed: 04/02/2024]
Abstract
In nanopore sequencers, single-stranded DNA molecules (or k-mers) enter a small opening in a membrane called a nanopore and modulate the ionic current through the pore, producing a channel output in the form of a noisy piecewise constant signal. An important problem in DNA-based data storage is finding a set of k-mers, i.e. a DNA code, that is robust against noisy sample duplication introduced by nanopore sequencers. Good DNA codes should contain as many k-mers as possible that produce distinguishable current signals (squiggles) as measured by the sequencer. The dissimilarity between squiggles can be estimated using a bound on their pairwise error probability, which is used as a metric for code design. Unfortunately, code construction using the union bound is limited to small k's due to the difficulty of finding maximum cliques in large graphs. In this paper, we construct large codes by concatenating codewords from a base code, thereby packing more information in a single strand while retaining the storage efficiency of the base code. To facilitate decoding, we include a circumfix in the base code to reduce the effect of the nanopore channel memory. We show that the decoding complexity scales as [Formula: see text], where m is the number of concatenated k-mers. Simulations show that the base code error rate is stable as m increases.
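The channel described in this abstract, a noisy piecewise constant signal with sample duplication, can be sketched as a toy simulator. The current levels, the uniform dwell distribution and the Gaussian noise are illustrative assumptions rather than the paper's channel model, and the names `squiggle`, `max_dwell` and `noise_sd` are hypothetical.

```python
import random


def squiggle(kmers, levels, rng, max_dwell=4, noise_sd=1.0):
    """Simulate a noisy piecewise constant current trace for a k-mer sequence.

    kmers     -- k-mers in the order they pass through the pore
    levels    -- mapping from k-mer to nominal current level (hypothetical values)
    rng       -- random.Random instance, so runs are reproducible
    max_dwell -- each level is duplicated 1..max_dwell times (sample duplication)
    noise_sd  -- standard deviation of additive Gaussian noise
    """
    signal = []
    for kmer in kmers:
        dwell = rng.randint(1, max_dwell)
        signal.extend(levels[kmer] + rng.gauss(0.0, noise_sd) for _ in range(dwell))
    return signal


if __name__ == "__main__":
    # Hypothetical levels for three 3-mers, for illustration only.
    levels = {"ACG": 80.0, "CGT": 95.0, "GTA": 110.0}
    trace = squiggle(["ACG", "CGT", "GTA"], levels, random.Random(0))
    print(len(trace), [round(x, 1) for x in trace])
```

Two k-mers are easy to tell apart when their nominal levels differ by much more than the noise, which is the intuition behind selecting codewords with distinguishable squiggles.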
6
Munro R, Wibowo S, Payne A, Loose M. Icarust, a real-time simulator for Oxford Nanopore adaptive sampling. Bioinformatics 2024; 40:btae141. [PMID: 38478392 PMCID: PMC10980563 DOI: 10.1093/bioinformatics/btae141] [Received: 06/19/2023] [Revised: 12/21/2023] [Accepted: 03/11/2024] [Indexed: 04/01/2024]
Abstract
MOTIVATION Oxford Nanopore Technologies (ONT) sequencers enable real-time generation of sequence data, which allows for concurrent analysis during a run. Adaptive sampling leverages this real-time capability in extremis, rejecting or accepting reads for sequencing based on assessment of the sequence from the start of each read. This functionality is provided by ONT's software, MinKNOW (Oxford Nanopore Technologies). Designing and developing software to take advantage of adaptive sampling can be costly in terms of sequencing consumables, using precious samples and preparing sequencing libraries. MinKNOW addresses this in part by allowing the replay of previously sequenced runs for testing. However, as we show, the sequencing output only partially changes in response to adaptive sampling instructions. Here we present Icarust, a tool enabling more accurate approximations of sequencing runs. Icarust recreates all the required endpoints of MinKNOW to perform adaptive sampling and writes output compatible with current base-callers and analysis pipelines. Icarust serves nanopore signal simulating a MinION or PromethION flow cell experiment from any reference genome using either R9 or R10 pore models. We show that simulating sequencing runs with Icarust provides a realistic testing and development environment for software exploiting the real-time nature of Nanopore sequencing. AVAILABILITY AND IMPLEMENTATION All code is open source and freely available here-https://github.com/LooseLab/Icarust. Icarust is implemented in Rust, with a docker container also available. The data underlying this article will be shared on reasonable request to the corresponding author.
Affiliation(s)
- Rory Munro
  - School of Life Sciences, Medical School, Queens Medical Centre, University of Nottingham, Nottingham NG72RD, United Kingdom
- Satrio Wibowo
  - School of Life Sciences, Medical School, Queens Medical Centre, University of Nottingham, Nottingham NG72RD, United Kingdom
- Alexander Payne
  - School of Life Sciences, Medical School, Queens Medical Centre, University of Nottingham, Nottingham NG72RD, United Kingdom
- Matthew Loose
  - School of Life Sciences, Medical School, Queens Medical Centre, University of Nottingham, Nottingham NG72RD, United Kingdom
7
Ulrich JU, Epping L, Pilz T, Walther B, Stingl K, Semmler T, Renard BY. Nanopore adaptive sampling effectively enriches bacterial plasmids. mSystems 2024; 9:e0094523. [PMID: 38376263 PMCID: PMC10949517 DOI: 10.1128/msystems.00945-23] [Received: 09/04/2023] [Accepted: 01/23/2024] [Indexed: 02/21/2024]
Abstract
Bacterial plasmids play a major role in the spread of antibiotic resistance genes. However, their characterization via DNA sequencing suffers from the low abundance of plasmid DNA in those samples. Although sample preparation methods can enrich the proportion of plasmid DNA before sequencing, these methods are expensive and laborious, and they might introduce a bias by enriching only for specific plasmid DNA sequences. Nanopore adaptive sampling could overcome these issues by rejecting uninteresting DNA molecules during the sequencing process. In this study, we assess the application of adaptive sampling for the enrichment of low-abundant plasmids in known bacterial isolates using two different adaptive sampling tools. We show that a significant enrichment can be achieved even on expired flow cells. By applying adaptive sampling, we also improve the quality of de novo plasmid assemblies and reduce the sequencing time. However, our experiments also highlight issues with adaptive sampling if target and non-target sequences span similar regions. IMPORTANCE Antimicrobial resistance causes millions of deaths every year. Mobile genetic elements like bacterial plasmids are key drivers for the dissemination of antimicrobial resistance genes. This makes the characterization of plasmids via DNA sequencing an important tool for clinical microbiologists. Since plasmids are often underrepresented in bacterial samples, plasmid sequencing can be challenging and laborious. To accelerate the sequencing process, we evaluate nanopore adaptive sampling as an in silico method for the enrichment of low-abundant plasmids. Our results show the potential of this cost-efficient method for future plasmid research but also indicate issues that arise from using reference sequences.
Affiliation(s)
- Jens-Uwe Ulrich
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, Berlin, Germany
  - Phylogenomics Unit, Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, Wildau, Germany
- Lennard Epping
  - Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Tanja Pilz
  - Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Birgit Walther
  - Advanced Light and Electron Microscopy, Robert Koch Institute, Berlin, Germany
- Kerstin Stingl
  - National Reference Laboratory for Campylobacter, Department of Biological Safety, German Federal Institute for Risk Assessment (BfR), Berlin, Germany
- Torsten Semmler
  - Genome Sequencing and Genomic Epidemiology, Robert Koch Institute, Berlin, Germany
- Bernhard Y. Renard
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
8
Goldsmith C, Thevin V, Fesneau O, Matias MI, Perrault J, Abid AH, Taylor N, Dardalhon V, Marie JC, Hernandez-Vargas H. Single-Molecule DNA Methylation Reveals Unique Epigenetic Identity Profiles of T Helper Cells. J Immunol 2024; 212:1029-1039. [PMID: 38284984 PMCID: PMC11002815 DOI: 10.4049/jimmunol.2300091] [Received: 02/06/2023] [Accepted: 01/04/2024] [Indexed: 01/30/2024]
Abstract
Both identity and plasticity of CD4 T helper (Th) cells are regulated in part by epigenetic mechanisms. However, a method that reliably and readily profiles DNA base modifications is still needed to finely study Th cell differentiation. Cytosine methylation in CpG context (5mCpG) and cytosine hydroxymethylation (5hmCpG) are DNA modifications that identify stable cell phenotypes, but their potential to characterize intermediate cell transitions has not yet been evaluated. To assess transition states in Th cells, we developed a method to profile Th cell identity using Cas9-targeted single-molecule nanopore sequencing. Targeting as few as 10 selected genomic loci, we were able to distinguish major in vitro polarized murine T cell subtypes, as well as intermediate phenotypes, by their native DNA 5mCpG patterns. Moreover, by using off-target sequences, we were able to infer transcription factor activities relevant to each cell subtype. Detection of 5mCpG and 5hmCpG was validated on intestinal Th17 cells escaping transforming growth factor β control, using single-molecule adaptive sampling. A total of 21 differentially methylated regions mapping to the 10-gene panel were identified in pathogenic Th17 cells relative to their nonpathogenic counterpart. Hence, our data highlight the potential to exploit native DNA methylation profiling to study physiological and pathological transition states of Th cells.
Affiliation(s)
- Chloe Goldsmith
  - Tumor Escape Resistance and Immunity Department, Cancer Research Center of Lyon, The French League Against Cancer Certified Team, INSERM U1052, CNRS UMR 5286, Léon Bérard Centre and University of Lyon, Lyon, France
- Valentin Thevin
  - Tumor Escape Resistance and Immunity Department, Cancer Research Center of Lyon, The French League Against Cancer Certified Team, INSERM U1052, CNRS UMR 5286, Léon Bérard Centre and University of Lyon, Lyon, France
- Olivier Fesneau
  - Tumor Escape Resistance and Immunity Department, Cancer Research Center of Lyon, The French League Against Cancer Certified Team, INSERM U1052, CNRS UMR 5286, Léon Bérard Centre and University of Lyon, Lyon, France
- Maria I Matias
  - Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
- Julie Perrault
  - Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
- Ali Hani Abid
  - Tumor Escape Resistance and Immunity Department, Cancer Research Center of Lyon, The French League Against Cancer Certified Team, INSERM U1052, CNRS UMR 5286, Léon Bérard Centre and University of Lyon, Lyon, France
- Naomi Taylor
  - Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
  - Pediatric Oncology Branch, National Cancer Institute, Center for Cancer Research, National Institutes of Health, Bethesda, MD
- Valérie Dardalhon
  - Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
- Julien C Marie
  - Tumor Escape Resistance and Immunity Department, Cancer Research Center of Lyon, The French League Against Cancer Certified Team, INSERM U1052, CNRS UMR 5286, Léon Bérard Centre and University of Lyon, Lyon, France
- Hector Hernandez-Vargas
  - Tumor Escape Resistance and Immunity Department, Cancer Research Center of Lyon, The French League Against Cancer Certified Team, INSERM U1052, CNRS UMR 5286, Léon Bérard Centre and University of Lyon, Lyon, France
  - Genomics Consulting, Bron, France
9
Kovaka S, Hook PW, Jenike KM, Shivakumar V, Morina LB, Razaghi R, Timp W, Schatz MC. Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment. bioRxiv 2024:2024.03.05.583511. [PMID: 38496646 PMCID: PMC10942365 DOI: 10.1101/2024.03.05.583511] [Indexed: 03/19/2024]
Abstract
Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic/transcriptomic and epigenetic information without additional library preparation. Presently, only a limited set of modifications can be directly basecalled (e.g. 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis, and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods, and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in ONT's state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open-source at github.com/skovaka/uncalled4.
10
Rodriguez I, Rossi NM, Keskus AG, Xie Y, Ahmad T, Bryant A, Lou H, Paredes JG, Milano R, Rao N, Tulsyan S, Boland JF, Luo W, Liu J, O'Hanlon T, Bess J, Mukhina V, Gaykalova D, Yuki Y, Malik L, Billingsley KJ, Blauwendraat C, Carrington M, Yeager M, Mirabello L, Kolmogorov M, Dean M. Insights into the mechanisms and structure of breakage-fusion-bridge cycles in cervical cancer using long-read sequencing. Am J Hum Genet 2024; 111:544-561. [PMID: 38307027 PMCID: PMC10940022 DOI: 10.1016/j.ajhg.2024.01.002] [Received: 08/18/2023] [Revised: 12/20/2023] [Accepted: 01/04/2024] [Indexed: 02/04/2024]
Abstract
Cervical cancer is caused by human papillomavirus (HPV) infection, has few approved targeted therapeutics, and is the most common cause of cancer death in low-resource countries. We characterized 19 cervical and four head and neck cancer cell lines using long-read DNA and RNA sequencing and identified the HPV types, HPV integration sites, chromosomal alterations, and cancer driver mutations. Structural variation analysis revealed telomeric deletions associated with DNA inversions resulting from breakage-fusion-bridge (BFB) cycles. BFB is a common mechanism of chromosomal alterations in cancer, and our study applies long-read sequencing to this important chromosomal rearrangement type. Analysis of the inversion sites revealed staggered ends consistent with exonuclease digestion of the DNA after breakage. Some BFB events are complex, involving inter- or intra-chromosomal insertions or rearrangements. None of the BFB breakpoints had telomere sequences added to resolve the dicentric chromosomes, and only one BFB breakpoint showed chromothripsis. Five cell lines have a chromosomal region 11q BFB event, with YAP1-BIRC3-BIRC2 amplification. Indeed, YAP1 amplification is associated with a 10-year-earlier age of diagnosis of cervical cancer and is three times more common in African American women. This suggests that individuals with cervical cancer and YAP1-BIRC3-BIRC2 amplification, especially those of African ancestry, might benefit from targeted therapy. In summary, we uncovered valuable insights into the mechanisms and consequences of BFB cycles in cervical cancer using long-read sequencing.
Collapse
Affiliation(s)
- Isabel Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Nicole M Rossi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Ayse G Keskus
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Yi Xie
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Tanveer Ahmad
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Asher Bryant
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Hong Lou
  - Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Jesica Godinez Paredes
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Rose Milano
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Nina Rao
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Sonam Tulsyan
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Joseph F Boland
  - Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Wen Luo
  - Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Jia Liu
  - Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Tim O'Hanlon
  - Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Jazmyn Bess
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Vera Mukhina
  - Department of Otorhinolaryngology-Head and Neck Surgery, University of Maryland School of Medical Center, Baltimore, MD, USA
- Daria Gaykalova
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA; Marlene & Stewart Greenebaum Comprehensive Cancer Center, University of Maryland Medical System, Baltimore, MD, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Yuko Yuki
  - Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Laksh Malik
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA
- Cornelis Blauwendraat
  - Center for Alzheimer's and Related Dementias, National Institute on Aging, Bethesda, MD, USA; Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Mary Carrington
  - Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA; Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
- Meredith Yeager
  - Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Lisa Mirabello
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Mikhail Kolmogorov
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Michael Dean
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA.

11
McCabe CV, Price PD, Codner GF, Allan AJ, Caulder A, Christou S, Loeffler J, Mackenzie M, Malzer E, Mianné J, Nowicki KJ, O’Neill EJ, Pike FJ, Hutchison M, Petit-Demoulière B, Stewart ME, Gates H, Wells S, Sanderson ND, Teboul L. Long-read sequencing for fast and robust identification of correct genome-edited alleles: PCR-based and Cas9 capture methods. PLoS Genet 2024; 20:e1011187. [PMID: 38457464 PMCID: PMC10954187 DOI: 10.1371/journal.pgen.1011187] [Received: 08/11/2023] [Revised: 03/20/2024] [Accepted: 02/20/2024] [Indexed: 03/10/2024] Open
Abstract
BACKGROUND: Recent developments in CRISPR/Cas9 genome-editing tools have facilitated the introduction of precise alleles, including genetic intervals spanning several kilobases, directly into the embryo. However, the introduction of donor templates, via homology directed repair, can be erroneous or incomplete and these techniques often produce mosaic founder animals. Thus, newly generated alleles must be verified at the sequence level across the targeted locus. Screening for the presence of the desired mutant allele using traditional sequencing methods can be challenging due to the size of the interval to be sequenced, together with the mosaic nature of founders.

METHODOLOGY/PRINCIPAL FINDINGS: In order to help disentangle the genetic complexity of these animals, we tested the application of Oxford Nanopore Technologies long-read sequencing at the targeted locus and found that the achievable depth of sequencing is sufficient to offset the sequencing error rate associated with the technology used to validate targeted regions of interest. We have assembled an analysis workflow that facilitates interrogating the entire length of a targeted segment in a single read, to confirm that the intended mutant sequence is present in both heterozygous animals and mosaic founders. We used this workflow to compare the output of PCR-based and Cas9 capture-based targeted sequencing for validation of edited alleles.

CONCLUSION: Targeted long-read sequencing supports in-depth characterisation of all experimental models that aim to produce knock-in or conditional alleles, including those that contain a mix of genome-edited alleles. PCR- or Cas9 capture-based modalities bring different advantages to the analysis.
Affiliation(s)
- Peter D. Price
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Gemma F. Codner
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Adam Caulder
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Jorik Loeffler
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Elke Malzer
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Joffrey Mianné
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Fran J. Pike
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Marie Hutchison
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Benoit Petit-Demoulière
  - Université de Strasbourg, CNRS, INSERM, Institut Clinique de la Souris (ICS), PHENOMIN, CELPHEDIA, Illkirch, France
- Hilary Gates
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
  - Mammalian Genetics Unit, MRC Harwell, Oxfordshire, United Kingdom
- Sara Wells
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom
- Nicholas D. Sanderson
  - Nuffield Department of Clinical Medicine, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
- Lydia Teboul
  - The Mary Lyon Centre, MRC Harwell, Oxfordshire, United Kingdom

12
Nakamura W, Hirata M, Oda S, Chiba K, Okada A, Mateos RN, Sugawa M, Iida N, Ushiama M, Tanabe N, Sakamoto H, Sekine S, Hirasawa A, Kawai Y, Tokunaga K, Tsujimoto SI, Shiba N, Ito S, Yoshida T, Shiraishi Y. Assessing the efficacy of target adaptive sampling long-read sequencing through hereditary cancer patient genomes. NPJ Genom Med 2024; 9:11. [PMID: 38368425 PMCID: PMC10874402 DOI: 10.1038/s41525-024-00394-z] [Received: 05/28/2023] [Accepted: 01/15/2024] [Indexed: 02/19/2024] Open
Abstract
Innovations in sequencing technology have led to the discovery of novel mutations that cause inherited diseases. However, many patients with suspected genetic diseases remain undiagnosed. Long-read sequencing technologies are expected to significantly improve the diagnostic rate by overcoming the limitations of short-read sequencing. In addition, Oxford Nanopore Technologies (ONT) offers adaptive sampling and computationally driven target enrichment technology. This enables more affordable intensive analysis of target gene regions compared to standard non-selective long-read sequencing. In this study, we developed an efficient computational workflow for target adaptive sampling long-read sequencing (TAS-LRS) and evaluated it through application to 33 genomes collected from suspected hereditary cancer patients. Our workflow can identify single nucleotide variants with nearly the same accuracy as the short-read platform and elucidate complex forms of structural variations. We also newly identified several SINE-R/VNTR/Alu (SVA) elements affecting the APC gene in two patients with familial adenomatous polyposis, as well as their sites of origin. In addition, we demonstrated that off-target reads from adaptive sampling, which are typically discarded, can be effectively used to accurately genotype common single-nucleotide polymorphisms (SNPs) across the entire genome, enabling the calculation of a polygenic risk score. Furthermore, we identified allele-specific MLH1 promoter hypermethylation in a Lynch syndrome patient. In summary, our workflow with TAS-LRS can simultaneously capture monogenic risk variants including complex structural variations, polygenic background, and epigenetic alterations, and will be an efficient platform for genetic disease research and diagnosis.
Affiliation(s)
- Wataru Nakamura
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
- Makoto Hirata
  - Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
  - Department of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
- Satoyo Oda
  - Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
  - Division of Laboratory Medicine, National Cancer Center Hospital, Tokyo, Japan
- Kenichi Chiba
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Ai Okada
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Raúl Nicolás Mateos
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Masahiro Sugawa
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Naoko Iida
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Mineko Ushiama
  - Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
  - Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
- Noriko Tanabe
  - Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
- Hiromi Sakamoto
  - Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
  - Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
- Shigeki Sekine
  - Division of Molecular Pathology, National Cancer Center Research Institute, Tokyo, Japan
- Akira Hirasawa
  - Department of Clinical Genetics and Genomic Medicine, Okayama University Hospital, Okayama, Japan
- Yosuke Kawai
  - Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Katsushi Tokunaga
  - Genome Medical Science Project, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
  - Central Biobank, National Center Biobank Network, Tokyo, Japan
- Shin-Ichi Tsujimoto
  - Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
- Norio Shiba
  - Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
- Shuichi Ito
  - Department of Pediatrics, Yokohama City University Hospital, Kanagawa, Japan
- Teruhiko Yoshida
  - Division of Genetic Medicine and Services, National Cancer Center Hospital, Tokyo, Japan
  - Department of Clinical Genetics, National Cancer Center Research Institute, Tokyo, Japan
- Yuichi Shiraishi
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan.

13
Zakeri M, Brown NK, Ahmed OY, Gagie T, Langmead B. Movi: a fast and cache-efficient full-text pangenome index. bioRxiv 2024:2023.11.04.565615. [PMID: 37961660 PMCID: PMC10635132 DOI: 10.1101/2023.11.04.565615] [Indexed: 11/15/2023]
Abstract
Efficient pangenome indexes are promising tools for many applications, including rapid classification of nanopore sequencing reads. Recently, a compressed-index data structure called the "move structure" was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The move structure uniquely achieves both O(r) space and O(1)-time queries, where r is the number of runs in the pangenome BWT. We implemented Movi, an efficient tool for building and querying move-structure pangenome indexes. While Movi's index is larger than the r-index, it grows more slowly for pangenome references, as its size is exactly proportional to r, the number of runs in the BWT of the reference. Movi can compute sophisticated matching queries needed for classification, such as pseudo-matching lengths and backward search, at least ten times faster than the fastest available methods, and in some cases more than 30-fold faster. Movi achieves this speed by leveraging the move structure's strong locality of reference, incurring close to the minimum possible number of cache misses for queries against large pangenomes. We achieve still further speed improvements by using memory prefetching to attain a degree of latency hiding that would be difficult with other index structures like the r-index. Movi's fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made in a small and predictable time interval.
Affiliation(s)
- Mohsen Zakeri
  - Department of Computer Science, Johns Hopkins University
- Omar Y Ahmed
  - Department of Computer Science, Johns Hopkins University
- Travis Gagie
  - Faculty of Computer Science, Dalhousie University
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University

14
Neujahr AC, Loy DS, Loy JD, Brodersen BW, Fernando SC. Rapid detection of high consequence and emerging viral pathogens in pigs. Front Vet Sci 2024; 11:1341783. [PMID: 38384961 PMCID: PMC10879307 DOI: 10.3389/fvets.2024.1341783] [Received: 11/20/2023] [Accepted: 01/15/2024] [Indexed: 02/23/2024] Open
Abstract
Introduction: An increasing emergence of novel animal pathogens has been observed over the last decade. Viruses are a major contributor to this increase; therefore, veterinary surveillance and testing procedures are greatly needed to rapidly and accurately detect high-consequence animal diseases such as Foot and Mouth Disease, Highly Pathogenic Avian Influenza, Classical Swine Fever, and African Swine Fever. The major detection methods for such diseases include real-time PCR assays and pathogen-specific antibodies, among others. However, due to genetic drift or shift in virus genomes, failure to detect such pathogens is a risk with devastating consequences. Additionally, the emergence of novel pathogens with no prior knowledge requires unbiased detection methods for discovery.

Methods: Utilizing enrichment techniques coupled with the Oxford Nanopore Technologies MinION™ sequencing platform, we developed a sample processing and analysis pipeline to identify DNA and RNA viruses and bacterial pathogens from clinical samples.

Results and discussion: The sample processing and analysis pipeline developed allows the identification of both DNA and RNA viruses and bacterial pathogens simultaneously from a single tissue sample and provides results in less than 12 h. In a preliminary evaluation of this method using surrogate viruses in different matrices and clinical samples from animals with unknown disease causality, we demonstrate that it can detect pathogens from multiple domains of life simultaneously with high confidence.
Affiliation(s)
- Alison C. Neujahr
  - Department of Complex Biosystems, University of Nebraska-Lincoln, Lincoln, NE, United States
- Duan S. Loy
  - Nebraska Veterinary Diagnostic Center, University of Nebraska-Lincoln, Lincoln, NE, United States
- John Dustin Loy
  - Nebraska Veterinary Diagnostic Center, University of Nebraska-Lincoln, Lincoln, NE, United States
- Bruce W. Brodersen
  - Nebraska Veterinary Diagnostic Center, University of Nebraska-Lincoln, Lincoln, NE, United States
- Samodha C. Fernando
  - Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE, United States
  - Department of Food Science, University of Nebraska-Lincoln, Lincoln, NE, United States
  - School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, United States

15
Cao L, Kong Y, Fan Y, Ni M, Tourancheau A, Ksiezarek M, Mead EA, Koo T, Gitman M, Zhang XS, Fang G. mEnrich-seq: methylation-guided enrichment sequencing of bacterial taxa of interest from microbiome. Nat Methods 2024; 21:236-246. [PMID: 38177508 DOI: 10.1038/s41592-023-02125-1] [Received: 11/07/2022] [Accepted: 11/08/2023] [Indexed: 01/06/2024]
Abstract
Metagenomics has enabled the comprehensive study of microbiomes. However, many applications would benefit from a method that sequences specific bacterial taxa of interest, but not most background taxa. We developed mEnrich-seq (in which 'm' stands for methylation and seq for sequencing) for enriching taxa of interest from metagenomic DNA before sequencing. The core idea is to exploit the self versus nonself differentiation by natural bacterial DNA methylation and rationally choose methylation-sensitive restriction enzymes, individually or in combination, to deplete host and background taxa while enriching targeted taxa. This idea is integrated with library preparation procedures and applied in several applications to enrich (up to 117-fold) pathogenic or beneficial bacteria from human urine and fecal samples, including species that are hard to culture or of low abundance. We assessed 4,601 bacterial strains with mapped methylomes so far and showed broad applicability of mEnrich-seq. mEnrich-seq provides microbiome researchers with a versatile and cost-effective approach for selective sequencing of diverse taxa of interest.
Affiliation(s)
- Lei Cao
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yimeng Kong
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yu Fan
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mi Ni
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alan Tourancheau
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Magdalena Ksiezarek
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Edward A Mead
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Tonny Koo
  - Department of Pathology, Molecular and Cell-based Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Melissa Gitman
  - Department of Pathology, Molecular and Cell-based Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Xue-Song Zhang
  - Center for Advanced Biotechnology and Medicine, Rutgers University, New Brunswick, NJ, USA
- Gang Fang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

16
De Meulenaere K, Cuypers WL, Gauglitz JM, Guetens P, Rosanas-Urgell A, Laukens K, Cuypers B. Selective whole-genome sequencing of Plasmodium parasites directly from blood samples by nanopore adaptive sampling. mBio 2024; 15:e0196723. [PMID: 38054750 PMCID: PMC10790762 DOI: 10.1128/mbio.01967-23] [Received: 09/20/2023] [Accepted: 10/20/2023] [Indexed: 12/07/2023] Open
Abstract
IMPORTANCE: Malaria is caused by parasites of the genus Plasmodium, and reached a global disease burden of 247 million cases in 2021. To study drug resistance mutations and parasite population dynamics, whole-genome sequencing of patient blood samples is commonly performed. However, the predominance of human DNA in these samples imposes the need for time-consuming laboratory procedures to enrich Plasmodium DNA. We used Oxford Nanopore Technologies' adaptive sampling feature to circumvent this problem and enrich Plasmodium reads directly during the sequencing run. We demonstrate that adaptive nanopore sequencing efficiently enriches Plasmodium reads, which simplifies and shortens the timeline from blood collection to parasite sequencing. In addition, we show that the obtained data can be used to monitor genetic markers or to generate nearly complete genomes. Finally, owing to its inherent mobility, this technology can be easily applied on-site in endemic areas where patients would benefit the most from genomic surveillance.
Affiliation(s)
- Katlijn De Meulenaere
  - Department of Computer Science, Adrem Data Lab, University of Antwerp, Wilrijk, Belgium
  - Department of Biomedical Sciences, Malariology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Wim L. Cuypers
  - Department of Computer Science, Adrem Data Lab, University of Antwerp, Wilrijk, Belgium
- Julia M. Gauglitz
  - Department of Computer Science, Adrem Data Lab, University of Antwerp, Wilrijk, Belgium
- Pieter Guetens
  - Department of Biomedical Sciences, Malariology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Anna Rosanas-Urgell
  - Department of Biomedical Sciences, Malariology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Kris Laukens
  - Department of Computer Science, Adrem Data Lab, University of Antwerp, Wilrijk, Belgium
  - Excellence centre for Microbial Systems Technology, University of Antwerp, Wilrijk, Belgium
- Bart Cuypers
  - Department of Computer Science, Adrem Data Lab, University of Antwerp, Wilrijk, Belgium
  - Excellence centre for Microbial Systems Technology, University of Antwerp, Wilrijk, Belgium

17
Wang J, Yang L, Cheng A, Tham CY, Tan W, Darmawan J, de Sessions PF, Wan Y. Direct RNA sequencing coupled with adaptive sampling enriches RNAs of interest in the transcriptome. Nat Commun 2024; 15:481. [PMID: 38212309 PMCID: PMC10784512 DOI: 10.1038/s41467-023-44656-3] [Received: 02/09/2023] [Accepted: 12/22/2023] [Indexed: 01/13/2024] Open
Abstract
Abundant cellular transcripts occupy most of the sequencing reads in the transcriptome, making it challenging to assay for low-abundance transcripts. Here, we utilize the adaptive sampling function of Oxford Nanopore sequencing to selectively deplete and enrich RNAs of interest without biochemical manipulation before sequencing. Adaptive sampling performed on a pool of in vitro transcribed RNAs resulted in a net increase of 22-30% in the proportion of transcripts of interest in the population. Enriching and depleting different proportions of the Candida albicans transcriptome also resulted in an 11-13.5% increase in the number of reads on target transcripts, with longer and more abundant transcripts being more efficiently depleted. Depleting all currently annotated Candida albicans transcripts did not result in an absolute enrichment of remaining transcripts, although we identified 26 previously unknown transcripts and isoforms, 17 of which are antisense to existing transcripts. Further improvements in the adaptive sampling of RNAs will allow the technology to be widely applied to study RNAs of interest in diverse transcriptomes.
Affiliation(s)
- Jiaxu Wang
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore
- Lin Yang
  - Oxford Nanopore Technologies, Singapore, 138667, Singapore
- Anthony Cheng
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore
- Wenting Tan
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore
- Jefferson Darmawan
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore
- Yue Wan
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, 138672, Singapore.
  - Department of Biochemistry, National University of Singapore, Singapore, 117596, Singapore.

18
Terrazos Miani MA, Borcard L, Gempeler S, Baumann C, Bittel P, Leib SL, Neuenschwander S, Ramette A. NASCarD (Nanopore Adaptive Sampling with Carrier DNA): A Rapid, PCR-Free Method for SARS-CoV-2 Whole-Genome Sequencing in Clinical Samples. Pathogens 2024; 13:61. [PMID: 38251368 PMCID: PMC10818518 DOI: 10.3390/pathogens13010061] [Received: 12/19/2023] [Revised: 01/04/2024] [Accepted: 01/07/2024] [Indexed: 01/23/2024] Open
Abstract
Whole-genome sequencing (WGS) represents the main technology for SARS-CoV-2 lineage characterization in diagnostic laboratories worldwide. The rapid, near-full-length sequencing of the viral genome is commonly enabled by high-throughput sequencing of PCR amplicons derived from cDNA molecules. Here, we present a new approach called NASCarD (Nanopore Adaptive Sampling with Carrier DNA), which allows a low amount of nucleic acids to be sequenced while selectively enriching for sequences of interest, hence limiting the production of non-target sequences. Using COVID-19 positive samples available during the omicron wave, we demonstrate how the method may lead to >99% genome completeness of the SARS-CoV-2 genome sequences within 7 h of sequencing at a competitive cost. The new approach may have applications beyond SARS-CoV-2 sequencing for other DNA or RNA pathogens in clinical samples.
Affiliation(s)
- Alban Ramette
  - Institute for Infectious Diseases, University of Bern, Friedbühlstrasse 25, 3001 Bern, Switzerland

19
Urban L, Miller AK, Eason D, Vercoe D, Shaffer M, Wilkinson SP, Jeunen GJ, Gemmell NJ, Digby A. Non-invasive real-time genomic monitoring of the critically endangered kākāpō. eLife 2023; 12:RP84553. [PMID: 38153986 PMCID: PMC10754495 DOI: 10.7554/elife.84553] [Indexed: 12/30/2023] Open
Abstract
We used non-invasive real-time genomic approaches to monitor one of the last surviving populations of the critically endangered kākāpō (Strigops habroptilus). We first established an environmental DNA metabarcoding protocol to identify the distribution of kākāpō and other vertebrate species in a highly localized manner using soil samples. Harnessing real-time nanopore sequencing and the high-quality kākāpō reference genome, we then extracted species-specific DNA from soil. We combined long read-based haplotype phasing with known individual genomic variation in the kākāpō population to identify the presence of individuals, and confirmed these genomically informed predictions through detailed metadata on kākāpō distributions. This study shows that individual identification is feasible through nanopore sequencing of environmental DNA, with important implications for future efforts in the application of genomics to the conservation of rare species, potentially expanding the application of real-time environmental DNA research from monitoring species distribution to inferring fitness parameters such as genomic diversity and inbreeding.
Affiliation(s)
- Lara Urban
  - Department of Anatomy, University of Otago, Dunedin, New Zealand
  - Helmholtz Pioneer Campus, Helmholtz Zentrum Muenchen, Neuherberg, Germany
  - Helmholtz AI, Helmholtz Zentrum Muenchen, Neuherberg, Germany
  - Technical University of Munich, School of Life Sciences, Freising, Germany
- Daryl Eason
  - Kākāpō Recovery Programme, Department of Conservation, Invercargill, New Zealand
- Deidre Vercoe
  - Kākāpō Recovery Programme, Department of Conservation, Invercargill, New Zealand
- Gert-Jan Jeunen
  - Department of Anatomy, University of Otago, Dunedin, New Zealand
- Neil J Gemmell
  - Department of Anatomy, University of Otago, Dunedin, New Zealand
- Andrew Digby
  - Kākāpō Recovery Programme, Department of Conservation, Invercargill, New Zealand

20
Wheeler NE, Price V, Cunningham-Oakes E, Tsang KK, Nunn JG, Midega JT, Anjum MF, Wade MJ, Feasey NA, Peacock SJ, Jauneikaite E, Baker KS. Innovations in genomic antimicrobial resistance surveillance. Lancet Microbe 2023; 4:e1063-e1070. [PMID: 37977163 DOI: 10.1016/s2666-5247(23)00285-9] [Received: 03/27/2023] [Revised: 08/16/2023] [Accepted: 08/22/2023] [Indexed: 11/19/2023]
Abstract
Whole-genome sequencing of antimicrobial-resistant pathogens is increasingly being used for antimicrobial resistance (AMR) surveillance, particularly in high-income countries. Innovations in genome sequencing and analysis technologies promise to revolutionise AMR surveillance and epidemiology; however, routine adoption of these technologies is challenging, particularly in low-income and middle-income countries. As part of a wider series of workshops and online consultations, a group of experts in AMR pathogen genomics and computational tool development conducted a situational analysis, identifying the following under-used innovations in genomic AMR surveillance: clinical metagenomics, environmental metagenomics, gene or plasmid tracking, and machine learning. The group recommended developing cost-effective use cases for each approach and mapping data outputs to clinical outcomes of interest to justify additional investment in capacity, training, and staff required to implement these technologies. Harmonisation and standardisation of methods, and the creation of equitable data sharing and governance frameworks, will facilitate successful implementation of these innovations.
Affiliation(s)
- Nicole E Wheeler
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, Edgbaston, UK
- Vivien Price
  - Department of Clinical Infection, Immunology and Microbiology, Liverpool Centre for Global Health Research, University of Liverpool, Liverpool, UK
- Edward Cunningham-Oakes
  - Department of Infection Biology and Microbiomes, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
- Kara K Tsang
  - Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, UK
- Jamie G Nunn
  - Infectious Disease Challenge Area, Wellcome Trust, London, UK
- Muna F Anjum
  - Department of Bacteriology, Animal and Plant Health Agency, Surrey, UK
- Matthew J Wade
  - Data Analytics and Surveillance Group, UK Health Security Agency, London, UK; School of Engineering, Newcastle University, Newcastle-upon-Tyne, UK
- Nicholas A Feasey
  - Clinical Sciences, Liverpool School of Tropical Medicine, Liverpool, UK; Malawi Liverpool Wellcome Research Programme, Chichiri, Blantyre, Malawi
- Elita Jauneikaite
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK; NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Department of Infectious Disease, Imperial College London, Hammersmith Hospital, London, UK
- Kate S Baker
  - Centre for Clinical Infection, Microbiology and Immunology, University of Liverpool, Liverpool, UK; Department of Genetics, University of Cambridge, Cambridge, UK.

21
Naarmann-de Vries IS, Gjerga E, Gandor CLA, Dieterich C. Adaptive sampling for nanopore direct RNA-sequencing. RNA 2023; 29:1939-1949. [PMID: 37673469 PMCID: PMC10653383 DOI: 10.1261/rna.079727.123] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 09/08/2023]
Abstract
Nanopore long-read sequencing enables real-time monitoring and controlling of individual nanopores. This allows us to enrich or deplete specific sequences in DNA sequencing in a process called "adaptive sampling." So far, adaptive sampling (AS) was not applicable to the direct sequencing of RNA. Here, we show that AS is feasible and useful for direct RNA sequencing (DRS), which has its specific technical and biological challenges. Using a well-controlled in vitro transcript-based model system, we identify essential characteristics and parameter settings for AS in DRS, such as the superior performance of depletion over enrichment. Here, the efficiency of depletion is close to the theoretical maximum. Additionally, we demonstrate that AS efficiently depletes specific transcripts in transcriptome-wide sequencing applications. Specifically, we applied our AS approach to poly(A)-enriched RNA samples from human-induced pluripotent stem cell-derived cardiomyocytes and mouse whole heart tissue and show efficient 2.5- to 2.8-fold depletion of highly abundant mitochondrial-encoded transcripts. Finally, we characterize depletion and enrichment performance for complex transcriptome subsets, that is, at the level of the entire Chromosome 11, proving the general applicability of direct RNA AS. Our analyses provide evidence that AS is especially useful to enable the detection of lowly expressed transcripts and reduce the sequencing of highly abundant disturbing transcripts.
Affiliation(s)
- Isabel S Naarmann-de Vries
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- German Center for Cardiovascular Research (DZHK), Partner site Heidelberg/Mannheim, 69120 Heidelberg, Germany
| | - Enio Gjerga
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- German Center for Cardiovascular Research (DZHK), Partner site Heidelberg/Mannheim, 69120 Heidelberg, Germany
| | - Catharina L A Gandor
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, 69120 Heidelberg, Germany
| | - Christoph Dieterich
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- German Center for Cardiovascular Research (DZHK), Partner site Heidelberg/Mannheim, 69120 Heidelberg, Germany
| |
|
22
|
Filser M, Schwartz M, Merchadou K, Hamza A, Villy MC, Decees A, Frouin E, Girard E, Caputo SM, Renault V, Becette V, Golmard L, Servant N, Stoppa-Lyonnet D, Delattre O, Colas C, Masliah-Planchon J. Adaptive nanopore sequencing to determine pathogenicity of BRCA1 exonic duplication. J Med Genet 2023; 60:1206-1209. [PMID: 37263769 DOI: 10.1136/jmg-2023-109155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 04/30/2023] [Indexed: 06/03/2023]
Abstract
BRCA1 and BRCA2 are tumour suppressor genes that have been characterised as predisposition genes for the development of hereditary breast and ovarian cancers among other malignancies. The molecular diagnosis of this predisposition syndrome is based on the detection of inactivating variants of any type in those genes. But in the case of structural variants, functional consequences can be difficult to assess using standard molecular methods, as the precise resolution of their sequence is often impossible with short-read next generation sequencing techniques. It has been recently demonstrated that Oxford Nanopore long-read sequencing technology can accurately and rapidly provide genetic diagnoses of Mendelian diseases, including those linked to pathogenic structural variants. Here, we report the accurate resolution of a germline duplication event of exons 18-20 of BRCA1 using Nanopore sequencing with adaptive sampling target enrichment. This allowed us to classify this variant as pathogenic within a short timeframe of 10 days. This study provides a proof-of-concept that nanopore adaptive sampling is a highly efficient technique for the investigation of structural variants of tumour suppressor genes in a clinical context.
Affiliation(s)
- Mathilde Filser
- Genetics Department, Institut Curie, Paris, France
- PSL Research University, Paris, France
| | - Mathias Schwartz
- Genetics Department, Institut Curie, Paris, France
- PSL Research University, Paris, France
| | - Kevin Merchadou
- PSL Research University, Paris, France
- Clinical Bioinformatics Unit, Institut Curie, Paris, France
| | - Abderaouf Hamza
- Genetics Department, Institut Curie, Paris, France
- PSL Research University, Paris, France
| | - Marie-Charlotte Villy
- Oncogenetic Clinic Unit, Institut Curie, Paris, France
- SIREDO Oncology Centre, Institut Curie, Paris, France
| | - Antoine Decees
- Genetics Department, Institut Curie, Paris, France
- PSL Research University, Paris, France
| | - Eléonore Frouin
- PSL Research University, Paris, France
- Clinical Bioinformatics Unit, Institut Curie, Paris, France
| | - Elodie Girard
- PSL Research University, Paris, France
- INSERM U900, Institut Curie, Paris, France
| | - Sandrine M Caputo
- Genetics Department, Institut Curie, Paris, France
- PSL Research University, Paris, France
| | - Victor Renault
- PSL Research University, Paris, France
- Clinical Bioinformatics Unit, Institut Curie, Paris, France
| | - Véronique Becette
- PSL Research University, Paris, France
- Anatomo- and Cyto-pathology, Institut Curie, Saint-Cloud, France
| | - Lisa Golmard
- Genetics Department, Institut Curie, Paris, France
- PSL Research University, Paris, France
| | - Nicolas Servant
- PSL Research University, Paris, France
- INSERM U900, Institut Curie, Paris, France
| | - Dominique Stoppa-Lyonnet
- Genetics Department, Institut Curie, Paris, France
- SIREDO Oncology Centre, Institut Curie, Paris, France
| | - Olivier Delattre
- Genetics Department, Institut Curie, Paris, France
- Inserm U830, PSL University, Research Center, Institut Curie, Paris, France
| | - Chrystelle Colas
- PSL Research University, Paris, France
- Oncogenetic Clinic Unit, Institut Curie, Paris, France
| | | |
|
23
|
Lin Y, Zhang Y, Sun H, Jiang H, Zhao X, Teng X, Lin J, Shu B, Sun H, Liao Y, Zhou J. NanoDeep: a deep learning framework for nanopore adaptive sampling on microbial sequencing. Brief Bioinform 2023; 25:bbad499. [PMID: 38189540 PMCID: PMC10772945 DOI: 10.1093/bib/bbad499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 11/21/2023] [Accepted: 12/11/2023] [Indexed: 01/09/2024] Open
Abstract
Nanopore sequencers can enrich or deplete the targeted DNA molecules in a library by reversing the voltage across individual nanopores. However, it requires substantial computational resources to achieve rapid operations in parallel at read-time sequencing. We present a deep learning framework, NanoDeep, to overcome these limitations by incorporating convolutional neural network and squeeze and excitation. We first showed that the raw squiggle derived from native DNA sequences determines the origin of microbial and human genomes. Then, we demonstrated that NanoDeep successfully classified bacterial reads from the pooled library with human sequence and showed enrichment for bacterial sequence compared with routine nanopore sequencing setting. Further, we showed that NanoDeep improves the sequencing efficiency and preserves the fidelity of bacterial genomes in the mock sample. In addition, NanoDeep performs well in the enrichment of metagenome sequences of gut samples, showing its potential applications in the enrichment of unknown microbiota. Our toolkit is available at https://github.com/lysovosyl/NanoDeep.
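As an illustration of the "squeeze and excitation" component named in the abstract above, the following is a minimal sketch of a generic squeeze-and-excitation gate applied to a small feature map. It is not NanoDeep's implementation: the function name `squeeze_excite`, the toy feature map, and the weight matrices are all hypothetical, and the abstract gives no architectural details.

```python
import math

def squeeze_excite(features, w_reduce, w_expand):
    """Rescale each channel of `features` (channels x length) by a gate in (0, 1).

    All names and weights here are illustrative assumptions, not NanoDeep's code.
    """
    # Squeeze: one summary value per channel (global average over the signal axis).
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w_reduce]
    # Excitation, step 2: expand back to one gate per channel, squashed by a sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every position of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates

# Toy input: 3 channels, 2 positions each (hypothetical values).
features = [[1.0, 3.0], [2.0, 2.0], [0.0, 0.0]]
w_reduce = [[0.5, 0.5, 0.0]]        # 3 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0], [0.0]]   # 1 hidden unit -> 3 channel gates
scaled, gates = squeeze_excite(features, w_reduce, w_expand)
```

In a trained network the two weight matrices would be learned; here they are fixed by hand only so that the gating step can be followed by eye.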
Affiliation(s)
- Yusen Lin
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| | - Yongjun Zhang
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| | - Hang Sun
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| | - Hang Jiang
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| | - Xing Zhao
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
- Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
| | - Xiaojuan Teng
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| | - Jingxia Lin
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| | - Bowen Shu
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| | - Hao Sun
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
- Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China
| | - Yuhui Liao
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| | - Jiajian Zhou
- Dermatology Hospital, Southern Medical University, Guangzhou, China
| |
|
24
|
Han R, Qi J, Xue Y, Sun X, Zhang F, Gao X, Li G. HycDemux: a hybrid unsupervised approach for accurate barcoded sample demultiplexing in nanopore sequencing. Genome Biol 2023; 24:222. [PMID: 37798751 PMCID: PMC10552309 DOI: 10.1186/s13059-023-03053-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2022] [Accepted: 09/08/2023] [Indexed: 10/07/2023] Open
Abstract
DNA barcodes enable Oxford Nanopore sequencing to sequence multiple barcoded DNA samples on a single flow cell. DNA sequences with the same barcode need to be grouped together through demultiplexing. As the number of samples increases, accurate demultiplexing becomes difficult. We introduce HycDemux, which incorporates a GPU-parallelized hybrid clustering algorithm that uses nanopore signals and DNA sequences for accurate data clustering, alongside a voting-based module to finalize the demultiplexing results. Comprehensive experiments demonstrate that our approach outperforms unsupervised tools in short sequence fragment clustering and performs more robustly than current state-of-the-art demultiplexing tools for complex multi-sample sequencing data.
Affiliation(s)
- Renmin Han
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
| | - Junhai Qi
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
- BioMap Research, California, USA
| | - Yang Xue
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
| | - Xiujuan Sun
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
| | - Fa Zhang
- School of Medical Technology, Beijing Institute of Technology, Beijing, 100085, China.
| | - Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, 23955, Saudi Arabia.
| | - Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China.
| |
|
25
|
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | | | | | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| |
|
26
|
Zeibich R, Kwan P, O’Brien TJ, Perucca P, Ge Z, Anderson A. Applications for Deep Learning in Epilepsy Genetic Research. Int J Mol Sci 2023; 24:14645. [PMID: 37834093 PMCID: PMC10572791 DOI: 10.3390/ijms241914645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 09/11/2023] [Accepted: 09/21/2023] [Indexed: 10/15/2023] Open
Abstract
Epilepsy is a group of brain disorders characterised by an enduring predisposition to generate unprovoked seizures. Fuelled by advances in sequencing technologies and computational approaches, more than 900 genes have now been implicated in epilepsy. The development and optimisation of tools and methods for analysing the vast quantity of genomic data is a rapidly evolving area of research. Deep learning (DL) is a subset of machine learning (ML) that brings opportunity for novel investigative strategies that can be harnessed to gain new insights into the genomic risk of people with epilepsy. DL is being harnessed to address limitations in accuracy of long-read sequencing technologies, which improve on short-read methods. Tools that predict the functional consequence of genetic variation can represent breaking ground in addressing critical knowledge gaps, while methods that integrate independent but complementary data enhance the predictive power of genetic data. We provide an overview of these DL tools and discuss how they may be applied to the analysis of genetic data for epilepsy research.
Affiliation(s)
- Robert Zeibich
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
| | - Patrick Kwan
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
| | - Terence J. O’Brien
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
| | - Piero Perucca
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Epilepsy Research Centre, Department of Medicine, Austin Health, The University of Melbourne, Melbourne, VIC 3084, Australia
- Bladin-Berkovic Comprehensive Epilepsy Program, Department of Neurology, Austin Health, The University of Melbourne, Melbourne, VIC 3084, Australia
| | - Zongyuan Ge
- Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
- Monash-Airdoc Research, Monash University, Melbourne, VIC 3800, Australia
| | - Alison Anderson
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
| |
|
27
|
van Dijk EL, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C. Genomics in the long-read sequencing era. Trends Genet 2023; 39:649-671. [PMID: 37230864 DOI: 10.1016/j.tig.2023.04.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/10/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.
Affiliation(s)
- Erwin L van Dijk
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
| | - Delphine Naquin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Kévin Gorrichon
- National Center of Human Genomics Research (CNRGH), 91000 Évry-Courcouronnes, France
| | - Yan Jaszczyszyn
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Rania Ouazahrou
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Claude Thermes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Céline Hernandez
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| |
|
28
|
Hook PW, Timp W. Beyond assembly: the increasing flexibility of single-molecule sequencing technology. Nat Rev Genet 2023; 24:627-641. [PMID: 37161088 PMCID: PMC10169143 DOI: 10.1038/s41576-023-00600-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/30/2023] [Indexed: 05/11/2023]
Abstract
The maturation of high-throughput short-read sequencing technology over the past two decades has shaped the way genomes are studied. Recently, single-molecule, long-read sequencing has emerged as an essential tool in deciphering genome structure and function, including filling gaps in the human reference genome, measuring the epigenome and characterizing splicing variants in the transcriptome. With recent technological developments, these single-molecule technologies have moved beyond genome assembly and are being used in a variety of ways, including to selectively sequence specific loci with long reads, measure chromatin state and protein-DNA binding in order to investigate the dynamics of gene regulation, and rapidly determine copy number variation. These increasingly flexible uses of single-molecule technologies highlight a young and fast-moving part of the field that is leading to a more accessible era of nucleic acid sequencing.
Affiliation(s)
- Paul W Hook
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
| |
|
29
|
Shivakumar VS, Ahmed OY, Kovaka S, Zakeri M, Langmead B. Sigmoni: classification of nanopore signal with a compressed pangenome index. bioRxiv: the preprint server for biology 2023:2023.08.15.553308. [PMID: 37645873 PMCID: PMC10462034 DOI: 10.1101/2023.08.15.553308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbps. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes.
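The quantization step described in the abstract above (mapping raw current to a discrete alphabet of picoamp ranges) can be sketched as follows. This is only an illustration under stated assumptions: the equal-width bins, the 60 to 140 pA range, the four-symbol alphabet, and the helper names `make_bin_edges` and `quantize` are hypothetical and are not taken from Sigmoni itself.

```python
from bisect import bisect_right

def make_bin_edges(low_pa, high_pa, n_bins):
    """Return the n_bins - 1 interior edges of equal-width picoamp bins.

    Equal-width binning is an assumption made for this sketch.
    """
    width = (high_pa - low_pa) / n_bins
    return [low_pa + i * width for i in range(1, n_bins)]

def quantize(signal_pa, edges):
    """Map each current value (pA) to the index of the bin it falls in.

    Values below the first edge map to symbol 0; values at or above the
    last edge map to the highest symbol, so out-of-range samples are clamped.
    """
    return [bisect_right(edges, x) for x in signal_pa]

# Hypothetical range and alphabet size: 4 symbols covering 60-140 pA.
edges = make_bin_edges(60.0, 140.0, 4)
symbols = quantize([65.2, 99.9, 100.0, 131.5, 20.0], edges)
print(edges)    # [80.0, 100.0, 120.0]
print(symbols)  # [0, 1, 2, 3, 0]
```

Once the signal is a sequence over a small alphabet, exact-match string indexes can be applied to it, which is the role the abstract assigns to the r-index and matching statistics.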
Affiliation(s)
| | - Omar Y. Ahmed
- Department of Computer Science, Johns Hopkins University
| | - Sam Kovaka
- Department of Computer Science, Johns Hopkins University
| | - Mohsen Zakeri
- Department of Computer Science, Johns Hopkins University
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University
| |
|
30
|
Rodriguez I, Rossi NM, Keskus A, Xie Y, Ahmad T, Bryant A, Lou H, Paredes JG, Milano R, Rao N, Tulsyan S, Boland JF, Luo W, Liu J, O’Hanlon T, Bess J, Mukhina V, Gaykalova D, Yuki Y, Malik L, Billingsley K, Blauwendraat C, Carrington M, Yeager M, Mirabello L, Kolmogorov M, Dean M. Insights into the Mechanisms and Structure of Breakage-Fusion-Bridge Cycles in Cervical Cancer using Long-Read Sequencing. medRxiv: the preprint server for health sciences 2023:2023.08.21.23294276. [PMID: 37662332 PMCID: PMC10473792 DOI: 10.1101/2023.08.21.23294276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Cervical cancer is caused by human papillomavirus (HPV) infection, has few approved targeted therapeutics, and is the most common cause of cancer death in low-resource countries. We characterized 19 cervical and four head and neck cell lines using long-read DNA and RNA sequencing and identified the HPV types, HPV integration sites, chromosomal alterations, and cancer driver mutations. Structural variation analysis revealed telomeric deletions associated with DNA inversions resulting from breakage-fusion-bridge (BFB) cycles. BFB is a common mechanism of chromosomal alterations in cancer, and this is one of the first analyses of these events using long-read sequencing. Analysis of the inversion sites revealed staggered ends consistent with exonuclease digestion of the DNA after breakage. Some BFB events are complex, involving inter- or intra-chromosomal insertions or rearrangements. None of the BFB breakpoints had telomere sequences added to resolve the dicentric chromosomes and only one BFB breakpoint showed chromothripsis. Five cell lines have a Chr11q BFB event, with YAP1/BIRC2/BIRC3 gene amplification. Indeed, YAP1 amplification is associated with a 10-year earlier age of diagnosis of cervical cancer and is three times more common in African American women. This suggests that cervical cancer patients with YAP1/BIRC2/BIRC3-amplification, especially those of African American ancestry, might benefit from targeted therapy. In summary, we uncovered new insights into the mechanisms and consequences of BFB cycles in cervical cancer using long-read sequencing.
Affiliation(s)
- Isabel Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Nicole M. Rossi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Ayse Keskus
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Yi Xie
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Tanveer Ahmad
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Asher Bryant
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Hong Lou
- Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA and Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD
| | - Jesica Godinez Paredes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Rose Milano
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Nina Rao
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Sonam Tulsyan
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Joseph F. Boland
- Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA and Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD
| | - Wen Luo
- Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA and Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD
| | - Jia Liu
- Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA and Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD
| | - Tim O’Hanlon
- Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA and Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD
| | - Jazmyn Bess
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Vera Mukhina
- Department of Otorhinolaryngology-Head and Neck Surgery, University of Maryland School of Medical Center, Baltimore, MD, USA
| | - Daria Gaykalova
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Marlene & Stewart Greenebaum Comprehensive Cancer Center, University of Maryland Medical System, Baltimore, MD, USA
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
| | - Yuko Yuki
- Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA and Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD
| | - Laksh Malik
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA and Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - Kimberley Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA and Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland, USA and Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, Maryland, USA
| | - Mary Carrington
- Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA and Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts, USA
| | - Meredith Yeager
- Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute, Frederick, MD, USA and Laboratory of Integrative Cancer Immunology, Center for Cancer Research, National Cancer Institute, Bethesda, MD
| | - Lisa Mirabello
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| |
|
31
|
Urban L, Perlas A, Francino O, Martí‐Carreras J, Muga BA, Mwangi JW, Boykin Okalebo L, Stanton JL, Black A, Waipara N, Fontsere C, Eccles D, Urel H, Reska T, Morales HE, Palmada‐Flores M, Marques‐Bonet T, Watsa M, Libke Z, Erkenswick G, van Oosterhout C. Real-time genomics for One Health. Mol Syst Biol 2023; 19:e11686. [PMID: 37325891 PMCID: PMC10407731 DOI: 10.15252/msb.202311686] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/31/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/17/2023] Open
Abstract
The ongoing degradation of natural systems and other environmental changes have put our society at a crossroads with respect to our future relationship with our planet. While the concept of One Health describes how human health is inextricably linked with environmental health, many of these complex interdependencies are still not well understood. Here, we describe how the advent of real-time genomic analyses can benefit One Health and how it can enable timely, in-depth ecosystem health assessments. We introduce nanopore sequencing as the only disruptive technology that currently allows for real-time genomic analyses and that is already being used worldwide to improve the accessibility and versatility of genomic sequencing. We showcase real-time genomic studies on zoonotic disease, food security, environmental microbiome, emerging pathogens, and their antimicrobial resistances, and on environmental health itself, from genomic resource creation for wildlife conservation to the monitoring of biodiversity, invasive species, and wildlife trafficking. We stress why equitable access to real-time genomics in the context of One Health will be paramount and discuss related practical, legal, and ethical limitations.
Affiliation(s)
- Lara Urban
- Helmholtz AI, Helmholtz Zentrum Muenchen, Neuherberg, Germany
- Helmholtz Pioneer Campus, Helmholtz Zentrum Muenchen, Neuherberg, Germany
- School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Albert Perlas
- Helmholtz AI, Helmholtz Zentrum Muenchen, Neuherberg, Germany
- Helmholtz Pioneer Campus, Helmholtz Zentrum Muenchen, Neuherberg, Germany
| | - Olga Francino
- Nano1Health SL, Parc de Recerca UAB, Campus Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Joan Martí‐Carreras
- Nano1Health SL, Parc de Recerca UAB, Campus Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Brenda A Muga
- Department of Anatomy, University of Otago, Dunedin, New Zealand
| | | | | | | | - Amanda Black
- Bioprotection Aotearoa, Lincoln University, Lincoln, New Zealand
| | | | - Claudia Fontsere
- Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - David Eccles
- Hugh Green Cytometry Centre, Malaghan Institute of Medical Research, Wellington, New Zealand
| | - Harika Urel
- Helmholtz AI, Helmholtz Zentrum Muenchen, Neuherberg, Germany
- Helmholtz Pioneer Campus, Helmholtz Zentrum Muenchen, Neuherberg, Germany
- School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Tim Reska
- Helmholtz AI, Helmholtz Zentrum Muenchen, Neuherberg, Germany
- Helmholtz Pioneer Campus, Helmholtz Zentrum Muenchen, Neuherberg, Germany
- School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Hernán E Morales
- Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Biology, Ecology Building, Lund University, Lund, Sweden
| | - Marc Palmada‐Flores
- Institute of Evolutionary Biology, Universitat Pompeu Fabra‐CSIC, PRBB, Barcelona, Spain
| | - Tomas Marques‐Bonet
- Institute of Evolutionary Biology, Universitat Pompeu Fabra‐CSIC, PRBB, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
- CNAG, Centre of Genomic Analysis, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
| | | | - Zane Libke
- Instituto Nacional de Biodiversidad, Quito, Ecuador
- Fundación Sumak Kawsay In Situ, Cantón Mera, Ecuador
32
Ahmed O, Rossi M, Boucher C, Langmead B. Efficient taxa identification using a pangenome index. Genome Res 2023; 33:1069-1077. [PMID: 37258301 PMCID: PMC10538492 DOI: 10.1101/gr.277642.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 05/22/2023] [Indexed: 06/02/2023]
Abstract
Tools that classify sequencing reads against a database of reference sequences require efficient index data-structures. The r-index is a compressed full-text index that answers substring presence/absence, count, and locate queries in space proportional to the amount of distinct sequence in the database: [Formula: see text] space, where r is the number of Burrows-Wheeler runs. To date, the r-index has lacked the ability to quickly classify matches according to which reference sequences (or sequence groupings, i.e., taxa) a match overlaps. We present new algorithms and methods for solving this problem. Specifically, given a collection D of d documents, [Formula: see text] over an alphabet of size σ, we extend the r-index with [Formula: see text] additional words to support document listing queries for a pattern [Formula: see text] that occurs in [Formula: see text] documents in D in [Formula: see text] time and [Formula: see text] space, where w is the machine word size. Applied in a bacterial mock community experiment, our method is up to three times faster than a comparable method that uses the standard r-index locate queries. We show that our method classifies both simulated and real nanopore reads at the strain level with higher accuracy compared with other approaches. Finally, we present strategies for compacting this structure in applications in which read lengths or match lengths can be bounded.
Affiliation(s)
- Omar Ahmed
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Massimiliano Rossi
- Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida 32611, USA
| | - Christina Boucher
- Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida 32611, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
33
de la Morena-Barrio B, Palomo Á, Padilla J, Martín-Fernández L, Rojo-Carrillo JJ, Cifuentes R, Bravo-Pérez C, Garrido-Rodríguez P, Miñano A, Rubio AM, Pagán J, Llamas M, Vicente V, Vidal F, Lozano ML, Corral J, de la Morena-Barrio ME. Impact of genetic structural variants in factor XI deficiency: identification, accurate characterization, and inferred mechanism by long-read sequencing. J Thromb Haemost 2023; 21:1779-1788. [PMID: 36940803 DOI: 10.1016/j.jtha.2023.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/23/2023]
Abstract
BACKGROUND: Congenital factor XI (FXI) deficiency is a probably underestimated coagulopathy that confers antithrombotic protection. Characterization of genetic defects in F11 is mainly focused on the identification of single-nucleotide variants and small insertions/deletions because they represent up to 99% of the alterations accounting for factor deficiency, with only 3 gross gene defects of structural variants (SVs) having been described. OBJECTIVES: To identify and characterize the SVs affecting F11. METHODS: The study was performed in 93 unrelated subjects with FXI deficiency recruited in Spanish hospitals over a period of 25 years (1997-2022). F11 was analyzed by next-generation sequencing, multiplex ligation-dependent probe amplification, and long-read sequencing. RESULTS: Our study identified 30 different genetic variants. Interestingly, we found 3 SVs, all heterozygous: a complex duplication affecting exons 8 and 9, a tandem duplication of exon 14, and a large deletion affecting the whole gene. Nucleotide resolution obtained by long-read sequencing revealed Alu repetitive elements involved in all breakpoints. The large deletion was probably generated de novo in the paternal allele during gametogenesis, and despite affecting 30 additional genes, no syndromic features were described. CONCLUSION: SVs may account for a high proportion of F11 genetic defects implicated in the molecular pathology of congenital FXI deficiency. These SVs, likely caused by a nonallelic homologous recombination involving repetitive elements, are heterogeneous in both type and length and may be de novo. These data support the inclusion of methods to detect SVs in this disorder, with long-read-based methods being the most appropriate because they detect all SVs and achieve adequate nucleotide resolution.
Affiliation(s)
- Belén de la Morena-Barrio
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Ángeles Palomo
- Servicio de Hematología y Hemoterapia del centro Materno-Infantil del Hospital Regional Universitario Carlos de Haya, Málaga, Spain
| | - José Padilla
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Laura Martín-Fernández
- Laboratori de Coagulopaties Congènites, Banc de Sang i Teixits, Barcelona, Spain; Medicina Transfusional. Vall d'Hebron Institut de Recerca, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Juan José Rojo-Carrillo
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Rosa Cifuentes
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Carlos Bravo-Pérez
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Pedro Garrido-Rodríguez
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Antonia Miñano
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Ana María Rubio
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Javier Pagán
- Servicio de Medicina Interna, Hospital Universitario Morales Meseguer, Murcia, Spain
| | - María Llamas
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Vicente Vicente
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Francisco Vidal
- Laboratori de Coagulopaties Congènites, Banc de Sang i Teixits, Barcelona, Spain; Medicina Transfusional. Vall d'Hebron Institut de Recerca, Universitat Autònoma de Barcelona, Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Instituto de Salud Carlos III, Madrid, Spain
| | - María Luisa Lozano
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain
| | - Javier Corral
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain.
| | - María Eugenia de la Morena-Barrio
- Servicio de Hematología, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Pascual Parrilla, Centro de Investigación Biomédica en Red de Enfermedades Raras-Instituto de Salud Carlos III, Murcia, Spain.
34
Firtina C, Mansouri Ghiasi N, Lindegger J, Singh G, Cavlak MB, Mao H, Mutlu O. RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes. Bioinformatics 2023; 39:i297-i307. [PMID: 37387139 DOI: 10.1093/bioinformatics/btad272] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/01/2023] Open
Abstract
Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either (i) require powerful computational resources that may not be available for portable sequencers or (ii) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to the same DNA content lead to the same hash value, regardless of the slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: (i) read mapping, (ii) relative abundance estimation, and (iii) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real-time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides (i) 25.8× and 3.4× better average throughput and (ii) significantly better accuracy for large genomes, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash.
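The quantize-then-hash idea in this abstract can be illustrated with a minimal sketch. It is not RawHash's actual implementation: the function names, the clipping range, the bucket count and the number of events per hash are all assumptions chosen for illustration, and the input is assumed to be a list of already normalized event values.

```python
from typing import List, Tuple


def quantize(events: List[float], lo: float = -3.0, hi: float = 3.0,
             bits: int = 4) -> List[int]:
    """Map each normalized event value to one of 2**bits equal-width buckets.

    The range [lo, hi] and the bucket count are illustrative assumptions.
    """
    levels = 1 << bits
    width = (hi - lo) / levels
    buckets = []
    for value in events:
        clipped = min(max(value, lo), hi)
        # Clamp so that value == hi falls into the last bucket.
        buckets.append(min(int((clipped - lo) / width), levels - 1))
    return buckets


def seed_hashes(events: List[float], n: int = 3,
                bits: int = 4) -> List[Tuple[int, int]]:
    """Pack every run of n consecutive quantized events into one integer.

    Returns (hash_value, start_index) pairs, in signal order.
    """
    q = quantize(events, bits=bits)
    hashes = []
    for i in range(len(q) - n + 1):
        h = 0
        for bucket in q[i:i + n]:
            h = (h << bits) | bucket
        hashes.append((h, i))
    return hashes


# Two slightly different signals for the same (hypothetical) DNA content.
signal_a = [0.10, -1.20, 0.85, 2.00]
signal_b = [0.12, -1.18, 0.80, 2.05]
print(seed_hashes(signal_a) == seed_hashes(signal_b))
```

Because both signals fall into the same buckets, they produce identical hash values and could be matched by an exact lookup in a hash table. Values that straddle a bucket boundary would not match, which is one reason a real tool needs a more careful quantization scheme than this one.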
Affiliation(s)
- Can Firtina
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Nika Mansouri Ghiasi
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Joel Lindegger
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Gagandeep Singh
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Meryem Banu Cavlak
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Haiyu Mao
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
| | - Onur Mutlu
- Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland
35
Esteller-Cucala P, Palmada-Flores M, Kuderna LFK, Fontsere C, Serres-Armero A, Dabad M, Torralvo M, Faella A, Ferrández-Peral L, Llovera L, Fornas O, Julià E, Ramírez E, González I, Hecht J, Lizano E, Juan D, Marquès-Bonet T. Y chromosome sequence and epigenomic reconstruction across human populations. Commun Biol 2023; 6:623. [PMID: 37296226 PMCID: PMC10256797 DOI: 10.1038/s42003-023-05004-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 05/31/2023] [Indexed: 06/12/2023] Open
Abstract
Recent advances in long-read sequencing technologies have allowed the generation and curation of more complete genome assemblies, enabling the analysis of traditionally neglected chromosomes, such as the human Y chromosome (chrY). Native DNA was sequenced on a MinION Oxford Nanopore Technologies sequencing device to generate genome assemblies for seven major chrY human haplogroups. We analyzed and compared the chrY enrichment of sequencing data obtained using two different selective sequencing approaches: adaptive sampling and flow cytometry chromosome sorting. We show that adaptive sampling can produce data to create assemblies comparable to chromosome sorting while being a less expensive and time-consuming technique. We also assessed haplogroup-specific structural variants, which would be otherwise difficult to study using short-read sequencing data only. Finally, we took advantage of this technology to detect and profile epigenetic modifications among the considered haplogroups. Altogether, we provide a framework to study complex genomic regions with a simple, fast, and affordable methodology that could be applied to larger population genomics datasets.
Affiliation(s)
- Paula Esteller-Cucala
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain.
| | - Marc Palmada-Flores
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Lukas F K Kuderna
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Claudia Fontsere
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Aitor Serres-Armero
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Marc Dabad
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, Barcelona, Spain
| | - María Torralvo
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Armida Faella
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Luis Ferrández-Peral
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Laia Llovera
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Oscar Fornas
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Doctor Aiguader 88, Barcelona, Spain
| | - Eva Julià
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
| | - Erika Ramírez
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
| | - Irene González
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
| | - Jochen Hecht
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
| | - Esther Lizano
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, Cerdanyola del Vallès, Spain
| | - David Juan
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
| | - Tomàs Marquès-Bonet
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain.
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, Barcelona, Spain.
- Universitat Pompeu Fabra (UPF), Doctor Aiguader 88, Barcelona, Spain.
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, Cerdanyola del Vallès, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, Barcelona, Spain.
36
Rossi NM, Dai J, Xie Y, Wangsa D, Heselmeyer-Haddad K, Lou H, Boland JF, Yeager M, Orozco R, Freites EA, Mirabello L, Gharzouzi E, Dean M. Extrachromosomal Amplification of Human Papillomavirus Episomes Is a Mechanism of Cervical Carcinogenesis. Cancer Res 2023; 83:1768-1781. [PMID: 36971511 PMCID: PMC10239328 DOI: 10.1158/0008-5472.can-22-3030] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/26/2022] [Revised: 01/18/2023] [Accepted: 03/21/2023] [Indexed: 03/29/2023]
Abstract
SIGNIFICANCE Multimers of the HPV genome are generated in cervical tumors replicating as extrachromosomal episomes, which is associated with deletion and rearrangement of the HPV genome and provides a mechanism for oncogenesis without integration.
Affiliation(s)
- Nicole M. Rossi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Jieqiong Dai
- Leidos Biomedical Research, Inc., National Laboratory for Cancer Research, Frederick, MD, USA
| | - Yi Xie
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | - Darawalee Wangsa
- Center for Cancer Research, Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kerstin Heselmeyer-Haddad
- Center for Cancer Research, Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Hong Lou
- Leidos Biomedical Research, Inc., National Laboratory for Cancer Research, Frederick, MD, USA
| | - Joseph F. Boland
- Leidos Biomedical Research, Inc., National Laboratory for Cancer Research, Frederick, MD, USA
| | - Meredith Yeager
- Leidos Biomedical Research, Inc., National Laboratory for Cancer Research, Frederick, MD, USA
| | | | - Enrique Alvirez Freites
- Hospital Central Universitario “Dr. Antonio M Pineda,” Barquisimeto, Lara State, Venezuela, and Universidad Andino de Cusco, Cusco, Perú
| | - Lisa Mirabello
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | | | - Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
37
Whitmore L, McCauley M, Farrell JA, Stammnitz MR, Koda SA, Mashkour N, Summers V, Osborne T, Whilde J, Duffy DJ. Inadvertent human genomic bycatch and intentional capture raise beneficial applications and ethical concerns with environmental DNA. Nat Ecol Evol 2023; 7:873-888. [PMID: 37188965 PMCID: PMC10250199 DOI: 10.1038/s41559-023-02056-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/21/2022] [Accepted: 03/29/2023] [Indexed: 05/17/2023]
Abstract
The field of environmental DNA (eDNA) is advancing rapidly, yet human eDNA applications remain underutilized and underconsidered. Broader adoption of eDNA analysis will produce many well-recognized benefits for pathogen surveillance, biodiversity monitoring, endangered and invasive species detection, and population genetics. Here we show that deep-sequencing-based eDNA approaches capture genomic information from humans (Homo sapiens) just as readily as that from the intended target species. We term this phenomenon human genetic bycatch (HGB). Additionally, high-quality human eDNA could be intentionally recovered from environmental substrates (water, sand and air), holding promise for beneficial medical, forensic and environmental applications. However, this also raises ethical dilemmas, from consent, privacy and surveillance to data ownership, requiring further consideration and potentially novel regulation. We present evidence that human eDNA is readily detectable from 'wildlife' environmental samples as human genetic bycatch, demonstrate that identifiable human DNA can be intentionally recovered from human-focused environmental sampling and discuss the translational and ethical implications of such findings.
Affiliation(s)
- Liam Whitmore
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA
- Department of Biological Sciences, School of Natural Sciences, Faculty of Science and Engineering, University of Limerick, Limerick, Ireland
| | - Mark McCauley
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA
- Department of Chemistry, University of Florida, Gainesville, FL, USA
| | - Jessica A Farrell
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA
- Department of Biology, College of Liberal Arts and Sciences, University of Florida, Gainesville, FL, USA
| | - Maximilian R Stammnitz
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Samantha A Koda
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA
| | - Narges Mashkour
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA
| | - Victoria Summers
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA
| | - Todd Osborne
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA
| | - Jenny Whilde
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA
| | - David J Duffy
- Whitney Laboratory for Marine Bioscience and Sea Turtle Hospital, University of Florida, St. Augustine, FL, USA.
- Department of Biology, College of Liberal Arts and Sciences, University of Florida, Gainesville, FL, USA.
38
Sahlin K, Baudeau T, Cazaux B, Marchet C. A survey of mapping algorithms in the long-reads era. Genome Biol 2023; 24:133. [PMID: 37264447 PMCID: PMC10236595 DOI: 10.1186/s13059-023-02972-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/03/2022] [Accepted: 05/12/2023] [Indexed: 06/03/2023] Open
Abstract
It has been over a decade since the first publication of a method dedicated entirely to mapping long reads. The distinctive characteristics of long reads resulted in methods moving from the seed-and-extend framework used for short reads to a seed-and-chain framework due to the seed abundance in each read. The main novelties are based on alternative seed constructs or chaining formulations. Dozens of tools now exist, whose heuristics have evolved considerably. We provide an overview of the methods used in long-read mappers. Since they are driven by implementation-specific parameters, we develop an original visualization tool to understand the parameter settings (http://bcazaux.polytech-lille.net/Minimap2/).
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden.
| | - Thomas Baudeau
- Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France
| | - Bastien Cazaux
- Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France
| | - Camille Marchet
- Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France.
39
Ahmed OY, Rossi M, Gagie T, Boucher C, Langmead B. SPUMONI 2: improved classification using a pangenome index of minimizer digests. Genome Biol 2023; 24:122. [PMID: 37202771 PMCID: PMC10197461 DOI: 10.1186/s13059-023-02958-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 05/03/2023] [Indexed: 05/20/2023] Open
Abstract
Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2's index is 65 times smaller than minimap2's for a mock community pangenome. SPUMONI 2 achieves a speed improvement of 3-fold compared to SPUMONI and 15-fold compared to minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification.
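To show why indexing minimizers instead of every k-mer can shrink an index, here is a minimal sketch of a window-minimizer digest. It is a generic textbook scheme rather than SPUMONI 2's code: the function name, the lexicographic ordering, and the values of `k` and `w` are assumptions for illustration only.

```python
from typing import List


def minimizer_digest(seq: str, k: int = 3, w: int = 4) -> List[str]:
    """Return the sequence of window minimizers of seq.

    For every window of w consecutive k-mers, the lexicographically smallest
    k-mer (leftmost on ties) is selected. A k-mer occurrence is emitted only
    once, even if it is the minimizer of several overlapping windows.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    digest = []
    last_pos = -1
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, so ties go to the leftmost.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        if pos != last_pos:
            digest.append(kmers[pos])
            last_pos = pos
    return digest


print(minimizer_digest("ACGTACGTTGCA"))
```

In this toy input, ten k-mers are reduced to four minimizers. Reads and references digested with the same parameters share minimizers wherever they share sufficiently long substrings, so the smaller digest can still be used for matching.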
Affiliation(s)
- Omar Y. Ahmed
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
| | - Massimiliano Rossi
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL USA
| | - Travis Gagie
- Faculty of Computer Science, Dalhousie University, Halifax, NS Canada
| | - Christina Boucher
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
40
Jiang S, Wei Y, Ke H, Song C, Liao W, Meng L, Sun C, Zhou J, Wang C, Su X, Dong C, Xiong Y, Yang S. Building a nomogram plot based on the nanopore targeted sequencing for predicting urinary tract pathogens and differentiating from colonizing bacteria. Front Cell Infect Microbiol 2023; 13:1142426. [PMID: 37265501 PMCID: PMC10229875 DOI: 10.3389/fcimb.2023.1142426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 04/28/2023] [Indexed: 06/03/2023] Open
Abstract
Background: Identifying uropathogens (UPB) and urinary tract colonizing bacteria (UCB) helps guide antimicrobial therapy, reduce resistant bacterial strains, and study the urinary microbiota. This study established a nomogram based on nanopore-targeted sequencing (NTS) and other infectious risk factors to distinguish UPB from UCB. Methods: Basic information, medical history, and multiple urine test results were continuously collected and analyzed by least absolute shrinkage and selection operator (LASSO) regression, and multivariate logistic regression was used to determine the independent predictors and construct the nomogram. Receiver operating characteristic curves, area under the curve (AUC), decision curve analysis, and calibration curves were used to evaluate the performance of the nomogram. Results: The UPB detected by NTS accounted for 74.1% (401/541) of all urinary tract microorganisms. The distribution of ln(reads) differed significantly between the UPB and UCB groups (OR = 1.39; 95% CI, 1.246-1.551; p < 0.001); the number of reads in NTS reports could be used for a preliminary determination of UPB (AUC = 0.668), with a corresponding cutoff value of 7.042. Regression analysis was performed to determine independent predictors and construct a nomogram, with variables ranked by importance as ln(reads) and the number of microbial species in the urinary tract detected by NTS, urine culture, age, urological neoplasms, nitrite, and glycosuria. The calibration curve showed agreement between the predicted and observed probabilities of the nomogram. The decision curve analysis indicated that the nomogram would benefit clinical interventions. The performance of the nomogram with ln(reads) (AUC = 0.767; 95% CI, 0.726-0.807) was significantly better (Z = 2.304, p = 0.021) than that without ln(reads) (AUC = 0.727; 95% CI, 0.681-0.772). The rate of UPB identification with the nomogram was significantly higher than with ln(reads) alone (χ2 = 7.36, p = 0.009). Conclusions: NTS helps distinguish uropathogens from colonizing bacteria, and the nomogram based on NTS and multiple independent predictors predicts uropathogens better.
Affiliation(s)
- Shengming Jiang
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Yangyan Wei
  - Department of Cardiovascular Surgery, The Affiliated Hospital of Qingdao University, Qingdao, China
- Hu Ke
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Chao Song
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Wenbiao Liao
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Lingchao Meng
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Chang Sun
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Jiawei Zhou
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Chuan Wang
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Xiaozhe Su
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Caitao Dong
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Yunhe Xiong
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China
- Sixing Yang
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, China

41
Marchukov D, Li J, Juillerat P, Misselwitz B, Yilmaz B. Benchmarking microbial DNA enrichment protocols from human intestinal biopsies. Front Genet 2023; 14:1184473. [PMID: 37180976 PMCID: PMC10169731 DOI: 10.3389/fgene.2023.1184473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 04/10/2023] [Indexed: 05/16/2023] Open
Abstract
Shotgun metagenomic sequencing is a powerful tool for studying bacterial communities in their natural habitats or sites of infection, without the need for cultivation. However, low microbial signals in metagenomic sequencing can be overwhelmed by host DNA contamination, resulting in decreased sensitivity for microbial read detection. Several commercial kits and other methods have been developed to enrich bacterial sequences; however, these assays have not been tested extensively for human intestinal tissues yet. Therefore, the objective of this study was to assess the effectiveness of various wet-lab and software-based approaches for depleting host DNA from microbiome samples. Four different microbiome DNA enrichment methods, namely the NEBNext Microbiome DNA Enrichment kit, Molzym Ultra-Deep Microbiome Prep, QIAamp DNA Microbiome kit, and Zymo HostZERO microbial DNA kit, were evaluated, along with a software-controlled adaptive sampling (AS) approach by Oxford Nanopore Technologies (ONT) providing microbial signal enrichment by aborting unwanted host DNA sequencing. The NEBNext and QIAamp kits proved to be effective in shotgun metagenomic sequencing studies, as they efficiently reduced host DNA contamination, resulting in 24% and 28% bacterial DNA sequences, respectively, compared to <1% in the AllPrep controls. Additional optimization steps using further detergents and bead-beating steps improved the efficacy of less efficient protocols but not of the QIAamp kit. In contrast, ONT AS increased the overall number of bacterial reads resulting in a better bacterial metagenomic assembly with more bacterial contigs with greater completeness compared to non-AS approaches. Additionally, AS also allowed for the recovery of antimicrobial resistance markers and the identification of plasmids, demonstrating the potential utility of AS for targeted sequencing of microbial signals in complex samples with high amounts of host DNA. 
However, ONT AS resulted in relevant shifts in the observed bacterial abundance, including 2 to 5 times more Escherichia coli reads. Furthermore, a modest enrichment of Bacteroides fragilis and Bacteroides thetaiotaomicron was also observed with AS. Overall, this study provides insight into the efficacy and limitations of various methods for reducing host DNA contamination in human intestinal samples to improve the utility of metagenomic sequencing.
Affiliation(s)
- Dmitrij Marchukov
  - University Hospital Zürich, University of Zürich, Zürich, Switzerland
- Jiaqi Li
  - Department of Visceral Surgery and Medicine, Bern University Hospital, University of Bern, Bern, Switzerland
  - Maurice Müller Laboratories, Department for Biomedical Research, University of Bern, Bern, Switzerland
- Pascal Juillerat
  - Department of Visceral Surgery and Medicine, Bern University Hospital, University of Bern, Bern, Switzerland
  - Maurice Müller Laboratories, Department for Biomedical Research, University of Bern, Bern, Switzerland
  - Crohn’s and Colitis Center, Gastroenterologie Beaulieu, Lausanne, Switzerland
- Benjamin Misselwitz
  - Department of Visceral Surgery and Medicine, Bern University Hospital, University of Bern, Bern, Switzerland
  - Maurice Müller Laboratories, Department for Biomedical Research, University of Bern, Bern, Switzerland
- Bahtiyar Yilmaz
  - Department of Visceral Surgery and Medicine, Bern University Hospital, University of Bern, Bern, Switzerland
  - Maurice Müller Laboratories, Department for Biomedical Research, University of Bern, Bern, Switzerland

42
Sun Y, Cheng Z, Li X, Yang Q, Zhao B, Wu Z, Xia Y. Genome enrichment of rare and unknown species from complicated microbiomes by nanopore selective sequencing. Genome Res 2023; 33:612-621. [PMID: 37041035 PMCID: PMC10234302 DOI: 10.1101/gr.277266.122] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/31/2022] [Accepted: 03/22/2023] [Indexed: 04/13/2023]
Abstract
Rare species are vital members of a microbial community, but retrieving their genomes is difficult because of their low abundance. The ReadUntil (RU) approach allows nanopore devices to sequence specific DNA molecules selectively in real time, which provides an opportunity for enriching rare species. Despite the robustness of enriching rare species by reducing the sequencing depth of known host sequences, such as the human genome, there is still a gap in RU-based enriching of rare species in environmental samples whose community composition is unclear, and many rare species have poor or incomplete reference genomes in public databases. Therefore, here we present metaRUpore to overcome this challenge. When we applied metaRUpore to a thermophilic anaerobic digester (TAD) community and human gut microbial community, it reduced coverage of the high-abundance populations and modestly increased (∼2×) the genome coverage of the rare taxa, facilitating successful recovery of near-finished metagenome-assembled genomes (nf-MAGs) of rare species. The simplicity and robustness of the approach make it accessible for laboratories with moderate computational resources, and hold the potential to become the standard practice in future metagenomic sequencing of complicated microbiomes.
Affiliation(s)
- Yuhong Sun
  - School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Zhanwen Cheng
  - School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Xiang Li
  - School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
  - State Environmental Protection Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
  - Guangdong Provincial Key Laboratory of Soil and Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Qing Yang
  - School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Bixi Zhao
  - School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Ziqi Wu
  - School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Yu Xia
  - School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
  - State Environmental Protection Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
  - Guangdong Provincial Key Laboratory of Soil and Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China

43
Su J, Lui WW, Lee Y, Zheng Z, Siu GKH, Ng TTL, Zhang T, Lam TTY, Lao HY, Yam WC, Tam KKG, Leung KSS, Lam TW, Leung AWS, Luo R. Evaluation of Mycobacterium tuberculosis enrichment in metagenomic samples using ONT adaptive sequencing and amplicon sequencing for identification and variant calling. Sci Rep 2023; 13:5237. [PMID: 37002338 PMCID: PMC10066345 DOI: 10.1038/s41598-023-32378-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/20/2022] [Accepted: 03/27/2023] [Indexed: 04/03/2023] Open
Abstract
Sensitive detection of Mycobacterium tuberculosis (TB) in small percentages in metagenomic samples is essential for microbial classification and drug resistance prediction. However, traditional methods, such as bacterial culture and microscopy, are time-consuming and sometimes have limited TB detection sensitivity. Oxford Nanopore Technologies (ONT) MinION sequencing allows rapid and simple sample preparation for sequencing. Its recently developed adaptive sequencing selects reads from targets while allowing real-time base-calling to achieve sequence enrichment or depletion during sequencing. Another common enrichment method is PCR amplification of the target TB genes. In this study, we compared both methods using ONT MinION sequencing for TB detection and variant calling in metagenomic samples, using simulation runs as well as runs with synthetic and patient samples. We found that both methods effectively enrich TB reads from a high percentage of human (95%) and other microbial DNA. Adaptive sequencing with readfish and UNCALLED achieved a 3.9-fold and 2.2-fold enrichment, respectively, compared to the control run. We provide a simple automatic analysis framework to support the detection of TB for clinical use, openly available at https://github.com/HKU-BAL/ONT-TB-NF. Depending on the patient's medical condition and sample type, we recommend users evaluate and optimize their workflow for different clinical specimens to improve the detection limit.
Affiliation(s)
- Junhao Su
  - Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Wui Wang Lui
  - Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- YanLam Lee
  - Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Zhenxian Zheng
  - Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Gilman Kit-Hang Siu
  - Department of Health Technology and Informatics, Faculty of Health and Social Sciences, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, China
- Timothy Ting-Leung Ng
  - Department of Health Technology and Informatics, Faculty of Health and Social Sciences, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, China
- Tong Zhang
  - Department of Computer Science and Engineering, Department of Mathematics, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Tommy Tsan-Yuk Lam
  - State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
  - Laboratory of Data Discovery for Health Limited, 19W Hong Kong Science & Technology Parks, Pak Shek Kok, Hong Kong SAR, China
- Hiu-Yin Lao
  - Department of Health Technology and Informatics, Faculty of Health and Social Sciences, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, China
- Wing-Cheong Yam
  - Department of Microbiology, Lee Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Kingsley King-Gee Tam
  - Department of Microbiology, Lee Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Kenneth Siu-Sing Leung
  - Department of Microbiology, Lee Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Tak-Wah Lam
  - Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Amy Wing-Sze Leung
  - Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Ruibang Luo
  - Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China

44
Noordijk B, Nijland R, Carrion VJ, Raaijmakers JM, de Ridder D, de Lannoy C. baseLess: lightweight detection of sequences in raw MinION data. Bioinformatics Advances 2023; 3:vbad017. [PMID: 36818730 PMCID: PMC9936955 DOI: 10.1093/bioadv/vbad017] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Revised: 01/27/2023] [Accepted: 02/12/2023] [Indexed: 02/17/2023]
Abstract
Summary: With its candybar form factor and low initial investment cost, the MinION brought affordable portable nucleic acid analysis within reach. However, translating the electrical signal it outputs into a sequence of bases still requires mid-tier computer hardware, which remains a caveat when aiming for deployment of many devices at once or usage in remote areas. For applications focusing on detection of a target sequence, such as infectious disease monitoring or species identification, the computational cost of analysis may be reduced by directly detecting the target sequence in the electrical signal instead. Here, we present baseLess, a computational tool that enables such target-detection-only analysis. BaseLess makes use of an array of small neural networks, each of which efficiently detects a fixed-size subsequence of the target sequence directly from the electrical signal. We show that baseLess can accurately determine the identity of reads between three closely related fish species and can classify sequences in mixtures of 20 bacterial species, on an inexpensive single-board computer. Availability and implementation: baseLess and all code used in data preparation and validation are available on GitHub at https://github.com/cvdelannoy/baseLess, under an MIT license. Used validation data and scripts can be found at https://doi.org/10.4121/20261392, under an MIT license. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Ben Noordijk
  - Bioinformatics Group, Wageningen University, Wageningen 6700AH, The Netherlands
- Reindert Nijland
  - Marine Animal Ecology, Wageningen University, Wageningen 6700AP, The Netherlands
- Victor J Carrion
  - Institute of Biology, Leiden University, Leiden 2300RA, The Netherlands
  - Department of Microbial Ecology, Netherlands Institute of Ecology, Wageningen 6700AB, The Netherlands
  - Departamento de Microbiología, Instituto de Hortofruticultura Subtropical y Mediterránea ‘La Mayora’, Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Málaga 29010, Spain
- Jos M Raaijmakers
  - Institute of Biology, Leiden University, Leiden 2300RA, The Netherlands
  - Department of Microbial Ecology, Netherlands Institute of Ecology, Wageningen 6700AB, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Wageningen University, Wageningen 6700AH, The Netherlands

45
Chen P, Sun Z, Wang J, Liu X, Bai Y, Chen J, Liu A, Qiao F, Chen Y, Yuan C, Sha J, Zhang J, Xu LQ, Li J. Portable nanopore-sequencing technology: Trends in development and applications. Front Microbiol 2023; 14:1043967. [PMID: 36819021 PMCID: PMC9929578 DOI: 10.3389/fmicb.2023.1043967] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/14/2022] [Accepted: 01/03/2023] [Indexed: 02/04/2023] Open
Abstract
Sequencing technology is the most commonly used technology in molecular biology research and an essential pillar for the development and applications of molecular biology. Since 1977, when the first generation of sequencing technology opened the door to interpreting the genetic code, sequencing technology has developed through three generations. It has applications in all aspects of life and scientific research, such as disease diagnosis, drug target discovery, pathological research, species protection, and SARS-CoV-2 detection. However, first- and second-generation sequencing technologies relied on fluorescence detection systems and DNA polymerization enzyme systems, which increased the cost of sequencing and limited its scope of application. Third-generation sequencing technology performs PCR-free, single-molecule sequencing, but it still depends on a fluorescence detection device. To break through these limitations, researchers have made arduous efforts to develop a new advanced portable sequencing technology represented by nanopore sequencing. Nanopore technology has the advantages of small size and convenient portability, independence from biochemical reagents, and direct reading using physical methods. This paper reviews the research and development process of nanopore sequencing technology (NST) from the laboratory to commercially viable tools, and discusses the main types of nanopore sequencing technologies and their various applications in solving a wide range of real-world problems. In addition, the paper collates the analysis tools necessary for performing different processing tasks in nanopore sequencing. Finally, we highlight the challenges of NST and its future research and application directions.
Affiliation(s)
- Pin Chen
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Zepeng Sun
  - China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Jiawei Wang
  - School of Computer Science and Technology, Southeast University, Nanjing, China
- Xinlong Liu
  - China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Yun Bai
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Jiang Chen
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Anna Liu
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Feng Qiao
  - China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Yang Chen
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Chenyan Yuan
  - Clinical Laboratory, Southeast University Zhongda Hospital, Nanjing, China
- Jingjie Sha
  - School of Mechanical Engineering, Southeast University, Nanjing, China
- Jinghui Zhang
  - School of Computer Science and Technology, Southeast University, Nanjing, China
- Li-Qun Xu
  - China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Jian Li
  - Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Correspondence: Li-Qun Xu, Jian Li

46
Hong M, Peng D, Fu A, Wang X, Zheng Y, Xia L, Shi W, Qian C, Li Z, Liu F, Wu Q. The application of nanopore targeted sequencing in the diagnosis and antimicrobial treatment guidance of bloodstream infection of febrile neutropenia patients with hematologic disease. J Cell Mol Med 2023; 27:506-514. [PMID: 36722317 PMCID: PMC9930421 DOI: 10.1111/jcmm.17651] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/30/2022] [Revised: 11/12/2022] [Accepted: 12/06/2022] [Indexed: 02/02/2023] Open
Abstract
Traditional microbiological methodology is limited in sensitivity, detection range, and turnaround time when diagnosing bloodstream infection in febrile neutropenia (FN) patients. A more rapid and sensitive detection technology is urgently needed. Here we used the newly developed nanopore targeted sequencing (NTS) to identify the pathogens in blood samples. The diagnostic performance (sensitivity, specificity, and turnaround time) of NTS on 202 blood samples from FN patients with hematologic disease was evaluated in comparison with blood culture and nested polymerase chain reaction (PCR) followed by Sanger sequencing. The impact of NTS results on antibiotic treatment modification, and the treatment effectiveness and mortality of patients managed under the guidance of NTS results, were assessed. The data showed that NTS had a clinical sensitivity of 92.11% and a clinical specificity of 78.41% compared with the blood culture and PCR combination. Importantly, the turnaround time for NTS was <24 h for all specimens, and a preliminary report within 6 h was possible in emergency cases in clinical practice. Among 118 NTS-positive patients, the antibiotic regimens of 98.3% were guided by NTS results. There was no significant difference in effectiveness or mortality rate between the group whose antibiotic regimen was switched according to NTS and the group whose antibiotic regimen covered the pathogens detected by NTS. Therefore, NTS could yield higher sensitivity and specificity and a shorter turnaround time for broad-spectrum pathogen identification in blood samples compared with traditional tests. It also provides good guidance for clinical targeted antibiotic treatment of FN patients with hematologic disease, thereby emerging as a promising technology for detecting infectious disease.
Affiliation(s)
- Mei Hong
  - Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Danyue Peng
  - Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Aisi Fu
  - Wuhan Dgensee Clinical Laboratory Co., Ltd., Wuhan, China
- Xian Wang
  - Wuhan Dgensee Clinical Laboratory Co., Ltd., Wuhan, China
- Yabiao Zheng
  - Wuhan Dgensee Clinical Laboratory Co., Ltd., Wuhan, China
- Linghui Xia
  - Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Wei Shi
  - Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Chenjing Qian
  - Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Zixuan Li
  - Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Fang Liu
  - Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Qiuling Wu
  - Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

47
Sadasivan H, Wadden J, Goliya K, Ranjan P, Dickson RP, Blaauw D, Das R, Narayanasamy S. Rapid Real-time Squiggle Classification for Read until using RawMap. Archives of Clinical and Biomedical Research 2023; 7:45-57. [PMID: 36938368 PMCID: PMC10022530 DOI: 10.26502/acbr.50170318] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 04/28/2023]
Abstract
Read Until enables Oxford Nanopore Technologies' (ONT) sequencers to selectively sequence reads of target species in real time. This enables efficient microbial enrichment for applications such as microbial abundance estimation and is particularly beneficial for metagenomic samples with a very high fraction of non-target reads (> 99% can be human reads). However, Read Until requires a fast and accurate software filter that analyzes a short prefix of a read and determines whether it belongs to a microbe of interest (target) or not. The baseline Read Until pipeline uses a deep neural network-based basecaller called Guppy and is slow and inaccurate for this task (~60% of bases sequenced are unclassified). We present RawMap, an efficient CPU-only, microbial species-agnostic Read Until classifier for filtering non-target human reads in the squiggle space. RawMap uses a Support Vector Machine (SVM), which is trained to distinguish human from microbe using non-linear and non-stationary characteristics of ONT's squiggle output (continuous electrical signals). Compared to the baseline Read Until pipeline, RawMap is a 1327X faster classifier and significantly improves sequencing time, sequencing cost, and compute time. We show that RawMap-augmented pipelines reduce sequencing time and cost by ~24% and computing cost by 22%. Additionally, since RawMap is agnostic to microbial species, it can also classify microbial species it is not trained on. We also discuss how RawMap may be used as an alternative to the RT-PCR test for viral load quantification of SARS-CoV-2.
Affiliation(s)
- Harisankar Sadasivan
  - Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
- Jack Wadden
  - Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
- Kush Goliya
  - Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
- Piyush Ranjan
  - Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, USA
- Robert P Dickson
  - Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, USA
- David Blaauw
  - Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
- Reetuparna Das
  - Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA
- Satish Narayanasamy
  - Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA

48
Senanayake A, Gamaarachchi H, Herath D, Ragel R. DeepSelectNet: deep neural network based selective sequencing for oxford nanopore sequencing. BMC Bioinformatics 2023; 24:31. [PMID: 36709261 PMCID: PMC9883605 DOI: 10.1186/s12859-023-05151-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/24/2022] [Accepted: 01/17/2023] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND: Nanopore sequencing allows selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research; the classification of species from a pool of species is one example. Existing methods for selective sequencing for species classification are still immature, and their accuracy varies widely depending on the dataset. For the five datasets we tested, the accuracy of existing methods varied in the range of ~77 to 97% (average accuracy < 89%). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet utilizes novel data preprocessing techniques and an improved neural network architecture for regularization. RESULTS: For the five datasets tested, DeepSelectNet's accuracy varied between ~91 and 99% (average accuracy ~95%). At its best performance, DeepSelectNet achieved a nearly 12% accuracy increase compared to its deep learning-based predecessor SquiggleNet. Furthermore, precision and recall evaluated for DeepSelectNet on average were always > 89% (average ~95%). In terms of execution performance, DeepSelectNet outperformed SquiggleNet by ~13% on average. Thus, DeepSelectNet is a practically viable method to improve the effectiveness of selective sequencing. CONCLUSIONS: Compared to base alignment and deep learning predecessors, DeepSelectNet can significantly improve the accuracy to enable real-time species classification using selective sequencing. The source code of DeepSelectNet is available at https://github.com/AnjanaSenanayake/DeepSelectNet.
Affiliation(s)
- Anjana Senanayake
  - Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka
- Hasindu Gamaarachchi
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
  - School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
- Damayanthi Herath
  - Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka
- Roshan Ragel
  - Department of Computer Engineering, University of Peradeniya, Peradeniya, Sri Lanka

49
Lin Y, Dai Y, Zhang S, Guo H, Yang L, Li J, Wang K, Ni M, Hu Z, Jia L, Liu H, Li P, Song H. Application of nanopore adaptive sequencing in pathogen detection of a patient with Chlamydia psittaci infection. Front Cell Infect Microbiol 2023; 13:1064317. [PMID: 36756615 PMCID: PMC9900021 DOI: 10.3389/fcimb.2023.1064317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 01/11/2023] [Indexed: 01/24/2023] Open
Abstract
Introduction: Nanopore sequencing has been widely used in clinical metagenomic sequencing for pathogen detection owing to its high portability and real-time sequencing. Oxford Nanopore Technologies has recently launched an adaptive sequencing function, which can enrich on-target reads through real-time alignment and eject uninteresting reads by reversing the voltage across the nanopore. Here we evaluated the utility of adaptive sequencing in clinical pathogen detection. Methods: Nanopore adaptive sequencing and standard sequencing were performed on the same flow cell with a bronchoalveolar lavage fluid sample from a patient with Chlamydia psittaci infection, and the results were compared with the previous mNGS results. Results: Nanopore adaptive sequencing identified 648 on-target "stop receiving" reads with the longest median read length (688 bp), which accounted for 72.4% of all Chlamydia psittaci reads and 0.03% of total reads in the enriched group. The proportion of reads matching C. psittaci in the "stop receiving" group was 99.85%, much higher than in the "unblock" (<0.01%) and "fail to adapt" (0.02%) groups. Nanopore adaptive sequencing generated a data yield of C. psittaci similar to that of standard nanopore sequencing. The proportion of C. psittaci reads in adaptive sequencing was close to that of standard nanopore sequencing and mNGS, but adaptive sequencing generated lower genome coverage than mNGS. Discussion: Nanopore adaptive sequencing can effectively identify target C. psittaci reads in real time, but how to increase the targeted pathogen data still needs further evaluation.
Affiliation(s)
- Yanfeng Lin
- Academy of Military Medical Sciences, Academy of Military Sciences, Beijing, China; Chinese PLA Center for Disease Control and Prevention, Beijing, China
| | - Yan Dai
- State Key Laboratory of Translational Medicine and Innovative Drug Development, Jiangsu Simcere Diagnostics Co., Ltd., Nanjing, China
| | - Shuang Zhang
- Academy of Military Medical Sciences, Academy of Military Sciences, Beijing, China; Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Hao Guo
- State Key Laboratory of Translational Medicine and Innovative Drug Development, Jiangsu Simcere Diagnostics Co., Ltd., Nanjing, China
| | - Lang Yang
- Chinese PLA Center for Disease Control and Prevention, Beijing, China
| | - Jinhui Li
- Chinese PLA Center for Disease Control and Prevention, Beijing, China
| | - Kaiying Wang
- Chinese PLA Center for Disease Control and Prevention, Beijing, China
| | - Ming Ni
- Academy of Military Medical Sciences, Academy of Military Sciences, Beijing, China; Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Zongqian Hu
- Academy of Military Medical Sciences, Academy of Military Sciences, Beijing, China; Beijing Institute of Radiation Medicine, Beijing, China
| | - Leili Jia
- Chinese PLA Center for Disease Control and Prevention, Beijing, China
| | - Huiying Liu
- College of Pulmonary & Critical Care Medicine, 8th Medical Center, Chinese PLA General Hospital, Beijing, China. *Correspondence: Huiying Liu; Peng Li; Hongbin Song
| | - Peng Li
- Chinese PLA Center for Disease Control and Prevention, Beijing, China. *Correspondence: Huiying Liu; Peng Li; Hongbin Song
| | - Hongbin Song
- Academy of Military Medical Sciences, Academy of Military Sciences, Beijing, China; Chinese PLA Center for Disease Control and Prevention, Beijing, China. *Correspondence: Huiying Liu; Peng Li; Hongbin Song
|
50
|
Takashima Y, Komoto Y, Ohshiro T, Nakatani K, Taniguchi M. Quantitative Microscopic Observation of Base-Ligand Interactions via Hydrogen Bonds by Single-Molecule Counting. J Am Chem Soc 2023; 145:1310-1318. [PMID: 36597667 DOI: 10.1021/jacs.2c11260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
Chemical properties have been based on statistical averages since the introduction of Avogadro's number. The lack of suitable methods for counting identified single molecules has posed challenges to counting statistics. The selectivity, affinity, and mode of hydrogen bonding between small molecules and the bases that make up DNA, which is vital for living organisms, have not yet been revealed at the single-molecule level. Here, we show the quantitation of these parameters via single-molecule counting based on the combination of single-molecule electrical measurements and AI. The binding selectivity values of five ligands to four different base molecules were evaluated quantitatively by determining the ratio of the number of aggregates in a solution mixture of base molecules and a ligand. In addition, we show the ligand dependence of the mode and number of microscopic hydrogen bonds via single-molecule counting and quantum chemical calculations.
Affiliation(s)
- Yusuke Takashima
- SANKEN, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
| | - Yuki Komoto
- SANKEN, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan; Artificial Intelligence Research Center, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan; Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives (OTRI), Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
| | - Takahito Ohshiro
- SANKEN, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
| | - Kazuhiko Nakatani
- SANKEN, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
|