1. Dorey A, Howorka S. Nanopore DNA sequencing technologies and their applications towards single-molecule proteomics. Nat Chem 2024; 16:314-334. [PMID: 38448507 DOI: 10.1038/s41557-023-01322-x] [Received: 08/30/2022] [Accepted: 07/14/2023] [Indexed: 03/08/2024]
Abstract
Sequencing of nucleic acids with nanopores has emerged as a powerful tool offering rapid readout, high accuracy, low cost and portability. This label-free method for sequencing at the single-molecule level is an achievement on its own. However, nanopores also show promise for the technologically even more challenging sequencing of polypeptides, something that could considerably benefit biological discovery, clinical diagnostics and homeland security, as current techniques lack portability and speed. Here we survey the biochemical innovations underpinning commercial and academic nanopore DNA/RNA sequencing techniques, and explore how these advances can fuel developments in future protein sequencing with nanopores.
Affiliation(s)
- Adam Dorey
  - Department of Chemistry & Institute of Structural Molecular Biology, University College London, London, UK
- Stefan Howorka
  - Department of Chemistry & Institute of Structural Molecular Biology, University College London, London, UK
2. Koonchanok R, Daulatabad SV, Reda K, Janga SC. Sequoia: A Framework for Visual Analysis of RNA Modifications from Direct RNA Sequencing Data. Methods Mol Biol 2023; 2624:127-138. [PMID: 36723813 DOI: 10.1007/978-1-0716-2962-8_9] [Indexed: 02/02/2023]
Abstract
Oxford Nanopore-based long-read direct RNA sequencing protocols are increasingly used to study the dynamics of RNA metabolic processes owing to improvements in read length, increased throughput, decreasing cost, ease of library preparation, and convenience. Long-read sequencing enables single-molecule detection of posttranscriptional changes, promising novel insights into the functional roles of RNA. However, fulfilling this potential will require new tools for analyzing and exploring this type of data. Although existing tools allow users to analyze signal information, for example by comparing raw signal traces to a nucleotide sequence, they do not support studying each individual signal instance in each read or analyzing clusters of signals grouped by similarity. We therefore present Sequoia, a visual analytics application that allows users to interactively analyze signals originating from nanopore sequencers and that can readily be extended to both RNA and DNA sequencing datasets. Sequoia combines a Python-based backend with a multi-view graphical interface that allows users to ingest raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to find attributes of interest. In this tutorial, we illustrate each step involved in running Sequoia and, in the process, examine the characteristics of the input data. We show how to generate nanopore sequencing-based visualizations by leveraging dimensionality reduction and parameter tuning to separate modified RNA sequences from their unmodified counterparts. Sequoia's interactive features enhance nanopore-based computational methodologies. Sequoia enables users to construct rationales and hypotheses and to develop insights about the dynamic nature of RNA from the visual analysis. Sequoia is available at https://github.com/dnonatar/Sequoia .
Affiliation(s)
- Ratanond Koonchanok
  - Department of Human-Centered Computing, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, USA
- Swapna Vidhur Daulatabad
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, USA
- Khairi Reda
  - Department of Human-Centered Computing, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, USA
- Sarath Chandra Janga
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, USA
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), Indianapolis, IN, USA
3. Wang Y, Zhao Y, Bollas A, Wang Y, Au KF. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021; 39:1348-1365. [PMID: 34750572 PMCID: PMC8988251 DOI: 10.1038/s41587-021-01108-x] [Received: 12/09/2019] [Accepted: 09/22/2021] [Indexed: 12/13/2022]
Abstract
Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.
Affiliation(s)
- Yunhao Wang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Yue Zhao
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
  - Biomedical Informatics Shared Resources, The Ohio State University, Columbus, OH, USA
- Audrey Bollas
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Yuru Wang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Kin Fai Au
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
  - Biomedical Informatics Shared Resources, The Ohio State University, Columbus, OH, USA
4. Sequoia: an interactive visual analytics platform for interpretation and feature extraction from nanopore sequencing datasets. BMC Genomics 2021; 22:513. [PMID: 34233619 PMCID: PMC8262049 DOI: 10.1186/s12864-021-07791-z] [Received: 01/27/2021] [Accepted: 06/10/2021] [Indexed: 11/11/2022]
Abstract
Background: Direct-sequencing technologies, such as Oxford Nanopore's, deliver long RNA reads efficiently and conveniently. These technologies make it possible to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data.
Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as a first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures, discovered through the visualization, that distinguish these modifications from otherwise normal RNA bases.
Conclusions: Sequoia's interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users develop rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07791-z.
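The clustering step described in the Sequoia abstracts (grouping reads by electric-current similarity) can be illustrated with a small, standard-library-only Python sketch. The technique choices here are assumptions for illustration and are not taken from Sequoia's code: each trace is z-normalized, pairs of traces are compared with dynamic time warping (DTW) so that segments of different lengths can still be aligned, and traces closer than a hypothetical `threshold` are merged by single-linkage clustering. The function names `z_normalize`, `dtw_distance` and `cluster_signals` are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_normalize(signal):
    """Rescale a current trace to zero mean and unit variance so that traces
    recorded at different absolute current levels become comparable."""
    mu = mean(signal)
    sd = pstdev(signal) or 1.0  # avoid division by zero for flat traces
    return [(x - mu) / sd for x in signal]


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance with an
    absolute-difference local cost; tolerates traces of different lengths."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def cluster_signals(signals, threshold):
    """Single-linkage clustering: any two traces whose DTW distance is below
    `threshold` end up in the same cluster (connected components via
    union-find). Returns one integer label per trace, in order of appearance."""
    traces = [z_normalize(s) for s in signals]
    parent = list(range(len(traces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if dtw_distance(traces[i], traces[j]) < threshold:
                parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(traces))]
```

For example, `cluster_signals([[80, 80, 80, 120, 120, 120], [120, 120, 120, 80, 80, 80]], threshold=3.0)` returns `[0, 1]`, placing the rising and the falling trace in separate clusters. Both DTW and the all-pairs loop are quadratic, so this sketch only suits small examples; at the scale of ~500k reads, some form of downsampling or dimensionality reduction (which the abstracts mention) would be needed first.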
5. Magi A, Semeraro R, Mingrino A, Giusti B, D'Aurizio R. Nanopore sequencing data analysis: state of the art, applications and challenges. Brief Bioinform 2019. [PMID: 28637243 DOI: 10.1093/bib/bbx062] [Indexed: 12/17/2022]
Abstract
The nanopore sequencing process is based on the transit of a DNA molecule through a nanoscopic pore, and since the 1990s it has been considered one of the most promising approaches for detecting polymeric molecules. In 2014, Oxford Nanopore Technologies (ONT) launched a beta-testing program that supplied the scientific community with the first prototype of a nanopore sequencer: the MinION. Thanks to this program, several research groups had the opportunity to evaluate the performance of this novel instrument and to develop new computational approaches for analyzing this new generation of data. Despite the short time since the release of the MinION, a large number of algorithms and tools have been developed for base calling, data handling, read mapping, de novo assembly and variant discovery. Here, we address the main computational challenges related to the analysis of nanopore data and carry out a comprehensive, up-to-date survey of the algorithmic solutions adopted by the bioinformatics community, comparing their performance and reporting the limitations and advantages of using this new generation of sequences for genomic analyses. Our analyses demonstrate that the use of nanopore data dramatically improves the de novo assembly of genomes and allows structural variants to be explored with unprecedented accuracy and resolution. However, despite the impressive improvements achieved by ONT in the past 2 years, the use of these data for small-variant calling is still challenging and, at present, needs to be coupled with complementary short sequences to mitigate the intrinsic biases of nanopore sequencing technology.
Affiliation(s)
- Alberto Magi
  - Department of Statistics, National Cheng Kung University, Taiwan
- Roberto Semeraro
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, USA
- Betti Giusti
  - Department of Biostatistics, Vanderbilt University, USA
6. Bolognini D, Bartalucci N, Mingrino A, Vannucchi AM, Magi A. NanoR: A user-friendly R package to analyze and compare nanopore sequencing data. PLoS One 2019; 14:e0216471. [PMID: 31071140 PMCID: PMC6508625 DOI: 10.1371/journal.pone.0216471] [Received: 02/20/2019] [Accepted: 04/23/2019] [Indexed: 01/21/2023]
Abstract
MinION and GridION X5 from Oxford Nanopore Technologies are devices for real-time DNA and RNA sequencing. On the one hand, MinION is the only real-time, low-cost and portable sequencing device and, thanks to its unique properties, is becoming increasingly popular among biologists; on the other, GridION X5 is less widespread, mainly because of its cost, but is highly suitable for researchers with large sequencing projects. Although Oxford Nanopore Technologies' devices have been increasingly used in the last few years, there is a lack of high-performing and user-friendly tools to handle the data output by both the MinION and GridION X5 platforms. Here we present NanoR, a cross-platform R package designed to simplify and improve nanopore data visualization. NanoR is built on a few functions but exceeds the capabilities of existing tools in extracting meaningful information from MinION sequencing data; in addition, as exclusive features, NanoR can handle GridION X5 sequencing outputs and allows MinION and GridION X5 sequencing data to be compared with a single command. NanoR is released as a free R package at https://github.com/davidebolo1993/NanoR.
Affiliation(s)
- Davide Bolognini
  - Department of Experimental and Clinical Medicine, University of Florence, Largo Brambilla 3, Florence, Italy
- Niccolò Bartalucci
  - Department of Experimental and Clinical Medicine, University of Florence, Largo Brambilla 3, Florence, Italy
- Alessandra Mingrino
  - Department of Experimental and Clinical Medicine, University of Florence, Largo Brambilla 3, Florence, Italy
- Alessandro Maria Vannucchi
  - Department of Experimental and Clinical Medicine, University of Florence, Largo Brambilla 3, Florence, Italy
- Alberto Magi
  - Department of Experimental and Clinical Medicine, University of Florence, Largo Brambilla 3, Florence, Italy
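NanoR itself is an R package, and the Python sketch below is not its API. It only illustrates, under stated assumptions, the kind of summary statistics commonly reported by nanopore data-handling tools: it parses a plain four-line-per-record FASTQ stream (no wrapped lines, Phred+33 quality encoding) and computes the read count, total bases, mean read length, read-length N50 and a naive mean per-read quality. The function names `parse_fastq`, `n50` and `summarize` are hypothetical.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from a plain four-line-per-record
    FASTQ stream (no line-wrapped records; an assumption of this sketch)."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:].split()[0], seq, qual


def n50(lengths):
    """Length of the read at which the cumulative length of the longest reads
    first reaches half of all sequenced bases."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(handle, phred_offset=33):
    """Read count, yield, mean length, N50 and naive mean per-read Phred quality."""
    lengths, read_quals = [], []
    for _, seq, qual in parse_fastq(handle):
        lengths.append(len(seq))
        scores = [ord(c) - phred_offset for c in qual]
        read_quals.append(sum(scores) / len(scores) if scores else 0.0)
    count = len(lengths)
    return {
        "reads": count,
        "bases": sum(lengths),
        "mean_length": sum(lengths) / count if count else 0.0,
        "n50": n50(lengths),
        "mean_read_quality": sum(read_quals) / count if count else 0.0,
    }


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGTAC\n+\n++++++++++\n")
    print(summarize(demo))
```

Averaging Phred scores arithmetically, as this sketch does, gives a value that is never lower than the quality obtained by averaging error probabilities first; which convention a given tool uses should be checked in its own documentation.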
7. A new parallel pipeline for DNA methylation analysis of long reads datasets. BMC Bioinformatics 2017; 18:161. [PMID: 28274198 PMCID: PMC5343294 DOI: 10.1186/s12859-017-1574-3] [Received: 11/17/2016] [Accepted: 03/01/2017] [Indexed: 12/02/2022]
Abstract
Background: DNA methylation is an important mechanism of epigenetic regulation in development and disease. Next-generation sequencers allow genome-wide measurement of methylation status by reading short stretches of the DNA sequence (Methyl-seq). Several software tools for methylation analysis have been proposed in recent years. However, current and upcoming sequencers yield sequences of increasing length, making these software tools inefficient and obsolete.
Results: In this paper, we propose new software based on a strategy for methylation analysis of Methyl-seq sequencing data that requires much shorter execution times while yielding better sensitivity, particularly for datasets composed of long reads. This strategy can be exported to other methylation, DNA and RNA analysis tools.
Conclusions: The developed software tool achieves execution times one order of magnitude shorter than existing tools, while yielding equal sensitivity for short reads and even better sensitivity for long reads.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1574-3) contains supplementary material, which is available to authorized users.
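To illustrate how methylation calling can be split across workers and the partial results merged, the following is a minimal Python sketch; it is not the pipeline described in the paper. It assumes bisulfite-converted reads (a C that remains C at a CpG site is counted as methylated, a C read as T as unmethylated), reads already aligned to the forward strand without gaps and given as `(start, sequence)` pairs, and a thread pool standing in for whatever parallel backend a real tool would use. The functions and parameters `cpg_sites`, `count_chunk`, `methylation_levels` and `chunk_size` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def cpg_sites(reference):
    """0-based positions of the C in every forward-strand CpG of the reference."""
    return {i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"}


def count_chunk(cpgs, reads):
    """Tally methylated and unmethylated calls at CpG sites for one chunk of reads.

    Each read is (start, sequence): an ungapped forward-strand alignment whose
    first base sits at 0-based reference position `start` (an assumption)."""
    methylated, unmethylated = Counter(), Counter()
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos not in cpgs:
                continue
            if base == "C":
                methylated[pos] += 1    # C protected from bisulfite conversion
            elif base == "T":
                unmethylated[pos] += 1  # unmethylated C converted, read as T
    return methylated, unmethylated


def methylation_levels(reference, reads, workers=4, chunk_size=1000):
    """Split reads into chunks, count each chunk in a worker, merge the tallies,
    and return {cpg_position: fraction_methylated} for covered CpG sites."""
    cpgs = cpg_sites(reference)
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    methylated, unmethylated = Counter(), Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for meth, unmeth in pool.map(partial(count_chunk, cpgs), chunks):
            methylated.update(meth)
            unmethylated.update(unmeth)
    levels = {}
    for pos in sorted(cpgs):
        covered = methylated[pos] + unmethylated[pos]
        if covered:
            levels[pos] = methylated[pos] / covered
    return levels
```

For instance, with the reference `ACGTTCGA` and the reads `(0, "ACGTTCGA")`, `(0, "ATGTTTGA")` and `(2, "GTTCG")`, the sketch reports a methylation level of 0.5 at position 1 and 2/3 at position 5. Because each chunk returns independent counters that are summed afterwards, the same map-and-merge structure carries over to a process pool or a cluster scheduler; with CPython threads, a pure-Python loop like this one would need processes to gain real speedups.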