1. Petri AJ, Sahlin K. De novo clustering of large long-read transcriptome datasets with isONclust3. Bioinformatics 2025; 41:btaf207. [PMID: 40265453 PMCID: PMC12057813 DOI: 10.1093/bioinformatics/btaf207] [Received: 02/19/2025] [Revised: 04/15/2025] [Accepted: 04/21/2025] [Indexed: 04/24/2025] Open
Abstract
MOTIVATION: Long-read sequencing techniques can sequence transcripts from end to end, greatly improving our ability to study the transcription process. Although there are several well-established tools for long-read transcriptome analysis, most are reference-based. This limits the analysis of organisms without high-quality reference genomes and samples or genes with high variability (e.g. cancer samples or some gene families). In such settings, analysis using a reference-free method is favorable. The computational problem of clustering long reads by region of common origin is well-established for reference-free transcriptome analysis pipelines. Such clustering enables large datasets to be split roughly by gene family and, therefore, an independent analysis of each cluster. There exist tools for this. However, none of those tools can efficiently process the large amount of reads that are now generated by long-read sequencing technologies.
RESULTS: We present isONclust3, an improved algorithm over isONclust and isONclust2, to cluster massive long-read transcriptome datasets into gene families. Like isONclust, isONclust3 represents each cluster with a set of minimizers. However, unlike other approaches, isONclust3 dynamically updates the cluster representation during clustering by adding high-confidence minimizers from new reads assigned to the cluster and employs an iterative cluster-merging step. We show that isONclust3 yields results with higher or comparable quality to state-of-the-art algorithms but is 10-100 times faster on large datasets. Also, using a 256 Gb computing node, isONclust3 was the only tool that could cluster 37 million PacBio reads, which is a typical throughput of the recent PacBio Revio sequencing machine.
AVAILABILITY AND IMPLEMENTATION: https://github.com/aljpetri/isONclust3.
Affiliation(s)
- Alexander J Petri
- Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
- Department of Computer Science, University of Helsinki, Helsinki 00014, Finland
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
2. Monzó C, Frankish A, Conesa A. Notable challenges posed by long-read sequencing for the study of transcriptional diversity and genome annotation. Genome Res 2025; 35:583-592. [PMID: 40032585 PMCID: PMC12047247 DOI: 10.1101/gr.279865.124] [Received: 07/30/2024] [Accepted: 01/30/2025] [Indexed: 03/05/2025]
Abstract
Long-read sequencing (LRS) technologies have revolutionized transcriptomic research by enabling the comprehensive sequencing of full-length transcripts. Using these technologies, researchers have reported tens of thousands of novel transcripts, even in well-annotated genomes, while developing new algorithms and experimental approaches to handle the noisy data. The Long-read RNA-seq Genome Annotation Assessment Project community effort benchmarked LRS methods in transcriptomics and validated many novel, lowly expressed, oftentimes sample-specific transcripts identified by long reads. These molecules represent deviations of the major transcriptional program that were overlooked by short-read sequencing methods but are now captured by the full-length, single-molecule approach. This Perspective discusses the challenges and opportunities associated with LRS' capacity to unravel this fraction of the transcriptome, in terms of both transcriptome biology and genome annotation. For transcriptome biology, we need to develop novel experimental and computational methods to effectively differentiate technology errors from rare but real molecules. For genome annotation, we must agree on the strategy to capture molecular variability while still defining reference annotations that are useful for the genomics community.
Affiliation(s)
- Carolina Monzó
- Institute for Integrative Systems Biology (I2SysBio), Spanish National Research Council (CSIC), Paterna 46980, Spain
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge CB10 1SA, United Kingdom
- Ana Conesa
- Institute for Integrative Systems Biology (I2SysBio), Spanish National Research Council (CSIC), Paterna 46980, Spain
3. Monzó C, Liu T, Conesa A. Transcriptomics in the era of long-read sequencing. Nat Rev Genet 2025:10.1038/s41576-025-00828-z. [PMID: 40155769 DOI: 10.1038/s41576-025-00828-z] [Accepted: 02/20/2025] [Indexed: 04/01/2025]
Abstract
Transcriptome sequencing revolutionized the analysis of gene expression, providing an unbiased approach to gene detection and quantification that enabled the discovery of novel isoforms, alternative splicing events and fusion transcripts. However, although short-read sequencing technologies have surpassed the limited dynamic range of previous technologies such as microarrays, they have limitations, for example, in resolving full-length transcripts and complex isoforms. Over the past 5 years, long-read sequencing technologies have matured considerably, with improvements in instrumentation and analytical methods, enabling their application to RNA sequencing (RNA-seq). Benchmarking studies are beginning to identify the strengths and limitations of long-read RNA-seq, although there remains a need for comprehensive resources to guide newcomers through the intricacies of this approach. In this Review, we provide a comprehensive overview of the long-read RNA-seq workflow, from library preparation and sequencing challenges to core data processing, downstream analyses and emerging developments. We present an extensive inventory of experimental and analytical methods and discuss current challenges and prospects.
Affiliation(s)
- Carolina Monzó
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Tianyuan Liu
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
4. Santucci K, Cheng Y, Xu SM, Janitz M. Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches. Brief Funct Genomics 2024; 23:683-694. [PMID: 39158328 DOI: 10.1093/bfgp/elae031] [Received: 05/22/2024] [Revised: 07/29/2024] [Accepted: 07/31/2024] [Indexed: 08/20/2024] Open
Abstract
Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models in comparison to more common and earlier methods, such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancements of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on what bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, with 25 tools being presented. Furthermore, this review intends to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.
Affiliation(s)
- Kristina Santucci
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
5. Qi J, Li Z, Zhang YZ, Li G, Gao X, Han R. TDFPS-Designer: an efficient toolkit for barcode design and selection in nanopore sequencing. Genome Biol 2024; 25:285. [PMID: 39497190 PMCID: PMC11533379 DOI: 10.1186/s13059-024-03423-3] [Received: 11/14/2023] [Accepted: 10/17/2024] [Indexed: 11/08/2024] Open
Abstract
Oxford Nanopore Technologies (ONT) offers ultrahigh-throughput multi-sample sequencing but only provides barcode kits that enable up to 96-sample multiplexing. We present TDFPS-Designer, a new toolkit for nanopore sequencing barcode design, which creates significantly more barcodes: 137 with a length of 20 base pairs, 410 at 24 bp, and 1779 at 30 bp, far surpassing ONT's offerings. It includes GPU-based acceleration for ultra-fast demultiplexing and designs robust barcodes suitable for high-error ONT data. TDFPS-Designer outperforms current methods, improving the demultiplexing recall rate by 20% relative to Guppy, without a reduction in precision.
Affiliation(s)
- Junhai Qi
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
- Zhengyi Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
- Yao-Zhong Zhang
- Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, 108-8639, Japan
- Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Makkah, 23955, Saudi Arabia
- Renmin Han
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
6. Rosani U, Bortoletto E, Zhang X, Huang BW, Xin LS, Krupovic M, Bai CM. Long-read transcriptomics of Ostreid herpesvirus 1 uncovers a conserved expression strategy for the capsid maturation module and pinpoints a mechanism for evasion of the ADAR-based antiviral defence. Virus Evol 2024; 10:veae088. [PMID: 39555210 PMCID: PMC11565193 DOI: 10.1093/ve/veae088] [Received: 05/06/2024] [Revised: 10/01/2024] [Accepted: 10/15/2024] [Indexed: 11/19/2024] Open
Abstract
Ostreid herpesvirus 1 (OsHV-1), a member of the family Malacoherpesviridae (order Herpesvirales), is a major pathogen of bivalves. However, the molecular details of the malacoherpesvirus infection cycle and its overall similarity to the replication of mammalian herpesviruses (family Orthoherpesviridae) remain obscure. Here, to gain insights into the OsHV-1 biology, we performed long-read sequencing of infected blood clams, Anadara broughtonii, which yielded over one million OsHV-1 long reads. These data enabled the annotation of the viral genome with 78 gene units and 274 transcripts, of which 67 were polycistronic mRNAs, 35 ncRNAs, and 20 natural antisense transcripts (NATs). Transcriptomics and proteomics data indicate preferential transcription and independent translation of the capsid scaffold protein as an OsHV-1 capsid maturation protease isoform. The conservation of this transcriptional architecture across Herpesvirales likely indicates its functional importance and ancient origin. Moreover, we traced RNA editing events using short-read sequencing and supported the presence of inosine nucleotides in native OsHV-1 RNA, consistent with the activity of adenosine deaminase acting on dsRNA 1 (ADAR1). Our data suggest that, whereas RNA hyper-editing is concentrated in specific regions of the OsHV-1 genome, single-nucleotide editing is more dispersed along the OsHV-1 transcripts. In conclusion, we reveal the existence of conserved pan-Herpesvirales transcriptomic architecture of the capsid maturation module and uncover a transcription-based viral counter defence mechanism, which presumably facilitates the evasion of the host ADAR antiviral system.
Affiliation(s)
- Umberto Rosani
- Department of Biology, University of Padova, Via U. Bassi, 58/B, Padova 35121, Italy
- Enrico Bortoletto
- Department of Biology, University of Padova, Via U. Bassi, 58/B, Padova 35121, Italy
- Xiang Zhang
- State Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Key Laboratory of Maricultural Organism Disease Control, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Rd, Qingdao 266071, China
- Bo-Wen Huang
- State Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Key Laboratory of Maricultural Organism Disease Control, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Rd, Qingdao 266071, China
- Lu-Sheng Xin
- State Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Key Laboratory of Maricultural Organism Disease Control, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Rd, Qingdao 266071, China
- Mart Krupovic
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Archaeal Virology Unit, 25 rue du Dr. Roux, Paris 75015, France
- Chang-Ming Bai
- State Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Key Laboratory of Maricultural Organism Disease Control, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Rd, Qingdao 266071, China
- Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, 168 Wenhai Rd, Qingdao 266237, China
7. Piątkowski J, Koźluk K, Golik P. Mitochondrial transcriptome of Candida albicans in flagranti - direct RNA sequencing reveals a new layer of information. BMC Genomics 2024; 25:860. [PMID: 39277734 PMCID: PMC11401289 DOI: 10.1186/s12864-024-10791-4] [Received: 07/12/2024] [Accepted: 09/10/2024] [Indexed: 09/17/2024] Open
Abstract
BACKGROUND: Organellar transcriptomes are relatively under-studied systems, with data related to full-length transcripts and posttranscriptional modifications remaining sparse. Direct RNA sequencing presents the possibility of accessing a previously unavailable layer of information pertaining to transcriptomic data, as well as circumventing the biases introduced by second-generation RNA-seq platforms. Direct long-read ONT sequencing allows for the isoform analysis of full-length transcripts and the detection of posttranscriptional modifications. However, there are still relatively few projects employing this method specifically for studying organellar transcriptomes.
RESULTS: Candida albicans is a promising model for investigating nucleo-mitochondrial interactions. This work comprises ONT sequencing of the Candida albicans mitochondrial transcriptome along with the development of a dedicated data analysis pipeline. This approach allowed for the detection of complete transcript isoforms and posttranscriptional RNA modifications, as well as an analysis of C. albicans deletion mutants in genes coding for the 5' and 3' mitochondrial RNA exonucleases CaPET127 and CaDSS1. It also enabled corrections to previous studies in terms of 3' and 5' transcript ends. A number of intermediate splicing isoforms were also discovered, along with mature and unspliced transcripts and changes in their abundances resulting from disruption of both 5' and 3' exonucleolytic processing. Multiple putative posttranscriptional modification sites have also been detected.
CONCLUSIONS: This preliminary work demonstrates the suitability of direct RNA sequencing for studying yeast mitochondrial transcriptomes in general and provides new insights into the workings of the C. albicans mitochondrial transcriptome in particular. It also provides a general roadmap for analyzing mitochondrial transcriptomic data from other organisms.
Affiliation(s)
- Jakub Piątkowski
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106, Warsaw, Poland
- Kacper Koźluk
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106, Warsaw, Poland
- Paweł Golik
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, 02-106, Warsaw, Poland
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-106, Warsaw, Poland
8. Hou Z, Yang S, He W, Lu T, Feng X, Zang L, Bai W, Chen X, Nie B, Li C, Wei M, Ma L, Han Z, Zou Q, Li W, Wang L. The haplotype-resolved genome of diploid Chrysanthemum indicum unveils new acacetin synthases genes and their evolutionary history. Plant J 2024. [PMID: 38864745 DOI: 10.1111/tpj.16854] [Received: 11/21/2023] [Revised: 03/31/2024] [Accepted: 05/03/2024] [Indexed: 06/13/2024]
Abstract
Acacetin, a flavonoid compound, possesses a wide range of pharmacological effects, including antimicrobial, immune regulation, and anticancer effects. Some key steps in its biosynthetic pathway were largely unknown in flowering plants. Here, we present the first haplotype-resolved genome of Chrysanthemum indicum, whose dried flowers contain abundant flavonoids and have been utilized as traditional Chinese medicine. Various phylogenetic analyses revealed an almost equal proportion of three tree topologies among three Chrysanthemum species (C. indicum, C. nankingense, and C. lavandulifolium), indicating that frequent gene flow among Chrysanthemum species or incomplete lineage sorting due to rapid speciation might contribute to conflicting topologies. The expanded gene families in C. indicum were associated with oxidative functions. Through comprehensive candidate gene screening, we identified five flavonoid O-methyltransferase (FOMT) candidates, which were highly expressed in flowers and whose expression levels were significantly correlated with the content of acacetin. Further experiments validated that two FOMTs (CI02A009970 and CI03A006662) were capable of catalyzing the conversion of apigenin into acacetin, and these two genes are possibly responsible for acacetin accumulation in disc florets and young leaves, respectively. Furthermore, combined analyses of ancestral chromosome reconstruction and phylogenetic trees revealed the distinct evolutionary fates of the two validated FOMT genes. Our study provides new insights into the biosynthetic pathway of flavonoid compounds in the Asteraceae family and offers a model for tracing the origin and evolutionary routes of single genes. These findings will facilitate in vitro biosynthetic production of flavonoid compounds through cellular and metabolic engineering and expedite molecular breeding of C. indicum cultivars.
Affiliation(s)
- Zhuangwei Hou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Song Yang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Weijun He
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Tingting Lu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Xunmeng Feng
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lanlan Zang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Wenhui Bai
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Xueqing Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Bao Nie
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Cheng Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Min Wei
- China Resources Sanjiu Medical and Pharmaceutical Co., Ltd, Shenzhen, 518110, China
- Liangju Ma
- China Resources Sanjiu Medical and Pharmaceutical Co., Ltd, Shenzhen, 518110, China
- Zhengzhou Han
- China Resources Sanjiu Medical and Pharmaceutical Co., Ltd, Shenzhen, 518110, China
- Qingjun Zou
- China Resources Sanjiu Medical and Pharmaceutical Co., Ltd, Shenzhen, 518110, China
- National Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, National Resource Center for Chinese Materia Medica, Chinese Academy of Chinese Medical Sciences, Beijing, 100700, China
- Wei Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Li Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, Beijing, 100700, China
9. Sneddon A, Ravindran A, Shanmuganandam S, Kanchi M, Hein N, Jiang S, Shirokikh N, Eyras E. Biochemical-free enrichment or depletion of RNA classes in real-time during direct RNA sequencing with RISER. Nat Commun 2024; 15:4422. [PMID: 38789440 PMCID: PMC11126589 DOI: 10.1038/s41467-024-48673-8] [Received: 02/29/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024] Open
Abstract
The heterogeneous composition of cellular transcriptomes poses a major challenge for detecting weakly expressed RNA classes, as they can be obscured by abundant RNAs. Although biochemical protocols can enrich or deplete specified RNAs, they are time-consuming, expensive and can compromise RNA integrity. Here we introduce RISER, a biochemical-free technology for the real-time enrichment or depletion of RNA classes. RISER performs selective rejection of molecules during direct RNA sequencing by identifying RNA classes directly from nanopore signals with deep learning and communicating with the sequencing hardware in real time. By targeting the dominant messenger and mitochondrial RNA classes for depletion, RISER reduces their respective read counts by more than 85%, resulting in an increase in sequencing depth of 47% on average for long non-coding RNAs. We also apply RISER for the depletion of globin mRNA in whole blood, achieving a decrease in globin reads by more than 90% as well as an increase in non-globin reads by 16% on average. Furthermore, using a GPU or a CPU, RISER is faster than GPU-accelerated basecalling and mapping. RISER's modular and retrainable software and intuitive command-line interface allow easy adaptation to other RNA classes. RISER is available at https://github.com/comprna/riser.
Affiliation(s)
- Alexandra Sneddon
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
- Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Agin Ravindran
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
- Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Somasundhari Shanmuganandam
- Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
- Madhu Kanchi
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Nadine Hein
- ACRF Department of Cancer Biology and Therapeutics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Simon Jiang
- Department of Immunity, Inflammation and Infection, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Centre for Personalised Immunology, NHMRC Centre for Research Excellence, Australian National University, Canberra, ACT 2601, Australia
- Department of Renal Medicine, The Canberra Hospital, Canberra, ACT 2605, Australia
- Nikolay Shirokikh
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Eduardo Eyras
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2601, Australia
- Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
10. Nanes Sarfati D, Xue Y, Song ES, Byrne A, Le D, Darmanis S, Quake SR, Burlacot A, Sikes J, Wang B. Coordinated wound responses in a regenerative animal-algal holobiont. Nat Commun 2024; 15:4032. [PMID: 38740753 DOI: 10.1038/s41467-024-48366-2] [Received: 06/20/2023] [Accepted: 04/24/2024] [Indexed: 05/16/2024] Open
Abstract
Animal regeneration involves coordinated responses across cell types throughout the animal body. In endosymbiotic animals, whether and how symbionts react to host injury and how cellular responses are integrated across species remain unexplored. Here, we study the acoel Convolutriloba longifissura, which hosts symbiotic Tetraselmis sp. green algae and can regenerate entire bodies from tissue fragments. We show that animal injury causes a decline in the photosynthetic efficiency of the symbiotic algae, alongside two distinct, sequential waves of transcriptional responses in acoel and algal cells. The initial algal response is characterized by the upregulation of a cohort of photosynthesis-related genes, though photosynthesis is not necessary for regeneration. A conserved animal transcription factor, runt, is induced after injury and required for acoel regeneration. Knockdown of Cl-runt dampens transcriptional responses in both species and further reduces algal photosynthetic efficiency post-injury. Our results suggest that the holobiont functions as an integrated unit of biological organization by coordinating molecular networks across species through the runt-dependent animal regeneration program.
Affiliation(s)
- Yuan Xue
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Eun Sun Song
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Daniel Le
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Stephen R Quake
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Adrien Burlacot
- Department of Biology, Stanford University, Stanford, CA, USA
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
- James Sikes
- Department of Biology, University of San Francisco, San Francisco, CA, USA
- Bo Wang
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
| |
Collapse
|
11
|
Ma J, Zhao X, Qi E, Han R, Yu T, Li G. Highly efficient clustering of long-read transcriptomic data with GeLuster. Bioinformatics 2024; 40:btae059. [PMID: 38310330 PMCID: PMC10881092 DOI: 10.1093/bioinformatics/btae059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 01/08/2024] [Accepted: 01/30/2024] [Indexed: 02/05/2024] Open
Abstract
MOTIVATION The advancement of long-read RNA sequencing technologies promises a bright future for transcriptome analysis, in which clustering long reads according to their gene family of origin is of great importance. However, existing de novo clustering algorithms require substantial computing resources. RESULTS We developed a new algorithm, GeLuster, for clustering long RNA-seq reads. Based on our tests on one simulated dataset and nine real datasets, GeLuster exhibited superior performance. On the tested Nanopore datasets it ran 2.9-17.5 times as fast as the second-fastest method with less than one-seventh of the memory consumption, while achieving higher clustering accuracy. GeLuster performed similarly on the PacBio data. It sets the stage for large-scale transcriptome studies in the future. AVAILABILITY AND IMPLEMENTATION GeLuster is freely available at https://github.com/yutingsdu/GeLuster.
Affiliation(s)
- Junchi Ma
  - Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Xiaoyu Zhao
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Enfeng Qi
  - School of Mathematics and Statistics, Guangxi Normal University, Guilin 541000, China
- Renmin Han
  - Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
- Ting Yu
  - Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
- Guojun Li
  - Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
|
12
|
Benotmane JK, Kueckelhaus J, Will P, Zhang J, Ravi VM, Joseph K, Sankowski R, Beck J, Lee-Chang C, Schnell O, Heiland DH. High-sensitive spatially resolved T cell receptor sequencing with SPTCR-seq. Nat Commun 2023; 14:7432. [PMID: 37973846 PMCID: PMC10654577 DOI: 10.1038/s41467-023-43201-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/02/2022] [Accepted: 11/03/2023] [Indexed: 11/19/2023] Open
Abstract
Spatial resolution of the T cell repertoire is essential for deciphering cancer-associated immune dysfunction. Current spatially resolved transcriptomic technologies are unable to directly annotate T cell receptors (TCR). We present spatially resolved T cell receptor sequencing (SPTCR-seq), which integrates optimized target enrichment and long-read sequencing for highly sensitive TCR sequencing. The SPTCR computational pipeline achieves yield and coverage per TCR comparable to alternative single-cell TCR technologies. Our comparison of PCR-based and SPTCR-seq methods underscores SPTCR-seq's superior ability to reconstruct the entire TCR architecture, including V, D, J regions and the complementarity-determining region 3 (CDR3). Employing SPTCR-seq, we assess local T cell diversity and clonal expansion across spatially discrete niches. Exploration of the reciprocal interaction of the tumor microenvironment and T cells discloses the critical involvement of NK and B cells in T cell exhaustion. Integrating spatially resolved omics and TCR sequencing provides a robust tool for exploring T cell dysfunction in cancers and beyond.
Affiliation(s)
- Jasim Kada Benotmane
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
  - Microenvironment and Immunology Research Laboratory, Medical Center-University of Freiburg, Freiburg, Germany
- Jan Kueckelhaus
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
  - Microenvironment and Immunology Research Laboratory, Medical Center-University of Freiburg, Freiburg, Germany
- Paulina Will
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
  - Microenvironment and Immunology Research Laboratory, Medical Center-University of Freiburg, Freiburg, Germany
- Junyi Zhang
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
  - Microenvironment and Immunology Research Laboratory, Medical Center-University of Freiburg, Freiburg, Germany
- Vidhya M Ravi
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
  - Microenvironment and Immunology Research Laboratory, Medical Center-University of Freiburg, Freiburg, Germany
  - Translational NeuroOncology Research Group, Medical Center-University of Freiburg, Freiburg, Germany
- Kevin Joseph
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
  - Microenvironment and Immunology Research Laboratory, Medical Center-University of Freiburg, Freiburg, Germany
  - Translational NeuroOncology Research Group, Medical Center-University of Freiburg, Freiburg, Germany
  - Center for NeuroModulation (NeuroModul), University of Freiburg, Freiburg, Germany
- Roman Sankowski
  - Institute of Neuropathology, Medical Center-University of Freiburg, Freiburg, Germany
- Jürgen Beck
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
- Catalina Lee-Chang
  - Department of Neurological Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Oliver Schnell
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
  - Translational NeuroOncology Research Group, Medical Center-University of Freiburg, Freiburg, Germany
- Dieter Henrik Heiland
  - Department of Neurosurgery, Medical Center - University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, Freiburg University, Freiburg, Germany
  - Microenvironment and Immunology Research Laboratory, Medical Center-University of Freiburg, Freiburg, Germany
  - Department of Neurological Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
  - German Cancer Consortium (DKTK), partner site Freiburg, Freiburg, Germany
|
13
|
Kainth AS, Haddad GA, Hall JM, Ruthenburg AJ. Merging short and stranded long reads improves transcript assembly. PLoS Comput Biol 2023; 19:e1011576. [PMID: 37883581 PMCID: PMC10629667 DOI: 10.1371/journal.pcbi.1011576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 11/07/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023] Open
Abstract
Long-read RNA sequencing has arisen as a counterpart to short-read sequencing, with the potential to capture full-length isoforms, albeit at the cost of lower depth. Yet this potential is not fully realized due to inherent limitations of current long-read assembly methods and underdeveloped approaches to integrate short-read data. Here, we critically compare the existing methods and develop a new integrative approach to characterize a particularly challenging pool of low-abundance long noncoding RNA (lncRNA) transcripts from short- and long-read sequencing in two distinct cell lines. Our analysis reveals severe limitations in each of the sequencing platforms. For short-read assemblies, coverage declines at transcript termini resulting in ambiguous ends, and uneven low coverage results in segmentation of a single transcript into multiple transcripts. Conversely, long-read sequencing libraries lack depth and strand-of-origin information in cDNA-based methods, culminating in erroneous assembly and quantitation of transcripts. We also discover a cDNA synthesis artifact in long-read datasets that markedly impacts the identity and quantitation of assembled transcripts. Towards remediating these problems, we develop a computational pipeline to "strand" long-read cDNA libraries that rectifies inaccurate mapping and assembly of long-read transcripts. Leveraging the strengths of each platform and our computational stranding, we also present and benchmark a hybrid assembly approach that drastically increases the sensitivity and accuracy of full-length transcript assembly on the correct strand and improves detection of biological features of the transcriptome. When applied to a challenging set of under-annotated and cell-type variable lncRNA, our method resolves the segmentation problem of short-read sequencing and the depth problem of long-read sequencing, resulting in the assembly of coherent transcripts with precise 5' and 3' ends. 
Our workflow can be applied to existing datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.
Affiliation(s)
- Amoldeep S. Kainth
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Gabriela A. Haddad
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, United States of America
- Johnathon M. Hall
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Alexander J. Ruthenburg
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, United States of America
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois, United States of America
|
14
|
Vachon A, Seo GE, Patel NH, Coffin CS, Marinier E, Eyras E, Osiowy C. Hepatitis B virus serum RNA transcript isoform composition and proportion in chronic hepatitis B patients by nanopore long-read sequencing. Front Microbiol 2023; 14:1233178. [PMID: 37645229 PMCID: PMC10461054 DOI: 10.3389/fmicb.2023.1233178] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/01/2023] [Accepted: 07/31/2023] [Indexed: 08/31/2023] Open
Abstract
Introduction Serum hepatitis B virus (HBV) RNA is a promising new biomarker to manage and predict clinical outcomes of chronic hepatitis B (CHB) infection. However, the HBV serum transcriptome within encapsidated particles, which is the biomarker analyte measured in serum, remains poorly characterized. This study aimed to evaluate serum HBV RNA transcript composition and proportionality by PCR-cDNA nanopore sequencing of samples from CHB patients having varied HBV genotype (gt, A to F) and HBeAg status. Methods Longitudinal specimens from 3 individuals during and following pregnancy (approximately 7 months between time points) were also investigated. HBV RNA extracted from 16 serum samples obtained from 13 patients (73.3% female, 84.6% Asian) was sequenced, and serum HBV RNA isoform detection and quantification were performed using three bioinformatic workflows: FLAIR, RATTLE, and a GraphMap-based workflow within the Galaxy application. A spike-in RNA variant (SIRV) control mix was used to assess run quality and coverage. The proportionality of transcript isoforms was based on total HBV reads determined by each workflow. Results All chosen isoform detection workflows showed high agreement in transcript proportionality and composition for most samples. HBV pregenomic RNA (pgRNA) was the most frequently observed transcript isoform (93.8% of patient samples), while other detected transcripts included pgRNA spliced variants, 3' truncated variants and HBx mRNA, depending on the isoform detection method. Spliced variants of pgRNA were primarily observed in HBV gtB, C, E, or F-infected patients, with the Sp1 spliced variant detected most frequently. Twelve other pgRNA spliced variant transcripts were identified, including 3 previously unidentified transcripts, although spliced isoform identification was very dependent on the workflow used to analyze sequence data. 
Longitudinal sampling among pregnant and post-partum antiviral-treated individuals showed increasing proportions of 3' truncated pgRNA variants over time. Conclusions This study demonstrated long-read sequencing as a promising tool for the characterization of the serum HBV transcriptome. However, further studies are needed to better understand how serum HBV RNA isoform type and proportion are linked to CHB disease progression and antiviral treatment response.
Affiliation(s)
- Alicia Vachon
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
- Grace E. Seo
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
- Nishi H. Patel
  - Department of Medicine and Department of Microbiology, Immunology, and Infectious Diseases, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Carla S. Coffin
  - Department of Medicine and Department of Microbiology, Immunology, and Infectious Diseases, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Eric Marinier
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
- Eduardo Eyras
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT, Australia
  - The John Curtin School of Medical Research, ANU College of Health and Medicine, Canberra, ACT, Australia
  - Catalan Institution for Research and Advanced Studies, Barcelona, Spain
  - Hospital del Mar Medical Research Institute, Barcelona, Spain
- Carla Osiowy
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
|
15
|
Petri AJ, Sahlin K. isONform: reference-free transcriptome reconstruction from Oxford Nanopore data. Bioinformatics 2023; 39:i222-i231. [PMID: 37387174 PMCID: PMC10311309 DOI: 10.1093/bioinformatics/btad264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION With advances in long-read transcriptome sequencing, we can now fully sequence transcripts, which greatly improves our ability to study transcription processes. A popular long-read transcriptome sequencing technique is Oxford Nanopore Technologies (ONT), which through its cost-effective sequencing and high throughput, has the potential to characterize the transcriptome in a cell. However, due to transcript variability and sequencing errors, long cDNA reads need substantial bioinformatic processing to produce a set of isoform predictions from the reads. Several genome and annotation-based methods exist to produce transcript predictions. However, such methods require high-quality genomes and annotations and are limited by the accuracy of long-read splice aligners. In addition, gene families with high heterogeneity may not be well represented by a reference genome and would benefit from reference-free analysis. Reference-free methods to predict transcripts from ONT, such as RATTLE, exist, but their sensitivity is not comparable to reference-based approaches. RESULTS We present isONform, a high-sensitivity algorithm to construct isoforms from ONT cDNA sequencing data. The algorithm is based on iterative bubble popping on gene graphs built from fuzzy seeds from the reads. Using simulated, synthetic, and biological ONT cDNA data, we show that isONform has substantially higher sensitivity than RATTLE albeit with some loss in precision. On biological data, we show that isONform's predictions have substantially higher consistency with the annotation-based method StringTie2 compared with RATTLE. We believe isONform can be used both for isoform construction for organisms without well-annotated genomes and as an orthogonal method to verify predictions of reference-based methods. AVAILABILITY AND IMPLEMENTATION https://github.com/aljpetri/isONform.
Affiliation(s)
- Alexander J Petri
  - Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
- Kristoffer Sahlin
  - Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
|
16
|
Javaran VJ, Poursalavati A, Lemoyne P, Ste-Croix DT, Moffett P, Fall ML. NanoViromics: long-read sequencing of dsRNA for plant virus and viroid rapid detection. Front Microbiol 2023; 14:1192781. [PMID: 37415816 PMCID: PMC10320856 DOI: 10.3389/fmicb.2023.1192781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 06/06/2023] [Indexed: 07/08/2023] Open
Abstract
There is a global need for identifying viral pathogens, as well as for providing certified clean plant materials, in order to limit the spread of viral diseases. A key component of management programs for viral-like diseases is having a diagnostic tool that is quick, reliable, inexpensive, and easy to use. We have developed and validated a dsRNA-based nanopore sequencing protocol as a reliable method for detecting viruses and viroids in grapevines. We compared our method, which we term direct-cDNA sequencing from dsRNA (dsRNAcD), to direct RNA sequencing from rRNA-depleted total RNA (rdTotalRNA), and found that it provided more viral reads from infected samples. Indeed, dsRNAcD was able to detect all of the viruses and viroids detected using Illumina MiSeq sequencing (dsRNA-MiSeq). Furthermore, dsRNAcD sequencing was also able to detect low-abundance viruses that rdTotalRNA sequencing failed to detect. Additionally, rdTotalRNA sequencing resulted in a false-positive viroid identification due to the misannotation of a host-driven read. Two taxonomic classification workflows, DIAMOND & MEGAN (DIA & MEG) and Centrifuge & Recentrifuge (Cent & Rec), were also evaluated for quick and accurate read classification. Although the results from both workflows were similar, we identified pros and cons for both workflows. Our study shows that dsRNAcD sequencing and the proposed data analysis workflows are suitable for consistent detection of viruses and viroids, particularly in grapevines where mixed viral infections are common.
Affiliation(s)
- Vahid J. Javaran
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
  - Centre SÈVE, Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Abdonaser Poursalavati
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
  - Centre SÈVE, Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Pierre Lemoyne
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
- Dave T. Ste-Croix
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
  - Département de phytologie, Faculté des Sciences de l’Agriculture et de l’Alimentation, Université Laval, Québec, QC, Canada
- Peter Moffett
  - Centre SÈVE, Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Mamadou L. Fall
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
|
17
|
Paisey EK, Santosa E, Kurniawati A, Supijatno, Matra DD. Long-reads-based transcriptome dataset from leaves of lime, Citrus aurantiifolia (Christm.) Swingle treated by ethephon and abscisic acid. Data Brief 2023; 48:109167. [PMID: 37206898 PMCID: PMC10189083 DOI: 10.1016/j.dib.2023.109167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 04/04/2023] [Accepted: 04/12/2023] [Indexed: 09/19/2024] Open
Abstract
The lime plant is a horticultural plant that grows in tropical regions. Pruning is one of the cultivation practices used to increase lime fruit production, but it incurs high production costs. Phytohormones such as ethylene and abscisic acid help regulate the shedding of leaves and branches. The study aimed to identify genes in lime involved in the self-pruning process during ethephon and abscisic acid treatments. Total RNA was extracted and subjected to long-read sequencing using a PCR-cDNA sequencing kit from Oxford Nanopore Technologies. A total of 5,914 transcripts were produced using the RATTLE program, ranging from 201 to 8,156 bp, with an N50 of 1,292 bp. The RNA-seq dataset is available as raw sequence reads that scientists can further process and analyze, and the data can be helpful for breeding lime varieties that shed branches and leaves.
Affiliation(s)
- Elda Kristiani Paisey
  - Agronomy and Horticulture Study Program, Graduate School of IPB University, Bogor, Indonesia
  - Agrotechnology Study Program, Faculty of Agriculture, Papua University, Manokwari, Indonesia
- Edi Santosa
  - Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Bogor, Indonesia
- Ani Kurniawati
  - Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Bogor, Indonesia
- Supijatno
  - Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Bogor, Indonesia
- Deden Derajat Matra
  - Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Bogor, Indonesia
|
18
|
Nip KM, Hafezqorani S, Gagalova KK, Chiu R, Yang C, Warren RL, Birol I. Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2. Nat Commun 2023; 14:2940. [PMID: 37217540 PMCID: PMC10202958 DOI: 10.1038/s41467-023-38553-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/07/2022] [Accepted: 05/08/2023] [Indexed: 05/24/2023] Open
Abstract
Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, potentially spanning entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based and, to date, there has been little focus on reference-free transcriptome assembly. We introduce "RNA-Bloom2 [ https://github.com/bcgsc/RNA-Bloom ]", a reference-free assembly method for long-read transcriptome sequencing data. Using simulated datasets and spike-in control data, we show that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods. Furthermore, we find that RNA-Bloom2 requires 27.0 to 80.6% of the peak memory and 3.6 to 10.8% of the total wall-clock runtime of a competing reference-free method. Finally, we showcase RNA-Bloom2 in assembling a transcriptome sample of Picea sitchensis (Sitka spruce). Since our method does not rely on a reference, it further sets the groundwork for large-scale comparative transcriptomics where high-quality draft genome assemblies are not readily available.
Affiliation(s)
- Ka Ming Nip
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, V5Z 4S6, Canada
- Saber Hafezqorani
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, V5Z 4S6, Canada
- Kristina K Gagalova
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, V5Z 4S6, Canada
- Readman Chiu
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Chen Yang
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, V5Z 4S6, Canada
- René L Warren
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
|
19
|
Gao L, Xu W, Xin T, Song J. Application of third-generation sequencing to herbal genomics. Front Plant Sci 2023; 14:1124536. [PMID: 36959935 PMCID: PMC10027759 DOI: 10.3389/fpls.2023.1124536] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/15/2022] [Accepted: 02/02/2023] [Indexed: 06/18/2023]
Abstract
There is a long history of traditional medicine use. However, little genetic information is available for the plants used in traditional medicine, which limits the exploitation of these natural resources. Third-generation sequencing (TGS) techniques have made it possible to gather invaluable genetic information and develop herbal genomics. In this review, we introduce two main TGS techniques, PacBio SMRT technology and Oxford Nanopore technology, and compare the two techniques against Illumina, the predominant next-generation sequencing technique. In addition, we summarize the nuclear and organelle genome assemblies of commonly used medicinal plants, choose several examples from genomics, transcriptomics, and molecular identification studies to dissect the specific processes and summarize the advantages and disadvantages of the two TGS techniques when applied to medicinal organisms. Finally, we describe how we expect that TGS techniques will be widely utilized to assemble telomere-to-telomere (T2T) genomes and in epigenomics research involving medicinal plants.
|
20
|
Kirov I, Merkulov P, Polkhovskaya E, Konstantinov Z, Kazancev M, Saenko K, Polkhovskiy A, Dudnikov M, Garibyan T, Demurin Y, Soloviev A. Epigenetic Stress and Long-Read cDNA Sequencing of Sunflower (Helianthus annuus L.) Revealed the Origin of the Plant Retrotranscriptome. Plants (Basel) 2022; 11:3579. [PMID: 36559691 PMCID: PMC9784723 DOI: 10.3390/plants11243579] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/17/2022] [Revised: 12/13/2022] [Accepted: 12/13/2022] [Indexed: 06/12/2023]
Abstract
Transposable elements (TEs) contribute not only to genome diversity but also to transcriptome diversity in plants. To unravel the sources of LTR retrotransposon (RTE) transcripts in sunflower, we exploited a recently developed transposon activation method ('TEgenesis') along with long-read cDNA Nanopore sequencing. This approach allowed the identification of 56 RTE transcripts from different genomic loci, including full-length and non-autonomous RTEs. Using mobilome analysis, we provided a new set of expressed and transpositionally active sunflower RTEs for future studies. Among them, a Ty3/Gypsy RTE called SUNTY3 exhibited ongoing transposition activity, as detected by eccDNA analysis. We showed that the sunflower genome contains a diverse set of non-autonomous RTEs encoding a single RTE protein, including the previously described TR-GAG (terminal repeat with the GAG domain) as well as new categories, TR-RT-RH, TR-RH, and TR-INT-RT. Our results demonstrate that 40% of the loci for RTE-related transcripts (nonLTR-RTEs) lack their LTR sequences and resemble conventional eukaryotic genes encoding RTE-related proteins with unknown functions. Phylogenetic analysis showed that three nonLTR-RTEs encode GAG (HadGAG1-3) fused to a host protein. These HadGAG proteins have homologs in other plant species, potentially indicating GAG domestication. Ultimately, we found that the sunflower retrotranscriptome originated from the transcription of active RTEs, non-autonomous RTEs, and gene-like RTE transcripts, including those encoding domesticated proteins.
Collapse
Affiliation(s)
- Ilya Kirov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Pavel Merkulov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Ekaterina Polkhovskaya
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Zakhar Konstantinov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Mikhail Kazancev
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Ksenia Saenko
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Federal Research Center of Biological Plant Protection, 350039 Krasnodar, Russia
- Alexander Polkhovskiy
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
  - Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
- Maxim Dudnikov
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Tsovinar Garibyan
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Yakov Demurin
  - Pustovoit All-Russia Research Institute of Oilseed Crops, Filatova St. 17, 350038 Krasnodar, Russia
- Alexander Soloviev
  - All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
|
21
|
Orabi B, Xie N, McConeghy B, Dong X, Chauve C, Hach F. Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing. Nucleic Acids Res 2022; 51:e11. [PMID: 36478271 PMCID: PMC9881145 DOI: 10.1093/nar/gkac1112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/19/2022] [Revised: 10/26/2022] [Accepted: 11/08/2022] [Indexed: 12/13/2022]
Abstract
Alternative splicing (AS) is an important mechanism in the development of many cancers, as novel or aberrant AS patterns can act as independent onco-drivers. In addition, cancer-specific AS is potentially an effective target of personalized cancer therapeutics. However, detecting AS events remains a challenging task, especially if these AS events are novel. This is exacerbated by the fact that existing transcriptome annotation databases are far from comprehensive, especially with regard to cancer-specific AS. Additionally, traditional sequencing technologies are severely limited by the short length of the generated reads, which rarely span more than a single splice junction site. Given these challenges, transcriptomic long-read (LR) sequencing holds promise for the detection and discovery of AS. We present Freddie, a computational annotation-independent isoform discovery and detection tool. Freddie takes as input transcriptomic LR sequencing of a sample alongside its genomic split alignment and computes a set of isoforms for the given sample. To do so, it first partitions the input reads into sets that can be processed independently and in parallel. For each partition, Freddie segments the genomic alignment of the reads into canonical exon segments. The goal of this segmentation is to be able to represent any potential isoform as a subset of these canonical exons. This segmentation is formulated as an optimization problem and is solved with a dynamic programming algorithm. Then, Freddie reconstructs the isoforms by jointly clustering and error-correcting the reads using the canonical segmentation as a succinct representation. The clustering and error-correcting step is formulated as an optimization problem, the Minimum Error Clustering into Isoforms (MErCi) problem, and is solved using integer linear programming (ILP).
We compare the performance of Freddie on simulated datasets with that of other isoform detection tools with varying dependence on annotation databases. We show that Freddie outperforms the other tools in accuracy, including those given the complete ground truth annotation. We also run Freddie on a transcriptomic LR dataset generated in-house from a prostate cancer cell line with a matched short-read RNA-seq dataset. Freddie produces isoforms with a higher short-read cross-validation rate than the other tested tools. Freddie is open source and available at https://github.com/vpc-ccg/freddie/.
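The idea of representing each read as a subset of canonical exon segments can be illustrated with a minimal Python sketch. This is not Freddie's algorithm: the abstract describes a noise-tolerant dynamic programming segmentation, whereas the sketch below simply cuts at every observed alignment boundary. The function names and the half-open `(start, end)` interval format are assumptions made for illustration.

```python
def canonical_segments(reads):
    """Naive segmentation: cut the locus at every alignment boundary.

    `reads` is a list of reads; each read is a list of half-open
    (start, end) genomic intervals taken from its split alignment.
    (Hypothetical input format, not Freddie's.)
    """
    breakpoints = sorted({pos for read in reads for interval in read for pos in interval})
    # Consecutive breakpoints delimit the candidate segments.
    return list(zip(breakpoints, breakpoints[1:]))


def read_as_segments(read, segments):
    """Binary vector: 1 if an aligned interval of the read fully covers the segment."""
    return [
        int(any(start <= seg_start and seg_end <= end for start, end in read))
        for seg_start, seg_end in segments
    ]


reads = [
    [(0, 100), (200, 300)],  # read using the longer form of the second exon
    [(0, 100), (250, 300)],  # read using a shorter form with a later start
]
segments = canonical_segments(reads)
vectors = [read_as_segments(read, segments) for read in reads]
```

With real long reads, alignment boundaries are noisy, which is why an optimization-based segmentation such as the one described in the abstract is needed rather than this direct cut.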
Affiliation(s)
- Baraa Orabi
  - Department of Computer Science, the University of British Columbia, Vancouver, BC, Canada
- Ning Xie
  - Vancouver Prostate Centre, Vancouver, BC, Canada
- Xuesen Dong
  - Vancouver Prostate Centre, Vancouver, BC, Canada
  - Department of Urologic Sciences, the University of British Columbia, Vancouver, BC, Canada
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Faraz Hach
  - To whom correspondence should be addressed.
|
22
|
Bonenfant Q, Noé L, Touzet H. Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinformatics Advances 2022; 3:vbac085. [PMID: 36698762 PMCID: PMC9869717 DOI: 10.1093/bioadv/vbac085] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 06/30/2022] [Revised: 10/07/2022] [Indexed: 11/23/2022]
Abstract
Motivation: Oxford Nanopore Technologies (ONT) sequencing has become very popular over the past few years and offers a cost-effective solution for many genomic and transcriptomic projects. One distinctive feature of the technology is that the protocol includes the ligation of adapters to both ends of each fragment. Those adapters should then be removed before downstream analyses, either during the basecalling step or by explicit trimming. This basic task can be tricky when the adapter sequence is not well documented. Results: We have developed a new method that scans a set of ONT reads to determine whether it contains adapters, without any prior knowledge of the sequence of the potential adapters, and then trims those adapters. The algorithm is based on approximate k-mers and is able to discover adapter sequences based on their frequency alone. The method was successfully tested on a variety of ONT datasets with different flowcells, sequencing kits and basecallers. Availability and implementation: The resulting software, named Porechop_ABI, is open-source and is available at https://github.com/bonsai-team/Porechop_ABI. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
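The frequency-based idea can be illustrated with a small Python sketch. It is a simplification, not Porechop_ABI's method: it counts exact k-mers near the start of each read, whereas the abstract describes approximate k-mers. The function name, parameters and toy reads are all hypothetical.

```python
from collections import Counter


def frequent_start_kmers(reads, k=8, window=20, top=5):
    """Return the `top` most frequent k-mers found in the first `window` bases.

    Each k-mer is counted at most once per read, so a single repetitive
    read cannot dominate the ranking.
    """
    counts = Counter()
    for read in reads:
        prefix = read[:window]
        kmers_in_read = {prefix[i:i + k] for i in range(len(prefix) - k + 1)}
        counts.update(kmers_in_read)
    return counts.most_common(top)


# Toy data: the same made-up 12-base adapter precedes three different inserts.
adapter = "AATGTACTTCGT"
reads = [adapter + insert for insert in (
    "GGGGCCCCAAAATTTT",
    "CCCCAAAAGGGGTTTT",
    "TTTTGGGGCCCCAAAA",
)]
top_kmers = frequent_start_kmers(reads)
```

In this toy example the k-mers shared by all reads are exactly those lying inside the adapter, so they rank first. Real ONT reads contain sequencing errors, which is the motivation for matching k-mers approximately instead of exactly.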
Affiliation(s)
- Quentin Bonenfant
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL (Centre de Recherche en Informatique Signal et Automatique de Lille), Lille F-59000, France
- Laurent Noé
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL (Centre de Recherche en Informatique Signal et Automatique de Lille), Lille F-59000, France
|