1
Kumari P, Kaur M, Dindhoria K, Ashford B, Amarasinghe SL, Thind AS. Advances in long-read single-cell transcriptomics. Hum Genet 2024:10.1007/s00439-024-02678-x. [PMID: 38787419 DOI: 10.1007/s00439-024-02678-x] [Received: 11/20/2023] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Long-read single-cell transcriptomics (scRNA-Seq) is revolutionizing the way we profile heterogeneity in disease. Traditional short-read scRNA-Seq methods are limited in their ability to provide complete transcript coverage, resolve isoforms, and identify novel transcripts. The scRNA-Seq protocols developed for long-read sequencing platforms overcome these limitations by enabling the characterization of full-length transcripts. Long-read scRNA-Seq techniques initially suffered from poor accuracy compared to short-read scRNA-Seq. However, with improvements in accuracy, accessibility, and cost efficiency, long reads are gaining popularity in the field of scRNA-Seq. This review details the advances in long-read scRNA-Seq, with an emphasis on library preparation protocols and downstream bioinformatics analysis tools.
Affiliation(s)
- Pallawi Kumari
  - Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Manmeet Kaur
  - Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
  - Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Bruce Ashford
  - Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia
- Shanika L Amarasinghe
  - Monash Biomedical Discovery Institute, Monash University, Clayton, VIC, 3800, Australia
  - Walter and Eliza Hall Institute of Medical Research, 1G, Royal Parade, Parkville, VIC, 3025, Australia
- Amarinder Singh Thind
  - Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia
  - The School of Chemistry and Molecular Bioscience (SCMB), University of Wollongong, Loftus St, Wollongong, NSW, 2500, Australia
2
Greshnova A, Pál K, Martinez JFI, Canzar S, Makova KD. Transcript Isoform Diversity of Y Chromosome Ampliconic Genes of Great Apes Uncovered Using Long Reads and Telomere-to-Telomere Reference Genome Assemblies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.02.587783. [PMID: 38617276 PMCID: PMC11014635 DOI: 10.1101/2024.04.02.587783] [Indexed: 04/16/2024]
Abstract
Y chromosomes of great apes harbor Ampliconic Genes (YAGs), multi-copy gene families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) that encode proteins important for spermatogenesis. Previous work assembled YAG transcripts based on their targeted sequencing but not using reference genome assemblies, potentially resulting in an incomplete transcript repertoire. Here we used the recently produced gapless telomere-to-telomere (T2T) Y chromosome assemblies of great ape species (bonobo, chimpanzee, human, gorilla, Bornean orangutan, and Sumatran orangutan) and analyzed RNA data from whole-testis samples for the same species. We generated hybrid transcriptome assemblies by combining targeted long reads (Pacific Biosciences), untargeted long reads (Pacific Biosciences), and untargeted short reads (Illumina) and mapping them to the T2T reference genomes. Compared to the results from the reference-free approach, average transcript length was more than twice as high, and the total number of transcripts decreased threefold, improving the quality of the assembled transcriptome. The reference-based transcriptome assemblies allowed us to differentiate transcripts originating from different Y chromosome gene copies and from their non-Y chromosome homologs. We identified two sources of transcriptome diversity: alternative splicing and gene duplication with subsequent diversification of gene copies. For each gene family, we detected transcribed pseudogenes along with protein-coding gene copies. We revealed previously unannotated gene copies of YAGs as compared to currently available NCBI annotations, as well as novel isoforms for annotated gene copies. This analysis paves the way for better understanding Y chromosome gene functions, which is important given their role in spermatogenesis.
Affiliation(s)
- Aleksandra Greshnova
  - Department of Biology, Penn State University, University Park, PA, USA
  - Current address: Max Planck Institute for Evolutionary Biology, Plön, Germany
- Karol Pál
  - Department of Biology, Penn State University, University Park, PA, USA
- Juan Francisco Iturralde Martinez
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Stefan Canzar
  - Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States
- Kateryna D Makova
  - Department of Biology, Penn State University, University Park, PA, USA
3
Adams M, Vollmers C. Generation and analysis of a mouse multi-tissue genome annotation atlas. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.31.578267. [PMID: 38352519 PMCID: PMC10862843 DOI: 10.1101/2024.01.31.578267] [Indexed: 02/24/2024]
Abstract
Generating an accurate and complete genome annotation for an organism is complex because the cells within each tissue can express a unique set of transcript isoforms from a unique set of genes. A comprehensive genome annotation should contain information on which tissues express which transcript isoforms at what level. This tissue-level isoform information can then inform a wide range of research questions as well as experiment designs. Long-read sequencing technology combined with advanced full-length cDNA library preparation methods has now reached a throughput and accuracy at which generating these types of annotations is achievable. Here, we show this by generating a genome annotation of the mouse (Mus musculus). We used the nanopore-based R2C2 long-read sequencing method to generate 64 million highly accurate full-length cDNA consensus reads, averaging 5.4 million reads per tissue for a dozen tissues. Using the Mandalorion tool, we processed these reads to generate the Tissue-level Atlas of Mouse Isoforms (TAMI, available at https://genome.ucsc.edu/s/vollmers/TAMI), which we believe will be a valuable complement to conventional, manually curated reference genome annotations.
Affiliation(s)
- Matthew Adams
  - Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz
4
Mestre-Tomás J, Liu T, Pardo-Palacios F, Conesa A. SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark. Genome Biol 2023; 24:286. [PMID: 38082294 PMCID: PMC10712166 DOI: 10.1186/s13059-023-03127-0] [Received: 08/22/2023] [Accepted: 11/27/2023] [Indexed: 12/18/2023]
Abstract
Long-read RNA sequencing has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile tool that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field.
Affiliation(s)
- Jorge Mestre-Tomás
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
  - Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Camino de Vera, Valencia, 46022, Spain
- Tianyuan Liu
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Francisco Pardo-Palacios
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
5
Arendt-Tranholm A, Mwirigi JM, Price TJ. RNA isoform expression landscape of the human dorsal root ganglion (DRG) generated from long read sequencing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.28.564535. [PMID: 37961262 PMCID: PMC10634934 DOI: 10.1101/2023.10.28.564535] [Indexed: 11/15/2023]
Abstract
Splicing is a post-transcriptional RNA processing mechanism that enhances genomic complexity by creating multiple isoforms from the same gene. Diversity in splicing in the mammalian nervous system is associated with neuronal development, synaptic function and plasticity, and is also associated with diseases of the nervous system ranging from neurodegeneration to chronic pain. We aimed to characterize the isoforms expressed in the human peripheral nervous system, with the goal of creating a resource to identify novel isoforms of functionally relevant genes associated with somatosensation and nociception. We used long-read sequencing (LRS) to document isoform expression in the human dorsal root ganglia (hDRG) from 3 organ donors. Isoforms were validated in silico by confirming expression in hDRG short-read sequencing (SRS) data from 3 independent organ donors. Using LRS, validation with SRS, and strict expression cutoffs, we detected 19,547 isoforms of protein-coding genes. We identified 763 isoforms with at least one previously undescribed splice junction. Previously unannotated isoforms of multiple pain-associated genes, including ASIC3, MRGPRX1 and HNRNPK, were identified. In the novel isoforms of ASIC3, a region comprising ~35% of the 5'UTR was excised. In contrast, a novel splice junction was utilized in isoforms of MRGPRX1 to include an additional exon upstream of the start codon, consequently adding a region to the 5'UTR. Novel isoforms of HNRNPK were identified which utilized previously unannotated splice sites to both excise exon 14 and include a sequence in the 5' end of exon 13. The insertion and deletion in the coding region were predicted to excise a serine-phosphorylation site favored by cdc2, and replace it with a tyrosine-phosphorylation site potentially phosphorylated by SRC. We also independently confirm a recently reported DRG-specific splicing event in WNK1 that gives insight into how painless peripheral neuropathy occurs when this gene is mutated. Our findings give a clear overview of mRNA isoform diversity in the hDRG obtained using LRS. Using this work as a foundation, an important next step will be to use LRS on hDRG tissues recovered from people with a history of chronic pain. This should enable identification of new drug targets and a better understanding of chronic pain that may involve aberrant splicing events.
Affiliation(s)
- Asta Arendt-Tranholm
  - School of Behavioral and Brain Sciences, Department of Neuroscience and Center for Advanced Pain Studies, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, Texas, 75080
- Juliet M. Mwirigi
  - School of Behavioral and Brain Sciences, Department of Neuroscience and Center for Advanced Pain Studies, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, Texas, 75080
- Theodore J. Price
  - School of Behavioral and Brain Sciences, Department of Neuroscience and Center for Advanced Pain Studies, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, Texas, 75080
6
Mestre-Tomás J, Liu T, Pardo-Palacios F, Conesa A. SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.23.554392. [PMID: 37662216 PMCID: PMC10473693 DOI: 10.1101/2023.08.23.554392] [Indexed: 09/05/2023]
Abstract
Long-read RNA-seq has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile utility that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field. We demonstrate the effectiveness of SQANTI-SIM by benchmarking five transcriptome reconstruction pipelines using the simulated data.
Affiliation(s)
- Jorge Mestre-Tomás
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Tianyuan Liu
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Francisco Pardo-Palacios
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain