1
Woolley CR, Chariker JH, Rouchka EC, Ford EE, Hudson E, Rasche KM, Whitley CS, Vanwinkle Z, Casella CR, Smith ML, Mitchell TC. Full-length mRNA sequencing resolves novel variation in 5' UTR length for genes expressed during human CD4 T-cell activation. Immunogenetics 2025; 77:14. [PMID: 39904916 PMCID: PMC11794378 DOI: 10.1007/s00251-025-01371-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Accepted: 01/23/2025] [Indexed: 02/06/2025]
Abstract
Isoform sequencing (Iso-Seq) uses long-read technology to produce highly accurate full-length reads of mRNA transcripts. Visualization of individual mRNA molecules can reveal new details of transcript variation within understudied portions of mRNA, such as the 5' untranslated region (UTR). Differential 5' UTRs may contain motifs, upstream open reading frames (uORFs), and secondary structures that can serve to regulate translation or further indicate changes in promoter usage, where transcriptional control may impact protein expression levels. To begin to explore isoform variation during T-cell activation, we generated the first Iso-Seq reference transcriptome of activated human CD4 T cells. Within this dataset, we discovered many novel splice- and end-variant transcripts. Remarkably, one in every eight genes expressed in our dataset was found to have a notable proportion of transcripts with 5' UTRs lengthened by over 100 bp compared to the longest corresponding UTR within the Gencode dataset. Among these end-variant transcripts, two novel isoforms were identified for CXCR5, a chemokine receptor associated with T follicular helper cell (Tfh) function and differentiation. When investigated in a model cell system, these lengthened UTRs conferred reduced transcript stability and, for one of these isoforms, short uORFs introduced by the added length altered protein expression kinetics. This study highlights instances in which current reference databases are incomplete relative to the information obtained by long-read sequencing of intact mRNA. Iso-Seq is thus a promising approach to better understanding the plasticity of promoter usage, alternative splicing, and UTR sequences that influence RNA stability and translation efficiency.
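The 100 bp criterion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the input dictionaries, and the `min_fraction` cutoff (a stand-in for the abstract's unspecified "notable proportion") are all assumptions made for illustration.

```python
from typing import Dict, List


def extended_utr_genes(
    observed: Dict[str, List[int]],
    reference_longest: Dict[str, int],
    min_extension_bp: int = 100,
    min_fraction: float = 0.1,  # hypothetical cutoff; the abstract gives no number
) -> Dict[str, float]:
    """Return genes whose long-read 5' UTRs often exceed the reference.

    observed: gene -> 5' UTR lengths (bp) of individual full-length reads.
    reference_longest: gene -> longest annotated 5' UTR length (bp).
    """
    flagged = {}
    for gene, lengths in observed.items():
        ref = reference_longest.get(gene)
        if ref is None or not lengths:
            continue  # no annotation or no reads: nothing to compare
        # Count reads whose 5' UTR is more than min_extension_bp longer
        # than the longest annotated 5' UTR for this gene.
        n_extended = sum(1 for n in lengths if n - ref > min_extension_bp)
        fraction = n_extended / len(lengths)
        if fraction >= min_fraction:
            flagged[gene] = fraction
    return flagged


# Toy data with placeholder gene names (not from the study).
observed = {"GENE_A": [120, 130, 400, 410], "GENE_B": [80, 85, 90]}
reference = {"GENE_A": 150, "GENE_B": 95}
print(extended_utr_genes(observed, reference))  # {'GENE_A': 0.5}
```

In practice the per-read UTR lengths would come from aligned full-length reads and the reference lengths from an annotation file; both steps are outside this sketch.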
Affiliation(s)
- Cassandra R Woolley
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY, USA
- Julia H Chariker
  - Department of Neuroscience Training, University of Louisville School of Medicine, Louisville, KY, USA
- Eric C Rouchka
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
  - KY INBRE Bioinformatics Core, University of Louisville School of Medicine, Louisville, KY, USA
- Easton E Ford
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Elizabeth Hudson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Kamille M Rasche
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY, USA
- Caleb S Whitley
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY, USA
- Zachary Vanwinkle
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Carolyn R Casella
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY, USA
- Melissa L Smith
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Thomas C Mitchell
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY, USA
2
Ou J, Liu H, Park S, Green MR, Zhu LJ. InPAS: An R/Bioconductor Package for Identifying Novel Polyadenylation Sites and Alternative Polyadenylation from Bulk RNA-seq Data. Front Biosci (Schol Ed) 2024; 16:21. [PMID: 39736014 DOI: 10.31083/j.fbs1604021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 09/20/2024] [Accepted: 10/10/2024] [Indexed: 12/31/2024]
Abstract
BACKGROUND: Alternative cleavage and polyadenylation (APA) is a crucial post-transcriptional gene regulation mechanism that regulates gene expression in eukaryotes by increasing the diversity and complexity of both the transcriptome and proteome. Despite the development of more than a dozen experimental methods over the last decade to identify and quantify APA events, widespread adoption of these methods has been limited by technical, financial, and time constraints. Consequently, APA remains poorly understood in most eukaryotes. However, RNA sequencing (RNA-seq) technology has revolutionized transcriptome profiling and recent studies have shown that RNA-seq data can be leveraged to identify and quantify APA events. RESULTS: To fully capitalize on the exponentially growing RNA-seq data, we developed InPAS (Identification of Novel alternative PolyAdenylation Sites), an R/Bioconductor package for accurate identification of novel and known cleavage and polyadenylation sites (CPSs), as well as quantification of APA from RNA-seq data of various experimental designs. Compared to other APA analysis tools, InPAS offers several important advantages, including the ability to detect both novel proximal and distal CPSs, to fine tune positions of CPSs using a naïve Bayes classifier based on flanking sequence features, and to identify APA events from RNA-seq data of complex experimental designs using linear models. We benchmarked the performance of InPAS and other leading tools using simulated and experimental RNA-seq data with matched 3'-end RNA-seq data. Our results reveal that InPAS frequently outperforms existing tools in terms of precision, sensitivity, and specificity. Furthermore, we demonstrate its scalability and versatility by applying it to large, diverse RNA-seq datasets. CONCLUSIONS: InPAS is an efficient and robust tool for identifying and quantifying APA events using readily accessible conventional RNA-seq data. Its versatility opens doors to explore APA regulation across diverse eukaryotic systems with various experimental designs. We believe that InPAS will drive APA research forward, deepening our understanding of its role in regulating gene expression, and potentially leading to the discovery of biomarkers or therapeutics for diseases.
Affiliation(s)
- Jianhong Ou
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
  - Regeneration Center, Duke University School of Medicine, Duke University, Durham, NC 27701, USA
- Haibo Liu
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Sungmi Park
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Michael R Green
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Lihua Julie Zhu
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
  - Department of Molecular Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
  - Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
3
Hueso M, Mallén A, Navarro E. Generation of Transcript Length Variants and Reprogramming of mRNA Splicing During Atherosclerosis Progression in ApoE-Deficient Mice. Biomedicines 2024; 12:2703. [PMID: 39767610 PMCID: PMC11672872 DOI: 10.3390/biomedicines12122703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 11/19/2024] [Accepted: 11/20/2024] [Indexed: 01/11/2025]
Abstract
Background. Variant 3'UTRs provide mRNAs with different binding sites for miRNAs or RNA-binding proteins (RBPs), allowing the establishment of new regulatory environments. Regulation of 3'UTR length affects the control of gene expression by regulating accessibility of miRNAs or RBPs to homologous sequences in mRNAs. Objective. To study the dynamics of mRNA length variations in atherosclerosis (ATS) progression and reversion in ApoE-deficient mice exposed to a high-fat diet and treated with an αCD40-specific siRNA or with a sequence-scrambled siRNA as control. Methods. We gathered microarray mRNA expression data from the aortas of mice after 2 or 16 weeks of treatment, and used these data in a bioinformatics analysis. Results. Here, we report the lengthening of the 5'UTR/3'UTRs and the shortening of the CDS in downregulated mRNAs during ATS progression. Furthermore, treatment with the αCD40-specific siRNA resulted in the partial reversion of the 3'UTR lengthening. Exon analysis showed that these length variations were actually due to changes in the number of exons embedded in mRNAs, and the further examination of transcripts co-expressed at weeks 2 and 16 in mice treated with the control siRNA revealed a process of mRNA isoform switching in which transcript variants differed in the patterns of alternative splicing or activated latent/cryptic splice sites. Conclusion. We document length variations in the 5'UTR/3'UTR and CDS of mRNAs downregulated during atherosclerosis progression and suggest a role for mRNA splicing reprogramming and transcript isoform switching in the generation of disease-related mRNA sequence diversity and variability.
Affiliation(s)
- Miguel Hueso
  - Experimental Nephrology Lab, Institut d’Investigació Biomèdica de Bellvitge-IDIBELL, C/Feixa Llarga s/n, L’Hospitalet de Llobregat, 08907 Barcelona, Spain
  - Department of Nephrology, Hospital Universitari and Bellvitge, Institut d’Investigació Biomèdica de Bellvitge-IDIBELL, C/Feixa Llarga s/n, L’Hospitalet de Llobregat, 08907 Barcelona, Spain
- Adrián Mallén
  - Experimental Nephrology Lab, Institut d’Investigació Biomèdica de Bellvitge-IDIBELL, C/Feixa Llarga s/n, L’Hospitalet de Llobregat, 08907 Barcelona, Spain
- Estanis Navarro
  - REMAR Group, Germans Trias i Pujol Research Institute (IGTP), Ctra de Can Ruti, Camí de les Escoles s/n, 08916 Badalona, Spain
4
Dioken DN, Ozgul I, Koksal Bicakci G, Gol K, Can T, Erson-Bensan AE. Differential expression of mRNA 3'-end isoforms in cervical and ovarian cancers. Heliyon 2023; 9:e20035. [PMID: 37810050 PMCID: PMC10559779 DOI: 10.1016/j.heliyon.2023.e20035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 07/26/2023] [Accepted: 09/08/2023] [Indexed: 10/10/2023]
Abstract
Early diagnosis and therapeutic targeting are continuing challenges for gynecological cancers. Here, we focus on cancer transcriptomes and describe the differential expression of 3'UTR isoforms in patients using an algorithm to detect differential poly(A) site usage. We find primarily 3'UTR shortening cases in cervical cancers compared with the normal cervix. We show differential expression of alternate 3'-end isoforms of FOXP1, VPS4B, and OGT in HPV16-positive patients who develop high-grade cervical lesions compared with the infected but non-progressing group. In contrast, in ovarian cancers, 3'UTR lengthening is more evident compared with normal ovary tissue. Nevertheless, highly malignant ovarian tumors have unique 3'UTR shortening events (e.g., CHRAC1, SLC16A1, and TOP2A), some of which correlate with upregulated protein levels in tumors. Overall, our study shows isoform level deregulation in gynecological cancers and highlights the complexity of the transcriptome. This transcript diversity could help identify novel cancer genes and provide new possibilities for diagnosis and therapy.
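The idea of 3'UTR shortening versus lengthening through differential poly(A) site usage can be sketched with a simple proximal-usage fraction. This is a generic illustration, not the algorithm used in the study: the function names, the read-count inputs, and the `min_delta` threshold are assumptions.

```python
from typing import Tuple


def proximal_fraction(proximal_reads: int, distal_reads: int) -> float:
    """Fraction of reads supporting the proximal (short 3'UTR) poly(A) site."""
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("no reads support either poly(A) site")
    return proximal_reads / total


def classify_shift(
    normal: Tuple[int, int],
    tumor: Tuple[int, int],
    min_delta: float = 0.2,  # hypothetical effect-size threshold
) -> str:
    """Compare (proximal, distal) read counts between two conditions."""
    delta = proximal_fraction(*tumor) - proximal_fraction(*normal)
    if delta >= min_delta:
        return "shortening"   # more proximal-site usage in tumor
    if delta <= -min_delta:
        return "lengthening"  # more distal-site usage in tumor
    return "no change"


# Toy counts: (proximal_reads, distal_reads) per condition.
print(classify_shift(normal=(30, 70), tumor=(60, 40)))  # shortening
print(classify_shift(normal=(60, 40), tumor=(30, 70)))  # lengthening
```

A real analysis would add replicates and a statistical test rather than rely on a fixed difference threshold.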
Affiliation(s)
- Didem Naz Dioken
  - Department of Biological Sciences, Middle East Technical University (METU), Dumlupinar Blv No: 1 Universiteler Mah., Cankaya, Ankara, 06800, Turkiye
- Ibrahim Ozgul
  - Department of Biological Sciences, Middle East Technical University (METU), Dumlupinar Blv No: 1 Universiteler Mah., Cankaya, Ankara, 06800, Turkiye
- Gozde Koksal Bicakci
  - Department of Biological Sciences, Middle East Technical University (METU), Dumlupinar Blv No: 1 Universiteler Mah., Cankaya, Ankara, 06800, Turkiye
- Kemal Gol
  - Gynecology Clinic, Ugur Mumcu Cad 17/2, Cankaya, Ankara, Turkiye
- Tolga Can
  - Department of Computer Engineering, Middle East Technical University (METU), Dumlupinar Blv No: 1, Universiteler Mah., Ankara, 06800, Turkiye
- Ayse Elif Erson-Bensan
  - Department of Biological Sciences, Middle East Technical University (METU), Dumlupinar Blv No: 1 Universiteler Mah., Cankaya, Ankara, 06800, Turkiye
5
Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Affiliation(s)
- Wenbin Ye
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Qiwei Lian
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
  - Department of Automation, Xiamen University, Xiamen 361005, China
- Congting Ye
  - Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
- Xiaohui Wu
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
6
Liu Y, Zhang Y, Wang J, Lu F. Transcriptome-wide measurement of poly(A) tail length and composition at subnanogram total RNA sensitivity by PAIso-seq. Nat Protoc 2022; 17:1980-2007. [PMID: 35831615 DOI: 10.1038/s41596-022-00704-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/29/2021] [Accepted: 03/23/2022] [Indexed: 12/14/2022]
Abstract
Poly(A) tails are added to the 3' ends of most mRNAs in a non-templated manner and play essential roles in post-transcriptional regulation, including mRNA export, stability and translation. Measuring poly(A) tails is critical for understanding their regulatory roles in almost every aspect of biological and medical studies. Previous methods for analyzing poly(A) tails require large amounts of input RNA (microgram-level total RNA), which limits their application. We recently developed a poly(A) inclusive full-length RNA isoform-sequencing method (PAIso-seq) at single-oocyte-level sensitivity (a single mammalian oocyte contains ~0.5 ng of total RNA) based on PacBio sequencing that enabled accurate measurement of the poly(A) tail length and non-A residues within the body of poly(A) tails along with the full-length cDNA, providing the opportunity to study precious in vivo samples with very limited input material. Here, we describe a detailed protocol for PAIso-seq library preparation from single mouse oocytes or bulk oocyte samples. In addition, we provide a complete bioinformatic pipeline to perform the analysis from the raw data to downstream analysis. The minimum time required is ~14.5 h for PAIso-seq double-stranded cDNA preparation, 2 d for PacBio sequencing in HiFi mode and 8 h for the initial data analysis.
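The two quantities the abstract names, poly(A) tail length and non-A residues within the tail, can be illustrated with a short sketch. It assumes the tail sequence has already been extracted from a read; the function name and the returned dictionary layout are hypothetical and are not part of the published PAIso-seq pipeline.

```python
from collections import Counter
from typing import Dict, Union


def summarize_tail(tail: str) -> Dict[str, Union[int, Dict[str, int]]]:
    """Summarize an already-extracted poly(A) tail sequence.

    Returns the tail length, the total number of non-A residues,
    and the per-base counts of those non-A residues.
    """
    tail = tail.upper()
    counts = Counter(tail)
    # Everything that is not an A counts as a non-A residue in the tail body.
    non_a = {base: n for base, n in counts.items() if base != "A"}
    return {
        "length": len(tail),
        "non_a_total": sum(non_a.values()),
        "non_a_by_base": non_a,
    }


# Toy tail: 12 nt with one G and one C embedded among the A residues.
print(summarize_tail("AAAAGAAACAAA"))
# {'length': 12, 'non_a_total': 2, 'non_a_by_base': {'G': 1, 'C': 1}}
```

Locating the tail boundary within a full-length read is the harder step and is not attempted here.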
Affiliation(s)
- Yusheng Liu
  - State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
- Yiwei Zhang
  - College of Life Science, Northeast Agricultural University, Harbin, China
- Jiaqiang Wang
  - College of Life Science, Northeast Agricultural University, Harbin, China
- Falong Lu
  - State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China