1
Moon Y, Burri D, Zavolan M. Identification of experimentally-supported poly(A) sites in single-cell RNA-seq data with SCINPAS. NAR Genom Bioinform 2023; 5:lqad079. [PMID: 37705828 PMCID: PMC10495540 DOI: 10.1093/nargab/lqad079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/15/2023] [Accepted: 08/23/2023] [Indexed: 09/15/2023] Open
Abstract
Alternative polyadenylation is a main driver of transcriptome diversity in mammals, generating transcript isoforms with different 3' ends via cleavage and polyadenylation at distinct polyadenylation (poly(A)) sites. The regulation of cell type-specific poly(A) site choice is not completely resolved, and requires quantitative poly(A) site usage data across cell types. 3' end-based single-cell RNA-seq can now be broadly used to obtain such data, enabling the identification and quantification of poly(A) sites with direct experimental support. We propose SCINPAS, a computational method to identify poly(A) sites from scRNA-seq datasets. SCINPAS modifies the read deduplication step to favor the selection of distal reads and extracts those with non-templated poly(A) tails. This approach improves the resolution of poly(A) site recovery relative to standard software. SCINPAS identifies poly(A) sites in genic and non-genic regions, providing complementary information relative to other tools. The workflow is modular, and the key read deduplication step is general, enabling the use of SCINPAS in other typical analyses of single-cell gene expression. Taken together, we show that SCINPAS is able to identify experimentally-supported, known and novel poly(A) sites from 3' end-based single-cell RNA sequencing data.
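The deduplication idea in this abstract can be illustrated with a minimal sketch. It is not the SCINPAS implementation: the `Read` record, the grouping key, and the tail thresholds (`min_len`, `min_frac_a`) are assumptions made only for illustration.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified alignment record (not a SCINPAS data type)."""
    cell: str      # cell barcode
    umi: str       # unique molecular identifier
    chrom: str
    strand: str    # "+" or "-"
    end3: int      # genomic coordinate of the read's 3' end
    softclip: str  # unaligned 3' bases in mRNA orientation, "" if none


def has_polya_tail(read, min_len=5, min_frac_a=0.8):
    """Assumed criterion: a long, A-rich soft clip counts as a non-templated tail."""
    clip = read.softclip.upper()
    return len(clip) >= min_len and clip.count("A") / len(clip) >= min_frac_a


def dedup_distal(reads):
    """Keep one read per (cell, UMI, chrom, strand): the most distal 3' end."""
    best = {}
    for r in reads:
        key = (r.cell, r.umi, r.chrom, r.strand)
        cur = best.get(key)
        if cur is None:
            best[key] = r
            continue
        # "Distal" means a larger coordinate on "+" and a smaller one on "-".
        more_distal = r.end3 > cur.end3 if r.strand == "+" else r.end3 < cur.end3
        if more_distal:
            best[key] = r
    return list(best.values())


reads = [
    Read("c1", "u1", "chr1", "+", 100, ""),
    Read("c1", "u1", "chr1", "+", 150, "AAAAAAAA"),
    Read("c1", "u2", "chr1", "-", 90, "AAAAAT"),
    Read("c1", "u2", "chr1", "-", 120, ""),
]
kept = dedup_distal(reads)
tailed_ends = sorted(r.end3 for r in kept if has_polya_tail(r))
```

Standard deduplication would keep an arbitrary or best-quality read per UMI; preferring the distal read makes it more likely that the retained read reaches the poly(A) tail.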
Affiliation(s)
- Youngbin Moon
  - Computational and Systems Biology, Biozentrum University of Basel, Spitalstrasse 41, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Dominik Burri
  - Computational and Systems Biology, Biozentrum University of Basel, Spitalstrasse 41, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Mihaela Zavolan
  - Computational and Systems Biology, Biozentrum University of Basel, Spitalstrasse 41, CH-4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
2
Hao S, Zhang L, Zhao D, Zhou J, Ye C, Qu H, Li QQ. Inhibitor AN3661 reveals biological functions of Arabidopsis CLEAVAGE and POLYADENYLATION SPECIFICITY FACTOR 73. Plant Physiology 2023; 193:537-554. [PMID: 37335917 DOI: 10.1093/plphys/kiad352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 05/09/2023] [Accepted: 05/21/2023] [Indexed: 06/21/2023]
Abstract
Cleavage and polyadenylation specificity factor (CPSF) is a protein complex that plays an essential biochemical role in mRNA 3'-end formation, including poly(A) signal recognition and cleavage at the poly(A) site. However, its biological functions at the organismal level are mostly unknown in multicellular eukaryotes. The study of plant CPSF73 has been hampered by the lethality of Arabidopsis (Arabidopsis thaliana) homozygous mutants of AtCPSF73-I and AtCPSF73-II. Here, we used poly(A) tag sequencing to investigate the roles of AtCPSF73-I and AtCPSF73-II in Arabidopsis treated with AN3661, an antimalarial drug with specificity for parasite CPSF73 that is homologous to plant CPSF73. Direct seed germination on an AN3661-containing medium was lethal; however, 7-d-old seedlings treated with AN3661 survived. AN3661 targeted AtCPSF73-I and AtCPSF73-II, inhibiting growth through coordinating gene expression and poly(A) site choice. Functional enrichment analysis revealed that the accumulation of ethylene and auxin jointly inhibited primary root growth. AN3661 affected poly(A) signal recognition, resulted in lower U-rich signal usage, caused transcriptional readthrough, and increased distal poly(A) site usage. Many microRNA targets were found in transcripts with lengthened 3' untranslated regions; these miRNAs may indirectly regulate the expression of these targets. Overall, this work demonstrates that AtCPSF73 plays an important part in co-transcriptional regulation, affecting growth and development in Arabidopsis.
Affiliation(s)
- Saiqi Hao
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Lidan Zhang
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Danhui Zhao
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Jiawen Zhou
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Haidong Qu
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Qingshun Q Li
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystem, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
  - Biomedical Sciences, College of Dental Medicine, Western University of Health Sciences, Pomona, CA 91766, USA
3
Ji G, Tang Q, Zhu S, Zhu J, Ye P, Xia S, Wu X. stAPAminer: Mining Spatial Patterns of Alternative Polyadenylation for Spatially Resolved Transcriptomic Studies. Genomics, Proteomics & Bioinformatics 2023; 21:601-618. [PMID: 36669641 PMCID: PMC10787175 DOI: 10.1016/j.gpb.2023.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Revised: 12/07/2022] [Accepted: 01/08/2023] [Indexed: 01/19/2023]
Abstract
Alternative polyadenylation (APA) contributes to transcriptome complexity and gene expression regulation and has been implicated in various cellular processes and diseases. Single-cell RNA sequencing (scRNA-seq) has enabled the profiling of APA at the single-cell level; however, the spatial information of cells is not preserved in scRNA-seq. Alternatively, spatial transcriptomics (ST) technologies provide opportunities to decipher the spatial context of the transcriptomic landscape. Pioneering studies have revealed potential spatially variable genes and/or splice isoforms; however, the pattern of APA usage in spatial contexts remains unappreciated. In this study, we developed a toolkit called stAPAminer for mining spatial patterns of APA from spatially barcoded ST data. APA sites were identified and quantified from the ST data. In particular, an imputation model based on the k-nearest neighbors algorithm was designed to recover APA signals, and then APA genes with spatial patterns of APA usage variation were identified. By analyzing well-established ST data of the mouse olfactory bulb (MOB), we presented a detailed view of spatial APA usage across morphological layers of the MOB. We compiled a comprehensive list of genes with spatial APA dynamics and obtained several major spatial expression patterns that represent spatial APA dynamics in different morphological layers. By extending this analysis to two additional replicates of the MOB ST data, we observed that the spatial APA patterns of several genes were reproducible among replicates. stAPAminer employs the power of ST to explore the transcriptional atlas of spatial APA patterns with spatial resolution. This toolkit is available at https://github.com/BMILAB/stAPAminer and https://ngdc.cncb.ac.cn/biocode/tools/BT007320.
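As a rough illustration of k-nearest-neighbor imputation on spatial spots, the following sketch fills a missing APA usage value with the mean of the k closest spots that have an observed value. It is a generic kNN example, not the stAPAminer model; the function name, the use of Euclidean distance on spot coordinates, and the simple averaging are all assumptions.

```python
import math


def knn_impute(coords, values, k=3):
    """Fill missing values (None) with the mean of the k nearest observed spots.

    coords: list of (x, y) spot positions.
    values: list of APA usage ratios, with None where no signal was recovered.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    imputed = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Rank observed spots by Euclidean distance to the spot being imputed.
        nearest = sorted(observed, key=lambda j: math.dist(coords[i], coords[j]))[:k]
        if nearest:
            imputed[i] = sum(values[j] for j in nearest) / len(nearest)
    return imputed


coords = [(0, 0), (1, 0), (0, 1), (10, 10), (1, 1)]
values = [0.2, 0.4, 0.6, 0.9, None]
filled = knn_impute(coords, values, k=3)
```

Here the spot at (1, 1) borrows from its three close neighbors and ignores the distant spot at (10, 10).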
Affiliation(s)
- Guoli Ji
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
- Qi Tang
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
- Sheng Zhu
  - Department of Automation, Xiamen University, Xiamen 361005, China
- Junyi Zhu
  - Institute of Neuroscience, Soochow University, Suzhou 215000, China
- Pengchao Ye
  - Department of Automation, Xiamen University, Xiamen 361005, China
- Shuting Xia
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Institute of Neuroscience, Soochow University, Suzhou 215000, China
- Xiaohui Wu
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
4
Sanpedro-Luna JA, Jacinto-Vázquez JJ, Anastacio-Marcelino E, Posadas-Gutiérrez CM, Olmos-Pineda I, González-Bernal JA, Carcaño-Montiel M, Vega-Alvarado L, Vázquez-Cruz C, Sánchez-Alonso P. Telomerase RNA plays a major role in the completion of the life cycle in Ustilago maydis and shares conserved domains with other Ustilaginales. PLoS One 2023; 18:e0281251. [PMID: 36952474 PMCID: PMC10035886 DOI: 10.1371/journal.pone.0281251] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/20/2022] [Accepted: 01/18/2023] [Indexed: 03/25/2023] Open
Abstract
The RNA subunit of telomerase is an essential component whose primary sequence and length are poorly conserved among eukaryotic organisms. The phytopathogen Ustilago maydis is a dimorphic fungus of the order Ustilaginales. We analyzed several species of Ustilaginales to computationally identify the TElomere RNA (TER) gene ter1. To confirm the identity of the TER gene, we disrupted the gene and characterized telomerase-negative mutants. Similar to catalytic TERT mutants, ter1Δ mutants exhibit phenotypes of growth delay, telomere shortening and low replicative potential. ter1-disrupted mutants were unable to infect maize seedlings in heterozygous crosses and showed defects such as cell cycle arrest and segregation failure. We concluded that ter1, which encodes the TER subunit of the telomerase of U. maydis, has similar and perhaps more extensive functions than trt1.
Affiliation(s)
- Juan Antonio Sanpedro-Luna
  - Instituto de Ciencias, Posgrado en Microbiología, Benemérita Universidad Autónoma de Puebla, Puebla, México
- José Juan Jacinto-Vázquez
  - Instituto de Ciencias, Posgrado en Microbiología, Benemérita Universidad Autónoma de Puebla, Puebla, México
- Estela Anastacio-Marcelino
  - Instituto de Ciencias, Centro de Investigaciones Microbiológicas, Benemérita Universidad Autónoma de Puebla, Puebla, México
- Iván Olmos-Pineda
  - Facultad de Ciencias de la Computación, Benemérita Universidad Autónoma de Puebla, Puebla, México
- Jesús Antonio González-Bernal
  - Department of Computer Science and Engineering, The University of Texas Arlington, Arlington, Texas, United States of America
- Moisés Carcaño-Montiel
  - Instituto de Ciencias, Centro de Investigaciones Microbiológicas, Benemérita Universidad Autónoma de Puebla, Puebla, México
- Leticia Vega-Alvarado
  - Instituto de Ciencias Aplicadas y Tecnología, Universidad Nacional Autónoma de México, Ciudad Universitaria, Ciudad de México, México, México
- Candelario Vázquez-Cruz
  - Instituto de Ciencias, Posgrado en Microbiología, Benemérita Universidad Autónoma de Puebla, Puebla, México
  - Instituto de Ciencias, Centro de Investigaciones Microbiológicas, Benemérita Universidad Autónoma de Puebla, Puebla, México
- Patricia Sánchez-Alonso
  - Instituto de Ciencias, Posgrado en Microbiología, Benemérita Universidad Autónoma de Puebla, Puebla, México
  - Instituto de Ciencias, Centro de Investigaciones Microbiológicas, Benemérita Universidad Autónoma de Puebla, Puebla, México
5
Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. Genomics, Proteomics & Bioinformatics 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Affiliation(s)
- Wenbin Ye
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
- Qiwei Lian
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
- Congting Ye
  - Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
- Xiaohui Wu
  - Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
6
scAPAmod: Profiling Alternative Polyadenylation Modalities in Single Cells from Single-Cell RNA-Seq Data. Int J Mol Sci 2022; 23:ijms23158123. [PMID: 35897701 PMCID: PMC9329739 DOI: 10.3390/ijms23158123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 07/01/2022] [Accepted: 07/21/2022] [Indexed: 11/17/2022] Open
Abstract
Alternative polyadenylation (APA) is a key layer of gene expression regulation, and APA choice is finely modulated in cells. Advances in single-cell RNA-seq (scRNA-seq) have provided unprecedented opportunities to study APA in cell populations. However, existing studies that investigated APA in single cells were either confined to a few cells or focused on profiling APA dynamics between cell types or identifying APA sites. The diversity and pattern of APA usages on a genomic scale in single cells remain unappreciated. Here, we proposed an analysis framework based on a Gaussian mixture model, scAPAmod, to identify patterns of APA usage from homogeneous or heterogeneous cell populations at the single-cell level. We systematically evaluated the performance of scAPAmod using simulated data and scRNA-seq data. The results show that scAPAmod can accurately identify different patterns of APA usages at the single-cell level. We analyzed the dynamic changes in the pattern of APA usage using scAPAmod in different cell differentiation and developmental stages during mouse spermatogenesis and found that even the same gene has different patterns of APA usages in different differentiation stages. The preference of patterns of usages of APA sites in different genomic regions was also analyzed. We found that patterns of APA usages of the same gene in 3′ UTRs (3′ untranslated regions) and non-3′ UTRs are different. Moreover, we analyzed cell-type-specific APA usage patterns and changes in patterns of APA usages across cell types. Different from the conventional analysis of single-cell heterogeneity based on gene expression profiling, this study profiled the heterogeneous pattern of APA isoforms, which contributes to revealing the heterogeneity of single-cell gene expression with higher resolution.
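To make the Gaussian mixture idea concrete, here is a minimal sketch that fits a one-dimensional mixture to per-cell APA usage ratios of one gene with expectation-maximization. It is not the scAPAmod code: the function name, the quantile-based initialization, the variance floor, and the fixed number of components are assumptions for illustration.

```python
import math


def fit_gmm_1d(xs, k=2, n_iter=100):
    """Fit a k-component 1D Gaussian mixture by EM; returns (weights, means, variances)."""
    xs = sorted(xs)
    n = len(xs)
    grand_mean = sum(xs) / n
    # Deterministic start: means at evenly spaced quantiles, shared pooled variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    pooled_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, 1e-6)
    variances = [pooled_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids collapse to zero variance
    return weights, means, variances


# Hypothetical proximal-site usage ratios across eight cells: two clear groups.
usage = [0.05, 0.10, 0.12, 0.08, 0.90, 0.88, 0.93, 0.95]
weights, means, variances = fit_gmm_1d(usage, k=2)
```

Two well-separated component means would suggest a bimodal usage pattern for the gene; deciding how many components the data support would need a model selection step, which this sketch leaves out.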
7
Zhu S, Lian Q, Ye W, Qin W, Wu Z, Ji G, Wu X. scAPAdb: a comprehensive database of alternative polyadenylation at single-cell resolution. Nucleic Acids Res 2021; 50:D365-D370. [PMID: 34508354 PMCID: PMC8728153 DOI: 10.1093/nar/gkab795] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/08/2021] [Revised: 08/26/2021] [Accepted: 09/02/2021] [Indexed: 01/08/2023] Open
Abstract
Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer for eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3′-seq and/or RNA-seq data, and no existing database has provided APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from > 360 scRNA-seq experiments, covering six species, including human, mouse and several plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usages and APA-mediated gene regulation at the single-cell level under diverse cell types, tissues and species.
Affiliation(s)
- Sheng Zhu
  - Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China; Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Qiwei Lian
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Wei Qin
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Zhe Wu
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Xiaohui Wu
  - Pasteurien College, Soochow University, Suzhou, Jiangsu 215000, China
8
Ye C, Zhao D, Ye W, Wu X, Ji G, Li QQ, Lin J. QuantifyPoly(A): reshaping alternative polyadenylation landscapes of eukaryotes with weighted density peak clustering. Brief Bioinform 2021; 22:6319934. [PMID: 34255024 DOI: 10.1093/bib/bbab268] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/16/2021] [Revised: 06/23/2021] [Accepted: 06/23/2021] [Indexed: 01/09/2023] Open
Abstract
The dynamic choice of different polyadenylation sites in a gene is referred to as alternative polyadenylation, which functions in many important biological processes. Large-scale messenger RNA 3' end sequencing has revealed that cleavage sites for polyadenylation exhibit microheterogeneity. To date, the conventional determination of polyadenylation site clusters is subjective and arbitrary, leading to inaccurate annotations. Here, we present a weighted density peak clustering method, QuantifyPoly(A), to accurately quantify genome-wide polyadenylation choices. Applying QuantifyPoly(A) to published 3' end sequencing datasets from both animals and plants reshapes their polyadenylation profiles into myriad novel polyadenylation site clusters. Most of these novel polyadenylation site clusters show significantly dynamic usage across different biological samples or associate with binding sites of trans-acting factors. Upstream sequences of these clusters are enriched with polyadenylation signals UGUA, UAAA and/or AAUAAA in a species-dependent manner. Polyadenylation site clusters also exhibit species specificity, and plant clusters generally show higher microheterogeneity than animal ones. QuantifyPoly(A) is broadly applicable to any type of 3' end sequencing data and species for accurate quantification and construction of the complex and dynamic polyadenylation landscape and enables us to decode alternative polyadenylation events invisible to conventional methods at a much higher resolution.
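The general density-peak idea (a cluster center has high local density and lies far from any denser point) can be sketched for cleavage positions weighted by read counts. This is a simplified illustration, not the QuantifyPoly(A) algorithm: the Gaussian kernel, the bandwidth `dc`, the separation threshold `min_delta`, and the function name are assumptions.

```python
import math


def weighted_density_peaks(sites, dc=10.0, min_delta=25):
    """Cluster 1D cleavage positions by weighted density peaks.

    sites: dict mapping cleavage position -> supporting read count (the weight).
    Returns a dict mapping each position -> position of its cluster center.
    """
    positions = sorted(sites)
    # Weighted local density: read counts of nearby sites, down-weighted by distance.
    rho = {
        p: sum(c * math.exp(-(((p - q) / dc) ** 2)) for q, c in sites.items())
        for p in positions
    }
    # Visit sites from densest to sparsest; ties are broken by position.
    order = sorted(positions, key=lambda p: (-rho[p], p))
    center_of = {}
    for rank, p in enumerate(order):
        denser = order[:rank]
        if not denser:
            center_of[p] = p  # the globally densest site always seeds a cluster
            continue
        nearest = min(denser, key=lambda q: abs(p - q))
        if abs(p - nearest) > min_delta:
            center_of[p] = p  # far from every denser site: a new cluster center
        else:
            center_of[p] = center_of[nearest]  # inherit the denser neighbor's cluster
    return center_of


# Hypothetical cleavage positions with read counts: two microheterogeneous groups.
sites = {100: 5, 102: 20, 105: 8, 300: 3, 303: 15, 306: 2}
clusters = weighted_density_peaks(sites)
```

Weighting the density by read count lets the most used cleavage position represent its cluster, rather than whichever position happens to sit in the middle of a fixed window.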
Affiliation(s)
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Danhui Zhao
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Qingshun Q Li
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China; Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA 91766, USA
- Juncheng Lin
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China; FAFU-UCR Joint Center, Horticulture Biology and Metabolomics Center, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
9
Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence. Nat Commun 2021; 12:1652. [PMID: 33712618 PMCID: PMC7955126 DOI: 10.1038/s41467-021-21894-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/12/2020] [Accepted: 02/18/2021] [Indexed: 02/01/2023] Open
Abstract
Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3'-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model, trained using the Human Brain Reference RNA commercial standard, performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi's input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.
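Recall as defined in this abstract, together with precision at a single operating point, can be computed as in the sketch below. The matching tolerance `tol` and the function name are assumptions; the abstract does not state how aptardi matches predicted sites to true sites, and its "average precision" summarizes precision over many score thresholds rather than at one.

```python
def precision_recall(predicted, truth, tol=50):
    """Precision and recall of predicted poly(A) sites against known true sites.

    A prediction and a true site match when they lie within `tol` nucleotides.
    """
    matched_pred = sum(any(abs(p - t) <= tol for t in truth) for p in predicted)
    matched_true = sum(any(abs(p - t) <= tol for p in predicted) for t in truth)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_true / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical coordinates on one transcript strand.
predicted = [100, 500, 900]
truth = [120, 880, 2000, 3000]
precision, recall = precision_recall(predicted, truth)
```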
10
Ye W, Liu T, Fu H, Ye C, Ji G, Wu X. movAPA: modeling and visualization of dynamics of alternative polyadenylation across biological samples. Bioinformatics 2020; 37:2470-2472. [PMID: 33258917 DOI: 10.1093/bioinformatics/btaa997] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/23/2020] [Revised: 10/10/2020] [Accepted: 11/17/2020] [Indexed: 02/07/2023] Open
Abstract
MOTIVATION: Alternative polyadenylation (APA) has been widely recognized as a widespread mechanism modulated dynamically. Studies based on 3' end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. RESULTS: We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. Particularly, seven metrics are provided for measuring the tissue-specificity or usages of APA sites across samples. Three methods are used for identifying 3' UTR shortening/lengthening events between conditions. APA site switching involving non-3' UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. AVAILABILITY AND IMPLEMENTATION: https://github.com/BMILAB/movAPA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
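The abstract does not list the seven metrics, so the following is only a generic example of what a tissue-specificity score for one APA site might look like: a Shannon-entropy score over the site's read counts across samples. It is written in Python rather than R, and neither the formula nor the function name is taken from movAPA.

```python
import math


def entropy_specificity(usage):
    """Return 0.0 for uniform usage across samples and 1.0 for usage in one sample only.

    usage: non-negative read counts (or expression values) of one poly(A) site,
    one entry per sample or tissue.
    """
    total = sum(usage)
    n = len(usage)
    if total <= 0 or n < 2:
        return 0.0
    probs = [u / total for u in usage if u > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Normalize by the maximum possible entropy, log2(n).
    return 1.0 - entropy / math.log2(n)


uniform_score = entropy_specificity([5, 5, 5, 5])
specific_score = entropy_specificity([10, 0, 0, 0])
```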
Affiliation(s)
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen 361005, China
- Tao Liu
  - Department of Automation, Xiamen University, Xiamen 361005, China
- Hongjuan Fu
  - Department of Automation, Xiamen University, Xiamen 361005, China
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen 361102, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen 361005, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen 361005, China
11
Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species. PLoS Comput Biol 2020; 16:e1008297. [PMID: 33151940 PMCID: PMC7671507 DOI: 10.1371/journal.pcbi.1008297] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/25/2020] [Revised: 11/17/2020] [Accepted: 08/30/2020] [Indexed: 11/19/2022] Open
Abstract
In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to identify PAS computationally, the need for tremendous amounts of annotation data hinders applications of existing methods in species without experimental data on PAS. Therefore, cross-species PAS identification, which makes it possible to predict PAS in species not used for training, naturally becomes a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species, build cross-species training sets with two of them, and evaluate the performance on the remaining ones. Moreover, we test our method against insufficient data and imbalanced data issues and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy when it comes to a smaller or imbalanced training set.
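The evaluation design described here (train on two of four species, test on the other two) can be enumerated with a few lines. The species labels are placeholders, since the abstract does not name the species, and the helper name is hypothetical.

```python
from itertools import combinations


def cross_species_splits(species, n_train=2):
    """Yield (train, test) tuples: train on n_train species, test on the rest."""
    for train in combinations(species, n_train):
        test = tuple(s for s in species if s not in train)
        yield train, test


# Placeholder labels; the abstract does not say which four species were used.
species = ["species_A", "species_B", "species_C", "species_D"]
splits = list(cross_species_splits(species))
```

With four species and two used for training, this gives six splits, and no test species ever contributes data to the model that is evaluated on it.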
12
Wu X, Liu T, Ye C, Ye W, Ji G. scAPAtrap: identification and quantification of alternative polyadenylation sites from single-cell RNA-seq data. Brief Bioinform 2020; 22:5952304. [PMID: 33142319 DOI: 10.1093/bib/bbaa273] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/29/2020] [Revised: 09/17/2020] [Accepted: 09/20/2020] [Indexed: 02/06/2023] Open
Abstract
Alternative polyadenylation (APA) generates diverse mRNA isoforms, which contributes to transcriptome diversity and gene expression regulation by affecting mRNA stability, translation and localization in cells. The rapid development of 3' tag-based single-cell RNA-sequencing (scRNA-seq) technologies, such as CEL-seq and 10x Genomics, has led to the emergence of computational methods for identifying APA sites and profiling APA dynamics at single-cell resolution. However, existing methods fail to detect the precise location of poly(A) sites or sites with low read coverage. Moreover, they rely on prior genome annotation and can only detect poly(A) sites located within or near annotated genes. Here we proposed a tool called scAPAtrap for detecting poly(A) sites at the whole genome level in individual cells from 3' tag-based scRNA-seq data. scAPAtrap incorporates peak identification and poly(A) read anchoring, enabling the identification of the precise location of poly(A) sites, even for sites with low read coverage. Moreover, scAPAtrap can identify poly(A) sites without using prior genome annotation, which helps locate novel poly(A) sites in previously overlooked regions and improve genome annotation. We compared scAPAtrap with two recent methods, scAPA and Sierra, using scRNA-seq data from different experimental technologies and species. Results show that scAPAtrap identified poly(A) sites with higher accuracy and sensitivity than competing methods and could be used to explore APA dynamics among cell types or the heterogeneous APA isoform expression in individual cells. scAPAtrap is available at https://github.com/BMILAB/scAPAtrap.
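A minimal sketch of combining a coverage peak with poly(A) read anchoring follows. It is not the scAPAtrap code (which is distributed as an R package at the URL above); the function name, the `slack` window, and the fallback to the peak's 3' edge are assumptions used to show the idea.

```python
from collections import Counter


def anchor_site(peak_start, peak_end, strand, polya_ends, slack=20):
    """Pick a precise poly(A) site for one coverage peak.

    polya_ends: 3' end coordinates of reads carrying a poly(A) tail.
    If tailed reads fall in or near the peak, the most frequent 3' end wins;
    otherwise the peak's 3' edge is used as a coarse estimate.
    """
    lo, hi = peak_start - slack, peak_end + slack
    inside = [p for p in polya_ends if lo <= p <= hi]
    if inside:
        counts = Counter(inside)
        top = max(counts.values())
        candidates = [p for p, c in counts.items() if c == top]
        # Break ties toward the more distal position on the given strand.
        return max(candidates) if strand == "+" else min(candidates)
    return peak_end if strand == "+" else peak_start


anchored = anchor_site(1000, 1200, "+", [1195, 1195, 1198, 1500])
fallback = anchor_site(1000, 1200, "-", [])
```

Even a couple of tailed reads are enough to pin the site to a single nucleotide, which is why anchoring can help at low coverage where the peak shape alone is uninformative.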
Affiliation(s)
- Xiaohui Wu
  - Department of Automation in Xiamen University
- Tao Liu
  - Department of Automation in Xiamen University
- Congting Ye
  - College of the Environment and Ecology in Xiamen University
- Wenbin Ye
  - Department of Automation in Xiamen University
- Guoli Ji
  - Department of Automation in Xiamen University
13
|
Tu M, Li Y. Profiling Alternative 3' Untranslated Regions in Sorghum using RNA-seq Data. Front Genet 2020; 11:556749. [PMID: 33193635 PMCID: PMC7649775 DOI: 10.3389/fgene.2020.556749] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/28/2020] [Accepted: 09/30/2020] [Indexed: 12/18/2022] Open
Abstract
Sorghum is an important crop widely used for food, feed, and fuel. Transcriptome-wide studies of 3′ untranslated regions (3′UTRs) using regular RNA-seq remain scarce in sorghum, even though transcriptomes have been characterized extensively using Illumina short-read sequencing platforms for many sorghum varieties under various conditions or developmental contexts. The 3′UTR is a critical regulatory component of genes, controlling the translation, transport, and stability of messenger RNAs. In the present study, we profiled the alternative 3′UTRs at the transcriptome level in three genetically related but phenotypically contrasting lines of sorghum: Rio, BTx406, and R9188. A total of 1,197 transcripts with alternative 3′UTRs were detected using RNA-seq data. Their categorization identified 612 high-confidence alternative 3′UTRs. Importantly, the high-confidence alternative 3′UTR genes significantly overlapped with the gene sets that are associated with RNA N6-methyladenosine (m6A) modification, suggesting an association between alternative 3′UTR usage and m6A methylation in sorghum. Moreover, taking advantage of sorghum genetics, we provided evidence of genotype specificity of alternative 3′UTR usage. In summary, our work exemplifies transcriptome-wide profiling of alternative 3′UTRs using regular RNA-seq data in non-model crops and provides insights into alternative 3′UTRs and their genotype specificity.
Affiliation(s)
- Min Tu
- Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
- Yin Li
- Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
|
14
|
Xia Z, Li Y, Zhang B, Li Z, Hu Y, Chen W, Gao X. DeeReCT-PolyA: a robust and generic deep learning method for PAS identification. Bioinformatics 2020; 35:2371-2379. [PMID: 30500881 PMCID: PMC6612895 DOI: 10.1093/bioinformatics/bty991] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/10/2018] [Revised: 11/06/2018] [Accepted: 11/29/2018] [Indexed: 02/06/2023] Open
Abstract
Motivation: Polyadenylation is a critical step for gene expression regulation during the maturation of mRNA. An accurate and robust method for poly(A) signal (PAS) identification is not only desired for better annotation of transcript ends, but can also help us gain deeper insight into the underlying regulatory mechanism. Although many methods have been proposed for PAS recognition, most of them are PAS motif- and human-specific, which leads to high risks of overfitting, low generalization power, and inability to reveal the connections between the underlying mechanisms of different mammals. Results: In this work, we propose a robust, PAS motif-agnostic, highly interpretable and transferable deep learning model for accurate PAS recognition, which requires no prior knowledge or human-designed features. We show that our single model trained over all human PAS motifs not only outperforms the state-of-the-art methods trained on specific motifs, but also generalizes well to two mouse datasets. Moreover, we further increase the prediction accuracy by transferring the deep learning model trained on the data of one species to the data of a different species. Several novel underlying poly(A) patterns are revealed through the visualization of important oligomers and positions in our trained models. Finally, we interpret the deep learning models by converting the convolutional filters into sequence logos and quantitatively compare the sequence logos between human and mouse datasets. Availability and implementation: https://github.com/likesum/DeeReCT-PolyA. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhihao Xia
- Department of Computer Science and Engineering (CSE), Washington University in St Louis, St Louis, MO, USA
- Yu Li
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
- Bin Zhang
- Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Zhongxiao Li
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
- Yuhui Hu
- Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Wei Chen
- Department of Biology, Southern University of Science and Technology (SUSTC), Shenzhen, China
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
|
15
|
Ye C, Zhou Q, Wu X, Ji G, Li QQ. Genome-wide alternative polyadenylation dynamics in response to biotic and abiotic stresses in rice. Ecotoxicol Environ Saf 2019; 183:109485. [PMID: 31376807 DOI: 10.1016/j.ecoenv.2019.109485] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/05/2019] [Revised: 07/24/2019] [Accepted: 07/26/2019] [Indexed: 05/24/2023]
Abstract
Alternative polyadenylation (APA) is an important way to regulate gene expression at the post-transcriptional level, and is extensively involved in plant stress responses. However, the systematic roles of APA regulation in response to abiotic and biotic stresses in rice at the genome scale remain unknown. Taking advantage of available RNA-seq datasets and using a novel tool, APAtrap, we identified thousands of genes with significantly differential usage of polyadenylation [poly(A)] sites in response to abiotic stress (drought, heat shock, and cadmium) and biotic stress [bacterial blight (BB), rice blast, and rice stripe virus (RSV)]. Genes with stress-responsive APA dynamics commonly exhibited higher expression levels when their isoforms with a short 3' untranslated region (3' UTR) were more abundant. The stress-responsive APA events were widely involved in crucial stress-responsive genes and pathways: e.g. APA acted as a negative regulator in heat stress tolerance; APA events were involved in DNA repair and cell wall formation under Cd stress; APA regulated chlorophyll metabolism, being associated with the pathogenesis of leaf diseases under RSV and BB challenges. Furthermore, APA events were found to be involved in glutathione metabolism and MAPK signaling pathways, mediating crosstalk among the abiotic and biotic stress-responsive regulatory networks in rice. Analysis of large-scale datasets revealed that APA may regulate abiotic and biotic stress-responsive processes in rice. Such post-transcriptional diversity contributes to rice adaptation to various environmental challenges. Our study supplies a useful resource for further molecular-assisted breeding of rice cultivars tolerant to multiple stresses.
Affiliation(s)
- Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
- Qian Zhou
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China; Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian, 361005, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian, 361005, China
- Qingshun Quinn Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China; Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
|
16
|
Leung MKK, Delong A, Frey BJ. Inference of the human polyadenylation code. Bioinformatics 2019; 34:2889-2898. [PMID: 29648582 PMCID: PMC6129302 DOI: 10.1093/bioinformatics/bty211] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/27/2017] [Accepted: 04/09/2018] [Indexed: 01/02/2023] Open
Abstract
Motivation: Processing of transcripts at the 3′-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which site is cleaved, the process of alternative polyadenylation enables genes to produce transcript isoforms with different 3′-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the sequence determinants underlying this regulatory process, a computational model that can accurately predict polyadenylation patterns from genomic features is desirable. Results: Previous works have focused on identifying candidate polyadenylation sites and classifying tissue-specific sites. By training on how multiple sites in genes are competitively selected for polyadenylation from 3′-end sequencing data, we developed a deep learning model that can predict the tissue-specific strength of a polyadenylation site in the 3′ untranslated region of the human genome given only its genomic sequence. We demonstrate the model’s broad utility on multiple tasks, without any application-specific training. The model can be used to predict which polyadenylation site is more likely to be selected in genes with multiple sites. It can be used to scan the 3′ untranslated region to find candidate polyadenylation sites. It can be used to classify the pathogenicity of variants near annotated polyadenylation sites in ClinVar. It can also be used to anticipate the effect of antisense oligonucleotide experiments to redirect polyadenylation. We provide an analysis of how different features affect the model’s predictive performance and a method to identify, at single-base resolution, sensitive regions of the genome that can affect polyadenylation regulation. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michael K K Leung
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada; Deep Genomics, MaRS Centre, Toronto, Canada
- Andrew Delong
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada; Deep Genomics, MaRS Centre, Toronto, Canada
- Brendan J Frey
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada; Deep Genomics, MaRS Centre, Toronto, Canada; Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada
|
17
|
Ji G, Chen M, Ye W, Zhu S, Ye C, Su Y, Peng H, Wu X. TSAPA: identification of tissue-specific alternative polyadenylation sites in plants. Bioinformatics 2019; 34:2123-2125. [PMID: 29385403 DOI: 10.1093/bioinformatics/bty044] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/06/2017] [Accepted: 01/26/2018] [Indexed: 11/12/2022] Open
Abstract
Summary: Alternative polyadenylation (APA) is now emerging as a widespread mechanism that is modulated in a tissue-specific manner, which highlights the need to define tissue-specific poly(A) sites for profiling APA dynamics across tissues. We have developed an R package called TSAPA, based on a machine learning model, for identifying tissue-specific poly(A) sites in plants. A feature space including more than 200 features was assembled to specifically characterize poly(A) sites in plants. The classification model in TSAPA can be customized by selecting desirable features or classifiers. TSAPA is also capable of predicting tissue-specific poly(A) sites in unannotated intergenic regions. TSAPA will be a valuable addition to the community for studying the dynamics of APA in plants. Availability and implementation: https://github.com/BMILAB/TSAPA. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guoli Ji
- Department of Automation; Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, China
- Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
- Xiaohui Wu
- Department of Automation; Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
|
18
|
Zhu S, Wu X, Fu H, Ye C, Chen M, Jiang Z, Ji G. Modeling of Genome-Wide Polyadenylation Signals in Xenopus tropicalis. Front Genet 2019; 10:647. [PMID: 31333724 PMCID: PMC6616101 DOI: 10.3389/fgene.2019.00647] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/04/2019] [Accepted: 06/18/2019] [Indexed: 12/22/2022] Open
Abstract
Alternative polyadenylation (APA) is an important post-transcriptional modification event to process messenger RNA (mRNA) for transcriptional termination, transport, and translation. In the present study, we characterized poly(A) signals in Xenopus tropicalis using 70,918 high-confidence poly(A) sites derived from 16,511 protein-coding genes to understand their roles in the regulation of embryo development and sex differences. We examined potential factors, including the gene length, the number of introns in a gene, and the intron length, that may affect the prevalence of APA. We observed 12 prominent poly(A) signal patterns, which accounted for approximately 92% of total APA sites in Xenopus tropicalis. Among them, three patterns are specific to X. tropicalis and are absent in other animals such as humans or mice. We catalogued APA sites based on their genomic regions and developed a bioinformatics pipeline to identify over-represented signal patterns for each class. A schema of cis elements for APA sites in each genomic region was then proposed. More importantly, APA usage is highly dynamic in embryos along five developmental stages and well-coordinated with the maternal-to-zygotic transition event. We used an entropy-based method to identify developmental stage-specific APA sites and identified significant signal patterns around specific sites and constitutive sites. We found that the APA frequency in different genomic regions varies with developmental stages and that those sites located in intron or coding sequence regions contribute most to the dynamics of gene expression during developmental stages. This study deciphers the characteristics and poly(A) signal patterns for both canonical APA sites and non-canonical APA sites across different developmental stages and sexual dimorphism in X. tropicalis, providing new insights into the dynamic regulation of distal and proximal APA.
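The abstract does not spell out the entropy-based method, so the following is only a sketch of the general idea, assuming Shannon entropy over a site's read counts across stages: a site used in a single stage has entropy near zero, while a site used evenly has the maximum entropy. The function name and the example counts are hypothetical.

```python
import math


def stage_entropy(counts):
    """Shannon entropy (in bits) of one poly(A) site's read counts across stages."""
    total = sum(counts)
    if total == 0:
        raise ValueError("site has no reads in any stage")
    probs = [c / total for c in counts if c > 0]
    return sum(p * math.log2(1 / p) for p in probs)


# Hypothetical read counts across five developmental stages.
stage_specific_site = [0, 0, 120, 0, 0]     # used in one stage only
constitutive_site = [50, 50, 50, 50, 50]    # used evenly in all stages

print(stage_entropy(stage_specific_site))   # 0.0
print(stage_entropy(constitutive_site))     # about 2.32, i.e. log2(5)
```

A cutoff on this value would separate stage-specific from constitutive sites; the cutoff used in the study is not stated in the abstract.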
Affiliation(s)
- Sheng Zhu
- Department of Automation, Xiamen University, Xiamen, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China; Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China
- Hongjuan Fu
- Department of Automation, Xiamen University, Xiamen, China
- Congting Ye
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China; Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, China
- Moliang Chen
- Department of Automation, Xiamen University, Xiamen, China
- Zhihua Jiang
- Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA, United States
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China; Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, China
|
19
|
Chen M, Ji G, Fu H, Lin Q, Ye C, Ye W, Su Y, Wu X. A survey on identification and quantification of alternative polyadenylation sites from RNA-seq data. Brief Bioinform 2019; 21:1261-1276. [PMID: 31267126 DOI: 10.1093/bib/bbz068] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/10/2019] [Revised: 05/03/2019] [Accepted: 05/14/2019] [Indexed: 12/13/2022] Open
Abstract
Alternative polyadenylation (APA) has been implicated in post-transcriptional regulation of mRNA abundance, stability, localization and translation, and contributes considerably to transcriptome diversity and gene expression regulation. RNA-seq has become a routine approach for transcriptome profiling, generating unprecedented data that could be used to identify and quantify APA site usage. A number of computational approaches for identifying APA sites and/or dynamic APA events from RNA-seq data have emerged in the literature, which provide valuable yet preliminary results that should be refined to yield credible guidelines for the scientific community. In this review, we provide a comprehensive overview of the status of currently available computational approaches. We also conducted an objective benchmarking analysis using RNA-seq data sets from different species (human, mouse and Arabidopsis) and simulated data sets to present a systematic evaluation of 11 representative methods. Our benchmarking study showed that the overall performance of all tools investigated is moderate, reflecting that there is still a lot of scope to improve the prediction of APA sites or dynamic APA events from RNA-seq data. In particular, prediction results from individual tools differ considerably, and only a limited number of predicted APA sites or genes are common among different tools. Accordingly, we give some advice on how to assess the reliability of the obtained results. We also propose practical recommendations on the appropriate method for diverse scenarios and discuss implications and future directions relevant to profiling APA from RNA-seq data.
Affiliation(s)
- Moliang Chen
- Department of Automation, Xiamen University, Xiamen 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Hongjuan Fu
- Department of Automation, Xiamen University, Xiamen 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Qianmin Lin
- Xiang'an Hospital of Xiamen University, Xiamen 361005, China
- Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen 361005, China; Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
|
20
|
Ye C, Long Y, Ji G, Li QQ, Wu X. APAtrap: identification and quantification of alternative polyadenylation sites from RNA-seq data. Bioinformatics 2019; 34:1841-1849. [PMID: 29360928 DOI: 10.1093/bioinformatics/bty029] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 09/30/2017] [Accepted: 01/17/2018] [Indexed: 12/28/2022] Open
Abstract
Motivation: Alternative polyadenylation (APA) has been increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and gene expression regulation. As RNA-seq has become a routine protocol for transcriptome analysis, it is of great interest to leverage this unprecedented collection of RNA-seq data with new computational methods to extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely on either transcript assembly to determine transcript 3' ends or annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage considering more than two poly(A) sites. Results: We developed an approach called APAtrap, based on the mean squared error model, to identify and quantify APA sites from RNA-seq data. APAtrap is capable of identifying novel 3' UTRs and 3' UTR extensions, which contributes to locating potential poly(A) sites in previously overlooked regions and improving genome annotations. APAtrap also aims to tally all potential poly(A) sites and detect genes with differential APA site usage between conditions. Extensive comparisons of APAtrap with two other recent methods, ChangePoint and DaPars, using various RNA-seq datasets from simulation studies, human and Arabidopsis demonstrate the efficacy and flexibility of APAtrap for any organism with an annotated genome. Availability and implementation: Freely available for download at https://apatrap.sourceforge.io. Contact: liqq@xmu.edu.cn or xhuister@xmu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
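To illustrate what a mean squared error criterion can look like for locating a poly(A) site from RNA-seq coverage, here is a simplified sketch, not APAtrap's actual model: it assumes a single change point within a 3' UTR coverage vector and picks the split that minimizes the squared deviation of each segment from its own mean. The function name `best_split` and the coverage values are hypothetical.

```python
def best_split(coverage):
    """Return (index, mse) of the split minimizing the two-segment mean squared error.

    `index` is the first position of the downstream segment, so the inferred
    change point lies between positions index - 1 and index.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions")

    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_index, best_mse = None, float("inf")
    for i in range(1, n):
        mse = (sse(coverage[:i]) + sse(coverage[i:])) / n
        if mse < best_mse:
            best_index, best_mse = i, mse
    return best_index, best_mse


# Hypothetical per-position coverage: a drop after the fourth position
# suggests a proximal poly(A) site there.
coverage = [30, 31, 29, 30, 10, 11, 9, 10]
print(best_split(coverage))  # (4, 0.5)
```

Handling more than two poly(A) sites per gene, as the abstract describes, would require searching over several change points; this sketch covers only the single-split case.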
Affiliation(s)
- Congting Ye
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Yuqi Long
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Qingshun Quinn Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China; Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA 91766, USA
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
|
21
|
Albalawi F, Chahid A, Guo X, Albaradei S, Magana-Mora A, Jankovic BR, Uludag M, Van Neste C, Essack M, Laleg-Kirati TM, Bajic VB. Hybrid model for efficient prediction of poly(A) signals in human genomic DNA. Methods 2019; 166:31-39. [PMID: 30991099 DOI: 10.1016/j.ymeth.2019.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/15/2018] [Revised: 03/12/2019] [Accepted: 04/01/2019] [Indexed: 12/15/2022] Open
Abstract
Polyadenylation signals (PAS) are found in most protein-coding and some non-coding genes in eukaryotes. Their accurate recognition improves understanding of gene regulation mechanisms and recognition of the 3'-end of transcribed gene regions, where premature or alternate transcription ends may lead to various diseases. Although different methods and tools for in-silico prediction of genomic signals have been proposed, the correct identification of PAS in genomic DNA remains challenging due to a vast number of non-relevant hexamers identical to PAS hexamers. In this study, we developed a novel method for PAS recognition. The method is implemented in a hybrid PAS recognition model (HybPAS), which is based on deep neural networks (DNNs) and logistic regression models (LRMs). One such model is developed for each of the 12 most frequent human PAS hexamers. DNN models performed best for eight PAS types (including the two most frequent PAS hexamers), while LRMs performed best for the remaining four PAS types. The new models use different combinations of signal processing-based, statistical, and sequence-based features as input. The results obtained on human genomic data show that HybPAS outperforms the well-tuned state-of-the-art Omni-PolyA models, reducing the classification error for different PAS hexamers by up to 57.35% for 10 out of 12 PAS types, with Omni-PolyA models being better for two PAS types. For the most frequent PAS types, 'AATAAA' and 'ATTAAA', HybPAS reduced the error rate by 35.14% and 34.48%, respectively. On average, HybPAS reduces the error by 30.29%. HybPAS is implemented partly in Python and partly in MATLAB and is available at https://github.com/EMANG-KAUST/PolyA_Prediction_LRM_DNN.
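The difficulty the abstract describes, that many genomic hexamers are identical to PAS hexamers without being functional signals, is easy to see with a plain string scan. The sketch below is not part of HybPAS; it only enumerates candidate positions for the two most frequent hexamers named in the abstract, which a classifier would then have to label as true or false signals. The function name and the example sequence are hypothetical.

```python
PAS_HEXAMERS = ("AATAAA", "ATTAAA")  # the two most frequent types named in the abstract


def find_hexamers(sequence, hexamers=PAS_HEXAMERS):
    """Return (0-based start, hexamer) for every occurrence of a listed hexamer."""
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - 5):
        kmer = sequence[start:start + 6]
        if kmer in hexamers:
            hits.append((start, kmer))
    return hits


print(find_hexamers("GGAATAAACCATTAAAGG"))
# [(2, 'AATAAA'), (10, 'ATTAAA')]
```

Every hit from such a scan is only a candidate; distinguishing functional signals from chance matches is the classification task that the models in the abstract address.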
Affiliation(s)
- Fahad Albalawi
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Taif University, Electrical Engineering, Taif 21944, Saudi Arabia
- Abderrazak Chahid
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
- Xingang Guo
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
- Somayah Albaradei
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
- Arturo Magana-Mora
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Saudi Aramco, EXPEC-ARC, Drilling Technology Team, Dhahran 31311, Saudi Arabia
- Boris R Jankovic
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
- Mahmut Uludag
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
- Christophe Van Neste
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Ghent University, Center for Medical Genetics Ghent (CMGG), B-9000 Ghent, Belgium
- Magbubah Essack
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
- Taous-Meriem Laleg-Kirati
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
- Vladimir B Bajic
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
|
22
|
Harrison BJ, Park JW, Gomes C, Petruska JC, Sapio MR, Iadarola MJ, Chariker JH, Rouchka EC. Detection of Differentially Expressed Cleavage Site Intervals Within 3' Untranslated Regions Using CSI-UTR Reveals Regulated Interaction Motifs. Front Genet 2019; 10:182. [PMID: 30915105 PMCID: PMC6422928 DOI: 10.3389/fgene.2019.00182] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/18/2018] [Accepted: 02/19/2019] [Indexed: 01/08/2023] Open
Abstract
The length of untranslated regions at the 3' end of transcripts (3'UTRs) is regulated by alternative polyadenylation (APA). 3'UTRs contain regions that harbor binding motifs for regulatory molecules. However, the mechanisms that coordinate the 3'UTR length of specific groups of transcripts are not well understood. We therefore developed a method, CSI-UTR, that models 3'UTR structure as tandem segments between functional alternative polyadenylation sites (termed cleavage site intervals, CSIs). This approach facilitated (1) profiling of 3'UTR isoform expression changes and (2) statistical enrichment of putative regulatory motifs. CSI-UTR analysis is UTR-annotation independent and can interrogate legacy data generated from standard RNA-Seq libraries. CSI-UTR identified a set of CSIs in human and rodent transcriptomes. Analysis of RNA-Seq datasets from neural tissue identified differential expression events within 3'UTRs not detected by standard gene-based differential expression analyses. Further, in many instances 3'UTR and CDS from the same gene were regulated differently. This modulation of motifs for RNA-interacting molecules with potential condition-dependent and tissue-specific RNA binding partners near the polyA signal and CSI junction may play a mechanistic role in the specificity of alternative polyadenylation. Source code, CSI BED files and example datasets are available at: https://github.com/UofLBioinformatics/CSI-UTR.
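The idea of modeling a 3'UTR as tandem segments between cleavage sites can be sketched in a few lines. This is an illustration only, not the CSI-UTR code: it assumes plus-strand coordinates, half-open intervals, and a hypothetical function name.

```python
def cleavage_site_intervals(utr_start, cleavage_sites):
    """Split a 3'UTR into tandem intervals bounded by consecutive cleavage sites.

    Assumes a plus-strand transcript; each interval is (start, end), with the
    first interval beginning at the start of the 3'UTR.
    """
    intervals = []
    previous = utr_start
    for site in sorted(set(cleavage_sites)):
        if site <= previous:
            continue  # ignore sites at or upstream of the current boundary
        intervals.append((previous, site))
        previous = site
    return intervals


print(cleavage_site_intervals(1000, [1800, 1250, 2400]))
# [(1000, 1250), (1250, 1800), (1800, 2400)]
```

Read counts could then be summarized per interval, so that a change in the usage of a distal site shows up as a change in the intervals downstream of the proximal site.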
Collapse
Affiliation(s)
- Benjamin J Harrison
- Department of Biomedical Sciences, Center for Excellence in the Neurosciences, College of Osteopathic Medicine, University of New England, Biddeford, ME, United States; Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, United States; Kentucky Biomedical Research Infrastructure Network Bioinformatics Core, Louisville, KY, United States
- Juw Won Park
- Kentucky Biomedical Research Infrastructure Network Bioinformatics Core, Louisville, KY, United States; Department of Computer Engineering and Computer Science, Speed School of Engineering, University of Louisville, Louisville, KY, United States
- Cynthia Gomes
- Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, United States
- Jeffrey C Petruska
- Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, United States; Kentucky Spinal Cord Injury Research Center, University of Louisville, Louisville, KY, United States; Department of Neurological Surgery, University of Louisville, Louisville, KY, United States
- Matthew R Sapio
- Department of Perioperative Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, United States
- Michael J Iadarola
- Department of Perioperative Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, United States
- Julia H Chariker
- Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, United States; Kentucky Biomedical Research Infrastructure Network Bioinformatics Core, Louisville, KY, United States
- Eric C Rouchka
- Kentucky Biomedical Research Infrastructure Network Bioinformatics Core, Louisville, KY, United States; Department of Computer Engineering and Computer Science, Speed School of Engineering, University of Louisville, Louisville, KY, United States
23
Ye W, Long Y, Ji G, Su Y, Ye P, Fu H, Wu X. Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis. BMC Genomics 2019; 20:75. [PMID: 30669970 PMCID: PMC6343338 DOI: 10.1186/s12864-019-5433-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2018] [Accepted: 01/03/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to the transcriptome complexity and dynamics of gene regulation. The current tsunami of whole genome poly(A) site data from various conditions generated by 3' end sequencing provides a valuable data source for the study of APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes; however, conventional gene clustering methods are not suitable for APA-related data as they fail to consider the information of poly(A) sites (e.g., location, abundance, number, etc.) within each gene or measure the association among poly(A) sites between two genes.
RESULTS Here we proposed a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data from both the poly(A) site level and gene level and takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways including the abundance and relative usage, which can exploit the advantages of 3' end deep sequencing in quantifying APA sites. Using both real and synthetic poly(A) site data sets, the cluster analysis demonstrates that PASCCA outperforms other widely-used distance measures under five performance metrics including connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered some distinct functional gene modules. We have made PASCCA an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes.
CONCLUSIONS By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites among various biological conditions from emerging 3' end sequencing data to address the complex biological phenomenon.
Affiliation(s)
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yuqi Long
- Department of Automation, Xiamen University, Xiamen, 361005, China; Software Quality Testing Engineering Research Center, China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, 510610, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350116, China
- Pengchao Ye
- Department of Automation, Xiamen University, Xiamen, 361005, China
- Hongjuan Fu
- Department of Automation, Xiamen University, Xiamen, 361005, China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
24
Zhu Y, Vaughn JC. Experimental Verification and Evolutionary Origin of 5'-UTR Polyadenylation Sites in Arabidopsis thaliana. Front Plant Sci 2018; 9:969. [PMID: 30026753 PMCID: PMC6041940 DOI: 10.3389/fpls.2018.00969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2017] [Accepted: 06/15/2018] [Indexed: 06/08/2023]
Abstract
Messenger RNA (mRNA) polyadenylation is an indispensable step during post-transcriptional pre-mRNA processing for most genes in eukaryotes. The usage of one poly(A) site over another is known as alternative polyadenylation (APA). APA has been implicated in gene expression regulation through its role of selecting the ends of a transcript. Recent studies of polyadenylation profiles in the Arabidopsis database unexpectedly predicted that a portion of the poly(A) sites are located in the 5'-UTR, which remains to be experimentally verified. We selected 16 genes from a dataset of 744, based on criteria designed to minimize problems in interpretation. Here, we experimentally verify 5'-UTR-APA in Arabidopsis for 10 of the 16 selected genes, and show for the first time the existence of independent polyadenylated 5'-UTR transcripts, arising due to alternative polyadenylation. We used 3'-RACE and sequencing to validate poly(A) sites and northern blot to show that the observed short upstream transcripts do not arise from the 3'-end of a previously unrecognized convergent gene. Evidence is reported showing that two of the independent upstream open reading frame (uORF) transcripts studied, one containing a complex dual uORF, very likely arose by exon shuffling following duplication of the 5'-end from the downstream major open reading frame (mORF). Finally, results are presented to show that the uORF in this gene may encode two short functional proteins, based on observation of amino acid sequence conservation encoded by the dual uORFs.
25
Alternative polyadenylation drives genome-to-phenome information detours in the AMPKα1 and AMPKα2 knockout mice. Sci Rep 2018; 8:6462. [PMID: 29691479 PMCID: PMC5915415 DOI: 10.1038/s41598-018-24683-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/13/2017] [Accepted: 04/06/2018] [Indexed: 01/25/2023] Open
Abstract
Currently available mouse knockout (KO) lines remain largely uncharacterized for genome-to-phenome (G2P) information flows. Here we test our hypothesis that altered myogenesis seen in AMPKα1- and AMPKα2-KO mice is caused by use of alternative polyadenylation sites (APSs). AMPKα1 and AMPKα2 are two α subunits of adenosine monophosphate-activated protein kinase (AMPK), which serves as a cellular sensor in regulation of many biological events. A total of 56,483 APSs were derived from gastrocnemius muscles. The differentially expressed APSs (DE-APSs) that were down-regulated tended to be distal. The DE-APSs that were related to reduced and increased muscle mass were down-regulated in AMPKα1-KO mice, but up-regulated in AMPKα2-KO mice, respectively. Five genes: Car3 (carbonic anhydrase 3), Mylk4 (myosin light chain kinase family, member 4), Neb (nebulin), Obscn (obscurin) and Pfkm (phosphofructokinase, muscle) utilized different APSs with potentially antagonistic effects on muscle function. Overall, gene knockout triggers genome plasticity via use of APSs, completing the G2P processes. However, gene-based analysis failed to reach such a resolution. Therefore, we propose that alternative transcripts are minimal functional units in genomes and the traditional central dogma concept should now be examined under a systems biology approach.
26
Afik S, Bartok O, Artyomov MN, Shishkin AA, Kadri S, Hanan M, Zhu X, Garber M, Kadener S. Defining the 5′ and 3′ landscape of the Drosophila transcriptome with Exo-seq and RNaseH-seq. Nucleic Acids Res 2017; 45:e95. [PMID: 28335028 PMCID: PMC5499799 DOI: 10.1093/nar/gkx133] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/25/2016] [Accepted: 02/15/2017] [Indexed: 01/19/2023] Open
Abstract
Cells regulate biological responses in part through changes in transcription start sites (TSS) or cleavage and polyadenylation sites (PAS). To fully understand gene regulatory networks, it is therefore critical to accurately annotate cell type-specific TSS and PAS. Here we present a simple and straightforward approach for genome-wide annotation of 5′- and 3′-RNA ends. Our approach reliably discerns bona fide PAS from false PAS that arise due to internal poly(A) tracts, a common problem with current PAS annotation methods. We applied our methodology to study the impact of temperature on the Drosophila melanogaster head transcriptome. We found hundreds of previously unidentified TSS and PAS which revealed two interesting phenomena: first, genes with multiple PASs tend to harbor a motif near the most proximal PAS, which likely represents a new cleavage and polyadenylation signal. Second, motif analysis of promoters of genes affected by temperature suggested that boundary element association factor of 32 kDa (BEAF-32) and DREF mediate a transcriptional program at warm temperatures, a result we validated in a fly line where beaf-32 is downregulated. These results demonstrate the utility of a high-throughput platform for complete experimental and computational analysis of mRNA-ends to improve gene annotation.
Affiliation(s)
- Shaked Afik
- Biological Chemistry Department, Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Osnat Bartok
- Biological Chemistry Department, Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Maxim N Artyomov
- Department of Pathology and Immunology, Washington University School of Medicine, St Louis, MO 63110, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Alexander A Shishkin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Sabah Kadri
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Mor Hanan
- Biological Chemistry Department, Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Xiaopeng Zhu
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Manuel Garber
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Sebastian Kadener
- Biological Chemistry Department, Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
27
Ji G, Lin Q, Long Y, Ye C, Ye W, Wu X. PAcluster: Clustering polyadenylation site data using canonical correlation analysis. J Bioinform Comput Biol 2017; 15:1750018. [PMID: 28874086 DOI: 10.1142/s0219720017500184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Alternative polyadenylation (APA) is a pervasive mechanism that contributes to gene regulation. The increasing number of sequenced poly(A) sites places new demands on the development of computational methods to investigate APA regulation. Cluster analysis is important to identify groups of co-expressed genes. However, clustering of poly(A) sites has not been extensively studied in APA, and most APA studies have failed to consider the distribution, abundance, and variation of APA sites in each gene. Here we constructed a two-layer model based on canonical correlation analysis (CCA) to explore the underlying biological mechanisms in APA regulation. The first layer quantifies the general correlation of APA sites across various conditions between each gene and the second layer identifies genes with statistically significant correlation on their APA patterns to infer APA-specific gene clusters. Using hierarchical clustering, we comprehensively compared our method with four other widely used distance measures based on three performance indexes. Results showed that our method significantly enhanced the clustering performance for both synthetic and real poly(A) site data and could generate clusters with more biological meaning. We have implemented the CCA-based method as a publicly available R package called PAcluster, which provides an efficient solution to the clustering of large APA-specific biological datasets.
Affiliation(s)
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Qianmin Lin
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Yuqi Long
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Congting Ye
- College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, P. R. China
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
28
Guo C, Spinelli M, Liu M, Li QQ, Liang C. A Genome-wide Study of "Non-3UTR" Polyadenylation Sites in Arabidopsis thaliana. Sci Rep 2016; 6:28060. [PMID: 27301740 PMCID: PMC4908657 DOI: 10.1038/srep28060] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/10/2016] [Accepted: 05/20/2016] [Indexed: 11/18/2022] Open
Abstract
Alternative polyadenylation has been recognized as a key contributor of gene expression regulation by generating different transcript isoforms with altered 3′ ends. Although polyadenylation is well known for marking the end of a 3′ UTR, an increasing number of studies have reported previously less-addressed polyadenylation events located in other parts of genes in many eukaryotic organisms. These other locations include 5′ UTRs, introns and coding sequences (termed herein as non-3UTR), as well as antisense and intergenic polyadenylation. Focusing on the non-3UTR polyadenylation sites (n3PASs), we detected and characterized more than 11000 n3PAS clusters in the Arabidopsis genome using poly(A)-tag sequencing data (PAT-Seq). Further analyses suggested that the occurrence of these n3PASs was positively correlated with certain characteristics of their respective host genes, including the presence of spliced, diminutive or diverse beginning of 5′ UTRs, number of introns and whether introns have extreme lengths. The interaction of the host genes with surrounding genetic elements, like a convergently overlapped gene and associated transposable element, may contribute to the generation of an n3PAS as well. Collectively, these results provide a better understanding of n3PASs, and offer some new insights of the underlying mechanisms for non-3UTR polyadenylation and its regulation in plants.
Affiliation(s)
- Cheng Guo
- Department of Biology, Miami University, Oxford, OH 45056, USA
- Man Liu
- Department of Biology, Miami University, Oxford, OH 45056, USA
- Qingshun Q Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China; Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA 91766, USA
- Chun Liang
- Department of Biology, Miami University, Oxford, OH 45056, USA