1
Han B, Bai S, Liu Y, Wu J, Feng X, Xin R. Definer: A computational method for accurate identification of RNA pseudouridine sites based on deep learning. PLoS One 2025; 20:e0320077. [PMID: 40273178 PMCID: PMC12021131 DOI: 10.1371/journal.pone.0320077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Accepted: 02/12/2025] [Indexed: 04/26/2025] [Open Access]
Abstract
Pseudouridine is an important modification site, which is widely present in a variety of non-coding RNAs and is involved in a variety of important biological processes. Studies have shown that pseudouridine is important in many biological functions such as gene expression, RNA structural stability, and various diseases. Therefore, accurate identification of pseudouridine sites can effectively explain the functional mechanism of this modification site. Due to the rapid increase of genomics data, traditional biological experimental methods to identify RNA modification sites can no longer meet the practical needs, and it is necessary to accurately identify pseudouridine sites from high-throughput RNA sequence data by computational methods. In this study, we propose a deep learning-based computational method, Definer, to accurately identify RNA pseudouridine loci in three species, Homo sapiens, Saccharomyces cerevisiae and Mus musculus. The method incorporates two sequence coding schemes, including NCP and One-hot, and then feeds the extracted RNA sequence features into a deep learning model constructed from CNN, GRU and Attention. The benchmark dataset contains data from three species, H. sapiens, S. cerevisiae and M. musculus, and the results using 10-fold cross-validation show that Definer significantly outperforms other existing methods. Meanwhile, the data sets of two species, H. sapiens and S. cerevisiae, were tested independently to further demonstrate the predictive ability of the model. In summary, our method, Definer, can accurately identify pseudouridine modification sites in RNA.
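The two sequence coding schemes named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used conventions for one-hot and nucleotide chemical property (NCP) encoding of RNA; the abstract does not give the exact vectors, window length, or handling of unknown bases, so the tables and the `encode` helper below are illustrative assumptions rather than the authors' implementation.

```python
# Assumed conventions (not taken from the paper):
# one-hot order is A, C, G, U.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# NCP triple: (ring structure: purine=1, functional group: amino=1,
#              hydrogen bonding: weak=1).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(seq):
    """Return one 7-dimensional row per nucleotide: one-hot followed by NCP."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        if base in ONE_HOT:
            rows.append(list(ONE_HOT[base]) + list(NCP[base]))
        else:
            # Hypothetical choice: unknown bases (e.g. N) become all zeros.
            rows.append([0] * 7)
    return rows


features = encode("ACGU")
```

Each sequence becomes a length-by-7 matrix, which is the kind of input a convolutional or recurrent layer would consume.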
Affiliation(s)
- Bo Han
  - Jilin Chemical Hospital, Jilin, P.R. China
- Sudan Bai
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Yang Liu
  - Jilin Chemical Hospital, Jilin, P.R. China
- Jiezhang Wu
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Xin Feng
  - School of Science, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Ruihao Xin
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
2
Wang H, Wang Y, Zhou J, Song B, Tu G, Nguyen A, Su J, Coenen F, Wei Z, Rigden DJ, Meng J. Statistical modeling of single-cell epitranscriptomics enabled trajectory and regulatory inference of RNA methylation. Cell Genomics 2025; 5:100702. [PMID: 39642887 PMCID: PMC11770222 DOI: 10.1016/j.xgen.2024.100702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 10/07/2024] [Accepted: 11/06/2024] [Indexed: 12/09/2024]
Abstract
As a fundamental mechanism for gene expression regulation, post-transcriptional RNA methylation plays versatile roles in various biological processes and disease mechanisms. Recent advances in single-cell technology have enabled simultaneous profiling of transcriptome-wide RNA methylation in thousands of cells, holding the promise to provide deeper insights into the dynamics, functions, and regulation of RNA methylation. However, it remains a major challenge to determine how to best analyze single-cell epitranscriptomics data. In this study, we developed SigRM, a computational framework for effectively mining single-cell epitranscriptomics datasets with a large cell number, such as those produced by the scDART-seq technique from the SMART-seq2 platform. SigRM not only outperforms state-of-the-art models in RNA methylation site detection on both simulated and real datasets but also provides rigorous quantification metrics of RNA methylation levels. This facilitates various downstream analyses, including trajectory inference and regulatory network reconstruction concerning the dynamics of RNA methylation.
Affiliation(s)
- Haozhe Wang
  - Department of Biosciences and Bioinformatics, Center for Intelligent RNA Therapeutics, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, School of Science, XJTLU Entrepreneur College, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Department of Computer Science, University of Liverpool, L7 8TX Liverpool, UK
- Yue Wang
  - School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, Jiangsu 210023, China
- Jingxian Zhou
  - School of AI and Advanced Computing, XJTLU Entrepreneur College, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Department of Computer Science, University of Liverpool, L7 8TX Liverpool, UK; Sino-French Hoffmann Institute, School of Basic Medical Sciences, Guangzhou Medical University, Guangzhou, Guangdong 511436, China
- Bowen Song
  - Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing, Jiangsu 210023, China; Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Gang Tu
  - Department of Biosciences and Bioinformatics, Center for Intelligent RNA Therapeutics, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, School of Science, XJTLU Entrepreneur College, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Anh Nguyen
  - Department of Computer Science, University of Liverpool, L7 8TX Liverpool, UK
- Jionglong Su
  - School of AI and Advanced Computing, XJTLU Entrepreneur College, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Frans Coenen
  - Department of Computer Science, University of Liverpool, L7 8TX Liverpool, UK
- Zhi Wei
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- Jia Meng
  - Department of Biosciences and Bioinformatics, Center for Intelligent RNA Therapeutics, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, School of Science, XJTLU Entrepreneur College, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Institute of Biomedical Research, Regulatory Mechanism and Targeted Therapy for Liver Cancer Shiyan Key Laboratory, Hubei Provincial Clinical Research Center for Precise Diagnosis and Treatment of Liver Cancer, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei 442000, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
3
Noor S, Naseem A, Awan HH, Aslam W, Khan S, AlQahtani SA, Ahmad N. Deep-m5U: a deep learning-based approach for RNA 5-methyluridine modification prediction using optimized feature integration. BMC Bioinformatics 2024; 25:360. [PMID: 39563239 DOI: 10.1186/s12859-024-05978-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 11/06/2024] [Indexed: 11/21/2024] [Open Access]
Abstract
BACKGROUND RNA 5-methyluridine (m5U) modifications play a crucial role in biological processes, making their accurate identification a key focus in computational biology. This paper introduces Deep-m5U, a robust predictor designed to enhance the prediction of m5U modifications. The proposed method, named Deep-m5U, utilizes a hybrid pseudo-K-tuple nucleotide composition (PseKNC) for sequence formulation, a Shapley Additive exPlanations (SHAP) algorithm for discriminant feature selection, and a deep neural network (DNN) as the classifier. RESULTS The model was evaluated using two benchmark datasets, i.e., Full Transcript and Mature mRNA. Deep-m5U achieved overall accuracies of 91.47% and 95.86% for the Full Transcript and Mature mRNA datasets with 10-fold cross-validation, and for independent samples, the model attained 92.94% and 95.17% accuracy. CONCLUSION Compared to existing models, Deep-m5U showed approximately 5.23% and 3.73% higher accuracy on the training data and 3.95% and 3.26% higher accuracy on independent samples for the Full Transcript and Mature mRNA datasets, respectively. The reliability and effectiveness of Deep-m5U make it a valuable tool for scientists and a potential asset in pharmaceutical design and research.
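PseKNC combines plain k-tuple nucleotide frequencies with additional sequence-order correlation terms. The sketch below covers only the plain k-tuple frequency component, as a rough illustration of what "sequence formulation" produces; the correlation terms, the hybrid scheme, and the SHAP-based selection described in the abstract are not reproduced, and the function name and the default `k` are assumptions.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized frequencies of all 4**k RNA k-tuples, in lexicographic ACGU order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown bases
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


vector = ktuple_composition("AAUA", k=2)
```

For `k=2` the result is a 16-dimensional vector whose entries sum to 1 whenever at least one valid dinucleotide is present.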
Affiliation(s)
- Sumaiya Noor
  - Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
- Afshan Naseem
  - Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu, 21030, Kuala Nerus, Terengganu, Malaysia
- Hamid Hussain Awan
  - Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
- Wasiq Aslam
  - Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
- Salman Khan
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Salman A AlQahtani
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Nijad Ahmad
  - Department of Computer Science, Khurasan University, Jalalabad, Afghanistan
4
Chen M, Zou Q, Qi R, Ding Y. PseU-KeMRF: A Novel Method for Identifying RNA Pseudouridine Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1423-1435. [PMID: 38625768 DOI: 10.1109/tcbb.2024.3389094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2024]
Abstract
Pseudouridine is a type of abundant RNA modification that is seen in many different animals and is crucial for a variety of biological functions. Accurately identifying pseudouridine sites within the RNA sequence is vital for the subsequent study of various biological mechanisms of pseudouridine. However, the use of traditional experimental methods faces certain challenges. The development of fast and convenient computational methods is necessary to accurately identify pseudouridine sites from RNA sequence information. To address this, we introduce a novel pseudouridine site prediction model called PseU-KeMRF, which can identify pseudouridine sites in three species, H. sapiens, S. cerevisiae, and M. musculus. Through comprehensive analysis, we selected four RNA coding schemes, including binary feature, position-specific trinucleotide propensity based on single strand (PSTNPss), nucleotide chemical property (NCP) and pseudo k-tuple composition (PseKNC). Then the support vector machine-recursive feature elimination (SVM-RFE) method was used for feature selection and the feature subset was optimized. Finally, the best feature subsets are input into the kernel based on multinomial random forests (KeMRF) classifier for cross-validation and independent testing. As a new classification method, compared with the traditional random forest, KeMRF not only improves the node splitting process of decision tree construction based on multinomial distribution, but also combines the easy to interpret kernel method for prediction, which makes the classification performance better. Our results indicate superior predictive performance of PseU-KeMRF over other existing models, which can prove that PseU-KeMRF is a highly competitive predictive model that can successfully identify pseudouridine sites in RNA sequences.
5
Bortoletto E, Rosani U. Bioinformatics for Inosine: Tools and Approaches to Trace This Elusive RNA Modification. Genes (Basel) 2024; 15:996. [PMID: 39202357 PMCID: PMC11353476 DOI: 10.3390/genes15080996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 07/23/2024] [Accepted: 07/25/2024] [Indexed: 09/03/2024] [Open Access]
Abstract
Inosine is a nucleotide resulting from the deamination of adenosine in RNA. This chemical modification process, known as RNA editing, is typically mediated by a family of double-stranded RNA binding proteins named Adenosine Deaminase Acting on dsRNA (ADAR). While the presence of ADAR orthologs has been traced throughout the evolution of metazoans, the existence and extension of RNA editing have been characterized in a more limited number of animals so far. Undoubtedly, ADAR-mediated RNA editing plays a vital role in physiology, organismal development and disease, making the understanding of the evolutionary conservation of this phenomenon pivotal to a deep characterization of relevant biological processes. However, the lack of direct high-throughput methods to reveal RNA modifications at single nucleotide resolution limited an extended investigation of RNA editing. Nowadays, these methods have been developed, and appropriate bioinformatic pipelines are required to fully exploit this data, which can complement existing approaches to detect ADAR editing. Here, we review the current literature on the "bioinformatics for inosine" subject and we discuss future research avenues in the field.
Affiliation(s)
- Umberto Rosani
  - Department of Biology, University of Padova, 35131 Padova, Italy
6
Wang X, Li P, Wang R, Gao X. PseUpred-ELPSO Is an Ensemble Learning Predictor with Particle Swarm Optimizer for Improving the Prediction of RNA Pseudouridine Sites. Biology 2024; 13:248. [PMID: 38666860 PMCID: PMC11048358 DOI: 10.3390/biology13040248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 03/27/2024] [Accepted: 04/01/2024] [Indexed: 04/28/2024]
Abstract
RNA pseudouridine modification exists in different RNA types of many species, and it has a significant role in regulating the expression of biological processes. To understand the functional mechanisms for RNA pseudouridine sites, the accurate identification of pseudouridine sites in RNA sequences is essential. Although several fast and inexpensive computational methods have been proposed, the challenge of improving recognition accuracy and generalization still exists. This study proposed a novel ensemble predictor called PseUpred-ELPSO for improved RNA pseudouridine site prediction. After analyzing the nucleotide composition preferences between RNA pseudouridine site sequences, two feature representations were determined and fed into the stacking ensemble framework. Then, using five tree-based machine learning classifiers as base classifiers, 30-dimensional RNA profiles are constructed to represent RNA sequences, and using the PSO algorithm, the weights of the RNA profiles were searched to further enhance the representation. A logistic regression classifier was used as a meta-classifier to complete the final predictions. Compared to the most advanced predictors, the performance of PseUpred-ELPSO is superior in both cross-validation and the independent test. Based on the PseUpred-ELPSO predictor, a free and easy-to-operate web server has been established, which will be a powerful tool for pseudouridine site identification.
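The weight search mentioned in this abstract can be pictured with a generic particle swarm optimizer. The sketch below is a textbook global-best PSO written under stated assumptions: the abstract gives neither the objective function nor the swarm settings, so `pso_minimize`, its defaults, and the toy quadratic loss with hypothetical target weights are stand-ins, not the PseUpred-ELPSO procedure.

```python
import random


def pso_minimize(loss, dim, n_particles=30, n_iter=100, seed=0,
                 inertia=0.7, c1=1.5, c2=1.5, lo=0.0, hi=1.0):
    """Global-best PSO: search a weight vector in [lo, hi]**dim that minimizes loss."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each weight inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy objective with hypothetical "ideal" weights; a real loss would be
# something like the validation error of the weighted profiles.
target = [0.2, 0.5, 0.3]


def toy_loss(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


weights, value = pso_minimize(toy_loss, dim=3)
```

In a stacking setting, `loss` would score the meta-classifier on profiles rescaled by the candidate weights, which makes each evaluation far more expensive than this toy function.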
Affiliation(s)
- Xiao Wang
  - School of Computer Science and Technology, Zhengzhou University of Light Industry, No. 136, Science Avenue, Zhengzhou 450002, China
  - Henan Provincial Key Laboratory of Data Intelligence for Food Safety, Zhengzhou University of Light Industry, No. 136, Science Avenue, Zhengzhou 450002, China
- Pengfei Li
  - School of Computer Science and Technology, Zhengzhou University of Light Industry, No. 136, Science Avenue, Zhengzhou 450002, China
- Rong Wang
  - School of Electronic Information, Zhengzhou University of Light Industry, No. 136, Science Avenue, Zhengzhou 450002, China
- Xu Gao
  - National Supercomputing Center in Zhengzhou, School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
7
Chen M, Sun M, Su X, Tiwari P, Ding Y. Fuzzy kernel evidence Random Forest for identifying pseudouridine sites. Brief Bioinform 2024; 25:bbae169. [PMID: 38622357 PMCID: PMC11018548 DOI: 10.1093/bib/bbae169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/27/2024] [Accepted: 03/31/2024] [Indexed: 04/17/2024] [Open Access]
Abstract
Pseudouridine is an RNA modification that is widely distributed in both prokaryotes and eukaryotes, and plays a critical role in numerous biological activities. Despite its importance, the precise identification of pseudouridine sites through experimental approaches poses significant challenges, requiring substantial time and resources. Therefore, there is a growing need for computational techniques that can reliably and quickly identify pseudouridine sites from vast amounts of RNA sequencing data. In this study, we propose fuzzy kernel evidence Random Forest (FKeERF) to identify pseudouridine sites. This method is called PseU-FKeERF, which demonstrates high accuracy in identifying pseudouridine sites from RNA sequencing data. The PseU-FKeERF model selected four RNA feature coding schemes with relatively good performance for feature combination, and then input them into the newly proposed FKeERF method for category prediction. FKeERF not only uses fuzzy logic to expand the original feature space, but also combines kernel methods that are easy to interpret in general for category prediction. Both cross-validation tests and independent tests on benchmark datasets have shown that PseU-FKeERF has better predictive performance than several state-of-the-art methods. This new method not only improves the accuracy of pseudouridine site identification, but also provides a certain reference for disease control and related drug development in the future.
Affiliation(s)
- Mingshuai Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Mingai Sun
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Xi Su
  - Foshan Women and Children Hospital, Foshan 528000, China
- Prayag Tiwari
  - School of Information Technology, Halmstad University, Sweden
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
8
Wang R, Chung CR, Lee TY. Interpretable Multi-Scale Deep Learning for RNA Methylation Analysis across Multiple Species. Int J Mol Sci 2024; 25:2869. [PMID: 38474116 DOI: 10.3390/ijms25052869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 02/19/2024] [Accepted: 02/28/2024] [Indexed: 03/14/2024] [Open Access]
Abstract
RNA modification plays a crucial role in cellular regulation. However, traditional high-throughput sequencing methods for elucidating their functional mechanisms are time-consuming and labor-intensive, despite extensive research. Moreover, existing methods often limit their focus to specific species, neglecting the simultaneous exploration of RNA modifications across diverse species. Therefore, a versatile computational approach is necessary for interpretable analysis of RNA modifications across species. A multi-scale biological language-based deep learning model is proposed for interpretable, sequential-level prediction of diverse RNA modifications. Benchmark comparisons across species demonstrate the model's superiority in predicting various RNA methylation types over current state-of-the-art methods. The cross-species validation and attention weight visualization also highlight the model's capability to capture sequential and functional semantics from genomic backgrounds. Our analysis of RNA modifications helps us find the potential existence of "biological grammars" in each modification type, which could be effective for mapping methylation-related sequential patterns and understanding the underlying biological mechanisms of RNA modifications.
Affiliation(s)
- Rulan Wang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
9
Ghanim GE, Sekne Z, Balch S, van Roon AMM, Nguyen THD. 2.7 Å cryo-EM structure of human telomerase H/ACA ribonucleoprotein. Nat Commun 2024; 15:746. [PMID: 38272871 PMCID: PMC10811338 DOI: 10.1038/s41467-024-45002-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 01/03/2024] [Indexed: 01/27/2024] [Open Access]
Abstract
Telomerase is a ribonucleoprotein (RNP) enzyme that extends telomeric repeats at eukaryotic chromosome ends to counterbalance telomere loss caused by incomplete genome replication. Human telomerase is comprised of two distinct functional lobes tethered by telomerase RNA (hTR): a catalytic core, responsible for DNA extension; and a Hinge and ACA (H/ACA) box RNP, responsible for telomerase biogenesis. H/ACA RNPs also have a general role in pseudouridylation of spliceosomal and ribosomal RNAs, which is critical for the biogenesis of the spliceosome and ribosome. Much of our structural understanding of eukaryotic H/ACA RNPs comes from structures of the human telomerase H/ACA RNP. Here we report a 2.7 Å cryo-electron microscopy structure of the telomerase H/ACA RNP. The significant improvement in resolution over previous 3.3 Å to 8.2 Å structures allows us to uncover new molecular interactions within the H/ACA RNP. Many disease mutations are mapped to these interaction sites. The structure also reveals unprecedented insights into a region critical for pseudouridylation in canonical H/ACA RNPs. Together, our work advances understanding of telomerase-related disease mutations and the mechanism of pseudouridylation by eukaryotic H/ACA RNPs.
Affiliation(s)
- Zala Sekne
  - MRC Laboratory of Molecular Biology, Cambridge, CB2 0QH, UK
10
Song B, Huang D, Zhang Y, Wei Z, Su J, Pedro de Magalhães J, Rigden DJ, Meng J, Chen K. m6A-TSHub: Unveiling the Context-specific m6A Methylation and m6A-affecting Mutations in 23 Human Tissues. Genomics, Proteomics & Bioinformatics 2023; 21:678-694. [PMID: 36096444 PMCID: PMC10787194 DOI: 10.1016/j.gpb.2022.09.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/12/2021] [Revised: 08/19/2022] [Accepted: 09/02/2022] [Indexed: 06/15/2023]
Abstract
As the most pervasive epigenetic marker present on mRNAs and long non-coding RNAs (lncRNAs), N6-methyladenosine (m6A) RNA methylation has been shown to participate in essential biological processes. Recent studies have revealed the distinct patterns of m6A methylome across human tissues, and a major challenge remains in elucidating the tissue-specific presence and circuitry of m6A methylation. We present here a comprehensive online platform, m6A-TSHub, for unveiling the context-specific m6A methylation and genetic mutations that potentially regulate m6A epigenetic mark. m6A-TSHub consists of four core components, including (1) m6A-TSDB, a comprehensive database of 184,554 functionally annotated m6A sites derived from 23 human tissues and 499,369 m6A sites from 25 tumor conditions, respectively; (2) m6A-TSFinder, a web server for high-accuracy prediction of m6A methylation sites within a specific tissue from RNA sequences, which was constructed using multi-instance deep neural networks with gated attention; (3) m6A-TSVar, a web server for assessing the impact of genetic variants on tissue-specific m6A RNA modifications; and (4) m6A-CAVar, a database of 587,983 The Cancer Genome Atlas (TCGA) cancer mutations (derived from 27 cancer types) that were predicted to affect m6A modifications in the primary tissue of cancers. The database should make a useful resource for studying the m6A methylome and the genetic factors of epitranscriptome disturbance in a specific tissue (or cancer type). m6A-TSHub is accessible at www.xjtlu.edu.cn/biologicalsciences/m6ats.
Affiliation(s)
- Bowen Song
  - Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China; Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Daiyun Huang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Department of Computer Science, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Yuxin Zhang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Jionglong Su
  - School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- João Pedro de Magalhães
  - Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Jia Meng
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Kunqi Chen
  - Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
11
Adachi H, Pan Y, He X, Chen JL, Klein B, Platenburg G, Morais P, Boutz P, Yu YT. Targeted pseudouridylation: An approach for suppressing nonsense mutations in disease genes. Mol Cell 2023; 83:637-651.e9. [PMID: 36764303 PMCID: PMC9975048 DOI: 10.1016/j.molcel.2023.01.009] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 11/25/2022] [Revised: 12/18/2022] [Accepted: 01/05/2023] [Indexed: 02/11/2023]
Abstract
Nonsense mutations create premature termination codons (PTCs), activating the nonsense-mediated mRNA decay (NMD) pathway to degrade most PTC-containing mRNAs. The undegraded mRNA is translated, but translation terminates at the PTC, leading to no production of the full-length protein. This work presents targeted PTC pseudouridylation, an approach for nonsense suppression in human cells. Specifically, an artificial box H/ACA guide RNA designed to target the mRNA PTC can suppress both NMD and premature translation termination in various sequence contexts. Targeted pseudouridylation exhibits a level of suppression comparable with that of aminoglycoside antibiotic treatments. When targeted pseudouridylation is combined with antibiotic treatment, a much higher level of suppression is observed. Transfection of a disease model cell line (carrying a chromosomal PTC) with a designer guide RNA gene targeting the PTC also leads to nonsense suppression. Thus, targeted pseudouridylation is an RNA-directed gene-specific approach that suppresses NMD and concurrently promotes PTC readthrough.
Affiliation(s)
- Hironori Adachi
  - Department of Biochemistry and Biophysics, Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
- Yi Pan
  - Department of Biochemistry and Biophysics, Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
- Xueyang He
  - Department of Biochemistry and Biophysics, Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
- Jonathan L Chen
  - Department of Biochemistry and Biophysics, Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
- Bart Klein
  - ProQR Therapeutics, Leiden, the Netherlands
- Paul Boutz
  - Department of Biochemistry and Biophysics, Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA; Center for Biomedical Informatics and Wilmot Cancer Institute, University of Rochester Medical Center, Rochester, NY, USA
- Yi-Tao Yu
  - Department of Biochemistry and Biophysics, Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
12
Zhang X, Wang S, Xie L, Zhu Y. PseU-ST: A new stacked ensemble-learning method for identifying RNA pseudouridine sites. Front Genet 2023; 14:1121694. [PMID: 36741328 PMCID: PMC9892456 DOI: 10.3389/fgene.2023.1121694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 01/09/2023] [Indexed: 01/20/2023] [Open Access]
Abstract
Background: Pseudouridine (Ψ) is one of the most abundant RNA modifications found in a variety of RNA types, and it plays a significant role in many biological processes. The key to studying the various biochemical functions and mechanisms of Ψ is to identify the Ψ sites. However, identifying Ψ sites using experimental methods is time-consuming and expensive. Therefore, it is necessary to develop computational methods that can accurately predict Ψ sites based on RNA sequence information. Methods: In this study, we proposed a new model called PseU-ST to identify Ψ sites in Homo sapiens (H. sapiens), Saccharomyces cerevisiae (S. cerevisiae), and Mus musculus (M. musculus). We selected the best six encoding schemes and four machine learning algorithms based on a comprehensive test of almost all of the RNA sequence encoding schemes available in the iLearnPlus software package, and selected the optimal features for each encoding scheme using chi-square and incremental feature selection algorithms. Then, we selected the optimal feature combination and the best base-classifier combination for each species through an extensive performance comparison and employed a stacking strategy to build the predictive model. Results: The results demonstrated that PseU-ST achieved better prediction performance than other existing models. The PseU-ST accuracy scores were 93.64%, 87.74%, and 89.64% on H_990, S_628, and M_944, respectively, which are 13.94%, 6.05%, and 0.26% higher, respectively, than the best existing methods on the same benchmark training datasets. Conclusion: The data indicate that PseU-ST is a very competitive prediction model for identifying RNA Ψ sites in H. sapiens, M. musculus, and S. cerevisiae. In addition, we found that the Position-specific trinucleotide propensity based on single strand (PSTNPss) and Position-specific of three nucleotides (PS3) features play an important role in Ψ site identification. The source code for PseU-ST and the data are available in our GitHub repository (https://github.com/jluzhangxinrubio/PseU-ST).
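The feature-selection step named in this abstract (chi-square ranking followed by incremental feature selection) can be illustrated with a small sketch. This is not the PseU-ST code: it assumes binary features and binary labels, and the evaluator `loo_1nn_accuracy` is a hypothetical stand-in for whatever cross-validated classifier score a real pipeline would use. All function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def chi_square(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-square statistic of the 2x2 table (binary feature vs. binary label)."""
    n = len(labels)
    obs = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        obs[f][y] += 1
    stat = 0.0
    for i in range(2):
        for j in range(2):
            row = obs[i][0] + obs[i][1]
            col = obs[0][j] + obs[1][j]
            expected = row * col / n
            if expected > 0:  # skip empty rows/columns (constant features)
                stat += (obs[i][j] - expected) ** 2 / expected
    return stat


def rank_features(X: List[List[int]], y: Sequence[int]) -> List[int]:
    """Feature indices sorted by decreasing chi-square score."""
    scores = [chi_square([row[k] for row in X], y) for k in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


def loo_1nn_accuracy(X: List[List[int]], y: Sequence[int]) -> float:
    """Toy evaluator (assumption): leave-one-out 1-nearest-neighbour accuracy."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum(a != b for a, b in zip(xi, X[j])),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def incremental_feature_selection(
    X: List[List[int]],
    y: Sequence[int],
    evaluate: Callable[[List[List[int]], Sequence[int]], float],
) -> Tuple[List[int], float]:
    """Add ranked features one at a time; keep the first prefix with the best score."""
    order = rank_features(X, y)
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(1, len(order) + 1):
        subset = order[:k]
        score = evaluate([[row[i] for i in subset] for row in X], y)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Because only a strict improvement replaces the current best, the sketch prefers the smallest top-ranked prefix among equally scoring ones, which is one common way to break ties in incremental feature selection.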
|
13
|
Guo X, Li F, Song J. Predicting Pseudouridine Sites with Porpoise. Methods Mol Biol 2023; 2624:139-151. [PMID: 36723814 DOI: 10.1007/978-1-0716-2962-8_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Pseudouridine is a ubiquitous RNA modification and plays a crucial role in many biological processes. However, identifying pseudouridine sites through expensive and time-consuming experimental research remains a challenging task. To this end, we present Porpoise, a computational approach to identify pseudouridine sites from RNA sequence data. Porpoise builds on a stacking ensemble learning framework with several informative features and achieves competitive performance compared with state-of-the-art approaches. This protocol elaborates on step-by-step use and execution of the local stand-alone version and the webserver of Porpoise. In addition, we also provide a general machine learning framework that can help identify the optimal stacking ensemble learning model using different combinations of feature-based features. This general machine learning framework can help users build their own pseudouridine predictors using their in-house datasets.
Affiliation(s)
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Yangling, China
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia
- Jiangning Song
  - Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC, Australia
|
14
|
Yao J, Hao C, Chen K, Meng J, Song B. Pseudouridine Identification and Functional Annotation with PIANO. Methods Mol Biol 2023; 2624:153-162. [PMID: 36723815 DOI: 10.1007/978-1-0716-2962-8_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Pseudouridine (Ψ) is the first-discovered RNA modification, abundantly present in many classes of RNAs, and it plays a pivotal role in a series of biological processes. Accurately identifying the location of Ψ sites is helpful for relevant downstream research. In this chapter, we introduce PIANO, a website for pseudouridine (Ψ) site identification and functional annotation, which enables researchers to predict putative human Ψ sites with high accuracy (average AUC of 0.955 under the full transcript model and 0.838 under the mature mRNA model when testing on six independent datasets). The posttranscriptional regulatory mechanisms of putative Ψ sites, including miRNA targets, RBP-binding regions, and splicing sites, were also annotated. A comprehensive query database was also provided to deposit over 4300 human Ψ modifications, which is currently the most complete collection of experimentally derived Ψ sites. The PIANO website is freely accessible at: http://piano.rnamd.com or http://180.208.58.19/Ψ-WHISTLE.
Affiliation(s)
- Jiahui Yao
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
- Cuiyueyue Hao
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
- Kunqi Chen
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
  - AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Bowen Song
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
|
15
|
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most popularly used techniques applied for profiling RNA epitranscriptome, to provide new ideas for growth and development.
Affiliation(s)
- Jia Zou
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Hui Liu
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Wei Tan
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Yi-qi Chen
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Jing Dong
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Shu-yuan Bai
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Zhao-xia Wu
  - Community Health Service Center, Wuchang Hospital, Wuhan, China
- Yan Zeng
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
  - School of Public Health, Wuhan University of Science and Technology, Wuhan, China
  - *Correspondence: Yan Zeng
|
16
|
The role of post-transcriptional modifications during development. Biol Futur 2022:10.1007/s42977-022-00142-3. [PMID: 36481986 DOI: 10.1007/s42977-022-00142-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 11/30/2022] [Indexed: 12/13/2022]
Abstract
While the existence of post-transcriptional modifications of RNA nucleotides has been known for decades, in most RNA species the exact positions of these modifications and their physiological function have been elusive until recently. Technological advances, such as high-throughput next-generation sequencing (NGS) methods and nanopore-based mapping technologies, have made it possible to map the position of these modifications with single nucleotide accuracy, and genetic screens have uncovered the “writer”, “reader” and “eraser” proteins that help to install, interpret and remove such modifications, respectively. These discoveries led to intensive research programmes with the aim of uncovering the roles of these modifications during diverse biological processes. In this review, we assess novel discoveries related to the role of post-transcriptional modifications during animal development, highlighting how these discoveries can affect multiple aspects of development from fertilization to differentiation in many species.
|
17
|
RNADSN: Transfer-Learning 5-Methyluridine (m5U) Modification on mRNAs from Common Features of tRNA. Int J Mol Sci 2022; 23:ijms232113493. [PMID: 36362279 PMCID: PMC9655583 DOI: 10.3390/ijms232113493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 09/24/2022] [Accepted: 09/29/2022] [Indexed: 11/06/2022] Open
Abstract
One of the most abundant non-canonical bases widely occurring on various RNA molecules is 5-methyluridine (m5U). Recent studies have revealed its influences on the development of breast cancer, systemic lupus erythematosus, and the regulation of stress responses. The accurate identification of m5U sites is crucial for understanding their biological functions. We propose RNADSN, the first transfer learning deep neural network that learns common features between tRNA m5U and mRNA m5U to enhance the prediction of mRNA m5U. Without seeing the experimentally detected mRNA m5U sites, RNADSN has already outperformed the state-of-the-art method, m5UPred. Using mRNA m5U classification as an additional layer of supervision, our model achieved another distinct improvement and presented an average area under the receiver operating characteristic curve (AUC) of 0.9422 and an average precision (AP) of 0.7855. The robust performance of RNADSN was also verified by cross-technical and cross-cellular validation. The interpretation of RNADSN also revealed the sequence motif of common features. Therefore, RNADSN should be a useful tool for studying m5U modification.
|
18
|
RNA modifications in aging-associated cardiovascular diseases. Aging (Albany NY) 2022; 14:8110-8136. [PMID: 36178367 PMCID: PMC9596201 DOI: 10.18632/aging.204311] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/07/2022] [Accepted: 09/17/2022] [Indexed: 11/25/2022]
Abstract
Cardiovascular disease (CVD) is a leading cause of morbidity and mortality worldwide that imposes an enormous healthcare burden, and aging is a major contributing factor to CVDs. The functional gene expression network during aging is regulated by mRNAs transcriptionally and by non-coding RNAs epi-transcriptionally. RNA modifications alter the stability and function of both mRNAs and non-coding RNAs and are involved in differentiation, development, and diseases. Here we review major chemical RNA modifications on mRNAs and non-coding RNAs, including N6-adenosine methylation, N1-adenosine methylation, 5-methylcytidine, pseudouridylation, 2′-O-ribose-methylation, and N7-methylguanosine, in the aging process with an emphasis on cardiovascular aging. We also summarize the currently available methods to detect RNA modifications and the bioinformatic tools to study RNA modifications. More importantly, we discuss the specific implications of the RNA modifications on mRNAs and non-coding RNAs in the pathogenesis of aging-associated CVDs, including atherosclerosis, hypertension, coronary heart diseases, congestive heart failure, atrial fibrillation, peripheral artery disease, venous insufficiency, and stroke.
|
19
|
Chen M, Zhang X, Ju Y, Liu Q, Ding Y. iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM. Mathematical Biosciences and Engineering: MBE 2022; 19:13829-13850. [PMID: 36654069 DOI: 10.3934/mbe.2022644] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/17/2023]
Abstract
Biological sequence analysis is an important basic research work in the field of bioinformatics. With the explosive growth of data, machine learning methods play an increasingly important role in biological sequence analysis. By constructing a classifier for prediction, the input sequence feature vector is predicted and evaluated, and the knowledge of gene structure, function and evolution is obtained from a large amount of sequence information, which lays a foundation for researchers to carry out in-depth research. At present, many machine learning methods have been applied to biological sequence analysis such as RNA gene recognition and protein secondary structure prediction. As a biological sequence, RNA plays an important biological role in the encoding, decoding, regulation and expression of genes. The analysis of RNA data is currently carried out from the aspects of structure and function, including secondary structure prediction, non-coding RNA identification and functional site prediction. Pseudouridine (Ψ) is the most widespread and rich RNA modification and has been discovered in a variety of RNAs. It is highly essential for the study of related functional mechanisms and disease diagnosis to accurately identify Ψ sites in RNA sequences. At present, several computational approaches have been suggested as an alternative to experimental methods to detect Ψ sites, but there is still potential for improvement in their performance. In this study, we present a model based on twin support vector machine (TWSVM) for Ψ site identification. The model combines a variety of feature representation techniques and uses the max-relevance and min-redundancy methods to obtain the optimum feature subset for training. The independent testing accuracy is improved by 3.4% in comparison to current advanced Ψ site predictors. The outcomes demonstrate that our model has better generalization performance and improves the accuracy of Ψ site identification. iPseU-TWSVM can be a helpful tool to identify Ψ sites.
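The max-relevance and min-redundancy (mRMR) idea mentioned in this abstract can be sketched as a greedy selection loop. The version below is an illustration only, not the iPseU-TWSVM code: it assumes absolute Pearson correlation as a stand-in for the mutual-information terms of classical mRMR, and the names `pearson` and `mrmr` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def mrmr(columns: List[Sequence[float]], labels: Sequence[float], k: int) -> List[int]:
    """Greedily pick k feature columns, maximising relevance minus mean redundancy.

    Assumption: |Pearson r| approximates both relevance (feature vs. label)
    and redundancy (feature vs. already selected features).
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        def score(j: int) -> float:
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        remaining = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(remaining, key=score))
    return selected
```

In a toy case with two identical informative columns and one weaker but complementary column, the loop picks one of the duplicates first and then the complementary column, because the second duplicate is penalised for being fully redundant.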
Affiliation(s)
- Mingshuai Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Xin Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Qing Liu
  - Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
|
20
|
Liu L, Song B, Chen K, Zhang Y, de Magalhães JP, Rigden DJ, Lei X, Wei Z. WHISTLE server: A high-accuracy genomic coordinate-based machine learning platform for RNA modification prediction. Methods 2022; 203:378-382. [PMID: 34245870 DOI: 10.1016/j.ymeth.2021.07.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/08/2021] [Revised: 06/28/2021] [Accepted: 07/05/2021] [Indexed: 01/12/2023] Open
Abstract
The primary sequences of DNA, RNA and protein have been used as the dominant information source of existing machine learning tools, especially for contexts not fully explored by wet-experimental approaches. Since molecular markers are profoundly orchestrated in the living organisms, those markers that cannot be unambiguously recovered from the primary sequence often help to predict other biological events. To the best of our knowledge, there is no current tool to build and deploy machine learning models that consider genomic evidence. We therefore developed the WHISTLE server, the first machine learning platform based on genomic coordinates. It features convenient covariate extraction and model web deployment with 46 distinct genomic features integrated along with the conventional sequence features. We showed that, when predicting m6A sites from SRAMP project, the model integrating genomic features substantially outperformed those based on only sequence features. The WHISTLE server should be a useful tool for studying biological attributes specifically associated with genomic coordinates, and is freely accessible at: www.xjtlu.edu.cn/biologicalsciences/whi2.
Affiliation(s)
- Lian Liu
  - School of Computer Sciences, Shaanxi Normal University, Xi'an, Shaanxi 710119, China
- Bowen Song
  - Department of Mathematical Sciences, University of Liverpool, L69 7ZB Liverpool, United Kingdom
  - Institute of Ageing & Chronic Disease, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Kunqi Chen
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Yuxin Zhang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- João Pedro de Magalhães
  - Institute of Ageing & Chronic Disease, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Xiujuan Lei
  - School of Computer Sciences, Shaanxi Normal University, Xi'an, Shaanxi 710119, China
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
|
21
|
Hassan D, Acevedo D, Daulatabad SV, Mir Q, Janga SC. Penguin: A Tool for Predicting Pseudouridine Sites in Direct RNA Nanopore Sequencing Data. Methods 2022; 203:478-487. [PMID: 35182749 DOI: 10.1016/j.ymeth.2022.02.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 11/09/2021] [Revised: 02/03/2022] [Accepted: 02/14/2022] [Indexed: 01/04/2023] Open
Abstract
Pseudouridine is one of the most abundant RNA modifications, formed when uridines are converted by pseudouridine synthase proteins. It plays an important role in many biological processes and has been reported to have applications in drug development. Recently, single-molecule sequencing techniques such as the direct RNA sequencing platform offered by Oxford Nanopore Technologies have enabled direct detection of RNA modifications on the molecule being sequenced. In this study, we introduce a tool called Penguin that integrates several machine learning (ML) models to identify RNA pseudouridine sites on Nanopore direct RNA sequencing reads. Pseudouridine sites were identified on single-molecule sequencing data collected from direct RNA sequencing, resulting in 723K reads in Hek293 and 500K reads in Hela cell lines. Penguin extracts a set of features from the raw signal measured by the Oxford Nanopore and the corresponding basecalled k-mer. Those features are used to train the predictors included in Penguin, which, in turn, can predict whether the signal is modified by the presence of pseudouridine sites in the testing phase. We have included various predictors in Penguin, including Support Vector Machine (SVM), Random Forest (RF), and Neural Network (NN). The results on the two benchmark datasets for Hek293 and Hela cell lines show outstanding performance of Penguin in both random split testing and independent validation testing. In random split testing, Penguin identified pseudouridine sites with a high accuracy of 93.38% by applying SVM to the Hek293 benchmark dataset. In independent validation testing, Penguin achieves an accuracy of 92.61% by training SVM with the Hek293 benchmark dataset and testing it on the Hela benchmark dataset. Thus, Penguin achieves 16% higher accuracy than the existing pseudouridine predictors in the literature under independent validation testing. Employing Penguin to predict pseudouridine revealed a significant enrichment of "regulation of mRNA 3'-end processing" in the Hek293 cell line and of "positive regulation of transcription from RNA polymerase II promoter involved in cellular response to chemical stimulus" in the Hela cell line. Penguin software and models are available on GitHub at https://github.com/Janga-Lab/Penguin and can be readily employed for predicting Ψ sites from Nanopore direct RNA-sequencing datasets.
Affiliation(s)
- Doaa Hassan
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 535 West Michigan Street, Indianapolis, Indiana 46202
  - Computers and Systems Department, National Telecommunication Institute, Cairo, Egypt
- Daniel Acevedo
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 535 West Michigan Street, Indianapolis, Indiana 46202
  - Computer Science Department, University of Texas Rio Grande Valley
- Swapna Vidhur Daulatabad
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 535 West Michigan Street, Indianapolis, Indiana 46202
- Quoseena Mir
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 535 West Michigan Street, Indianapolis, Indiana 46202
- Sarath Chandra Janga
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 535 West Michigan Street, Indianapolis, Indiana 46202
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, 975 West Walnut Street, Indianapolis, Indiana, 46202
  - Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), 410 West 10th Street, Indianapolis, Indiana, 46202
|
22
|
Huang S, Zhang W, Katanski CD, Dersh D, Dai Q, Lolans K, Yewdell J, Eren AM, Pan T. Interferon inducible pseudouridine modification in human mRNA by quantitative nanopore profiling. Genome Biol 2021; 22:330. [PMID: 34872593 PMCID: PMC8646010 DOI: 10.1186/s13059-021-02557-y] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 08/09/2021] [Accepted: 11/23/2021] [Indexed: 01/28/2023] Open
Abstract
Pseudouridine (Ψ) is an abundant mRNA modification in mammalian transcriptome, but its functions have remained elusive due to the difficulty of transcriptome-wide mapping. We develop a nanopore native RNA sequencing method for quantitative Ψ prediction (NanoPsu) that utilizes native content training, machine learning modeling, and single-read linkage analysis. Biologically, we find interferon inducible Ψ modifications in interferon-stimulated gene transcripts which are consistent with a role of Ψ in enabling efficacy of mRNA vaccines.
Affiliation(s)
- Sihao Huang
  - Department of Biochemistry & Molecular Biology, University of Chicago, Chicago, IL 60637 USA
- Wen Zhang
  - Department of Biochemistry & Molecular Biology, University of Chicago, Chicago, IL 60637 USA
- Christopher D. Katanski
  - Department of Biochemistry & Molecular Biology, University of Chicago, Chicago, IL 60637 USA
- Devin Dersh
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892 USA
- Qing Dai
  - Department of Chemistry, University of Chicago, Chicago, IL 60637 USA
- Karen Lolans
  - Department of Medicine, University of Chicago, Chicago, IL 60637 USA
- Jonathan Yewdell
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892 USA
- A. Murat Eren
  - Department of Medicine, University of Chicago, Chicago, IL 60637 USA
- Tao Pan
  - Department of Biochemistry & Molecular Biology, University of Chicago, Chicago, IL 60637 USA
|
23
|
Wang X, Lin X, Wang R, Han N, Fan K, Han L, Ding Z. A Feature Fusion Predictor for RNA Pseudouridine Sites with Particle Swarm Optimizer Based Feature Selection and Ensemble Learning Approach. Curr Issues Mol Biol 2021; 43:1844-1858. [PMID: 34889887 PMCID: PMC8929013 DOI: 10.3390/cimb43030129] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/30/2021] [Revised: 10/17/2021] [Accepted: 10/19/2021] [Indexed: 01/28/2023] Open
Abstract
RNA pseudouridine modification is particularly important in a variety of cellular biological and physiological processes. It plays a significant role in understanding RNA functions, RNA structure stabilization, translation processes, etc. To understand its functional mechanisms, it is necessary to accurately identify pseudouridine sites in RNA sequences. Although some computational methods have been proposed for the identification of pseudouridine sites, it is still a challenge to improve the identification accuracy and generalization ability. To address this challenge, a novel feature fusion predictor, named PsoEL-PseU, is proposed for the prediction of pseudouridine sites. Firstly, this study systematically and comprehensively explored different types of feature descriptors and determined six feature descriptors with various properties. To improve the feature representation ability, a binary particle swarm optimizer was used to capture the optimal feature subset for six feature descriptors. Secondly, six individual predictors were trained by using the six optimal feature subsets. Finally, to fuse the effects of all six features, six individual predictors were fused into an ensemble predictor by a parallel fusion strategy. Ten-fold cross-validation on three benchmark datasets indicated that the PsoEL-PseU predictor significantly outperformed the current state-of-the-art predictors. Additionally, the new predictor achieved better accuracy in the independent dataset evaluation (accuracy that is significantly higher than that of its existing counterparts), and the user-friendly webserver developed by the PsoEL-PseU predictor has been made freely accessible.
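The final step described in this abstract, fusing six individual predictors into one ensemble, can be illustrated with a minimal sketch. The abstract does not spell out the fusion rule, so the example assumes simple averaging of per-site probabilities (soft voting) with a 0.5 decision threshold; the function names and the threshold are illustrative assumptions, not the PsoEL-PseU implementation.

```python
from typing import List, Sequence


def fuse_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Average per-site probabilities across predictors.

    probabilities[m][i] is predictor m's estimated probability that
    candidate site i is a pseudouridine site.
    """
    n_models = len(probabilities)
    n_sites = len(probabilities[0])
    return [
        sum(model[i] for model in probabilities) / n_models
        for i in range(n_sites)
    ]


def call_sites(fused: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a site 1 (modified) when its fused score reaches the threshold."""
    return [int(score >= threshold) for score in fused]


# Six hypothetical predictors scoring two candidate sites.
example = [
    [0.9, 0.2],
    [0.8, 0.4],
    [0.6, 0.1],
    [0.7, 0.6],
    [0.4, 0.3],
    [0.8, 0.2],
]
fused_example = fuse_scores(example)
labels_example = call_sites(fused_example)
```

With averaging, a single dissenting predictor (the fifth one on the first site, the fourth on the second) does not flip the ensemble decision, which is the usual motivation for fusing predictors trained on different feature subsets.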
Affiliation(s)
- Xiao Wang
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
  - Corresponding author
- Xi Lin
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Rong Wang
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Nijia Han
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Kaiqi Fan
  - School of Material and Chemical Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Lijun Han
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
- Zhaoyuan Ding
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
|
24
|
Liu M, Li H, Luo X, Cai J, Chen T, Xie Y, Ren J, Zuo Z. RPS: a comprehensive database of RNAs involved in liquid-liquid phase separation. Nucleic Acids Res 2021; 50:D347-D355. [PMID: 34718734 PMCID: PMC8728229 DOI: 10.1093/nar/gkab986] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/15/2021] [Revised: 09/24/2021] [Accepted: 10/09/2021] [Indexed: 12/11/2022] Open
Abstract
Liquid–liquid phase separation (LLPS) is critical for assembling membraneless organelles (MLOs) such as nucleoli, P-bodies, and stress granules, which are involved in various physiological processes and pathological conditions. While the critical role of RNA in the formation and the maintenance of MLOs is increasingly appreciated, there is still a lack of specific resources for LLPS-related RNAs. Here, we presented RPS (http://rps.renlab.org), a comprehensive database of LLPS-related RNAs in 20 distinct biomolecular condensates from eukaryotes and viruses. Currently, RPS contains 21,613 LLPS-related RNAs with three different evidence types, including ‘Reviewed’, ‘High-throughput’ and ‘Predicted’. RPS provides extensive annotations of LLPS-associated RNA properties, including sequence features, RNA structures, RNA–protein/RNA–RNA interactions, and RNA modifications. Moreover, RPS also provides comprehensive disease annotations to help users to explore the relationship between LLPS and disease. The user-friendly web interface of RPS allows users to access the data efficiently. In summary, we believe that RPS will serve as a valuable platform to study the role of RNA in LLPS and further improve our understanding of the biological functions of LLPS.
Affiliation(s)
- Mengni Liu
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Huiqin Li
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Xiaotong Luo
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Jieyi Cai
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Tianjian Chen
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Yubin Xie
  - Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510060, China
- Jian Ren
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Zhixiang Zuo
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
|
25
|
He X, Zhang S, Zhang Y, Lei Z, Jiang T, Zeng J. Characterizing RNA Pseudouridylation by Convolutional Neural Networks. GENOMICS, PROTEOMICS & BIOINFORMATICS 2021; 19:815-833. [PMID: 33631424 PMCID: PMC9170758 DOI: 10.1016/j.gpb.2019.11.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Revised: 09/15/2019] [Accepted: 11/13/2019] [Indexed: 12/12/2022]
Abstract
Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) still remain largely unclear. The landscape of Ψs across the transcriptome has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PseudoUridyLation Site Estimator (PULSE), to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, consisting of two alternatively-stacked convolution and pooling layers followed by a fully-connected neural network, can automatically learn the hidden patterns of pseudouridylation from the local sequence information. Extensive validation tests demonstrated that PULSE can outperform other state-of-the-art prediction methods and achieve high prediction accuracy, thus enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the prediction results derived from PULSE can provide novel insights into understanding the functional roles of pseudouridylation, such as the regulations of RNA secondary structure, codon usage, translation, and RNA stability, and the connection to single nucleotide variants. The source code and final model for PULSE are available at https://github.com/mlcb-thu/PULSE.
Affiliation(s)
- Xuan He
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Sai Zhang
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Yanqing Zhang
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Zhixin Lei
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China; Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Tao Jiang
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA; MOE Key Lab of Bioinformatics and Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Institute of Integrative Genome Biology, University of California, Riverside, CA 92521, USA
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
|
26
|
El Allali A, Elhamraoui Z, Daoud R. Machine learning applications in RNA modification sites prediction. Comput Struct Biotechnol J 2021; 19:5510-5524. [PMID: 34712397 PMCID: PMC8517552 DOI: 10.1016/j.csbj.2021.09.025] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2021] [Revised: 09/24/2021] [Accepted: 09/25/2021] [Indexed: 12/15/2022] Open
Abstract
Ribonucleic acid (RNA) modifications are post-transcriptional chemical composition changes that have a fundamental role in regulating the main aspect of RNA function. Recently, large datasets have become available thanks to the recent development in deep sequencing and large-scale profiling. This availability of transcriptomic datasets has led to increased use of machine learning based approaches in epitranscriptomics, particularly in identifying RNA modifications. In this review, we comprehensively explore machine learning based approaches used for the prediction of 11 RNA modification types, namely, m1A, m6A, m5C, 5hmC, ψ, 2'-O-Me, ac4C, m7G, A-to-I, m2G, and D. This review covers the life cycle of machine learning methods to predict RNA modification sites including available benchmark datasets, feature extraction, and classification algorithms. We compare available methods in terms of datasets, target species, approach, and accuracy for each RNA modification type. Finally, we discuss the advantages and limitations of the reviewed approaches and suggest future perspectives.
Affiliation(s)
- A. El Allali
  - African Genome Center, University Mohamed VI Polytechnic, Morocco
- Zahra Elhamraoui
  - African Genome Center, University Mohamed VI Polytechnic, Morocco
- Rachid Daoud
  - African Genome Center, University Mohamed VI Polytechnic, Morocco
|
27
|
Li F, Guo X, Jin P, Chen J, Xiang D, Song J, Coin LJM. Porpoise: a new approach for accurate prediction of RNA pseudouridine sites. Brief Bioinform 2021; 22:6314697. [PMID: 34226915 DOI: 10.1093/bib/bbab245] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2021] [Revised: 05/19/2021] [Accepted: 06/08/2021] [Indexed: 12/14/2022] Open
Abstract
Pseudouridine is a ubiquitous RNA modification type present in eukaryotes and prokaryotes, which plays a vital role in various biological processes. Almost all kinds of RNAs are subject to this modification. However, it remains a great challenge to identify pseudouridine sites via experimental approaches, requiring expensive and time-consuming experimental research. Therefore, computational approaches that can be used to perform accurate in silico identification of pseudouridine sites from the large amount of RNA sequence data are highly desirable and can aid in the functional elucidation of this critical modification. Here, we propose a new computational approach, termed Porpoise, to accurately identify pseudouridine sites from RNA sequence data. Porpoise builds upon a comprehensive evaluation of 18 frequently used feature encoding schemes based on the selection of four types of features, including binary features, pseudo k-tuple composition, nucleotide chemical property and position-specific trinucleotide propensity based on single-strand (PSTNPss). The selected features are fed into the stacked ensemble learning framework to enable the construction of an effective stacked model. Both cross-validation tests on the benchmark dataset and independent tests show that Porpoise achieves superior predictive performance than several state-of-the-art approaches. The application of model interpretation tools demonstrates the importance of PSTNPs for the performance of the trained models. This new method is anticipated to facilitate community-wide efforts to identify putative pseudouridine sites and formulate novel testable biological hypothesis.
Affiliation(s)
- Fuyi Li
  - Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, the University of Melbourne, Australia
- Peipei Jin
  - Department of Clinical Laboratory of Ruijin Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Dongxu Xiang
  - Faculty of Engineering and Information Technology, The University of Melbourne, Australia
- Jiangning Song
  - Monash Biomedicine Discovery Institute, Monash University, Australia
- Lachlan J M Coin
  - Department of Microbiology and Immunology at the University of Melbourne, Australia
|
28
|
Song Z, Huang D, Song B, Chen K, Song Y, Liu G, Su J, Magalhães JPD, Rigden DJ, Meng J. Attention-based multi-label neural networks for integrated prediction and interpretation of twelve widely occurring RNA modifications. Nat Commun 2021; 12:4011. [PMID: 34188054 PMCID: PMC8242015 DOI: 10.1038/s41467-021-24313-3] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2020] [Accepted: 06/07/2021] [Indexed: 02/08/2023] Open
Abstract
Recent studies suggest that epi-transcriptome regulation via post-transcriptional RNA modifications is vital for all RNA types. Precise identification of RNA modification sites is essential for understanding the functions and regulatory mechanisms of RNAs. Here, we present MultiRM, a method for the integrated prediction and interpretation of post-transcriptional RNA modifications from RNA sequences. Built upon an attention-based multi-label deep learning framework, MultiRM not only simultaneously predicts the putative sites of twelve widely occurring transcriptome modifications (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um), but also returns the key sequence contents that contribute most to the positive predictions. Importantly, our model revealed a strong association among different types of RNA modifications from the perspective of their associated sequence contexts. Our work provides a solution for detecting multiple RNA modifications, enabling an integrated analysis of these RNA modifications, and gaining a better understanding of sequence-based RNA modification mechanisms.
Affiliation(s)
- Zitao Song
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, PR China
- Daiyun Huang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, PR China
  - Department of Computer Sciences, University of Liverpool, Liverpool, United Kingdom
- Bowen Song
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, PR China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Kunqi Chen
  - Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, PR China
- Yiyou Song
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, PR China
- Gang Liu
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, PR China
- Jionglong Su
  - School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, PR China
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, PR China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
  - AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou, PR China
|
29
|
Schauerte M, Pozhydaieva N, Höfer K. Shaping the Bacterial Epitranscriptome-5'-Terminal and Internal RNA Modifications. Adv Biol (Weinh) 2021; 5:e2100834. [PMID: 34121369 DOI: 10.1002/adbi.202100834] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2021] [Revised: 05/07/2021] [Indexed: 11/11/2022]
Abstract
All domains of life utilize a diverse set of modified ribonucleotides that can impact the sequence, structure, function, stability, and the fate of RNAs, as well as their interactions with other molecules. Today, more than 160 different RNA modifications are known that decorate the RNA at the 5'-terminus or internal RNA positions. The boost of next-generation sequencing technologies sets the foundation to identify and study the functional role of RNA modifications. The recent advances in the field of RNA modifications reveal a novel regulatory layer between RNA modifications and proteins, which is central to developing a novel concept called "epitranscriptomics." The majority of RNA modifications studies focus on the eukaryotic epitranscriptome. In contrast, RNA modifications in prokaryotes are poorly characterized. This review outlines the current knowledge of the prokaryotic epitranscriptome focusing on mRNA modifications. Here, it is described that several internal and 5'-terminal RNA modifications either present or likely present in prokaryotic mRNA. Thereby, the individual techniques to identify these epitranscriptomic modifications, their writers, readers and erasers, and their proposed functions are explored. Besides that, still unanswered questions in the field of prokaryotic epitranscriptomics are pointed out, and its future perspectives in the dawn of next-generation sequencing technologies are outlined.
Affiliation(s)
- Maik Schauerte
  - Max-Planck-Institute for terrestrial Microbiology, Marburg, Hessen, 35043, Germany
- Nadiia Pozhydaieva
  - Max-Planck-Institute for terrestrial Microbiology, Marburg, Hessen, 35043, Germany
- Katharina Höfer
  - Max-Planck-Institute for terrestrial Microbiology, Marburg, Hessen, 35043, Germany
|
30
|
Zhang SY, Zhang SW, Zhang T, Fan XN, Meng J. Recent advances in functional annotation and prediction of the epitranscriptome. Comput Struct Biotechnol J 2021; 19:3015-3026. [PMID: 34136099 PMCID: PMC8175281 DOI: 10.1016/j.csbj.2021.05.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Revised: 05/16/2021] [Accepted: 05/18/2021] [Indexed: 12/17/2022] Open
Abstract
RNA modifications, in particular N6-methyladenosine (m6A), participate in every stages of RNA metabolism and play diverse roles in essential biological processes and disease pathogenesis. Thanks to the advances in sequencing technology, tens of thousands of RNA modification sites can be identified in a typical high-throughput experiment; however, it remains a major challenge to decipher the functional relevance of these sites, such as, affecting alternative splicing, regulation circuit in essential biological processes or association to diseases. As the focus of RNA epigenetics gradually shifts from site discovery to functional studies, we review here recent progress in functional annotation and prediction of RNA modification sites from a bioinformatics perspective. The review covers naïve annotation with associated biological events, e.g., single nucleotide polymorphism (SNP), RNA binding protein (RBP) and alternative splicing, prediction of key sites and their regulatory functions, inference of disease association, and mining the diagnosis and prognosis value of RNA modification regulators. We further discussed the limitations of existing approaches and some future perspectives.
Affiliation(s)
- Song-Yao Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Teng Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Xiao-Nan Fan
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
|
31
|
Epigenetics: Roles and therapeutic implications of non-coding RNA modifications in human cancers. MOLECULAR THERAPY. NUCLEIC ACIDS 2021; 25:67-82. [PMID: 34188972 PMCID: PMC8217334 DOI: 10.1016/j.omtn.2021.04.021] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
As next-generation sequencing (NGS) is leaping forward, more than 160 covalent RNA modification processes have been reported, and they are widely present in every organism and overall RNA type. Many modification processes of RNA introduce a new layer to the gene regulation process, resulting in novel RNA epigenetics. The commonest RNA modification includes pseudouridine (Ψ), N7-methylguanosine (m7G), 5-hydroxymethylcytosine (hm5C), 5-methylcytosine (m5C), N1-methyladenosine (m1A), N6-methyladenosine (m6A), and others. In this study, we focus on non-coding RNAs (ncRNAs) to summarize the epigenetic consequences of RNA modifications, and the pathogenesis of cancer, as diagnostic markers and therapeutic targets for cancer, as well as the mechanisms affecting the immune environment of cancer. In addition, we summarize the current status of epigenetic drugs for tumor therapy based on ncRNA modifications and the progress of bioinformatics methods in elucidating RNA modifications in recent years.
|
32
|
Aziz AZB, Hasan MAM, Shin J. Identification of RNA pseudouridine sites using deep learning approaches. PLoS One 2021; 16:e0247511. [PMID: 33621235 PMCID: PMC7901771 DOI: 10.1371/journal.pone.0247511] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Accepted: 02/08/2021] [Indexed: 01/05/2023] Open
Abstract
Pseudouridine(Ψ) is widely popular among various RNA modifications which have been confirmed to occur in rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, identifying them has vital significance in academic research, drug development and gene therapies. Several laboratory techniques for Ψ identification have been introduced over the years. Although these techniques produce satisfactory results, they are costly, time-consuming and requires skilled experience. As the lengths of RNA sequences are getting longer day by day, an efficient method for identifying pseudouridine sites using computational approaches is very important. In this paper, we proposed a multi-channel convolution neural network using binary encoding. We employed k-fold cross-validation and grid search to tune the hyperparameters. We evaluated its performance in the independent datasets and found promising results. The results proved that our method can be used to identify pseudouridine sites for associated purposes. We have also implemented an easily accessible web server at http://103.99.176.239/ipseumulticnn/.
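As an illustration of the binary encoding mentioned in the abstract above, the following is a minimal sketch of one-hot encoding an RNA window. The function name `one_hot_encode`, the nucleotide order `ACGU`, the all-zero row for unknown symbols, and the 7-nt example window are assumptions made for this sketch and are not taken from the paper.

```python
# Hypothetical helper (not from the paper): one-hot (binary) encoding
# of an RNA sequence window, one 4-element row per nucleotide.
NUCLEOTIDES = "ACGU"  # assumed column order


def one_hot_encode(seq):
    """Return a list of 0/1 rows, one row per nucleotide in seq."""
    rows = []
    # DNA-style input is tolerated by mapping T to U.
    for base in seq.upper().replace("T", "U"):
        # Symbols outside A/C/G/U (e.g. N) become an all-zero row.
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return rows


# Illustrative 7-nt window whose centre (index 3) is a candidate uridine.
window = "GACUAGC"
encoded = one_hot_encode(window)
```

A matrix of this shape (window length by 4) is the kind of input a convolutional network can consume as a single channel; how the paper arranges its multiple channels is not specified in the abstract.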
Affiliation(s)
- Abu Zahid Bin Aziz
  - Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
- Md. Al Mehedi Hasan
  - Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
- Jungpil Shin
  - School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu, Japan
|
33
|
Zhuang J, Liu D, Lin M, Qiu W, Liu J, Chen S. PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm. Front Genet 2021; 12:773882. [PMID: 34868261 PMCID: PMC8637112 DOI: 10.3389/fgene.2021.773882] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2021] [Accepted: 10/04/2021] [Indexed: 11/16/2022] Open
Abstract
Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. The identification of Ψ modification sites is of great significance for disease mechanism and biological processes research in which machine learning algorithms are desirable as the lab exploratory techniques are expensive and time-consuming. Results: In this work, we propose a deep learning framework, called PseUdeep, to identify Ψ sites of three species: H. sapiens, S. cerevisiae, and M. musculus. In this method, three encoding methods are used to extract the features of RNA sequences, that is, one-hot encoding, K-tuple nucleotide frequency pattern, and position-specific nucleotide composition. The three feature matrices are convoluted twice and fed into the capsule neural network and bidirectional gated recurrent unit network with a self-attention mechanism for classification. Conclusion: Compared with other state-of-the-art methods, our model gets the highest accuracy of the prediction on the independent testing data set S-200; the accuracy improves 12.38%, and on the independent testing data set H-200, the accuracy improves 0.68%. Moreover, the dimensions of the features we derive from the RNA sequences are only 109, 109, and 119 in H. sapiens, M. musculus, and S. cerevisiae, which is much smaller than those used in the traditional algorithms. On evaluation via tenfold cross-validation and two independent testing data sets, PseUdeep outperforms the best traditional machine learning model available. PseUdeep source code and data sets are available at https://github.com/dan111262/PseUdeep.
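To make the idea of a K-tuple nucleotide frequency feature concrete, here is a generic sketch that counts overlapping k-mers in a sequence and normalises the counts. It is not PseUdeep's exact feature definition, which the abstract does not give; the function name `kmer_frequencies` and the default `k=2` are assumptions for illustration only.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of every overlapping k-mer over the A/C/G/U alphabet.

    Generic sketch, not the PseUdeep implementation.
    """
    seq = seq.upper().replace("T", "U")
    # Enumerate all 4**k possible k-mers so the feature vector has a fixed size.
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-ACGU symbols
            counts[kmer] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return {m: (c / total if total else 0.0) for m, c in counts.items()}


freqs = kmer_frequencies("AUAU", k=2)  # overlapping 2-mers: AU, UA, AU
```

Enumerating all possible k-mers up front keeps the vector length fixed at 4**k regardless of the input, which is what a downstream classifier needs.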
Affiliation(s)
- Jujuan Zhuang
  - College of Science, Dalian Maritime University, Dalian, China
- Danyang Liu
  - College of Science, Dalian Maritime University, Dalian, China
- Meng Lin
  - College of Science, Dalian Maritime University, Dalian, China
- Wenjing Qiu
  - Electrical and Information Engineering, Anhui University of Technology, Anhui, China
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Size Chen
  - Department of Oncology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
  - Guangdong Provincial Engineering Research Center for Esophageal Cancer Precise Therapy, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
  - Central Laboratory, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
  - *Correspondence: Size Chen,
|
34
|
Ao C, Yu L, Zou Q. Prediction of bio-sequence modifications and the associations with diseases. Brief Funct Genomics 2020; 20:1-18. [PMID: 33313647 DOI: 10.1093/bfgp/elaa023] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2020] [Revised: 11/09/2020] [Accepted: 11/10/2020] [Indexed: 12/22/2022] Open
Abstract
Modifications of protein, RNA and DNA play an important role in many biological processes and are related to some diseases. Therefore, accurate identification and comprehensive understanding of protein, RNA and DNA modification sites can promote research on disease treatment and prevention. With the development of sequencing technology, the number of known sequences has continued to increase. In the past decade, many computational tools that can be used to predict protein, RNA and DNA modification sites have been developed. In this review, we comprehensively summarized the modification site predictors for three different biological sequences and the association with diseases. The relevant web server is accessible at http://lab.malab.cn/~acy/PTM_data/; some sample data on protein, RNA and DNA modification can be downloaded from that website.
|
35
|
Jiang J, Song B, Tang Y, Chen K, Wei Z, Meng J. m5UPred: A Web Server for the Prediction of RNA 5-Methyluridine Sites from Sequences. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 22:742-747. [PMID: 33230471 PMCID: PMC7595847 DOI: 10.1016/j.omtn.2020.09.031] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/07/2020] [Accepted: 09/25/2020] [Indexed: 11/16/2022]
Abstract
As one of the widely occurring RNA modifications, 5-methyluridine (m5U) has recently been shown to play critical roles in various biological functions and disease pathogenesis, such as under stress response and during breast cancer development. Precise identification of m5U sites on RNA is vital for the understanding of the regulatory mechanisms of RNA life. We present here m5UPred, the first web server for in silico identification of m5U sites from the primary sequences of RNA. Built upon the support vector machine (SVM) algorithm and the biochemical encoding scheme, m5UPred achieved reasonable prediction performance with the area under the receiver operating characteristic curve (AUC) greater than 0.954 by 5-fold cross-validation and independent testing datasets. To critically test and validate the performance of our newly proposed predictor, the experimentally validated m5U sites were further separated by high-throughput sequencing techniques (miCLIP-Seq and FICC-Seq) and cell types (HEK293 and HAP1). When tested on cross-technique and cross-cell-type validation using independent datasets, m5UPred achieved an average AUC of 0.922 and 0.926 under mature mRNA mode, respectively, showing reasonable accuracy and reliability. The m5UPred web server is freely accessible now and it should make a useful tool for the researchers who are interested in m5U RNA modification.
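For readers unfamiliar with the AUC metric reported above, the sketch below computes it from its rank interpretation: the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative site, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not m5UPred data or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels; scores: predicted scores, same length.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): three of the four positive/negative pairs
# are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The quadratic pairwise loop is chosen for clarity; on large datasets a sort-based computation or an established library routine would be the practical choice.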
Affiliation(s)
- Jie Jiang
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX, Liverpool, UK
- Bowen Song
  - Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX, Liverpool, UK
- Yujiao Tang
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX, Liverpool, UK
- Kunqi Chen
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, UK
- Zhen Wei
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX, Liverpool, UK
- Jia Meng
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX, Liverpool, UK
|
36
|
Liu K, Chen W. iMRM: a platform for simultaneously identifying multiple kinds of RNA modifications. Bioinformatics 2020; 36:3336-3342. [PMID: 32134472 DOI: 10.1093/bioinformatics/btaa155] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2020] [Revised: 02/26/2020] [Accepted: 02/28/2020] [Indexed: 11/13/2022] Open
Abstract
Motivation: RNA modifications play critical roles in a series of cellular and developmental processes. Knowledge about the distributions of RNA modifications in the transcriptomes will provide clues to revealing their functions. Since experimental methods are time consuming and laborious for detecting RNA modifications, computational methods have been proposed for this aim in the past five years. However, there are some drawbacks for both experimental and computational methods in simultaneously identifying modifications occurred on different nucleotides. Results: To address such a challenge, in this article, we developed a new predictor called iMRM, which is able to simultaneously identify m6A, m5C, m1A, ψ and A-to-I modifications in Homo sapiens, Mus musculus and Saccharomyces cerevisiae. In iMRM, the feature selection technique was used to pick out the optimal features. The results from both 10-fold cross-validation and jackknife test demonstrated that the performance of iMRM is superior to existing methods for identifying RNA modifications. Availability and implementation: A user-friendly web server for iMRM was established at http://www.bioml.cn/XG_iRNA/home. The off-line command-line version is available at https://github.com/liukeweiaway/iMRM. Contact: greatchen@ncst.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kewei Liu
  - School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China
- Wei Chen
  - School of Life Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
|
37
|
Khan SM, He F, Wang D, Chen Y, Xu D. MU-PseUDeep: A deep learning method for prediction of pseudouridine sites. Comput Struct Biotechnol J 2020; 18:1877-1883. [PMID: 32774783 PMCID: PMC7387732 DOI: 10.1016/j.csbj.2020.07.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2020] [Revised: 07/09/2020] [Accepted: 07/10/2020] [Indexed: 01/18/2023] Open
Abstract
Pseudouridine synthase binds to uridine sites and catalyzes the conversion of uridine to pseudouridine (Ψ). This binding takes place in a specific context and in the conformation of nucleotides. Most machine-learning methods for Ψ site classification use nucleotide frequency as a feature, which may not fully depict the relevant conformation around a Ψ site. Using the power of deep learning and raw sequence, as well as secondary structure features, our tool MU-PseUDeep is designed to capture both the sequence and secondary structure context, which inputs the raw RNA sequence and the predicted secondary structure to two sets of convolutional neural networks. It has shown considerable improvement in Ψ site prediction over existing tools, XG-PseU, PseUI, and iRNA-PseU for both balanced and imbalanced datasets. To the best of our knowledge, this is the most accurate tool for Ψ site prediction. We also used MU-PseUDeep to scan the human transcriptome, which shows that the genes with predicted Ψ sites are enriched in nucleotide and protein binding, as well as in neurodegeneration pathways. The tool is open source, available at https://github.com/smk5g5/MU-PseUDeep.
Collapse
Affiliation(s)
- Saad M. Khan
- Informatics Institute, University of Missouri, Columbia, MO 65211, United States
| | - Fei He
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, United States
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Duolin Wang
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, United States
| | - Yongbing Chen
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Dong Xu
- Informatics Institute, University of Missouri, Columbia, MO 65211, United States
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, United States
- Corresponding author.
|
38
|
Mathlin J, Le Pera L, Colombo T. A Census and Categorization Method of Epitranscriptomic Marks. Int J Mol Sci 2020; 21:ijms21134684. [PMID: 32630140 PMCID: PMC7370119 DOI: 10.3390/ijms21134684] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/17/2020] [Revised: 06/26/2020] [Accepted: 06/27/2020] [Indexed: 12/21/2022] Open
Abstract
In the past few years, thorough investigation of chemical modifications operated in the cells on ribonucleic acid (RNA) molecules is gaining momentum. This new field of research has been dubbed “epitranscriptomics”, in analogy to best-known epigenomics, to stress the potential of ensembles of RNA modifications to constitute a post-transcriptional regulatory layer of gene expression orchestrated by writer, reader, and eraser RNA-binding proteins (RBPs). In fact, epitranscriptomics aims at identifying and characterizing all functionally relevant changes involving both non-substitutional chemical modifications and editing events made to the transcriptome. Indeed, several types of RNA modifications that impact gene expression have been reported so far in different species of cellular RNAs, including ribosomal RNAs, transfer RNAs, small nuclear RNAs, messenger RNAs, and long non-coding RNAs. Supporting functional relevance of this largely unknown regulatory mechanism, several human diseases have been associated directly to RNA modifications or to RBPs that may play as effectors of epitranscriptomic marks. However, an exhaustive epitranscriptome’s characterization, aimed to systematically classify all RNA modifications and clarify rules, actors, and outcomes of this promising regulatory code, is currently not available, mainly hampered by lack of suitable detecting technologies. This is an unfortunate limitation that, thanks to an unprecedented pace of technological advancements especially in the sequencing technology field, is likely to be overcome soon. Here, we review the current knowledge on epitranscriptomic marks and propose a categorization method based on the reference ribonucleotide and its rounds of modifications (“stages”) until reaching the given modified form. We believe that this classification scheme can be useful to coherently organize the expanding number of discovered RNA modifications.
Affiliation(s)
- Julia Mathlin
- Department of Life Sciences and Medicine, University of Luxembourg, L-4367 Belvaux, Luxembourg
- Correspondence: (J.M.); (L.L.P.); Tel.: +39-06-4991-0556 (L.L.P.)
| | - Loredana Le Pera
- CNR-Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), 70126 Bari, Italy
- CNR-Institute of Molecular Biology and Pathology (IBPM), 00185 Rome, Italy;
- Correspondence: (J.M.); (L.L.P.); Tel.: +39-06-4991-0556 (L.L.P.)
| | - Teresa Colombo
- CNR-Institute of Molecular Biology and Pathology (IBPM), 00185 Rome, Italy;
|
39
|
Liu L, Song B, Ma J, Song Y, Zhang SY, Tang Y, Wu X, Wei Z, Chen K, Su J, Rong R, Lu Z, de Magalhães JP, Rigden DJ, Zhang L, Zhang SW, Huang Y, Lei X, Liu H, Meng J. Bioinformatics approaches for deciphering the epitranscriptome: Recent progress and emerging topics. Comput Struct Biotechnol J 2020; 18:1587-1604. [PMID: 32670500 PMCID: PMC7334300 DOI: 10.1016/j.csbj.2020.06.010] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/08/2020] [Revised: 06/02/2020] [Accepted: 06/07/2020] [Indexed: 12/13/2022] Open
Abstract
Post-transcriptional RNA modification occurs on all types of RNA and plays a vital role in regulating every aspect of RNA function. Thanks to the development of high-throughput sequencing technologies, transcriptome-wide profiling of RNA modifications has been made possible. With the accumulation of a large number of high-throughput datasets, bioinformatics approaches have become increasingly critical for unraveling the epitranscriptome. We review here the recent progress in bioinformatics approaches for deciphering the epitranscriptome, including epitranscriptome data analysis techniques, RNA modification databases, disease-association inference, general functional annotation, and studies on RNA modification site prediction. We also discuss the limitations of existing approaches and offer some future perspectives.
Affiliation(s)
- Lian Liu
- School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi 710119, China
| | - Bowen Song
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Jiani Ma
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
| | - Yi Song
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Song-Yao Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China
| | - Yujiao Tang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Xiangyu Wu
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
| | - Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
| | - Kunqi Chen
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
| | - Jionglong Su
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
| | - Rong Rong
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Zhiliang Lu
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - João Pedro de Magalhães
- Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX, Liverpool, United Kingdom
| | - Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
| | - Shao-Wu Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
| | - Yufei Huang
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, 78249, USA
- Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
| | - Xiujuan Lei
- School of Computer Sciences, Shaanxi Normal University, Xi’an, Shaanxi 710119, China
| | - Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
| | - Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China
- AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
|
40
|
Abstract
Background: Pseudouridine (Ψ) is the most abundant RNA modification and has important functions in a series of biological and cellular processes. Although experimental techniques have contributed greatly to the identification of Ψ sites, they remain labor-intensive and cost-ineffective. In the past few years, a series of computational approaches have been developed that provide rapid and efficient ways to identify Ψ sites. Results: To give readers a clear picture of recent developments in this important area, this review summarizes and compares the representative computational approaches developed for identifying Ψ sites. Future directions in computationally identifying Ψ sites are discussed as well. Conclusion: We anticipate that this review will provide novel insights into research on pseudouridine modification.
Affiliation(s)
- Wei Chen
- School of Life Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063210, China
| | - Kewei Liu
- School of Life Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063210, China
|
41
|
Song B, Chen K, Tang Y, Ma J, Meng J, Wei Z. PSI-MOUSE: Predicting Mouse Pseudouridine Sites From Sequence and Genome-Derived Features. Evol Bioinform Online 2020; 16:1176934320925752. [PMID: 32565674 PMCID: PMC7285933 DOI: 10.1177/1176934320925752] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/22/2020] [Accepted: 03/30/2020] [Indexed: 12/04/2022] Open
Abstract
Pseudouridine (Ψ) is the first discovered and the most prevalent posttranscriptional modification, which has been widely studied during the past decades. Pseudouridine was observed in almost all kinds of RNAs and shown to have important biological functions. Currently, the time-consuming and high-cost procedures of experimental approaches limit its uses in real-life Ψ site detection. Alternatively, by taking advantage of the explosive growth of Ψ sequencing data, the computational methods may provide a more cost-effective avenue. To date, the existing mouse Ψ site predictors were all developed based on sequence-derived features, and their performance can be further improved by adding the domain knowledge derived feature. Therefore, it is highly desirable to propose a genomic feature-based computational method to increase the accuracy and efficiency of the identification of Ψ RNA modification in the mouse transcriptome. In our study, a predictive framework PSI-MOUSE was built. Besides the conventional sequence-based features, PSI-MOUSE first introduced 38 additional genomic features derived from the mouse genome, which achieved a satisfactory improvement in the prediction performance, compared with other existing models. Moreover, PSI-MOUSE also features in automatically annotating the putative Ψ sites with diverse types of posttranscriptional regulations (RNA-binding protein [RBP]-binding regions, miRNA-RNA interactions, and splicing sites), which can serve as a useful research tool for the study of Ψ RNA modification in the mouse genome. Finally, 3282 experimentally validated mouse Ψ sites were also collected in a database with customized query functions. For the convenience of academic users, a website was built to provide a user-friendly interface for the query and analysis on the database. The website is freely accessible at www.xjtlu.edu.cn/biologicalsciences/psimouse and http://psimouse.rnamd.com. 
We introduced genome-derived features for mouse for the first time and achieved good performance in mouse Ψ site prediction. Compared with the existing state-of-the-art methods, our newly developed approach PSI-MOUSE obtained a substantial improvement in prediction accuracy, demonstrating that genomic features contribute reliably to the prediction of RNA modifications in a species other than human.
Affiliation(s)
- Bowen Song
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
| | - Kunqi Chen
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
| | - Yujiao Tang
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
| | - Jialin Ma
- Cancer Genome Computational Analysis, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jia Meng
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
| | - Zhen Wei
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
|
42
|
iPseU-Layer: Identifying RNA Pseudouridine Sites Using Layered Ensemble Model. Interdiscip Sci 2020; 12:193-203. [PMID: 32170573 DOI: 10.1007/s12539-020-00362-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/09/2019] [Revised: 02/16/2020] [Accepted: 02/19/2020] [Indexed: 01/28/2023]
Abstract
Pseudouridine represents one of the most prevalent post-transcriptional RNA modifications. The identification of pseudouridine sites is an essential step toward understanding RNA functions, RNA structure stabilization, the translation process, and RNA stability; however, high-throughput experimental techniques remain expensive and time-consuming in lab explorations and biochemical processes. Thus, developing an efficient pseudouridine site identification method based on machine learning is very important for both academic research and drug development. Motivated by this, we present an effective layered ensemble model designated as iPseU-Layer for identification of RNA pseudouridine sites. The proposed iPseU-Layer approach is essentially based on three different machine learning layers: a feature selection layer, a feature extraction and fusion layer, and a prediction layer. The feature selection layer reduces the dimensionality, which can be regarded as a data pre-processing stage. The feature extraction and fusion layer utilizes an ensemble method, implemented through various machine learning algorithms, to generate intermediate outputs. The prediction layer applies a classic random forest to produce the final results. Furthermore, we systematically conduct validation experiments using cross-validation tests and an independent test against the current state-of-the-art models. The proposed iPseU-Layer provides a promising predictive performance in terms of sensitivity, specificity, accuracy and Matthews correlation coefficient. Collectively, these findings indicate that the framework of iPseU-Layer is a feasible and effective strategy for the prediction of RNA pseudouridine sites.
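The four evaluation measures named in this abstract (sensitivity, specificity, accuracy and Matthews correlation coefficient) can all be derived from a binary confusion matrix. The sketch below is only an illustration of the standard formulas, not code from iPseU-Layer; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made here.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return Sn, Sp, Acc and MCC from confusion-matrix counts.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives. Undefined ratios
    (zero denominator) are reported as 0.0 by assumption.
    """
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why pseudouridine predictors commonly report it alongside the other three.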
|
43
|
Song B, Tang Y, Wei Z, Liu G, Su J, Meng J, Chen K. PIANO: A Web Server for Pseudouridine-Site (Ψ) Identification and Functional Annotation. Front Genet 2020; 11:88. [PMID: 32226440 PMCID: PMC7080813 DOI: 10.3389/fgene.2020.00088] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 11/11/2019] [Accepted: 01/27/2020] [Indexed: 12/04/2022] Open
Abstract
Known as the "fifth RNA nucleotide", pseudouridine (Ψ or psi) is the first-discovered and most abundant RNA modification occurring at the Uridine site, and it plays a prominent role in a number of biological processes. Thousands of Ψ sites have been identified within different biological contexts thanks to the advancement in high-throughput sequencing technology; nevertheless, the transcriptome-wide distribution, biomolecular functions, regulatory mechanisms, and disease relevance of pseudouridylation are largely elusive. We report here a web server-PIANO-for pseudouridine site (Ψ) identification and functional annotation. PIANO was built upon a high-accuracy predictor that takes advantage of both conventional sequence features and 42 additional genomic features. When tested on six independent datasets generated from four independent Ψ-profiling technologies (Ψ-seq, RBS-seq, Pseudo-seq, and CeU-seq) as benchmarks, PIANO achieved an average AUC of 0.955 and 0.838 under the full transcript and mature mRNA models, respectively, marking a substantial improvement in accuracy compared to the existing in silico Ψ-site prediction methods, i.e., PPUS (0.713 and 0.707), iRNA-PseU (0.713 and 0.712), and PseUI (0.634 and 0.652). Besides, PIANO web server systematically annotates the predicted Ψ sites with post-transcriptional regulatory mechanisms (miRNA-targets, RBP-binding regions, and splicing sites) in its prediction report to help the users explore potential machinery of Ψ. Moreover, a concise query interface was also built for 4,303 known Ψ sites, which is currently the largest collection of experimentally validated human Ψ sites. The PIANO website is freely accessible at: http://piano.rnamd.com.
Affiliation(s)
- Bowen Song
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
| | - Yujiao Tang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| | - Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
| | - Gang Liu
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
| | - Jionglong Su
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
| | - Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
| | - Kunqi Chen
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool, United Kingdom
|
44
|
Dou L, Li X, Ding H, Xu L, Xiang H. Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem? Mol Ther Nucleic Acids 2020; 19:293-303. [PMID: 31865116 PMCID: PMC6931122 DOI: 10.1016/j.omtn.2019.11.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/08/2019] [Revised: 10/29/2019] [Accepted: 11/11/2019] [Indexed: 01/01/2023]
Abstract
Pseudouridine (Ψ) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Ψ sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Ψ sites, they are expensive and time consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect the Ψ sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Ψ sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than the iPseU-CUU predictor for the H_990 and S_628 datasets, respectively. Almost the same-level results were obtained for M_994 and an independent H_200 dataset, even showing a 5.0% improvement for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features are selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets were achieved at 70.54% and 72.45%, which were 2.39% and 0.65% higher than iPseU-CUU. For the S_200 dataset, it was also improved 8% from 69% to 77%. However, there was no obvious improvement for H. sapiens, which was evaluated as approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Ψ identification using BPB features as well as the combined features were not obviously improved. Although some kinds of feature extraction methods based on the RNA sequence information have been applied to construct the predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Ψ modification prediction problem.
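As a rough illustration of the bi-profile Bayes (BPB) features mentioned in this abstract: a sequence of length n is commonly represented by 2n values, namely the per-position frequency of each of its nucleotides in the positive training set followed by the same quantity in the negative training set. The sketch below follows that general description only; it is not the authors' implementation, and the helper names `position_frequencies` and `bpb_encode`, the fixed ACGU alphabet, and the use of raw frequencies as posterior estimates are all assumptions.

```python
from collections import Counter

ALPHABET = "ACGU"  # assumed RNA alphabet


def position_frequencies(seqs):
    """Per-position nucleotide frequencies for equal-length sequences."""
    length = len(seqs[0])
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        profile.append({b: counts[b] / len(seqs) for b in ALPHABET})
    return profile


def bpb_encode(seq, pos_profile, neg_profile):
    """Encode seq as [positive-set frequencies] + [negative-set frequencies]."""
    pos = [pos_profile[i].get(b, 0.0) for i, b in enumerate(seq)]
    neg = [neg_profile[i].get(b, 0.0) for i, b in enumerate(seq)]
    return pos + neg


# Toy example with hypothetical two-nucleotide windows.
positives = ["AU", "AU"]
negatives = ["CU", "GU"]
features = bpb_encode(
    "AU", position_frequencies(positives), position_frequencies(negatives)
)
```

In practice the two profiles would be estimated on the training folds only, so that the encoded test sequences do not leak label information.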
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China.
| | - Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China.
|
45
|
Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Front Bioeng Biotechnol 2020; 8:134. [PMID: 32175316 PMCID: PMC7054385 DOI: 10.3389/fbioe.2020.00134] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 01/17/2020] [Accepted: 02/10/2020] [Indexed: 12/21/2022] Open
Abstract
One of the ubiquitous chemical modifications in RNA, pseudouridine modification is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to precisely identify pseudouridine sites in RNA. Several useful machine learning approaches have become available recently, with the increasing progress of next-generation sequencing technology; however, existing methods cannot predict sites with high accuracy. Thus, a more accurate predictor is required. In this study, a random forest-based predictor named RF-PseU is proposed for prediction of pseudouridylation sites. To optimize feature representation and obtain a better model, the light gradient boosting machine algorithm and incremental feature selection strategy were used to select the optimum feature space vector for training the random forest model RF-PseU. Compared with previous state-of-the-art predictors, the results on the same benchmark data sets of three species demonstrate that RF-PseU performs better overall. The integrated average leave-one-out cross-validation and independent testing accuracy scores were 71.4% and 74.7%, respectively, representing increments of 3.63% and 4.77% versus the best existing predictor. Moreover, the final RF-PseU model for prediction was built on leave-one-out cross-validation and provides a reliable and robust tool for identifying pseudouridine sites. A web server with a user-friendly interface is accessible at http://148.70.81.170:10228/rfpseu.
Affiliation(s)
- Zhibin Lv
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
46
|
Abstract
Background: Pseudouridine modification is the most common among the various kinds of RNA modification that occur in both prokaryotes and eukaryotes. This biochemical event has been proved to occur in multiple types of RNAs, including rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, gaining a holistic understanding of pseudouridine modification can contribute to the development of drug discovery and gene therapies. Although some laboratory techniques have produced moderately good outcomes in pseudouridine identification, they are costly and require skilled work experience. We propose iPseU-NCP, an efficient computational framework to predict pseudouridine sites using the Random Forest (RF) algorithm combined with nucleotide chemical properties (NCP) generated from RNA sequences. The benchmark dataset collected from Chen et al. (2016) was used to develop iPseU-NCP and fairly compare its performance with other methods. Results: Under the same experimental settings, compared with three state-of-the-art methods, iPseU-CNN, PseUI, and iRNA-PseU, the Matthews correlation coefficient (MCC) of our model increased by about 20.0%, 55.0%, and 109.0% when tested on the H. sapiens (H_200) dataset and by about 6.5%, 35.0%, and 150.0% when tested on the S. cerevisiae (S_200) dataset, respectively. This significant growth in MCC is very important since it ensures the stability and performance of our model. With those two independent test datasets, our model also presented higher accuracy with a success rate boosted by 7.0%, 13.0%, and 20.0% and 2.0%, 9.5%, and 25.0% when compared to iPseU-CNN, PseUI, and iRNA-PseU, respectively. For the majority of other evaluation metrics, iPseU-NCP demonstrated superior performance as well. Conclusions: iPseU-NCP, combining the RF and NCP-encoded features, showed better performance than other existing state-of-the-art methods in the identification of pseudouridine sites. This also shows an optimistic view in addressing biological issues related to human diseases.
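For readers unfamiliar with NCP features, the encoding mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the iPseU-NCP authors' code: it assumes the commonly used three-bit scheme (ring structure, functional group, hydrogen-bond strength), and the function name `ncp_encode`, the T-to-U conversion, and the all-zero vector for unknown bases are assumptions made here.

```python
# Three binary chemical properties per nucleotide:
#   bit 1: ring structure  (purine = 1, pyrimidine = 0)
#   bit 2: functional group (amino = 1, keto = 0)
#   bit 3: hydrogen bonding (weak = 1, strong = 0)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_encode(seq):
    """Map an RNA sequence to a list of 3-bit NCP tuples.

    DNA-style 'T' is treated as 'U'; any other unknown symbol
    (for example 'N') becomes (0, 0, 0) by assumption.
    """
    seq = seq.upper().replace("T", "U")
    return [NCP.get(base, (0, 0, 0)) for base in seq]
```

A window of L nucleotides centred on a candidate uridine therefore yields an L x 3 binary matrix, which can be flattened into a feature vector for a classifier such as a random forest.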
|
47
|
Chen Z, Zhao P, Li F, Wang Y, Smith AI, Webb GI, Akutsu T, Baggag A, Bensmail H, Song J. Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences. Brief Bioinform 2019; 21:1676-1696. [DOI: 10.1093/bib/bbz112] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 07/04/2019] [Revised: 07/31/2019] [Accepted: 08/07/2019] [Indexed: 12/14/2022] Open
Abstract
RNA post-transcriptional modifications play a crucial role in a myriad of biological processes and cellular functions. To date, more than 160 RNA modifications have been discovered; therefore, accurate identification of RNA-modification sites is fundamental for a better understanding of RNA-mediated biological functions and mechanisms. However, due to limitations in experimental methods, systematic identification of different types of RNA-modification sites remains a major challenge. Recently, more than 20 computational methods have been developed to identify RNA-modification sites in tandem with high-throughput experimental methods, with most of these capable of predicting only single types of RNA-modification sites. These methods show high diversity in their dataset size, data quality, core algorithms, features extracted and feature selection techniques and evaluation strategies. Therefore, there is an urgent need to revisit these methods and summarize their methodologies, in order to improve and further develop computational techniques to identify and characterize RNA-modification sites from the large amounts of sequence data. With this goal in mind, first, we provide a comprehensive survey on a large collection of 27 state-of-the-art approaches for predicting N1-methyladenosine and N6-methyladenosine sites. We cover a variety of important aspects that are crucial for the development of successful predictors, including the dataset quality, operating algorithms, sequence and genomic features, feature selection, model performance evaluation and software utility. In addition, we also provide our thoughts on potential strategies to improve the model performance. Second, we propose a computational approach called DeepPromise based on deep learning techniques for simultaneous prediction of N1-methyladenosine and N6-methyladenosine. 
To extract the sequence context surrounding the modification sites, three feature encodings, including enhanced nucleic acid composition, one-hot encoding, and RNA embedding, were used as the input to seven consecutive layers of convolutional neural networks (CNNs), respectively. Moreover, DeepPromise further combined the prediction score of the CNN-based models and achieved around 43% higher area under receiver-operating curve (AUROC) for m1A site prediction and 2–6% higher AUROC for m6A site prediction, respectively, when compared with several existing state-of-the-art approaches on the independent test. In-depth analyses of characteristic sequence motifs identified from the convolution-layer filters indicated that nucleotide presentation at proximal positions surrounding the modification sites contributed most to the classification, whereas those at distal positions also affected classification but to different extents. To maximize user convenience, a web server was developed as an implementation of DeepPromise and made publicly available at http://DeepPromise.erc.monash.edu/, with the server accepting both RNA sequences and genomic sequences to allow prediction of two types of putative RNA-modification sites.
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, China
| | - Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
| | - Fuyi Li
- Northwest A&F University, China
| | | | - A Ian Smith
- Prince Henrys Institute Melbourne and Monash University, Australia
| | | | | | - Abdelkader Baggag
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Victoria 3800, Australia
| |
Collapse
|
48
Song Y, Xu Q, Wei Z, Zhen D, Su J, Chen K, Meng J. Predict Epitranscriptome Targets and Regulatory Functions of N6-Methyladenosine (m6A) Writers and Erasers. Evol Bioinform Online 2019; 15:1176934319871290. [PMID: 31523126 PMCID: PMC6728658 DOI: 10.1177/1176934319871290] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/19/2019] [Accepted: 07/31/2019] [Indexed: 12/13/2022]
Abstract
Although many successful bioinformatics efforts have been reported in the epitranscriptomics field for N6-methyladenosine (m6A) site identification, none has focused on the substrate specificity of the different m6A-related enzymes, i.e., the methyltransferases (writers) and demethylases (erasers). In this work, to untangle the target specificity and regulatory functions of the RNA m6A writers (METTL3-METTL14 and METTL16) and erasers (ALKBH5 and FTO), we extracted 49 genomic features along with conventional sequence features and used a random forest classifier to predict their epitranscriptome substrates. Our method achieved reasonable performance in 5-fold cross-validation on both writer target prediction (as high as 0.918) and eraser target prediction (as high as 0.888), and gene ontology analysis of the preferential targets further revealed the functional relevance of the different RNA methylation writers and erasers.
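The 5-fold cross-validation mentioned in the abstract above can be sketched as follows. This is only an illustration of how samples are partitioned into folds, under the assumption of a simple shuffled split; the helper name `k_fold_indices` and the fixed seed are hypothetical, and neither the random forest model nor the authors' actual evaluation pipeline is reproduced here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption for illustration: samples are shuffled once with a fixed
    seed and dealt round-robin into k folds, so fold sizes differ by at
    most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Every sample is held out exactly once across the five rounds.
    for round_no, (train, test) in enumerate(k_fold_indices(23, k=5), start=1):
        print(round_no, len(train), len(test))
```

In each round a classifier would be fitted on the training indices and scored on the held-out fold, with the reported figure typically being the average over the five rounds.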
Affiliation(s)
- Yiyou Song
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Qingru Xu
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Zhen Wei
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Di Zhen
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Jionglong Su
- Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Research Center for Precision Medicine, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Kunqi Chen
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Jia Meng
- Research Center for Precision Medicine, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
49
Liu K, Chen W, Lin H. XG-PseU: an eXtreme Gradient Boosting based method for identifying pseudouridine sites. Mol Genet Genomics 2019; 295:13-21. [DOI: 10.1007/s00438-019-01600-9] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 06/16/2019] [Accepted: 07/29/2019] [Indexed: 01/08/2023]
50
Barraud P, Tisné C. To be or not to be modified: Miscellaneous aspects influencing nucleotide modifications in tRNAs. IUBMB Life 2019; 71:1126-1140. [PMID: 30932315 PMCID: PMC6850298 DOI: 10.1002/iub.2041] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 01/17/2019] [Accepted: 03/10/2019] [Indexed: 12/12/2022]
Abstract
Transfer RNAs (tRNAs) are essential components of the cellular protein synthesis machinery, but are also implicated in many roles outside translation. To become functional, tRNAs, initially transcribed as longer precursor tRNAs, undergo a tightly controlled biogenesis process comprising the maturation of their extremities, removal of intronic sequences if present, addition of the 3'-CCA amino-acid accepting sequence, and aminoacylation. In addition, the most impressive feature of tRNA biogenesis is the incorporation of a large number of posttranscriptional chemical modifications along the tRNA sequence. The chemical nature of these modifications is highly diverse, with more than a hundred different modifications identified in tRNAs to date. All functions of tRNAs in cells are controlled and modulated by modifications, making the understanding of the mechanisms that determine and influence nucleotide modifications in tRNAs an essential point in tRNA biology. This review describes the different aspects that determine whether a certain position in a tRNA molecule is modified or not. We describe how sequence and structural determinants, as well as the presence of prior modifications, control modification processes. We also describe how environmental factors and cellular stresses influence the level and/or the nature of certain modifications introduced in tRNAs, and report situations where these dynamic modulations of tRNA modification levels are regulated by active demodification processes. © 2019 IUBMB Life, 71(8):1126-1140, 2019.
Affiliation(s)
- Pierre Barraud
- Expression génétique microbienne, Institut de biologie physico‐chimique (IBPC), UMR 8261, CNRS, Université Paris Diderot, Paris, France
- Carine Tisné
- Expression génétique microbienne, Institut de biologie physico‐chimique (IBPC), UMR 8261, CNRS, Université Paris Diderot, Paris, France