1
Jiang D, Ao C, Li Y, Yu L. Feadm5C: Enhancing prediction of RNA 5-Methylcytosine modification sites with physicochemical molecular graph features. Genomics 2025; 117:111037. [PMID: 40127825 DOI: 10.1016/j.ygeno.2025.111037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 11/04/2024] [Accepted: 03/20/2025] [Indexed: 03/26/2025]
Abstract
RNA 5-methylcytosine (m5C) is a common post-transcriptional modification that is essential to biological activities. The rapid development of high-throughput sequencing technology has produced a large amount of RNA data containing m5C modification sites. Although many machine learning-based techniques are available for identifying m5C modification sites, their accuracy still needs to be improved. This study proposes a novel method, Feadm5C, which predicts m5C sites by fusing molecular graph features with sequence information. 10-fold cross-validation was used to assess the model's predictive performance. In addition, we used t-SNE visualization to assess the model's stability and effectiveness. While keeping feature encoding and model structure straightforward, the proposed approach outperforms the most recent methods in use. The dataset and code of the model can be downloaded from GitHub (https://github.com/LiangYu-Xidian/Feadm5C).
Affiliation(s)
- Dongdong Jiang
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Chunyan Ao
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Yan Li
  - School of Management, Xi'an Polytechnic University, Xi'an 710000, Shaanxi, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
2
Artika IM, Arianti R, Demény MÁ, Kristóf E. RNA modifications and their role in gene expression. Front Mol Biosci 2025; 12:1537861. [PMID: 40351534 PMCID: PMC12061695 DOI: 10.3389/fmolb.2025.1537861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Accepted: 04/02/2025] [Indexed: 05/14/2025]
Abstract
Post-transcriptional RNA modifications have recently emerged as critical regulators of gene expression programs. Understanding normal tissue development and disease susceptibility requires knowledge of the various cellular mechanisms that control gene expression in multicellular organisms. Research into how different RNA modifications, such as N6-methyladenosine (m6A), inosine (I), 5-methylcytosine (m5C), pseudouridine (Ψ), 5-hydroxymethylcytosine (hm5C), N1-methyladenosine (m1A), N6,2'-O-dimethyladenosine (m6Am), 2'-O-methylation (Nm), and N7-methylguanosine (m7G), affect the expression of genes could be valuable. This review highlights the current understanding of RNA modification, methods used to study RNA modification, types of RNA modification, and molecular mechanisms underlying RNA modification. The role of RNA modification in modulating gene expression in both physiological and diseased states is discussed. The potential applications of RNA modification in therapeutic development are elucidated.
Affiliation(s)
- I. Made Artika
  - Department of Biochemistry, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, Bogor, Indonesia
- Rini Arianti
  - Laboratory of Cell Biochemistry, Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
  - Universitas Muhammadiyah Bangka Belitung, Pangkalpinang, Indonesia
- Máté Á. Demény
  - Department of Medical Chemistry, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Endre Kristóf
  - Laboratory of Cell Biochemistry, Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
3
Sheng N, Qiao J, Wei L, Shi H, Guo H, Yang C. Computational models for prediction of m6A sites using deep learning. Methods 2025; 240:113-124. [PMID: 40268153 DOI: 10.1016/j.ymeth.2025.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 04/02/2025] [Accepted: 04/07/2025] [Indexed: 04/25/2025]
Abstract
RNA modifications play a crucial role in enhancing the structural and functional diversity of RNA molecules and regulating various stages of the RNA life cycle. Among these modifications, N6-methyladenosine (m6A) is the most common internal modification in eukaryotic mRNAs and has been extensively studied over the past decade. Accurate identification of m6A modification sites is essential for understanding their function and underlying mechanisms. Traditional methods predominantly rely on machine learning techniques to recognize m6A sites, which often fail to capture the contextual features of these sites comprehensively. In this study, we comprehensively summarize previously published methods based on machine learning and deep learning. We also validate multiple deep learning approaches on a benchmark dataset, including methods previously underutilized in m6A site prediction, pre-trained models specifically designed for biological sequences, and other basic deep learning methods. Additionally, we analyze the dataset features and interpret the model's predictions to enhance understanding. Our experimental results clearly demonstrate the effectiveness of the deep learning models, elucidating their strong potential in accurately recognizing m6A modification sites.
Affiliation(s)
- Nan Sheng
  - School of Software, Shandong University, Jinan 250101, PR China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250101, PR China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250101, PR China
- Hua Shi
  - School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, PR China
- Huannan Guo
  - Beidahuang Industry Group General Hospital, PR China
- Changshun Yang
  - Department of Gastrointestinal Surgery, Fuzhou University Affiliated Provincial Hospital, Fuzhou 350004, PR China
4
Yuge CC, Hang ES, Mamtha MRN, Vishwakarma S, Wang S, Wang C, Le NQK. RNA-ModX: a multilabel prediction and interpretation framework for RNA modifications. Brief Bioinform 2024; 26:bbae688. [PMID: 39737566 DOI: 10.1093/bib/bbae688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Revised: 11/18/2024] [Accepted: 12/16/2024] [Indexed: 01/01/2025]
Abstract
Accurate prediction of RNA modifications holds profound implications for elucidating RNA function and mechanism, with potential applications in drug development. Here, the RNA-ModX presents a highly precise predictive model designed to forecast post-transcriptional RNA modifications, complemented by a user-friendly web application tailored for seamless utilization by future researchers. To achieve exceptional accuracy, the RNA-ModX systematically explored a range of machine learning models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit, and Transformer-based architectures. The model underwent rigorous testing using a dataset comprising RNA sequences containing the four fundamental nucleotides (A, C, G, U) and spanning 12 prevalent modification classes (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um), with sequences of length 1001 nucleotides. Notably, the LSTM model, augmented with 3-mer encoding, demonstrated the highest level of model accuracy. Furthermore, Local Interpretable Model-Agnostic Explanations were employed to facilitate result interpretation, enhancing the transparency and interpretability of the model's predictions. In conjunction with the model development, a user-friendly web application was meticulously crafted, featuring an intuitive interface for researchers to effortlessly upload RNA sequences. Upon submission, the model executes in the backend, generating predictions which are seamlessly presented to the user in a coherent manner. This integration of cutting-edge predictive modeling with a user-centric interface signifies a significant step forward in facilitating the exploration and utilization of RNA modification prediction technologies by the broader research community.
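The 3-mer encoding mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how RNA-ModX tokenizes sequences, so the overlapping stride-1 windows, the index ordering, and the names `KMER_INDEX`, `UNKNOWN`, and `encode_3mers` are all assumptions made for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Map each of the 4**3 = 64 possible 3-mers to an integer id (hypothetical ordering).
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=3))}
UNKNOWN = len(KMER_INDEX)  # id 64 for 3-mers containing other symbols, e.g. N


def encode_3mers(seq):
    """Turn an RNA sequence into integer ids of its overlapping 3-mers."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [KMER_INDEX.get(seq[i:i + 3], UNKNOWN) for i in range(len(seq) - 2)]


print(encode_3mers("ACGUA"))  # ids for ACG, CGU, GUA
```

Under this scheme a 1001-nucleotide input yields 999 tokens, which could then be fed to an embedding layer ahead of a recurrent model.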
Affiliation(s)
- Chelsea Chen Yuge
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Ee Soon Hang
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Shashikant Vishwakarma
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Sijia Wang
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Cheng Wang
  - Independent Researcher, Singapore, Singapore
- Nguyen Quoc Khanh Le
  - In-Service Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
  - AIBioMed Research Group, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing Street, 110, Taipei, Taiwan
5
Noor S, Naseem A, Awan HH, Aslam W, Khan S, AlQahtani SA, Ahmad N. Deep-m5U: a deep learning-based approach for RNA 5-methyluridine modification prediction using optimized feature integration. BMC Bioinformatics 2024; 25:360. [PMID: 39563239 DOI: 10.1186/s12859-024-05978-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 11/06/2024] [Indexed: 11/21/2024]
Abstract
BACKGROUND RNA 5-methyluridine (m5U) modifications play a crucial role in biological processes, making their accurate identification a key focus in computational biology. This paper introduces Deep-m5U, a robust predictor designed to enhance the prediction of m5U modifications. Deep-m5U utilizes a hybrid pseudo-K-tuple nucleotide composition (PseKNC) for sequence formulation, a Shapley Additive exPlanations (SHAP) algorithm for discriminant feature selection, and a deep neural network (DNN) as the classifier.
RESULTS The model was evaluated using two benchmark datasets, i.e., Full Transcript and Mature mRNA. Deep-m5U achieved overall accuracies of 91.47% and 95.86% for the Full Transcript and Mature mRNA datasets with 10-fold cross-validation, and for independent samples, the model attained 92.94% and 95.17% accuracy.
CONCLUSION Compared to existing models, Deep-m5U showed approximately 5.23% and 3.73% higher accuracy on the training data and 3.95% and 3.26% higher accuracy on independent samples for the Full Transcript and Mature mRNA datasets, respectively. The reliability and effectiveness of Deep-m5U make it a valuable tool for scientists and a potential asset in pharmaceutical design and research.
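The training accuracies reported in this abstract come from 10-fold cross-validation. The sketch below shows that protocol in plain Python; it is not the authors' code, and the helper names `k_fold_indices` and `cross_val_accuracy`, the shuffling seed, and the `fit`/`predict` callables are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; assumes n_samples >= k."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    fit(X_train, y_train) returns a model; predict(model, X_test) returns labels.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Each sample is held out exactly once, so the averaged score is an estimate on data the model did not see during fitting. Accuracy on a separate independent set, as also reported above, is a further check that this procedure does not replace.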
Affiliation(s)
- Sumaiya Noor
  - Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
- Afshan Naseem
  - Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu, 21030, Kuala Nerus, Terengganu, Malaysia
- Hamid Hussain Awan
  - Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
- Wasiq Aslam
  - Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
- Salman Khan
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Salman A AlQahtani
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Nijad Ahmad
  - Department of Computer Science, Khurasan University, Jalalabad, Afghanistan
6
Shaon MSH, Karim T, Ali MM, Ahmed K, Bui FM, Chen L, Moni MA. A robust deep learning approach for identification of RNA 5-methyluridine sites. Sci Rep 2024; 14:25688. [PMID: 39465261 PMCID: PMC11514282 DOI: 10.1038/s41598-024-76148-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/16/2024] [Accepted: 10/10/2024] [Indexed: 10/29/2024]
Abstract
RNA 5-methyluridine (m5U) sites play a significant role in understanding RNA modifications, which influence numerous biological processes such as gene expression and cellular functioning. Consequently, the identification of m5U sites can play a vital role in the integrity, structure, and function of RNA molecules. Therefore, this study introduces GRUpred-m5U, a novel deep learning-based framework based on a gated recurrent unit in mature RNA and full transcript RNA datasets. We used three descriptor groups: nucleic acid composition, pseudo nucleic acid composition, and physicochemical properties, which include five feature extraction methods ENAC, Kmer, DPCP, DPCP type 2, and PseDNC. Initially, we aggregated all the feature extraction methods and created a new merged set. Three hybrid models were developed employing deep-learning methods and evaluated through 10-fold cross-validation with seven evaluation metrics. After a comprehensive evaluation, the GRUpred-m5U model outperformed the other applied models, obtaining 98.41% and 96.70% accuracy on the two datasets, respectively. To our knowledge, the proposed model outperformed all the existing state-of-the-art technology. The proposed supervised machine learning model was evaluated using unsupervised machine learning techniques such as principal component analysis (PCA), and it was observed that the proposed method provided a valid performance for identifying m5U. Considering its multi-layered construction, the GRUpred-m5U model has tremendous potential for future applications in the biological industry. The model, which consisted of neurons processing complicated input, excelled at pattern recognition and produced reliable results. Despite its greater size, the model obtained accurate results, essential in detecting m5U.
Affiliation(s)
- Tasmin Karim
  - Department of Computer Science and Informatics, Oakland University, Rochester, MI, 48309, USA
- Md Mamun Ali
  - Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
  - Department of Software Engineering, Daffodil Smart City (DSC), Daffodil International University, Birulia, Savar, Dhaka, 1216, Bangladesh
- Kawsar Ahmed
  - Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
  - Group of Bio-photomatiχ, Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, 1902, Tangail, Bangladesh
  - Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Dhaka, 1216, Birulia, Bangladesh
- Francis M Bui
  - Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
- Li Chen
  - Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
- Mohammad Ali Moni
  - AI & Digital Health Technology, Artificial Intelligence & Cyber Future Institute, Charles Sturt University, Bathurst, NSW, 2795, Australia
  - AI & Digital Health Technology, Rural Health Research Institute, Charles Sturt University, Orange, NSW, 2800, Australia
7
Guo B, Wei X, Liu S, Cui W, Zhou C. Deep learning modeling of RNA ac4C deposition reveals the importance of plant alternative splicing. Plant Mol Biol 2024; 114:118. [PMID: 39467957 DOI: 10.1007/s11103-024-01512-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 10/03/2024] [Indexed: 10/30/2024]
Abstract
The N4-acetylcytidine (ac4C) modification has recently been characterized as a noncanonical RNA marker in plants. While the precise installation of ac4C sites in individual plant transcripts continues to present challenges, the biological roles of ac4C in specific plant species are gradually being deciphered. Herein, we utilized a deep learning technique called iac4C (intelligent ac4C) to predict ac4C sites in mRNA. ac4C deposition was effectively forecasted by the iac4C model (AUROC = 0.948), revealing a reliable distribution pattern primarily situated in the transcribing area as opposed to regions that are not translated. The iac4C deep learning approach using a combination of BiGRU and self-attention mechanisms both validates previous studies showing a positive correlation between ac4C and RNA splicing in plant species and reveals new examples of other splicing events associated with ac4C. Our advanced deep learning algorithm for analyzing ac4C enables swift identification of important biological phenomena that would otherwise be challenging to uncover through traditional experimental approaches. These findings provide insight into the essential regulatory function of site-specific ac4C deposition in alternative splicing processes. The source code and datasets for iac4C are available at https://github.com/xlwei507/iac4C.
Affiliation(s)
- Bintao Guo
  - Key Laboratory of Three Gorges Regional Plant Genetics and Germplasm Enhancement (CTGU)/Biotechnology Research Center, College of Biological and Pharmaceutical Sciences, China Three Gorges University, Yichang, 443002, China
- Xinlin Wei
  - Key Laboratory of Three Gorges Regional Plant Genetics and Germplasm Enhancement (CTGU)/Biotechnology Research Center, College of Biological and Pharmaceutical Sciences, China Three Gorges University, Yichang, 443002, China
- Shuangcheng Liu
  - Key Laboratory of Three Gorges Regional Plant Genetics and Germplasm Enhancement (CTGU)/Biotechnology Research Center, College of Biological and Pharmaceutical Sciences, China Three Gorges University, Yichang, 443002, China
- Wenchao Cui
  - Key Laboratory of Three Gorges Regional Plant Genetics and Germplasm Enhancement (CTGU)/Biotechnology Research Center, College of Biological and Pharmaceutical Sciences, China Three Gorges University, Yichang, 443002, China
- Chao Zhou
  - Key Laboratory of Three Gorges Regional Plant Genetics and Germplasm Enhancement (CTGU)/Biotechnology Research Center, College of Biological and Pharmaceutical Sciences, China Three Gorges University, Yichang, 443002, China
8
Saha R, Vázquez-Salazar A, Nandy A, Chen IA. Fitness Landscapes and Evolution of Catalytic RNA. Annu Rev Biophys 2024; 53:109-125. [PMID: 39013026 DOI: 10.1146/annurev-biophys-030822-025038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024]
Abstract
The relationship between genotype and phenotype, or the fitness landscape, is the foundation of genetic engineering and evolution. However, mapping fitness landscapes poses a major technical challenge due to the amount of quantifiable data that is required. Catalytic RNA is a special topic in the study of fitness landscapes due to its relatively small sequence space combined with its importance in synthetic biology. The combination of in vitro selection and high-throughput sequencing has recently provided empirical maps of both complete and local RNA fitness landscapes, but the astronomical size of sequence space limits purely experimental investigations. Next steps are likely to involve data-driven interpolation and extrapolation over sequence space using various machine learning techniques. We discuss recent progress in understanding RNA fitness landscapes, particularly with respect to protocells and machine representations of RNA. The confluence of technical advances may significantly impact synthetic biology in the near future.
Affiliation(s)
- Ranajay Saha
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California, USA
- Alberto Vázquez-Salazar
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California, USA
- Aditya Nandy
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California, USA
  - Department of Chemistry, The University of Chicago, Chicago, Illinois, USA
  - The James Franck Institute, The University of Chicago, Chicago, Illinois, USA
- Irene A Chen
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, California, USA
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, California, USA
9
Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024; 300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024]
Abstract
RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequences into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies hinder the effectiveness of machine learning models in addressing the performance challenges associated with predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred using a support vector machine classifier to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model compared to state-of-the-art models in the domain of RNA modification site identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can be applied to feature extraction from DNA sequences to predict other biological modification sites.
Affiliation(s)
- Mingzhao Wang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Haider Ali
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yandi Xu
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
  - College of Life Sciences, Shaanxi Normal University, Xi'an, China
- Juanying Xie
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Shengquan Xu
  - College of Life Sciences, Shaanxi Normal University, Xi'an, China
10
Meng L, Chen X, Cheng K, Chen N, Zheng Z, Wang F, Sun H, Wong KC. TransPTM: a transformer-based model for non-histone acetylation site prediction. Brief Bioinform 2024; 25:bbae219. [PMID: 38725156 PMCID: PMC11082075 DOI: 10.1093/bib/bbae219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 04/08/2024] [Accepted: 04/23/2024] [Indexed: 05/13/2024]
Abstract
Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) due to its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, there is a lack of benchmark datasets and bespoke predictors for non-histone acetylation site prediction. To address these problems, we have contributed to both dataset creation and predictor benchmarking in this study. First, we construct a non-histone acetylation site benchmark dataset, namely NHAC, which includes 11 subsets according to the sequence length, ranging from 11 to 61 amino acids. There are 886 positive samples and 4707 negative samples in total for each sequence length. Second, we propose TransPTM, a transformer-based neural network model for non-histone acetylation site prediction. During the data representation phase, per-residue contextualized embeddings are extracted using ProtT5 (an existing pre-trained protein language model). This is followed by the implementation of a graph neural network framework, which consists of three TransformerConv layers for feature extraction and a multilayer perceptron module for classification. The benchmark results show that TransPTM achieves competitive performance for non-histone acetylation site prediction against three state-of-the-art tools. It improves our comprehension of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM dataset fills the gap in non-histone acetylation site datasets and is beneficial to the related communities. The related source code and data utilized by TransPTM are accessible at https://www.github.com/TransPTM/TransPTM.
Affiliation(s)
- Lingkuan Meng
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
- Xingjian Chen
  - Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, MA 02138, United States
- Ke Cheng
  - Department of Chemistry, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
- Nanjun Chen
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
- Zetian Zheng
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
- Hongyan Sun
  - Department of Chemistry, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
11
Harun-Or-Roshid M, Maeda K, Phan LT, Manavalan B, Kurata H. Stack-DHUpred: Advancing the accuracy of dihydrouridine modification sites detection via stacking approach. Comput Biol Med 2024; 169:107848. [PMID: 38145601 DOI: 10.1016/j.compbiomed.2023.107848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 11/14/2023] [Accepted: 12/11/2023] [Indexed: 12/27/2023]
Abstract
Dihydrouridine (DHU, D) is one of the most abundant post-transcriptional uridine modifications found in tRNA, mRNA, and snoRNA, closely associated with disease pathogenesis and various biological processes in eukaryotes. Identifying D sites is important for understanding the modification mechanisms and/or epigenetic regulation. However, biological experiments for detecting D sites are time-consuming and expensive. Given these challenges, computational methods have been developed for accurately identifying the D sites in genome-wide datasets. However, existing methods have some limitations, and their prediction performance needs to be improved. In this work, we have developed a new computational predictor for accurately identifying D sites called Stack-DHUpred. Briefly, we trained 66 baseline models or single-feature models by connecting six machine learning classifiers with eleven different feature encoding methods and stacked different baseline models to build stacked ensemble learning models. Subsequently, the optimal combination of the baseline models was identified for the construction of the final stacked model. Remarkably, the Stack-DHUpred outperformed the existing predictors on our new independent dataset, indicating that the stacking approach significantly improved the prediction performance. We have made Stack-DHUpred available to the public through a web server (http://kurata35.bio.kyutech.ac.jp/Stack-DHUpred) and a standalone program (https://github.com/kuratahiroyuki/Stack-DHUpred). We believe that Stack-DHUpred will be a valuable tool for accelerating the discovery of D modifications and understanding their role in post-transcriptional regulation.
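The count of 66 baseline models in this abstract follows from pairing each of the six classifiers with each of the eleven encodings. The sketch below enumerates that grid and shows how per-sample outputs of the baselines could be arranged as input columns for a stacked model. The placeholder names and the `stack_features` helper are hypothetical, since the abstract does not list the actual classifiers or encodings.

```python
from itertools import product

# Placeholder names; the abstract does not name the classifiers or encodings.
classifiers = [f"classifier_{i}" for i in range(1, 7)]  # six classifiers
encodings = [f"encoding_{j}" for j in range(1, 12)]     # eleven encodings

# One baseline model per (classifier, encoding) pair: 6 * 11 = 66.
baseline_models = list(product(classifiers, encodings))


def stack_features(baseline_probs):
    """Arrange baseline outputs as meta-features.

    baseline_probs maps (classifier, encoding) to a list of predicted
    probabilities, one per sample. Returns one row per sample with one
    column per baseline model, in sorted key order.
    """
    keys = sorted(baseline_probs)
    return [list(row) for row in zip(*(baseline_probs[k] for k in keys))]
```

A meta-classifier would then be trained on these rows. Choosing which subset of the 66 columns to keep corresponds to the search for the optimal combination described in the abstract.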
Affiliation(s)
- Md Harun-Or-Roshid
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Kazuhiro Maeda
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Le Thi Phan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
12
Wang H, Huang T, Wang D, Zeng W, Sun Y, Zhang L. MSCAN: multi-scale self- and cross-attention network for RNA methylation site prediction. BMC Bioinformatics 2024; 25:32. [PMID: 38233745 PMCID: PMC10795237 DOI: 10.1186/s12859-024-05649-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/24/2023] [Accepted: 01/11/2024] [Indexed: 01/19/2024]
Abstract
BACKGROUND Epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all RNA types. Precise recognition of RNA modifications is critical for understanding their functions and regulatory mechanisms. However, wet-lab experimental methods are often costly and time-consuming, limiting their wide range of applications. Therefore, recent research has focused on developing computational methods, particularly deep learning (DL). Bidirectional long short-term memory (BiLSTM), convolutional neural network (CNN), and the Transformer have demonstrated achievements in modification site prediction. However, BiLSTM cannot be computed in parallel, leading to a long training time, CNN cannot learn long-distance dependencies in the sequence, and the Transformer lacks information interaction with sequences at different scales. This insight underscores the necessity for continued research and development in natural language processing (NLP) and DL to devise an enhanced prediction framework that can effectively address the challenges presented.
RESULTS This study presents a multi-scale self- and cross-attention network (MSCAN) to identify RNA methylation sites using NLP and DL techniques. Experimental results on twelve RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) show that MSCAN achieves areas under the receiver operating characteristic curve of 98.34%, 85.41%, 97.29%, 96.74%, 99.04%, 79.94%, 76.22%, 65.69%, 92.92%, 92.03%, 95.77%, and 89.66%, respectively, which is better than the state-of-the-art prediction model. This indicates that the model has strong generalization capabilities. Furthermore, MSCAN reveals a strong association among different types of RNA modifications from an experimental perspective. A user-friendly web server for predicting twelve widely occurring human RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) is available at http://47.242.23.141/MSCAN/index.php.
CONCLUSIONS A predictor framework has been developed through binary classification to predict RNA methylation sites.
Affiliation(s)
- Honglei Wang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
  - School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, 221400, China
- Tao Huang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Dong Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Wenliang Zeng
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yanjing Sun
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China

13
Yang Y, Liu Z, Lu J, Sun Y, Fu Y, Pan M, Xie X, Ge Q. Analysis approaches for the identification and prediction of N6-methyladenosine sites. Epigenetics 2023; 18:2158284. [PMID: 36562485 PMCID: PMC9980620 DOI: 10.1080/15592294.2022.2158284] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022] Open
Abstract
Mapping transcriptional m6A sites, in particular across the full transcriptome, can reveal the global dynamics of a variety of biological processes. Individual m6A sites also contribute to biological function, which can be evaluated using stoichiometric information obtained at single-nucleotide resolution. Currently, m6A sites are identified mainly by experimental methods based on high-throughput sequencing and by prediction methods based on machine learning models. This review summarizes recent topics and progress in bioinformatics methods for deciphering m6A methylation, including the experimental detection of m6A methylation sites, data analysis techniques, approaches for predicting m6A methylation sites, m6A methylation databases, and the detection of m6A modification in circRNA. Finally, the review briefly discusses development perspectives in this area.
Affiliation(s)
- Yuwei Yang
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Zhiyu Liu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Junru Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Yuqing Sun
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Yue Fu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Min Pan
  - Department of Pathology and Pathophysiology School of Medicine, Southeast University, Nanjing, China
- Xueying Xie
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Qinyu Ge
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China

14
Jia J, Wei Z, Cao X. EMDL-ac4C: identifying N4-acetylcytidine based on ensemble two-branch residual connection DenseNet and attention. Front Genet 2023; 14:1232038. [PMID: 37519885 PMCID: PMC10372626 DOI: 10.3389/fgene.2023.1232038] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/31/2023] [Accepted: 06/29/2023] [Indexed: 08/01/2023] Open
Abstract
Introduction: N4-acetylcytidine (ac4C) is a critical acetylation modification that has an essential function in protein translation and is associated with a number of human diseases. Methods: Identifying ac4C sites by biological experiments is cumbersome and costly, and the performance of several existing computational models needs to be improved. Therefore, we propose a new deep learning tool, EMDL-ac4C, to predict ac4C sites. It applies a simple one-hot encoding to an unbalanced dataset and uses a downsampled ensemble deep learning network to extract the features that are important for identifying ac4C sites. The base learner of this ensemble model consists of a modified DenseNet and Squeeze-and-Excitation Networks. In addition, we add a convolutional residual structure in parallel with the dense block to achieve the effect of two-layer feature extraction. Results: The average accuracy (Acc), Matthews correlation coefficient (MCC), and area under the curve (AUC) of EMDL-ac4C on ten independent testing sets are 80.84%, 61.77%, and 87.94%, respectively. Discussion: Multiple experimental comparisons indicate that EMDL-ac4C outperforms existing predictors and greatly improves the predictive performance for ac4C sites. EMDL-ac4C could also provide a valuable reference for subsequent studies. The source code and experimental data are available at: https://github.com/13133989982/EMDLac4C.
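The one-hot encoding and downsampled-ensemble ideas named in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the EMDL-ac4C implementation: the function names, the `ACGU` alphabet, the equal-chunk splitting rule, and the majority vote are all assumptions made for the example.

```python
import random

ALPHABET = "ACGU"  # assumed RNA alphabet for this sketch


def one_hot(seq):
    """Encode each nucleotide as a 4-dimensional indicator vector.

    Bases outside ALPHABET (e.g. N) become all-zero vectors.
    """
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq.upper()]


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Downsample the majority class into n_subsets equal chunks.

    Each chunk is paired with the full minority class, giving one
    training set per base learner (hypothetical splitting rule;
    leftover negatives beyond n_subsets * chunk are dropped).
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    chunk = len(shuffled) // n_subsets
    return [
        (positives, shuffled[i * chunk:(i + 1) * chunk])
        for i in range(n_subsets)
    ]


def majority_vote(votes):
    """Combine binary base-learner predictions; ties resolve to 0."""
    return int(sum(votes) * 2 > len(votes))
```

Each `(positives, negatives)` pair would be used to train one base learner, and `majority_vote` combines their per-site predictions; the paper's actual aggregation rule may differ.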
Affiliation(s)
- Jianhua Jia
  - *Correspondence: Jianhua Jia; Zhangying Wei

15
Yu L, Zhang Y, Xue L, Liu F, Jing R, Luo J. Evaluation and development of deep neural networks for RNA 5-Methyluridine classifications using autoBioSeqpy. Front Microbiol 2023; 14:1175925. [PMID: 37275146 PMCID: PMC10232852 DOI: 10.3389/fmicb.2023.1175925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 04/27/2023] [Indexed: 06/07/2023] Open
Abstract
Post-transcriptional RNA modifications, also known as the epitranscriptome, play crucial roles in the regulation of gene expression during development. Recently, deep learning (DL) has been employed for RNA modification site prediction and has shown promising results. However, due to the lack of relevant studies, it is unclear which DL architecture is best suited for some pyrimidine modifications, such as 5-methyluridine (m5U). To fill this knowledge gap, we first performed a comparative evaluation of various commonly used DL models for epigenetic studies with the help of autoBioSeqpy. We identified optimal architectural variations for m5U site classification, optimizing the layer depth and neuron width. Second, we used this knowledge to develop Deepm5U, an improved convolutional-recurrent neural network that accurately predicts m5U sites from RNA sequences. We successfully applied Deepm5U to transcriptome-wide m5U profiling data across different sequencing technologies and cell types. Third, we showed that the techniques for interpreting deep neural networks, including LayerUMAP and DeepSHAP, can provide important insights into the internal operation and behavior of models. Overall, we offered practical guidance for the development, benchmarking, and analysis of deep learning models when designing new algorithms for RNA modifications.
Affiliation(s)
- Lezheng Yu
  - School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
- Yonglin Zhang
  - Department of Pharmacy, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- Li Xue
  - School of Public Health, Southwest Medical University, Luzhou, China
- Fengjuan Liu
  - School of Geography and Resources, Guizhou Education University, Guiyang, China
- Runyu Jing
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, China
- Jiesi Luo
  - Basic Medical College, Southwest Medical University, Luzhou, China
  - Sichuan Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, Southwest Medical University, Luzhou, China

16
Acera Mateos P, Zhou Y, Zarnack K, Eyras E. Concepts and methods for transcriptome-wide prediction of chemical messenger RNA modifications with machine learning. Brief Bioinform 2023; 24:7150742. [PMID: 37139545 DOI: 10.1093/bib/bbad163] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2023] [Revised: 03/03/2023] [Indexed: 05/05/2023] Open
Abstract
The expanding field of epitranscriptomics might rival the epigenome in the diversity of biological processes impacted. In recent years, the development of new high-throughput experimental and computational techniques has been a key driving force in discovering the properties of RNA modifications. Machine learning applications, such as for classification, clustering or de novo identification, have been critical in these advances. Nonetheless, various challenges remain before the full potential of machine learning for epitranscriptomics can be leveraged. In this review, we provide a comprehensive survey of machine learning methods to detect RNA modifications using diverse input data sources. We describe strategies to train and test machine learning methods and to encode and interpret features that are relevant for epitranscriptomics. Finally, we identify some of the current challenges and open questions about RNA modification analysis, including the ambiguity in predicting RNA modifications in transcript isoforms or in single nucleotides, or the lack of complete ground truth sets to test RNA modifications. We believe this review will inspire and benefit the rapidly developing field of epitranscriptomics in addressing the current limitations through the effective use of machine learning.
Affiliation(s)
- Pablo Acera Mateos
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- You Zhou
  - Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
  - Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Kathi Zarnack
  - Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
  - Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Eduardo Eyras
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia

17
Jia J, Qin L, Lei R. DGA-5mC: A 5-methylcytosine site prediction model based on an improved DenseNet and bidirectional GRU method. Math Biosci Eng 2023; 20:9759-9780. [PMID: 37322910 DOI: 10.3934/mbe.2023428] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/17/2023]
Abstract
The 5-methylcytosine (5mC) in the promoter region plays a significant role in biological processes and diseases. A few high-throughput sequencing technologies and traditional machine learning algorithms are often used by researchers to detect 5mC modification sites. However, high-throughput identification is laborious, time-consuming and expensive; moreover, the machine learning algorithms are not so advanced. Therefore, there is an urgent need to develop a more efficient computational approach to replace those traditional methods. Since deep learning algorithms are more popular and have powerful computational advantages, we constructed a novel prediction model, called DGA-5mC, to identify 5mC modification sites in promoter regions by using a deep learning algorithm based on an improved densely connected convolutional network (DenseNet) and the bidirectional GRU approach. Furthermore, we added a self-attention module to evaluate the importance of various 5mC features. The deep learning-based DGA-5mC model algorithm automatically handles large proportions of unbalanced data for both positive and negative samples, highlighting the model's reliability and superiority. So far as the authors are aware, this is the first time that the combination of an improved DenseNet and bidirectional GRU methods has been used to predict the 5mC modification sites in promoter regions. It can be seen that the DGA-5mC model, after using a combination of one-hot coding, nucleotide chemical property coding and nucleotide density coding, performed well in terms of sensitivity, specificity, accuracy, the Matthews correlation coefficient (MCC), area under the curve and Gmean in the independent test dataset: 90.19%, 92.74%, 92.54%, 64.64%, 96.43% and 91.46%, respectively. In addition, all datasets and source codes for the DGA-5mC model are freely accessible at https://github.com/lulukoss/DGA-5mC.
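The three sequence encodings mentioned in this abstract (one-hot coding, nucleotide chemical property coding, and nucleotide density coding) are commonly defined as in the sketch below. It is a minimal illustration under stated assumptions, not the DGA-5mC code: the function name, the column order, and the per-position concatenation are hypothetical, and the chemical-property triples follow the widely used (ring structure, functional group, hydrogen bonding) convention.

```python
# One-hot vectors for DNA bases (column order A, C, G, T is an assumption).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Nucleotide chemical property (NCP) triples:
# (purine ring, amino group, weak hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_sequence(seq):
    """Return one 8-dimensional row per position.

    Row layout: 4 one-hot values, 3 NCP values, then the nucleotide
    density, i.e. the fraction of positions 1..i that carry the same
    base as position i. Raises KeyError for bases outside A/C/G/T.
    """
    counts = {}
    rows = []
    for i, base in enumerate(seq.upper(), start=1):
        counts[base] = counts.get(base, 0) + 1
        density = counts[base] / i
        rows.append(ONE_HOT[base] + NCP[base] + (density,))
    return rows
```

For the input `"AAC"`, the third row describes cytosine with a density of 1/3, because C has occurred once in the first three positions.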
Affiliation(s)
- Jianhua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Lulu Qin
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Rufeng Lei
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China

18
Zhang X, Wang S, Xie L, Zhu Y. PseU-ST: A new stacked ensemble-learning method for identifying RNA pseudouridine sites. Front Genet 2023; 14:1121694. [PMID: 36741328 PMCID: PMC9892456 DOI: 10.3389/fgene.2023.1121694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 01/09/2023] [Indexed: 01/20/2023] Open
Abstract
Background: Pseudouridine (Ψ) is one of the most abundant RNA modifications found in a variety of RNA types, and it plays a significant role in many biological processes. The key to studying the various biochemical functions and mechanisms of Ψ is to identify the Ψ sites. However, identifying Ψ sites using experimental methods is time-consuming and expensive. Therefore, it is necessary to develop computational methods that can accurately predict Ψ sites based on RNA sequence information. Methods: In this study, we proposed a new model called PseU-ST to identify Ψ sites in Homo sapiens (H. sapiens), Saccharomyces cerevisiae (S. cerevisiae), and Mus musculus (M. musculus). We selected the best six encoding schemes and four machine learning algorithms based on a comprehensive test of almost all of the RNA sequence encoding schemes available in the iLearnPlus software package, and selected the optimal features for each encoding scheme using chi-square and incremental feature selection algorithms. Then, we selected the optimal feature combination and the best base-classifier combination for each species through an extensive performance comparison and employed a stacking strategy to build the predictive model. Results: The results demonstrated that PseU-ST achieved better prediction performance compared with other existing models. The PseU-ST accuracy scores were 93.64%, 87.74%, and 89.64% on H_990, S_628, and M_944, respectively, which are 13.94%, 6.05%, and 0.26% higher, respectively, than those of the best existing methods on the same benchmark training datasets. Conclusion: The data indicate that PseU-ST is a very competitive prediction model for identifying RNA Ψ sites in H. sapiens, M. musculus, and S. cerevisiae. In addition, we found that the position-specific trinucleotide propensity based on single strand (PSTNPss) and position-specific of three nucleotides (PS3) features play an important role in Ψ site identification. The source code for PseU-ST and the data are available in our GitHub repository (https://github.com/jluzhangxinrubio/PseU-ST).
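Incremental feature selection, as named in the Methods above, is usually run on a list of features already ranked by some statistic (chi-square in this abstract): features are added one at a time in ranked order and the best-scoring prefix is kept. The sketch below illustrates only that loop. The function name, the `evaluate` callback, and the toy scoring rule are assumptions for the example and are not taken from PseU-ST or iLearnPlus.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set in ranked order and keep the best prefix.

    ranked_features: feature names sorted from most to least relevant
                     (for example by chi-square score).
    evaluate:        callable that takes a list of feature names and
                     returns a score to maximise, such as
                     cross-validated accuracy.
    """
    best_score = float("-inf")
    best_k = 0
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:  # strict comparison keeps the smallest best prefix
            best_score, best_k = score, k
    return ranked_features[:best_k], best_score


# Toy scoring rule: informative features add 1, noisy features cost 0.5.
INFORMATIVE = {"f1", "f2"}


def toy_score(features):
    return sum(1.0 if f in INFORMATIVE else -0.5 for f in features)


selected, score = incremental_feature_selection(["f1", "f2", "f3", "f4"], toy_score)
```

In practice `evaluate` would train and cross-validate a classifier on the chosen columns, so the loop costs one model fit per prefix.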

19
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most widely used techniques for profiling the RNA epitranscriptome, to provide new ideas for growth and development.
Affiliation(s)
- Jia Zou
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Hui Liu
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Wei Tan
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Yi-qi Chen
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Jing Dong
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Shu-yuan Bai
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Zhao-xia Wu
  - Community Health Service Center, Wuchang Hospital, Wuhan, China
- Yan Zeng
  - Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
  - Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
  - School of Public Health, Wuhan University of Science and Technology, Wuhan, China
  - *Correspondence: Yan Zeng

20
Durge AR, Shrimankar DD, Sawarkar AD. Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective. Curr Genomics 2022; 23:299-317. [PMID: 36778194 PMCID: PMC9878859 DOI: 10.2174/1389202923666220927105311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2022] [Revised: 08/29/2022] [Accepted: 09/01/2022] [Indexed: 11/22/2022] Open
Abstract
Genome sequences indicate a wide variety of characteristics, which include species and sub-species type, genotype, diseases, growth indicators, yield quality, etc. To analyze and study the characteristics of the genome sequences across different species, various deep learning models have been proposed by researchers, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Multilayer Perceptrons (MLPs), etc., which vary in terms of evaluation performance, area of application and species that are processed. Due to a wide differentiation between the algorithmic implementations, it becomes difficult for research programmers to select the best possible genome processing model for their application. In order to facilitate this selection, the paper reviews a wide variety of such models and compares their performance in terms of accuracy, area of application, computational complexity, processing delay, precision and recall. Thus, in the present review, various deep learning and machine learning models have been presented that possess different accuracies for different applications. For multiple genomic data, Repeated Incremental Pruning to Produce Error Reduction with Support Vector Machine (Ripper SVM) outputs 99.7% of accuracy, and for cancer genomic data, it exhibits 99.27% of accuracy using the CNN Bayesian method. Whereas for Covid genome analysis, Bidirectional Long Short-Term Memory with CNN (BiLSTM CNN) exhibits the highest accuracy of 99.95%. A similar analysis of precision and recall of different models has been reviewed. Finally, this paper concludes with some interesting observations related to the genomic processing models and recommends applications for their efficient use.
Affiliation(s)
- Aditi R. Durge
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
- Deepti D. Shrimankar
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
  - Address correspondence to this author at the Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India; Tel: 9860606477
- Ankush D. Sawarkar
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India

21
Suleman MT, Alkhalifah T, Alturise F, Khan YD. DHU-Pred: accurate prediction of dihydrouridine sites using position and composition variant features on diverse classifiers. PeerJ 2022; 10:e14104. [PMID: 36320563 PMCID: PMC9618264 DOI: 10.7717/peerj.14104] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Accepted: 09/01/2022] [Indexed: 01/21/2023] Open
Abstract
Background: Dihydrouridine (D) is a post-transcriptional modification (PTM) of transfer RNA that occurs abundantly in bacteria, eukaryotes, and archaea. The D modification contributes to the stability and conformational flexibility of tRNA. The D modification is also responsible for pulmonary carcinogenesis in humans. Objective: For the detection of D sites, mass spectrometry and site-directed mutagenesis have been developed. However, both are labor-intensive and time-consuming methods. The availability of sequence data has provided the opportunity to build computational models for enhancing the identification of D sites. Based on the sequence data, the DHU-Pred model was proposed in this study to find possible D sites. Methodology: The model was built by employing comprehensive machine learning and feature extraction approaches. It was then validated using in-demand evaluation metrics and rigorous experimentation and testing approaches. Results: DHU-Pred achieved an accuracy score of 96.9%, which was considerably higher than that of existing D site predictors. Availability and Implementation: A user-friendly web server for the proposed model was also developed and is freely available to researchers.
Affiliation(s)
- Muhammad Taseer Suleman
  - Department of Computer Science, School of Systems and Technology, University of Management & Technology, Lahore, Pakistan
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass Qassim University, Ar Rass, Qassim, Saudi Arabia
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management & Technology, Lahore, Pakistan

22
Huang D, Chen K, Song B, Wei Z, Su J, Coenen F, de Magalhães JP, Rigden DJ, Meng J. Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation. Nucleic Acids Res 2022; 50:10290-10310. [PMID: 36155798 PMCID: PMC9561283 DOI: 10.1093/nar/gkac830] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/21/2022] [Revised: 08/26/2022] [Accepted: 09/15/2022] [Indexed: 12/25/2022] Open
Abstract
As the most pervasive epigenetic mark present on mRNA and lncRNA, N6-methyladenosine (m6A) RNA methylation regulates all stages of RNA life in various biological processes and disease mechanisms. Computational methods for deciphering RNA modification have achieved great success in recent years; nevertheless, their potential remains underexploited. One reason for this is that existing models usually consider only the sequence of transcripts, ignoring the various regions (or geography) of transcripts such as 3'UTR and intron, where the epigenetic mark forms and functions. Here, we developed three simple yet powerful encoding schemes for transcripts to capture the submolecular geographic information of RNA, which is largely independent from sequences. We show that m6A prediction models based on geographic information alone can achieve comparable performances to classic sequence-based methods. Importantly, geographic information substantially enhances the accuracy of sequence-based models, enables isoform- and tissue-specific prediction of m6A sites, and improves m6A signal detection from direct RNA sequencing data. The geographic encoding schemes we developed have exhibited strong interpretability, and are applicable to not only m6A but also N1-methyladenosine (m1A), and can serve as a general and effective complement to the widely used sequence encoding schemes in deep learning applications concerning RNA transcripts.
Affiliation(s)
- Daiyun Huang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
  - Department of Computer Sciences, University of Liverpool, Liverpool L69 7ZB, UK
- Kunqi Chen
  - Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, PR China
- Bowen Song
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
  - Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L69 7ZB, UK
- Jionglong Su
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
  - School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Frans Coenen
  - Department of Computer Sciences, University of Liverpool, Liverpool L69 7ZB, UK
- João Pedro de Magalhães
  - Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L69 7ZB, UK
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou 215123, PR China

23
Ma S, Zhu J, Wang M, Zhu J, Wang W, Xiong Y, Jiang R, Liu L, Jiang T. Comprehensive analysis of m7G modification patterns based on potential m7G regulators and tumor microenvironment infiltration characterization in lung adenocarcinoma. Front Genet 2022; 13:996950. [PMID: 36246663 PMCID: PMC9559715 DOI: 10.3389/fgene.2022.996950] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/18/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open
Abstract
Background: The non-negligible role of epigenetic modifications in cancer development and the tumor microenvironment (TME) has been demonstrated in recent studies. Nonetheless, the potential regulatory role of N7-methylguanosine (m7G) modification in shaping and impacting the TME remains unclear. Methods: A comprehensive analysis was performed to explore the m7G modification patterns based on 24 potential m7G regulators in 817 lung adenocarcinoma (LUAD) patients, and the TME landscape in distinct m7G modification patterns was evaluated. The m7Gscore was established based on principal component analysis (PCA) to quantify m7G modification patterns and evaluate the TME cell infiltrating characteristics of individual tumors. Further, correlation analyses of the m7Gscore with response to chemotherapy and immunotherapy were performed. Results: We identified three distinct m7G modification patterns whose biological pathway enrichment and TME cell infiltrating characteristics corresponded to the immune-desert, immune-inflamed and immune-excluded phenotypes, respectively. We further demonstrated that the m7Gscore could predict the TME infiltrating characteristics, tumor mutation burden (TMB), response to immunotherapy and chemotherapy, as well as prognosis of individual tumors. A high m7Gscore was associated with increased immune cell infiltration, low TMB and a survival advantage, while a low m7Gscore was linked to decreased immune cell infiltration and increased TMB. Additionally, patients with a lower m7Gscore demonstrated significant therapeutic advantages. Conclusion: This study demonstrated the regulatory mechanisms of m7G modification on TME formation and regulation in lung adenocarcinoma. Identification of individual tumor m7G modification patterns will contribute to the understanding of TME characterization and to guiding more effective immunotherapy strategies.
Affiliation(s)
- Shouzheng Ma
- Department of Thoracic Surgery, Tangdu Hospital, Fourth Military Medical University, Xi’an, China
| | - Jun Zhu
- Department of General Surgery, The Southern Theater Air Force Hospital, Guangzhou, China
| | - Mengmeng Wang
- Department of Drug and Equipment, Lintong Rehabilitation and Convalescent Centre, Xi’an, China
| | - Jianfei Zhu
- Department of Thoracic Surgery, Tangdu Hospital, Fourth Military Medical University, Xi’an, China
| | - Wenchen Wang
- Department of Thoracic Surgery, Tangdu Hospital, Fourth Military Medical University, Xi’an, China
| | - Yanlu Xiong
- Department of Thoracic Surgery, Tangdu Hospital, Fourth Military Medical University, Xi’an, China
| | - Runmin Jiang
- Department of Thoracic Surgery, Tangdu Hospital, Fourth Military Medical University, Xi’an, China
| | - Lei Liu
- Department of Gastroenterology, Tangdu Hospital, Fourth Military Medical University, Xi’an, China
- Department of Gastroenterology, Daping Hospital, Army Medical University, Chongqing, China
- *Correspondence: Lei Liu; Tao Jiang
| | - Tao Jiang
- Department of Thoracic Surgery, Tangdu Hospital, Fourth Military Medical University, Xi’an, China
- *Correspondence: Lei Liu; Tao Jiang
| |
|
24
|
Luo Q, Zhan X, Kuang Y, Sun M, Dong F, Sun E, Chen B. WTAP promotes oesophageal squamous cell carcinoma development by decreasing CPSF4 expression in an m6A-dependent manner. Med Oncol 2022; 39:231. [PMID: 36175708 DOI: 10.1007/s12032-022-01830-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/20/2022] [Accepted: 08/17/2022] [Indexed: 10/14/2022]
Abstract
m6A is a widespread RNA modification. However, the mechanism through which m6A regulates the progression of oesophageal squamous cell carcinoma (ESCC) remains undetermined. The expression level and prognostic value of WTAP were analysed using an ESCC tissue microarray (87 ESCC and 44 paracancerous tissues). The TCGA and Oncolnc databases were used to validate WTAP expression and prognosis. CCK8, colony formation (CF), wound healing, transwell cell invasion (CI), and migration (CM) assays were employed to detect the biological impacts of WTAP. Expression of tumour stemness-related genes was assessed via qRT-PCR and western blotting. The m6A RNA methylation (m6AMe) quantitative kit was employed to detect cellular methylation levels. Arraystar m6A-mRNA and lncRNA epitranscriptomic microarray analyses were used to screen for a candidate gene with low methylation, high expression, and prognostic relevance, which identified CPSF4. KEGG enrichment analysis was used to screen the downstream signalling pathways of CPSF4. WTAP, a methyltransferase "writer", was markedly upregulated in ESCC and was strongly correlated with poor patient outcome. WTAP knockdown inhibited the cell proliferation (CP), CI, CM, and stemness of ESCC cells in vitro and reduced the overall m6A modification (m6AMo) percentage of ESCC cells. CPSF4 is a target of WTAP-based m6AMo. WTAP-based m6AMo of the CPSF4 transcript reduced the stability of CPSF4 in a YTHDF2-dependent manner. We identified a significant role of WTAP-catalysed m6AMo in ESCC tumourigenesis, wherein it facilitates ESCC tumour growth and metastasis by decreasing CPSF4 expression in an m6A-dependent manner.
Affiliation(s)
- Qian Luo
- Department of Pathology, Wannan Medical College, Wuhu, Anhui, China
| | - Xuebing Zhan
- Department of Pathology, The First People's Hospital of Huizhou City, Huizhou, Guangdong, China
| | - Yunshu Kuang
- Department of Pathology, Wannan Medical College, Wuhu, Anhui, China
| | - Mingzhong Sun
- Graduate School, Wannan Medical College, Wuhu, Anhui, China
| | - Fangyuan Dong
- Department of Pathology, Maanshan People's Hospital, Maanshan, Anhui, China
| | - Entao Sun
- Department of Health Inspection and Quarantine, Wannan Medical College, Wuhu, Anhui, China.
| | - Bing Chen
- Department of Pathology, Wannan Medical College, Wuhu, Anhui, China.
| |
|
25
|
Liu Y, Zeng S, Wu M. Novel insights into noncanonical open reading frames in cancer. Biochim Biophys Acta Rev Cancer 2022; 1877:188755. [PMID: 35777601 DOI: 10.1016/j.bbcan.2022.188755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2022] [Revised: 06/11/2022] [Accepted: 06/23/2022] [Indexed: 12/12/2022]
Abstract
With technological advances, previously neglected noncanonical open reading frames (nORFs) are drawing ever-increasing attention. However, the translation potential of numerous putative nORFs remains elusive, and the functions of noncanonical peptides have not been systematically summarized. Moreover, the relationship between noncanonical peptides and their counterpart protein or RNA products is poorly understood, and the clinical implementation of noncanonical peptides has not been explored. In this review, we highlight how recent technological advances such as ribosome profiling, bioinformatics approaches, and CRISPR/Cas9 facilitate research on noncanonical peptides. We delineate the features of each nORF category and the evolutionary processes underlying nORFs. Most importantly, we summarize the diverse functions of noncanonical peptides in cancer based on their subcellular location, which reflects their extensive participation in key pathways and essential cellular activities in cancer cells. Meanwhile, the equilibrium between noncanonical peptides and their corresponding transcripts or counterpart products, which is essential for their roles in cancer, may be dysregulated under pathological states. Lastly, we explore their underestimated potential in clinical application as diagnostic biomarkers and treatment targets against cancer.
Affiliation(s)
- Yihan Liu
- Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410013, Hunan, China; The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health, The Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan 410008, China; Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; Key Laboratory for Molecular Radiation Oncology of Hunan Province, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Shan Zeng
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; Key Laboratory for Molecular Radiation Oncology of Hunan Province, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China.
| | - Minghua Wu
- Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410013, Hunan, China; The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health, The Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan 410008, China.
| |
|
26
|
EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction. BMC Bioinformatics 2022; 23:221. [PMID: 35676633 PMCID: PMC9178860 DOI: 10.1186/s12859-022-04756-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 01/22/2022] [Accepted: 05/27/2022] [Indexed: 01/27/2023]
Abstract
BACKGROUND Recent research suggests that epitranscriptome regulation through post-transcriptional RNA modifications is essential for all kinds of RNA. Accurate identification of RNA modifications is vital for understanding their functions and regulatory mechanisms. However, traditional experimental methods of identifying RNA modification sites are relatively complicated, time-consuming, and laborious. Machine learning approaches have been applied to RNA sequence feature extraction and classification, and they can supplement experimental approaches efficiently. Recently, convolutional neural networks (CNN) and long short-term memory (LSTM) networks have performed well in modification site prediction because of their strength in representation learning. However, a CNN can learn local responses from spatial data but cannot learn sequential correlations. LSTM, in contrast, is specialized for sequential modeling and can access contextual representations, but it lacks the spatial feature extraction of a CNN. These complementary strengths motivate a prediction framework that combines natural language processing (NLP) and deep learning (DL). RESULTS This study presents an ensemble multiscale deep learning predictor (EMDLP) to identify RNA methylation sites using NLP and DL. It combines dilated convolution and bidirectional LSTM (BiLSTM), which helps to take better advantage of local and global information for site prediction. The first step of EMDLP is to represent the RNA sequences in an NLP manner. Three encodings are adopted to characterize sites from the viewpoints of local and global information: RNA word embedding, one-hot encoding, and RGloVe, an improved word vector representation learning method based on GloVe. Then, a dilated convolutional bidirectional LSTM network (DCB) model is constructed, in which a dilated convolutional neural network (DCNN) is followed by a BiLSTM, to extract features that may contribute to methylation site prediction. Finally, the three encoding methods are integrated by a soft vote to obtain better predictive performance. Experimental results on m1A and m6A show that EMDLP reaches an area under the receiver operating characteristic curve (AUROC) of 95.56% and 85.24%, respectively, outperforming state-of-the-art models. For user convenience, a web server for EMDLP has been made publicly available at http://www.labiip.net/EMDLP/index.php ( http://47.104.130.81/EMDLP/index.php ). CONCLUSIONS We developed a predictor for m1A and m6A methylation sites.
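The soft-vote step described in this abstract can be illustrated with a minimal Python sketch. It is not the EMDLP implementation: the function names, the A/C/G/U ordering, the 0.5 threshold, and the probabilities are all hypothetical, and the three DCB models are replaced by hard-coded scores. It only shows what one-hot encoding and probability averaging look like.

```python
from typing import Dict, List

# One-hot alphabet for RNA; this ordering is an arbitrary assumption.
ALPHABET = "ACGU"


def one_hot(seq: str) -> List[List[int]]:
    """Encode an RNA sequence as 4-dimensional indicator vectors.

    Bases outside A/C/G/U (for example N) become all-zero vectors.
    """
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq.upper()]


def soft_vote(probabilities: Dict[str, List[float]]) -> List[float]:
    """Average the positive-class probabilities of several models, site by site."""
    models = list(probabilities.values())
    n_sites = len(models[0])
    if any(len(p) != n_sites for p in models):
        raise ValueError("all models must score the same sites")
    return [sum(p[i] for p in models) / len(models) for i in range(n_sites)]


# Hypothetical per-site probabilities, one list per encoding-specific model.
scores = {
    "word_embedding": [0.9, 0.2, 0.6],
    "one_hot": [0.7, 0.4, 0.5],
    "rglove": [0.8, 0.3, 0.1],
}
averaged = soft_vote(scores)
# The 0.5 decision threshold is an illustrative assumption.
labels = [int(p >= 0.5) for p in averaged]
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh two uncertain ones, which is the usual reason for preferring a soft vote.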
|
27
|
Bilal A, Alarfaj FK, Khan RA, Suleman MT, Long H. m5c-iEnsem: 5-methylcytosine sites identification through ensemble models. Bioinformatics 2025; 41:btae722. [PMID: 39657957 PMCID: PMC11911556 DOI: 10.1093/bioinformatics/btae722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Revised: 10/28/2024] [Accepted: 12/06/2024] [Indexed: 12/12/2024]
Abstract
MOTIVATION 5-Methylcytosine (m5c), a modified cytosine base, arises from the addition of a methyl group at the 5th carbon position. It is a prevalent post-transcriptional modification (PTM) found in various types of RNA. Traditional laboratory techniques often fail to provide rapid and accurate identification of m5c sites. With the growing accessibility of sequence data, however, computational models offer a more efficient and reliable approach to m5c site detection. This research focused on creating advanced in-silico methods using ensemble learning techniques. The encoded data was processed through ensemble models, including bagging and boosting techniques. These models were then rigorously evaluated through independent testing and 10-fold cross-validation. RESULTS Among the models tested, the bagging ensemble-based predictor, m5c-iEnsem, outperformed existing m5c prediction tools. AVAILABILITY AND IMPLEMENTATION To further support the research community, m5c-iEnsem has been made available via a user-friendly web server at https://m5c-iensem.streamlit.app/.
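For readers unfamiliar with the 10-fold cross-validation mentioned in this abstract, the following Python sketch shows how the folds can be built. It is a generic illustration, not code from m5c-iEnsem: the function name, the seed, and the sample count of 25 are hypothetical, and no classifier is trained here.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Striding by k keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 sequences split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

In each round a model would be fitted on `train` and scored on `test`, and the ten scores averaged; an independent test set, as used in the abstract, is kept out of this loop entirely.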
Affiliation(s)
- Anas Bilal
- College of Information Science and Technology, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou 571158, China
| | - Fawaz Khaled Alarfaj
- Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa 31982, Saudi Arabia
| | - Rafaqat Alam Khan
- Department of Software Engineering, Lahore Garrison University, Lahore 54000, Pakistan
| | | | - Haixia Long
- College of Information Science and Technology, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou 571158, China
| |
|