1
Chen Q, Wang S, Zhang M, Xiang Y, Chen Q, Li Z, Song Y, Bai L, Zhu Y. Aberrant downregulation of Y-box binding protein 1 expression impairs the cell cycle in an m5C-dependent manner in human granulosa cells from patients with primary ovarian insufficiency. Cell Mol Life Sci 2025; 82:206. [PMID: 40397139 PMCID: PMC12095738 DOI: 10.1007/s00018-025-05709-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 04/01/2025] [Accepted: 04/09/2025] [Indexed: 05/22/2025]
Abstract
Y-box binding protein 1 (YBX1) has been reported to play a role in human granulosa cell (GC) dysfunction by binding with long noncoding RNAs in patients with primary ovarian insufficiency (POI). 5-Methylcytosine (m5C) methylation is an abundant RNA epigenetic modification that is widely present in eukaryotic RNAs. However, whether YBX1, an important m5C reader, participates in POI in an m5C-dependent manner remains unknown. Here, we demonstrated that the expression levels of YBX1 were decreased in GCs from patients with biochemical POI. YBX1 knockdown in a human granulosa cell line (KGN) impaired cell proliferation by preventing the G1 to S transition in the cell cycle. Conversely, YBX1 overexpression promoted KGN cell proliferation. Integrated analysis of the transcriptome and m5C methylome profiles revealed that in human GCs, knockdown of YBX1 expression destabilized cell cycle-associated transcripts in an m5C-dependent manner, resulting in cell cycle arrest. Our results provide new insights into the pathogenesis of POI, revealing an alternative molecular mechanism in which YBX1 participates in human GC dysfunction by affecting the stability of cell cycle-associated transcripts in an m5C-dependent manner and thereby modulating GC proliferation.
Affiliation(s)
- Qichao Chen
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China
- Sisi Wang
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China
- Min Zhang
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China
- Yu Xiang
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China
- Qingqing Chen
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China
- Zhekun Li
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China
- Yang Song
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China
- Long Bai
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China.
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China.
- Zhejiang Key Laboratory of Maternal and Infant Health, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China.
- Yimin Zhu
- Key Laboratory of Reproductive Genetics (Ministry of Education), Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China.
- Department of Reproductive Endocrinology, School of Medicine, Women's Hospital, Zhejiang University, Hangzhou, 310002, Zhejiang, China.
- Zhejiang Key Laboratory of Maternal and Infant Health, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310002, Zhejiang, China.
- Institute of Medical Genetics and Development, Zhejiang University, Hangzhou, 310002, Zhejiang, China.
2
Jiang D, Ao C, Li Y, Yu L. Feadm5C: Enhancing prediction of RNA 5-Methylcytosine modification sites with physicochemical molecular graph features. Genomics 2025; 117:111037. [PMID: 40127825 DOI: 10.1016/j.ygeno.2025.111037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 11/04/2024] [Accepted: 03/20/2025] [Indexed: 03/26/2025]
Abstract
RNA 5-methylcytosine (m5C) is a common post-transcriptional modification that is essential to biological activities. The rapid development of high-throughput sequencing technology has produced a large amount of RNA data containing m5C modification sites. Although many machine learning-based techniques are available for identifying m5C modification sites, their accuracy still needs to be improved. This study proposed a novel method, Feadm5C, which predicts m5C sites by fusing molecular graph features with sequencing information. 10-fold cross-validation was used to assess the model's predictive performance. In addition, we used t-SNE visualization to assess the model's stability and effectiveness. While keeping feature encoding and model structure straightforward, the approach proposed in this work outperforms the most recent approaches in use. The dataset and code of the model can be downloaded from GitHub (https://github.com/LiangYu-Xidian/Feadm5C).
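The 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is a generic illustration using only the Python standard library, not the Feadm5C code; the function name `k_fold_indices`, the seed, and the sample count are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal samples round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Illustrative usage: 25 hypothetical samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so every site is scored once by a model that did not see it during training.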
Affiliation(s)
- Dongdong Jiang
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Yan Li
- School of Management, Xi'an Polytechnic University, Xi'an 710000, Shaanxi, China
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China.
3
Han B, Bai S, Liu Y, Wu J, Feng X, Xin R. Definer: A computational method for accurate identification of RNA pseudouridine sites based on deep learning. PLoS One 2025; 20:e0320077. [PMID: 40273178 PMCID: PMC12021131 DOI: 10.1371/journal.pone.0320077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Accepted: 02/12/2025] [Indexed: 04/26/2025] Open
Abstract
Pseudouridine is an important RNA modification that is widely present in a variety of non-coding RNAs and is involved in many important biological processes. Studies have shown that pseudouridine plays a role in gene expression, RNA structural stability, and various diseases. Therefore, accurate identification of pseudouridine sites can help explain the functional mechanism of this modification. Owing to the rapid growth of genomics data, traditional experimental methods for identifying RNA modification sites can no longer meet practical needs, and computational methods are needed to accurately identify pseudouridine sites from high-throughput RNA sequence data. In this study, we propose a deep learning-based computational method, Definer, to accurately identify RNA pseudouridine sites in three species: Homo sapiens, Saccharomyces cerevisiae and Mus musculus. The method combines two sequence encoding schemes, nucleotide chemical property (NCP) and one-hot, and feeds the extracted RNA sequence features into a deep learning model built from a CNN, a GRU and an attention mechanism. The benchmark dataset contains data from the three species, and the results of 10-fold cross-validation show that Definer significantly outperforms other existing methods. In addition, the H. sapiens and S. cerevisiae datasets were tested independently to further demonstrate the predictive ability of the model. In summary, our method, Definer, can accurately identify pseudouridine modification sites in RNA.
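The two sequence encodings named in this abstract can be illustrated with a short sketch. It assumes the NCP convention commonly used in the RNA-modification literature (ring structure, functional group, hydrogen-bond strength) and an A, C, G, U column order for one-hot; the abstract does not state Definer's exact conventions, and the function name `encode_rna` is hypothetical.

```python
# One-hot columns in A, C, G, U order (assumed ordering).
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}

# Nucleotide chemical property (NCP), commonly used convention:
# (purine ring, amino group, weak hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}

def encode_rna(seq):
    """Return one 7-dimensional row per nucleotide: one-hot (4) + NCP (3)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    rows = []
    for base in seq:
        # Unknown symbols such as N become all-zero rows.
        one_hot = ONE_HOT.get(base, (0, 0, 0, 0))
        ncp = NCP.get(base, (0, 0, 0))
        rows.append(list(one_hot + ncp))
    return rows

features = encode_rna("AUGC")
```

The resulting length-by-7 matrix is the kind of input a convolutional or recurrent layer would consume.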
Affiliation(s)
- Bo Han
- Jilin Chemical Hospital, Jilin, P.R. China
- Sudan Bai
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Yang Liu
- Jilin Chemical Hospital, Jilin, P.R. China
- Jiezhang Wu
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Xin Feng
- School of Science, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Ruihao Xin
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
4
Dao F, Xie X, Zhang H, Guan Z, Wu C, Su W, Wei Y, Hong F, Luo X, Xie S, Lai H, Gao D, Yang Y, Zhang Y, Ning L, Li S, Hao Y, Lebeau B, Ling CCY, Huang J, Fullwood MJ, Lin H, Lv H. PlantEMS: A comprehensive database of epigenetic modification sites across multiple plant species. PLANT COMMUNICATIONS 2025; 6:101228. [PMID: 39709521 PMCID: PMC12010403 DOI: 10.1016/j.xplc.2024.101228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 09/29/2024] [Accepted: 12/18/2024] [Indexed: 12/23/2024]
Abstract
In summary, PlantEMS is designed to advance plant epigenetics research by providing a comprehensive repository of multi-omics and multi-modification data. This resource enables detailed investigations into the epigenetic regulatory mechanisms underlying essential plant traits and responses, potentially informing innovative strategies for crop management, monitoring, and development.
Affiliation(s)
- Fuying Dao
- School of Biological Sciences, Nanyang Technological University, Singapore 639798, Singapore
- Xueqin Xie
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Hongqi Zhang
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Zhengxing Guan
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Changchun Wu
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Wei Su
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Yijie Wei
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Feitong Hong
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Xinwei Luo
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Sijia Xie
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Hongyan Lai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Dong Gao
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Yuhe Yang
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Lin Ning
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Shihao Li
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Yuduo Hao
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Benjamin Lebeau
- School of Biological Sciences, Nanyang Technological University, Singapore 639798, Singapore
- Crystal Chia Yin Ling
- School of Biological Sciences, Nanyang Technological University, Singapore 639798, Singapore
- Jian Huang
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China
- Melissa Jane Fullwood
- School of Biological Sciences, Nanyang Technological University, Singapore 639798, Singapore.
- Hao Lin
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China.
- Hao Lv
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P.R. China.
5
Fu H, Ding Z, Wang W. Trans-m5C: A transformer-based model for predicting 5-methylcytosine (m5C) sites. Methods 2025; 234:178-186. [PMID: 39742984 DOI: 10.1016/j.ymeth.2024.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 10/31/2024] [Accepted: 12/11/2024] [Indexed: 01/04/2025] Open
Abstract
5-Methylcytosine (m5C) plays a pivotal role in various RNA metabolic processes, including RNA localization, stability, and translation. Current high-throughput sequencing technologies for m5C site identification are resource-intensive in terms of cost, labor, and time. As such, there is a pressing need for efficient computational approaches. Many existing computational methods rely on intricate hand-crafted features, some of which are not readily available, often leading to suboptimal prediction accuracy. To address these challenges, we introduce a novel deep-learning method, Trans-m5C. We first categorize m5C sites into NSUN2-dependent and NSUN6-dependent types for independent feature extraction. Subsequently, transformer neural networks are employed to distill global features. The prediction of m5C sites is then accomplished using a discriminator built from a multi-layer perceptron. A rigorous evaluation of the performance of Trans-m5C on experimentally validated m5C data from human and mouse reveals that our method offers a competitive edge over both baseline and existing methodologies.
Affiliation(s)
- Haitao Fu
- School of Artificial Intelligence, Hubei University, Wuhan, 430062, China
- Zewen Ding
- University of Edinburgh, Centre for Discovery Brain Sciences, Edinburgh, EH89XD, United Kingdom
- Wen Wang
- University of Edinburgh, Queen's Medical Research Institute, Edinburgh, EH164TJ, United Kingdom.
6
Abbas Z, Rehman MU, Tayara H, Lee SW, Chong KT. m5C-Seq: Machine learning-enhanced profiling of RNA 5-methylcytosine modifications. Comput Biol Med 2024; 182:109087. [PMID: 39232403 DOI: 10.1016/j.compbiomed.2024.109087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 08/13/2024] [Accepted: 08/29/2024] [Indexed: 09/06/2024]
Abstract
Epigenetic modifications, particularly RNA methylation and histone alterations, play a crucial role in heredity, development, and disease. Among these, RNA 5-methylcytosine (m5C) is the most prevalent RNA modification in mammalian cells, essential for processes such as ribosome synthesis, translational fidelity, mRNA nuclear export, turnover, and translation. The increasing volume of nucleotide sequences has led to the development of machine learning-based predictors for m5C site prediction. However, these predictors often face challenges related to training data limitations and overfitting due to insufficient external validation. This study introduces m5C-Seq, an ensemble learning approach for RNA modification profiling, designed to address these issues. m5C-Seq employs a meta-classifier that integrates 15 probabilities generated from a novel, large dataset using systematic encoding methods to make final predictions. Demonstrating superior performance compared to existing predictors, m5C-Seq represents a significant advancement in accurate RNA modification profiling. The code and the newly established datasets are made available through GitHub at https://github.com/Z-Abbas/m5C-Seq.
Affiliation(s)
- Zeeshan Abbas
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, South Korea
- Mobeen Ur Rehman
- Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, United Arab Emirates
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Seung Won Lee
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, South Korea.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea.
7
Malebary SJ, Alromema N, Suleman MT, Saleem M. m5c-iDeep: 5-Methylcytosine sites identification through deep learning. Methods 2024; 230:80-90. [PMID: 39089345 DOI: 10.1016/j.ymeth.2024.07.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Revised: 07/16/2024] [Accepted: 07/23/2024] [Indexed: 08/03/2024] Open
Abstract
5-Methylcytosine (m5c) is a modified cytosine base formed by the addition of a methyl group at carbon 5 of the cytosine ring. It is one of the most common post-transcriptional modifications (PTMs) and occurs in almost all types of RNA. Conventional laboratory methods do not provide quick, reliable identification of m5c sites. However, the availability of sequence data has made it feasible to develop computationally intelligent models that optimize the identification process for accuracy and robustness. The present research focused on the development of in-silico methods built using deep learning models. The encoded data was fed into deep learning models, which included gated recurrent unit (GRU), long short-term memory (LSTM), and bi-directional LSTM (Bi-LSTM). The models were then subjected to a rigorous evaluation process that included both independent set testing and 10-fold cross-validation. The results revealed that the LSTM-based model, m5c-iDeep, outperformed existing m5c predictors, achieving 99.9% accuracy. To support researchers, m5c-iDeep was also deployed on a web-based server, which is accessible at https://taseersuleman-m5c-ideep-m5c-ideep.streamlit.app/.
Affiliation(s)
- Sharaf J Malebary
- Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia
- Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia.
- Muhammad Taseer Suleman
- Department of Criminology and Forensic Sciences, Lahore Garrison University, Lahore, Pakistan; Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan
- Maham Saleem
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan
8
Bortoletto E, Rosani U. Bioinformatics for Inosine: Tools and Approaches to Trace This Elusive RNA Modification. Genes (Basel) 2024; 15:996. [PMID: 39202357 PMCID: PMC11353476 DOI: 10.3390/genes15080996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 07/23/2024] [Accepted: 07/25/2024] [Indexed: 09/03/2024] Open
Abstract
Inosine is a nucleotide resulting from the deamination of adenosine in RNA. This chemical modification process, known as RNA editing, is typically mediated by a family of double-stranded RNA binding proteins named Adenosine Deaminase Acting on dsRNA (ADAR). While the presence of ADAR orthologs has been traced throughout the evolution of metazoans, the existence and extension of RNA editing have been characterized in a more limited number of animals so far. Undoubtedly, ADAR-mediated RNA editing plays a vital role in physiology, organismal development and disease, making the understanding of the evolutionary conservation of this phenomenon pivotal to a deep characterization of relevant biological processes. However, the lack of direct high-throughput methods to reveal RNA modifications at single nucleotide resolution limited an extended investigation of RNA editing. Nowadays, these methods have been developed, and appropriate bioinformatic pipelines are required to fully exploit this data, which can complement existing approaches to detect ADAR editing. Here, we review the current literature on the "bioinformatics for inosine" subject and we discuss future research avenues in the field.
Affiliation(s)
- Umberto Rosani
- Department of Biology, University of Padova, 35131 Padova, Italy
9
Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024; 300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024] Open
Abstract
RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequences into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies hinder the effectiveness of machine learning models in addressing the performance challenges associated with predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred, which uses a support vector machine classifier to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model compared to state-of-the-art models for RNA modification site identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can be applied to feature extraction from DNA sequences to predict other biological modification sites.
Affiliation(s)
- Mingzhao Wang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Haider Ali
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yandi Xu
- School of Computer Science, Shaanxi Normal University, Xi'an, China; College of Life Sciences, Shaanxi Normal University, Xi'an, China
- Juanying Xie
- School of Computer Science, Shaanxi Normal University, Xi'an, China.
- Shengquan Xu
- College of Life Sciences, Shaanxi Normal University, Xi'an, China.
10
Aslam I, Shah S, Jabeen S, ELAffendi M, A Abdel Latif A, Ul Haq N, Ali G. A CNN based m5c RNA methylation predictor. Sci Rep 2023; 13:21885. [PMID: 38081880 PMCID: PMC10713599 DOI: 10.1038/s41598-023-48751-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023] Open
Abstract
Post-transcriptional modifications of RNA play a key role in a variety of biological processes, such as stability and immune tolerance, RNA splicing, protein translation and RNA degradation. One such modification, m5c, participates in cellular functions such as RNA structural stability and translation efficiency and has attracted growing interest among biologists. Detecting RNA m5c methylation sites through biological experiments requires considerable effort, time and money. Most researchers use pre-processed RNA sequences of 41 nucleotides in which the methylated cytosine is in the center. Therefore, some of the information around these motifs may be lost. Conventional methods are unable to process the RNA sequence directly because of its high dimensionality and thus need optimized techniques for better feature extraction. To handle these challenges, the goal of this study is to employ an end-to-end, 1D CNN-based model to classify and interpret m5c methylation sites. Moreover, our aim is to analyze the sequence in its full length, where the methylated cytosine may not be in the center. The evaluation of the proposed architecture showed promising results, outperforming state-of-the-art techniques in terms of sensitivity and accuracy. Our model achieves 96.70% sensitivity and 96.21% accuracy for 41-nucleotide sequences and 96.10% accuracy for full-length sequences.
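The 41-nucleotide, cytosine-centered input described in this abstract can be sketched as a windowing step: 20 flanking bases on each side of every candidate cytosine. This is an illustration only; the function name `cytosine_windows` and the use of `N` as a padding symbol near sequence ends are assumptions, not details from the paper.

```python
def cytosine_windows(seq, flank=20, pad="N"):
    """Return (position, window) pairs, one per cytosine in seq.

    Each window has length 2 * flank + 1 with the cytosine at its center.
    Positions near the ends are padded with the pad symbol (an assumption).
    """
    seq = seq.upper().replace("T", "U")
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, base in enumerate(seq):
        if base == "C":
            # seq[i] sits at padded[i + flank], so this slice is centered on it.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows

# Small illustrative example with a 2-base flank instead of 20.
small = cytosine_windows("AACGU", flank=2)
full = cytosine_windows("GGACUAGCUA")
```

With the default `flank=20`, each window is 41 nucleotides long, matching the fixed-length input format the abstract contrasts with full-length sequences.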
Affiliation(s)
- Irum Aslam
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Sajid Shah
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Saima Jabeen
- College of Engineering, AI Research Center, Alfaisal University, Riyadh, 50927, Saudi Arabia.
- Mohammed ELAffendi
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Asmaa A Abdel Latif
- Public Health and Community Medicine Department (Industrial medicine and occupational health specialty), Faculty of Medicine, Menoufia University, Shibîn el Kôm, Egypt
- Nuhman Ul Haq
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Gauhar Ali
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
11
Jia J, Cao X, Wei Z. DLC-ac4C: A Prediction Model for N4-acetylcytidine Sites in Human mRNA Based on DenseNet and Bidirectional LSTM Methods. Curr Genomics 2023; 24:171-186. [PMID: 38178985 PMCID: PMC10761336 DOI: 10.2174/0113892029270191231013111911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 09/13/2023] [Accepted: 09/21/2023] [Indexed: 01/06/2024] Open
Abstract
Introduction: N4-acetylcytidine (ac4C) is a highly conserved nucleoside modification that is essential for the regulation of immune functions in organisms. Currently, ac4C is identified primarily by biological experiments, which can be time-consuming and labor-intensive. Computational methods offer a more efficient way to classify and predict ac4C sites. Aim: Several computational methods for ac4C site prediction exist, but to the best of our knowledge their performance is poor, and the network structures they use are relatively simple and suffer from network degradation. This study aims to address these limitations by proposing a predictive model based on integrated deep learning to better identify ac4C sites. Methods: In this study, we propose a new integrated deep learning prediction framework, DLC-ac4C. First, we encode RNA sequences with three feature encoding schemes: C2 encoding, nucleotide chemical property (NCP) encoding, and nucleotide density (ND) encoding. Second, one-dimensional convolutional layers and densely connected convolutional networks (DenseNet) are used to learn local features, and bidirectional long short-term memory networks (Bi-LSTM) are used to learn global features. Third, a channel attention mechanism is introduced to determine the importance of sequence characteristics. Finally, a homomorphic integration strategy is used to limit the generalization error of the model, which further improves its performance. Results: On the independent test data, DLC-ac4C achieved a sensitivity (Sn) of 86.23%, specificity (Sp) of 79.71%, accuracy (Acc) of 82.97%, Matthews correlation coefficient (MCC) of 66.08%, and area under the curve (AUC) of 90.42%, significantly better than the existing methods.
Conclusion: Our model not only combines DenseNet and Bi-LSTM but also uses the channel attention mechanism to better capture hidden information features from a sequence perspective, and it can identify ac4C sites more effectively.
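The NCP and ND encodings named in the Methods can be illustrated with a minimal Python sketch. The NCP mapping below is the one commonly used in the RNA-modification literature (ring structure, functional group, hydrogen bonding); whether DLC-ac4C uses exactly this mapping is an assumption, and the function names are hypothetical.

```python
# NCP codes as commonly defined: (ring structure, functional group, hydrogen bonding).
# Assumption: the paper's exact mapping may differ from this common convention.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}

def ncp_encode(seq):
    """Return one 3-tuple per nucleotide; unknown symbols map to (0, 0, 0)."""
    return [NCP.get(base, (0, 0, 0)) for base in seq.upper()]

def nd_encode(seq):
    """Nucleotide density: frequency of the base at position i within the prefix seq[:i]."""
    counts = {}
    densities = []
    for i, base in enumerate(seq.upper(), start=1):
        counts[base] = counts.get(base, 0) + 1
        densities.append(counts[base] / i)
    return densities

def encode(seq):
    """Concatenate NCP and ND into a 4-dimensional vector per position."""
    return [list(p) + [d] for p, d in zip(ncp_encode(seq), nd_encode(seq))]
```

For example, `nd_encode("ACAU")` gives 1/1 for the first A, 1/2 for C, 2/3 for the second A, and 1/4 for U. The resulting per-position vectors are the kind of input a convolutional layer would consume.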
Affiliation(s)
- Jianhua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Xiaojing Cao
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Zhangying Wei
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
12
Zhang Y, Ge F, Li F, Yang X, Song J, Yu DJ. Prediction of Multiple Types of RNA Modifications via Biological Language Model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3205-3214. [PMID: 37289599 DOI: 10.1109/tcbb.2023.3283985] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/10/2023]
Abstract
RNA modifications play essential roles in multiple biological processes. Accurate identification of RNA modifications in the transcriptome is critical for providing insights into their biological functions and mechanisms. Many tools have been developed for predicting RNA modifications at single-base resolution. These tools employ conventional feature engineering, in which feature design and feature selection require extensive biological expertise and may introduce redundant information. With the rapid development of artificial intelligence technologies, end-to-end methods have been favorably received by researchers. Nevertheless, in nearly all of these approaches, each trained model is suitable for only one specific type of RNA methylation modification. In this study, we present MRM-BERT, which feeds task-specific sequences into the BERT (Bidirectional Encoder Representations from Transformers) model and fine-tunes it, and which performs competitively with state-of-the-art methods. MRM-BERT avoids repeated de novo training of the model and can predict multiple RNA modifications such as pseudouridine, m6A, m5C, and m1A in Mus musculus, Arabidopsis thaliana, and Saccharomyces cerevisiae. In addition, we analyse the attention heads to identify high-attention regions for the prediction, and conduct saturated in silico mutagenesis of the input sequences to discover potential changes in RNA modifications, which can better assist researchers in their follow-up research.
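Saturated in silico mutagenesis, mentioned at the end of the abstract, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `toy_score` is a hypothetical stand-in for a fine-tuned model, and the `GAC` motif is chosen only so the example produces non-trivial output.

```python
ALPHABET = "ACGU"

def saturation_mutants(seq):
    """Yield (position, ref_base, alt_base, mutant_seq) for every single substitution."""
    for pos, ref in enumerate(seq):
        for alt in ALPHABET:
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]

def mutation_effects(seq, score):
    """Score each mutant relative to the reference sequence.

    `score` is any callable mapping a sequence to a modification probability;
    here it stands in for a trained predictor.
    """
    baseline = score(seq)
    return {(pos, ref, alt): score(mut) - baseline
            for pos, ref, alt, mut in saturation_mutants(seq)}

def toy_score(seq):
    """Hypothetical scorer: 1.0 if the motif 'GAC' is present, else 0.0."""
    return 1.0 if "GAC" in seq else 0.0

effects = mutation_effects("UGACU", toy_score)
```

A sequence of length n yields 3n mutants. With the toy scorer, every substitution inside the `GAC` motif lowers the score by 1.0, while substitutions in the flanking bases leave it unchanged.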
13
Liang S, Zhao Y, Jin J, Qiao J, Wang D, Wang Y, Wei L. Rm-LR: A long-range-based deep learning model for predicting multiple types of RNA modifications. Comput Biol Med 2023; 164:107238. [PMID: 37515874 DOI: 10.1016/j.compbiomed.2023.107238] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent research has highlighted the pivotal role of RNA post-transcriptional modifications in the regulation of RNA expression and function. Accurate identification of RNA modification sites is important for understanding RNA function. In this study, we propose a novel RNA modification prediction method, Rm-LR, which uses a long-range-based deep learning approach to predict multiple types of RNA modifications from RNA sequences alone. Rm-LR incorporates two large-scale pre-trained RNA language models to capture discriminative sequential information and learn important local features, which are subsequently integrated through a bilinear attention network. Rm-LR supports ten RNA modification types (m6A, m1A, m5C, m5U, m6Am, Ψ, Am, Cm, Gm, and Um) and significantly outperforms state-of-the-art methods in predictive capability on benchmark datasets. Experimental results show the effectiveness of Rm-LR in predicting various RNA modifications, demonstrating the adaptability and robustness of the proposed model. We demonstrate that pre-trained RNA language models can learn dense biological sequential representations from a large-scale long-range RNA corpus while also enhancing the interpretability of the models. This work contributes to the development of accurate and reliable computational models for RNA modification prediction, providing insights into the complex landscape of RNA modifications.
Affiliation(s)
- Sirui Liang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yanxi Zhao
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Junru Jin
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Ding Wang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yu Wang
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Leyi Wei
  - School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
14
Abbas Z, Rehman MU, Tayara H, Zou Q, Chong KT. XGBoost framework with feature selection for the prediction of RNA N5-methylcytosine sites. Mol Ther 2023; 31:2543-2551. [PMID: 37271991 PMCID: PMC10422016 DOI: 10.1016/j.ymthe.2023.05.016] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 10/28/2022] [Revised: 01/06/2023] [Accepted: 05/31/2023] [Indexed: 06/06/2023] Open
Abstract
5-methylcytosine (m5C) is a critical post-transcriptional modification that is widely present in various kinds of RNAs and is crucial to fundamental biological processes. By correctly identifying m5C methylation sites on RNA, clinicians can better understand the precise function of these sites in different biological processes. Owing to their effectiveness and affordability, computational methods for identifying methylation sites in various species have received growing attention over the last few years. To precisely identify RNA m5C sites in five species (Homo sapiens, Arabidopsis thaliana, Mus musculus, Drosophila melanogaster, and Danio rerio), we propose a more effective and accurate model named m5C-pred. To create m5C-pred, five distinct feature encoding techniques were combined to extract features from the RNA sequence; SHapley Additive exPlanations was then used to choose the best features among them, and XGBoost was used as the classifier. We applied the Optuna optimization framework to determine the best hyperparameters quickly and efficiently. Finally, the proposed model was evaluated on independent test datasets, and the results were compared with those of previous methods. Our approach, m5C-pred, is anticipated to be useful for accurately identifying m5C sites, outperforming the currently available state-of-the-art techniques.
Affiliation(s)
- Zeeshan Abbas
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Mobeen Ur Rehman
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
15
Song B, Huang D, Zhang Y, Wei Z, Su J, Pedro de Magalhães J, Rigden DJ, Meng J, Chen K. m6A-TSHub: Unveiling the Context-specific m6A Methylation and m6A-affecting Mutations in 23 Human Tissues. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:678-694. [PMID: 36096444 PMCID: PMC10787194 DOI: 10.1016/j.gpb.2022.09.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/12/2021] [Revised: 08/19/2022] [Accepted: 09/02/2022] [Indexed: 06/15/2023]
Abstract
As the most pervasive epigenetic marker present on mRNAs and long non-coding RNAs (lncRNAs), N6-methyladenosine (m6A) RNA methylation has been shown to participate in essential biological processes. Recent studies have revealed distinct patterns of the m6A methylome across human tissues, and a major challenge remains in elucidating the tissue-specific presence and circuitry of m6A methylation. We present here a comprehensive online platform, m6A-TSHub, for unveiling context-specific m6A methylation and the genetic mutations that potentially regulate the m6A epigenetic mark. m6A-TSHub consists of four core components: (1) m6A-TSDB, a comprehensive database of 184,554 functionally annotated m6A sites derived from 23 human tissues and 499,369 m6A sites from 25 tumor conditions; (2) m6A-TSFinder, a web server for high-accuracy prediction of m6A methylation sites within a specific tissue from RNA sequences, constructed using multi-instance deep neural networks with gated attention; (3) m6A-TSVar, a web server for assessing the impact of genetic variants on tissue-specific m6A RNA modifications; and (4) m6A-CAVar, a database of 587,983 The Cancer Genome Atlas (TCGA) cancer mutations (derived from 27 cancer types) that were predicted to affect m6A modifications in the primary tissue of cancers. The database should be a useful resource for studying the m6A methylome and the genetic factors of epitranscriptome disturbance in a specific tissue (or cancer type). m6A-TSHub is accessible at www.xjtlu.edu.cn/biologicalsciences/m6ats.
Affiliation(s)
- Bowen Song
  - Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China; Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Daiyun Huang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Department of Computer Science, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Yuxin Zhang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Jionglong Su
  - School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- João Pedro de Magalhães
  - Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Jia Meng
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Kunqi Chen
  - Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
16
Charoenkwan P, Schaduangrat N, Shoombuatong W. StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens. BMC Bioinformatics 2023; 24:301. [PMID: 37507654 PMCID: PMC10386778 DOI: 10.1186/s12859-023-05421-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/11/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] Open
Abstract
BACKGROUND: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and for exploiting their potential in anticancer vaccine development. However, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need for a robust model that can achieve higher accuracy and precision. RESULTS: In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. First, we constructed 156 different baseline models by combining 12 different feature encoding schemes with 13 popular ML algorithms. Second, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined using a feature selection strategy and then used to construct our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods on the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866. CONCLUSIONS: In summary, the proposed stacking ensemble learning-based framework, StackTTCA, could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server (http://2pmlab.camt.cmu.ac.th/StackTTCA) to maximize user convenience for high-throughput screening of novel TTCAs.
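The stacking construction described in RESULTS can be sketched in a few lines of Python. The abstract does not list the encodings or algorithms, so the names below are hypothetical placeholders, and the toy baselines stand in for trained models; this is a sketch of the general stacking idea, not the StackTTCA code.

```python
from itertools import product

# Hypothetical placeholder names; the abstract does not name the actual
# encodings or algorithms.
ENCODINGS = [f"encoding_{i}" for i in range(1, 13)]     # 12 feature encodings
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 14)]   # 13 ML algorithms

# One baseline model per (encoding, algorithm) pair: 12 * 13 = 156.
baseline_ids = list(product(ENCODINGS, ALGORITHMS))

def probabilistic_features(sample, baselines):
    """Stack each baseline model's predicted probability into one vector."""
    return [predict(sample) for predict in baselines]

def select_features(vector, keep):
    """Keep only the columns chosen by a feature-selection step."""
    return [vector[i] for i in keep]

# Toy stand-ins for trained baseline models: each maps a peptide to a probability.
toy_baselines = [
    lambda pep: min(1.0, len(pep) / 20),    # longer peptides score higher
    lambda pep: pep.count("L") / len(pep),  # fraction of leucine residues
    lambda pep: 0.5,                        # uninformative model
]

vector = probabilistic_features("LLKA", toy_baselines)  # [0.2, 0.5, 0.5]
meta_input = select_features(vector, keep=[0, 1])       # [0.2, 0.5]
```

In a full pipeline, `meta_input` would be the input to the final (meta) classifier, and the indices in `keep` would come from the feature selection step rather than being fixed by hand.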
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
17
Jia J, Wei Z, Cao X. EMDL-ac4C: identifying N4-acetylcytidine based on ensemble two-branch residual connection DenseNet and attention. Front Genet 2023; 14:1232038. [PMID: 37519885 PMCID: PMC10372626 DOI: 10.3389/fgene.2023.1232038] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/31/2023] [Accepted: 06/29/2023] [Indexed: 08/01/2023] Open
Abstract
Introduction: N4-acetylcytidine (ac4C) is a critical acetylation modification that has an essential function in protein translation and is associated with a number of human diseases. Methods: Identifying ac4C sites by biological experiments is cumbersome and costly, and the performance of the existing computational models needs to be improved. Therefore, we propose a new deep learning tool, EMDL-ac4C, to predict ac4C sites. It applies simple one-hot encoding and a downsampled ensemble deep learning network to an unbalanced dataset to extract the features important for identifying ac4C sites. The base learner of this ensemble model consists of a modified DenseNet and Squeeze-and-Excitation Networks. In addition, we add a convolutional residual structure in parallel with the dense block to achieve two-layer feature extraction. Results: The average accuracy (Acc), Matthews correlation coefficient (MCC), and area under the curve (AUC) of EMDL-ac4C on ten independent testing sets are 80.84%, 61.77%, and 87.94%, respectively. Discussion: Multiple experimental comparisons indicate that EMDL-ac4C outperforms existing predictors and greatly improves the predictive performance for ac4C sites. EMDL-ac4C could also provide a valuable reference for subsequent studies. The source code and experimental data are available at: https://github.com/13133989982/EMDLac4C.
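The combination of one-hot encoding with a downsampled ensemble can be illustrated with a minimal sketch. It assumes that negatives are the majority class and that each base learner sees all positives plus one equally sized chunk of negatives; the abstract does not specify either detail, and the function names are hypothetical.

```python
import random

ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}

def one_hot(seq):
    """Encode an RNA sequence as a list of 4-bit vectors (unknown bases -> zeros)."""
    return [ONE_HOT.get(b, [0, 0, 0, 0]) for b in seq.upper()]

def balanced_subsets(positives, negatives, seed=0):
    """Split the majority class into chunks the size of the minority class and
    pair each chunk with all positives: one balanced training set per learner.
    Leftover negatives that do not fill a whole chunk are dropped."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    n = len(positives)
    chunks = [shuffled[i:i + n] for i in range(0, len(shuffled) - n + 1, n)]
    return [(positives, chunk) for chunk in chunks]

def ensemble_predict(models, x):
    """Average the probabilities returned by the base learners."""
    return sum(m(x) for m in models) / len(models)
```

With 3 positives and 10 negatives, `balanced_subsets` returns three balanced training sets of 3 positives and 3 negatives each. One base learner would be trained per subset, and `ensemble_predict` averages their outputs at inference time.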
Affiliation(s)
- Jianhua Jia
  - Correspondence: Jianhua Jia; Zhangying Wei
18
Kong Y, Yu J, Ge S, Fan X. Novel insight into RNA modifications in tumor immunity: Promising targets to prevent tumor immune escape. Innovation (N Y) 2023; 4:100452. [PMID: 37485079 PMCID: PMC10362524 DOI: 10.1016/j.xinn.2023.100452] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/31/2023] [Accepted: 05/23/2023] [Indexed: 07/25/2023] Open
Abstract
An immunosuppressive state is a typical feature of the tumor microenvironment. Despite the dramatic success of immune checkpoint inhibitor (ICI) therapy in preventing tumor cell escape from immune surveillance, primary and acquired resistance have limited its clinical use. Notably, recent clinical trials have shown that epigenetic drugs can significantly improve the outcome of ICI therapy in various cancers, indicating the importance of epigenetic modifications in immune regulation of tumors. Recently, RNA modifications (N6-methyladenosine [m6A], N1-methyladenosine [m1A], 5-methylcytosine [m5C], etc.), novel hotspot areas of epigenetic research, have been shown to play crucial roles in protumor and antitumor immunity. In this review, we provide a comprehensive understanding of how m6A, m1A, and m5C function in tumor immunity by directly regulating different immune cells as well as indirectly regulating tumor cells through different mechanisms, including modulating the expression of immune checkpoints, inducing metabolic reprogramming, and affecting the secretion of immune-related factors. Finally, we discuss the current status of strategies targeting RNA modifications to prevent tumor immune escape, highlighting their potential.
Affiliation(s)
- Yuxin Kong
  - Department of Ophthalmology, Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai 200001, China
- Jie Yu
  - Department of Ophthalmology, Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai 200001, China
- Shengfang Ge
  - Department of Ophthalmology, Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai 200001, China
- Xianqun Fan
  - Department of Ophthalmology, Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai 200001, China
19
Ultrasensitive photoelectrochemical biosensor for DNA 5-methylcytosine analysis based on co-sensitization strategy combined with bridged DNA nanoprobe. Talanta 2023; 254:124140. [PMID: 36463802 DOI: 10.1016/j.talanta.2022.124140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 11/19/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
Altered DNA methylation in the form of 5-methylcytosine (5-mC) patterns is correlated with disease diagnosis, prognosis, and treatment response. Therefore, accurate analysis of 5-mC is of great significance for the diagnosis of diseases. Here, an efficient enhanced photoelectrochemical (PEC) biosensor was designed for the quantitative analysis of DNA 5-mC based on a cascaded energy-level-aligned co-sensitization strategy coupled with a bridged DNA nanoprobe (BDN). First, an Au nanoparticle/graphite-phase carbon nitride/titanium dioxide (AuNPs/g-C3N4@TiO2) nanocomposite was synthesized through in situ growth of AuNPs on the g-C3N4@TiO2 surface as a matrix to provide a stable background signal. Next, BDN with a high mass transfer rate, synthesized from a pair of DNA tetrahedra as nanomechanical handles, was used as a capture probe to bind the target sequence. Polydopamine nanospheres loaded with CdTe QDs (PDANS-CdTe QDs) served as the photocurrent label for 5-mC antibodies. When 5-mC was present, a large number of PDANS-Ab-CdTe QDs were introduced to the electrode surface, and the resulting CdTe QDs/AuNPs/g-C3N4@TiO2 co-sensitized structure enhanced the electron transfer capability and photocurrent response rate owing to the cascade energy level arrangement, leading to a significantly enhanced photocurrent signal. The proposed PEC biosensor exhibited a wide range from 10⁻¹⁷ M to 10⁻⁷ M and a detection limit of 2.2 aM. This performance indicates the practicability of the designed strategy and its potential for clinical analysis of 5-mC.
20
Dao FY, Liu ML, Su W, Lv H, Zhang ZY, Lin H, Liu L. AcrPred: A hybrid optimization with enumerated machine learning algorithm to predict Anti-CRISPR proteins. Int J Biol Macromol 2023; 228:706-714. [PMID: 36584777 DOI: 10.1016/j.ijbiomac.2022.12.250] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 11/14/2022] [Revised: 12/12/2022] [Accepted: 12/22/2022] [Indexed: 12/29/2022]
Abstract
CRISPR-Cas, as a tool for gene editing, has received extensive attention in recent years. Anti-CRISPR (Acr) proteins can inactivate the CRISPR-Cas defense system during the interference phase and can be used as a potential tool for regulating gene editing. In-depth study of Anti-CRISPR proteins is therefore of great significance for the implementation of gene editing. In this study, we developed a high-accuracy prediction model based on a two-step model fusion strategy, called AcrPred, which achieved an AUC of 0.952 in independent dataset validation. To further validate the proposed model, we compared it with published tools and correctly identified 9 of 10 new Acr proteins, indicating the strong generalization ability of our model. Finally, for the convenience of wet-lab researchers, a user-friendly web server, AcrPred (Anti-CRISPR proteins Prediction), was established at http://lin-group.cn/server/AcrPred, with which users can easily identify potential Anti-CRISPR proteins.
Affiliation(s)
- Fu-Ying Dao
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; School of Biological Sciences, Nanyang Technological University, Singapore 639798, Singapore
- Meng-Lu Liu
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Su
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lv
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Zhao-Yue Zhang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
21
Taguchi YH. Bioinformatic tools for epitranscriptomics. Am J Physiol Cell Physiol 2023; 324:C447-C457. [PMID: 36468841 DOI: 10.1152/ajpcell.00437.2022] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/23/2022] [Revised: 11/17/2022] [Accepted: 11/22/2022] [Indexed: 12/12/2022]
Abstract
The epitranscriptome, defined as RNA modifications that do not involve alterations in the nucleotide sequence, is a popular topic in the genomic sciences. Because we need massive computational techniques to identify epitranscriptomes within individual transcripts, many tools have been developed to infer epitranscriptomic sites as well as to process datasets using high-throughput sequencing. In this review, we summarize recent developments in epitranscriptome spatial detection and data analysis and discuss their progression.
Affiliation(s)
- Y-H Taguchi
  - Department of Physics, Chuo University, Tokyo, Japan
22
Wang Y, Wang X, Cui X, Meng J, Rong R. Self-attention enabled deep learning of dihydrouridine (D) modification on mRNAs unveiled a distinct sequence signature from tRNAs. MOLECULAR THERAPY. NUCLEIC ACIDS 2023; 31:411-420. [PMID: 36845339 PMCID: PMC9945750 DOI: 10.1016/j.omtn.2023.01.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/22/2022] [Accepted: 01/23/2023] [Indexed: 01/28/2023]
Abstract
Dihydrouridine (D) is a modified pyrimidine nucleotide universally found in viral, prokaryotic, and eukaryotic species. It serves as a metabolic modulator for various pathological conditions, and its elevated levels in tumors are associated with a series of cancers. Precise identification of D sites on RNA is vital for understanding its biological function. A number of computational approaches have been developed for predicting D sites on tRNAs; however, none have considered mRNAs. We present here DPred, the first computational tool for predicting D on mRNAs in yeast from the primary RNA sequences. Built on a local self-attention layer and a convolutional neural network (CNN) layer, the proposed deep learning model outperformed classic machine learning approaches (random forest, support vector machines, etc.) and achieved reasonable accuracy and reliability with areas under the curve of 0.9166 and 0.9027 in jackknife cross-validation and on an independent testing dataset, respectively. Importantly, we showed that distinct sequence signatures are associated with the D sites on mRNAs and tRNAs, implying potentially different formation mechanisms and putative divergent functionality of this modification on the two types of RNA. DPred is available as a user-friendly Web server.
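The evaluation protocol mentioned in the abstract, jackknife (leave-one-out) cross-validation scored by area under the curve, can be sketched as below. The `fit` argument is a hypothetical training function that returns a scoring callable; nothing here reflects the actual DPred model.

```python
def jackknife_splits(n):
    """Yield (train_indices, test_index): each sample is held out exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def jackknife_auc(samples, labels, fit):
    """Refit on n-1 samples, score the held-out one, then compute AUC over all scores."""
    scores = []
    for train, test in jackknife_splits(len(samples)):
        model = fit([samples[j] for j in train], [labels[j] for j in train])
        scores.append(model(samples[test]))
    return auc(labels, scores)
```

Because the model is refit n times, jackknife validation is costly for deep networks, which is one reason an additional independent test set is commonly reported alongside it, as the abstract does.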
Affiliation(s)
- Yue Wang
  - Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, UK
- Xuan Wang
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Xiaodong Cui
  - School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China
- Jia Meng
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, UK
- Rong Rong
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Corresponding author: Rong Rong, Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
23
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most widely used techniques for profiling the RNA epitranscriptome, to provide new ideas for growth and development.
Collapse
Affiliation(s)
- Jia Zou
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Hui Liu
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Wei Tan
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
| | - Yi-qi Chen
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Jing Dong
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Shu-yuan Bai
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Zhao-xia Wu
- Community Health Service Center, Wuchang Hospital, Wuhan, China
| | - Yan Zeng
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China; School of Public Health, Wuhan University of Science and Technology, Wuhan, China
- Correspondence: Yan Zeng
| |
|
24
|
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. RESEARCH (WASHINGTON, D.C.) 2022; 2022:0011. [PMID: 39285948 PMCID: PMC11404319 DOI: 10.34133/research.0011] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 07/25/2022] [Accepted: 10/25/2022] [Indexed: 09/19/2024]
Abstract
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning in biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on the function and modification classification of biological sequences based on machine learning. Sequence-based prediction and analysis are the basic tasks to understand the biological functions of DNA, RNA, proteins, and peptides. However, there are hundreds of classification models developed for biological sequences, and the quite varied specific methods seem dizzying at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification method and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and applications in biology is also included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
25
|
Huang D, Chen K, Song B, Wei Z, Su J, Coenen F, de Magalhães JP, Rigden DJ, Meng J. Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation. Nucleic Acids Res 2022; 50:10290-10310. [PMID: 36155798 PMCID: PMC9561283 DOI: 10.1093/nar/gkac830] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/21/2022] [Revised: 08/26/2022] [Accepted: 09/15/2022] [Indexed: 12/25/2022]
Abstract
As the most pervasive epigenetic mark present on mRNA and lncRNA, N6-methyladenosine (m6A) RNA methylation regulates all stages of RNA life in various biological processes and disease mechanisms. Computational methods for deciphering RNA modification have achieved great success in recent years; nevertheless, their potential remains underexploited. One reason for this is that existing models usually consider only the sequence of transcripts, ignoring the various regions (or geography) of transcripts such as 3'UTR and intron, where the epigenetic mark forms and functions. Here, we developed three simple yet powerful encoding schemes for transcripts to capture the submolecular geographic information of RNA, which is largely independent from sequences. We show that m6A prediction models based on geographic information alone can achieve comparable performances to classic sequence-based methods. Importantly, geographic information substantially enhances the accuracy of sequence-based models, enables isoform- and tissue-specific prediction of m6A sites, and improves m6A signal detection from direct RNA sequencing data. The geographic encoding schemes we developed have exhibited strong interpretability, and are applicable to not only m6A but also N1-methyladenosine (m1A), and can serve as a general and effective complement to the widely used sequence encoding schemes in deep learning applications concerning RNA transcripts.
Affiliation(s)
- Daiyun Huang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Department of Computer Sciences, University of Liverpool, Liverpool L69 7ZB, UK
| | - Kunqi Chen
- Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, PR China
| | - Bowen Song
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
| | - Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L69 7ZB, UK
| | - Jionglong Su
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
| | - Frans Coenen
- Department of Computer Sciences, University of Liverpool, Liverpool L69 7ZB, UK
| | - João Pedro de Magalhães
- Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L69 7ZB, UK
| | - Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
| | - Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou 215123, PR China
| |
|
26
|
Li X, Zhang S, Shi H. An improved residual network using deep fusion for identifying RNA 5-methylcytosine sites. Bioinformatics 2022; 38:4271-4277. [PMID: 35866985 DOI: 10.1093/bioinformatics/btac532] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/11/2022] [Revised: 06/30/2022] [Accepted: 07/21/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: 5-Methylcytosine (m5C) is a crucial post-transcriptional modification. With the development of detection technology, it has been found in a wide variety of RNAs. Numerous studies have indicated that m5C plays an essential role in various activities of organisms, such as tRNA recognition, stabilization of RNA structure, and RNA metabolism. Traditional identification by wet-lab experiments is costly and time-consuming. Therefore, computational models are commonly used to identify m5C sites. Given the computing advantages of deep learning, it is feasible to construct the predictive model with deep learning algorithms.
RESULTS: In this study, we construct a model to identify m5C based on a deep fusion approach with an improved residual network. First, sequence features are extracted from the RNA sequences using Kmer, K-tuple nucleotide frequency component (KNFC), pseudo dinucleotide composition (PseDNC), and physical and chemical property (PCP). Kmer and KNFC extract information from a statistical point of view. PseDNC and PCP extract information from the physicochemical properties of RNA sequences. Then, the two groups of features are fused into new features using bidirectional long short-term memory and attention mechanisms, respectively. The fused features are then fed into the improved residual network for classification. Finally, 10-fold cross-validation and independent set testing are used to verify the credibility of the model. The results show that the accuracy reaches 91.87%, 95.55%, 92.27% and 95.60% on the training sets and independent test sets of Arabidopsis thaliana and M. musculus, respectively. This is a considerable improvement compared to previous studies and demonstrates the robust performance of our model.
AVAILABILITY AND IMPLEMENTATION: The data and code related to the study are available at https://github.com/alivelxj/m5c-DFRESG.
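The Kmer feature named in this abstract is a statistical encoding of the sequence. A minimal sketch of one common form, normalized k-mer frequencies, is given below; the function name `kmer_frequencies`, the choice of k = 2, and the example sequence are assumptions for illustration and are not taken from the authors' repository.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Hypothetical 8-nt sequence; real inputs would be windows centred on a candidate cytosine.
features = kmer_frequencies("GGACUCGA", k=2)
```

The vector length is 4^k regardless of sequence length, which is why such encodings are convenient inputs for downstream classifiers.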
Affiliation(s)
- Xinjie Li
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
| | - Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
| | - Hongyan Shi
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China
| |
|
27
|
Hasan MM, Tsukiyama S, Cho JY, Kurata H, Alam MA, Liu X, Manavalan B, Deng HW. Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy. Mol Ther 2022; 30:2856-2867. [PMID: 35526094 PMCID: PMC9372321 DOI: 10.1016/j.ymthe.2022.05.001] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 10/09/2021] [Revised: 04/25/2022] [Accepted: 05/03/2022] [Indexed: 11/30/2022]
Abstract
As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application is quite limited in genome-wide detection. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were trained with the four encodings, yielding 32 baseline models. A stacking strategy then integrates the predicted outputs of the optimal baseline models, which are used to train a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation, with a Matthews correlation coefficient and an accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and in formulating novel testable biological hypotheses.
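The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for illustration only (not from the Deepm5C evaluation).
tp, tn, fp, fn = 80, 90, 10, 20
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"ACC={acc:.3f} MCC={mcc:.3f}")
```

MCC uses all four cells of the confusion matrix, so it is typically lower than accuracy and less flattering on imbalanced data, which is consistent with the gap between the two values quoted above.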
Affiliation(s)
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA.
| | - Sho Tsukiyama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Jae Youl Cho
- Molecular Immunology Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
| | - Xiaowen Liu
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea.
| | - Hong-Wen Deng
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA.
| |
|
28
|
FRTpred: A novel approach for accurate prediction of protein folding rate and type. Comput Biol Med 2022; 149:105911. [DOI: 10.1016/j.compbiomed.2022.105911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 07/08/2022] [Accepted: 07/23/2022] [Indexed: 11/20/2022]
|
29
|
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Circ-LocNet: A Computational Framework for Circular RNA Sub-Cellular Localization Prediction. Int J Mol Sci 2022; 23:8221. [PMID: 35897818 PMCID: PMC9329987 DOI: 10.3390/ijms23158221] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/16/2022] [Revised: 07/15/2022] [Accepted: 07/20/2022] [Indexed: 02/04/2023]
Abstract
Circular ribonucleic acids (circRNAs) are novel non-coding RNAs that emanate from alternative splicing of precursor mRNA in reversed order across exons. Despite the abundant presence of circRNAs in human genes and their involvement in diverse physiological processes, the functionality of most circRNAs remains a mystery. As with other non-coding RNAs, knowledge of the sub-cellular localization of circRNAs can help clarify their influence on protein synthesis, degradation, and destination, their association with different diseases, and their potential for drug development. To date, wet-lab experimental approaches are used to detect the sub-cellular locations of circular RNAs. These approaches help to elucidate the role of circRNAs as protein scaffolds, RNA-binding protein (RBP) sponges, micro-RNA (miRNA) sponges, parental gene expression modifiers, alternative splicing regulators, and transcription regulators. To complement wet-lab experiments, and considering the progress made by machine learning approaches in determining the sub-cellular localization of other non-coding RNAs, the paper at hand develops a computational framework, Circ-LocNet, to precisely detect circRNA sub-cellular localization. Circ-LocNet performs a comprehensive extrinsic evaluation of 7 residue frequency-based, residue order and frequency-based, and physico-chemical property-based sequence descriptors using the five most widely used machine learning classifiers. Further, it explores the performance impact of K-order sequence descriptor fusion, in which it ensembles similar as well as dissimilar genres of statistical representation learning approaches to reap their combined benefits. Considering the diversity of statistical representation learning schemes, it assesses the performance of second-order, third-order, and up to seventh-order sequence descriptor fusion.
A comprehensive empirical evaluation of Circ-LocNet over a newly developed benchmark dataset using different settings reveals that standalone residue frequency-based sequence descriptors and tree-based classifiers are more suitable for predicting the sub-cellular localization of circular RNAs. Further, K-order heterogeneous sequence descriptor fusion in combination with tree-based classifiers most accurately predicts the sub-cellular localization of circular RNAs. We anticipate that this study will act as a rich baseline and push the development of robust computational methodologies for accurately determining the sub-cellular localization of novel circRNAs.
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Corresponding author
| | - Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Muhammad Imran Malik
- School of Computer Science & Electrical Engineering, National University of Sciences and Technology, Islamabad 44000, Pakistan
| | - Andreas Dengel
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
| |
|
30
|
Jiang J, Song B, Chen K, Lu Z, Rong R, Zhong Y, Meng J. m6AmPred: Identifying RNA N6,2'-O-dimethyladenosine (m6Am) sites based on sequence-derived information. Methods 2022; 203:328-334. [PMID: 33540081 DOI: 10.1016/j.ymeth.2021.01.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/28/2020] [Revised: 01/14/2021] [Accepted: 01/20/2021] [Indexed: 12/11/2022]
Abstract
N6,2'-O-dimethyladenosine (m6Am) is a reversible modification that occurs widely on various RNA molecules. The biological function of m6Am is not yet fully known, although recent studies have revealed its influence on cellular mRNA fate. Precise identification of m6Am sites on RNA is vital for understanding its biological functions. We present here m6AmPred, the first web server for in silico identification of m6Am sites from the primary sequences of RNA. Built upon the eXtreme Gradient Boosting with Dart algorithm (XgbDart) and the EIIP-PseEIIP encoding scheme, m6AmPred achieved promising prediction performance, with AUCs greater than 0.954 in 10-fold cross-validation and on independent testing datasets. To critically test and validate the performance of m6AmPred, the experimentally verified m6Am sites from two data sources were cross-validated. The m6AmPred web server is freely accessible at https://www.xjtlu.edu.cn/biologicalsciences/m6am and should be a useful tool for researchers interested in the N6,2'-O-dimethyladenosine RNA modification.
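The EIIP-PseEIIP scheme named in this abstract can be sketched as follows. EIIP maps each nucleotide to its electron-ion interaction pseudopotential, and PseEIIP weights the frequency of each trinucleotide by the summed EIIP of its bases. The numeric values are the ones commonly cited in the literature, with U treated like T; the function names are hypothetical, and the code is not taken from the m6AmPred server.

```python
from itertools import product

# Commonly cited EIIP values for nucleotides; U reuses the value given for T (an assumption).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}


def eiip_encode(seq):
    """Per-position EIIP values; unknown bases map to 0.0."""
    seq = seq.upper().replace("T", "U")
    return [EIIP.get(base, 0.0) for base in seq]


def pse_eiip(seq):
    """64-dimensional vector: trinucleotide frequency times the summed EIIP of its bases."""
    seq = seq.upper().replace("T", "U")
    trimers = ["".join(p) for p in product("ACGU", repeat=3)]
    counts = dict.fromkeys(trimers, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # ignore windows with ambiguous bases
            counts[tri] += 1
            total += 1
    if total == 0:
        return [0.0] * len(trimers)
    return [sum(EIIP[b] for b in tri) * counts[tri] / total for tri in trimers]


# Hypothetical short sequence for illustration.
position_features = eiip_encode("ACGU")
composition_features = pse_eiip("ACGUACGU")
```

Concatenating the two vectors gives one fixed-order feature row per sequence window, which is the kind of tabular input that gradient-boosted trees expect.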
Affiliation(s)
- Jie Jiang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, United Kingdom
| | - Bowen Song
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, United Kingdom.
| | - Kunqi Chen
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Institute of Ageing & Chronic Disease, University of Liverpool, L7 8TX Liverpool, United Kingdom
| | - Zhiliang Lu
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | - Rong Rong
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | - Yu Zhong
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
| | - Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, United Kingdom
| |
|
31
|
Charoenkwan P, Schaduangrat N, Lio' P, Moni MA, Manavalan B, Shoombuatong W. NEPTUNE: A novel computational approach for accurate and large-scale identification of tumor homing peptides. Comput Biol Med 2022; 148:105700. [PMID: 35715261 DOI: 10.1016/j.compbiomed.2022.105700] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/14/2022] [Revised: 05/31/2022] [Accepted: 06/04/2022] [Indexed: 11/16/2022]
Abstract
Tumor homing peptides (THPs) play a crucial role in recognizing and specifically binding to cancer cells. Although experimental approaches can facilitate the precise identification of THPs, they are usually time-consuming, labor-intensive, and not cost-effective. Computational approaches, by contrast, can identify THPs by utilizing sequence information alone, which highlights their great potential for large-scale identification of THPs. Herein, we propose NEPTUNE, a novel computational approach for the accurate and large-scale identification of THPs from sequence information. Specifically, we constructed various baseline models from multiple feature encoding schemes coupled with six popular machine learning algorithms. Subsequently, we comprehensively assessed and investigated the effects of these baseline models on THP prediction. Finally, the probabilistic information generated by the optimal baseline models is fed into a support vector machine-based classifier to construct the final meta-predictor (NEPTUNE). Cross-validation and independent tests demonstrated that NEPTUNE achieved superior performance for THP prediction compared with its constituent baseline models and the existing methods. Moreover, we employed the powerful SHapley additive exPlanations method to improve the interpretation of NEPTUNE and elucidate the most important features for identifying THPs. Finally, we implemented an online web server using NEPTUNE, which is available at http://pmlabstack.pythonanywhere.com/NEPTUNE. NEPTUNE could be beneficial for the large-scale identification of unknown THP candidates for follow-up experimental validation.
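The stacking idea in this abstract, in which baseline-model probabilities become the inputs of a meta-predictor, can be sketched in a few lines. This is an illustration under stated assumptions, not the NEPTUNE implementation: the baseline learners are toy nearest-centroid scorers, the meta-learner is a logistic regression swapped in for the paper's support vector machine, and the data, feature subsets, and function names are all hypothetical.

```python
import math


def fit_centroid_model(X, y, cols):
    """Toy baseline learner: nearest class centroid on a subset of feature columns."""
    def centroid(label):
        rows = [[x[c] for c in cols] for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def predict_proba(x):
        v = [x[c] for c in cols]
        d0, d1 = math.dist(v, c0), math.dist(v, c1)
        return d0 / (d0 + d1) if d0 + d1 else 0.5  # near the positive centroid -> close to 1
    return predict_proba


def out_of_fold_features(X, y, feature_subsets, n_folds=3):
    """Probabilistic features: each baseline scores only samples it was not trained on."""
    pf = [[0.0] * len(feature_subsets) for _ in X]
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, cols in enumerate(feature_subsets):
            model = fit_centroid_model(X_tr, y_tr, cols)
            for i in held_out:
                pf[i][j] = model(X[i])
    return pf


def predict(w, b, f):
    """Logistic score of one probabilistic-feature row."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * fi for wi, fi in zip(w, f)) + b)))


def fit_logistic(F, y, lr=0.5, epochs=500):
    """Meta-learner (assumed stand-in for an SVM): batch gradient descent on log-loss."""
    w, b, n = [0.0] * len(F[0]), 0.0, len(F)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, t in zip(F, y):
            err = predict(w, b, f) - t
            for k, fi in enumerate(f):
                grad_w[k] += err * fi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy, well-separated data: class 0 near 0.0 and class 1 near 1.0 in all four features.
y = [i % 2 for i in range(12)]
X = [[label + 0.05 * ((i + c) % 5 - 2) for c in range(4)] for i, label in enumerate(y)]

subsets = [(0, 1), (2, 3)]  # two hypothetical "feature encodings"
pf = out_of_fold_features(X, y, subsets)
w, b = fit_logistic(pf, y)
meta_scores = [predict(w, b, f) for f in pf]
```

Using out-of-fold scores as meta-features keeps the meta-learner from simply memorizing baseline outputs on data the baselines have already seen, which is the usual reason stacking pipelines are built this way.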
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Mohammad Ali Moni
- Artificial Intelligence & Digital Health, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland St Lucia, QLD, 4072, Australia
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
32
|
Liu Y, Shen Y, Wang H, Zhang Y, Zhu X. m5Cpred-XS: A New Method for Predicting RNA m5C Sites Based on XGBoost and SHAP. Front Genet 2022; 13:853258. [PMID: 35432446 PMCID: PMC9005994 DOI: 10.3389/fgene.2022.853258] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/12/2022] [Accepted: 02/16/2022] [Indexed: 11/13/2022]
Abstract
As one of the most important post-transcriptional modifications of RNA, 5-cytosine-methylation (m5C) is reported to be closely related to many chemical reactions and biological functions in cells. Recently, several computational methods have been proposed for identifying m5C sites. However, their accuracy and efficiency are still not satisfactory. In this study, we proposed a new method, m5Cpred-XS, for predicting m5C sites of H. sapiens, M. musculus, and A. thaliana. First, the SHAP method was used to select the optimal feature subset from seven different kinds of sequence-based features. Second, different machine learning algorithms were used to train the models. The results of five-fold cross-validation indicate that the model based on XGBoost achieved the highest prediction accuracy. Finally, our model was compared with other state-of-the-art models, and the comparison indicates that m5Cpred-XS is superior to the other methods. Moreover, we deployed the model on a web server that can be accessed through http://m5cpred-xs.zhulab.org.cn/, and m5Cpred-XS is expected to be a useful tool for studying m5C sites.
Affiliation(s)
- Yong Zhang
- Correspondence: Xiaolei Zhu; Yong Zhang
| |
|
33
|
Xiao X, Shao YT, Luo ZT, Qiu WR. m5C-HPromoter: An Ensemble Deep Learning Predictor for Identifying 5-methylcytosine Sites in Human Promoters. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220330150259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aims:
This paper aims to identify 5-methylcytosine sites in human promoters.
Background:
Aberrant DNA methylation patterns are often associated with tumor development: hypermethylation inhibits the expression of tumor suppressor genes, and hypomethylation stimulates the expression of certain oncogenes. Most DNA methylation occurs at CpG islands in gene promoter regions.
Objective:
Therefore, a comprehensive view of the methylation status of the promoter regions of human genes is extremely important for understanding cancer pathogenesis and the function of post-transcriptional modification.
Method:
This paper constructed three human promoter methylation datasets, totaling 3 million sample sequences, for small cell lung cancer, non-small cell lung cancer, and hepatocellular carcinoma from the Cancer Cell Line Encyclopedia (CCLE) database. Frequency-based One-Hot Encoding was used to encode the sample sequences, and an innovative stacking-based ensemble deep learning classifier was applied to establish the m5C-HPromoter predictor.
Result:
Averaged over 10 repetitions of 5-fold cross-validation, m5C-HPromoter obtained Accuracy (Acc) = 0.9270, Matthews correlation coefficient (MCC) = 0.7234, Sensitivity (Sn) = 0.9123, and Specificity (Sp) = 0.9290.
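The evaluation protocol in this result, an average over 10 repetitions of 5-fold cross-validation, can be sketched as an index generator plus an averaging helper. The function names, the seed, and the placeholder scorer are assumptions for illustration; a real run would train and score the predictor inside `score_fn`.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle per repetition gives different partitions
        folds = [idx[k::n_splits] for k in range(n_splits)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            yield rep, k, train_idx, test_idx


def mean_cv_score(n_samples, score_fn, **kwargs):
    """Average a per-fold score over every fold of every repetition."""
    scores = [score_fn(train, test)
              for _, _, train, test in repeated_kfold(n_samples, **kwargs)]
    return sum(scores) / len(scores)


# Placeholder scorer: reports the held-out fraction instead of a real model metric.
avg = mean_cv_score(100, lambda train, test: len(test) / (len(train) + len(test)))
```

Repeating the shuffle reduces the dependence of the reported metric on any single partition of the data.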
Affiliation(s)
- Xuan Xiao
- Department of Computer, Jing-De-Zhen Ceramic Institute, 333046, Jing-De-Zhen, China
| | - Yu-Tao Shao
- Department of Computer, Jing-De-Zhen Ceramic Institute, 333046, Jing-De-Zhen, China
| | - Zhen-Tao Luo
- Department of Computer, Jing-De-Zhen Ceramic Institute, 333046, Jing-De-Zhen, China
| | - Wang-Ren Qiu
- Department of Computer, Jing-De-Zhen Ceramic Institute, 333046, Jing-De-Zhen, China
| |
|
34
|
Wang H, Zhao H, Zhang J, Han J, Liu Z. A parallel model of DenseCNN and ordered-neuron LSTM for generic and species-specific succinylation site prediction. Biotechnol Bioeng 2022; 119:1755-1767. [PMID: 35320585 DOI: 10.1002/bit.28091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Revised: 03/12/2022] [Accepted: 03/19/2022] [Indexed: 11/07/2022]
Abstract
Lysine succinylation (Ksucc) regulates various metabolic processes, participates in vital life processes, and is involved in the occurrence and development of numerous diseases. Accurate recognition of succinylation sites can reveal underlying functional mechanisms and pathogenesis. However, most sites remain undetected. Moreover, a deep learning architecture focusing on generic and species-specific predictions is still lacking. Thus, we proposed a deep learning-based framework named Deep-Ksucc, which combines a dense convolutional network (DenseCNN) and ordered-neuron long short-term memory (OnLSTM) in parallel and takes the concatenated features of sequence information and physicochemical properties as input. The results of the generic and species-specific predictions indicated that Deep-Ksucc can identify sequence patterns of different organisms and recognize a large number of succinylation sites. The case study showed that Deep-Ksucc can serve as a reliable tool for biological verification and computer-aided recognition of succinylation sites.
Affiliation(s)
- Huiqing Wang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
| | - Hong Zhao
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
| | - Jing Zhang
- Engineering Training Center, Taiyuan University of Technology, Taiyuan, 030024, China
| | - Jiale Han
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
| | - Zhihao Liu
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
| |
|
35
|
Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties. Int J Mol Sci 2022; 23:3044. [PMID: 35328461 PMCID: PMC8950657 DOI: 10.3390/ijms23063044] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/08/2022] [Revised: 02/25/2022] [Accepted: 03/09/2022] [Indexed: 12/03/2022]
Abstract
Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases. Therefore, the precise detection of D modification sites can enable further understanding of its functional roles. Traditional experimental techniques to identify D are laborious and time-consuming. In addition, there are few computational tools for such analysis. In this study, we utilized eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During data preprocessing, data were partitioned for training and testing. Oversampling was also adopted to reduce the effect of the imbalance between positive and negative samples. The best-performing model was obtained by combining a random forest with nucleotide chemical property encoding. The optimized model achieved high sensitivity and specificity values of 0.9688 and 0.9706, respectively, in independent tests. Our proposed model surpassed published tools in independent tests. Furthermore, a series of validations across several aspects was conducted to demonstrate the robustness and reliability of our model.
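Two steps from this abstract lend themselves to a short sketch: encoding nucleotides by chemical property and oversampling the minority class. The abstract does not specify either procedure, so the block below assumes the widely used three-bit nucleotide chemical property scheme (ring structure, functional group, hydrogen bonding) and plain random oversampling with replacement; the function names and toy data are hypothetical.

```python
import random

# Assumed NCP scheme: (purine ring, amino group, weak hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def ncp_encode(seq):
    """Flatten a sequence into 3 bits per nucleotide; unknown bases become (0, 0, 0)."""
    seq = seq.upper().replace("T", "U")
    return [bit for base in seq for bit in NCP.get(base, (0, 0, 0))]


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until both classes have equal size.

    Assumes binary labels (0/1) and at least one sample in each class.
    """
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced training set: one positive and four negative 2-nt windows.
X_train = [ncp_encode(s) for s in ["AC", "GU", "CA", "UG", "AA"]]
y_train = [1, 0, 0, 0, 0]
X_bal, y_bal = random_oversample(X_train, y_train)
```

Oversampling is normally applied to the training partition only, so that the independent test set keeps its original class ratio.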
|
36
|
Ahmad S, Charoenkwan P, Quinn JMW, Moni MA, Hasan MM, Lio' P, Shoombuatong W. SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins. Sci Rep 2022; 12:4106. [PMID: 35260777 PMCID: PMC8904530 DOI: 10.1038/s41598-022-08173-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 01/24/2022] [Accepted: 03/03/2022] [Indexed: 12/30/2022] Open
Abstract
Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we comprehensively explored 13 different feature descriptors from different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were treated as a new feature vector. Finally, we utilized a two-step feature selection strategy to determine the optimal PF feature vector and used this feature vector to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves predictive performance superior to that of its constituent baseline models and existing methods. We anticipate SCORPION will serve as a useful tool for the cost-effective and large-scale screening of new PVPs. The source codes and datasets for this work are available for downloading in the GitHub repository (https://github.com/saeed344/SCORPION).
Affiliation(s)
- Saeed Ahmad
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia
| | - Mohammad Ali Moni
- Faculty of Health and Behavioural Sciences, School of Health and Rehabilitation Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Md Mehedi Hasan
- Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
| | - Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
37
|
Manavalan B, Basith S, Lee G. Comparative analysis of machine learning-based approaches for identifying therapeutic peptides targeting SARS-CoV-2. Brief Bioinform 2022; 23:bbab412. [PMID: 34595489 PMCID: PMC8500067 DOI: 10.1093/bib/bbab412] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/01/2021] [Revised: 08/27/2021] [Accepted: 09/07/2021] [Indexed: 01/08/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19) has impacted public health as well as societal and economic well-being. In the last two decades, various prediction algorithms and tools have been developed for predicting antiviral peptides (AVPs). The current COVID-19 pandemic has underscored the need to develop more efficient and accurate machine learning (ML)-based prediction algorithms for the rapid identification of therapeutic peptides against severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Several peptide-based ML approaches, including anti-coronavirus peptides (ACVPs), IL-6 inducing epitopes and other epitopes targeting SARS-CoV-2, have been implemented in COVID-19 therapeutics. Owing to the growing interest in the COVID-19 field, it is crucial to systematically compare the existing ML algorithms based on their performances. Accordingly, we comprehensively evaluated the state-of-the-art IL-6 and AVP predictors against coronaviruses in terms of core algorithms, feature encoding schemes, performance evaluation metrics and software usability. A comprehensive performance assessment was then conducted to evaluate the robustness and scalability of the existing predictors using well-constructed independent validation datasets. Additionally, we discussed the advantages and disadvantages of the existing methods, providing useful insights into the development of novel computational tools for characterizing and identifying epitopes or ACVPs. The insights gained from this review are anticipated to provide critical guidance to the scientific community in the rapid design and development of accurate and efficient next-generation in silico tools against SARS-CoV-2.
Affiliation(s)
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
| |
|
38
|
Predicting RNA 5-Methylcytosine Sites by Using Essential Sequence Features and Distributions. Biomed Res Int 2022; 2022:4035462. [PMID: 35071593 PMCID: PMC8776474 DOI: 10.1155/2022/4035462] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 09/15/2021] [Revised: 12/07/2021] [Accepted: 12/22/2021] [Indexed: 12/15/2022]
Abstract
Methylation is one of the most common and considerable modifications in biological systems mediated by multiple enzymes. Recent studies have shown that methylation has been widely identified in different RNA molecules. RNA methylation modifications are of various kinds, such as 5-methylcytosine (m5C). However, for individual methylation sites, their functions still remain to be elucidated. Testing of all methylation sites relies heavily on high-throughput sequencing technology, which is expensive and labor-intensive. Thus, computational prediction approaches could serve as a substitute. In this study, multiple machine learning models were used to predict possible RNA m5C sites on the basis of mRNA sequences in human and mouse. Each site was represented by several features derived from k-mers of an RNA subsequence centered on that site. The powerful max-relevance and min-redundancy (mRMR) feature selection method was employed to analyse these features. The resulting feature list was fed into the incremental feature selection method, incorporating four classification algorithms, to build efficient models. Furthermore, the sites related to features used in the models were also investigated.
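To make the k-mer idea in this abstract concrete, here is a minimal sketch that turns a sequence window around a candidate site into normalised k-mer frequencies. The abstract does not state the window length, the value of k, or the exact feature definitions, so the function name `kmer_features`, the default `k=2`, and the frequency normalisation are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def kmer_features(window, k=2, alphabet="ACGU"):
    """Return normalised k-mer frequencies for a window centred on a site.

    Hypothetical helper: every possible k-mer over the alphabet gets an
    entry, so feature vectors have the same layout for every window.
    """
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    n_positions = max(len(window) - k + 1, 0)
    counts = Counter(window[i:i + k] for i in range(n_positions))
    total = max(n_positions, 1)  # avoid division by zero for short windows
    return {kmer: counts[kmer] / total for kmer in all_kmers}
```

A feature table built this way is the kind of input that a feature-ranking step followed by incremental feature selection would operate on; those later steps are not shown.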
|
39
|
Bilal A, Alarfaj FK, Khan RA, Suleman MT, Long H. m5c-iEnsem: 5-methylcytosine sites identification through ensemble models. Bioinformatics 2022; 41:btae722. [PMID: 39657957 PMCID: PMC11911556 DOI: 10.1093/bioinformatics/btae722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Revised: 10/28/2024] [Accepted: 12/06/2024] [Indexed: 12/12/2024]
Abstract
Motivation: 5-Methylcytosine (m5C), a modified cytosine base, arises from the addition of a methyl group at the 5th carbon position. It is a prevalent post-transcriptional modification (PTM) found in various types of RNA. Traditional laboratory techniques often fail to provide rapid and accurate identification of m5C sites. However, with the growing accessibility of sequence data, computational models offer a more efficient and reliable approach to m5C site detection. This research focused on creating advanced in silico methods using ensemble learning techniques. The encoded data were processed through ensemble models, including bagging and boosting techniques. These models were then rigorously evaluated through independent testing and 10-fold cross-validation. Results: Among the models tested, the bagging ensemble-based predictor, m5c-iEnsem, demonstrated superior performance to existing m5C prediction tools. Availability and implementation: To further support the research community, m5c-iEnsem has been made available via a user-friendly web server at https://m5c-iensem.streamlit.app/.
Affiliation(s)
- Anas Bilal
- College of Information Science and Technology, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou 571158, China
| | - Fawaz Khaled Alarfaj
- Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa 31982, Saudi Arabia
| | - Rafaqat Alam Khan
- Department of Software Engineering, Lahore Garrison University, Lahore 54000, Pakistan
| | | | - Haixia Long
- College of Information Science and Technology, Hainan Normal University, Haikou 571158, China
- Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou 571158, China
| |
|
40
|
Gong Y, Liao B, Wang P, Zou Q. DrugHybrid_BS: Using Hybrid Feature Combined With Bagging-SVM to Predict Potentially Druggable Proteins. Front Pharmacol 2021; 12:771808. [PMID: 34916947 PMCID: PMC8669608 DOI: 10.3389/fphar.2021.771808] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/07/2021] [Accepted: 11/15/2021] [Indexed: 01/09/2023] Open
Abstract
Drug targets are biological macromolecules or biomolecular structures that specifically bind a particular drug to produce a therapeutic effect or regulate physiological functions. Owing to the important value and role of drug targets, the prediction of potential drug targets has become a research hotspot in recent years. The key to the research and development of modern new drugs is first to identify potential drug targets. In this paper, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins. This method combines three features: monoDiKGap (k = 2), cross-covariance, and grouped amino acid composition. It removes redundant features and analyses key features through MRMD and MRMD2.0. The cross-validation results show that 96.9944% of the potentially druggable proteins can be accurately identified, and the accuracy on the independent test set reached 96.5665%. These results suggest that DrugHybrid_BS has the potential to become a useful predictive tool for druggable proteins. In addition, the hybrid key features combined with Bagging-SVM can identify 80.0343% of the potentially druggable proteins, which indicates the significance of this subset of features for research.
Affiliation(s)
- Yuxin Gong
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Peng Wang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| |
|
41
|
Cheng X, Wang J, Li Q, Liu T. BiLSTM-5mC: A Bidirectional Long Short-Term Memory-Based Approach for Predicting 5-Methylcytosine Sites in Genome-Wide DNA Promoters. Molecules 2021; 26:7414. [PMID: 34946497 PMCID: PMC8704614 DOI: 10.3390/molecules26247414] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/30/2021] [Accepted: 12/04/2021] [Indexed: 12/04/2022] Open
Abstract
An important cause of cancer proliferation is the change in DNA methylation patterns, characterized by the localized hypermethylation of the promoters of tumor-suppressor genes together with an overall decrease in the level of 5-methylcytosine (5mC). Therefore, identifying the 5mC sites in the promoters is a critical step towards further understanding the diverse functions of DNA methylation in genetic diseases such as cancers and aging. However, most wet-lab experimental techniques for detecting 5mC sites are time-consuming and laborious. In this study, we proposed a deep learning-based approach, called BiLSTM-5mC, for accurately identifying 5mC sites in genome-wide DNA promoters. First, we randomly divided the negative samples into 11 subsets of equal size, each of which forms a balanced subset when combined with the same number of positive samples. Then, two types of feature vectors, encoded by the one-hot method and the nucleotide property and frequency (NPF) method, were fed into a bidirectional long short-term memory (BiLSTM) network and a fully connected layer to train 22 submodels. Finally, the outputs of these models were integrated to predict 5mC sites by using the majority vote strategy. Our experimental results demonstrated that BiLSTM-5mC outperformed existing methods on the same independent dataset.
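The data-balancing and voting steps described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `balanced_subsets` and `majority_vote` are hypothetical, the BiLSTM submodels themselves are omitted, and the tie-breaking rule (ties resolve to the negative class) is an assumption, since an even number of submodels can tie and the abstract does not say how ties are handled.

```python
import random


def balanced_subsets(positives, negatives, n_subsets=11, seed=0):
    """Split negatives into n_subsets equal chunks and pair each with all positives.

    Leftover negatives (when the count is not divisible by n_subsets) are
    dropped in this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsets
    return [
        (list(positives), shuffled[i * size:(i + 1) * size])
        for i in range(n_subsets)
    ]


def majority_vote(predictions):
    """Combine 0/1 predictions from the submodels; ties go to 0 (assumption)."""
    return 1 if 2 * sum(predictions) > len(predictions) else 0
```

With 11 balanced subsets and two encodings per subset, 22 submodels would each contribute one vote to `majority_vote`.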
Affiliation(s)
- Xin Cheng
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| | - Jun Wang
- School of Software Technology, Zhejiang University, Ningbo 315048, China
| | - Qianyue Li
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| | - Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| |
|
42
|
Guo G, Pan K, Fang S, Ye L, Tong X, Wang Z, Xue X, Zhang H. Advances in mRNA 5-methylcytosine modifications: Detection, effectors, biological functions, and clinical relevance. Mol Ther Nucleic Acids 2021; 26:575-593. [PMID: 34631286 PMCID: PMC8479277 DOI: 10.1016/j.omtn.2021.08.020] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Indexed: 02/07/2023]
Abstract
5-methylcytosine (m5C) post-transcriptional modifications affect the maturation, stability, and translation of the mRNA molecule. These modifications play an important role in many physiological and pathological processes, including stress response, tumorigenesis, tumor cell migration, embryogenesis, and viral replication. Recently, there has been a better understanding of the biological implications of m5C modification owing to the rapid development and optimization of detection technologies, including liquid chromatography-tandem mass spectrometry (LC-MS/MS) and RNA-BisSeq. Further, predictive models (such as PEA-m5C, m5C-PseDNC, and DeepMRMP) for the identification of potential m5C modification sites have also emerged. In this review, we summarize the current experimental detection methods and predictive models for mRNA m5C modifications, focusing on their advantages and limitations. We systematically surveyed the latest research on the effectors related to mRNA m5C modifications and their biological functions in multiple species. Finally, we discuss the physiological effects and pathological significance of m5C modifications in multiple diseases, as well as their therapeutic potential, thereby providing new perspectives for disease treatment and prognosis.
Affiliation(s)
- Gangqiang Guo
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Kan Pan
- First Clinical College, Wenzhou Medical University, Wenzhou, China
| | - Su Fang
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Lele Ye
- Department of Gynecologic Oncology, Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Xinya Tong
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Zhibin Wang
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Xiangyang Xue
- Wenzhou Collaborative Innovation Center of Gastrointestinal Cancer in Basic Research and Precision Medicine, Wenzhou Key Laboratory of Cancer-related Pathogens and Immunity, Department of Microbiology and Immunology, Institute of Molecular Virology and Immunology, Institute of Tropical Medicine, School of Basic Medical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Huidi Zhang
- Department of Nephrology, The First Affiliated Hospital, Wenzhou Medical University, Wenzhou, China
| |
|
43
|
Staem5: A novel computational approach for accurate prediction of m5C site. Mol Ther Nucleic Acids 2021; 26:1027-1034. [PMID: 34786208 PMCID: PMC8571400 DOI: 10.1016/j.omtn.2021.10.012] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/05/2021] [Revised: 08/27/2021] [Accepted: 10/06/2021] [Indexed: 12/25/2022]
Abstract
5-Methylcytosine (m5C) is an important post-transcriptional modification that has been extensively found in multiple types of RNAs. Many studies have shown that m5C plays vital roles in many biological functions, such as RNA structure stability and metabolism. Computational approaches act as an efficient way to identify m5C sites from high-throughput RNA sequence data and help interpret the functional mechanism of this important modification. This study proposed a novel species-specific computational approach, Staem5, to accurately predict RNA m5C sites in Mus musculus and Arabidopsis thaliana. Staem5 was developed by employing feature fusion tactics to leverage informatic sequence profiles, and a stacking ensemble learning framework combined five popular machine learning algorithms. Extensive benchmarking tests demonstrated that Staem5 outperformed state-of-the-art approaches in both cross-validation and independent tests. We provide the source code of Staem5, which is publicly available at https://github.com/Cxd-626/Staem5.git.
|
44
|
Dou L, Zhou W, Zhang L, Xu L, Han K. Accurate identification of RNA D modification using multiple features. RNA Biol 2021; 18:2236-2246. [PMID: 33729104 PMCID: PMC8632091 DOI: 10.1080/15476286.2021.1898160] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/21/2020] [Revised: 02/13/2021] [Accepted: 02/23/2021] [Indexed: 10/21/2022] Open
Abstract
As one of the common post-transcriptional modifications in tRNAs, dihydrouridine (D) has prominent effects on regulating the flexibility of tRNA as well as on cancerous diseases. Because the sequencing techniques used to detect D modification are expensive and time-consuming, precise computational tools can greatly advance research on molecular mechanisms and medical developments. We proposed a novel predictor, called iRNAD_XGBoost, to identify potential D sites using multiple RNA sequence representations. In this method, the imbalance problem is addressed with the hybrid sampling method SMOTEENN, and the top 30 features selected by XGBoost are used to construct the model. The optimized model showed high Sn and Sp values of 97.13% and 97.38%, respectively, in the jackknife test. In the independent test, these two metrics reached 91.67% and 94.74%, respectively. Compared with the iRNAD method, this model showed high generalizability and consistent prediction efficiency for positive and negative samples, yielding satisfactory MCC scores of 0.94 and 0.86, respectively. It is inferred that the chemical property and nucleotide density features (CPND), electron-ion interaction pseudopotential (EIIP and PseEIIP), and dinucleotide composition (DNC) are crucial to the recognition of D modification. The proposed predictor is a promising tool to help experimental biologists investigate molecular functions.
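For readers unfamiliar with the metrics quoted in this abstract, the sketch below computes sensitivity (Sn), specificity (Sp), and the Matthews correlation coefficient (MCC) from confusion-matrix counts using their standard definitions. The function name `sn_sp_mcc` and the example counts are hypothetical and are not taken from the paper; the sketch assumes both classes are non-empty.

```python
import math


def sn_sp_mcc(tp, fn, tn, fp):
    """Return (Sn, Sp, MCC) from true/false positive/negative counts.

    Assumes tp + fn > 0 and tn + fp > 0 (both classes are present).
    """
    sn = tp / (tp + fn)  # sensitivity: fraction of true sites recovered
    sp = tn / (tn + fp)  # specificity: fraction of non-sites rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc
```

Because MCC uses all four counts, it stays informative on imbalanced data, which is one reason it is often reported alongside Sn and Sp.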
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| | - Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, Guangdong, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China
| | - Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, Heilongjiang, China
| |
|
45
|
Lyu Y, Zhang Z, Li J, He W, Ding Y, Guo F. iEnhancer-KL: A Novel Two-Layer Predictor for Identifying Enhancers by Position Specific of Nucleotide Composition. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2809-2815. [PMID: 33481715 DOI: 10.1109/tcbb.2021.3053608] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/12/2023]
Abstract
An enhancer is a short region of DNA with the ability to recruit transcription factors and their complexes, increasing the likelihood of the transcription of a particular gene. Considering the importance of enhancers, enhancer identification is a prevailing problem in computational biology. In this paper, we propose a novel two-layer enhancer predictor called iEnhancer-KL, using computational biology algorithms to identify enhancers and then classify these enhancers into strong or weak types. Kullback-Leibler (KL) divergence is creatively taken into consideration to improve the feature extraction method PSTNPss. Then, LASSO is used to reduce the dimension of features and finally helps to get better prediction performance. Furthermore, the selected features are tested on several machine learning models, and the SVM algorithm achieves the best performance. The rigorous cross-validation indicates that our predictor is remarkably superior to the existing state-of-the-art methods with an Acc of 84.23 percent and the MCC of 0.6849 for identifying enhancers. Our code and results can be freely downloaded from https://github.com/Not-so-middle/iEnhancer-KL.git.
|
46
|
Huang Z, Li J, Chen J, Chen D. Construction of Prognostic Risk Model of 5-Methylcytosine-Related Long Non-Coding RNAs and Evaluation of the Characteristics of Tumor-Infiltrating Immune Cells in Breast Cancer. Front Genet 2021; 12:748279. [PMID: 34777473 PMCID: PMC8585929 DOI: 10.3389/fgene.2021.748279] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/27/2021] [Accepted: 10/12/2021] [Indexed: 11/28/2022] Open
Abstract
Purpose: The role of 5-methylcytosine-related long non-coding RNAs (m5C-lncRNAs) in breast cancer (BC) remains unclear. Here, we aimed to investigate the prognostic value, gene expression characteristics, and correlation between m5C-lncRNA risk model and tumor immune cell infiltration in BC. Methods: The expression matrix of m5C-lncRNAs in BC was obtained from The Cancer Genome Atlas database, and the lncRNAs were analyzed using differential expression analysis as well as univariate and multivariate Cox regression analysis to eventually obtain BC-specific m5C-lncRNAs. A risk model was developed based on three lncRNAs using multivariate Cox regression and the prognostic value, accuracy, as well as reliability were verified. Gene set enrichment analysis (GSEA) was used to analyze the Kyoto Encyclopedia of Genes and Genomes signaling pathway enrichment of the risk model. CIBERSORT algorithm and correlation analysis were used to explore the characteristics of the BC tumor-infiltrating immune cells. Finally, reverse transcription-quantitative polymerase chain reaction was performed to detect the expression level of three lncRNA in clinical samples. Results: A total of 334 differential m5C-lncRNAs were identified, and three BC-specific m5C-lncRNAs were selected, namely AP005131.2, AL121832.2, and LINC01152. Based on these three lncRNAs, a highly reliable and specific risk model was constructed, which was proven to be closely related to the prognosis of patients with BC. Therefore, a nomogram based on the risk score was built to assist clinical decisions. GSEA revealed that the risk model was significantly enriched in metabolism-related pathways and was associated with tumor immune cell infiltration based on the analysis with the CIBERSORT algorithm. 
Conclusion: The efficient risk model based on m5C-lncRNAs associated with cancer metabolism and tumor immune cell infiltration could predict the survival prognosis of patients, and AP005131.2, AL121832.2, and LINC01152 could be novel biomarkers and therapeutic targets for BC.
Affiliation(s)
| | | | | | - Debo Chen
- Department of Breast Surgery, Quanzhou First Hospital of Fujian Medical University, Quanzhou, China
| |
|
47
|
Malik AA, Chotpatiwetchkul W, Phanus-Umporn C, Nantasenamat C, Charoenkwan P, Shoombuatong W. StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors. J Comput Aided Mol Des 2021; 35:1037-1053. [PMID: 34622387 DOI: 10.1007/s10822-021-00418-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/07/2021] [Accepted: 09/17/2021] [Indexed: 01/07/2023]
Abstract
Fast and accurate identification of inhibitors with potency against HCV NS5B polymerase is currently a challenging task. Although conventional experimental methods are the gold standard for the design and development of new HCV inhibitors, they often require a costly investment of time and resources. In this study, we develop a novel machine learning-based meta-predictor (termed StackHCV) for accurate and large-scale identification of HCV inhibitors. Unlike the existing method, which is based on a single-feature approach, we first constructed a pool of various baseline models by employing a wide range of heterogeneous molecular fingerprints with five popular machine learning algorithms (k-nearest neighbor, multi-layer perceptron, partial least squares, random forest and support vector machine). Secondly, we integrated these baseline models to develop the final meta-based model by means of the stacking strategy. Extensive benchmarking experiments showed that StackHCV achieved a more accurate and stable performance than its constituent baseline models on the training dataset and also outperformed the existing predictor on the independent test dataset. To facilitate the high-throughput identification of HCV inhibitors, we built a web server that can be freely accessed at http://camt.pythonanywhere.com/StackHCV. It is expected that StackHCV could be a useful tool for fast and precise identification of potential drugs against HCV NS5B, particularly for liver cancer therapy and other clinical applications.
Affiliation(s)
- Aijaz Ahmad Malik
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Warot Chotpatiwetchkul
- Applied Computational Chemistry Research Unit, Department of Chemistry, School of Science, King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
| | - Chuleeporn Phanus-Umporn
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand.
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
48
|
Zhang Q, Liu F, Chen W, Miao H, Liang H, Liao Z, Zhang Z, Zhang B. The role of RNA m5C modification in cancer metastasis. Int J Biol Sci 2021; 17:3369-3380. [PMID: 34512153 PMCID: PMC8416729 DOI: 10.7150/ijbs.61439] [Citation(s) in RCA: 103] [Impact Index Per Article: 25.8] [Received: 06/17/2021] [Accepted: 07/19/2021] [Indexed: 12/26/2022] Open
Abstract
Epigenetic modification plays a crucial regulatory role in the biological processes of eukaryotic cells. The recent characterization of DNA and RNA methylation is still ongoing. Tumor metastasis has long been an unconquerable feature in the fight against cancer. As an inevitable component of the epigenetic regulatory network, 5-methylcytosine is associated with multifarious cellular processes and systemic diseases, including cell migration and cancer metastasis. Recently, gratifying progress has been achieved in determining the molecular interactions between m5C writers (DNMTs and NSUNs), demethylases (TETs), readers (YTHDF2, ALYREF and YBX1) and RNAs. However, the underlying mechanism of RNA m5C methylation in cell mobility and metastasis remains unclear. The functions of m5C writers and readers are believed to regulate gene expression at the post-transcription level and are involved in cellular metabolism and movement. In this review, we emphatically summarize the recent updates on m5C components and related regulatory networks. The content will be focused on writers and readers of the RNA m5C modification and potential mechanisms in diseases. We will discuss relevant upstream and downstream interacting molecules and their associations with cell migration and metastasis.
Affiliation(s)
- Qiaofeng Zhang
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Hubei Province for the Clinical Medicine Research Center of Hepatic Surgery, Wuhan, Hubei 430030, China; Hubei Key Laboratory of Hepato-Pancreato-Biliary Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Furong Liu
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Hubei Province for the Clinical Medicine Research Center of Hepatic Surgery, Wuhan, Hubei 430030, China; Hubei Key Laboratory of Hepato-Pancreato-Biliary Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Wei Chen
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Hubei Province for the Clinical Medicine Research Center of Hepatic Surgery, Wuhan, Hubei 430030, China; Hubei Key Laboratory of Hepato-Pancreato-Biliary Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Hongrui Miao
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Hubei Province for the Clinical Medicine Research Center of Hepatic Surgery, Wuhan, Hubei 430030, China; Hubei Key Laboratory of Hepato-Pancreato-Biliary Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Huifang Liang
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Hubei Province for the Clinical Medicine Research Center of Hepatic Surgery, Wuhan, Hubei 430030, China; Hubei Key Laboratory of Hepato-Pancreato-Biliary Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Zhibin Liao
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Hubei Province for the Clinical Medicine Research Center of Hepatic Surgery, Wuhan, Hubei 430030, China; Hubei Key Laboratory of Hepato-Pancreato-Biliary Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Zhanguo Zhang
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Hubei Province for the Clinical Medicine Research Center of Hepatic Surgery, Wuhan, Hubei 430030, China; Hubei Key Laboratory of Hepato-Pancreato-Biliary Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Bixiang Zhang
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China; Hubei Province for the Clinical Medicine Research Center of Hepatic Surgery, Wuhan, Hubei 430030, China; Hubei Key Laboratory of Hepato-Pancreato-Biliary Diseases, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
49
iBitter-Fuse: A Novel Sequence-Based Bitter Peptide Predictor by Fusing Multi-View Features. Int J Mol Sci 2021; 22:ijms22168958. [PMID: 34445663 PMCID: PMC8396555 DOI: 10.3390/ijms22168958] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/08/2021] [Revised: 08/08/2021] [Accepted: 08/17/2021] [Indexed: 12/19/2022] Open
Abstract
Accurate identification of bitter peptides is of great importance for better understanding their biochemical and biophysical properties. To date, machine learning-based methods have become an effective avenue for identifying potential bitter peptides from large-scale protein datasets. Although a few machine learning-based predictors have been developed for identifying the bitterness of peptides, their prediction performance could be improved. In this study, we developed a new predictor (named iBitter-Fuse) for achieving more accurate identification of bitter peptides. In the proposed iBitter-Fuse, we integrated a variety of feature encoding schemes to provide sufficient information from different aspects, namely compositional information and physicochemical properties. To enhance the predictive performance, a customized genetic algorithm utilizing self-assessment-report (GA-SAR) was employed to identify informative features, and the optimal ones were then input into a support vector machine (SVM)-based classifier to develop the final model (iBitter-Fuse). Benchmarking experiments based on both 10-fold cross-validation and independent tests indicated that iBitter-Fuse achieved more accurate performance than state-of-the-art methods. To facilitate the high-throughput identification of bitter peptides, the iBitter-Fuse web server was established and made freely available online. It is anticipated that iBitter-Fuse will be a useful tool for aiding the discovery and de novo design of bitter peptides.
50
Charoenkwan P, Chiangjong W, Hasan MM, Nantasenamat C, Shoombuatong W. Review and comparative analysis of machine learning-based predictors for predicting and analyzing of anti-angiogenic peptides. Curr Med Chem 2021; 29:849-864. [PMID: 34375178 DOI: 10.2174/0929867328666210810145806] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/22/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
Cancer is one of the leading causes of death worldwide, and angiogenesis, one of the hallmarks of cancer, underlies it. Efforts are under way to discover anti-angiogenic peptides (AAPs) as a promising therapeutic route that tackles the formation of new blood vessels. As such, the identification of AAPs constitutes a viable path for understanding their mechanistic properties pertinent to the discovery of new anti-cancer drugs. In spite of the abundance of peptide sequences in public databases, experimental efforts to identify anti-angiogenic peptides have progressed very slowly owing to their high cost and laborious nature. Owing to its inherent ability to make sense of large volumes of data, machine learning (ML) represents an attractive technique that can be harnessed for peptide-based drug discovery. In this review, we conducted a comprehensive and comparative analysis of ML-based AAP predictors in terms of their employed feature descriptors, ML algorithms, cross-validation methods and prediction performance. Moreover, the common framework of these AAP predictors and their inherent weaknesses are also discussed. In particular, we explore future perspectives for improving prediction accuracy and model interpretability, which represent an interesting avenue for overcoming some of the inherent weaknesses of existing AAP predictors. We anticipate that this review will assist researchers in the rapid screening and identification of promising AAPs for clinical use.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand
- Wararat Chiangjong
- Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, United States
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand