1
Artika IM, Arianti R, Demény MÁ, Kristóf E. RNA modifications and their role in gene expression. Front Mol Biosci 2025; 12:1537861. [PMID: 40351534; PMCID: PMC12061695; DOI: 10.3389/fmolb.2025.1537861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Accepted: 04/02/2025] [Indexed: 05/14/2025] [Open access]
Abstract
Post-transcriptional RNA modifications have recently emerged as critical regulators of gene expression programs. Understanding normal tissue development and disease susceptibility requires knowledge of the various cellular mechanisms that control gene expression in multicellular organisms. Research into how different RNA modifications, such as N6-methyladenosine (m6A), inosine (I), 5-methylcytosine (m5C), pseudouridine (Ψ), 5-hydroxymethylcytosine (hm5C), N1-methyladenosine (m1A), N6,2'-O-dimethyladenosine (m6Am), 2'-O-methylation (Nm), and N7-methylguanosine (m7G), affect the expression of genes could be valuable. This review highlights the current understanding of RNA modification, methods used to study RNA modification, types of RNA modification, and molecular mechanisms underlying RNA modification. The role of RNA modification in modulating gene expression in both physiological and diseased states is discussed. The potential applications of RNA modification in therapeutic development are elucidated.
Affiliation(s)
- I. Made Artika
  - Department of Biochemistry, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, Bogor, Indonesia
- Rini Arianti
  - Laboratory of Cell Biochemistry, Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
  - Universitas Muhammadiyah Bangka Belitung, Pangkalpinang, Indonesia
- Máté Á. Demény
  - Department of Medical Chemistry, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Endre Kristóf
  - Laboratory of Cell Biochemistry, Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
2
Li M, Li R, Zhang Y, Peng S, Lv Z. Using statistical analysis to explore the influencing factors of data imbalance for machine learning identification methods of human transcriptome m6A modification sites. Comput Biol Chem 2025; 115:108351. [PMID: 39837162; DOI: 10.1016/j.compbiolchem.2025.108351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2024] [Revised: 12/12/2024] [Accepted: 01/09/2025] [Indexed: 01/23/2025]
Abstract
RNA methylation, particularly through m6A modification, represents a crucial epigenetic mechanism that governs gene expression and influences a range of biological functions. Accurate identification of methylation sites is crucial for understanding their biological functions. Traditional experimental methods, however, are often costly and can be influenced by experimental conditions, making machine learning, especially deep learning techniques, a vital tool for m6A site identification. Despite their utility, current machine learning models struggle with unbalanced datasets, a common issue in bioinformatics. This study addresses the RNA methylation site data imbalance problem from three key perspectives: feature encoding representation, deep learning models, and data resampling strategies. Using the K-mer one-hot encoding strategy, we effectively extracted RNA sequence features and developed classification prediction models utilizing long short-term memory networks (LSTM) and their variant, Multiplicative LSTM (mLSTM). We further enhanced model performance with ensemble and weighted strategy models. Additionally, we utilized the sequence generative adversarial network (SeqGAN) and the synthetic minority over-sampling technique (SMOTE) to construct balanced datasets for RNA methylation sites. The prediction results were rigorously analyzed using the Wilcoxon test and multivariate linear regression to explore the effects of different K-mer values, model architectures, and sampling methods on classification outcomes. The analysis underscored the significant impact of feature selection, model architecture, and sampling techniques in addressing data imbalance. Notably, the optimal prediction performance was achieved with a K value of 5 using the mLSTM-ensemble model. These findings not only offer new insights and methodologies for RNA methylation site identification but also provide valuable guidance for addressing similar challenges in bioinformatics.
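Editorial illustration (not from the paper): the abstract names K-mer one-hot encoding but gives no details, so the following is only a minimal sketch of one common reading of the idea, in which each overlapping k-mer becomes a one-hot vector of length 4^k. The function name `kmer_one_hot`, the `ACGU` alphabet ordering, and the all-zero handling of unknown bases are assumptions, not the authors' implementation.

```python
from itertools import product


def kmer_one_hot(seq, k=3, alphabet="ACGU"):
    """Encode every overlapping k-mer of `seq` as a one-hot vector of length len(alphabet)**k.

    Hypothetical helper for illustration; the paper's exact encoding may differ.
    """
    # Assign each possible k-mer an index in lexicographic order of the alphabet.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vectors = []
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        vec = [0] * len(index)
        # k-mers containing characters outside the alphabet (e.g. N) stay all-zero.
        if kmer in index:
            vec[index[kmer]] = 1
        vectors.append(vec)
    return vectors


# A 5-nt toy sequence with k=2 yields 4 overlapping k-mers, each of dimension 16.
encoded = kmer_one_hot("GGACU", k=2)
print(len(encoded), len(encoded[0]))
```

Under this reading, the vector dimension grows as 4^k (1024 for the K value of 5 that the abstract reports as optimal), which is one reason the choice of K is worth analysing statistically.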
Affiliation(s)
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Rujun Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Yichi Zhang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Shiyu Peng
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China.
3
Su Q, Phan LT, Pham NT, Wei L, Manavalan B. MST-m6A: A Novel Multi-Scale Transformer-based Framework for Accurate Prediction of m6A Modification Sites Across Diverse Cellular Contexts. J Mol Biol 2025; 437:168856. [PMID: 39510345; DOI: 10.1016/j.jmb.2024.168856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 10/23/2024] [Accepted: 11/02/2024] [Indexed: 11/15/2024]
Abstract
N6-methyladenosine (m6A) modification, a prevalent epigenetic mark in eukaryotic cells, is crucial in regulating gene expression and RNA metabolism. Accurately identifying m6A modification sites is essential for understanding their functions within biological processes and the intricate mechanisms that regulate them. Recent advances in high-throughput sequencing technologies have enabled the generation of extensive datasets characterizing m6A modification sites at single-nucleotide resolution, leading to the development of computational methods for identifying m6A RNA modification sites. However, most current methods focus on specific cell lines, limiting their generalizability and practical application across diverse biological contexts. To address the limitation, we propose MST-m6A, a novel approach for identifying m6A modification sites with higher accuracy across various cell lines and tissues. MST-m6A utilizes a multi-scale transformer-based architecture, employing dual k-mer tokenization to capture rich feature representations and global contextual information from RNA sequences at multiple levels of granularity. These representations are then effectively combined using a channel fusion mechanism and further processed by a convolutional neural network to enhance prediction accuracy. Rigorous validation demonstrates that MST-m6A significantly outperforms conventional machine learning models, deep learning models, and state-of-the-art predictors. We anticipate that the high precision and cross-cell-type adaptability of MST-m6A will provide valuable insights into m6A biology and facilitate advancements in related fields. The proposed approach is available at https://github.com/cbbl-skku-org/MST-m6A/ for prediction and reproducibility purposes.
Affiliation(s)
- Qiaosen Su
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Le Thi Phan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Nhat Truong Pham
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Leyi Wei
  - Faculty of Applied Sciences, Macao Polytechnic University, Macau
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
4
Li G, Zhao B, Su X, Yang Y, Zeng Z, Hu P, Hu L. Capturing short-range and long-range dependencies of nucleotides for identifying RNA N6-methyladenosine modification sites. Comput Biol Med 2025; 186:109625. [PMID: 39756188; DOI: 10.1016/j.compbiomed.2024.109625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 11/17/2024] [Accepted: 12/23/2024] [Indexed: 01/07/2025]
Abstract
N6-methyladenosine (m6A) plays a crucial role in enriching RNA functional and genetic information, and the identification of m6A modification sites is therefore an important task to promote the understanding of RNA epigenetics. In the identification process, current studies are mainly concentrated on capturing the short-range dependencies between adjacent nucleotides in RNA sequences, while ignoring the impact of long-range dependencies between non-adjacent nucleotides for learning high-quality representation of RNA sequences. In this work, we propose an end-to-end prediction model, called m6ASLD, to improve the identification accuracy of m6A modification sites by capturing the short-range and long-range dependencies of nucleotides. Specifically, m6ASLD first encodes the type and position information of nucleotides to construct the initial embeddings of RNA sequences. A self-correlation map is then generated to characterize both short-range and long-range dependencies with a designed map generating block for each RNA sequence. After that, m6ASLD learns the global and local representations of RNA sequences by using a graph convolution process and a designed dependency searching block respectively, and finally achieves its identification task under a joint training scheme. Extensive experiments have demonstrated the promising performance of m6ASLD on 11 benchmark datasets across several evaluation metrics.
Affiliation(s)
- Guodong Li
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Bowei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Xiaorui Su
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Yue Yang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Zhi Zeng
  - College of Computer Science and Technology, Xi'an Jiaotong University, 710049, Xi'an, China.
- Pengwei Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
5
Chen J, Wu H, Zuo T, Wu J, Chen Z. METTL3‑mediated N6‑methyladenosine modification of MMP9 mRNA promotes colorectal cancer proliferation and migration. Oncol Rep 2025; 53:9. [PMID: 39540393; DOI: 10.3892/or.2024.8842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Accepted: 10/08/2024] [Indexed: 11/16/2024] [Open access]
Abstract
N6‑methyladenosine (m6A) is the predominant chemical modification of eukaryotic mRNA, dynamically mediated by the RNA methyltransferase, methyltransferase-like 3 (METTL3). m6A modification plays a critical role in cancer progression through post‑transcriptional regulation in various types of cancer. However, the role of METTL3 and its associated m6A modification in colorectal tumorigenesis remains to be fully elucidated. In the present study, it was demonstrated that METTL3 expression and the m6A levels were both upregulated in colorectal cancer (CRC) and positively associated with clinical progression, based on the bioinformatics analysis of cancer databases. Furthermore, knockdown and overexpression of METTL3 notably affected CRC cell viability, apoptosis and migration in vitro. Similarly, xenograft animal models confirmed that METTL3 promoted CRC tumorigenicity in vivo. Mechanistically, it was revealed that the m6A modification of matrix metallopeptidase 9 (MMP9) mRNA mediated by METTL3 promoted its expression in CRC by decreasing its degradation. Collectively, the findings of the present study suggested that the METTL3/MMP9 axis could serve as a novel promising therapeutic candidate for CRC.
Affiliation(s)
- Jie Chen
  - Department of Central Laboratory, The First Affiliated Hospital of Jiaxing University, Jiaxing, Zhejiang 314000, P.R. China
- Henglan Wu
  - Department of Nephrology, The First Affiliated Hospital of Jiaxing University, Jiaxing, Zhejiang 314000, P.R. China
- Ting Zuo
  - Department of Anesthesia Surgery, The First Affiliated Hospital of Jiaxing University, Jiaxing, Zhejiang 314000, P.R. China
- Jianming Wu
  - Department of Gastrointestinal Surgery, The First Affiliated Hospital of Jiaxing University, Jiaxing, Zhejiang 314000, P.R. China
- Zhiheng Chen
  - Department of Gastrointestinal Surgery, The First Affiliated Hospital of Jiaxing University, Jiaxing, Zhejiang 314000, P.R. China
6
Huang D, Meng J, Chen K. AI techniques have facilitated the understanding of epitranscriptome distribution. Cell Genomics 2024; 4:100718. [PMID: 39667349; PMCID: PMC11701248; DOI: 10.1016/j.xgen.2024.100718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Revised: 11/08/2024] [Accepted: 11/08/2024] [Indexed: 12/14/2024]
Abstract
N6-methyladenosine (m6A), the most prevalent internal mRNA modification in higher eukaryotes, plays diverse roles in cellular regulation. By incorporating both sequence- and genome-derived features, Fan et al. [1] designed a novel Transformer-BiGRU framework that achieves superior performance in computational m6A identification, thus demonstrating the potential of AI in genomic studies.
Affiliation(s)
- Daiyun Huang
  - Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fuzhou 350122, China; Wisdom Lake Academy of Pharmacy, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; School of Life Sciences, Fudan University, Shanghai 200092, China
- Jia Meng
  - Department of Biosciences and Bioinformatics, Center for Intelligent RNA Therapeutics, Suzhou Key Laboratory of Cancer Biology and Chronic Diseases, School of Science, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L7 8TX, UK
- Kunqi Chen
  - Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fuzhou 350122, China; Department of Medical Microbiology, Fujian Key Laboratory of Tumor Microbiology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China; School of Medical Technology and Engineering, Fujian Medical University, Fuzhou 350122, China.
7
Fan R, Cui C, Kang B, Chang Z, Wang G, Cui Q. A combined deep learning framework for mammalian m6A site prediction. Cell Genomics 2024; 4:100697. [PMID: 39571573; DOI: 10.1016/j.xgen.2024.100697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Revised: 09/17/2024] [Accepted: 10/28/2024] [Indexed: 12/14/2024]
Abstract
N6-methyladenosine (m6A) is the most prevalent chemical modification in eukaryotic mRNAs and plays key roles in diverse cellular processes. Precise localization of m6A sites is thus critical for characterizing the functional roles of m6A in various conditions and dissecting the mechanisms governing its deposition. Here, we design a combined framework of Transformer architecture and recurrent neural network, deepSRAMP, to identify m6A sites using sequence-based and genome-derived features. As a result, deepSRAMP achieves a notably enhanced performance compared to its predecessor, SRAMP, the most-used predictor in this field. Moreover, based on multiple benchmark datasets, deepSRAMP greatly outperforms other state-of-the-art m6A predictors, including WHISTLE and DeepPromise, with an average 16.1% and 18.3% increase in AUROC and a 43.9% and 46.4% increase in AUPRC. Finally, deepSRAMP can be successfully exploited on mammalian m6A epitranscriptome mapping under diverse cellular conditions and can potentially reveal differential m6A sites among transcript isoforms of individual genes.
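Editorial illustration (not from the paper): the comparisons above are reported in AUROC and AUPRC. As a reminder of what AUROC measures, the sketch below computes it directly from its pairwise definition, the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half. The function name `auroc` and the toy labels and scores are assumptions for illustration only.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 ground-truth values; scores: predicted scores.
    Quadratic in the number of sites, so suitable only for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
print(toy_auc)
```

Because m6A sites are rare relative to unmodified adenosines, AUROC can look optimistic on imbalanced data, which may be why the abstract reports AUPRC alongside it.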
Affiliation(s)
- Rui Fan
  - Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China
- Chunmei Cui
  - Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China.
- Boming Kang
  - Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China
- Zecheng Chang
  - State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, College of Basic Medicine, Jilin University, Changchun 130021, China
- Guoqing Wang
  - State Key Laboratory for Diagnosis and Treatment of Severe Zoonotic Infectious Diseases, College of Basic Medicine, Jilin University, Changchun 130021, China.
- Qinghua Cui
  - School of Sports Medicine, Wuhan Institute of Physical Education, No. 461 Luoyu Road, Wuchang District, Wuhan 430079, Hubei Province, China; Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, 38 Xueyuan Road, Beijing 100191, China; Department of Cardiology and Institute of Vascular Medicine, Peking University Third Hospital, 49 Huayuanbei Road, Beijing 100191, China.
8
Huang J, Wang X, Xia R, Yang D, Liu J, Lv Q, Yu X, Meng J, Chen K, Song B, Wang Y. Domain-knowledge enabled ensemble learning of 5-formylcytidine (f5C) modification sites. Comput Struct Biotechnol J 2024; 23:3175-3185. [PMID: 39253057; PMCID: PMC11381828; DOI: 10.1016/j.csbj.2024.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 08/07/2024] [Accepted: 08/07/2024] [Indexed: 09/11/2024] [Open access]
Abstract
5-formylcytidine (f5C) is a unique post-transcriptional RNA modification found in mRNA and tRNA at the wobble site, playing a crucial role in mitochondrial protein synthesis and potentially contributing to the regulation of translation. Recent studies have unveiled that the f5C modifications may drive mitochondrial mRNA translation to power cancer metastasis. Accurate identification of f5C sites is essential for further unraveling their molecular functions and regulatory mechanisms, but there are currently no computational methods available for predicting their locations. In this study, we introduce an innovative ensemble approach, successfully enabling the computational recognition of Saccharomyces cerevisiae f5C. We conducted a comprehensive model selection process that involved multiple basic machine learning and deep learning algorithms such as recurrent neural networks, convolutional neural networks and Transformer-based models. Initially trained only on sequence information, these individual models achieved an AUROC ranging from 0.7104 to 0.7492. Through the integration of 32 novel domain-derived genomic features, the performance of individual models has significantly improved to an AUROC between 0.7309 and 0.8076. To further enhance accuracy and robustness, we then constructed the ensembles of these individual models with different combinations. The best performance attained by our ensemble models reached an AUROC of 0.8391. Shapley additive explanations were conducted to explain the significant contributions of genomic features, providing insights into the putative distribution of f5C across various topological regions and potentially paving the way for revealing their functional relevance within distinct genomic contexts. A freely accessible web server that allows real-time analysis of user-uploaded sites can be accessed at: www.rnamd.org/Resf5C-Pred.
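Editorial illustration (not from the paper): the abstract says ensembles were built from different combinations of individual models but does not state the combination rule. The sketch below assumes the simplest option, weighted averaging of per-site probabilities (soft voting); the function name `soft_vote`, the weights, and the toy probabilities are hypothetical.

```python
def soft_vote(prob_lists, weights=None):
    """Combine per-site probabilities from several models by weighted averaging.

    prob_lists: one list of probabilities per model, all of the same length.
    weights: optional per-model weights; defaults to equal weighting.
    """
    if not prob_lists:
        raise ValueError("at least one model is required")
    n_sites = len(prob_lists[0])
    if any(len(p) != n_sites for p in prob_lists):
        raise ValueError("all models must score the same number of sites")
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    # Weighted mean of the model outputs at each candidate site.
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_sites)
    ]


# Two hypothetical models scoring the same two candidate sites.
combined = soft_vote([[0.2, 0.8], [0.4, 0.6]])
print(combined)
```

Averaging tends to help when the individual models make partly uncorrelated errors, which is consistent with the reported gain from the best single model (AUROC 0.8076) to the best ensemble (0.8391).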
Affiliation(s)
- Jiaming Huang
  - Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
  - Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Xuan Wang
  - Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Rong Xia
  - Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Dongqing Yang
  - Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jian Liu
  - Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Qi Lv
  - Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xiaoxuan Yu
  - Department of Pharmacology, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jia Meng
  - Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L7 8TX, United Kingdom
- Kunqi Chen
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
- Bowen Song
  - Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yue Wang
  - Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
9
Yuge CC, Hang ES, Mamtha MRN, Vishwakarma S, Wang S, Wang C, Le NQK. RNA-ModX: a multilabel prediction and interpretation framework for RNA modifications. Brief Bioinform 2024; 26:bbae688. [PMID: 39737566; DOI: 10.1093/bib/bbae688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Revised: 11/18/2024] [Accepted: 12/16/2024] [Indexed: 01/01/2025] [Open access]
Abstract
Accurate prediction of RNA modifications holds profound implications for elucidating RNA function and mechanism, with potential applications in drug development. Here, the RNA-ModX presents a highly precise predictive model designed to forecast post-transcriptional RNA modifications, complemented by a user-friendly web application tailored for seamless utilization by future researchers. To achieve exceptional accuracy, the RNA-ModX systematically explored a range of machine learning models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit, and Transformer-based architectures. The model underwent rigorous testing using a dataset comprising RNA sequences containing the four fundamental nucleotides (A, C, G, U) and spanning 12 prevalent modification classes (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um), with sequences of length 1001 nucleotides. Notably, the LSTM model, augmented with 3-mer encoding, demonstrated the highest level of model accuracy. Furthermore, Local Interpretable Model-Agnostic Explanations were employed to facilitate result interpretation, enhancing the transparency and interpretability of the model's predictions. In conjunction with the model development, a user-friendly web application was meticulously crafted, featuring an intuitive interface for researchers to effortlessly upload RNA sequences. Upon submission, the model executes in the backend, generating predictions which are seamlessly presented to the user in a coherent manner. This integration of cutting-edge predictive modeling with a user-centric interface signifies a significant step forward in facilitating the exploration and utilization of RNA modification prediction technologies by the broader research community.
Affiliation(s)
- Chelsea Chen Yuge
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Ee Soon Hang
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Shashikant Vishwakarma
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Sijia Wang
  - NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Cheng Wang
  - Independent Researcher, Singapore, Singapore
- Nguyen Quoc Khanh Le
  - In-Service Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
  - AIBioMed Research Group, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing Street, 110, Taipei, Taiwan
10
Kang Y, Wang H, Qin Y, Liu G, Yu Y, Zhang Y. PSATF-6mA: an integrated learning fusion feature-encoded DNA-6 mA methylcytosine modification site recognition model based on attentional mechanisms. Front Genet 2024; 15:1498884. [PMID: 39600317; PMCID: PMC11588721; DOI: 10.3389/fgene.2024.1498884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2024] [Accepted: 10/30/2024] [Indexed: 11/29/2024] [Open access]
Abstract
DNA methylation is of crucial importance for gene expression, for example in cell differentiation and tumour development. The identification of DNA 6mA sites using traditional experimental methods requires cumbersome steps and a large amount of time. The advent of neural network technology has facilitated the identification of 6mA sites on cross-species DNA with enhanced efficacy. Nevertheless, the majority of contemporary neural network models for identifying 6mA sites prioritize the design of the identification model, with comparatively limited research conducted on the statistically significant DNA sequence itself. Consequently, this paper focuses on the statistical strategy of DNA double-stranded features, utilising the multi-head self-attention mechanism in neural networks applied to DNA position probabilistic relationships. Furthermore, a new recognition model, PSATF-6mA, is constructed by continually adjusting the attentional tendency of feature fusion through an integrated learning framework. The experimental results, obtained through cross-validation with cross-species data, demonstrate that the PSATF-6mA model outperforms the baseline model. The Matthews correlation coefficient (MCC) for the cross-species dataset of rice and M. musculus genomes can reach a score of 0.982. The present model is expected to assist biologists in more accurately identifying 6mA loci and in formulating new testable biological hypotheses.
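Editorial illustration (not from the paper): the headline result above is a Matthews correlation coefficient. For readers unfamiliar with the metric, the sketch below computes it from confusion-matrix counts using the standard definition, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name `mcc`, the zero-denominator convention, and the toy counts are assumptions for illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy counts: a perfect classifier, an inverted one, and a coin flip.
print(mcc(50, 50, 0, 0), mcc(0, 0, 50, 50), mcc(25, 25, 25, 25))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative sites are unevenly represented.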
Affiliation(s)
- Yanmei Kang
  - School of Cyber Science and Engineering, University of International Relations, Beijing, China
- Hongyuan Wang
  - School of Cyber Science and Engineering, University of International Relations, Beijing, China
- Yubo Qin
  - School of Cyber Science and Engineering, University of International Relations, Beijing, China
- Guanlin Liu
  - School of Cyber Science and Engineering, University of International Relations, Beijing, China
- Yi Yu
  - College of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
- Yongjian Zhang
  - School of Cyber Science and Engineering, University of International Relations, Beijing, China
11
Du C, Fan W, Zhou Y. Integrated Biochemical and Computational Methods for Deciphering RNA-Processing Codes. Wiley Interdiscip Rev RNA 2024; 15:e1875. [PMID: 39523464; DOI: 10.1002/wrna.1875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 09/23/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
RNA processing involves steps such as capping, splicing, polyadenylation, modification, and nuclear export. These steps are essential for transforming genetic information in DNA into proteins and contribute to RNA diversity and complexity. Many biochemical methods have been developed to profile and quantify RNAs, as well as to identify the interactions between RNAs and RNA-binding proteins (RBPs), especially when coupled with high-throughput sequencing technologies. With the rapid accumulation of diverse data, it is crucial to develop computational methods to convert the big data into biological knowledge. In particular, machine learning and deep learning models are commonly utilized to learn the rules or codes governing the transformation from DNA sequences to intriguing RNAs based on manually designed or automatically extracted features. When precise enough, the RNA codes can be incredibly useful for predicting RNA products, decoding the molecular mechanisms, forecasting the impact of disease variants on RNA processing events, and identifying driver mutations. In this review, we systematically summarize the biochemical and computational methods for deciphering five important RNA codes related to alternative splicing, alternative polyadenylation, RNA localization, RNA modifications, and RBP binding. For each code, we review the main types of experimental methods used to generate training data, as well as the key features, strategic model structures, and advantages of representative tools. We also discuss the challenges encountered in developing predictive models using large language models and extensive domain knowledge. Additionally, we highlight useful resources and propose ways to improve computational tools for studying RNA codes.
Affiliation(s)
- Chen Du
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
| | - Weiliang Fan
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
| | - Yu Zhou
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
- Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, China
- State Key Laboratory of Virology, Wuhan University, Wuhan, China
|
12
|
Xia Y, Zhang Y, Liu D, Zhu YH, Wang Z, Song J, Yu DJ. BLAM6A-Merge: Leveraging Attention Mechanisms and Feature Fusion Strategies to Improve the Identification of RNA N6-Methyladenosine Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1803-1815. [PMID: 38913512 DOI: 10.1109/tcbb.2024.3418490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
RNA N6-methyladenosine is a prevalent and abundant type of RNA modification that exerts significant influence on diverse biological processes. To date, numerous computational approaches have been developed for predicting methylation, but most of them ignore the correlations between different encoding strategies and fail to explore the adaptability of various attention mechanisms for methylation identification. To address these issues, we proposed an innovative framework for predicting RNA m6A modification sites, termed BLAM6A-Merge. Specifically, it utilized a multimodal feature fusion strategy to combine the classification results of four features and the Blastn tool. In addition, different attention mechanisms were employed for extracting higher-level features on specific features after the screening process. Extensive experiments on 12 benchmarking datasets demonstrated that BLAM6A-Merge achieved superior performance (average AUC: 0.849 for the full transcript mode and 0.784 for the mature mRNA mode). Notably, the Blastn tool was employed for the first time in the identification of methylation sites.
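For readers unfamiliar with the AUC figures quoted in this abstract, the following is a minimal sketch of how an area under the ROC curve, and its average over several benchmark datasets, could be computed. It is not the authors' evaluation code; the function names `roc_auc` and `mean_auc` and the toy inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: sequence of 0/1 values (1 = methylated site, 0 = non-methylated).
    scores: sequence of predicted scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_dataset):
    """Average AUC over datasets given as (labels, scores) pairs."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_dataset]
    return sum(aucs) / len(aucs)
```

The quadratic pair loop is chosen for readability; a rank-based implementation would be preferable for large benchmark sets.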
|
13
|
Uddin I, Awan HH, Khalid M, Khan S, Akbar S, Sarker MR, Abdolrasol MGM, Alghamdi TAH. A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications. Sci Rep 2024; 14:20819. [PMID: 39242695 PMCID: PMC11379919 DOI: 10.1038/s41598-024-71568-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 08/29/2024] [Indexed: 09/09/2024] Open
Abstract
RNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. Oxidation by the TET enzyme is the crucial change associated with cytosine hydroxymethylation. The effect of CR is an alteration in specific biochemical ways of the organism, such as gene expression and epigenetic alterations. Traditional laboratory systems that identify 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming. To address this challenge, the paper proposed XGB5hmC, a machine learning model based on a robust gradient boosting algorithm (XGBoost), with different residue-based formulation methods to identify 5hmC samples. Their results were amalgamated, and six different frequency residue-based encoding features were fused to form a hybrid vector in order to enhance model discrimination capabilities. In addition, the proposed model incorporates SHAP (Shapley Additive Explanations) based feature selection to demonstrate model interpretability by highlighting the high contributory features. Among the applied machine learning algorithms, the XGBoost ensemble model using the tenfold cross-validation test achieved better results than existing state-of-the-art models. Our model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934, and MCC of 0.8764. This study highlights the potential to provide valuable insights for enhancing medical assessment and treatment protocols, representing a significant advancement in RNA modification analysis.
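The evaluation metrics listed in this abstract all derive from a binary confusion matrix. The sketch below shows one way to compute them; it is illustrative only, the function name `binary_metrics` and the example counts are hypothetical, and it assumes a non-degenerate matrix (at least one predicted positive and one actual positive). Accuracy, sensitivity, and specificity are often quoted as percentages, whereas F1 lies in [0, 1] and MCC in [-1, 1].

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes tp + fp > 0, tp + fn > 0 and tn + fp > 0 so no ratio is undefined.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }
```

For example, `binary_metrics(tp=8, fp=2, tn=7, fn=3)` gives an accuracy of 0.75, which would be reported as 75%, while its F1 and MCC values stay as plain ratios.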
Affiliation(s)
- Islam Uddin
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
| | - Hamid Hussain Awan
- Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
| | - Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, 21955, Saudi Arabia
| | - Salman Khan
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
| | - Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
| | - Mahidur R Sarker
- Institute of Visual Informatics, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia
- Universidad de Diseño, Innovación y Tecnología, UDIT, Av. Alfonso XIII, 97, 28016, Madrid, Spain
| | - Maher G M Abdolrasol
- Institute of Sustainable Energy, Universiti Tenaga Nasional, Kajang, 43000, Malaysia
| | - Thamer A H Alghamdi
- Wolfson Centre for Magnetics, School of Engineering, Cardiff University, Cardiff, CF24 3AA, UK
- Electrical Engineering Department, Faculty of Engineering, Al-Baha University, Al-Baha, 65779, Saudi Arabia
|
14
|
Jiang X, Zhan L, Tang X. RNA modifications in physiology and pathology: Progressing towards application in clinical settings. Cell Signal 2024; 121:111242. [PMID: 38851412 DOI: 10.1016/j.cellsig.2024.111242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 05/23/2024] [Accepted: 05/30/2024] [Indexed: 06/10/2024]
Abstract
The potential to modify individual nucleotides through chemical means in order to impact the electrostatic charge, hydrophobic properties, and base pairing of RNA molecules is harnessed in the medical application of stable synthetic RNAs like mRNA vaccines and synthetic small RNA molecules. These modifications are used to either increase or decrease the production of therapeutic proteins. Additionally, naturally occurring biochemical alterations of nucleotides play a role in regulating RNA metabolism and function, thereby modulating essential cellular processes. Research elucidating the mechanisms through which RNA modifications govern fundamental cellular functions in multicellular organisms has enhanced our comprehension of how irregular RNA modification profiles can lead to human diseases. Collectively, these fundamental scientific findings have unveiled the molecular and cellular functions of RNA modifications, offering new opportunities for therapeutic intervention and paving the way for a variety of innovative clinical strategies.
Affiliation(s)
- Xue Jiang
- College of Pharmacy and Traditional Chinese Medicine, Jiangsu College of Nursing, Huaian, Jiangsu 223005, China
| | - Lijuan Zhan
- College of Pharmacy and Traditional Chinese Medicine, Jiangsu College of Nursing, Huaian, Jiangsu 223005, China
| | - Xiaozhu Tang
- School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
|
15
|
Bortoletto E, Rosani U. Bioinformatics for Inosine: Tools and Approaches to Trace This Elusive RNA Modification. Genes (Basel) 2024; 15:996. [PMID: 39202357 PMCID: PMC11353476 DOI: 10.3390/genes15080996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 07/23/2024] [Accepted: 07/25/2024] [Indexed: 09/03/2024] Open
Abstract
Inosine is a nucleotide resulting from the deamination of adenosine in RNA. This chemical modification process, known as RNA editing, is typically mediated by a family of double-stranded RNA binding proteins named Adenosine Deaminase Acting on dsRNA (ADAR). While the presence of ADAR orthologs has been traced throughout the evolution of metazoans, the existence and extent of RNA editing have so far been characterized in a more limited number of animals. Undoubtedly, ADAR-mediated RNA editing plays a vital role in physiology, organismal development and disease, making the understanding of the evolutionary conservation of this phenomenon pivotal to a deep characterization of relevant biological processes. However, the lack of direct high-throughput methods to reveal RNA modifications at single-nucleotide resolution has limited extensive investigation of RNA editing. Such methods have now been developed, and appropriate bioinformatic pipelines are required to fully exploit these data, which can complement existing approaches to detect ADAR editing. Here, we review the current literature on the "bioinformatics for inosine" subject and we discuss future research avenues in the field.
Affiliation(s)
| | - Umberto Rosani
- Department of Biology, University of Padova, 35131 Padova, Italy
|
16
|
Wang J, Yang Z, Chen C, Yao G, Wan X, Bao S, Ding J, Wang L, Jiang H. MPEK: a multitask deep learning framework based on pretrained language models for enzymatic reaction kinetic parameters prediction. Brief Bioinform 2024; 25:bbae387. [PMID: 39129365 PMCID: PMC11317537 DOI: 10.1093/bib/bbae387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 06/24/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024] Open
Abstract
Enzymatic reaction kinetics are central in analyzing enzymatic reaction mechanisms and target-enzyme optimization, and thus in biomanufacturing and other industries. The enzyme turnover number (kcat) and Michaelis constant (Km), key kinetic parameters for measuring enzyme catalytic efficiency, are crucial for analyzing enzymatic reaction mechanisms and the directed evolution of target enzymes. Experimental determination of kcat and Km is costly in time, labor, and money. To consider the intrinsic connection between kcat and Km and further improve the prediction performance, we propose a universal pretrained multitask deep learning model, MPEK, to predict these parameters simultaneously while considering pH, temperature, and organismal information. When tested on the same kcat and Km test datasets, MPEK demonstrated superior prediction performance over the previous models. Specifically, MPEK achieved a Pearson coefficient of 0.808 for predicting kcat, improving ca. 14.6% and 7.6% compared to the DLKcat and UniKP models, and it achieved a Pearson coefficient of 0.777 for predicting Km, improving ca. 34.9% and 53.3% compared to the Kroll_model and UniKP models. More importantly, MPEK was able to reveal enzyme promiscuity and was sensitive to slight changes in the mutant enzyme sequence. In addition, three case studies showed that MPEK has the potential for assisted enzyme mining and directed evolution. To facilitate in silico evaluation of enzyme catalytic efficiency, we have established a web server implementing this model, which can be accessed at http://mathtc.nscc-tj.cn/mpek.
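As background for the two parameters this abstract predicts, the sketch below evaluates the standard Michaelis-Menten rate law, v = kcat·[E]·[S] / (Km + [S]), and the catalytic efficiency kcat/Km. It is textbook kinetics rather than anything taken from MPEK; the function names and the numeric values are hypothetical, and units are assumed to be consistent (for instance kcat in 1/s and concentrations in mM).

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Initial reaction rate under the Michaelis-Menten model.

    kcat: turnover number; km: Michaelis constant; both concentrations must
    use the same unit as km.
    """
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency, commonly reported as kcat / Km."""
    return kcat / km


# When the substrate concentration equals Km, the rate is half of
# Vmax = kcat * [E], which is the defining property of Km.
vmax = 10.0 * 1.0
half_max = michaelis_menten_rate(kcat=10.0, km=2.0, enzyme_conc=1.0, substrate_conc=2.0)
```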
Affiliation(s)
- Jingjing Wang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Zhijiang Yang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Chang Chen
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Ge Yao
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Xiukun Wan
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Shaoheng Bao
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
|
17
|
Song R, Sutton GJ, Li F, Liu Q, Wong JJL. Variable calling of m6A and associated features in databases: a guide for end-users. Brief Bioinform 2024; 25:bbae434. [PMID: 39258883 PMCID: PMC11388104 DOI: 10.1093/bib/bbae434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 07/01/2024] [Accepted: 08/19/2024] [Indexed: 09/12/2024] Open
Abstract
N6-methyladenosine (m6A) is a widely studied methylation of messenger RNAs, which has been linked to diverse cellular processes and human diseases. Numerous databases that collate m6A profiles of distinct cell types have been created to facilitate quick and easy mining of m6A signatures associated with cell-specific phenotypes. However, these databases contain inherent complexities that have not been explicitly reported, which may lead to inaccurate identification and interpretation of m6A-associated biology by end-users who are unaware of them. Here, we review various m6A-related databases and highlight several critical matters. In particular, differences in peak-calling pipelines across databases drive substantial variability in both peak number and coordinates with only moderate reproducibility, and the inclusion of peak calls from early m6A sequencing protocols may lead to the reporting of false positives or negatives. Awareness of these matters will help end-users avoid the inclusion of potentially unreliable data in their studies and better utilize m6A databases to derive biologically meaningful results.
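One simple way to quantify the kind of peak-call variability described here is to ask what fraction of peaks from one source overlap any peak from another. The sketch below does this for peaks given as `(chrom, start, end)` tuples with half-open coordinates. It is an illustration under those assumptions, not the procedure used in the review; the function names and the toy peak lists are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_reproduced(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return hits / len(peaks_a)


# Toy peak calls from two hypothetical databases for the same cell type.
db_one = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
db_two = [("chr1", 150, 250), ("chr2", 300, 400)]

one_in_two = fraction_reproduced(db_one, db_two)
two_in_one = fraction_reproduced(db_two, db_one)
```

The measure is asymmetric, so it is worth reporting in both directions when the two sources differ in peak number. The nested scan is quadratic; an interval index would be needed for genome-scale peak sets.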
Affiliation(s)
- Renhua Song
- Epigenetics and RNA Biology Laboratory, School of Medical Sciences, The University of Sydney, Camperdown, NSW 2050, Australia
- Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2050, Australia
| | - Gavin J Sutton
- Epigenetics and RNA Biology Laboratory, School of Medical Sciences, The University of Sydney, Camperdown, NSW 2050, Australia
- Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2050, Australia
| | - Fuyi Li
- College of Information Engineering, Northwest A&F University, Yangling 712100, Shaanxi, China
- South Australian immunoGENomics Cancer Institute (SAiGENCI), The University of Adelaide, Adelaide, South Australia 5005, Australia
| | - Qian Liu
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Maryland Pkwy, NV 89154, United States
- School of Life Sciences, College of Sciences, University of Nevada, Las Vegas, Maryland Pkwy, NV 89154, United States
| | - Justin J-L Wong
- Epigenetics and RNA Biology Laboratory, School of Medical Sciences, The University of Sydney, Camperdown, NSW 2050, Australia
- Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2050, Australia
|
18
|
Song M, Zhao J, Zhang C, Jia C, Yang J, Zhao H, Zhai J, Lei B, Tao S, Chen S, Su R, Ma C. PEA-m6A: an ensemble learning framework for accurately predicting N6-methyladenosine modifications in plants. Plant Physiology 2024; 195:1200-1213. [PMID: 38428981 DOI: 10.1093/plphys/kiae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 01/11/2024] [Accepted: 02/01/2024] [Indexed: 03/03/2024]
Abstract
N6-methyladenosine (m6A), which is the most prevalent modification in eukaryotic mRNAs, is involved in gene expression regulation and many RNA metabolism processes. Accurate prediction of m6A modification is important for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are species-centric. Here we present PEA-m6A, a unified, modularized and parameterized framework that can streamline m6A-Seq data analysis for predicting m6A-modified regions in plant genomes. The PEA-m6A framework builds ensemble learning-based m6A prediction models with statistic-based and deep learning-driven features, achieving superior performance with an improvement of 6.7% to 23.3% in the area under the precision-recall curve compared with the state-of-the-art regional-scale m6A predictor WeakRM in 12 plant species. In particular, PEA-m6A is capable of leveraging knowledge from pretrained models via transfer learning, which can improve prediction accuracy of m6A modifications under small-sample training tasks. PEA-m6A also has a strong capability for generalization, making it suitable for application in within- and cross-species m6A prediction. Overall, this study presents a promising m6A prediction tool, PEA-m6A, with outstanding performance in terms of its accuracy, flexibility, transferability, and generalization ability. PEA-m6A has been packaged using Galaxy and Docker technologies for ease of use and is publicly available at https://github.com/cma2015/PEA-m6A.
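The abstract does not specify which statistic-based features PEA-m6A uses. As an assumed, commonly used example of such a feature, the sketch below turns an RNA sequence into a fixed-length vector of k-mer frequencies. The function name `kmer_frequencies` and its defaults are hypothetical and are not part of the PEA-m6A package.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    DNA-style input is accepted: T is converted to U. Windows containing
    characters outside the alphabet (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # An all-zero vector is returned when no valid window exists.
    return [counts[m] / total if total else 0.0 for m in kmers]
```

A vector like this has `len(alphabet) ** k` entries regardless of sequence length, which is what lets regions of different sizes feed the same downstream classifier.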
Affiliation(s)
- Minggui Song
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Jiawen Zhao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Chujun Zhang
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Chengchao Jia
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Jing Yang
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Haonan Zhao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Jingjing Zhai
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
| | - Beilei Lei
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Shiheng Tao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Siqi Chen
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
| | - Ran Su
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
| | - Chuang Ma
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
|
19
|
Liu Q, Yang GH, Wang NZ, Wang XC, Zhang ZL, Qiao LJ, Cui WJ. Dexmedetomidine suppressed the biological behavior of RAW264.7 cells treated with LPS by down-regulating HOTAIR. Heliyon 2024; 10:e27690. [PMID: 38533037 PMCID: PMC10963246 DOI: 10.1016/j.heliyon.2024.e27690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 03/05/2024] [Accepted: 03/05/2024] [Indexed: 03/28/2024] Open
Abstract
Background: Previous studies have revealed that dexmedetomidine has potential protective effects on vital organs by inhibiting the release of inflammatory cytokines. This study investigated the effects of dexmedetomidine on sepsis, especially in the initial inflammatory stage, using RAW264.7 cells as the cell model to elucidate the underlying mechanisms. Methods: We conducted several assays to investigate the mechanisms of dexmedetomidine and HOTAIR in sepsis. Cell viability was assessed using the CCK-8 kit, while inflammatory responses were measured using ELISA for IL-1β, IL-6, and TNF-α. Additionally, we employed qPCR, MeRIP, and RIP to further explore the underlying mechanisms. Results: Dexmedetomidine treatment enhanced cell viability and reduced the production of inflammatory cytokines in LPS-treated RAW264.7 cells. Furthermore, the expression of HOTAIR was increased in LPS-treated RAW264.7 cells and was decreased by dexmedetomidine pre-treatment. Further investigation demonstrated that HOTAIR could counteract the beneficial effects of dexmedetomidine on cell viability and cytokine production. Interestingly, we discovered that YTHDF1 targeted HOTAIR and was upregulated in LPS-treated RAW264.7 cells, but reduced by dexmedetomidine treatment. We also found that YTHDF1 increased HOTAIR and HOTAIR m6A levels. Conclusions: Collectively, our results suggest that dexmedetomidine downregulates HOTAIR and YTHDF1 expression, which in turn inhibits the biological behavior of LPS-treated RAW264.7 cells. This finding has potential implications for the prevention and treatment of sepsis-induced kidney injury.
Affiliation(s)
- Qin Liu
- Intensive Care Unit, Shengli Oilfield Central Hospital, Dongying, Shandong, 257000, China
| | - Guang-Hu Yang
- Intensive Care Unit, Shengli Oilfield Central Hospital, Dongying, Shandong, 257000, China
| | - Nai-Zhi Wang
- Department of Respiratory and Critical Care Medicine, Jinan Central Hospital, Jinan, Shandong, 250013, China
| | - Xin-Cheng Wang
- Intensive Care Unit, Shengli Oilfield Central Hospital, Dongying, Shandong, 257000, China
| | - Zhao-Long Zhang
- Intensive Care Unit, Shengli Oilfield Central Hospital, Dongying, Shandong, 257000, China
| | - Lu-Jun Qiao
- Intensive Care Unit, Shengli Oilfield Central Hospital, Dongying, Shandong, 257000, China
| | - Wen-Juan Cui
- Intensive Care Unit, Shengli Oilfield Central Hospital, Dongying, Shandong, 257000, China
|
20
|
Tu G, Wang X, Xia R, Song B. m6A-TCPred: a web server to predict tissue-conserved human m6A sites using machine learning approach. BMC Bioinformatics 2024; 25:127. [PMID: 38528499 PMCID: PMC10962094 DOI: 10.1186/s12859-024-05738-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 03/11/2024] [Indexed: 03/27/2024] Open
Abstract
BACKGROUND: N6-methyladenosine (m6A) is the most prevalent post-transcriptional modification in eukaryotic cells and plays a crucial role in regulating various biological processes; dysregulation of m6A status is involved in multiple human diseases, including cancer. A number of prediction frameworks have been proposed for high-accuracy identification of putative m6A sites; however, none has targeted the direct discrimination of tissue-conserved m6A-modified residues from non-conserved ones at base resolution. RESULTS: We report here m6A-TCPred, a computational tool for predicting tissue-conserved m6A residues using m6A profiling data from 23 human tissues. By taking advantage of traditional sequence-based characteristics and additional genome-derived information, m6A-TCPred successfully captured distinct patterns between potentially tissue-conserved m6A modifications and non-conserved ones, with an average AUROC of 0.871 and 0.879 on cross-validation and independent datasets, respectively. CONCLUSION: Our results have been integrated into an online platform comprising a database holding 268,115 high-confidence m6A sites with their conservation information across 23 human tissues and a web server to predict the conserved status of user-provided m6A collections. The web interface of m6A-TCPred is freely accessible at www.rnamd.org/m6ATCPred.
Affiliation(s)
- Gang Tu
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
| | - Xuan Wang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L7 8TX, UK
| | - Rong Xia
- Department of Financial and Actuarial Mathematics, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
| | - Bowen Song
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing, 210023, China
|
21
|
Wang R, Chung CR, Lee TY. Interpretable Multi-Scale Deep Learning for RNA Methylation Analysis across Multiple Species. Int J Mol Sci 2024; 25:2869. [PMID: 38474116 DOI: 10.3390/ijms25052869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 02/19/2024] [Accepted: 02/28/2024] [Indexed: 03/14/2024] Open
Abstract
RNA modification plays a crucial role in cellular regulation. However, traditional high-throughput sequencing methods for elucidating their functional mechanisms are time-consuming and labor-intensive, despite extensive research. Moreover, existing methods often limit their focus to specific species, neglecting the simultaneous exploration of RNA modifications across diverse species. Therefore, a versatile computational approach is necessary for interpretable analysis of RNA modifications across species. A multi-scale biological language-based deep learning model is proposed for interpretable, sequential-level prediction of diverse RNA modifications. Benchmark comparisons across species demonstrate the model's superiority in predicting various RNA methylation types over current state-of-the-art methods. The cross-species validation and attention weight visualization also highlight the model's capability to capture sequential and functional semantics from genomic backgrounds. Our analysis of RNA modifications helps us find the potential existence of "biological grammars" in each modification type, which could be effective for mapping methylation-related sequential patterns and understanding the underlying biological mechanisms of RNA modifications.
Affiliation(s)
- Rulan Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan
| | - Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
|
22
|
Yang HZ, Zhuo D, Huang Z, Luo G, Liang S, Fan Y, Zhao Y, Lv X, Qiu C, Zhang L, Liu Y, Sun T, Chen X, Li SS, Jin X. Deficiency of Acetyltransferase nat10 in Zebrafish Causes Developmental Defects in the Visual Function. Invest Ophthalmol Vis Sci 2024; 65:31. [PMID: 38381411 PMCID: PMC10893899 DOI: 10.1167/iovs.65.2.31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 01/19/2024] [Indexed: 02/22/2024] Open
Abstract
Purpose: N4-acetylcytidine (ac4C) is a post-transcriptional RNA modification catalyzed by N-acetyltransferase 10 (NAT10), a critical factor known to influence mRNA stability. However, the role of ac4C in visual development remains unexplored. Methods: Analysis of public datasets and immunohistochemical staining were conducted to assess the expression pattern of nat10 in zebrafish. We used CRISPR/Cas9 and RNAi technologies to knock out (KO) and knock down (KD) nat10, the zebrafish ortholog of human NAT10, and evaluated its effects on early development. To assess the impact of nat10 knockdown on visual function, we performed comprehensive histological evaluations and behavioral analyses. Transcriptome profiling and real-time (RT)-PCR were utilized to detect alterations in gene expression resulting from the nat10 knockdown. Dot-blot and RNA immunoprecipitation (RIP)-PCR analyses were conducted to verify changes in ac4C levels in both total RNA and opsin mRNA specifically. Additionally, we used the actinomycin D assay to examine the stability of opsin mRNA following the nat10 KD. Results: Our study found that the zebrafish NAT10 protein shares similar structural properties with its human counterpart. We observed that the nat10 gene was prominently expressed in the visual system during early zebrafish development. A deficiency of nat10 in zebrafish embryos resulted in increased mortality and developmental abnormalities. Behavioral and histological assessments indicated significant vision impairment in nat10 KD zebrafish. Transcriptomic analysis and RT-PCR identified substantial downregulation of retinal transcripts related to phototransduction, light response, photoreceptors, and visual perception in the nat10 KD group. Dot-blot and RIP-PCR analyses confirmed a pronounced reduction in ac4C levels in both total RNA and specifically in opsin messenger RNA (mRNA). Additionally, by evaluating mRNA decay in zebrafish treated with actinomycin D, we observed a significant decrease in the stability of opsin mRNA in the nat10 KD group. Conclusions: The ac4C-mediated mRNA modification plays an essential role in maintaining visual development and retinal function. The loss of NAT10-mediated ac4C modification results in significant disruptions to these processes, underlining the importance of this RNA modification in ocular development.
Affiliation(s)
| | - Donghai Zhuo
- School of Medicine, Nankai University, Tianjin, China
| | | | - Gan Luo
- Tianjin Medical University, Tianjin, China
- Department of Spinal Surgery, Tianjin Union Medical Center, Tianjin, China
| | - Shuang Liang
- Tianjin Central Hospital of Gynecology and Obstetrics, Tianjin, China
| | - Yonggang Fan
- School of Medicine, Nankai University, Tianjin, China
| | - Ying Zhao
- School of Medicine, Nankai University, Tianjin, China
| | - Xinxin Lv
- School of Medicine, Nankai University, Tianjin, China
| | - Caizhen Qiu
- School of Medicine, Nankai University, Tianjin, China
| | - Lingzhu Zhang
- School of Medicine, Nankai University, Tianjin, China
| | - Yang Liu
- Department of Spinal Surgery, Tianjin Union Medical Center, Tianjin, China
| | - Tianwei Sun
- Tianjin Medical University, Tianjin, China
- Department of Spinal Surgery, Tianjin Union Medical Center, Tianjin, China
| | - Xu Chen
- Tianjin Central Hospital of Gynecology and Obstetrics, Tianjin, China
- Tianjin Key Laboratory of Human Development and Reproductive Regulation, Tianjin, China
| | - Shan-Shan Li
- School of Medicine, Nankai University, Tianjin, China
| | - Xin Jin
- School of Medicine, Nankai University, Tianjin, China
- Tianjin Central Hospital of Gynecology and Obstetrics, Tianjin, China
- Tianjin Key Laboratory of Human Development and Reproductive Regulation, Tianjin, China
| |
|
23
|
Wang H, Huang T, Wang D, Zeng W, Sun Y, Zhang L. MSCAN: multi-scale self- and cross-attention network for RNA methylation site prediction. BMC Bioinformatics 2024; 25:32. [PMID: 38233745 PMCID: PMC10795237 DOI: 10.1186/s12859-024-05649-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/24/2023] [Accepted: 01/11/2024] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND: Epi-transcriptome regulation through post-transcriptional RNA modifications is essential for all RNA types. Precise recognition of RNA modifications is critical for understanding their functions and regulatory mechanisms. However, wet-lab experimental methods are often costly and time-consuming, limiting their wide range of applications. Therefore, recent research has focused on developing computational methods, particularly deep learning (DL). Bidirectional long short-term memory (BiLSTM), convolutional neural network (CNN), and Transformer models have all shown success in modification site prediction. However, BiLSTM cannot be computed in parallel, leading to long training times; CNN cannot learn long-range dependencies within the sequence; and the Transformer lacks information interaction between sequences at different scales. These limitations underscore the need for continued research in natural language processing (NLP) and DL to devise an enhanced prediction framework that addresses them. RESULTS: This study presents a multi-scale self- and cross-attention network (MSCAN) to identify RNA methylation sites using NLP and DL techniques. Experimental results on twelve RNA modification types (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) show that MSCAN achieves areas under the receiver operating characteristic curve (AUROC) of 98.34%, 85.41%, 97.29%, 96.74%, 99.04%, 79.94%, 76.22%, 65.69%, 92.92%, 92.03%, 95.77%, and 89.66%, respectively, outperforming the state-of-the-art prediction model. This indicates that the model has strong generalization capabilities. Furthermore, MSCAN reveals a strong association among different types of RNA modifications from an experimental perspective. A user-friendly web server for predicting twelve widely occurring human RNA modification sites (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um) is available at http://47.242.23.141/MSCAN/index.php. CONCLUSIONS: A predictor framework has been developed through binary classification to predict RNA methylation sites.
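As a reading aid for the AUROC percentages quoted in this abstract, the sketch below computes the area under the ROC curve through the pairwise-ranking identity: the fraction of positive/negative pairs in which the positive site receives the higher score, with ties counted as half. It is not taken from MSCAN; the function name `auroc` and the toy labels and scores are illustrative assumptions.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney identity).

    labels: iterable of 0/1 ground-truth classes (1 = modified site).
    scores: iterable of predicted scores, higher meaning "more likely modified".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two modified and two unmodified sites.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auroc(labels, scores))  # 0.75
```

The quadratic pair loop is chosen for readability; on large benchmark sets a rank-based implementation would normally be used instead.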
Affiliation(s)
- Honglei Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, 221400, China
| | - Tao Huang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Dong Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
| | - Wenliang Zeng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Yanjing Sun
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China.
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China.
| |
|
24
|
Ji Y, Sun J, Xie J, Wu W, Shuai SC, Zhao Q, Chen W. m5UMCB: Prediction of RNA 5-methyluridine sites using multi-scale convolutional neural network with BiLSTM. Comput Biol Med 2024; 168:107793. [PMID: 38048661 DOI: 10.1016/j.compbiomed.2023.107793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 12/06/2023]
Abstract
As a prevalent RNA modification, 5-methyluridine (m5U) plays a critical role in diverse biological processes and disease pathogenesis. High-throughput identification of m5U typically relies on labor-intensive biochemical experiments using various sequencing-based techniques, which are both time-consuming and expensive. Consequently, there is a pressing need for more efficient and cost-effective computational methods to complement these high-throughput techniques. In this study, we present m5UMCB, a novel approach that combines a multi-scale convolutional neural network (CNN) with bidirectional long short-term memory (BiLSTM) to recognize m5U sites. Our method segments RNA sequences into 3-mer fragments and then maps each fragment to a lower-dimensional vector representation using the global vectors for word representation (GloVe) technique. Through a series of multi-scale convolution and pooling operations, local features are extracted from RNA sequences and transformed into abstract, high-level features. The feature matrix is then fed into a BiLSTM network, which captures contextual information and long-term dependencies within the sequence. Finally, a fully connected layer classifies m5U sites. Results from 5-fold cross-validation (5-fold CV) indicate that m5UMCB outperforms existing state-of-the-art predictive methods, with a 1.98% increase in the area under the ROC curve (AUC) and significant improvements in other evaluation metrics. We are confident that m5UMCB will serve as a valuable tool for m5U prediction.
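The 3-mer segmentation step described in this abstract can be sketched as follows. The abstract does not say whether fragments overlap, so the stride of 1 is an assumption, as are the helper names `kmers` and `build_vocab`. The GloVe embedding itself is not reproduced here, only the tokenisation and the integer vocabulary that an embedding lookup would need.

```python
def kmers(seq, k=3, stride=1):
    """Split an RNA sequence into k-mers (overlapping when stride < k)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens):
    """Assign each distinct k-mer an integer id in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


# Hypothetical 7-nt window around a candidate site.
tokens = kmers("GGACUAC")
print(tokens)                      # ['GGA', 'GAC', 'ACU', 'CUA', 'UAC']
vocab = build_vocab(tokens)
print([vocab[t] for t in tokens])  # [0, 1, 2, 3, 4]
```

A sequence shorter than `k` yields an empty list, which a downstream model would have to pad or reject.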
Affiliation(s)
- Yingshan Ji
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi, 276000, China
| | - Jingxuan Xie
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Wei Wu
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Stella C Shuai
- Biological Science, Northwestern University, Evanston, IL, 60208, USA
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.
| | - Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China.
| |
|
25
|
Jiang Y, Yang K, Jia B, Gao Y, Chen Y, Chen P, Lu X, Zhang W, Wang X. Nicotine destructs dental stem cell-based periodontal tissue regeneration. J Dent Sci 2024; 19:231-245. [PMID: 38303843 PMCID: PMC10829564 DOI: 10.1016/j.jds.2023.04.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 04/18/2023] [Indexed: 02/03/2024] Open
Abstract
Background/purpose: Nicotine is a widely known addictive and toxic substance in cigarettes that exacerbates periodontitis. However, its deleterious effects on dental stem cells and the subsequent implications for tissue regeneration remain unclear. This study aimed to explore the effects of nicotine on the regenerative capacity of human periodontal ligament stem cells (hPDLSCs) based on transcriptomics and proteomics, and to determine possible target genes associated with smoking-related periodontitis. Materials and methods: hPDLSCs were treated with nicotine at concentrations ranging from 10⁻³ to 10⁻⁸ M. Transcriptomics and proteomics were performed and confirmed using Western blot, 5-ethynyl-2'-deoxyuridine (EdU), and alkaline phosphatase (ALP) staining. A ligature-induced periodontitis mouse model was established and administered nicotine (16.2 μg/10 μL) via the gingival sulcus. Bone resorption was assessed by micro-computed tomography and histological staining. Key genes were identified using multi-omics analysis and verified in hPDLSCs and human periodontal tissues. Results: Based on enrichment analysis, nicotine-treated hPDLSCs exhibited decreased proliferation and differentiation abilities. Local administration of nicotine in the mouse model significantly aggravated bone resorption and undermined periodontal tissue regeneration by inhibiting the regenerative ability of endogenous dental stem cells. HMGCS1, GPNMB, and CHRNA7 were hub genes according to the network analysis and correlated with proliferation and differentiation capabilities, which was also verified in both cells and tissues. Conclusion: Our study investigated the destructive effects of nicotine on the regeneration of periodontal tissues in vitro and in vivo, supported by both transcriptome and proteome data, providing novel targets for studying the molecular mechanisms of smoking-related periodontitis.
Affiliation(s)
- Yuran Jiang
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Stomatology, School of Stomatology, Fourth Military Medical University, Xi'an, Shaanxi, China
| | - Kuan Yang
- School of Stomatology, Qingdao University, Qingdao, Shandong, China
| | - Bo Jia
- State Key Laboratory of Cancer Biology, Biotechnology Center, School of Pharmacy, Fourth Military Medical University, Xi'an, Shaanxi, China
| | - Yuan Gao
- State Key Laboratory of Cancer Biology, Biotechnology Center, School of Pharmacy, Fourth Military Medical University, Xi'an, Shaanxi, China
- School of Biomedical Science, Li Ka-shing School of Medicine, Hong Kong University, Hong Kong, China
| | - Yujiang Chen
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Stomatology, School of Stomatology, Fourth Military Medical University, Xi'an, Shaanxi, China
| | - Peng Chen
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Stomatology, School of Stomatology, Fourth Military Medical University, Xi'an, Shaanxi, China
| | - Xiaoxi Lu
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Stomatology, School of Stomatology, Fourth Military Medical University, Xi'an, Shaanxi, China
| | - Wei Zhang
- State Key Laboratory of Cancer Biology, Biotechnology Center, School of Pharmacy, Fourth Military Medical University, Xi'an, Shaanxi, China
| | - Xiaojing Wang
- State Key Laboratory of Military Stomatology, National Clinical Research Center for Oral Diseases, Shaanxi Key Laboratory of Stomatology, School of Stomatology, Fourth Military Medical University, Xi'an, Shaanxi, China
| |
|
26
|
Wang H, Zeng W, Huang X, Liu Z, Sun Y, Zhang L. MTTLm6A: A multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:272-299. [PMID: 38303423 DOI: 10.3934/mbe.2024013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
N6-methyladenosine (m6A) is a crucial RNA modification involved in various biological activities. Computational methods have been developed for the detection of m6A sites in Saccharomyces cerevisiae at base-resolution due to their cost-effectiveness and efficiency. However, the generalization of these methods has been hindered by limited base-resolution datasets. Additionally, RMBase contains a vast number of low-resolution m6A sites for Saccharomyces cerevisiae, and base-resolution sites are often inferred from these low-resolution results through post-calibration. We propose MTTLm6A, a multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. First, the RNA sequences are encoded by using one-hot encoding. Then, we construct a multi-task model that combines a convolutional neural network with a multi-head-attention deep framework. This model not only detects low-resolution m6A sites, it also assigns reasonable probabilities to the predicted sites. Finally, we employ transfer learning to predict base-resolution m6A sites based on the low-resolution m6A sites. Experimental results on Saccharomyces cerevisiae m6A and Homo sapiens m1A data demonstrate that MTTLm6A respectively achieved area under the receiver operating characteristic (AUROC) values of 77.13% and 92.9%, outperforming the state-of-the-art models. At the same time, it shows that the model has strong generalization ability. To enhance user convenience, we have made a user-friendly web server for MTTLm6A publicly available at http://47.242.23.141/MTTLm6A/index.php.
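One-hot encoding, the first step named in this abstract, maps each nucleotide to a four-element indicator vector. A minimal sketch follows, assuming an A/C/G/U column order and an all-zero row for ambiguous bases; neither choice is stated in the abstract, and `one_hot` is a hypothetical helper rather than MTTLm6A code.

```python
ALPHABET = "ACGU"  # assumed column order; the abstract does not specify one


def one_hot(seq):
    """Encode an RNA sequence as a list of 4-element 0/1 rows.

    Bases outside ALPHABET (for example 'N') become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq]


# Hypothetical 5-nt window centred on a candidate adenosine.
for base, row in zip("GGACU", one_hot("GGACU")):
    print(base, row)
# G [0, 0, 1, 0]
# G [0, 0, 1, 0]
# A [1, 0, 0, 0]
# C [0, 1, 0, 0]
# U [0, 0, 0, 1]
```

The resulting length-by-4 matrix is the usual input shape for a convolutional layer that scans along the sequence axis.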
Affiliation(s)
- Honglei Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, China
| | - Wenliang Zeng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Xiaoling Huang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Zhaoyang Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Yanjing Sun
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
| |
|
27
|
Zhang Y, Wang Z, Zhang Y, Li S, Guo Y, Song J, Yu DJ. Interpretable prediction models for widespread m6A RNA modification across cell lines and tissues. Bioinformatics 2023; 39:btad709. [PMID: 37995291 PMCID: PMC10697738 DOI: 10.1093/bioinformatics/btad709] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/22/2023] [Revised: 11/01/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023] Open
Abstract
MOTIVATION: RNA N6-methyladenosine (m6A) in Homo sapiens plays vital roles in a variety of biological functions. Precise identification of m6A modifications is thus essential to elucidating their biological functions and underlying molecular-level mechanisms. Currently available high-throughput single-nucleotide-resolution m6A modification data have considerably accelerated the identification of RNA modification sites through the development of data-driven computational methods. Nevertheless, existing methods cover only a limited range of cell lines at single-nucleotide resolution and offer poor model interpretability, which limits their applicability. RESULTS: In this study, we present CLSM6A, a set of deep learning-based models designed for predicting single-nucleotide-resolution m6A RNA modification sites across eight different cell lines and three tissues. Extensive benchmarking experiments on well-curated datasets show that CLSM6A outperforms current state-of-the-art methods. Furthermore, CLSM6A is capable of interpreting the prediction decision-making process by excavating critical motifs activated by filters and pinpointing highly concerned positions in both forward and backward propagations. CLSM6A exhibits better portability on similar cross-cell line/tissue datasets, reveals a strong association between highly activated motifs and high-impact motifs, and demonstrates complementary attributes of different interpretation strategies. AVAILABILITY AND IMPLEMENTATION: The webserver is available at http://csbio.njust.edu.cn/bioinf/clsm6a. The datasets and code are available at https://github.com/zhangying-njust/CLSM6A/.
Affiliation(s)
- Ying Zhang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Yiwen Zhang
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
| | - Shanshan Li
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
| | - Yuming Guo
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| |
|
28
|
Yang Y, Liu Z, Lu J, Sun Y, Fu Y, Pan M, Xie X, Ge Q. Analysis approaches for the identification and prediction of N6-methyladenosine sites. Epigenetics 2023; 18:2158284. [PMID: 36562485 PMCID: PMC9980620 DOI: 10.1080/15592294.2022.2158284] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022] Open
Abstract
Mapping m6A sites, particularly across the full transcriptome, can reveal the global dynamics of a variety of biological processes. Individual m6A sites also contribute to biological function, and their contribution can be evaluated using stoichiometric information obtained at single-nucleotide resolution. Currently, m6A sites are identified mainly by experimental and prediction methods, based on high-throughput sequencing and machine learning models, respectively. This review summarizes recent topics and progress in bioinformatics methods for deciphering m6A methylation, including the experimental detection of m6A methylation sites, data analysis techniques, approaches for predicting m6A methylation sites, m6A methylation databases, and the detection of m6A modification in circRNA. It closes with a brief discussion of development perspectives in this area.
Affiliation(s)
- Yuwei Yang
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
| | - Zhiyu Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
| | - Junru Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
| | - Yuqing Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
| | - Yue Fu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
| | - Min Pan
- Department of Pathology and Pathophysiology School of Medicine, Southeast University, Nanjing, China
| | - Xueying Xie
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
| | - Qinyu Ge
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
| |
|
29
|
Zhang J, Li Y, Zhang J, Liu L, Chen Y, Yang X, Liao X, He M, Jia Z, Fan J, Bian JS, Nie X. ADAR1 regulates vascular remodeling in hypoxic pulmonary hypertension through N1-methyladenosine modification of circCDK17. Acta Pharm Sin B 2023; 13:4840-4855. [PMID: 38045055 PMCID: PMC10692360 DOI: 10.1016/j.apsb.2023.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 06/13/2023] [Accepted: 07/05/2023] [Indexed: 12/05/2023] Open
Abstract
Pulmonary hypertension (PH) is an extremely malignant pulmonary vascular disease of unknown etiology. ADAR1 is an RNA editing enzyme that converts adenosine in RNA to inosine, thereby affecting RNA expression. However, the role of ADAR1 in PH development remains unclear. In the present study, we investigated the biological role and molecular mechanism of ADAR1 in PH pulmonary vascular remodeling. Overexpression of ADAR1 aggravated PH progression and promoted the proliferation of pulmonary artery smooth muscle cells (PASMCs). Conversely, inhibition of ADAR1 produced the opposite effects. High-throughput whole-transcriptome sequencing showed that ADAR1 was an important regulator of circRNAs in PH. The circCDK17 level was significantly lower in the serum of PH patients. The effects of ADAR1 on cell cycle progression and proliferation were mediated by circCDK17. ADAR1 affects the stability of circCDK17 by mediating A-to-I modification at the A5 and A293 sites of circCDK17, which prevents m1A modification at these sites. We demonstrate for the first time that ADAR1 contributes to PH development, at least partially, through m1A modification of circCDK17 and the subsequent proliferation of PASMCs. Our study provides a novel therapeutic strategy for the treatment of PH and evidence for circCDK17 as a potential novel marker for the diagnosis of this disease.
Affiliation(s)
- Junting Zhang
- Shenzhen Institute of Respiratory Disease, Shenzhen Key Laboratory of Respiratory Disease, Shenzhen People's Hospital (the Second Clinical Medical College, Jinan University; the First Affiliated Hospital, Southern University of Science and Technology), Shenzhen 518020, China; Post-Doctoral Scientific Research Station of Basic Medicine, Jinan University, Guangzhou 510632, China
| | - Yiying Li
- Shenzhen Institute of Respiratory Disease, Shenzhen Key Laboratory of Respiratory Disease, Shenzhen People's Hospital (the Second Clinical Medical College, Jinan University; the First Affiliated Hospital, Southern University of Science and Technology), Shenzhen 518020, China; Post-Doctoral Scientific Research Station of Basic Medicine, Jinan University, Guangzhou 510632, China
| | - Jianchao Zhang
- Shenzhen Institute of Respiratory Disease, Shenzhen Key Laboratory of Respiratory Disease, Shenzhen People's Hospital (the Second Clinical Medical College, Jinan University; the First Affiliated Hospital, Southern University of Science and Technology), Shenzhen 518020, China; Post-Doctoral Scientific Research Station of Basic Medicine, Jinan University, Guangzhou 510632, China
| | - Lu Liu
- Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen 518055, China
| | - Yuan Chen
- Lung Transplant Group, Wuxi People's Hospital Affiliated to Nanjing Medical University, Wuxi 211103, China
| | - Xusheng Yang
- Lung Transplant Group, Wuxi People's Hospital Affiliated to Nanjing Medical University, Wuxi 211103, China
| | - Xueyi Liao
- Shenzhen Institute of Respiratory Disease, Shenzhen Key Laboratory of Respiratory Disease, Shenzhen People's Hospital (the Second Clinical Medical College, Jinan University; the First Affiliated Hospital, Southern University of Science and Technology), Shenzhen 518020, China; Post-Doctoral Scientific Research Station of Basic Medicine, Jinan University, Guangzhou 510632, China
| | - Muhua He
- Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen 518055, China
| | - Zihui Jia
- Lung Transplant Group, Wuxi People's Hospital Affiliated to Nanjing Medical University, Wuxi 211103, China
| | - Jun Fan
- Department of Medical Biochemistry and Molecular Biology, School of Medicine, Jinan University, Guangzhou 510632, China
| | - Jin-Song Bian
- Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen 518055, China
| | - Xiaowei Nie
- Shenzhen Institute of Respiratory Disease, Shenzhen Key Laboratory of Respiratory Disease, Shenzhen People's Hospital (the Second Clinical Medical College, Jinan University; the First Affiliated Hospital, Southern University of Science and Technology), Shenzhen 518020, China; Post-Doctoral Scientific Research Station of Basic Medicine, Jinan University, Guangzhou 510632, China
| |
|
30
|
Zhang M, Zhao J, Wu J, Wang Y, Zhuang M, Zou L, Mao R, Jiang B, Liu J, Song X. In-depth characterization and identification of translatable lncRNAs. Comput Biol Med 2023; 164:107243. [PMID: 37453378 DOI: 10.1016/j.compbiomed.2023.107243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
Long non-coding RNAs (lncRNAs) are non-protein-coding transcripts more than 200 nucleotides in length. Deep sequencing technologies have revealed that lncRNAs can harbor translatable short open reading frames (sORFs). Yet the regulatory mechanisms governing lncRNA translation events remain poorly understood. Here, we exhaustively examined the sequence, functional element, and structure features relevant to lncRNA translation in humans. Extensive identification and analysis reveal that translatable lncRNAs contain richer protein-coding-related sequence features, cap-dependent and cap-independent translation initiation mechanisms, and more stable secondary structures than untranslatable lncRNAs. These findings strongly support the view that lncRNAs serve as a repository for the production of new small peptides. Based on a fusion of features affecting translation and the extreme gradient boosting (XGBoost) algorithm, we developed the first computational tool dedicated to predicting translatable lncRNAs, named TransLncPred. Benchmark experimental results show that our method outperforms several state-of-the-art RNA coding potential prediction tools on the same training and testing datasets. The 100-time 10-fold cross-validation tests also demonstrate that regulatory element-derived features, especially N7-methylguanosine (m7G) and internal ribosome entry site (IRES), contribute to the improvement in predictive performance.
Affiliation(s)
- Meng Zhang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Jian Zhao
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China.
| | - Jing Wu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
| | - Yulan Wang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Minhui Zhuang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Lingxiao Zou
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Renlong Mao
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Bin Jiang
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Jingjing Liu
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
| | - Xiaofeng Song
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China.
| |
|
31
|
Zhang Y, Ge F, Li F, Yang X, Song J, Yu DJ. Prediction of Multiple Types of RNA Modifications via Biological Language Model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3205-3214. [PMID: 37289599 DOI: 10.1109/tcbb.2023.3283985] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/10/2023]
Abstract
It has been demonstrated that RNA modifications play essential roles in multiple biological processes. Accurate identification of RNA modifications in the transcriptome is critical for providing insights into their biological functions and mechanisms. Many tools have been developed for predicting RNA modifications at single-base resolution. These tools employ conventional feature engineering, whose feature design and feature selection steps require extensive biological expertise and may introduce redundant information. With the rapid development of artificial intelligence technologies, end-to-end methods have been favorably received by researchers. Nevertheless, in nearly all of these approaches each trained model is suitable for only one specific type of RNA methylation modification. In this study, we present MRM-BERT, which feeds task-specific sequences into the powerful BERT (Bidirectional Encoder Representations from Transformers) model and fine-tunes it, achieving performance competitive with the state-of-the-art methods. MRM-BERT avoids repeated de novo training of the model and can predict multiple RNA modifications such as pseudouridine, m6A, m5C, and m1A in Mus musculus, Arabidopsis thaliana, and Saccharomyces cerevisiae. In addition, we analyse the attention heads to identify high-attention regions for the prediction, and conduct saturated in silico mutagenesis of the input sequences to discover potential changes in RNA modifications, which can better assist researchers in their follow-up research.
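Saturated in silico mutagenesis, as mentioned in this abstract, means scoring every single-nucleotide substitution of an input sequence and comparing each score with that of the unmutated sequence. The sketch below shows the enumeration only. `toy_score` is a hypothetical stand-in for a trained predictor such as MRM-BERT, and all function names are assumptions rather than the authors' code.

```python
def saturated_mutants(seq, alphabet="ACGU"):
    """Yield (position, ref_base, alt_base, mutant_seq) for every substitution."""
    for i, ref in enumerate(seq):
        for alt in alphabet:
            if alt != ref:
                yield i, ref, alt, seq[:i] + alt + seq[i + 1:]


def mutation_effects(seq, score):
    """Map each substitution to the change in score relative to the wild type."""
    baseline = score(seq)
    return {
        (i, ref, alt): score(mutant) - baseline
        for i, ref, alt, mutant in saturated_mutants(seq)
    }


def toy_score(seq):
    """Hypothetical scorer: 1.0 if the sequence contains GGACU, else 0.0."""
    return 1.0 if "GGACU" in seq else 0.0


effects = mutation_effects("AGGACUA", toy_score)
print(len(effects))            # 21 substitutions: 7 positions x 3 alternatives
print(effects[(0, "A", "C")])  # 0.0, the flanking base does not affect the motif
print(effects[(3, "A", "C")])  # -1.0, the central A is required by the toy scorer
```

With a real model, `score` would wrap a batched forward pass, since a sequence of length L requires 3L + 1 evaluations.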
|
32
|
Liang S, Zhao Y, Jin J, Qiao J, Wang D, Wang Y, Wei L. Rm-LR: A long-range-based deep learning model for predicting multiple types of RNA modifications. Comput Biol Med 2023; 164:107238. [PMID: 37515874 DOI: 10.1016/j.compbiomed.2023.107238] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent research has highlighted the pivotal role of RNA post-transcriptional modifications in the regulation of RNA expression and function. Accurate identification of RNA modification sites is important for understanding RNA function. In this study, we propose a novel RNA modification prediction method, namely Rm-LR, which leverages a long-range-based deep learning approach to accurately predict multiple types of RNA modifications using RNA sequences only. Rm-LR incorporates two large-scale RNA language pre-trained models to capture discriminative sequential information and learn local important features, which are subsequently integrated through a bilinear attention network. Rm-LR supports a total of ten RNA modification types (m6A, m1A, m5C, m5U, m6Am, Ψ, Am, Cm, Gm, and Um) and significantly outperforms the state-of-the-art methods in terms of predictive capability on benchmark datasets. Experimental results show the effectiveness and superiority of Rm-LR in prediction of various RNA modifications, demonstrating the strong adaptability and robustness of our proposed model. We demonstrate that RNA language pre-trained models can learn dense biological sequential representations from a large-scale long-range RNA corpus, while also enhancing the interpretability of the models. This work contributes to the development of accurate and reliable computational models for RNA modification prediction, providing insights into the complex landscape of RNA modifications.
Affiliation(s)
- Sirui Liang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
| | - Yanxi Zhao
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
| | - Junru Jin
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
| | - Jianbo Qiao
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
| | - Ding Wang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
| | - Yu Wang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
| | - Leyi Wei
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China.
|
33
|
Song B, Huang D, Zhang Y, Wei Z, Su J, Pedro de Magalhães J, Rigden DJ, Meng J, Chen K. m6A-TSHub: Unveiling the Context-specific m6A Methylation and m6A-affecting Mutations in 23 Human Tissues. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:678-694. [PMID: 36096444 PMCID: PMC10787194 DOI: 10.1016/j.gpb.2022.09.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/12/2021] [Revised: 08/19/2022] [Accepted: 09/02/2022] [Indexed: 06/15/2023]
Abstract
As the most pervasive epigenetic marker present on mRNAs and long non-coding RNAs (lncRNAs), N6-methyladenosine (m6A) RNA methylation has been shown to participate in essential biological processes. Recent studies have revealed the distinct patterns of m6A methylome across human tissues, and a major challenge remains in elucidating the tissue-specific presence and circuitry of m6A methylation. We present here a comprehensive online platform, m6A-TSHub, for unveiling the context-specific m6A methylation and genetic mutations that potentially regulate m6A epigenetic mark. m6A-TSHub consists of four core components, including (1) m6A-TSDB, a comprehensive database of 184,554 functionally annotated m6A sites derived from 23 human tissues and 499,369 m6A sites from 25 tumor conditions, respectively; (2) m6A-TSFinder, a web server for high-accuracy prediction of m6A methylation sites within a specific tissue from RNA sequences, which was constructed using multi-instance deep neural networks with gated attention; (3) m6A-TSVar, a web server for assessing the impact of genetic variants on tissue-specific m6A RNA modifications; and (4) m6A-CAVar, a database of 587,983 The Cancer Genome Atlas (TCGA) cancer mutations (derived from 27 cancer types) that were predicted to affect m6A modifications in the primary tissue of cancers. The database should make a useful resource for studying the m6A methylome and the genetic factors of epitranscriptome disturbance in a specific tissue (or cancer type). m6A-TSHub is accessible at www.xjtlu.edu.cn/biologicalsciences/m6ats.
Affiliation(s)
- Bowen Song
- Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China; Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Daiyun Huang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Department of Computer Science, University of Liverpool, Liverpool L69 7ZB, United Kingdom.
| | - Yuxin Zhang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
| | - Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Jionglong Su
- School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
| | - João Pedro de Magalhães
- Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Jia Meng
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
| | - Kunqi Chen
- Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China.
|
34
|
Wang Y, Tai S, Zhang S, Sheng N, Xie X. PromGER: Promoter Prediction Based on Graph Embedding and Ensemble Learning for Eukaryotic Sequence. Genes (Basel) 2023; 14:1441. [PMID: 37510345 PMCID: PMC10379012 DOI: 10.3390/genes14071441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 07/04/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023] Open
Abstract
Promoters are non-coding DNA regions around the transcription start site and are responsible for regulating the gene transcription process. Due to their key role in gene function and transcriptional activity, the accurate prediction of promoter sequences and their core elements is a crucial research area in bioinformatics. At present, models based on machine learning and deep learning have been developed for promoter prediction. However, these models can neither mine the deeper biological information of promoter sequences nor consider the complex relationships among promoter sequences. In this work, we propose a novel prediction model called PromGER to predict eukaryotic promoter sequences. For a promoter sequence, firstly, PromGER utilizes four types of feature-encoding methods to extract local information within promoter sequences. Secondly, according to the potential relationships among promoter sequences, the whole set of promoter sequences is constructed as a graph. Furthermore, three different scales of graph-embedding methods are applied to obtain the global feature information in the graph more comprehensively. Finally, combining local features with global features of sequences, PromGER analyzes and predicts promoter sequences through a tree-based ensemble-learning framework. Compared with seven existing methods, PromGER improved average specificity by 13%, accuracy by 10%, Matthews correlation coefficient by 16%, precision by 4%, F1 score by 6%, and AUC by 9%. In addition, this study interpreted PromGER using the t-distributed stochastic neighbor embedding (t-SNE) method and SHapley Additive exPlanations (SHAP) value analysis, which demonstrates the interpretability of the model.
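For readers unfamiliar with the evaluation measures listed in this abstract, the following sketch shows how specificity, accuracy, precision, F1 score, and the Matthews correlation coefficient follow from a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the PromGER paper; AUC is omitted because it requires ranked scores rather than counts.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common classification metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```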
Affiliation(s)
- Yan Wang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
| | - Shiwen Tai
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
| | - Shuangquan Zhang
- School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Nan Sheng
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
| | - Xuping Xie
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
|
35
|
Wang R, Cheng X, Chi D, Liu S, Li Q, Chen B, Xi M. M1A and m7G modification-related genes are potential biomarkers for survival prognosis and for deciphering the tumor immune microenvironment in esophageal squamous cell carcinoma. Discov Oncol 2023; 14:99. [PMID: 37314494 DOI: 10.1007/s12672-023-00710-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 06/01/2023] [Indexed: 06/15/2023] Open
Abstract
BACKGROUND Esophageal squamous cell carcinoma (ESCC) is the most common esophageal malignancy, and RNA methylation has been reported to be involved in the tumorigenesis of ESCC. However, no study has explored m1A and m7G methylation modifications as prognostic markers for survival prediction in ESCC. METHODS Public gene-expression data and clinical annotation of 254 patients obtained from The Cancer Genome Atlas and the Gene Expression Omnibus databases were analyzed to identify potential consensus clusters of m1A and m7G modification-related genes. The RNA-seq data of 20 patients from Sun Yat-Sen University Cancer Center (SYSUCC) were used as the validation set. Relevant differentially expressed genes (DEGs) were then screened and enriched pathways were elucidated. DEGs were used to construct risk models using the randomForest algorithm, and the prognostic role of the models was assessed by applying Kaplan-Meier analysis. The extent of immune cell infiltration, drug resistance, and response to cancer treatment among different clusters and risk groups were also evaluated. RESULTS Consensus clustering analysis based on m1A and m7G modification patterns revealed three potential clusters. In total, 212 RNA methylation-related DEGs were identified. A methylation-associated signature consisting of 6 genes was then constructed to calculate a methylation-related score (MRScore), and patients were divided into MRScore-high and MRScore-low groups. This signature showed satisfactory prognostic value for survival in ESCC (AUC = 0.66, 0.67, 0.64 for 2-, 3-, 4-year OS) and satisfactory performance in the SYSUCC validation cohort (AUC = 0.66 for 2- and 3-year OS). Significant correlations between m1A and m7G modification-related genes and both immune cell infiltration and drug resistance were also observed.
CONCLUSIONS Transcriptomic prognostic signatures based on m1A and m7G modification-related genes are closely associated with immune cell infiltration in ESCC patients and have important correlations with the therapeutic sensitivity of multiple chemotherapeutic agents.
Affiliation(s)
- Ruixi Wang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Centre for Cancer Medicine, Guangdong Esophageal Cancer Institute, Guangzhou, China
- Department of Radiation Oncology, Sun Yat-Sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, China
| | - Xingyuan Cheng
- State Key Laboratory of Oncology in South China, Collaborative Innovation Centre for Cancer Medicine, Guangdong Esophageal Cancer Institute, Guangzhou, China
- Department of Radiation Oncology, Sun Yat-Sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, China
| | - Dongmei Chi
- State Key Laboratory of Oncology in South China, Collaborative Innovation Centre for Cancer Medicine, Guangdong Esophageal Cancer Institute, Guangzhou, China
- Department of Anesthesiology, Sun Yat-Sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, China
| | - Shiliang Liu
- State Key Laboratory of Oncology in South China, Collaborative Innovation Centre for Cancer Medicine, Guangdong Esophageal Cancer Institute, Guangzhou, China
- Department of Radiation Oncology, Sun Yat-Sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, China
| | - Qiaoqiao Li
- State Key Laboratory of Oncology in South China, Collaborative Innovation Centre for Cancer Medicine, Guangdong Esophageal Cancer Institute, Guangzhou, China
- Department of Radiation Oncology, Sun Yat-Sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, China
| | - Baoqing Chen
- State Key Laboratory of Oncology in South China, Collaborative Innovation Centre for Cancer Medicine, Guangdong Esophageal Cancer Institute, Guangzhou, China.
- Department of Radiation Oncology, Sun Yat-Sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, China.
| | - Mian Xi
- State Key Laboratory of Oncology in South China, Collaborative Innovation Centre for Cancer Medicine, Guangdong Esophageal Cancer Institute, Guangzhou, China.
- Department of Radiation Oncology, Sun Yat-Sen University Cancer Center, No. 651 Dongfeng East Road, Guangzhou, 510060, China.
|
36
|
Yu L, Zhang Y, Xue L, Liu F, Jing R, Luo J. Evaluation and development of deep neural networks for RNA 5-Methyluridine classifications using autoBioSeqpy. Front Microbiol 2023; 14:1175925. [PMID: 37275146 PMCID: PMC10232852 DOI: 10.3389/fmicb.2023.1175925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 04/27/2023] [Indexed: 06/07/2023] Open
Abstract
Post-transcriptional RNA modifications, also known as the epitranscriptome, play crucial roles in the regulation of gene expression during development. Recently, deep learning (DL) has been employed for RNA modification site prediction and has shown promising results. However, due to the lack of relevant studies, it is unclear which DL architecture is best suited for some pyrimidine modifications, such as 5-methyluridine (m5U). To fill this knowledge gap, we first performed a comparative evaluation of various commonly used DL models for epigenetic studies with the help of autoBioSeqpy. We identified optimal architectural variations for m5U site classification, optimizing the layer depth and neuron width. Second, we used this knowledge to develop Deepm5U, an improved convolutional-recurrent neural network that accurately predicts m5U sites from RNA sequences. We successfully applied Deepm5U to transcriptome-wide m5U profiling data across different sequencing technologies and cell types. Third, we showed that techniques for interpreting deep neural networks, including LayerUMAP and DeepSHAP, can provide important insights into the internal operation and behavior of models. Overall, we offered practical guidance for the development, benchmarking, and analysis of deep learning models when designing new algorithms for RNA modifications.
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
| | - Yonglin Zhang
- Department of Pharmacy, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
| | - Li Xue
- School of Public Health, Southwest Medical University, Luzhou, China
| | - Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
| | - Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu, China
| | - Jiesi Luo
- Basic Medical College, Southwest Medical University, Luzhou, China
- Sichuan Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, Southwest Medical University, Luzhou, China
|
37
|
Wang L, Tang Y. N6-methyladenosine (m6A) in cancer stem cell: From molecular mechanisms to therapeutic implications. Biomed Pharmacother 2023; 163:114846. [PMID: 37167725 DOI: 10.1016/j.biopha.2023.114846] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/18/2023] [Revised: 04/21/2023] [Accepted: 05/04/2023] [Indexed: 05/13/2023] Open
Abstract
The emergence of drug resistance and metastasis has long been a difficult problem for cancer treatment. Recent studies have shown that cancer stem cell populations are key factors in the regulation of cancer aggressiveness, relapse and drug resistance. Cancer stem cell (CSC) populations are highly plastic and self-renewing, giving them unique metabolic, metastatic, and chemotherapy resistance properties. N6-methyladenosine (m6A) is the most abundant internal modification of mRNA and is involved in a variety of cell growth and development processes, including RNA transcription, alternative splicing, degradation, and translation. It has also been linked to the development of various cancers. At present, the important role of m6A in tumour progression is gradually attracting attention, especially in the tumour stemness regulation process. Abnormal m6A modifications regulate tumour metastasis, recurrence and drug resistance. This paper aims to explore the regulatory mechanism of m6A in CSCs and clinical therapy, clarify its regulatory network, and provide theoretical guidance for the development of clinical targets and improvement of therapeutic effects.
Affiliation(s)
- Liming Wang
- Department of General Surgery, The Fourth Affiliated Hospital of China Medical University, Shenyang, P.R. China
| | - Yuanxin Tang
- Department of General Surgery, The Fourth Affiliated Hospital of China Medical University, Shenyang, P.R. China.
|
38
|
Acera Mateos P, Zhou Y, Zarnack K, Eyras E. Concepts and methods for transcriptome-wide prediction of chemical messenger RNA modifications with machine learning. Brief Bioinform 2023; 24:7150742. [PMID: 37139545 DOI: 10.1093/bib/bbad163] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2023] [Revised: 03/03/2023] [Indexed: 05/05/2023] Open
Abstract
The expanding field of epitranscriptomics might rival the epigenome in the diversity of biological processes impacted. In recent years, the development of new high-throughput experimental and computational techniques has been a key driving force in discovering the properties of RNA modifications. Machine learning applications, such as for classification, clustering or de novo identification, have been critical in these advances. Nonetheless, various challenges remain before the full potential of machine learning for epitranscriptomics can be leveraged. In this review, we provide a comprehensive survey of machine learning methods to detect RNA modifications using diverse input data sources. We describe strategies to train and test machine learning methods and to encode and interpret features that are relevant for epitranscriptomics. Finally, we identify some of the current challenges and open questions about RNA modification analysis, including the ambiguity in predicting RNA modifications in transcript isoforms or in single nucleotides, or the lack of complete ground truth sets to test RNA modifications. We believe this review will inspire and benefit the rapidly developing field of epitranscriptomics in addressing the current limitations through the effective use of machine learning.
Affiliation(s)
- Pablo Acera Mateos
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
| | - You Zhou
- Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
| | - Kathi Zarnack
- Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
| | - Eduardo Eyras
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
|
39
|
Wang R, Chung CR, Huang HD, Lee TY. Identification of species-specific RNA N6-methyladinosine modification sites from RNA sequences. Brief Bioinform 2023; 24:7008797. [PMID: 36715277 DOI: 10.1093/bib/bbac573] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/07/2022] [Revised: 11/11/2022] [Accepted: 11/24/2022] [Indexed: 01/31/2023] Open
Abstract
N6-methyladenosine (m6A) modification is the most abundant co-transcriptional modification in eukaryotic RNA and plays important roles in cellular regulation. Traditional high-throughput sequencing experiments used to explore functional mechanisms are time-consuming and labor-intensive, and most of the proposed methods focus on a limited range of species. To further understand the relevant biological mechanisms among different species with the same RNA modification, it is necessary to develop a computational scheme that can be applied to different species. To achieve this, we proposed an attention-based deep learning method, adaptive-m6A, which consists of a convolutional neural network, a bi-directional long short-term memory network and an attention mechanism, to identify m6A sites in multiple species. In addition, three conventional machine learning (ML) methods, including support vector machine, random forest and logistic regression classifiers, were considered in this work. Beyond the performance of the ML methods for multi-species prediction, the best-performing adaptive-m6A model yielded an accuracy of 0.9832 and an area under the receiver operating characteristic curve of 0.98. Moreover, motif analysis and cross-validation among different species were conducted to test the robustness of one model towards multiple species, which helped improve our understanding of the sequence characteristics and biological functions of RNA modifications in different species.
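The area under the receiver operating characteristic curve reported in this abstract can be read as the probability that a randomly chosen positive site is scored above a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are assumptions for illustration and are unrelated to the adaptive-m6A data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of examples, so suited to small illustrations only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example_auc = auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```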
Affiliation(s)
- Rulan Wang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
| | - Chia-Ru Chung
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Life Sciences, University of Science and Technology of China, 230026, Hefei, Anhui, P.R. China
| | - Hsien-Da Huang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
| | - Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
|
40
|
M6A-BERT-Stacking: A Tissue-Specific Predictor for Identifying RNA N6-Methyladenosine Sites Based on BERT and Stacking Strategy. Symmetry (Basel) 2023. [DOI: 10.3390/sym15030731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023] Open
Abstract
As the most abundant RNA methylation modification, N6-methyladenosine (m6A) could regulate asymmetric and symmetric division of hematopoietic stem cells and play an important role in various diseases. Therefore, the precise identification of m6A sites around the genomes of different species is a critical step to further revealing their biological functions and influence on these diseases. However, the traditional wet-lab experimental methods for identifying m6A sites are often laborious and expensive. In this study, we proposed an ensemble deep learning model called m6A-BERT-Stacking, a powerful predictor for the detection of m6A sites in various tissues of three species. First, we utilized two encoding methods, i.e., di ribonucleotide index of RNA (DiNUCindex_RNA) and k-mer word segmentation, to extract RNA sequence features. Second, two encoding matrices together with the original sequences were respectively input into three different deep learning models in parallel to train three sub-models, namely residual networks with convolutional block attention module (Resnet-CBAM), bidirectional long short-term memory with attention (BiLSTM-Attention), and pre-trained bidirectional encoder representations from transformers model for DNA-language (DNABERT). Finally, the outputs of all sub-models were ensembled based on the stacking strategy to obtain the final prediction of m6A sites through the fully connected layer. The experimental results demonstrated that m6A-BERT-Stacking outperformed most of the existing methods based on the same independent datasets.
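The k-mer word segmentation named in this abstract can be sketched as a sliding window over the sequence. The window size `k = 3` and the stride of one are assumptions made here for illustration; the abstract does not state the settings the authors used.

```python
from typing import List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (stride 1).

    Returns an empty list when the sequence is shorter than k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


tokens = kmer_tokens("AUGGCA", k=3)
```

Each token can then be mapped to a vocabulary index before being passed to a language model, in the same way words are handled in text.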
|
41
|
Liu Y, Wang S, Li X, Liu Y, Zhu X. NeuroPpred-SVM: A New Model for Predicting Neuropeptides Based on Embeddings of BERT. J Proteome Res 2023; 22:718-728. [PMID: 36749151 DOI: 10.1021/acs.jproteome.2c00363] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/08/2023]
Abstract
Neuropeptides play pivotal roles in different physiological processes and are related to different kinds of diseases. Identification of neuropeptides is of great benefit for studying the mechanism of these physiological processes and the treatment of neurological disorders. Several state-of-the-art neuropeptide predictors have been developed by using a two-layer stacking ensemble algorithm. Although the two-layer stacking ensemble algorithm can improve the feature representability, these models are complex and not as efficient as models based on a single classifier. In this study, we proposed a new model, NeuroPpred-SVM, to predict neuropeptides based on the embeddings of Bidirectional Encoder Representations from Transformers and other sequential features by using a support vector machine (SVM). The experimental results indicate that our model achieved a cross-validation area under the receiver operating characteristic (AUROC) curve of 0.969 on the training data set and an AUROC of 0.966 on the independent test set. Compared with four other state-of-the-art models (NeuroPIpred, PredNeuroP, NeuroPpred-Fuse, and NeuroPpred-FRL) on the independent test set, our model achieved the highest AUROC, Matthews correlation coefficient, accuracy, and specificity, which indicates that our model outperforms the existing models. We believe that NeuroPpred-SVM could be a useful tool for identifying neuropeptides with high accuracy and low cost. The data sets and Python code are available at https://github.com/liuyf-a/NeuroPpred-SVM.
Affiliation(s)
- Yufeng Liu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Shuyu Wang
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Xiang Li
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Yinbo Liu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
|
42
|
Fan Y, Sun G, Pan X. ELMo4m6A: A Contextual Language Embedding-Based Predictor for Detecting RNA N6-Methyladenosine Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:944-954. [PMID: 35536814 DOI: 10.1109/tcbb.2022.3173323] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
N6-methyladenosine (m6A) is a universal post-transcriptional modification of RNAs, and it is widely involved in various biological processes. Identifying m6A modification sites accurately is indispensable to further investigate m6A-mediated biological functions. How to better represent RNA sequences is crucial for building effective computational methods for detecting m6A modification sites. However, traditional encoding methods require complex biological prior knowledge and are time-consuming. Furthermore, most of the existing m6A sites prediction methods are limited to single species, and few methods are able to predict m6A sites across different species and tissues. Thus, it is necessary to design a more efficient computational method to predict m6A sites across multiple species and tissues. In this paper, we proposed ELMo4m6A, a contextual language embedding-based method for predicting m6A sites from RNA sequences without any prior knowledge. ELMo4m6A first learns embeddings of RNA sequences using a language model ELMo, then uses a hybrid convolutional neural network (CNN) and long short-term memory (LSTM) to identify m6A sites. The results of 5-fold cross-validation and independent testing demonstrate that ELMo4m6A is superior to state-of-the-art methods. Moreover, we applied integrated gradients to find potential sequence patterns contributing to m6A sites.
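Integrated gradients, the attribution method mentioned at the end of this abstract, averages a model's gradient along a straight path from a baseline input to the actual input and scales it by the input difference. The sketch below applies it to a hypothetical three-feature logistic model with hand-picked weights, not to ELMo4m6A itself; for such a model the gradient is available in closed form, so no deep learning library is needed.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logistic_model(x: Sequence[float], w: Sequence[float], bias: float) -> float:
    """Toy differentiable model: sigmoid of a weighted sum."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)


def integrated_gradients(
    x: Sequence[float],
    baseline: Sequence[float],
    w: Sequence[float],
    bias: float,
    steps: int = 200,
) -> List[float]:
    """Midpoint Riemann-sum approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        p = logistic_model(point, w, bias)
        for i, wi in enumerate(w):
            # d sigmoid(w.x + bias) / d x_i = p * (1 - p) * w_i
            avg_grad[i] += p * (1.0 - p) * wi / steps
    return [(xi - b0) * g for xi, b0, g in zip(x, baseline, avg_grad)]


# Hypothetical weights and inputs, for illustration only.
weights, bias = [2.0, -1.0, 0.5], 0.0
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, weights, bias)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline.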
|
43
|
Taguchi YH. Bioinformatic tools for epitranscriptomics. Am J Physiol Cell Physiol 2023; 324:C447-C457. [PMID: 36468841 DOI: 10.1152/ajpcell.00437.2022] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/23/2022] [Revised: 11/17/2022] [Accepted: 11/22/2022] [Indexed: 12/12/2022]
Abstract
The epitranscriptome, defined as RNA modifications that do not involve alterations in the nucleotide sequence, is a popular topic in the genomic sciences. Because we need massive computational techniques to identify epitranscriptomes within individual transcripts, many tools have been developed to infer epitranscriptomic sites as well as to process datasets using high-throughput sequencing. In this review, we summarize recent developments in epitranscriptome spatial detection and data analysis and discuss their progression.
Affiliation(s)
- Y-H Taguchi
- Department of Physics, Chuo University, Tokyo, Japan
| |
|
44
|
Yan Y, Peng J, Liang Q, Ren X, Cai Y, Peng B, Chen X, Wang X, Yi Q, Xu Z. Dynamic m6A-ncRNAs association and their impact on cancer pathogenesis, immune regulation and therapeutic response. Genes Dis 2023; 10:135-150. [PMID: 37013031 PMCID: PMC10066278 DOI: 10.1016/j.gendis.2021.10.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/05/2021] [Revised: 10/11/2021] [Accepted: 10/25/2021] [Indexed: 02/08/2023] Open
Abstract
Several types of modifications have been shown to participate in the metabolism and processing of different RNA types, including non-coding RNAs (ncRNAs). N6-methyladenosine (m6A) is a dynamic and reversible RNA modification that is closely involved in ncRNA homeostasis and serves as a crucial regulator of multiple cancer-associated signaling pathways. ncRNAs regulate epigenetic modification, mRNA transcription, and other biological processes, and play major roles in human cancers. In this review, we summarize the significant implications of the m6A-ncRNA interaction in various types of cancer. In particular, the interplay between m6A and ncRNAs in cancer pathogenesis and therapeutic resistance is being widely recognized. We also discuss the relevance of the m6A-ncRNA interaction to immune regulation and its effect on cancer immunotherapy. In addition, we briefly highlight computational tools that can accurately identify features of the m6A methylome among ncRNAs. In summary, this review should pave the way for a better understanding of the biological functions of m6A-ncRNA crosstalk in cancer research and treatment.
Affiliation(s)
- Yuanliang Yan
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jinwu Peng
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Pathology, Xiangya Changde Hospital, Changde, Hunan 415000, China
| | - Qiuju Liang
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Xinxin Ren
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Center for Molecular Medicine, Xiangya Hospital, Key Laboratory of Molecular Radiation Oncology of Hunan Province, Central South University, Changsha, Hunan 410008, China
| | - Yuan Cai
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Bi Peng
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Xi Chen
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Xiang Wang
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Qiaoli Yi
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Zhijie Xu
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Pathology, Xiangya Changde Hospital, Changde, Hunan 415000, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| |
|
45
|
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple aspects of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). The emerging tools of RNA methylation can be used as diagnostic, preventive, and therapeutic markers. Here, we review the accumulated discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most popularly used techniques applied for profiling RNA epitranscriptome, to provide new ideas for growth and development.
Affiliation(s)
- Jia Zou
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Hui Liu
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Wei Tan
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
| | - Yi-qi Chen
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Jing Dong
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Shu-yuan Bai
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
| | - Zhao-xia Wu
- Community Health Service Center, Wuchang Hospital, Wuhan, China
| | - Yan Zeng
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- School of Public Health, Wuhan University of Science and Technology, Wuhan, China
- *Correspondence: Yan Zeng
| |
|
46
|
Zhou J, Wang X, Wei Z, Meng J, Huang D. 4acCPred: Weakly supervised prediction of N4-acetyldeoxycytosine DNA modification from sequences. MOLECULAR THERAPY - NUCLEIC ACIDS 2022; 30:337-345. [DOI: 10.1016/j.omtn.2022.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 10/12/2022] [Indexed: 11/06/2022]
|
47
|
RNADSN: Transfer-Learning 5-Methyluridine (m5U) Modification on mRNAs from Common Features of tRNA. Int J Mol Sci 2022; 23:ijms232113493. [PMID: 36362279 PMCID: PMC9655583 DOI: 10.3390/ijms232113493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 09/24/2022] [Accepted: 09/29/2022] [Indexed: 11/06/2022] Open
Abstract
One of the most abundant non-canonical bases widely occurring on various RNA molecules is 5-methyluridine (m5U). Recent studies have revealed its influences on the development of breast cancer, systemic lupus erythematosus, and the regulation of stress responses. The accurate identification of m5U sites is crucial for understanding their biological functions. We propose RNADSN, the first transfer learning deep neural network that learns common features between tRNA m5U and mRNA m5U to enhance the prediction of mRNA m5U. Without seeing the experimentally detected mRNA m5U sites, RNADSN has already outperformed the state-of-the-art method, m5UPred. Using mRNA m5U classification as an additional layer of supervision, our model achieved another distinct improvement and presented an average area under the receiver operating characteristic curve (AUC) of 0.9422 and an average precision (AP) of 0.7855. The robust performance of RNADSN was also verified by cross-technical and cross-cellular validation. The interpretation of RNADSN also revealed the sequence motif of common features. Therefore, RNADSN should be a useful tool for studying m5U modification.
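The two metrics reported in this abstract, AUC and average precision (AP), can be illustrated with a minimal sketch. The code below is not taken from RNADSN; the function names `roc_auc` and `average_precision` and the toy labels and scores are assumptions made only for illustration. AUC is computed as the fraction of positive/negative pairs ranked correctly, and AP as the mean precision at the rank of each true positive.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


# Toy data (hypothetical): 1 = modified site, 0 = unmodified site.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    print(f"AUC = {roc_auc(toy_labels, toy_scores):.4f}")
    print(f"AP  = {average_precision(toy_labels, toy_scores):.4f}")
```

AP weights the top of the ranking more heavily than AUC does, which is why the two are often reported together when positive sites are rare.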
|
48
|
Huang D, Chen K, Song B, Wei Z, Su J, Coenen F, de Magalhães JP, Rigden DJ, Meng J. Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation. Nucleic Acids Res 2022; 50:10290-10310. [PMID: 36155798 PMCID: PMC9561283 DOI: 10.1093/nar/gkac830] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/21/2022] [Revised: 08/26/2022] [Accepted: 09/15/2022] [Indexed: 12/25/2022] Open
Abstract
As the most pervasive epigenetic mark present on mRNA and lncRNA, N6-methyladenosine (m6A) RNA methylation regulates all stages of RNA life in various biological processes and disease mechanisms. Computational methods for deciphering RNA modification have achieved great success in recent years; nevertheless, their potential remains underexploited. One reason for this is that existing models usually consider only the sequence of transcripts, ignoring the various regions (or geography) of transcripts such as 3'UTR and intron, where the epigenetic mark forms and functions. Here, we developed three simple yet powerful encoding schemes for transcripts to capture the submolecular geographic information of RNA, which is largely independent from sequences. We show that m6A prediction models based on geographic information alone can achieve comparable performances to classic sequence-based methods. Importantly, geographic information substantially enhances the accuracy of sequence-based models, enables isoform- and tissue-specific prediction of m6A sites, and improves m6A signal detection from direct RNA sequencing data. The geographic encoding schemes we developed have exhibited strong interpretability, and are applicable to not only m6A but also N1-methyladenosine (m1A), and can serve as a general and effective complement to the widely used sequence encoding schemes in deep learning applications concerning RNA transcripts.
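The general idea of pairing sequence features with transcript-region ("geographic") features can be sketched as follows. The abstract does not specify the three encoding schemes, so this is not the authors' method: the region set (5'UTR, CDS, 3'UTR), the relative-position feature, and the function name `encode_transcript` are all assumptions chosen for illustration.

```python
NUCLEOTIDES = "ACGU"
REGIONS = ("5UTR", "CDS", "3UTR")  # assumed region set, for illustration only


def one_hot(index, size):
    """One-hot vector of the given size; a negative index yields all zeros."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_transcript(sequence, cds_start, cds_end):
    """Encode each nucleotide as 8 features: 4 for the base (one-hot),
    3 for the region (one-hot), 1 for the relative position in the region.

    cds_start and cds_end are 0-based; cds_end is exclusive.
    """
    if not 0 <= cds_start <= cds_end <= len(sequence):
        raise ValueError("CDS bounds must lie within the sequence")
    # (start, end) of each region in the order given by REGIONS.
    bounds = [(0, cds_start), (cds_start, cds_end), (cds_end, len(sequence))]
    features = []
    for pos, base in enumerate(sequence.upper()):
        region = 0 if pos < cds_start else (1 if pos < cds_end else 2)
        start, end = bounds[region]
        relative = (pos - start) / (end - start)
        # str.find returns -1 for unknown bases, giving an all-zero base vector.
        base_vec = one_hot(NUCLEOTIDES.find(base), len(NUCLEOTIDES))
        features.append(base_vec + one_hot(region, len(REGIONS)) + [relative])
    return features


if __name__ == "__main__":
    # Hypothetical 7-nt transcript with the CDS at positions 2..4.
    for row in encode_transcript("GGACUAA", 2, 5):
        print(row)
```

In this sketch the last four columns depend only on where a nucleotide sits in the transcript, not on which base it is, so they could be fed to a model on their own or concatenated with the sequence columns.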
Affiliation(s)
- Daiyun Huang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Department of Computer Sciences, University of Liverpool, Liverpool L69 7ZB, UK
| | - Kunqi Chen
- Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, PR China
| | - Bowen Song
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
| | - Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L69 7ZB, UK
| | - Jionglong Su
- Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
| | - Frans Coenen
- Department of Computer Sciences, University of Liverpool, Liverpool L69 7ZB, UK
| | - João Pedro de Magalhães
- Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L69 7ZB, UK
| | - Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
| | - Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, PR China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou 215123, PR China
| |
|
49
|
Wang H, Zhao S, Cheng Y, Bi S, Zhu X. MTDeepM6A-2S: A two-stage multi-task deep learning method for predicting RNA N6-methyladenosine sites of Saccharomyces cerevisiae. Front Microbiol 2022; 13:999506. [PMID: 36274691 PMCID: PMC9579691 DOI: 10.3389/fmicb.2022.999506] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 11/13/2022] Open
Abstract
N6-methyladenosine (m6A) is one of the most important RNA modifications and is involved in many biological activities. Computational methods have been developed to detect m6A sites because of their high efficiency and low cost. Because Saccharomyces cerevisiae is one of the most widely utilized model organisms, many methods have been developed for predicting its m6A sites. However, the generalization of these methods was hampered by the limited size of the benchmark datasets. On the other hand, over 60,000 low-resolution m6A sites and more than 10,000 base-resolution m6A sites of Saccharomyces cerevisiae are recorded in RMBase and m6A-Atlas, respectively. The base-resolution m6A sites are often obtained from low-resolution results by post-calibration. In view of this, we proposed a two-stage deep learning method, named MTDeepM6A-2S, to predict RNA m6A sites of Saccharomyces cerevisiae based on RNA sequence information. In the first stage, a multi-task model with a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) deep framework was built not only to detect the low-resolution m6A sites but also to assign a reasonable probability to each predicted site. In the second stage, a transfer-learning strategy was used to build the model that predicts the base-resolution m6A sites from those low-resolution m6A sites. The effectiveness of our model was validated on both training and independent test sets. The results show that our model outperforms other state-of-the-art models on the independent test set, which indicates that our model holds high potential to become a useful tool for epitranscriptomics analysis.
|
50
|
Wang M, Li F, Wu H, Liu Q, Li S. PredPromoter-MF(2L): A Novel Approach of Promoter Prediction Based on Multi-source Feature Fusion and Deep Forest. Interdiscip Sci 2022; 14:697-711. [PMID: 35488998 DOI: 10.1007/s12539-022-00520-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 04/05/2022] [Accepted: 04/05/2022] [Indexed: 12/12/2022]
Abstract
Promoters, short DNA sequences, play vital roles in initiating gene transcription. However, it remains a challenge to identify promoters using conventional experimental techniques in a high-throughput manner. To this end, several computational predictors based on machine learning models have been developed, but their performance is unsatisfactory. In this study, we proposed a novel two-layer predictor, called PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning. PredPromoter-MF(2L) was developed based on various deep features learned by a pre-trained deep learning network model and on sequence-derived features. Feature selection based on XGBoost was applied to reduce the dimensionality of the fused features, and a cascade deep forest model was trained on the selected feature subset for promoter prediction. The results of both fivefold cross-validation and independent testing demonstrated that PredPromoter-MF(2L) outperformed state-of-the-art methods.
Affiliation(s)
- Miao Wang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China
| | - Fuyi Li
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC, 3000, Australia
| | - Hao Wu
- School of Software, Shandong University, Jinan, 250100, Shandong, China
| | - Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China.
| | - Shuqin Li
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China.
| |
|