1
Shin H, Yoon T, Yoon S. Fatigue life predictor: predicting fatigue life of metallic material using LSTM with a contextual attention model. RSC Adv 2025; 15:15781-15795. [PMID: 40365199 PMCID: PMC12070260 DOI: 10.1039/d5ra01578b] [Received: 03/05/2025] [Accepted: 05/05/2025] [Indexed: 05/15/2025]
Abstract
Low-cycle fatigue (LCF) data involve complex temporal interactions across a series of strain cycles, which hinders accurate fatigue life prediction. Current studies lack reliable methods that predict fatigue life from initial-cycle data alone while simultaneously capturing both temporal dependencies and localized features. This study introduces a novel deep-learning-based prediction model designed for LCF data. The proposed approach combines long short-term memory (LSTM) and convolutional neural network (CNN) architectures with an attention mechanism to capture the temporal and localized characteristics of stress-strain data acquired from a series of cyclic strain-controlled tests. Among the models tested, the LSTM-contextual attention model performed best (R² = 0.99), outperforming the baseline LSTM and CNN models with higher R² values and better scores on the other statistical metrics. Analysis of the attention weights further revealed that the model focuses on critical timesteps associated with fatigue damage, highlighting its effectiveness in learning key features from LCF data. This study underscores the potential of deep-learning-based methods for accurate fatigue life prediction in LCF applications and provides a foundation for future research that extends these approaches to diverse materials under varying fatigue conditions and to advanced models capable of incorporating non-linear fatigue mechanisms.
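The abstract does not spell out the contextual attention layer. A common formulation scores each timestep's hidden state against a learned context vector and pools the states with the resulting softmax weights; those weights are the kind of per-timestep importance the abstract analyzes. The sketch below assumes that additive form, score_t = u · tanh(W h_t + b), with hypothetical toy values for the hidden states, W, b and u. It is an illustration, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (the max is subtracted before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_attention(hidden: List[Vector], W: Matrix, b: Vector, u: Vector) -> Tuple[Vector, Vector]:
    """Pool per-timestep hidden states with additive attention against a context vector u.

    Returns (context, weights): the weighted sum of hidden states and one weight per timestep.
    """
    scores = []
    for h in hidden:
        projected = [math.tanh(v + bias) for v, bias in zip(matvec(W, h), b)]
        scores.append(sum(ui * pi for ui, pi in zip(u, projected)))
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights


# Hypothetical hidden states for 4 timesteps (hidden size 2), e.g. LSTM outputs for early cycles.
hidden = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.3, 0.2]]
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical learned projection
b = [0.0, 0.0]                # hypothetical learned bias
u = [1.0, 1.0]                # hypothetical learned context vector

context, weights = contextual_attention(hidden, W, b, u)
print("attention weights:", [round(w, 3) for w in weights])
print("pooled context:", [round(c, 3) for c in context])
```

In a full model, `hidden` would be the LSTM output for the initial strain cycles and `context` would feed a regression head that predicts fatigue life; the timestep with the largest weight is the one the model treats as most informative.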
Affiliation(s)
- Hongchul Shin
- Department of Mechanical Engineering, Korea University Seoul 02841 Republic of Korea
- Department of Mechanical Engineering, Changwon National University Changwon 51140 Republic of Korea
- Taeyoung Yoon
- Department of Mechanical Engineering, Changwon National University Changwon 51140 Republic of Korea
- Sungmin Yoon
- Department of Mechanical Engineering, Changwon National University Changwon 51140 Republic of Korea
2
Huang H, Zhou F, Jia J, Zhang H. DTC-m6Am: A Framework for Recognizing N6,2'-O-dimethyladenosine Sites in Unbalanced Classification Patterns Based on DenseNet and Attention Mechanisms. Front Biosci (Landmark Ed) 2025; 30:36603. [PMID: 40302345 DOI: 10.31083/fbl36603] [Received: 12/26/2024] [Revised: 03/12/2025] [Accepted: 03/25/2025] [Indexed: 05/02/2025]
Abstract
BACKGROUND: N6,2'-O-dimethyladenosine (m6Am) is an RNA modification that plays an important role in regulating mRNA stability, translational efficiency, and the cellular stress response. Precise identification of m6Am is essential for gaining insight into its functional mechanisms at the transcriptional and post-transcriptional levels. Because of the limitations of experimental assays, developing efficient computational tools to predict m6Am sites has become a major research focus, offering potential breakthroughs in RNA epigenetics. In this study, we present a robust and reliable deep learning model, DTC-m6Am, for identifying m6Am sites across the transcriptome. METHODS: DTC-m6Am first represents RNA sequences with one-hot encoding to capture base-level features and provide structured input for the subsequent deep learning modules. The model then combines a densely connected convolutional network (DenseNet) with a temporal convolutional network (TCN). The DenseNet module exploits its dense connectivity to extract local features effectively and enhance information flow, whereas the TCN module captures global sequential dependencies to strengthen the modeling of long-sequence features. To further optimize feature extraction, the Convolutional Block Attention Module (CBAM) focuses on key regions through channel and spatial attention. Finally, a fully connected layer performs the classification to predict m6Am sites. To address data imbalance, we use the focal loss function to balance learning between positive and negative samples and improve the model's performance on imbalanced data. RESULTS: The DTC-m6Am model performs well on the evaluation metrics, achieving 87.8%, 50.3%, 69.1%, 41.1%, and 76.5% for sensitivity (Sn), specificity (Sp), accuracy (ACC), Matthews correlation coefficient (MCC), and area under the curve (AUC), respectively, on the independent test set.
CONCLUSIONS: We critically evaluated the performance of DTC-m6Am using 10-fold cross-validation and independent testing and compared it with existing methods. On the independent test set, DTC-m6Am achieved an MCC of 41.1%, which is 19.7% higher than that of the current state-of-the-art prediction method, m6Aminer. The results indicate that DTC-m6Am has high accuracy and stability and is an effective tool for predicting m6Am sites.
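Three ingredients named in this abstract can be illustrated compactly: one-hot encoding of the input sequence, the focal loss used against class imbalance, and the Matthews correlation coefficient used for evaluation. The sketch assumes the channel order A, C, G, U, the standard binary focal loss FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) with the commonly used defaults alpha = 0.25 and gamma = 2, and hypothetical confusion-matrix counts. The paper's actual hyperparameters and test-set sizes are not given in the abstract.

```python
import math
from typing import List

BASES = "ACGU"  # assumed channel order for the one-hot encoding


def one_hot(seq: str) -> List[List[int]]:
    """Encode an RNA sequence as an L x 4 one-hot matrix.

    T is mapped to U so DNA-style input also works; any other symbol (e.g. 'N') becomes an all-zero row.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Binary focal loss for a single sample.

    p is the predicted probability of the positive class and y is the label (0 or 1).
    The (1 - p_t) ** gamma factor shrinks the loss of confidently correct predictions,
    so training concentrates on hard examples.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(one_hot("GGACU"))
print(round(focal_loss(0.9, 1), 5), round(focal_loss(0.6, 1), 5))

# Hypothetical balanced test set: 1,000 positives and 1,000 negatives at the reported Sn and Sp.
tp, fn = 878, 122
tn, fp = 503, 497
sn, sp = tp / (tp + fn), tn / (tn + fp)
acc = (tp + tn) / (tp + tn + fp + fn)
print(f"Sn={sn:.3f} Sp={sp:.3f} ACC={acc:.4f} MCC={mcc(tp, tn, fp, fn):.3f}")
```

Under that balanced assumption, the reported Sn and Sp imply ACC ≈ 69.05% and MCC ≈ 41.1%, consistent with the reported figures; this is only a consistency check, since the real test-set composition is not stated. Setting gamma = 0 reduces the focal loss to alpha-weighted cross-entropy, which makes clear what the (1 - p_t)^gamma factor adds.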
Affiliation(s)
- Hui Huang
- School of Information Engineering, Jingdezhen Ceramic University, 333403 Jingdezhen, Jiangxi, China
- Fenglin Zhou
- School of Information Engineering, Jingdezhen Ceramic University, 333403 Jingdezhen, Jiangxi, China
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, 333403 Jingdezhen, Jiangxi, China
- Huachun Zhang
- School of Information Engineering, Jingdezhen Ceramic University, 333403 Jingdezhen, Jiangxi, China
3
Sheng N, Qiao J, Wei L, Shi H, Guo H, Yang C. Computational models for prediction of m6A sites using deep learning. Methods 2025; 240:113-124. [PMID: 40268153 DOI: 10.1016/j.ymeth.2025.04.011] [Received: 01/07/2025] [Revised: 04/02/2025] [Accepted: 04/07/2025] [Indexed: 04/25/2025]
Abstract
RNA modifications play a crucial role in enhancing the structural and functional diversity of RNA molecules and regulating various stages of the RNA life cycle. Among these modifications, N6-methyladenosine (m6A) is the most common internal modification in eukaryotic mRNAs and has been studied extensively over the past decade. Accurate identification of m6A modification sites is essential for understanding their functions and underlying mechanisms. Traditional methods rely predominantly on machine learning techniques to recognize m6A sites, and these often fail to capture the contextual features of the sites comprehensively. In this study, we comprehensively summarize previously published methods based on machine learning and deep learning. We also validate multiple deep learning approaches on a benchmark dataset, including methods previously underutilized in m6A site prediction, pre-trained models designed specifically for biological sequences, and other basic deep learning methods. Additionally, we analyze the dataset features and interpret the models' predictions to enhance understanding. Our experimental results demonstrate the effectiveness of the deep learning models and their strong potential for accurately recognizing m6A modification sites.
Affiliation(s)
- Nan Sheng
- School of Software, Shandong University, Jinan 250101, PR China
- Jianbo Qiao
- School of Software, Shandong University, Jinan 250101, PR China
- Leyi Wei
- School of Software, Shandong University, Jinan 250101, PR China
- Hua Shi
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, PR China
- Huannan Guo
- Beidahuang Industry Group General Hospital, PR China.
- Changshun Yang
- Department of Gastrointestinal Surgery, Fuzhou University Affiliated Provincial Hospital, Fuzhou 350004, PR China.
4
Liu Q, Zhou BM, Wang LJ, Zhang CY. Construction of a hierarchical DNA circuit for single-molecule profiling of locus-specific N6-methyladenosine-MALAT1 in clinical tissues. Biosens Bioelectron 2025; 274:117198. [PMID: 39893948 DOI: 10.1016/j.bios.2025.117198] [Received: 12/16/2024] [Revised: 01/16/2025] [Accepted: 01/25/2025] [Indexed: 02/04/2025]
Abstract
N6-methyladenosine (m6A) is the most important internal methylation in eukaryotic RNAs, and it is critically implicated in diverse RNA metabolic processes involved in cancer development. Because epigenetic modifications do not interfere with Watson-Crick base pairing and m6A is not susceptible to chemical decoration, standard hybridization-based techniques cannot be used to sense m6A in RNAs. Consequently, developing new methods for accurate and sensitive profiling of locus-specific m6A in RNAs remains a great challenge. Herein, we demonstrate for the first time the construction of a hierarchical DNA circuit for single-molecule profiling of locus-specific m6A-metastasis-associated lung adenocarcinoma transcript 1 (m6A-MALAT1) in clinical tissues. Taking advantage of the high discrimination of the VMC10-DNAzyme between m6A and A, the exponential efficiency of the hierarchical DNA circuit, and the ultrahigh signal-to-noise ratio of single-molecule detection, this nanodevice achieves attomolar sensitivity, with a limit of detection (LOD) of 1.8 aM for m6A-MALAT1 in vitro and a dynamic range of 7 orders of magnitude. Moreover, it can discriminate 0.001% m6A-MALAT1 from excess A-MALAT1, quantify m6A-MALAT1 in diverse cancer cells at the single-cell level, distinguish m6A-MALAT1 expression between breast cancer patients and healthy individuals, and monitor cellular m6A-MALAT1 for gene therapy, offering a promising platform for epitranscriptomic research and clinical diagnostics.
Affiliation(s)
- Qian Liu
- School of Chemistry and Chemical Engineering, State Key Laboratory of Digital Medical Engineering, Southeast University, Nanjing, 211189, China
- Bao-Mei Zhou
- School of Chemistry and Chemical Engineering, State Key Laboratory of Digital Medical Engineering, Southeast University, Nanjing, 211189, China
- Li-Juan Wang
- School of Chemistry and Chemical Engineering, State Key Laboratory of Digital Medical Engineering, Southeast University, Nanjing, 211189, China.
- Chun-Yang Zhang
- School of Chemistry and Chemical Engineering, State Key Laboratory of Digital Medical Engineering, Southeast University, Nanjing, 211189, China.
5
Su Q, Phan LT, Pham NT, Wei L, Manavalan B. MST-m6A: A Novel Multi-Scale Transformer-based Framework for Accurate Prediction of m6A Modification Sites Across Diverse Cellular Contexts. J Mol Biol 2025; 437:168856. [PMID: 39510345 DOI: 10.1016/j.jmb.2024.168856] [Received: 07/16/2024] [Revised: 10/23/2024] [Accepted: 11/02/2024] [Indexed: 11/15/2024]
Abstract
N6-methyladenosine (m6A) modification, a prevalent epigenetic mark in eukaryotic cells, is crucial in regulating gene expression and RNA metabolism. Accurately identifying m6A modification sites is essential for understanding their functions within biological processes and the intricate mechanisms that regulate them. Recent advances in high-throughput sequencing technologies have enabled the generation of extensive datasets characterizing m6A modification sites at single-nucleotide resolution, leading to the development of computational methods for identifying m6A RNA modification sites. However, most current methods focus on specific cell lines, limiting their generalizability and practical application across diverse biological contexts. To address this limitation, we propose MST-m6A, a novel approach for identifying m6A modification sites with higher accuracy across various cell lines and tissues. MST-m6A uses a multi-scale transformer-based architecture, employing dual k-mer tokenization to capture rich feature representations and global contextual information from RNA sequences at multiple levels of granularity. These representations are then combined using a channel fusion mechanism and further processed by a convolutional neural network to enhance prediction accuracy. Rigorous validation demonstrates that MST-m6A significantly outperforms conventional machine learning models, deep learning models, and state-of-the-art predictors. We anticipate that the high precision and cross-cell-type adaptability of MST-m6A will provide valuable insights into m6A biology and facilitate advancements in related fields. The proposed approach is available at https://github.com/cbbl-skku-org/MST-m6A/ for prediction and reproducibility purposes.
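Dual k-mer tokenization can be sketched as splitting the same sequence into overlapping k-mers at two granularities, each with its own vocabulary, so that two transformer branches receive token streams of different resolution. The k values (3 and 5), the stride of 1, and the use of id 0 for k-mers containing unknown bases are assumptions for illustration; the abstract does not state them.

```python
from itertools import product
from typing import Dict, List, Tuple


def build_vocab(k: int, alphabet: str = "ACGU") -> Dict[str, int]:
    """Map every possible k-mer to an integer id starting at 1; id 0 is reserved for unknown k-mers."""
    return {"".join(kmer): i + 1 for i, kmer in enumerate(product(alphabet, repeat=k))}


def kmer_tokenize(seq: str, k: int, vocab: Dict[str, int]) -> List[int]:
    """Split seq into overlapping k-mers (stride 1) and look each one up in vocab."""
    seq = seq.upper().replace("T", "U")
    return [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]


def dual_kmer_tokens(seq: str, k_small: int = 3, k_large: int = 5) -> Tuple[List[int], List[int]]:
    """Tokenize the same sequence at two granularities (hypothetical k values)."""
    fine = kmer_tokenize(seq, k_small, build_vocab(k_small))
    coarse = kmer_tokenize(seq, k_large, build_vocab(k_large))
    return fine, coarse


fine, coarse = dual_kmer_tokens("GGACUGA")
print("k=3 tokens:", fine)
print("k=5 tokens:", coarse)
```

A sequence of length L yields L - k + 1 tokens per stream, so the coarser stream is slightly shorter; any fusion step (such as the channel fusion the abstract mentions) has to reconcile the two lengths, for example by pooling each stream to a fixed size.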
Affiliation(s)
- Qiaosen Su
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Le Thi Phan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Leyi Wei
- Faculty of Applied Sciences, Macao Polytechnic University, Macau
- Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
6
Li G, Zhao B, Su X, Yang Y, Zeng Z, Hu P, Hu L. Capturing short-range and long-range dependencies of nucleotides for identifying RNA N6-methyladenosine modification sites. Comput Biol Med 2025; 186:109625. [PMID: 39756188 DOI: 10.1016/j.compbiomed.2024.109625] [Received: 02/20/2024] [Revised: 11/17/2024] [Accepted: 12/23/2024] [Indexed: 01/07/2025]
Abstract
N6-methyladenosine (m6A) plays a crucial role in enriching the functional and genetic information of RNA, and identifying m6A modification sites is therefore an important step toward understanding RNA epigenetics. Current identification studies mainly focus on capturing short-range dependencies between adjacent nucleotides in RNA sequences, while ignoring the contribution of long-range dependencies between non-adjacent nucleotides to learning high-quality representations of RNA sequences. In this work, we propose an end-to-end prediction model, m6ASLD, that improves the identification accuracy of m6A modification sites by capturing both short-range and long-range dependencies of nucleotides. Specifically, m6ASLD first encodes the type and position of each nucleotide to construct the initial embeddings of an RNA sequence. For each RNA sequence, a purpose-designed map-generating block then produces a self-correlation map that characterizes both short-range and long-range dependencies. Next, m6ASLD learns global and local representations of RNA sequences using a graph convolution process and a purpose-designed dependency-searching block, respectively, and finally performs identification under a joint training scheme. Extensive experiments demonstrate the promising performance of m6ASLD on 11 benchmark datasets across several evaluation metrics.
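The abstract does not describe the map-generating block, so the sketch below only illustrates the general idea under stated assumptions: each nucleotide is embedded as a one-hot type vector plus a sinusoidal position vector (both hypothetical choices), and the self-correlation map is the matrix of pairwise cosine similarities between positions. Because every pair of positions gets an entry, adjacent and distant nucleotides are treated alike, which is the property the abstract emphasizes; such a map could then serve as a weighted adjacency matrix for graph convolution.

```python
import math
from typing import List

BASES = "ACGU"  # hypothetical type vocabulary; other symbols get a zero type vector


def type_embedding(base: str, dim: int) -> List[float]:
    """One-hot nucleotide type, padded with zeros to length dim."""
    vec = [0.0] * dim
    idx = BASES.find(base)
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def position_embedding(pos: int, dim: int) -> List[float]:
    """Sinusoidal position encoding: sin on even dimensions, cos on odd dimensions."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def embed(seq: str, dim: int = 8) -> List[List[float]]:
    """Initial embedding of each nucleotide = type vector + position vector."""
    if dim < len(BASES):
        raise ValueError("dim must be at least the number of nucleotide types")
    seq = seq.upper().replace("T", "U")
    return [
        [t + p for t, p in zip(type_embedding(base, dim), position_embedding(i, dim))]
        for i, base in enumerate(seq)
    ]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_correlation_map(emb: List[List[float]]) -> List[List[float]]:
    """L x L matrix of pairwise cosine similarities between all positions, adjacent or not."""
    return [[cosine(a, b) for b in emb] for a in emb]


corr = self_correlation_map(embed("GGACUGA"))
for row in corr:
    print(" ".join(f"{v:+.2f}" for v in row))
```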
Affiliation(s)
- Guodong Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Bowei Zhao
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Xiaorui Su
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Yue Yang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Zhi Zeng
- College of Computer Science and Technology, Xi'an Jiaotong University, 710049, Xi'an, China.
- Pengwei Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
- Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, 830011, Urumqi, China; University of Chinese Academy of Sciences, 100049, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, 830011, Urumqi, China.
7
Pilala KM, Panoutsopoulou K, Papadimitriou MA, Soureas K, Scorilas A, Avgeris M. Exploring the methyl-verse: Dynamic interplay of epigenome and m6A epitranscriptome. Mol Ther 2025; 33:447-464. [PMID: 39659016 PMCID: PMC11852398 DOI: 10.1016/j.ymthe.2024.12.003] [Received: 07/23/2024] [Revised: 11/19/2024] [Accepted: 12/05/2024] [Indexed: 12/12/2024]
Abstract
The orchestration of dynamic epigenetic and epitranscriptomic modifications is pivotal for fine-tuning gene expression. However, these modifications have traditionally been examined independently. Recent compelling studies have disclosed communication and interplay between m6A RNA methylation (the m6A epitranscriptome) and epigenetic modifications, enabling the formation of feedback circuits and cooperative networks. Intriguingly, the interaction between m6A and the DNA methylation machinery, coupled with the crosstalk between m6A RNA and histone modifications, shapes the transcriptional profile and translational efficiency. Moreover, m6A modifications also interact with non-coding RNAs, modulating their stability, abundance, and regulatory functions. In light of these findings, m6A imprinting acts as a versatile checkpoint, linking the epigenetic and epitranscriptomic layers toward a multilayer, time-dependent control of gene expression and cellular homeostasis. The present review aims to decipher the m6A-coordinated circuits with DNA imprinting, chromatin architecture, and non-coding RNA networks in normal physiology and carcinogenesis. Finally, we summarize the development of innovative CRISPR-dCas engineering platforms fused with m6A catalytic components (m6A writers or erasers) to achieve transcript-specific editing of m6A epitranscriptomes, which can offer new insights for modern RNA therapeutics.
Affiliation(s)
- Katerina-Marina Pilala
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Konstantina Panoutsopoulou
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Maria-Alexandra Papadimitriou
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Konstantinos Soureas
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece; Laboratory of Clinical Biochemistry - Molecular Diagnostics, Second Department of Pediatrics, School of Medicine, National and Kapodistrian University of Athens, "P. & A. Kyriakou" Children's Hospital, Athens, Greece
- Andreas Scorilas
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Margaritis Avgeris
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece; Laboratory of Clinical Biochemistry - Molecular Diagnostics, Second Department of Pediatrics, School of Medicine, National and Kapodistrian University of Athens, "P. & A. Kyriakou" Children's Hospital, Athens, Greece.
8
Huang Y, Zhang L, Mu W, Zheng M, Bao X, Li H, Luo X, Ren J, Zuo Z. RMVar 2.0: an updated database of functional variants in RNA modifications. Nucleic Acids Res 2025; 53:D275-D283. [PMID: 39436017 PMCID: PMC11701541 DOI: 10.1093/nar/gkae924] [Received: 09/09/2024] [Revised: 10/01/2024] [Accepted: 10/08/2024] [Indexed: 10/23/2024]
Abstract
Evaluating the impact of genetic variants on RNA modifications (RMs) is crucial for identifying disease-associated variants and understanding the pathogenic mechanisms underlying human diseases. Previously, we developed a database called RMVar to catalog variants linked to RNA modifications in humans and mice. Here, we present an updated version RMVar 2.0 (http://rmvar.renlab.cn). In this updated version, we applied an enhanced analytical pipeline to the latest RNA modification datasets and genetic variant information to identify RM-associated variants. A notable advancement in RMVar 2.0 is our incorporation of allele-specific RNA modification analysis to identify RM-associated variants, a novel approach not utilized in RMVar 1.0 or other comparable databases. Furthermore, the database offers comprehensive annotations for various molecular events, including RNA-binding protein (RBP) interactions, RNA-RNA interactions, splicing events, and circular RNAs (circRNAs), which facilitate investigations into how RM-associated variants influence post-transcriptional regulation. Additionally, we provide disease-related information sourced from ClinVar and GWAS to help researchers explore the connections between RNA modifications and various diseases. We believe that RMVar 2.0 will significantly enhance our understanding of the functional implications of genetic variants affecting RNA modifications within the context of human disease research.
Affiliation(s)
- Yuantai Huang
- School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Luowanyue Zhang
- School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Weiping Mu
- School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Mohan Zheng
- School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Xiaoqiong Bao
- School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Huiqin Li
- School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Xiaotong Luo
- Innovation Center of the Sixth Affiliated hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Jian Ren
- School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
- Zhixiang Zuo
- School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
9
Xia R, Yin X, Huang J, Chen K, Ma J, Wei Z, Su J, Blake N, Rigden DJ, Meng J, Song B. Interpretable deep cross networks unveiled common signatures of dysregulated epitranscriptomes across 12 cancer types. Mol Ther Nucleic Acids 2024; 35:102376. [PMID: 39618823 PMCID: PMC11605186 DOI: 10.1016/j.omtn.2024.102376] [Received: 03/31/2024] [Accepted: 10/25/2024] [Indexed: 01/12/2025]
Abstract
Cancer is a complex and multifaceted group of diseases characterized by uncontrolled cell growth that leads to the formation of malignant tumors. Recent studies suggest that N6-methyladenosine (m6A) RNA methylation plays pivotal roles in cancer pathology by influencing various cellular processes. However, the degree to which these mechanisms are shared across different cancer types remains unclear. In this study, we analyze an expansive array of 167 m6A epitranscriptome profiles covering 12 distinct cancer types and their originating normal tissues. We trained 12 distinct, cancer type-specific interpretable deep cross network models, which successfully distinguish between specific pairs of normal and cancer m6A contexts using integrated information from both sequences and curated genomic knowledge. Interestingly, cross-cancer-type testing indicated the existence of shared genomic patterns across various cancers at the epitranscriptome level. A pan-cancer model was subsequently developed to identify these shared patterns that could not be observed in a single cancer type. Our analysis uncovered, for the first time, a common epitranscriptome signature shared across multiple cancer types, particularly associated with the RNA hybridization process and aberrant splicing. This highlights the importance of a comprehensive understanding of the pan-cancer epitranscriptome, which holds potential implications for the development of RNA methylation-based therapeutics for various cancers.
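In the original Deep & Cross Network (DCN) formulation, each cross layer computes x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, so stacking layers builds explicit feature interactions of increasing degree, alongside a parallel deep network. The sketch shows only this cross layer in plain Python; whether the authors used this variant or the matrix-based DCN-V2 is not stated in the abstract, and the feature values and weights below are hypothetical.

```python
from typing import List

Vector = List[float]


def cross_layer(x0: Vector, xl: Vector, w: Vector, b: Vector) -> Vector:
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl.

    xl . w is a scalar, so each layer raises the maximum interaction degree
    with the original input x0 by one, while the residual term + xl keeps lower orders.
    """
    s = sum(xi * wi for xi, wi in zip(xl, w))
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0: Vector, weights: List[Vector], biases: List[Vector]) -> Vector:
    """Stack cross layers; every layer re-uses the original input x0."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


# Hypothetical input: two sequence-derived features concatenated with two genomic features.
sequence_features = [0.2, 0.7]
genomic_features = [1.0, 0.0]
x0 = sequence_features + genomic_features
weights = [[0.1, -0.2, 0.3, 0.0], [0.05, 0.05, -0.1, 0.2]]  # hypothetical learned weights
biases = [[0.0] * 4, [0.0] * 4]
print(cross_network(x0, weights, biases))
```

Because each layer has only one weight vector the size of the input, the cross part adds few parameters, and the learned weights can be inspected per input feature; how the authors derived their interpretations is not described in the abstract.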
Affiliation(s)
- Rong Xia
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Xiangyu Yin
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- Jiaming Huang
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Kunqi Chen
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
- Jiongming Ma
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Zhen Wei
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Institute of Infection, Veterinary & Ecological Sciences, University of Liverpool, L7 8TX Liverpool, UK
- Jionglong Su
- School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Neil Blake
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- Jia Meng
- Institute of Biomedical Research, Regulatory Mechanism and Targeted Therapy for Liver Cancer Shiyan Key Laboratory, Hubei Provincial Clinical Research Center for Precise Diagnosis and Treatment of Liver Cancer, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei 442000, China
- Department of Biological Sciences, School of Science, Suzhou Key Laboratory of Cancer Biology and Chronic Disease, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- Bowen Song
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
10
Huang J, Wang X, Xia R, Yang D, Liu J, Lv Q, Yu X, Meng J, Chen K, Song B, Wang Y. Domain-knowledge enabled ensemble learning of 5-formylcytosine (f5C) modification sites. Comput Struct Biotechnol J 2024; 23:3175-3185. [PMID: 39253057 PMCID: PMC11381828 DOI: 10.1016/j.csbj.2024.08.004] [Received: 05/11/2024] [Revised: 08/07/2024] [Accepted: 08/07/2024] [Indexed: 09/11/2024]
Abstract
5-formylcytidine (f5C) is a unique post-transcriptional RNA modification found in mRNA and tRNA at the wobble site, playing a crucial role in mitochondrial protein synthesis and potentially contributing to the regulation of translation. Recent studies have revealed that f5C modifications may drive mitochondrial mRNA translation to power cancer metastasis. Accurate identification of f5C sites is essential for further unraveling their molecular functions and regulatory mechanisms, but no computational methods are currently available for predicting their locations. In this study, we introduce an innovative ensemble approach that enables the computational recognition of f5C sites in Saccharomyces cerevisiae. We conducted a comprehensive model selection process involving multiple basic machine learning and deep learning algorithms, such as recurrent neural networks, convolutional neural networks, and Transformer-based models. Trained initially on sequence information alone, these individual models achieved AUROCs ranging from 0.7104 to 0.7492. After the integration of 32 novel domain-derived genomic features, the performance of the individual models improved significantly, to AUROCs between 0.7309 and 0.8076. To further enhance accuracy and robustness, we then constructed ensembles of these individual models in different combinations. The best ensemble model reached an AUROC of 0.8391. Shapley additive explanations (SHAP) were used to explain the significant contributions of the genomic features, providing insights into the putative distribution of f5C across various topological regions and potentially paving the way for revealing their functional relevance within distinct genomic contexts. A freely accessible web server that allows real-time analysis of user-uploaded sites is available at www.rnamd.org/Resf5C-Pred.
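The abstract does not say how the individual models were combined. A minimal sketch, assuming soft voting (averaging predicted probabilities) and an exhaustive search over model subsets scored by AUROC, is shown below. AUROC is computed from its rank interpretation: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. Model names and predictions are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def soft_vote(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average the predicted probabilities of several models, sample by sample."""
    return [sum(col) / len(col) for col in zip(*predictions)]


def best_ensemble(model_preds: Dict[str, List[float]], labels: Sequence[int]) -> Tuple[Tuple[str, ...], float]:
    """Try every non-empty subset of models and keep the one with the highest AUROC."""
    if not model_preds:
        raise ValueError("need at least one model")
    best: Optional[Tuple[Tuple[str, ...], float]] = None
    names = sorted(model_preds)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            score = auroc(soft_vote([model_preds[m] for m in combo]), labels)
            if best is None or score > best[1]:
                best = (combo, score)
    assert best is not None  # guaranteed because model_preds is non-empty
    return best


labels = [1, 1, 0, 0]
preds = {  # hypothetical validation-set probabilities from two base models
    "cnn": [0.9, 0.4, 0.45, 0.1],
    "rnn": [0.5, 0.9, 0.6, 0.1],
}
for name, p in preds.items():
    print(name, auroc(p, labels))
print("best:", best_ensemble(preds, labels))
```

Here each model misranks a different positive, so averaging fixes both errors. In practice the subset should be chosen on a validation split rather than on the data used to report the final AUROC, since selecting on the test data makes the reported figure optimistic.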
Affiliation(s)
- Jiaming Huang
- Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Xuan Wang
- Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Rong Xia
- Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Dongqing Yang
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jian Liu
- Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Qi Lv
- Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xiaoxuan Yu
- Department of Pharmacology, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jia Meng
- Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L7 8TX, United Kingdom
- Kunqi Chen
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
- Bowen Song
- Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yue Wang
- Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China

11

Yang Z, Shao W, Matsuda Y, Song L. iResNetDM: An interpretable deep learning approach for four types of DNA methylation modification prediction. Comput Struct Biotechnol J 2024; 23:4214-4221. [PMID: 39650332 PMCID: PMC11621598 DOI: 10.1016/j.csbj.2024.11.006] [Received: 07/26/2024] [Revised: 10/11/2024] [Accepted: 11/02/2024] [Indexed: 12/11/2024]
Abstract
Motivation: Although several computational methods for predicting DNA methylation modifications have been developed, two main limitations persist. (1) All current models are binary predictors, which merely determine the presence or absence of a DNA methylation modification and thus prevent comprehensive analysis of the interrelations among modification types. Multi-class classification models have been developed for RNA modifications, and a comparable approach for DNA is needed. (2) Few previous studies adequately explain how their models make decisions. Most instead rely on extracting and visualizing attention matrices, which have identified few motifs and offer limited insight into the decision-making process. Results: In this study, we formulate DNA methylation modification prediction as a multi-class classification problem for the first time. We present iResNetDM, a deep learning model that integrates residual networks (ResNet) with self-attention mechanisms. To the best of our knowledge, iResNetDM is the first model capable of distinguishing between four types of DNA methylation modifications. Our model performs well across the various DNA methylation modifications and also captures relationships between different modification types. We used the integrated gradients technique to improve the interpretability of iResNetDM. This method effectively elucidates the model's decision-making process and enables the identification of multiple motifs. Notably, the model is highly robust and can identify unique motifs across different methylation modifications. We also compared the motifs discovered for the various modifications and found that some share notable sequence similarity, suggesting that they may be subject to more than one type of modification. This finding highlights the potential importance of these motifs in gene regulation.
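iResNetDM attributes its interpretability to integrated gradients. The block below is a hedged sketch of the technique in general, not the iResNetDM code. It replaces the network with a hypothetical logistic model whose input gradient can be written analytically, and it uses an all-zero baseline. The path integral IG_i = (x_i - x'_i) ∫₀¹ ∂F(x' + α(x - x'))/∂x_i dα is approximated with a midpoint Riemann sum. The weights, inputs and function names are illustrative assumptions.

```python
import math
from typing import Sequence

# Illustrative sketch only: a toy logistic model stands in for the real network.


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def toy_model(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical stand-in for a trained classifier: logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def toy_gradient(x: Sequence[float], w: Sequence[float], b: float) -> list[float]:
    """Analytic gradient of toy_model with respect to its inputs."""
    p = toy_model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x: Sequence[float], baseline: Sequence[float],
                         w: Sequence[float], b: float, steps: int = 200) -> list[float]:
    """IG_i = (x_i - x'_i) * mean gradient_i along the straight path x' -> x (midpoint rule)."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(toy_gradient(point, w, b)):
            totals[i] += g
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]


# Toy input: three binary features (e.g. one-hot bits), all-zero baseline, made-up weights.
x = [1.0, 0.0, 1.0]
baseline = [0.0, 0.0, 0.0]
w = [2.0, -1.0, 0.5]
b = -1.0
attributions = integrated_gradients(x, baseline, w, b)
print(attributions)
# Completeness check: attributions should sum to F(x) - F(baseline), up to discretization error.
print(sum(attributions), toy_model(x, w, b) - toy_model(baseline, w, b))
```

Features whose value equals the baseline receive zero attribution. The completeness check printed on the last line is a quick way to confirm that the number of integration steps is adequate.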
Affiliation(s)
- Zerui Yang
- Department of Chemistry, City University of Hong Kong, Hong Kong
- City University of Hong Kong Shenzhen Research Institute
- Wei Shao
- Department of Computer Science, City University of Hong Kong, Hong Kong
- Yudai Matsuda
- Department of Chemistry, City University of Hong Kong, Hong Kong
- Linqi Song
- City University of Hong Kong Shenzhen Research Institute
- Department of Computer Science, City University of Hong Kong, Hong Kong

12

Yuge CC, Hang ES, Mamtha MRN, Vishwakarma S, Wang S, Wang C, Le NQK. RNA-ModX: a multilabel prediction and interpretation framework for RNA modifications. Brief Bioinform 2024; 26:bbae688. [PMID: 39737566 DOI: 10.1093/bib/bbae688] [Received: 07/29/2024] [Revised: 11/18/2024] [Accepted: 12/16/2024] [Indexed: 01/01/2025]
Abstract
Accurate prediction of RNA modifications has profound implications for elucidating RNA function and mechanism, with potential applications in drug development. Here, we present RNA-ModX, a highly precise model for predicting post-transcriptional RNA modifications, complemented by a user-friendly web application for other researchers. To achieve high accuracy, we systematically explored a range of machine learning models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Transformer-based architectures. The models were rigorously tested on a dataset of 1001-nucleotide RNA sequences composed of the four fundamental nucleotides (A, C, G, U) and spanning 12 prevalent modification classes (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um). Notably, the LSTM model with 3-mer encoding achieved the highest accuracy. Furthermore, Local Interpretable Model-Agnostic Explanations (LIME) were used to interpret the results, making the model's predictions more transparent. Alongside the model, we developed a web application with an intuitive interface through which researchers can upload RNA sequences. Upon submission, the model runs in the backend, and the predictions are presented to the user in a clear format. Combining predictive modeling with a user-centric interface is a significant step toward making RNA modification prediction accessible to the broader research community.
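The abstract credits 3-mer encoding for the LSTM's accuracy but does not describe the scheme. One common scheme, assumed here rather than taken from RNA-ModX, splits a sequence into overlapping 3-mers (stride 1). Each 3-mer is then mapped to an id in a 64-word vocabulary that an embedding layer in front of an LSTM could consume. The names `KMER_INDEX`, `kmer_tokens` and `encode_3mers` are hypothetical.

```python
from itertools import product

# Illustrative sketch only: overlapping, stride-1 3-mers are an assumption about RNA-ModX.
NUCLEOTIDES = "ACGU"
# Vocabulary of all 4**3 = 64 possible 3-mers in lexicographic order: AAA -> 0, ..., UUU -> 63.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=3))}


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers (stride 1); T is read as U."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode_3mers(seq: str) -> list[int]:
    """Map each overlapping 3-mer to an integer id, e.g. as input to an embedding layer.

    Raises KeyError for 3-mers containing symbols outside ACGU (such as N).
    """
    return [KMER_INDEX[kmer] for kmer in kmer_tokens(seq, 3)]


print(encode_3mers("AUGCA"))  # [14, 57, 36]
# A 1001-nt input window yields 1001 - 3 + 1 = 999 tokens.
print(len(encode_3mers("A" * 1001)))
```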
Affiliation(s)
- Chelsea Chen Yuge
- NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Ee Soon Hang
- NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Shashikant Vishwakarma
- NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Sijia Wang
- NUS-ISS, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Cheng Wang
- Independent Researcher, Singapore, Singapore
- Nguyen Quoc Khanh Le
- In-Service Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
- AIBioMed Research Group, Taipei Medical University, 250 Wuxing Street, 110, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing Street, 110, Taipei, Taiwan

13

Xia Y, Zhang Y, Liu D, Zhu YH, Wang Z, Song J, Yu DJ. BLAM6A-Merge: Leveraging Attention Mechanisms and Feature Fusion Strategies to Improve the Identification of RNA N6-Methyladenosine Sites. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1803-1815. [PMID: 38913512 DOI: 10.1109/tcbb.2024.3418490] [Indexed: 06/26/2024]
Abstract
RNA N6-methyladenosine (m6A) is a prevalent and abundant RNA modification that significantly influences diverse biological processes. Numerous computational approaches have been developed to predict methylation, but most ignore the correlations among different encoding strategies and do not explore how well various attention mechanisms suit methylation identification. To address these issues, we propose BLAM6A-Merge, an innovative framework for predicting RNA m6A modification sites. It uses a multimodal feature fusion strategy to combine the classification results obtained from four features and from the Blastn tool. In addition, after a screening process, different attention mechanisms are applied to specific features to extract higher-level features. Extensive experiments on 12 benchmark datasets showed that BLAM6A-Merge achieved superior performance (average AUC: 0.849 in full-transcript mode and 0.784 in mature-mRNA mode). Notably, this is the first use of the Blastn tool to identify methylation sites.

14

Luo Z, Yu L, Xu Z, Liu K, Gu L. Comprehensive Review and Assessment of Computational Methods for Prediction of N6-Methyladenosine Sites. Biology (Basel) 2024; 13:777. [PMID: 39452086 PMCID: PMC11504118 DOI: 10.3390/biology13100777] [Received: 08/30/2024] [Revised: 09/19/2024] [Accepted: 09/23/2024] [Indexed: 10/26/2024]
Abstract
N6-methyladenosine (m6A) plays a crucial regulatory role in the control of cellular functions and gene expression. Recent advances in sequencing techniques for transcriptome-wide m6A mapping have accelerated the accumulation of m6A site information at a single-nucleotide level, providing more high-confidence training data to develop computational approaches for m6A site prediction. However, it is still a major challenge to precisely predict m6A sites using in silico approaches. To advance the computational support for m6A site identification, here, we curated 13 up-to-date benchmark datasets from nine different species (i.e., H. sapiens, M. musculus, Rat, S. cerevisiae, Zebrafish, A. thaliana, Pig, Rhesus, and Chimpanzee). This will assist the research community in conducting an unbiased evaluation of alternative approaches and support future research on m6A modification. We revisited 52 computational approaches published since 2015 for m6A site identification, including 30 traditional machine learning-based, 14 deep learning-based, and 8 ensemble learning-based methods. We comprehensively reviewed these computational approaches in terms of their training datasets, calculated features, computational methodologies, performance evaluation strategy, and webserver/software usability. Using these benchmark datasets, we benchmarked nine predictors with available online websites or stand-alone software and assessed their prediction performance. We found that deep learning and traditional machine learning approaches generally outperformed scoring function-based approaches. In summary, the curated benchmark dataset repository and the systematic assessment in this study serve to inform the design and implementation of state-of-the-art computational approaches for m6A identification and facilitate more rigorous comparisons of new methods in the future.
Affiliation(s)
- Zhengtao Luo
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Key Laboratory of Smart Agriculture Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- Liyi Yu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Zhaochun Xu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin 150076, China
- Kening Liu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Lichuan Gu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Key Laboratory of Smart Agriculture Technology and Equipment, Anhui Agricultural University, Hefei 230036, China

15

Zheng Y, Li H, Lin S. m7GRegpred: substrate prediction of N7-methylguanosine (m7G) writers and readers based on sequencing features. Front Genet 2024; 15:1469011. [PMID: 39262420 PMCID: PMC11387174 DOI: 10.3389/fgene.2024.1469011] [Received: 07/23/2024] [Accepted: 08/19/2024] [Indexed: 09/13/2024]
Abstract
N7-methylguanosine (m7G) is an important RNA modification found both at internal positions and in the 5′ cap structure of messenger RNA. It is essential for RNA stability, translation efficiency, and various intracellular RNA processing pathways. Given the significance of the m7G modification, numerous studies have sought to predict m7G sites. To further elucidate the regulatory mechanisms surrounding m7G, we introduce m7GRegpred, a novel bioinformatics framework that predicts, for the first time, the targets of the m7G methyltransferases METTL1 and WDR4 and of the m7G readers QKI5, QKI6, and QKI7. We integrated different features to build predictors, which achieved AUROC scores of 0.856, 0.857, 0.780, 0.776, and 0.818 for METTL1, WDR4, QKI5, QKI6, and QKI7, respectively. In addition, we systematically evaluated the effects of window length and algorithm. The final model is available as a user-friendly webserver: http://modinfor.com/m7GRegpred/. Our research indicates that the substrates of m7G regulators can be identified computationally, which may advance the study of m7G regulators under specific conditions.
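The abstract states that window length was evaluated systematically but does not describe how the windows were built. A common approach, assumed here for illustration only, extracts a fixed-length window centered on each candidate guanosine and pads any positions that fall outside the transcript. The function name, the "N" padding symbol and the toy transcript are hypothetical.

```python
# Illustrative sketch only: centered, padded windows are an assumption about m7GRegpred.


def centered_window(seq: str, site: int, flank: int, pad: str = "N") -> str:
    """Return the (2 * flank + 1)-nt window centered on a 0-based site index.

    Positions that fall outside the transcript are filled with `pad`.
    """
    if not 0 <= site < len(seq):
        raise IndexError("site lies outside the sequence")
    left = seq[max(0, site - flank):site]
    right = seq[site + 1:site + 1 + flank]
    return pad * (flank - len(left)) + left + seq[site] + right + pad * (flank - len(right))


# Hypothetical transcript with a candidate guanosine at index 5.
transcript = "GGCAUGGACGUAG"
for flank in (2, 5, 10):
    print(2 * flank + 1, centered_window(transcript, 5, flank))
```

Because every window keeps the candidate site at the same central index, windows of different lengths can be compared without changing the rest of the feature pipeline.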
Affiliation(s)
- Yu Zheng
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
- School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, China
- Haipeng Li
- Graduate School of Fujian Medical University, Fuzhou, Fujian, China
- Department of Operating Room, Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Shaofeng Lin
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
- School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, China

16

Zhao J, Chen Z, Zhang M, Zou L, He S, Liu J, Wang Q, Song X, Wu J. DeepIRES: a hybrid deep learning model for accurate identification of internal ribosome entry sites in cellular and viral mRNAs. Brief Bioinform 2024; 25:bbae439. [PMID: 39234953 PMCID: PMC11375421 DOI: 10.1093/bib/bbae439] [Received: 06/17/2024] [Revised: 08/03/2024] [Accepted: 08/21/2024] [Indexed: 09/06/2024]
Abstract
The internal ribosome entry site (IRES) is a cis-regulatory element that can initiate translation in a cap-independent manner. It is often associated with cellular processes and many diseases. Identifying IRESs is therefore important for understanding their mechanisms and finding potential therapeutic strategies for related diseases, but identifying IRES elements experimentally is time-consuming and laborious. Many bioinformatics tools have been developed to predict IRESs, but all of them are based on structural similarity or conventional machine learning algorithms. Here, we introduce DeepIRES, a deep learning model for precisely identifying IRES elements in messenger RNA (mRNA) sequences. DeepIRES is a hybrid model that incorporates dilated 1D convolutional neural network blocks, bidirectional gated recurrent units, and a self-attention module. Tenfold cross-validation results suggest that DeepIRES captures deeper relationships between sequence features and prediction results than other baseline models. Further comparison on independent test sets shows that DeepIRES has stronger and more robust prediction capability than existing methods. Moreover, DeepIRES achieves high accuracy in predicting experimentally validated IRESs collected in recent studies. Using deep learning interpretability analysis, we discover potential consensus motifs related to IRES activity. In summary, DeepIRES is a reliable tool for IRES prediction and provides insights into the mechanism of IRES elements.
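DeepIRES uses dilated 1D convolution blocks. The abstract does not give their hyperparameters, so the sketch below is a generic illustration of dilation only, not the DeepIRES architecture. It uses a single channel with no padding, bias or activation. It shows how a dilation factor d spreads a kernel's taps so that the same three weights cover a wider stretch of the input.

```python
from typing import Sequence

# Illustrative sketch only: a bare single-channel layer, not the DeepIRES block.


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> list[float]:
    """'Valid' 1D convolution (cross-correlation, as in deep-learning layers) with dilation.

    Output t combines x[t], x[t + d], ..., x[t + (k - 1) * d], so k weights
    cover a span of d * (k - 1) + 1 input positions.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    span = dilation * (len(kernel) - 1) + 1
    # range() is empty when span > len(x), so too-short inputs give an empty output.
    return [
        sum(w * x[t + j * dilation] for j, w in enumerate(kernel))
        for t in range(len(x) - span + 1)
    ]


signal = [float(i) for i in range(8)]  # toy single-channel input: 0.0 ... 7.0
kernel = [1.0, 0.0, -1.0]              # hypothetical 3-tap difference kernel
print(dilated_conv1d(signal, kernel, dilation=1))  # each output is x[t] - x[t + 2]
print(dilated_conv1d(signal, kernel, dilation=2))  # each output is x[t] - x[t + 4]
```

Stacking such layers with growing dilation (for example 1, 2, 4) is a common way to let a convolutional block see long stretches of sequence with few parameters. The abstract does not say whether DeepIRES uses that particular schedule.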
Affiliation(s)
- Jian Zhao
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29 Jiangjun Road, Jiangning District, Nanjing 211106, China
- Zhewei Chen
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29 Jiangjun Road, Jiangning District, Nanjing 211106, China
- Meng Zhang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29 Jiangjun Road, Jiangning District, Nanjing 211106, China
- Lingxiao Zou
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29 Jiangjun Road, Jiangning District, Nanjing 211106, China
- Shan He
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29 Jiangjun Road, Jiangning District, Nanjing 211106, China
- Jingjing Liu
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29 Jiangjun Road, Jiangning District, Nanjing 211106, China
- Quan Wang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29 Jiangjun Road, Jiangning District, Nanjing 211106, China
- Xiaofeng Song
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, No. 29 Jiangjun Road, Jiangning District, Nanjing 211106, China
- Jing Wu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, No. 101 Longmian Avenue, Jiangning District, Nanjing 211166, China

17

Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024]
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea

18

Song M, Zhao J, Zhang C, Jia C, Yang J, Zhao H, Zhai J, Lei B, Tao S, Chen S, Su R, Ma C. PEA-m6A: an ensemble learning framework for accurately predicting N6-methyladenosine modifications in plants. Plant Physiol 2024; 195:1200-1213. [PMID: 38428981 DOI: 10.1093/plphys/kiae120] [Received: 01/11/2024] [Revised: 01/11/2024] [Accepted: 02/01/2024] [Indexed: 03/03/2024]
Abstract
N6-methyladenosine (m6A), the most prevalent modification in eukaryotic mRNAs, is involved in gene expression regulation and many RNA metabolic processes. Accurate prediction of m6A modification is important for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are species-centric. Here we present PEA-m6A, a unified, modularized and parameterized framework that streamlines m6A-Seq data analysis for predicting m6A-modified regions in plant genomes. The PEA-m6A framework builds ensemble learning-based m6A prediction models with statistics-based and deep learning-driven features. Across 12 plant species, it improves the area under the precision-recall curve by 6.7% to 23.3% over the state-of-the-art regional-scale m6A predictor WeakRM. In particular, PEA-m6A can leverage knowledge from pretrained models via transfer learning, an innovation that improves the prediction accuracy of m6A modifications in small-sample training tasks. PEA-m6A also generalizes well, which makes it suitable for both within-species and cross-species m6A prediction. Overall, this study presents a promising m6A prediction tool, PEA-m6A, with outstanding accuracy, flexibility, transferability, and generalization ability. PEA-m6A has been packaged using Galaxy and Docker for ease of use and is publicly available at https://github.com/cma2015/PEA-m6A.
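PEA-m6A is compared with WeakRM using the area under the precision-recall curve. One standard way to compute this quantity is step-wise average precision. It is shown here as a hedged sketch, not the evaluation code used in the study. The labels and scores are invented.

```python
from typing import Sequence

# Illustrative sketch only: step-wise average precision, not the study's evaluation script.


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Regions are ranked by descending score; tied scores form a single threshold.
    AP = sum over thresholds of (recall gained) * (precision at that threshold).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example that shares this score before evaluating the threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


labels = [1, 0, 1, 1, 0]           # 1 = m6A-modified region (toy labels)
scores = [0.9, 0.8, 0.7, 0.3, 0.2]  # hypothetical predictor outputs
print(average_precision(labels, scores))
```

A random ranking has an expected AUPRC roughly equal to the fraction of positives. This is one reason the metric is often preferred over AUROC when modified regions are rare.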
Affiliation(s)
- Minggui Song
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jiawen Zhao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chujun Zhang
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chengchao Jia
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jing Yang
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Haonan Zhao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jingjing Zhai
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Beilei Lei
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Shiheng Tao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Siqi Chen
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Ran Su
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Chuang Ma
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China

19

Wu Y, Shao W, Yan M, Wang Y, Xu P, Huang G, Li X, Gregory BD, Yang J, Wang H, Yu X. Transfer learning enables identification of multiple types of RNA modifications using nanopore direct RNA sequencing. Nat Commun 2024; 15:4049. [PMID: 38744925 PMCID: PMC11094168 DOI: 10.1038/s41467-024-48437-4] [Received: 11/08/2023] [Accepted: 04/26/2024] [Indexed: 05/16/2024]
Abstract
Nanopore direct RNA sequencing (DRS) has emerged as a powerful tool for RNA modification identification. However, concurrently detecting multiple types of modifications in a single DRS sample remains a challenge. Here, we develop TandemMod, a transferable deep learning framework capable of detecting multiple types of RNA modifications in single DRS data. To train high-performance TandemMod models, we generate in vitro epitranscriptome datasets from cDNA libraries, containing thousands of transcripts labeled with various types of RNA modifications. We validate the performance of TandemMod on both in vitro transcripts and in vivo human cell lines, confirming its high accuracy for profiling m6A and m5C modification sites. Furthermore, we perform transfer learning for identifying other modifications such as m7G, Ψ, and inosine, significantly reducing training data size and running time without compromising performance. Finally, we apply TandemMod to identify 3 types of RNA modifications in rice grown in different environments, demonstrating its applicability across species and conditions. In summary, we provide a resource with ground-truth labels that can serve as benchmark datasets for nanopore-based modification identification methods, and TandemMod for identifying diverse RNA modifications using a single DRS sample.
Affiliation(s)
- You Wu
- Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Wenna Shao
- Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Mengxiao Yan
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- Yuqin Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- Pengfei Xu
- Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Guoqiang Huang
- Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xiaofei Li
- Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Brian D Gregory
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jun Yang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- Chenshan Scientific Research Center of CAS Center for Excellence in Molecular Plant Sciences, Shanghai, 201602, China
- Hongxia Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- Chenshan Scientific Research Center of CAS Center for Excellence in Molecular Plant Sciences, Shanghai, 201602, China
- Xiang Yu
- Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China

20

Wang R, Chung CR, Lee TY. Interpretable Multi-Scale Deep Learning for RNA Methylation Analysis across Multiple Species. Int J Mol Sci 2024; 25:2869. [PMID: 38474116 DOI: 10.3390/ijms25052869] [Received: 01/29/2024] [Revised: 02/19/2024] [Accepted: 02/28/2024] [Indexed: 03/14/2024]
Abstract
RNA modification plays a crucial role in cellular regulation. However, despite extensive research, the traditional high-throughput sequencing methods used to elucidate its functional mechanisms are time-consuming and labor-intensive. Moreover, existing methods often focus on specific species and neglect the simultaneous exploration of RNA modifications across diverse species. A versatile computational approach is therefore needed for interpretable analysis of RNA modifications across species. We propose a multi-scale, biological language-based deep learning model for interpretable, sequence-level prediction of diverse RNA modifications. Benchmark comparisons across species show that the model outperforms current state-of-the-art methods in predicting various RNA methylation types. Cross-species validation and attention weight visualization also highlight the model's ability to capture sequential and functional semantics from genomic backgrounds. Our analysis suggests the potential existence of "biological grammars" for each modification type. These could help map methylation-related sequence patterns and clarify the biological mechanisms underlying RNA modifications.
Affiliation(s)
- Rulan Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan
- Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan

21
Jiang J, Song B, Meng J, Zhou J. Tissue-specific RNA methylation prediction from gene expression data using sparse regression models. Comput Biol Med 2024; 169:107892. [PMID: 38171264 DOI: 10.1016/j.compbiomed.2023.107892] [Received: 07/20/2023] [Revised: 12/19/2023] [Accepted: 12/20/2023] [Indexed: 01/05/2024]
Abstract
N6-methyladenosine (m6A) is a highly prevalent and conserved post-transcriptional modification observed in mRNA and long non-coding RNA (lncRNA). Identifying potential m6A sites within RNA sequences is crucial for understanding how the epitranscriptome influences biological processes. In this study, we introduce Exp2RM, a novel approach that builds single-site, tissue-specific elastic net models to predict tissue-specific methylation levels from gene expression data. The resulting ensemble model shows robust predictive performance for tissue-specific methylation levels, with an average R-squared value of 0.496 and a median R-squared value of 0.482 across all 22 human tissues. Because methylation distributions vary among tissues, we trained the model to incorporate similar patterns, which significantly improved accuracy and raised the median R-squared value to 0.728. Additionally, functional analysis reveals Exp2RM's ability to capture coefficient genes involved in relevant biological processes. This study emphasizes the importance of tissue-specific methylation distribution in enhancing prediction accuracy and provides insights into the functional implications of methylation sites.
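Exp2RM is described as an ensemble of elastic net regressions evaluated by R-squared. The sketch below is not Exp2RM's code; it is a minimal pure-Python illustration of the standard elastic net objective `(1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||^2)` fitted by cyclic coordinate descent with soft-thresholding, together with the R-squared metric. It assumes centred features and response (no intercept is fitted), and the names `elastic_net`, `lam`, and `alpha` are illustrative.

```python
import statistics


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator S(z, gamma) induced by the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Fit b minimising (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2)
    by cyclic coordinate descent. Assumes centred columns and y (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = [float(v) for v in y]  # r = y - Xb, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of column j with the partial residual (r + x_j * b_j)
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, residual)) / n
            denom = sum(c * c for c in col) / n + lam * (1 - alpha)
            new_bj = soft_threshold(rho, lam * alpha) / denom
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient
                residual = [r - c * delta for c, r in zip(col, residual)]
            b[j] = new_bj
    return b


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


X = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
print(elastic_net(X, y, lam=1.0, alpha=0.5))  # one coefficient, shrunk below the least-squares slope of 2
```

Per-tissue R-squared values could then be summarized with `statistics.fmean` and `statistics.median`, mirroring the average and median reported above. In practice a library implementation such as scikit-learn's `ElasticNet` would normally be used; it calls the overall penalty strength `alpha` and the L1/L2 mixing weight `l1_ratio`.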
Affiliation(s)
- Jie Jiang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB, Liverpool, United Kingdom
- Bowen Song
- Department of Public Health, School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing, 210023, China
- Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China; AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB, Liverpool, United Kingdom
- Jingxian Zhou
- School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University Entrepreneur College (Taicang), Taicang, Suzhou, Jiangsu Province, 215400, China; Department of Computer Science, University of Liverpool, L69 7ZB, Liverpool, United Kingdom.

22
Liu R, Wang Q, Zhang X. Identification of prognostic coagulation-related signatures in clear cell renal cell carcinoma through integrated multi-omics analysis and machine learning. Comput Biol Med 2024; 168:107779. [PMID: 38061153 DOI: 10.1016/j.compbiomed.2023.107779] [Received: 07/27/2023] [Revised: 10/30/2023] [Accepted: 11/28/2023] [Indexed: 01/10/2024]
Abstract
Clear cell renal cell carcinoma (ccRCC) is a threat to public health, with high morbidity and mortality. Clinical evidence has shown that cancer-associated thrombosis poses significant challenges to treatment in ccRCC, including drug resistance and difficulties in surgical decision-making. However, the coagulation pathway, one of the core mechanisms of cancer-associated thrombosis, which was recently found to be closely related to the tumor microenvironment and immune-related pathways, has rarely been studied in ccRCC. Therefore, we integrated bulk RNA-seq data, DNA mutation and methylation data, single-cell data, and proteomic data to perform a comprehensive analysis of coagulation-related genes in ccRCC. First, we demonstrated the importance of the coagulation-related gene set by consensus clustering. Using machine learning, we identified 5 coagulation signature genes and verified their clinical value in the TCGA, ICGC, and E-MTAB-1980 databases. We also demonstrated that the specific expression patterns of the coagulation signature genes, driven by copy number variation (CNV) and methylation, were closely correlated with pathways including apoptosis, immune infiltration, angiogenesis, and extracellular matrix construction. Moreover, we identified two types of tumor cells in the single-cell data by machine learning, and the coagulation signature genes were differentially expressed between these two types. In addition, the signature genes were shown to influence immune cells, especially T-cell differentiation, and their protein levels were also validated.
Affiliation(s)
- Ruijie Liu
- Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, China.
- Qi Wang
- Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, China.
- Xiaoping Zhang
- Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, China.

23
Ji Y, Sun J, Xie J, Wu W, Shuai SC, Zhao Q, Chen W. m5UMCB: Prediction of RNA 5-methyluridine sites using multi-scale convolutional neural network with BiLSTM. Comput Biol Med 2024; 168:107793. [PMID: 38048661 DOI: 10.1016/j.compbiomed.2023.107793] [Received: 10/19/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 12/06/2023]
Abstract
As a prevalent RNA modification, 5-methyluridine (m5U) plays a critical role in diverse biological processes and disease pathogenesis. High-throughput identification of m5U typically relies on labor-intensive biochemical experiments using various sequencing-based techniques, which are both time-consuming and expensive. Consequently, there is a pressing need for more efficient and cost-effective computational methods to complement these high-throughput techniques. In this study, we present m5UMCB, a novel approach that combines a multi-scale convolutional neural network (CNN) with a bidirectional long short-term memory (BiLSTM) network to recognize m5U sites. Our method segments RNA sequences into 3-mer fragments and maps each fragment to a lower-dimensional vector representation using the global vectors for word representation (GloVe) technique. Through a series of multi-scale convolution and pooling operations, local features are extracted from RNA sequences and transformed into abstract, high-level features. The feature matrix is then fed into a BiLSTM network, which captures contextual information and long-term dependencies within the sequence. Finally, a fully connected layer classifies m5U sites. The results of 5-fold cross-validation (5-fold CV) indicate that m5UMCB outperforms existing state-of-the-art predictive methods, with a 1.98% increase in the area under the ROC curve (AUC) and significant improvements in other relevant evaluation metrics. We are confident that m5UMCB will serve as a valuable tool for m5U prediction.
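The 3-mer segmentation step can be sketched as follows. This is an illustration, not m5UMCB's code: it assumes overlapping 3-mers (stride 1) over the A/C/G/U alphabet and maps each 3-mer to an integer index, which would then select a row of a pretrained GloVe embedding table (not shown, since the abstract does not provide one). The helper names `kmer_tokens`, `kmer_vocab`, and `encode` are hypothetical.

```python
from __future__ import annotations

from itertools import product

NUCLEOTIDES = "ACGU"


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a sequence into k-mers; stride=1 yields overlapping fragments."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input by mapping T to U
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_vocab(k: int = 3) -> dict[str, int]:
    """Assign an index to every possible k-mer over A/C/G/U (64 entries for k=3)."""
    return {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}


def encode(seq: str, k: int = 3) -> list[int]:
    """Map a sequence to the integer indices of its k-mers."""
    vocab = kmer_vocab(k)
    return [vocab[token] for token in kmer_tokens(seq, k)]


print(kmer_tokens("ACGUA"))  # ['ACG', 'CGU', 'GUA']
print(encode("AAC"))         # [1]
```

Inputs containing characters outside A/C/G/U (for example `N`) raise a `KeyError` in `encode`; how m5UMCB handles such positions is not stated in the abstract.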
Affiliation(s)
- Yingshan Ji
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi, 276000, China
- Jingxuan Xie
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Wei Wu
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Stella C Shuai
- Biological Science, Northwestern University, Evanston, IL, 60208, USA
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China.

24
Wang L, Zhou Y. MRM-BERT: a novel deep neural network predictor of multiple RNA modifications by fusing BERT representation and sequence features. RNA Biol 2024; 21:1-10. [PMID: 38357904 PMCID: PMC10877979 DOI: 10.1080/15476286.2024.2315384] [Revised: 12/26/2023] [Accepted: 02/02/2024] [Indexed: 02/16/2024]
Abstract
RNA modifications play crucial roles in various biological processes and diseases. Accurate prediction of RNA modification sites is essential for understanding their functions. In this study, we propose a hybrid approach that fuses a pre-trained sequence representation with various sequence features to predict multiple types of RNA modifications in one combined prediction framework. We developed MRM-BERT, a deep learning method that combined the pre-trained DNABERT deep sequence representation module and the convolutional neural network (CNN) exploiting four traditional sequence feature encodings to improve the prediction performance. MRM-BERT was evaluated on multiple datasets of 12 commonly occurring RNA modifications, including m6A, m5C, m1A and so on. The results demonstrate that our hybrid model outperforms other models in terms of area under receiver operating characteristic curve (AUC) for all 12 types of RNA modifications. MRM-BERT is available as an online tool (http://117.122.208.21:8501) or source code (https://github.com/abhhba999/MRM-BERT), which allows users to predict RNA modification sites and visualize the results. Overall, our study provides an effective and efficient approach to predict multiple RNA modifications, contributing to the understanding of RNA biology and the development of therapeutic strategies.
Affiliation(s)
- Linshu Wang
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- Yuan Zhou
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University, Beijing, China
- Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing, China

25
Aslam I, Shah S, Jabeen S, ELAffendi M, A Abdel Latif A, Ul Haq N, Ali G. A CNN based m5c RNA methylation predictor. Sci Rep 2023; 13:21885. [PMID: 38081880 PMCID: PMC10713599 DOI: 10.1038/s41598-023-48751-9] [Received: 06/11/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
Post-transcriptional modifications of RNA play a key role in a variety of biological processes, such as stability and immune tolerance, RNA splicing, protein translation, and RNA degradation. One of these modifications, m5C, participates in various cellular functions, such as RNA structural stability and translation efficiency, and has attracted growing attention from biologists. Detecting RNA m5C methylation sites with biological experiments requires considerable effort, time, and money. Most researchers use pre-processed RNA sequences of 41 nucleotides in which the methylated cytosine is at the center, so some of the information around these motifs may be lost. Conventional methods are unable to process RNA sequences directly because of their high dimensionality and therefore need optimized techniques for better feature extraction. To address these challenges, this study employs an end-to-end, 1D CNN-based model to classify and interpret m5C methylation sites. Moreover, we aim to analyze sequences at full length, where the methylated cytosine may not be at the center. The evaluation showed promising results, with the proposed architecture outperforming state-of-the-art techniques in terms of sensitivity and accuracy. Our model achieves 96.70% sensitivity and 96.21% accuracy on 41-nucleotide sequences and 96.10% accuracy on full-length sequences.
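The conventional preprocessing described above, a 41-nt window with the candidate cytosine at the center (20 nt on each side), can be sketched as follows. This is a minimal illustration, not the authors' pipeline: padding short flanks with `N` at sequence ends and treating every `C` as a candidate site are assumptions made for the example, and the function names are hypothetical.

```python
from __future__ import annotations


def candidate_sites(seq: str, base: str = "C") -> list[int]:
    """Positions of every occurrence of `base`, each a candidate methylation site."""
    return [i for i, b in enumerate(seq.upper()) if b == base]


def centered_window(seq: str, center: int, length: int = 41, pad: str = "N") -> str:
    """Return `length` nucleotides with seq[center] in the middle, padding at the ends."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the site sits exactly in the middle")
    if not 0 <= center < len(seq):
        raise IndexError("center is outside the sequence")
    flank = length // 2  # 20 nt on each side for length=41
    left = seq[max(0, center - flank):center]
    right = seq[center + 1:center + 1 + flank]
    # Pad short flanks so the site always lands at index `flank`
    left = pad * (flank - len(left)) + left
    right = right + pad * (flank - len(right))
    return left + seq[center] + right


rna = "A" * 30 + "C" + "G" * 30
windows = [centered_window(rna, i) for i in candidate_sites(rna)]
print(len(windows), len(windows[0]), windows[0][20])  # 1 41 C
```

Any context beyond the 20-nt flanks is discarded by construction, which is the information loss the authors cite as motivation for also analyzing full-length sequences.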
Affiliation(s)
- Irum Aslam
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Sajid Shah
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Saima Jabeen
- College of Engineering, AI Research Center, Alfaisal University, Riyadh, 50927, Saudi Arabia.
- Mohammed ELAffendi
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Asmaa A Abdel Latif
- Public Health and Community Medicine Department (Industrial medicine and occupational health specialty), Faculty of Medicine, Menoufia University, Shibîn el Kôm, Egypt
- Nuhman Ul Haq
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Gauhar Ali
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia

26
Zhang Y, Wang Z, Zhang Y, Li S, Guo Y, Song J, Yu DJ. Interpretable prediction models for widespread m6A RNA modification across cell lines and tissues. Bioinformatics 2023; 39:btad709. [PMID: 37995291 PMCID: PMC10697738 DOI: 10.1093/bioinformatics/btad709] [Received: 07/22/2023] [Revised: 11/01/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023]
Abstract
MOTIVATION RNA N6-methyladenosine (m6A) in Homo sapiens plays vital roles in a variety of biological functions. Precise identification of m6A modifications is thus essential to elucidating their biological functions and underlying molecular mechanisms. Currently available high-throughput, single-nucleotide-resolution m6A modification data have considerably accelerated the identification of RNA modification sites through the development of data-driven computational methods. Nevertheless, existing methods are limited in their coverage of cell lines with single-nucleotide-resolution data and offer poor model interpretability, which limits their applicability. RESULTS In this study, we present CLSM6A, a set of deep learning-based models designed to predict single-nucleotide-resolution m6A RNA modification sites across eight different cell lines and three tissues. Extensive benchmarking experiments on well-curated datasets show that CLSM6A outperforms current state-of-the-art methods. Furthermore, CLSM6A can interpret its prediction decisions by extracting critical motifs activated by filters and pinpointing high-importance positions in both forward and backward propagation. CLSM6A exhibits better portability on similar cross-cell line/tissue datasets, reveals a strong association between highly activated motifs and high-impact motifs, and demonstrates the complementary attributes of different interpretation strategies. AVAILABILITY AND IMPLEMENTATION The webserver is available at http://csbio.njust.edu.cn/bioinf/clsm6a. The datasets and code are available at https://github.com/zhangying-njust/CLSM6A/.
Affiliation(s)
- Ying Zhang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yiwen Zhang
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Shanshan Li
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Yuming Guo
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

27
Yang Y, Liu Z, Lu J, Sun Y, Fu Y, Pan M, Xie X, Ge Q. Analysis approaches for the identification and prediction of N6-methyladenosine sites. Epigenetics 2023; 18:2158284. [PMID: 36562485 PMCID: PMC9980620 DOI: 10.1080/15592294.2022.2158284] [Indexed: 12/24/2022]
Abstract
The global dynamics of a variety of biological processes can be revealed by mapping m6A sites across transcripts, particularly across the full transcriptome. Individual m6A sites also contribute to biological function, which can be evaluated using stoichiometric information obtained at single-nucleotide resolution. Currently, m6A sites are identified mainly by experimental and predictive methods, based on high-throughput sequencing and machine learning models, respectively. This review summarizes recent topics and progress in bioinformatics methods for deciphering m6A methylation, including the experimental detection of m6A methylation sites, data analysis techniques, approaches for predicting m6A methylation sites, m6A methylation databases, and the detection of m6A modification in circRNA. Finally, the review briefly discusses future perspectives in this area.
Affiliation(s)
- Yuwei Yang
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Zhiyu Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Junru Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Yuqing Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Yue Fu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Min Pan
- Department of Pathology and Pathophysiology, School of Medicine, Southeast University, Nanjing, China
- Xueying Xie
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Qinyu Ge
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China

28
Benak D, Kolar F, Zhang L, Devaux Y, Hlavackova M. RNA modification m6Am: the role in cardiac biology. Epigenetics 2023; 18:2218771. [PMID: 37331009 DOI: 10.1080/15592294.2023.2218771] [Received: 12/07/2022] [Revised: 05/19/2023] [Accepted: 05/23/2023] [Indexed: 06/20/2023]
Abstract
Epitranscriptomic modifications have recently moved into the research spotlight because of their broad regulatory effects on gene expression and, thereby, on cellular physiology and pathophysiology. N6,2'-O-dimethyladenosine (m6Am) is one of the most prevalent chemical marks on RNA and is dynamically regulated by writers (PCIF1, METTL4) and erasers (FTO). The presence or absence of m6Am in RNA affects mRNA stability, regulates transcription, and modulates pre-mRNA splicing. Nevertheless, its functions in the heart are poorly understood. This review summarizes current knowledge and gaps regarding m6Am modification and its regulators in cardiac biology. It also points out technical challenges and lists the currently available techniques for measuring m6Am. A better understanding of epitranscriptomic modifications is needed to improve our knowledge of molecular regulation in the heart, which may lead to novel cardioprotective strategies.
Affiliation(s)
- Daniel Benak
- Laboratory of Developmental Cardiology, Institute of Physiology of the Czech Academy of Sciences, Prague, Czech Republic
- Department of Physiology, Faculty of Science, Charles University, Prague, Czech Republic
- Frantisek Kolar
- Laboratory of Developmental Cardiology, Institute of Physiology of the Czech Academy of Sciences, Prague, Czech Republic
- Lu Zhang
- Bioinformatics Platform, Luxembourg Institute of Health, Strassen, Luxembourg
- Yvan Devaux
- Cardiovascular Research Unit, Department of Population Health, Luxembourg Institute of Health, Strassen, Luxembourg
- Marketa Hlavackova
- Laboratory of Developmental Cardiology, Institute of Physiology of the Czech Academy of Sciences, Prague, Czech Republic

29
Jia J, Wei Z, Sun M. EMDL_m6Am: identifying N6,2'-O-dimethyladenosine sites based on stacking ensemble deep learning. BMC Bioinformatics 2023; 24:397. [PMID: 37880673 PMCID: PMC10598967 DOI: 10.1186/s12859-023-05543-2] [Received: 02/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023]
Abstract
BACKGROUND N6,2'-O-dimethyladenosine (m6Am) is an abundant RNA methylation modification on vertebrate mRNAs and is present in the transcription initiation region of mRNAs. It has recently been experimentally shown to be associated with several human disorders, including obesity and stomach cancer, among others. Correctly identifying m6Am sites is therefore important for understanding their role in RNA regulation. RESULTS This study proposes a novel deep learning-based m6Am prediction model, EMDL_m6Am, which uses one-hot encoding to represent the feature map of the RNA sequence and recognizes m6Am sites by integrating different CNN models via stacking: DenseNet, an Inflated Convolutional Network (DCNN), and a Deep Multiscale Residual Network (MSRN). On the training dataset, the sensitivity (Sn), specificity (Sp), accuracy (ACC), Matthews correlation coefficient (MCC), and area under the curve (AUC) of the model reach 86.62%, 88.94%, 87.78%, 0.7590, and 0.8778, respectively; on the independent test set, they reach 82.25%, 79.72%, 80.98%, 0.6199, and 0.8211. CONCLUSIONS The experimental results demonstrate that EMDL_m6Am substantially improves the prediction of m6Am sites and could provide a valuable reference for future studies. The source code and experimental data are available at: https://github.com/13133989982/EMDL-m6Am.
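For readers unfamiliar with the reported metrics, the sketch below shows one-hot encoding of an RNA sequence and the standard definitions of Sn, Sp, ACC, and MCC computed from a binary confusion matrix. It is not EMDL_m6Am's code: the A/C/G/U channel order and the all-zero vector for unknown bases are assumptions. AUC is omitted because it requires continuous prediction scores rather than hard labels.

```python
from __future__ import annotations

import math

ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}


def one_hot(seq: str) -> list[list[int]]:
    """Encode each nucleotide as a 4-channel vector; unknown bases become all zeros."""
    return [list(ONE_HOT.get(b, (0, 0, 0, 0))) for b in seq.upper()]


def binary_metrics(y_true, y_pred) -> dict:
    """Sn, Sp, ACC and MCC from binary labels (1 = m6Am site, 0 = non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall on positives)
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity (recall on negatives)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


print(one_hot("AUN"))  # [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

MCC ranges from -1 to 1, which is why it is reported as a decimal (0.7590, 0.6199) rather than a percentage like the other three metrics.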
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China.
- Zhangying Wei
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China.
- Mingwei Sun
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China

30
Shi M, Li X, Li M, Si Y. Attention-based generative adversarial networks improve prognostic outcome prediction of cancer from multimodal data. Brief Bioinform 2023; 24:bbad329. [PMID: 37756592 DOI: 10.1093/bib/bbad329] [Received: 02/20/2023] [Revised: 08/20/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023]
Abstract
Predicting prognostic outcome is critical for developing efficient cancer therapeutics and potential personalized medicine. However, because of the heterogeneity and diversity of multimodal cancer data, data integration and feature selection remain challenging for prognostic outcome prediction. We propose a deep learning method, a generative adversarial network based on sequential channel-spatial attention modules (CSAM-GAN), as a multimodal data integration and feature selection approach for prognostic stratification tasks in cancer. Sequential channel-spatial attention modules equipped with an encoder-decoder are applied to the input features of the multimodal data to refine the selected features accurately. A discriminator network is used so that the generator and discriminator learn adversarially, allowing the model to describe the complex heterogeneous information in the multimodal data accurately. We conducted extensive experiments with various feature selection and classification methods and confirmed that CSAM-GAN with a multilayer deep neural network (DNN) classifier outperformed these baseline methods on two multimodal datasets with miRNA expression, mRNA expression, and histopathological image data: lower-grade glioma and kidney renal clear cell carcinoma. CSAM-GAN with the multilayer DNN classifier bridges the gap between heterogeneous multimodal data and prognostic outcome prediction.
Affiliation(s)
- Mingguang Shi
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
- Xuefeng Li
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
- Mingna Li
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China
- Yichong Si
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui 230009, China

31
Zhang Y, Ge F, Li F, Yang X, Song J, Yu DJ. Prediction of Multiple Types of RNA Modifications via Biological Language Model. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3205-3214. [PMID: 37289599 DOI: 10.1109/tcbb.2023.3283985] [Indexed: 06/10/2023]
Abstract
RNA modifications have been shown to play essential roles in multiple biological processes. Accurate identification of RNA modifications in the transcriptome is critical for providing insights into their biological functions and mechanisms. Many tools have been developed for predicting RNA modifications at single-base resolution, but they rely on conventional feature engineering, focusing on feature design and selection processes that require extensive biological expertise and may introduce redundant information. With the rapid development of artificial intelligence technologies, end-to-end methods have been favorably received by researchers. Nevertheless, in nearly all of these approaches, each trained model is suitable only for a specific type of RNA methylation modification. In this study, we present MRM-BERT, which feeds task-specific sequences into the powerful BERT (Bidirectional Encoder Representations from Transformers) model and fine-tunes it, achieving performance competitive with state-of-the-art methods. MRM-BERT avoids repeated de novo training and can predict multiple RNA modifications, such as pseudouridine, m6A, m5C, and m1A, in Mus musculus, Arabidopsis thaliana, and Saccharomyces cerevisiae. In addition, we analyse the attention heads to identify high-attention regions for the prediction and perform saturated in silico mutagenesis of the input sequences to discover potential changes in RNA modifications, which can assist researchers in follow-up studies.
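Saturated in silico mutagenesis, as described above, substitutes every position with each alternative base and re-scores the mutant. A minimal sketch, assuming a generic `score` callable that maps a sequence to a modification probability; the `toy_score` function is a hypothetical stand-in for a trained MRM-BERT model, which this example does not reproduce.

```python
from __future__ import annotations

from typing import Callable

BASES = "ACGU"


def saturation_mutagenesis(seq: str, score: Callable[[str], float]) -> list[tuple[int, str, str, float]]:
    """Score every single-nucleotide substitution of `seq`.

    Returns (position, reference base, alternative base, score change) tuples,
    where score change = score(mutant) - score(reference).
    """
    ref_score = score(seq)
    effects = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # skip the unchanged base
            mutant = seq[:i] + alt + seq[i + 1:]
            effects.append((i, ref, alt, score(mutant) - ref_score))
    return effects


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: fraction of adenosines."""
    return seq.count("A") / len(seq)


effects = saturation_mutagenesis("AC", toy_score)
print(max(effects, key=lambda e: e[3]))  # (1, 'C', 'A', 0.5)
```

For a sequence of length L over A/C/G/U this requires 3L model evaluations, so batching mutants through the model is the usual practical choice.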

32
Liang S, Zhao Y, Jin J, Qiao J, Wang D, Wang Y, Wei L. Rm-LR: A long-range-based deep learning model for predicting multiple types of RNA modifications. Comput Biol Med 2023; 164:107238. [PMID: 37515874 DOI: 10.1016/j.compbiomed.2023.107238] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent research has highlighted the pivotal role of RNA post-transcriptional modifications in regulating RNA expression and function. Accurate identification of RNA modification sites is important for understanding RNA function. In this study, we propose Rm-LR, a novel RNA modification prediction method that uses a long-range-based deep learning approach to predict multiple types of RNA modifications from RNA sequences alone. Rm-LR incorporates two large-scale pre-trained RNA language models to capture discriminative sequential information and learn important local features, which are then integrated through a bilinear attention network. Rm-LR supports ten RNA modification types (m6A, m1A, m5C, m5U, m6Am, Ψ, Am, Cm, Gm, and Um) and significantly outperforms state-of-the-art methods in predictive capability on benchmark datasets. Experimental results show the effectiveness of Rm-LR in predicting various RNA modifications, demonstrating the adaptability and robustness of the proposed model. We demonstrate that pre-trained RNA language models can learn dense biological sequential representations from a large-scale, long-range RNA corpus while also enhancing model interpretability. This work contributes to the development of accurate and reliable computational models for RNA modification prediction, providing insights into the complex landscape of RNA modifications.
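Rm-LR fuses features from two pre-trained language models through a bilinear attention network. The sketch below is a simplified, full-rank bilinear attention map, where each score is `x_i^T W y_j` and a softmax is taken over all (i, j) pairs of two per-position feature sequences X and Y. It is not Rm-LR's module (bilinear attention networks commonly use low-rank projections), and `W` here is a fixed matrix standing in for learned weights.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear_attention(X, Y, W):
    """Attention map A[i][j] proportional to exp(x_i^T W y_j), normalised over all (i, j).

    X: n_x vectors of size d_x, Y: n_y vectors of size d_y, W: d_x x d_y matrix.
    """
    logits = []
    for x in X:
        for y in Y:
            wy = [sum(w_ab * y_b for w_ab, y_b in zip(row, y)) for row in W]  # W @ y
            logits.append(sum(x_a * v for x_a, v in zip(x, wy)))  # x^T (W y)
    flat = softmax(logits)
    n_y = len(Y)
    return [flat[i * n_y:(i + 1) * n_y] for i in range(len(X))]


X = [[1.0, 0.0], [0.0, 1.0]]  # e.g. per-position features from language model 1
Y = [[1.0, 0.0], [0.0, 1.0]]  # e.g. per-position features from language model 2
W = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a learned weight matrix
A = bilinear_attention(X, Y, W)
print(A)
```

Normalizing over all pairs is one design choice; normalizing each row separately, so that every position in X distributes its attention over Y, is a common alternative.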
Collapse
Affiliation(s)
- Sirui Liang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yanxi Zhao
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Junru Jin
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Jianbo Qiao
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Ding Wang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yu Wang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Leyi Wei
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
33
Zhou Y, Wu J, Yao S, Xu Y, Zhao W, Tong Y, Zhou Z. DeepCIP: A multimodal deep learning method for the prediction of internal ribosome entry sites of circRNAs. Comput Biol Med 2023; 164:107288. [PMID: 37542919 DOI: 10.1016/j.compbiomed.2023.107288] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/26/2023] [Revised: 07/05/2023] [Accepted: 07/28/2023] [Indexed: 08/07/2023]
Abstract
Circular RNAs (circRNAs) have been found to encode proteins through internal ribosome entry sites (IRESs), which are essential RNA regulatory elements for cap-independent translation. Identifying IRES elements in circRNAs is crucial for understanding their function. Previous studies have presented IRES predictors based on machine learning techniques, but these were mainly designed for linear RNA IRESs. In this study, we propose DeepCIP (Deep learning method for CircRNA IRES Prediction), a multimodal deep learning approach that uses both sequence and structural information for circRNA IRES prediction. Our results demonstrate the effectiveness of the sequence and structure models used by DeepCIP for feature extraction and suggest that integrating sequence and structural information improves prediction accuracy. Comparative studies indicate that DeepCIP outperforms other methods on the test set and on a real circRNA IRES dataset. Furthermore, by integrating an interpretability analysis mechanism, we elucidate the sequence patterns learned by our model, which align with previously discovered motifs that facilitate circRNA translation. DeepCIP therefore has the potential to advance the study of circRNA coding potential and contribute to the design of circRNA-based drugs. DeepCIP is freely available as a standalone program at https://github.org/zjupgx/DeepCIP.
Affiliation(s)
- Yuxuan Zhou
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Jingcheng Wu
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Shihao Yao
- College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Yulian Xu
- College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Wenbin Zhao
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Yunguang Tong
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; Aoming (Hangzhou) Biomedical Co., Ltd., Hangzhou, 310018, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
- Zhan Zhou
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China
34
Abbas Z, Rehman MU, Tayara H, Zou Q, Chong KT. XGBoost framework with feature selection for the prediction of RNA N5-methylcytosine sites. Mol Ther 2023; 31:2543-2551. [PMID: 37271991 PMCID: PMC10422016 DOI: 10.1016/j.ymthe.2023.05.016] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 10/28/2022] [Revised: 01/06/2023] [Accepted: 05/31/2023] [Indexed: 06/06/2023]
Abstract
5-methylcytosine (m5C) is a critical post-transcriptional modification that is widely present in various kinds of RNA and is crucial to fundamental biological processes. Correctly identifying m5C sites on RNA allows clinicians to understand more clearly the precise function of these sites in different biological processes. Owing to their effectiveness and affordability, computational methods for identifying methylation sites in various species have received increasing attention over the last few years. To precisely identify RNA m5C sites in five species (Homo sapiens, Arabidopsis thaliana, Mus musculus, Drosophila melanogaster, and Danio rerio), we propose a more effective and accurate model named m5C-pred. m5C-pred combines five distinct feature encoding techniques to extract features from the RNA sequence, uses SHapley Additive exPlanations (SHAP) to select the best features among them, and uses XGBoost as the classifier. We applied the hyperparameter optimization framework Optuna to determine the best hyperparameters quickly and efficiently. Finally, the proposed model was evaluated on independent test datasets and compared with previous methods. Our approach, m5C-pred, is expected to be useful for accurately identifying m5C sites and outperforms currently available state-of-the-art techniques.
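The abstract does not name the five feature encodings or the number of features kept. As a minimal sketch of the general workflow, assuming k-mer nucleotide composition as one representative encoding and a precomputed matrix of SHAP values (one row per sample, one column per feature), the Python below builds a k-mer frequency vector and keeps the top-k features by mean absolute SHAP value. The function names `kmer_composition` and `select_top_features` are hypothetical and are not part of m5C-pred.

```python
from itertools import product


def kmer_composition(seq: str, k: int = 2) -> list[float]:
    """Normalized frequencies of all 4**k k-mers over A, C, G, U.

    Hypothetical example encoding; m5C-pred's actual encodings are not
    listed in the abstract.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing N or other symbols
            counts[index[kmer]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def select_top_features(shap_values: list[list[float]], top_k: int) -> list[int]:
    """Return indices of the top_k features ranked by mean |SHAP value|.

    Assumes shap_values was computed beforehand (for example, for a trained
    XGBoost model); this function performs only the ranking step.
    """
    n_samples = len(shap_values)
    n_features = len(shap_values[0])
    importance = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: importance[j], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print(kmer_composition("ACGUAC", k=1))
    print(select_top_features([[0.1, -0.5, 0.2], [0.3, 0.4, -0.1]], top_k=2))
```

The returned column indices would then be used to subset the feature matrix before training the final classifier.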
Affiliation(s)
- Zeeshan Abbas
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Mobeen Ur Rehman
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
35
Song B, Huang D, Zhang Y, Wei Z, Su J, Pedro de Magalhães J, Rigden DJ, Meng J, Chen K. m6A-TSHub: Unveiling the Context-specific m6A Methylation and m6A-affecting Mutations in 23 Human Tissues. Genomics Proteomics Bioinformatics 2023; 21:678-694. [PMID: 36096444 PMCID: PMC10787194 DOI: 10.1016/j.gpb.2022.09.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/12/2021] [Revised: 08/19/2022] [Accepted: 09/02/2022] [Indexed: 06/15/2023]
Abstract
As the most pervasive epigenetic mark on mRNAs and long non-coding RNAs (lncRNAs), N6-methyladenosine (m6A) RNA methylation has been shown to participate in essential biological processes. Recent studies have revealed distinct patterns of the m6A methylome across human tissues, but a major challenge remains in elucidating the tissue-specific presence and circuitry of m6A methylation. Here we present m6A-TSHub, a comprehensive online platform for unveiling context-specific m6A methylation and the genetic mutations that potentially regulate this epigenetic mark. m6A-TSHub consists of four core components: (1) m6A-TSDB, a comprehensive database of 184,554 functionally annotated m6A sites derived from 23 human tissues and 499,369 m6A sites from 25 tumor conditions; (2) m6A-TSFinder, a web server for high-accuracy prediction of m6A methylation sites within a specific tissue from RNA sequences, constructed using multi-instance deep neural networks with gated attention; (3) m6A-TSVar, a web server for assessing the impact of genetic variants on tissue-specific m6A RNA modifications; and (4) m6A-CAVar, a database of 587,983 cancer mutations from The Cancer Genome Atlas (TCGA), derived from 27 cancer types, that are predicted to affect m6A modifications in the primary tissue of cancers. The database should serve as a useful resource for studying the m6A methylome and the genetic factors of epitranscriptome disturbance in a specific tissue (or cancer type). m6A-TSHub is accessible at www.xjtlu.edu.cn/biologicalsciences/m6ats.
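The abstract states only that m6A-TSFinder uses multi-instance deep neural networks with gated attention. For orientation, the widely used gated-attention pooling of Ilse et al. (2018) is restated below, assuming that each input sequence is treated as a bag of K instance embeddings h_1, ..., h_K; V, U and w are learned parameters, ⊙ is element-wise multiplication, and sigm is the logistic sigmoid. Whether m6A-TSFinder uses exactly this form, and how it defines its instances, is an assumption.

```latex
a_k = \frac{\exp\left\{ \mathbf{w}^{\top}\left( \tanh(\mathbf{V}\mathbf{h}_k) \odot \operatorname{sigm}(\mathbf{U}\mathbf{h}_k) \right) \right\}}
           {\sum_{j=1}^{K} \exp\left\{ \mathbf{w}^{\top}\left( \tanh(\mathbf{V}\mathbf{h}_j) \odot \operatorname{sigm}(\mathbf{U}\mathbf{h}_j) \right) \right\}},
\qquad
\mathbf{z} = \sum_{k=1}^{K} a_k \mathbf{h}_k
```

Because the weights a_k are non-negative and sum to one, z is a convex combination of the instance embeddings; a bag-level classifier would then map z to the probability that the sequence carries an m6A site.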
Affiliation(s)
- Bowen Song
- Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China; Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Daiyun Huang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Department of Computer Science, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Yuxin Zhang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Zhen Wei
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Jionglong Su
- School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- João Pedro de Magalhães
- Institute of Ageing & Chronic Disease, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Jia Meng
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Kunqi Chen
- Key Laboratory of Gastrointestinal Cancer (Fujian Medical University), Ministry of Education, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
36
Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology (Basel) 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. Because genome sequences are analogous to language texts, techniques that succeeded in natural language processing can be applied to genomic data. This review provides a comprehensive analysis of the most recent advancements in applying transformer architectures and attention mechanisms to genome and transcriptome data, focusing on a critical evaluation of these techniques and discussing their advantages and limitations in the context of genome data analysis. Given the swift pace of development in deep learning methodologies, it is vital to continually assess the current standing and future direction of the research. This review therefore aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of recent advancements and elucidating state-of-the-art applications in the field. Furthermore, by critically evaluating studies from 2019 to 2023, it highlights potential areas for future investigation and serves as a stepping-stone for further research.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
37
Jia J, Wei Z, Cao X. EMDL-ac4C: identifying N4-acetylcytidine based on ensemble two-branch residual connection DenseNet and attention. Front Genet 2023; 14:1232038. [PMID: 37519885 PMCID: PMC10372626 DOI: 10.3389/fgene.2023.1232038] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/31/2023] [Accepted: 06/29/2023] [Indexed: 08/01/2023]
Abstract
Introduction: N4-acetylcytidine (ac4C) is a critical acetylation modification that plays an essential role in protein translation and is associated with a number of human diseases. Methods: Identifying ac4C sites through biological experiments is cumbersome and costly, and the performance of several existing computational models needs improvement. We therefore propose a new deep learning tool, EMDL-ac4C, to predict ac4C sites. It applies simple one-hot encoding to an unbalanced dataset and uses a downsampled ensemble deep learning network to extract important features for identifying ac4C sites. The base learner of this ensemble model consists of a modified DenseNet and Squeeze-and-Excitation Networks. In addition, we add a convolutional residual structure in parallel with the dense block to achieve two-layer feature extraction. Results: The average accuracy (Acc), Matthews correlation coefficient (MCC), and area under the curve (AUC) of EMDL-ac4C on ten independent testing sets are 80.84%, 61.77%, and 87.94%, respectively. Discussion: Multiple experimental comparisons indicate that EMDL-ac4C outperforms existing predictors and greatly improves the prediction of ac4C sites. EMDL-ac4C can also provide a valuable reference for further studies. The source code and experimental data are available at: https://github.com/13133989982/EMDLac4C.
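The abstract describes one-hot encoding combined with a downsampled ensemble but not the exact sampling scheme. The Python sketch below assumes a common variant: the majority (negative) class is shuffled and cut into chunks the size of the positive set, each chunk is paired with all positives to train one base learner, and the base learners' probabilities are averaged. The function names are hypothetical, samples are assumed to carry their own labels, and the DenseNet/SE base learner itself is not shown.

```python
import random

NUCLEOTIDES = "ACGU"


def one_hot(seq: str) -> list[list[int]]:
    """Encode an RNA sequence as an L x 4 one-hot matrix (columns A, C, G, U).

    T is mapped to U; unknown symbols such as N become all-zero rows.
    """
    return [[1 if base == n else 0 for n in NUCLEOTIDES]
            for base in seq.upper().replace("T", "U")]


def downsampled_subsets(positives: list, negatives: list, seed: int = 0) -> list[list]:
    """Build balanced training subsets, one per hypothetical base learner.

    Negatives are shuffled and split into chunks of len(positives); each
    chunk is combined with every positive. Leftover negatives are dropped.
    """
    if not positives:
        return []
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    size = len(positives)
    return [positives + shuffled[start:start + size]
            for start in range(0, len(shuffled) - size + 1, size)]


def ensemble_average(prob_lists: list[list[float]]) -> list[float]:
    """Average per-sample probabilities predicted by several base learners."""
    n_models = len(prob_lists)
    return [sum(sample_probs) / n_models for sample_probs in zip(*prob_lists)]
```

Averaging soft probabilities is only one possible combination rule; majority voting over thresholded predictions would be an equally plausible reading of "ensemble" here.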
Affiliation(s)
- Jianhua Jia
- Correspondence: Jianhua Jia; Zhangying Wei
38
Kong Y, Yu J, Ge S, Fan X. Novel insight into RNA modifications in tumor immunity: Promising targets to prevent tumor immune escape. Innovation (N Y) 2023; 4:100452. [PMID: 37485079 PMCID: PMC10362524 DOI: 10.1016/j.xinn.2023.100452] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/31/2023] [Accepted: 05/23/2023] [Indexed: 07/25/2023]
Abstract
An immunosuppressive state is a typical feature of the tumor microenvironment. Despite the dramatic success of immune checkpoint inhibitor (ICI) therapy in preventing tumor cells from escaping immune surveillance, primary and acquired resistance have limited its clinical use. Notably, recent clinical trials have shown that epigenetic drugs can significantly improve the outcome of ICI therapy in various cancers, indicating the importance of epigenetic modifications in the immune regulation of tumors. RNA modifications (N6-methyladenosine [m6A], N1-methyladenosine [m1A], 5-methylcytosine [m5C], etc.), a new hotspot of epigenetic research, have recently been shown to play crucial roles in both protumor and antitumor immunity. In this review, we comprehensively discuss how m6A, m1A, and m5C function in tumor immunity, both by directly regulating different immune cells and by indirectly regulating tumor cells through different mechanisms, including modulating the expression of immune checkpoints, inducing metabolic reprogramming, and affecting the secretion of immune-related factors. Finally, we discuss the current status of strategies that target RNA modifications to prevent tumor immune escape, highlighting their potential.
Affiliation(s)
- Yuxin Kong
- Department of Ophthalmology, Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai 200001, China
- Jie Yu
- Department of Ophthalmology, Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai 200001, China
- Shengfang Ge
- Department of Ophthalmology, Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai 200001, China
- Xianqun Fan
- Department of Ophthalmology, Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai 200001, China
39
Chen R, Li F, Guo X, Bi Y, Li C, Pan S, Coin LJM, Song J. ATTIC is an integrated approach for predicting A-to-I RNA editing sites in three species. Brief Bioinform 2023; 24:bbad170. [PMID: 37150785 PMCID: PMC10565902 DOI: 10.1093/bib/bbad170] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/31/2023] [Revised: 04/12/2023] [Accepted: 04/14/2023] [Indexed: 05/09/2023]
Abstract
A-to-I editing, the conversion of adenosine (A) to inosine (I) in double-stranded RNAs, is the most prevalent RNA editing event. Several studies have revealed that A-to-I editing can regulate cellular processes and is associated with various human diseases. Accurate identification of A-to-I editing sites is therefore crucial for understanding RNA-level (i.e. transcriptional) modifications and their potential roles in molecular functions. Various computational approaches for identifying A-to-I editing sites have been developed to date, but their performance remains unsatisfactory. In this study, we developed a novel stacked-ensemble learning model, ATTIC (A-To-I ediTing predICtor), to accurately identify A-to-I editing sites across three species: Homo sapiens, Mus musculus and Drosophila melanogaster. We first comprehensively evaluated 37 RNA sequence-derived features combined with 14 popular machine learning algorithms. We then selected the optimal base models to build a series of stacked ensemble models. The final ATTIC framework was built from these optimal models, further improved by a species-specific feature selection strategy. Extensive cross-validation and independent tests show that ATTIC outperforms state-of-the-art tools for predicting A-to-I editing sites. We also developed a web server for ATTIC, publicly available at http://web.unimelb-bioinfortools.cloud.edu.au/ATTIC/. We anticipate that ATTIC can serve as a useful tool to accelerate the identification of A-to-I RNA editing events and help characterize their roles in post-transcriptional regulation.
Affiliation(s)
- Ruyi Chen
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Fuyi Li
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Xudong Guo
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Yue Bi
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Shirui Pan
- School of Information and Communication Technology, Griffith University, QLD 4222, Australia
- Lachlan J M Coin
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, VIC 3800, Australia
40
Li F, Liu S, Li K, Zhang Y, Duan M, Yao Z, Zhu G, Guo Y, Wang Y, Huang L, Zhou F. EpiTEAmDNA: Sequence feature representation via transfer learning and ensemble learning for identifying multiple DNA epigenetic modification types across species. Comput Biol Med 2023; 160:107030. [PMID: 37196456 DOI: 10.1016/j.compbiomed.2023.107030] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/02/2023] [Revised: 04/21/2023] [Accepted: 05/10/2023] [Indexed: 05/19/2023]
Abstract
Methylation is a major DNA epigenetic modification that regulates biological processes without altering the DNA sequence, and multiple types of DNA methylation have been discovered, including 6mA, 5hmC, and 4mC. Multiple computational approaches have been developed to automatically identify DNA methylation residues using machine learning or deep learning algorithms. Machine learning (ML)-based methods are difficult to transfer to other DNA methylation site prediction tasks using additional knowledge. Deep learning (DL) methods may facilitate the transfer of knowledge from similar tasks, but they are often ineffective on small datasets. This study proposes EpiTEAmDNA, an integrated feature representation framework based on transfer learning and ensemble learning strategies, which is evaluated on multiple DNA methylation types across 15 species. EpiTEAmDNA integrates a convolutional neural network (CNN) with conventional machine learning methods and outperforms existing DL-based methods on small datasets when no additional knowledge is available. The experimental data suggest that EpiTEAmDNA models may be further improved via transfer learning based on additional knowledge. Evaluation experiments on independent test datasets also suggest that the proposed EpiTEAmDNA framework outperforms existing models in most prediction tasks for the 3 DNA methylation types across 15 species. The source code, pre-trained global model, and the EpiTEAmDNA feature representation framework are freely available at http://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Fei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Shuai Liu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Kewei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Yaqi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Meiyu Duan
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Zhaomin Yao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
- Gancheng Zhu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Yutong Guo
- College of Life Sciences, Jilin University, Changchun, Jilin, 130012, China
- Ying Wang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Lan Huang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
- Fengfeng Zhou
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
41
Acera Mateos P, Zhou Y, Zarnack K, Eyras E. Concepts and methods for transcriptome-wide prediction of chemical messenger RNA modifications with machine learning. Brief Bioinform 2023; 24:7150742. [PMID: 37139545 DOI: 10.1093/bib/bbad163] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2023] [Revised: 03/03/2023] [Indexed: 05/05/2023]
Abstract
The expanding field of epitranscriptomics might rival the epigenome in the diversity of biological processes it impacts. In recent years, the development of new high-throughput experimental and computational techniques has been a key driving force in discovering the properties of RNA modifications. Machine learning applications, such as classification, clustering or de novo identification, have been critical to these advances. Nonetheless, various challenges remain before the full potential of machine learning for epitranscriptomics can be leveraged. In this review, we provide a comprehensive survey of machine learning methods to detect RNA modifications using diverse input data sources. We describe strategies to train and test machine learning methods and to encode and interpret features that are relevant for epitranscriptomics. Finally, we identify some of the current challenges and open questions in RNA modification analysis, including the ambiguity of predicting RNA modifications in transcript isoforms or in single nucleotides, and the lack of complete ground truth sets for testing RNA modification predictions. We believe this review will inspire and benefit the rapidly developing field of epitranscriptomics in addressing current limitations through the effective use of machine learning.
Affiliation(s)
- Pablo Acera Mateos
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- You Zhou
- Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Kathi Zarnack
- Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Eduardo Eyras
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
42
Soylu NN, Sefer E. BERT2OME: Prediction of 2'-O-Methylation Modifications From RNA Sequence by Transformer Architecture Based on BERT. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2177-2189. [PMID: 37819796 DOI: 10.1109/tcbb.2023.3237769] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/13/2023]
Abstract
Recent work on language models has achieved state-of-the-art performance on various language tasks. Among these models, Bidirectional Encoder Representations from Transformers (BERT) focuses on contextualizing word embeddings to extract the context and semantics of words. Meanwhile, post-transcriptional 2'-O-methylation (Nm) RNA modification is important in various cellular tasks and is related to a number of diseases. Existing high-throughput experimental techniques for detecting these modifications are time-consuming and costly. To understand the associated biological processes more deeply and quickly, we propose Bert2Ome, an efficient method for inferring 2'-O-methylation RNA modification sites from RNA sequences. Bert2Ome combines a BERT-based model with convolutional neural networks (CNN) to infer the relationship between modification sites and RNA sequence content. Unlike previously proposed methods, Bert2Ome treats each RNA sequence as text and improves modification prediction performance by integrating the pretrained deep learning-based language model BERT. Additionally, our transformer-based approach can infer modification sites across multiple species. In 5-fold cross-validation, human and mouse accuracies were 99.15% and 94.35%, respectively, and ROC AUC scores were 0.99 and 0.94 for the same species. Detailed results show that Bert2Ome reduces the time consumed in biological experiments and outperforms existing approaches across different datasets and species on multiple metrics. Additionally, deep learning approaches such as 2D CNNs are more promising than conventional machine learning methods for learning BERT attributes.
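The abstract says Bert2Ome treats each RNA sequence as text but does not specify its tokenizer. A common way to do this for nucleotide language models is to split the sequence into overlapping k-mer tokens and wrap them in BERT-style special tokens, as DNABERT does. The Python below sketches that idea; the choice k = 3, the vocabulary layout, and all function names are assumptions for illustration, not Bert2Ome's documented preprocessing.

```python
from itertools import product

# Hypothetical special-token set, in the usual BERT order.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers (stride 1), e.g. ACGUA -> ACG, CGU, GUA."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int = 3) -> dict[str, int]:
    """Map special tokens, then all 4**k RNA k-mers, to consecutive integer ids."""
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    return {token: i for i, token in enumerate(SPECIAL_TOKENS + kmers)}


def encode(seq: str, vocab: dict[str, int], k: int = 3) -> list[int]:
    """Return [CLS] + k-mer ids + [SEP]; k-mers containing N etc. map to [UNK]."""
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(token, vocab["[UNK]"]) for token in kmer_tokens(seq, k)]
    ids.append(vocab["[SEP]"])
    return ids
```

Overlapping k-mers let each token carry local sequence context while keeping one token per position, which is why this scheme is popular for fixed-length windows centered on a candidate site.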
43
Wang R, Chung CR, Huang HD, Lee TY. Identification of species-specific RNA N6-methyladinosine modification sites from RNA sequences. Brief Bioinform 2023; 24:7008797. [PMID: 36715277 DOI: 10.1093/bib/bbac573] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/07/2022] [Revised: 11/11/2022] [Accepted: 11/24/2022] [Indexed: 01/31/2023]
Abstract
N6-methyladenosine (m6A) modification is the most abundant co-transcriptional modification in eukaryotic RNA and plays important roles in cellular regulation. Traditional high-throughput sequencing experiments used to explore functional mechanisms are time-consuming and labor-intensive, and most previously proposed methods focus on a limited range of species. To further understand the relevant biological mechanisms across species that share the same RNA modification, a computational scheme applicable to different species is needed. To this end, we proposed an attention-based deep learning method, adaptive-m6A, which combines a convolutional neural network, bi-directional long short-term memory and an attention mechanism to identify m6A sites in multiple species. Three conventional machine learning (ML) methods, namely support vector machine, random forest and logistic regression classifiers, were also evaluated. Beyond the multi-species performance of these ML methods, the best-performing adaptive-m6A model achieved an accuracy of 0.9832 and an area under the receiver operating characteristic curve of 0.98. Moreover, motif analysis and cross-validation among different species were conducted to test the robustness of a single model across multiple species, which improved our understanding of the sequence characteristics and biological functions of RNA modifications in different species.
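In a CNN-BiLSTM-attention model, the attention layer typically collapses the per-timestep BiLSTM outputs into one summary vector using softmax weights. The abstract does not give the exact attention form. The sketch below assumes simple dot-product attention pooling against a context vector that would normally be learned and is supplied here as a fixed input. `attention_pool` is a hypothetical name, and no training is shown.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: list[list[float]], query: list[float]):
    """Dot-product attention pooling over timesteps.

    hidden_states: T vectors of dimension d (e.g. BiLSTM outputs per position).
    query: context vector of dimension d (learned in a real model; an
           assumption here, passed in as a constant).
    Returns (context_vector, attention_weights).
    """
    # One relevance score per timestep: <h_t, query>.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    d = len(query)
    # Weighted sum of the hidden states, dimension by dimension.
    context = [sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(d)]
    return context, weights
```

The returned weights are what attention-based papers usually inspect to see which sequence positions the model emphasizes.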
Affiliation(s)
- Rulan Wang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- Chia-Ru Chung
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Life Sciences, University of Science and Technology of China, 230026, Hefei, Anhui, P.R. China
- Hsien-Da Huang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
44
Fan Y, Sun G, Pan X. ELMo4m6A: A Contextual Language Embedding-Based Predictor for Detecting RNA N6-Methyladenosine Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:944-954. [PMID: 35536814 DOI: 10.1109/tcbb.2022.3173323] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
N6-methyladenosine (m6A) is a universal post-transcriptional modification of RNAs and is widely involved in various biological processes. Accurately identifying m6A modification sites is indispensable for further investigating m6A-mediated biological functions. How RNA sequences are represented is crucial to building effective computational methods for detecting m6A modification sites. However, traditional encoding methods require complex biological prior knowledge and are time-consuming. Furthermore, most existing m6A site prediction methods are limited to a single species, and few can predict m6A sites across different species and tissues. A more efficient computational method for predicting m6A sites across multiple species and tissues is therefore needed. In this paper, we propose ELMo4m6A, a contextual language embedding-based method for predicting m6A sites from RNA sequences without any prior knowledge. ELMo4m6A first learns embeddings of RNA sequences using the language model ELMo, then uses a hybrid convolutional neural network (CNN) and long short-term memory (LSTM) network to identify m6A sites. The results of 5-fold cross-validation and independent testing demonstrate that ELMo4m6A is superior to state-of-the-art methods. Moreover, we applied integrated gradients to find potential sequence patterns contributing to m6A sites.
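Integrated gradients attributes a prediction to input features. It averages the model's gradient along a straight path from a baseline input to the actual input, then scales by the input difference. The sketch below does not use ELMo4m6A's CNN-LSTM. As an explicit simplification, it uses a toy logistic model whose gradient is analytic and approximates the path integral with a midpoint Riemann sum. All function names and weights are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy stand-in for a trained network: logistic regression f(x) = sigmoid(w.x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def grad(x, w, b):
    """Analytic input gradient of the logistic model: f(1 - f) * w."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """IG_i = (x_i - x'_i) * mean over alpha of dF/dx_i at x' + alpha (x - x')."""
    total = [0.0] * len(x)
    for k in range(1, steps + 1):
        alpha = (k - 0.5) / steps  # midpoint rule on [0, 1]
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        g = grad(point, w, b)
        total = [t + gi for t, gi in zip(total, g)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]
```

A quick sanity check is the completeness property: the attributions should sum (approximately) to `model(x) - model(baseline)`. Features equal to their baseline value receive zero attribution.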
45
Liu T, Zou B, He M, Hu Y, Dou Y, Cui T, Tan P, Li S, Rao S, Huang Y, Liu S, Cai K, Wang D. LncReader: identification of dual functional long noncoding RNAs using a multi-head self-attention mechanism. Brief Bioinform 2023; 24:6961607. [PMID: 36575567 DOI: 10.1093/bib/bbac579] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/15/2022] [Revised: 11/11/2022] [Accepted: 11/28/2022] [Indexed: 12/29/2022]
Abstract
Long noncoding ribonucleic acids (RNAs; LncRNAs) endowed with both protein-coding and noncoding functions are referred to as 'dual functional lncRNAs'. Recently, dual functional lncRNAs have been intensively studied and identified as involved in various fundamental cellular processes. However, apart from time-consuming and cell-type-specific experiments, there is virtually no in silico method for predicting the identity of dual functional lncRNAs. Here, we developed a deep-learning model with a multi-head self-attention mechanism, LncReader, to identify dual functional lncRNAs. Our data demonstrated that LncReader showed multiple advantages compared to various classical machine learning methods using benchmark datasets from our previously reported cncRNAdb project. Moreover, to obtain independent in-house datasets for robust testing, mass spectrometry proteomics combined with RNA-seq and Ribo-seq were applied in four leukaemia cell lines, which further confirmed that LncReader achieved the best performance compared to other tools. Therefore, LncReader provides an accurate and practical tool that enables fast dual functional lncRNA identification.
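Multi-head self-attention splits each token embedding into several heads. Each head computes scaled dot-product attention over all positions, and the head outputs are concatenated. The sketch below shows the standard mechanism in a deliberately simplified form and is not LncReader's configuration. As an assumption for brevity, it omits the learned query/key/value and output projections, so Q = K = V = the head's slice of the input. `multi_head_self_attention` is a hypothetical name.

```python
import math


def softmax(row: list[float]) -> list[float]:
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def multi_head_self_attention(X: list[list[float]], num_heads: int):
    """Scaled dot-product self-attention with num_heads heads.

    X: T token vectors of dimension d; d must be divisible by num_heads.
    Simplification: no learned projections (Q = K = V = slice of X).
    Returns (output of shape T x d, per-head T x T attention weights).
    """
    d = len(X[0])
    if d % num_heads != 0:
        raise ValueError("embedding dimension must be divisible by num_heads")
    dh = d // num_heads
    T = len(X)
    out = [[0.0] * d for _ in range(T)]
    all_weights = []
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        heads = [x[sl] for x in X]  # this head's slice of every token
        weights = []
        for i in range(T):
            # Similarity of token i to every token j, scaled by sqrt(head dim).
            scores = [sum(a * b for a, b in zip(heads[i], heads[j])) / math.sqrt(dh)
                      for j in range(T)]
            w = softmax(scores)
            weights.append(w)
            # Concatenate heads by writing into this head's column block.
            for c in range(dh):
                out[i][h * dh + c] = sum(w[j] * heads[j][c] for j in range(T))
        all_weights.append(weights)
    return out, all_weights
```

Because each head attends within its own subspace, different heads can pick up different position-to-position relationships. That is the usual motivation for using several heads instead of one wide head.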
Affiliation(s)
- Tianyuan Liu
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China; Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Bohao Zou
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Department of Statistics, University of California Davis, Davis, California, USA
- Manman He
- State Key Laboratory of Medical Molecular Biology, Key Laboratory of RNA Regulation and Hematopoiesis, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, CAMS and Peking Union Medical College, Beijing 100005, China
- Yongfei Hu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Dermatology Hospital, Southern Medical University, Guangzhou, 510091, China
- Yiying Dou
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Tianyu Cui
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Puwen Tan
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Shaobin Li
- Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Shuan Rao
- Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Yan Huang
- Cancer Research Institute, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Sixi Liu
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
- Kaican Cai
- Department of Thoracic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Dong Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Dermatology Hospital, Southern Medical University, Guangzhou, 510091, China; Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, 350122, China
46
Zou J, Liu H, Tan W, Chen YQ, Dong J, Bai SY, Wu ZX, Zeng Y. Dynamic regulation and key roles of ribonucleic acid methylation. Front Cell Neurosci 2022; 16:1058083. [PMID: 36601431 PMCID: PMC9806184 DOI: 10.3389/fncel.2022.1058083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022]
Abstract
Ribonucleic acid (RNA) methylation is the most abundant modification in biological systems, accounting for 60% of all RNA modifications, and affects multiple types of RNA (including mRNAs, tRNAs, rRNAs, microRNAs, and long non-coding RNAs). Dysregulation of RNA methylation causes many developmental diseases through various mechanisms mediated by N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), 5-hydroxymethylcytosine (hm5C), and pseudouridine (Ψ). Emerging RNA methylation tools can serve as diagnostic, preventive, and therapeutic markers. Here, we review the discoveries to date regarding the biological function and dynamic regulation of RNA methylation/modification, as well as the most widely used techniques for profiling the RNA epitranscriptome, to provide new ideas for research on growth and development.
Affiliation(s)
- Jia Zou
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Hui Liu
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Wei Tan
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Yi-qi Chen
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Jing Dong
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Shu-yuan Bai
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China
- Zhao-xia Wu
- Community Health Service Center, Wuchang Hospital, Wuhan, China
- Yan Zeng (corresponding author)
- Community Health Service Center, Geriatric Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China; Brain Science and Advanced Technology Institute, School of Medicine, Wuhan University of Science and Technology, Wuhan, China; School of Public Health, Wuhan University of Science and Technology, Wuhan, China
47
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. RESEARCH (WASHINGTON, D.C.) 2022; 2022:0011. [PMID: 39285948 PMCID: PMC11404319 DOI: 10.34133/research.0011] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 07/25/2022] [Accepted: 10/25/2022] [Indexed: 09/19/2024]
Abstract
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the use of machine learning to build predictive models that mine biological sequence information. Biological sequence classification research has many branches. In this review, we focus mainly on machine-learning-based classification of the function and modification of biological sequences. Sequence-based prediction and analysis are basic tasks for understanding the biological functions of DNA, RNA, proteins, and peptides. However, hundreds of classification models have been developed for biological sequences, and the wide variety of specific methods can seem bewildering at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html) that provides readers with detailed information on classification methods and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, we include a brief introduction to single-cell sequencing data analysis methods and their applications in biology. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
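In most sequence-classification frameworks, the first step is turning the sequence into numbers. One-hot encoding is one common option among the many the review surveys. The sketch below illustrates it; choosing it here is an assumption for illustration. `one_hot` is a hypothetical helper, and mapping ambiguous symbols (such as N) to an all-zero row is an assumed convention.

```python
def one_hot(seq: str, alphabet: str = "ACGU") -> list[list[int]]:
    """Encode a sequence as an L x |alphabet| binary matrix.

    Each row has a single 1 at the column of its base; symbols outside the
    alphabet (e.g. ambiguous 'N') become all-zero rows (assumed convention).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    matrix = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix
```

The resulting matrix can be fed directly to a 1D CNN, with the alphabet size as the number of input channels. For DNA, pass `alphabet="ACGT"`.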
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
48
Luo Z, Lou L, Qiu W, Xu Z, Xiao X. Predicting N6-Methyladenosine Sites in Multiple Tissues of Mammals through Ensemble Deep Learning. Int J Mol Sci 2022; 23:15490. [PMID: 36555143 PMCID: PMC9778682 DOI: 10.3390/ijms232415490] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/03/2022] [Revised: 12/03/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022]
Abstract
N6-methyladenosine (m6A) is the most abundant modification in eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. However, detecting mRNA m6A transcriptome-wide at base resolution remains an outstanding challenge for experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, existing computational models usually predict m6A sites in only a single species without considering tissue-level differences, and most are built on low-confidence data generated by m6A antibody immunoprecipitation (IP)-based sequencing, which restricts the reliability and generalizability of the proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach, im6APred, which uses ensemble deep learning to accurately identify m6A sites based on high-confidence data in multiple mammalian tissues. im6APred builds on a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms, three deep learning methods, and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation to build an effective stacked model. im6APred achieves an area under the receiver operating characteristic curve (AUROC) of 0.82-0.91 on independent tests, indicating that the model can learn general methylation rules on RNA bases and generalize to transcriptome-wide m6A identification. Moreover, AUROCs of 0.77-0.96 were achieved in cross-species/tissue validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for tissue-specific models for m6A site prediction.
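The AUROC values reported above have a simple probabilistic reading. AUROC is the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counting one half (the Mann-Whitney U statistic divided by the number of positive-negative pairs). The sketch below computes it with a direct pairwise loop for clarity rather than speed. `auroc` is a hypothetical name, not part of im6APred.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Rank-based AUROC: P(score_pos > score_neg), ties counted as 1/2.

    O(P * N) pairwise loop; fine for illustration, slow for large test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. This is why cross-tissue values as low as 0.77 signal a noticeable drop relative to within-tissue performance.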
Affiliation(s)
- Zhaochun Xu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
49
Zhou J, Wang X, Wei Z, Meng J, Huang D. 4acCPred: Weakly supervised prediction of N4-acetyldeoxycytosine DNA modification from sequences. MOLECULAR THERAPY - NUCLEIC ACIDS 2022; 30:337-345. [DOI: 10.1016/j.omtn.2022.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 10/12/2022] [Indexed: 11/06/2022]
50
RNADSN: Transfer-Learning 5-Methyluridine (m5U) Modification on mRNAs from Common Features of tRNA. Int J Mol Sci 2022; 23:ijms232113493. [PMID: 36362279 PMCID: PMC9655583 DOI: 10.3390/ijms232113493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 09/24/2022] [Accepted: 09/29/2022] [Indexed: 11/06/2022]
Abstract
One of the most abundant non-canonical bases widely occurring on various RNA molecules is 5-methyluridine (m5U). Recent studies have revealed its influence on the development of breast cancer and systemic lupus erythematosus and on the regulation of stress responses. The accurate identification of m5U sites is crucial for understanding their biological functions. We propose RNADSN, the first transfer learning deep neural network that learns common features between tRNA m5U and mRNA m5U to enhance the prediction of mRNA m5U. Without seeing any experimentally detected mRNA m5U sites, RNADSN already outperformed the state-of-the-art method, m5UPred. Using mRNA m5U classification as an additional layer of supervision, our model achieved a further distinct improvement, with an average area under the receiver operating characteristic curve (AUC) of 0.9422 and an average precision (AP) of 0.7855. The robust performance of RNADSN was also verified by cross-technical and cross-cellular validation. Interpretation of RNADSN also revealed the sequence motif of the common features. Therefore, RNADSN should be a useful tool for studying m5U modification.
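Alongside AUC, RNADSN is evaluated by average precision (AP). AP summarizes the precision-recall curve and is more sensitive than AUC to performance on the rare positive class. The sketch below computes the non-interpolated form: rank by score, then average the precision at each rank where a positive appears. `average_precision` is a hypothetical name. Breaking tied scores by input order is an assumption, and with untied scores this matches the usual step-wise AP definition.

```python
def average_precision(labels: list[int], scores: list[float]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k of positives.

    Tied scores keep their input order (Python's sort is stable, including
    with reverse=True); this tie handling is an assumption.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this positive's rank
    return total / n_pos
```

For example, scoring four candidates as positive, negative, positive, negative from highest to lowest gives (1/1 + 2/3) / 2, about 0.833.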