1
Zhang C, Zhu X, Peterson N, Wang J, Wan S. A Comprehensive Review on RNA Subcellular Localization Prediction. arXiv 2025:arXiv:2504.17162v1. [PMID: 40313658 PMCID: PMC12045386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2025]
Abstract
The subcellular localization of RNAs, including long non-coding RNAs (lncRNAs), messenger RNAs (mRNAs), microRNAs (miRNAs) and other smaller RNAs, plays a critical role in determining their biological functions. For instance, lncRNAs are predominantly associated with chromatin and act as regulators of gene transcription and chromatin structure, while mRNAs are distributed across the nucleus and cytoplasm, facilitating the transport of genetic information for protein synthesis. Understanding RNA localization sheds light on processes like gene expression regulation with spatial and temporal precision. However, traditional wet lab methods for determining RNA localization, such as in situ hybridization, are often time-consuming, resource-demanding, and costly. To overcome these challenges, computational methods leveraging artificial intelligence (AI) and machine learning (ML) have emerged as powerful alternatives, enabling large-scale prediction of RNA subcellular localization. This paper provides a comprehensive review of the latest advancements in AI-based approaches for RNA subcellular localization prediction, covering various RNA types and focusing on sequence-based, image-based, and hybrid methodologies that combine both data types. We highlight the potential of these methods to accelerate RNA research, uncover molecular pathways, and guide targeted disease treatments. Furthermore, we critically discuss the challenges in AI/ML approaches for RNA subcellular localization, such as data scarcity and lack of benchmarks, and opportunities to address them. This review aims to serve as a valuable resource for researchers seeking to develop innovative solutions in the field of RNA subcellular localization and beyond.
Affiliation(s)
- Cece Zhang
  - Department of Cell & Systems Biology, University of Toronto, ON, Canada
- Xuehuan Zhu
  - School of Engineering, University of California, Los Angeles, CA, United States
- Nick Peterson
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
- Jieqiong Wang
  - Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE, United States
- Shibiao Wan
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
2
Choudhury S, Bajiya N, Patiyal S, Raghava GPS. MRSLpred-a hybrid approach for predicting multi-label subcellular localization of mRNA at the genome scale. Frontiers in Bioinformatics 2024; 4:1341479. [PMID: 38379813 PMCID: PMC10877048 DOI: 10.3389/fbinf.2024.1341479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 01/15/2024] [Indexed: 02/22/2024]
Abstract
In the past, several methods have been developed for predicting the single-label subcellular localization of messenger RNA (mRNA). However, only a few methods are designed to predict the multi-label subcellular localization of mRNA. Furthermore, the existing methods are slow and cannot be implemented at a transcriptome scale. In this study, a fast and reliable method has been developed for predicting the multi-label subcellular localization of mRNA that can be implemented at a genome scale. Machine learning-based methods have been developed using mRNA sequence composition, where the XGBoost-based classifier achieved an average area under the receiver operating characteristic curve (AUROC) of 0.709 (0.668-0.732). In addition to alignment-free methods, we developed alignment-based methods using motif search techniques. Finally, a hybrid technique that combines the XGBoost model and the motif-based approach has been developed, achieving an average AUROC of 0.742 (0.708-0.816). Our method, MRSLpred, outperforms the existing state-of-the-art classifier in terms of performance and computational efficiency. A publicly accessible webserver and a standalone tool have been developed to support researchers (webserver: https://webs.iiitd.edu.in/raghava/mrslpred/).
Affiliation(s)
- Gajendra P. S. Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
3
Zeng M, Wu Y, Li Y, Yin R, Lu C, Duan J, Li M. LncLocFormer: a Transformer-based deep learning model for multi-label lncRNA subcellular localization prediction by using localization-specific attention mechanism. Bioinformatics 2023; 39:btad752. [PMID: 38109668 PMCID: PMC10749772 DOI: 10.1093/bioinformatics/btad752] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/26/2023] [Revised: 11/13/2023] [Accepted: 12/17/2023] [Indexed: 12/20/2023]
Abstract
MOTIVATION: There is mounting evidence that the subcellular localization of lncRNAs can provide valuable insights into their biological functions. In the real world of transcriptomes, lncRNAs are usually localized in multiple subcellular localizations. Furthermore, lncRNAs have specific localization patterns for different subcellular localizations. Although several computational methods have been developed to predict the subcellular localization of lncRNAs, few of them are designed for lncRNAs that have multiple subcellular localizations, and none of them take motif specificity into consideration.
RESULTS: In this study, we proposed a novel deep learning model, called LncLocFormer, which uses only lncRNA sequences to predict multi-label lncRNA subcellular localization. LncLocFormer utilizes eight Transformer blocks to model long-range dependencies within the lncRNA sequence and shares information across the lncRNA sequence. To exploit the relationship between different subcellular localizations and find distinct localization patterns for different subcellular localizations, LncLocFormer employs a localization-specific attention mechanism. The results demonstrate that LncLocFormer outperforms existing state-of-the-art predictors on the hold-out test set. Furthermore, we conducted a motif analysis and found LncLocFormer can capture known motifs. Ablation studies confirmed the contribution of the localization-specific attention mechanism in improving the prediction performance.
AVAILABILITY AND IMPLEMENTATION: The LncLocFormer web server is available at http://csuligroup.com:9000/LncLocFormer. The source code can be obtained from https://github.com/CSUBioGroup/LncLocFormer.
Affiliation(s)
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yifan Wu
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yiming Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Rui Yin
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32603, United States
- Chengqian Lu
  - School of Computer Science, Key Laboratory of Intelligent Computing and Information Processing, Xiangtan University, Xiangtan, Hunan 411105, China
- Junwen Duan
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
4
Wang J, Horlacher M, Cheng L, Winther O. RNA trafficking and subcellular localization-a review of mechanisms, experimental and predictive methodologies. Brief Bioinform 2023; 24:bbad249. [PMID: 37466130 PMCID: PMC10516376 DOI: 10.1093/bib/bbad249] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/17/2023] [Revised: 05/30/2023] [Accepted: 06/16/2023] [Indexed: 07/20/2023]
Abstract
RNA localization is essential for regulating spatial translation, where RNAs are trafficked to their target locations via various biological mechanisms. In this review, we discuss RNA localization in the context of molecular mechanisms, experimental techniques and machine learning-based prediction tools. Three main types of molecular mechanisms that control the localization of RNA to distinct cellular compartments are reviewed, including directed transport, protection from mRNA degradation, as well as diffusion and local entrapment. Advances in experimental methods, both image and sequence based, provide substantial data resources, which allow for the design of powerful machine learning models to predict RNA localizations. We review the publicly available predictive tools to serve as a guide for users and inspire developers to build more effective prediction models. Finally, we provide an overview of multimodal learning, which may provide a new avenue for the prediction of RNA localization.
Affiliation(s)
- Jun Wang
  - Bioinformatics Centre, Department of Biology, University of Copenhagen, København Ø 2100, Denmark
- Marc Horlacher
  - Computational Health Center, Helmholtz Center, Munich, Germany
- Lixin Cheng
  - Shenzhen People’s Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
- Ole Winther
  - Bioinformatics Centre, Department of Biology, University of Copenhagen, København Ø 2100, Denmark
  - Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen 2100, Denmark
  - Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby 2800, Denmark
5
Androvic P, Schifferer M, Perez Anderson K, Cantuti-Castelvetri L, Jiang H, Ji H, Liu L, Gouna G, Berghoff SA, Besson-Girard S, Knoferle J, Simons M, Gokce O. Spatial Transcriptomics-correlated Electron Microscopy maps transcriptional and ultrastructural responses to brain injury. Nat Commun 2023; 14:4115. [PMID: 37433806 DOI: 10.1038/s41467-023-39447-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 06/05/2022] [Accepted: 06/14/2023] [Indexed: 07/13/2023]
Abstract
Understanding the complexity of cellular function within a tissue necessitates the combination of multiple phenotypic readouts. Here, we developed a method that links spatially-resolved gene expression of single cells with their ultrastructural morphology by integrating multiplexed error-robust fluorescence in situ hybridization (MERFISH) and large area volume electron microscopy (EM) on adjacent tissue sections. Using this method, we characterized in situ ultrastructural and transcriptional responses of glial cells and infiltrating T-cells after demyelinating brain injury in male mice. We identified a population of lipid-loaded "foamy" microglia located in the center of remyelinating lesion, as well as rare interferon-responsive microglia, oligodendrocytes, and astrocytes that co-localized with T-cells. We validated our findings using immunocytochemistry and lipid staining-coupled single-cell RNA sequencing. Finally, by integrating these datasets, we detected correlations between full-transcriptome gene expression and ultrastructural features of microglia. Our results offer an integrative view of the spatial, ultrastructural, and transcriptional reorganization of single cells after demyelinating brain injury.
Affiliation(s)
- Peter Androvic
  - Institute for Stroke and Dementia Research, University Hospital of Munich, LMU Munich, Munich, Germany
- Martina Schifferer
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Munich Cluster of Systems Neurology (SyNergy), Munich, Germany
- Katrin Perez Anderson
  - Institute for Stroke and Dementia Research, University Hospital of Munich, LMU Munich, Munich, Germany
- Ludovico Cantuti-Castelvetri
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Institute of Neuronal Cell Biology, Technical University Munich, Munich, Germany
- Hanyi Jiang
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Munich Cluster of Systems Neurology (SyNergy), Munich, Germany
- Hao Ji
  - Institute for Stroke and Dementia Research, University Hospital of Munich, LMU Munich, Munich, Germany
- Lu Liu
  - Institute for Stroke and Dementia Research, University Hospital of Munich, LMU Munich, Munich, Germany
- Garyfallia Gouna
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Institute of Neuronal Cell Biology, Technical University Munich, Munich, Germany
- Stefan A Berghoff
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Institute of Neuronal Cell Biology, Technical University Munich, Munich, Germany
- Simon Besson-Girard
  - Institute for Stroke and Dementia Research, University Hospital of Munich, LMU Munich, Munich, Germany
- Johanna Knoferle
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Institute of Neuronal Cell Biology, Technical University Munich, Munich, Germany
  - Department of Neurodegenerative Diseases and Geriatric Psychiatry, University Hospital Bonn, Bonn, Germany
- Mikael Simons
  - Institute for Stroke and Dementia Research, University Hospital of Munich, LMU Munich, Munich, Germany
  - German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
  - Munich Cluster of Systems Neurology (SyNergy), Munich, Germany
  - Institute of Neuronal Cell Biology, Technical University Munich, Munich, Germany
- Ozgun Gokce
  - Institute for Stroke and Dementia Research, University Hospital of Munich, LMU Munich, Munich, Germany
  - Munich Cluster of Systems Neurology (SyNergy), Munich, Germany
  - Department of Neurodegenerative Diseases and Geriatric Psychiatry, University Hospital Bonn, Bonn, Germany
6
Zhang C, Zhang R, Liang C, Deng Y, Li Z, Deng Y, Tang BZ. Charge-elimination strategy for constructing RNA-selective fluorescent probe undisturbed by mitochondria. Biomaterials 2022; 291:121915. [DOI: 10.1016/j.biomaterials.2022.121915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 11/04/2022] [Accepted: 11/13/2022] [Indexed: 11/18/2022]
7
Asim MN, Ibrahim MA, Malik MI, Zehe C, Cloarec O, Trygg J, Dengel A, Ahmed S. EL-RMLocNet: An explainable LSTM network for RNA-associated multi-compartment localization prediction. Comput Struct Biotechnol J 2022; 20:3986-4002. [PMID: 35983235 PMCID: PMC9356161 DOI: 10.1016/j.csbj.2022.07.031] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/02/2022] [Revised: 07/16/2022] [Accepted: 07/16/2022] [Indexed: 11/23/2022]
Abstract
Subcellular localization of ribonucleic acid (RNA) molecules provides significant insights into the functionality of RNAs and helps to explore their association with various diseases. The predominantly developed single-compartment localization predictors (SCLPs) fail to capture RNA associations with diverse biochemical and pathological processes, which mainly arise through RNA co-localization in multiple compartments. The few available multi-compartment localization predictors (MCLPs) achieve decent performance only for a target RNA class of a particular sub-type. Further, existing computational approaches have limited practical significance and limited potential to optimize therapeutics because of poor model explainability. This paper presents an explainable Long Short-Term Memory (LSTM) network, "EL-RMLocNet", whose predictive performance and interpretability are optimized using a novel GeneticSeq2Vec statistical representation learning scheme and an attention mechanism, for accurate multi-compartment localization prediction of different RNAs using only raw RNA sequences. GeneticSeq2Vec generates optimized statistical vectors of raw RNA sequences by capturing short- and long-range relations of nucleotide k-mers. From the sequence vectors generated by GeneticSeq2Vec, the LSTM layers extract the most informative features, which an attention layer then weights according to their discriminative potential for multi-compartment localization prediction. Through reverse engineering, weights of the statistical feature space are mapped to nucleotide k-mer patterns to make multi-compartment localization decisions transparent and explainable for different RNA classes and species. Empirical evaluation indicates that EL-RMLocNet outperforms the state-of-the-art predictor for subcellular localization prediction of 4 different RNA classes by an average accuracy margin of 8% for Homo sapiens and 6% for Mus musculus. EL-RMLocNet is freely available as a web server at https://sds_genetic_analysis.opendfki.de/subcellular_loc/.
Affiliation(s)
- Muhammad Nabeel Asim
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Muhammad Ali Ibrahim
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Muhammad Imran Malik
  - School of Computer Science & Electrical Engineering, National University of Sciences and Technology, 44000, Islamabad, Pakistan
- Christoph Zehe
  - Sartorius Corporate Research, Sartorius Stedim Cellca GmbH, 89081 Ulm, Germany
- Olivier Cloarec
  - Sartorius Corporate Research, Sartorius Stedim Cellca GmbH, 89081 Ulm, Germany
- Johan Trygg
  - Computational Life Science Cluster (CLiC), Umeå University, 90187 Umea, Sweden
  - Sartorius Corporate Research, Sartorius Stedim Data Analytics, 90333 Umea, Sweden
- Andreas Dengel
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
8
Nakai K, Wei L. Recent Advances in the Prediction of Subcellular Localization of Proteins and Related Topics. Frontiers in Bioinformatics 2022; 2:910531. [PMID: 36304291 PMCID: PMC9580943 DOI: 10.3389/fbinf.2022.910531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022]
Abstract
Prediction of subcellular localization of proteins from their amino acid sequences has a long history in bioinformatics and is still actively developing, incorporating the latest advances in machine learning and proteomics. Notably, deep learning-based methods for natural language processing have made great contributions. Here, we review recent advances in the field as well as its related fields, such as subcellular proteomics and the prediction/recognition of subcellular localization from image data.
Affiliation(s)
- Kenta Nakai
  - Institute of Medical Science, The University of Tokyo, Minato-Ku, Japan
- Leyi Wei
  - School of Software, Shandong University, Jinan, China