1
Deng X, Liu L. BiGM-lncLoc: Bi-level Multi-Graph Meta-Learning for Predicting Cell-Specific Long Noncoding RNAs Subcellular Localization. Interdiscip Sci 2025; 17:359-374. PMID: 39724386; DOI: 10.1007/s12539-024-00679-y. Received: 01/06/2024; Revised: 11/11/2024; Accepted: 11/18/2024; Indexed: 12/28/2024.
Abstract
The precise spatiotemporal expression of long noncoding RNAs (lncRNAs) plays a pivotal role in biological regulation, and aberrant expression of lncRNAs in different subcellular localizations has been intricately linked to the onset and progression of a variety of cancers. Computational methods provide effective means for predicting lncRNA subcellular localization, but current studies ignore either cell line and tissue specificity or the correlation and shared information among cell lines. In this study, we propose a novel approach, BiGM-lncLoc, which treats the prediction of lncRNA subcellular localization across cell lines as a multi-graph meta-learning task. Our investigation involves two categories of data: the localization data of nucleotide sequences in different cell lines and cell line expression data. BiGM-lncLoc comprises a cell line-specific optimization network that learns specific knowledge from cell line expression data and a graph neural network optimized across cell lines. The specific and shared knowledge acquired through this bi-level optimization is then applied to a new cell-line prediction task without the need for re-training or fine-tuning. Additionally, a key-feature analysis of how different nucleotide combinations affect the model, combined with correlation analysis, confirms the necessity of cell line-specific studies. Finally, experiments conducted on various cell lines with different data sizes indicate that BiGM-lncLoc outperforms other methods in terms of prediction accuracy, with an average accuracy of 97.7%. After removing overlapping samples to ensure data independence for each cell line, the accuracy ranged from 82.4% to 94.7%, still surpassing existing models. Our code can be found at https://github.com/BioCL1/BiGM-lncLoc.
Affiliation(s)
- Xi Deng
  - School of Information, Yunnan Normal University, Kunming, 650500, China
- Lin Liu
  - School of Information, Yunnan Normal University, Kunming, 650500, China
  - Department of Education of Yunnan Province, Engineering Research Center of Computer Vision and Intelligent Control Technology, Kunming, 650500, China
2
Zhang L, Gao S, Yuan Q, Fu Y, Yang R. An ensemble learning method combined with multiple feature representation strategies to predict lncRNA subcellular localizations. Comput Biol Chem 2025; 115:108336. PMID: 39752849; DOI: 10.1016/j.compbiolchem.2024.108336. Received: 09/01/2024; Revised: 10/26/2024; Accepted: 12/25/2024; Indexed: 02/26/2025.
Abstract
Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and implicated in numerous diseases. By exploring the subcellular localizations of lncRNAs, we can not only gain crucial insights into the molecular mechanisms of lncRNA-related biological processes but also make valuable contributions towards the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive, so computational methods are in growing demand. The focus of this paper is the development of an innovative ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. Because features from a single source cannot fully reflect the inherent correlation with the prediction target, heterogeneous multi-source features are used, introducing information on sequence composition, physicochemical properties, and structure. To address class imbalance in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor, termed lncSLPre, is developed by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. Additionally, the results demonstrate that SMOTE effectively mitigates the class imbalance, while the feature selection approach is critical for eliminating extraneous and redundant features. Compared with existing advanced methods, lncSLPre achieves better performance, with overall accuracy improvements of 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.
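The SMOTE step named in this abstract can be illustrated with a small, dependency-free sketch of SMOTE-style interpolation. It is a generic rendering of the technique, not lncSLPre's code: the function name `smote`, the neighbour count `k`, the random choice of base samples, and the toy two-dimensional feature vectors are all assumptions for illustration. In practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used instead.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance; sufficient for ranking neighbours.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority-class samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    (Hypothetical helper for illustration only.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote(minority, n_new=4, k=2):
        print(point)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class; in the toy example they all fall inside the triangle spanned by the three input vectors.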
Affiliation(s)
- Lina Zhang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Sizan Gao
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Qinghao Yuan
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Yao Fu
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
- Runtao Yang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China
3
Zeng M, Zhang X, Li Y, Lu C, Yin R, Guo F, Li M. RNALoc-LM: RNA subcellular localization prediction using pre-trained RNA language model. Bioinformatics 2025; 41:btaf127. PMID: 40119908; PMCID: PMC11978386; DOI: 10.1093/bioinformatics/btaf127. Received: 01/20/2025; Revised: 02/28/2025; Accepted: 03/19/2025; Indexed: 03/25/2025.
Abstract
MOTIVATION Accurately predicting RNA subcellular localization is crucial for understanding the cellular functions and regulatory mechanisms of RNAs. Although many computational methods have been developed to predict the subcellular localization of lncRNAs, miRNAs, and circRNAs, very few of them are designed to simultaneously predict the subcellular localization of multiple types of RNAs. In addition, pre-trained RNA language models have shown remarkable performance in various bioinformatics tasks, such as structure prediction and functional annotation. Despite these advancements, there remains a significant gap in applying pre-trained RNA language models specifically to RNA subcellular localization prediction. RESULTS In this study, we propose RNALoc-LM, the first interpretable deep-learning framework that leverages a pre-trained RNA language model for predicting RNA subcellular localization. RNALoc-LM uses a pre-trained RNA language model to encode RNA sequences, then captures local patterns and long-range dependencies through TextCNN and BiLSTM modules. A multi-head attention mechanism is used to focus on important regions within the RNA sequences. The results demonstrate that RNALoc-LM significantly outperforms both deep-learning baselines and existing state-of-the-art predictors. Additionally, motif analysis highlights RNALoc-LM's potential for discovering important motifs, while an ablation study confirms the effectiveness of the RNA sequence embeddings generated by the pre-trained RNA language model. AVAILABILITY AND IMPLEMENTATION The RNALoc-LM web server is available at http://csuligroup.com:8000/RNALoc-LM. The source code can be obtained from https://github.com/CSUBioGroup/RNALoc-LM.
Affiliation(s)
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Xinyu Zhang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yiming Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Chengqian Lu
  - School of Computer Science, Key Laboratory of Intelligent Computing and Information Processing, Xiangtan University, Xiangtan, Hunan 411105, China
- Rui Yin
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32603, United States
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
4
Wang S, Yu ZG, Han GS, Sun XG. CFPLncLoc: A multi-label lncRNA subcellular localization prediction based on Chaos game representation and centralized feature pyramid. Int J Biol Macromol 2025; 297:139519. PMID: 39761904; DOI: 10.1016/j.ijbiomac.2025.139519. Received: 11/13/2024; Revised: 01/01/2025; Accepted: 01/03/2025; Indexed: 01/20/2025.
Abstract
There is increasing evidence that the subcellular localization of long noncoding RNAs (lncRNAs) can provide valuable insights into their biological functions. At the transcriptome level, lncRNAs are usually found in multiple subcellular localizations. Although several computational methods have been developed to predict the subcellular localization of lncRNAs, few of them are designed for lncRNAs that have multiple subcellular localizations. In this study, we propose a novel deep learning model, called CFPLncLoc, which uses chaos game representation (CGR) images of lncRNA sequences to predict multi-label lncRNA subcellular localization. CFPLncLoc uses an image update strategy (IUS) to enhance the relative feature representation of the CGR images. To extract higher-level features from CGR images, CFPLncLoc introduces a multi-scale feature fusion (MFF) model from the field of computer vision (CV), the centralized feature pyramid (CFP). Ablation studies confirmed the contribution of the IUS and CFP to the prediction performance. Statistical tests verify that CFPLncLoc outperforms existing state-of-the-art predictors under the evaluation metric MaAUC on the hold-out/independent test set. The source code can be obtained from https://github.com/ShengWang-XTU/CFPLncLoc.
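The chaos game representation that CFPLncLoc takes as input can be sketched in a few lines: each nucleotide moves the current point halfway toward its assigned corner of the unit square, and binning the points on a 2^k by 2^k grid gives a frequency CGR image whose cells correspond to k-mers. This is a generic CGR sketch, not the authors' preprocessing; the corner assignment, the skipping of ambiguous symbols, and the function names `cgr_points` and `fcgr_image` are assumptions, and the sketch does not cover the image update strategy (IUS) or the CFP network.

```python
from typing import Dict, List, Tuple

# Unit-square corner per nucleotide. This is one common convention;
# other papers permute the corners. T is mapped like U.
CORNERS: Dict[str, Tuple[int, int]] = {
    "A": (0, 0),
    "C": (0, 1),
    "G": (1, 1),
    "U": (1, 0),
    "T": (1, 0),
}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Chaos game representation: one point in the unit square per nucleotide."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # this sketch simply skips ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the nucleotide's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr_image(seq: str, k: int) -> List[List[int]]:
    """Frequency CGR: count CGR points on a 2**k x 2**k grid.

    Cell indices are tracked as integers: if a point lies in cell c at
    resolution 2**k, the point halfway towards corner bit b lies in cell
    (b << (k - 1)) + (c >> 1). This avoids floating-point rounding on long
    sequences. Only points reached after at least k nucleotides are counted,
    so each counted cell corresponds to exactly one k-mer.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    col = row = size // 2  # cell containing the start point (0.5, 0.5)
    seen = 0
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        col = (cx << (k - 1)) + (col >> 1)
        row = (cy << (k - 1)) + (row >> 1)
        seen += 1
        if seen >= k:
            grid[row][col] += 1
    return grid


if __name__ == "__main__":
    for line in fcgr_image("ACGUUGCAGG", 2):
        print(line)
```

Tracking integer cell indices instead of binning the floating-point coordinates gives the same image for short sequences while staying exact for transcripts thousands of nucleotides long; a model would typically normalise these counts before treating the grid as an image.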
Affiliation(s)
- Sheng Wang
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
- Zu-Guo Yu
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
- Guo-Sheng Han
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
- Xin-Gen Sun
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
5
Hu W, Yue Y, Yan R, Guan L, Li M. An ensemble deep learning framework for multi-class LncRNA subcellular localization with innovative encoding strategy. BMC Biol 2025; 23:47. PMID: 39984880; PMCID: PMC11846348; DOI: 10.1186/s12915-025-02148-4. Received: 05/08/2024; Accepted: 02/03/2025; Indexed: 02/23/2025.
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) play pivotal roles in various cellular processes, and elucidating their subcellular localization can offer crucial insights into their functional significance. Accurate prediction of lncRNA subcellular localization is therefore of paramount importance. Despite the numerous computational methods developed for this purpose, existing approaches still face challenges stemming from the complexity of data representation and the difficulty of capturing nucleotide distribution information within sequences. RESULTS In this study, we propose a novel deep learning-based model, termed MGBLncLoc, which incorporates a unique multi-class encoding technique known as generalized encoding based on the Distribution Density of Multi-Class Nucleotide Groups (MCD-ND). This encoding reflects nucleotide distributions more precisely, distinguishing between constant and discriminative regions within sequences and thereby enhancing prediction performance. Additionally, our deep learning model integrates advanced neural network modules, including Multi-Dconv Head Transposed Attention, Gated-Dconv Feed-forward Network, Convolutional Neural Network, and Bidirectional Gated Recurrent Unit, to comprehensively exploit sequence features of lncRNA. CONCLUSIONS Comparative analysis against commonly used sequence feature encoding methods and existing prediction models validates the effectiveness of MGBLncLoc, demonstrating superior performance. This research offers novel insights and effective solutions for predicting lncRNA subcellular localization, thereby providing valuable support for related biological investigations.
Affiliation(s)
- Wenxing Hu
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Yan Yue
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Ruomei Yan
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Lixin Guan
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Mengshan Li
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
6
Ma D, Wu Z, Zhang M, Mao J, Xu W, Jiang L, Wang Z. Glutathiones' life in multi-cancers: especially their potential micropetides in liver hepatocellular carcinoma. Discov Oncol 2025; 16:201. PMID: 39966283; PMCID: PMC11836257; DOI: 10.1007/s12672-025-01945-1. Received: 12/11/2024; Accepted: 02/05/2025; Indexed: 02/20/2025.
Abstract
BACKGROUND Glutathione, the most abundant intracellular low-molecular-weight thiol in tissues, plays critical roles in detoxifying xenobiotics, cell signaling, cell death and antioxidant defence, according to an emerging body of evidence. However, the expression of glutathione metabolism pertinent genes (GMPGs) and their diagnostic, prognostic and micropeptide potential have not been investigated across cancers. METHODS We obtained GMPGs from MsigDB 7.2 and, for the first time, used 12,123 samples from the TCGA, GTEx, and GEO datasets to identify differentially expressed genes (DEGs) and perform survival analysis in 32 types of cancers. All statistical analyses, including DEG, prognostic, diagnostic, ceRNA, micropeptide prediction and immune infiltration analyses, were performed in R. In addition, we used siRNA to knock down the expression of the G6PD gene in Huh7 hepatocellular carcinoma cells. RESULTS G6PD showed significantly high expression and was associated with poor prognosis in liver hepatocellular carcinoma (LIHC), and a micropeptide predicted to be encoded by RBM26-AS1 might target G6PD in LIHC. In vitro experiments showed that G6PD knockdown in Huh7 cells reduced their proliferation, migration, and invasion capabilities. CONCLUSIONS We confirmed that G6PD plays a crucial role in the occurrence and progression of LIHC. G6PD is positively associated with Th2 cells in LIHC, regulating immune responses. We consider that the RBM26-AS1-encoded micropeptide might be a new player in LIHC through its interaction with G6PD and might perform a key function in liver cancer.
Affiliation(s)
- Didi Ma
  - Anhui Province Key Laboratory of Non-Coding RNA Basic and Clinical Transformation (Wannan Medical College), Central Laboratory, Yijishan Hospital of Wannan Medical College, Wuhu, 241000, China
- Zhenguo Wu
  - Anhui Province Key Laboratory of Non-Coding RNA Basic and Clinical Transformation (Wannan Medical College), Central Laboratory, Yijishan Hospital of Wannan Medical College, Wuhu, 241000, China
- Mengying Zhang
  - Anhui Province Key Laboratory of Non-Coding RNA Basic and Clinical Transformation (Wannan Medical College), Central Laboratory, Yijishan Hospital of Wannan Medical College, Wuhu, 241000, China
- Jian Mao
  - Yangtze River Delta Information Intelligence Innovation Research Institute, Wuhu, 241000, China
- Wenqin Xu
  - Anhui Province Key Laboratory of Non-Coding RNA Basic and Clinical Transformation (Wannan Medical College), Central Laboratory, Yijishan Hospital of Wannan Medical College, Wuhu, 241000, China
- Lan Jiang
  - Anhui Province Key Laboratory of Non-Coding RNA Basic and Clinical Transformation (Wannan Medical College), Central Laboratory, Yijishan Hospital of Wannan Medical College, Wuhu, 241000, China
- Zuzhen Wang
  - Anhui Province Key Laboratory of Non-Coding RNA Basic and Clinical Transformation (Wannan Medical College), Central Laboratory, Yijishan Hospital of Wannan Medical College, Wuhu, 241000, China
  - Center of Reproductive Medicine, Yijishan Hospital of Wannan Medical College, Wuhu, China
7
Wang S, Yu ZG, Han GS. MVSLLnc: LncRNA subcellular localization prediction based on multi-source features and two-stage voting strategy. Methods 2025; 234:324-332. PMID: 39837434; DOI: 10.1016/j.ymeth.2025.01.013. Received: 09/15/2024; Revised: 12/28/2024; Accepted: 01/16/2025; Indexed: 01/23/2025.
Abstract
The subcellular localization of long non-coding RNAs (lncRNAs) is crucial for understanding their function. Because traditional biological experimental methods are time-consuming and some existing computational methods require high computing power, we set out to find a simple, easy-to-implement method for predicting the subcellular localization of lncRNAs more efficiently. In this work, we propose a model based on multi-source features and a two-stage voting strategy for predicting the subcellular localization of lncRNAs (MVSLLnc). The multi-source features include k-mer frequency, features based on the coordinate values of Chaos Game Representation (CGR), and features based on physicochemical properties (PhyChe). We feed the multi-source features into the traditional machine learning classifiers random forest (RF), support vector machine (SVM) and XGBoost, and perform the final prediction with a two-stage voting strategy. Experimental results on three benchmark datasets show accuracies of 0.829, 0.793 and 0.968, respectively. The accuracies on three independent test sets are 0.642, 0.737 and 0.518, respectively, which are competitive with existing methods. Our ablation analyses show that the two-stage voting strategy makes full use of the advantages of multi-source features and multiple classifiers and yields more robust results.
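Two ingredients of this abstract, k-mer frequency features and voting over several classifiers, can be sketched without any machine-learning library. The abstract does not spell out how the two voting stages are organised, so the scheme below is an assumption: stage 1 takes a majority vote across classifiers within each feature type, and stage 2 votes across the per-feature-type winners. The function names, the tie-breaking rule, and the example localization labels are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence


def kmer_frequencies(seq: str, k: int, alphabet: str = "ACGU") -> List[float]:
    """Normalised k-mer frequency vector with len(alphabet) ** k entries."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are not in `kmers` and are ignored.
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def majority(labels: Sequence[str]) -> str:
    """Hard majority vote; ties go to the label that appears first."""
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


def two_stage_vote(predictions: Dict[str, Dict[str, str]]) -> str:
    """Combine predictions[feature_type][classifier] for a single sample.

    Stage 1 (assumed): majority vote across classifiers within each feature type.
    Stage 2 (assumed): majority vote across the stage-1 winners.
    """
    stage1 = [majority(list(by_clf.values())) for by_clf in predictions.values()]
    return majority(stage1)


if __name__ == "__main__":
    print(kmer_frequencies("ACGUAC", 1))
    preds = {
        "kmer": {"RF": "Nucleus", "SVM": "Cytoplasm", "XGBoost": "Nucleus"},
        "CGR": {"RF": "Cytoplasm", "SVM": "Cytoplasm", "XGBoost": "Nucleus"},
        "PhyChe": {"RF": "Nucleus", "SVM": "Nucleus", "XGBoost": "Ribosome"},
    }
    print(two_stage_vote(preds))
```

In the example, stage 1 yields Nucleus, Cytoplasm and Nucleus for the three feature types, so the final call printed is `Nucleus`. Voting in two stages means a single feature type cannot dominate simply because it feeds more classifiers.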
Affiliation(s)
- Sheng Wang
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
- Zu-Guo Yu
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
- Guo-Sheng Han
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Hunan 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, China
8
Wu L, Wang L, Hu S, Tang G, Chen J, Yi Y, Xie H, Lin J, Wang M, Wang D, Yang B, Huang Y. RNALocate v3.0: Advancing the Repository of RNA Subcellular Localization with Dynamic Analysis and Prediction. Nucleic Acids Res 2025; 53:D284-D292. PMID: 39404071; PMCID: PMC11701552; DOI: 10.1093/nar/gkae872. Received: 07/21/2024; Revised: 09/18/2024; Accepted: 09/24/2024; Indexed: 01/18/2025.
Abstract
Subcellular localization of RNA is a crucial mechanism for regulating diverse biological processes within cells. Dynamic RNA subcellular localizations are essential for maintaining cellular homeostasis; however, their distribution and changes during development and differentiation remain largely unexplored. To elucidate the dynamic patterns of RNA distribution within cells, we have upgraded RNALocate to version 3.0, a repository for RNA-subcellular localization (http://www.rnalocate.org/ or http://www.rna-society.org/rnalocate/). RNALocate v3.0 incorporates and analyzes RNA subcellular localization sequencing data from over 850 samples, with a specific focus on the dynamic changes in subcellular localizations under various conditions. The species coverage has also been expanded to encompass mammals, non-mammals, plants and microbes. Additionally, we provide an integrated prediction algorithm for the subcellular localization of seven RNA types across eleven subcellular compartments, utilizing convolutional neural networks (CNNs) and transformer models. Overall, RNALocate v3.0 contains a total of 1 844 013 RNA-localization entries covering 26 RNA types, 242 species and 177 subcellular localizations. It serves as a comprehensive and readily accessible data resource for RNA-subcellular localization, facilitating the elucidation of cellular function and disease pathogenesis.
Affiliation(s)
- Le Wu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
- Luqi Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
- Shijie Hu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
  - Department of Pathology, Harbin Medical University, 157th Rd of Baojian, Nangang District, Harbin 150081, China
- Guangjue Tang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
- Jia Chen
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
- Ying Yi
  - Dermatology Hospital, Southern Medical University, No.2, Lujing Road, Yuexiu District, Guangzhou 510091, China
- Hailong Xie
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
- Jiahao Lin
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
- Mei Wang
  - State Key Laboratory of Organ Failure Research, Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
- Dong Wang
  - Dermatology Hospital, Southern Medical University, No.2, Lujing Road, Yuexiu District, Guangzhou 510091, China
  - Department of Bioinformatics, Guangdong Province Key Laboratory of Molecular Tumor Pathology, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
- Bin Yang
  - Dermatology Hospital, Southern Medical University, No.2, Lujing Road, Yuexiu District, Guangzhou 510091, China
- Yan Huang
  - Cancer Research Institute, School of Basic Medical Sciences, Southern Medical University, No.1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
9
Zhu L, Chen H, Yang S. LncSL: A Novel Stacked Ensemble Computing Tool for Subcellular Localization of lncRNA by Amino Acid-Enhanced Features and Two-Stage Automated Selection Strategy. Int J Mol Sci 2024; 25:13734. PMID: 39769496; PMCID: PMC11678684; DOI: 10.3390/ijms252413734. Received: 11/05/2024; Revised: 12/17/2024; Accepted: 12/19/2024; Indexed: 01/11/2025.
Abstract
Long non-coding RNA (lncRNA) is non-coding RNA longer than 200 nucleotides and is crucial for functions such as cell cycle regulation and gene transcription. Accurate localization prediction from sequence information is vital for understanding lncRNA's biological roles. Computational methods offer an effective alternative to traditional experimental methods for annotating lncRNA subcellular positions. Existing machine learning-based methods are limited and often overlook regions with coding potential that affect the function of lncRNA. Therefore, we propose a new model called LncSL. For feature encoding, both lncRNA sequences and amino acid sequences from open reading frames (ORFs) are employed. The most suitable features were selected with CatBoost and integrated into a new feature set. A voting process over seven feature selection algorithms then identified the most contributive features for training our final stacked model. In addition, an automatic model selection strategy was constructed to find a better-performing meta-model for assembling LncSL. This study specifically focuses on predicting the subcellular localization of lncRNA in the nucleus and cytoplasm. On two benchmark datasets, S1 and S2, LncSL outperformed existing methods by 6.3% to 12.3% in the Matthews correlation coefficient (MCC) on a balanced test dataset. On an unbalanced independent test dataset sourced from S1, LncSL improved MCC by 4.7% to 18.6%, further demonstrating that LncSL is superior to the other compared methods. In all, this study presents an effective method for predicting lncRNA subcellular localization by enhancing sequence information, which traditional methods often overlook, and by addressing the selection of a contributive meta-model, which can offer new insights for other bioinformatics problems.
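LncSL's gains are reported in the Matthews correlation coefficient, which for a binary nucleus/cytoplasm task is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch of that formula follows; labelling nucleus as the positive class (1) and returning 0 when a marginal is empty are assumptions of this sketch (the zero convention matches scikit-learn's `matthews_corrcoef`), and the example labels are made up.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary MCC with 1 = positive class (e.g. nucleus), 0 = negative (e.g. cytoplasm)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (also used by scikit-learn): report 0 when any marginal is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Unbalanced toy set: 3 positives, 5 negatives.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy={acc:.3f} MCC={matthews_corrcoef(y_true, y_pred):.3f}")
```

The example prints `accuracy=0.750 MCC=0.467`: with TP=2, TN=4, FP=1 and FN=1, MCC = 7/15. This gap illustrates why MCC is preferred over accuracy on unbalanced test sets such as the S1-derived one, since it penalises errors on the minority class more heavily.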
Affiliation(s)
- Sen Yang
  - School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213164, China; (L.Z.); (H.C.)
10
Poloni JF, Oliveira FHS, Feltes BC. Localization is the key to action: regulatory peculiarities of lncRNAs. Front Genet 2024; 15:1478352. PMID: 39737005; PMCID: PMC11683014; DOI: 10.3389/fgene.2024.1478352. Received: 08/09/2024; Accepted: 11/27/2024; Indexed: 01/01/2025.
Abstract
To understand the transcriptomic profile of an individual cell in a multicellular organism, we must comprehend its surrounding environment and the cellular space where distinct molecular stimuli responses are located. Contradicting the initial perception that RNAs were nonfunctional and that only a few could act in chromatin remodeling, research over the last few decades has revealed that they are multifaceted, versatile regulators of most cellular processes. Among the various RNAs, long non-coding RNAs (lncRNAs) regulate multiple biological processes and can even impact cell fate. In this sense, the subcellular localization of lncRNAs is the primary determinant of their functions. It shapes their behavior by limiting their potential molecular partners and the processes they can affect. The fine-tuned activity of lncRNAs is also tissue-specific and modulated by their cis and trans regulation. Hence, the spatial context of lncRNAs is crucial for understanding the regulatory networks by which they influence and are influenced. Therefore, predicting a lncRNA's correct location is not just a technical challenge but a critical step in understanding the biological meaning of its activity. Examining these peculiarities is thus crucial to researching and discussing lncRNAs. In this review, we debate the spatial regulation of lncRNAs and their tissue-specific roles and regulatory mechanisms. We also briefly highlight how bioinformatic tools can aid research in the area.
Affiliation(s)
- Bruno César Feltes
  - Department of Biophysics, Laboratory of DNA Repair and Aging, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
11
Zeng Y, Guo T, Feng L, Yin Z, Luo H, Yin H. Insights into lncRNA-mediated regulatory networks in Hevea brasiliensis under anthracnose stress. Plant Methods 2024; 20:182. PMID: 39633437; PMCID: PMC11619270; DOI: 10.1186/s13007-024-01301-4. Received: 08/14/2024; Accepted: 11/08/2024; Indexed: 12/07/2024.
Abstract
In recent years, long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) have emerged as critical regulators in plant biology, governing complex gene regulatory networks. In the context of disease resistance in Hevea brasiliensis, the rubber tree, significant progress has been made in understanding its response to anthracnose disease, a serious threat posed by fungal pathogens impacting global rubber tree cultivation and latex quality. While advances have been achieved in unraveling the genetic and molecular foundations underlying anthracnose resistance, gaps persist in comprehending the regulatory roles of lncRNAs and miRNAs under such stress conditions. The specific contributions of these non-coding RNAs in orchestrating molecular responses against anthracnose in H. brasiliensis remain unclear, necessitating further exploration to uncover strategies that increase disease resistance. Here, we integrate lncRNA sequencing, miRNA sequencing, and degradome sequencing to decipher the regulatory landscape of lncRNAs and miRNAs in H. brasiliensis under anthracnose stress. We investigated the genomic and regulatory profiles of differentially expressed lncRNAs (DE-lncRNAs) and constructed a competitive endogenous RNA (ceRNA) regulatory network in response to pathogenic infection. Additionally, we elucidated the functional roles of HblncRNA29219 and its antisense hbr-miR482a, as well as the miR390-TAS3-ARF pathway, in enhancing anthracnose resistance. These findings provide valuable insights into plant-microbe interactions and hold promising implications for advancing agricultural crop protection strategies. This comprehensive analysis sheds light on non-coding RNA-mediated regulatory mechanisms in H. brasiliensis under pathogen stress, establishing a foundation for innovative approaches aimed at enhancing crop resilience and sustainability in agriculture.
Affiliation(s)
- Yanluo Zeng
- School of Tropical Agriculture and Forestry, Hainan University, Haikou, Hainan, China
- Tianbin Guo
- School of Tropical Agriculture and Forestry, Hainan University, Haikou, Hainan, China
- Liping Feng
- School of Breeding and Multiplication, Hainan University, Haikou, Hainan, China
- Zhuoda Yin
- TJ-YZ School of Network Science, Haikou University of Economics, Haikou, China
- Hongli Luo
- School of Breeding and Multiplication, Hainan University, Haikou, Hainan, China
- Hongyan Yin
- School of Tropical Agriculture and Forestry, Hainan University, Haikou, Hainan, China
12
Wang K, Hu Y, Li S, Chen M, Li Z. LncLSTA: a versatile predictor unveiling subcellular localization of lncRNAs through long-short term attention. BIOINFORMATICS ADVANCES 2024; 5:vbae173. [PMID: 39758831 PMCID: PMC11700581 DOI: 10.1093/bioadv/vbae173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Revised: 10/20/2024] [Accepted: 11/07/2024] [Indexed: 01/07/2025]
Abstract
Motivation Much evidence suggests that the subcellular localization of long noncoding RNAs (LncRNAs) provides key insights into their biological function. Results This study proposes a novel deep learning framework, LncLSTA, designed to predict the subcellular localization of LncRNAs. It first takes the LncRNA sequence, electron-ion interaction pseudopotentials, and nucleotide chemical properties as feature inputs. Departing from conventional k-mer approaches, the model uses a set of 1D convolution and max-pooling operations for dynamic feature aggregation. Furthermore, LncLSTA integrates a long-short term attention module with a bidirectional long short-term memory network to comprehensively extract sequence information. It also incorporates a TextCNN module to enhance accuracy and robustness in subcellular localization tasks. Experimental results demonstrate the efficacy of LncLSTA, showing superior performance compared with other state-of-the-art methods. Notably, LncLSTA exhibits transfer learning capability, extending its utility to predicting mRNA subcellular localization while maintaining consistently satisfactory results. This research contributes valuable insights into the biological functions of LncRNAs through subcellular localization, emphasizing the potential of deep learning approaches for advancing RNA-related studies. Availability and implementation The source code is publicly available at https://bis.zju.edu.cn/LncLSTA.
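The abstract says LncLSTA replaces fixed k-mer counting with 1D convolution and max-pooling for feature aggregation. The NumPy sketch below illustrates only that general idea on a one-hot sequence. It is not LncLSTA's architecture. The function names (`one_hot`, `conv1d`, `max_pool1d`), the A/C/G/U column order, and the hand-set kernel in the usage note are assumptions for illustration. In a trained model the kernels are learned.

```python
import numpy as np


def one_hot(seq: str) -> np.ndarray:
    """Encode a nucleotide sequence as an (L, 4) one-hot matrix; columns are A, C, G, U/T."""
    alphabet = {"A": 0, "C": 1, "G": 2, "U": 3, "T": 3}
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        if base in alphabet:  # unknown symbols (e.g. N) stay all-zero
            x[i, alphabet[base]] = 1.0
    return x


def conv1d(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """'Valid' 1D convolution (cross-correlation) followed by ReLU.

    x: (L, C_in); kernels: (n_filters, width, C_in).
    Returns an array of shape (L - width + 1, n_filters).
    """
    n_filters, width, _ = kernels.shape
    out_len = x.shape[0] - width + 1
    out = np.empty((out_len, n_filters))
    for t in range(out_len):
        window = x[t:t + width]  # (width, C_in)
        # Contract width and channel axes: one score per filter.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


def max_pool1d(x: np.ndarray, size: int) -> np.ndarray:
    """Non-overlapping max pooling along the sequence axis; a trailing remainder is dropped."""
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, x.shape[1]).max(axis=1)
```

For example, consider a width-2 kernel with weight 1 on C at its first position and on G at its second. It acts as a "CG" detector. On `ACGUACGUAC` it scores 2 at the two CG positions and 0 elsewhere. Pooling then records whether the motif occurred in each window rather than its exact position. A learned set of such filters can play the role that fixed k-mer counts play in older methods.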
Affiliation(s)
- Kai Wang
- School of Information Engineering, Huzhou University, Huzhou, Zhejiang 313000, China
- School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
- Yueming Hu
- College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310003, China
- Sida Li
- College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310003, China
- Ming Chen
- College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310003, China
- Zhong Li
- School of Information Engineering, Huzhou University, Huzhou, Zhejiang 313000, China
- School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
- College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310003, China
13
Li J, Liu R, Hu H, Huang Y, Shi Y, Li H, Chen H, Cai M, Wang N, Yan T, Wang K, Liu H. Methionine deprivation inhibits glioma proliferation and EMT via the TP53TG1/miR-96-5p/STK17B ceRNA pathway. NPJ Precis Oncol 2024; 8:270. [PMID: 39572759 PMCID: PMC11582638 DOI: 10.1038/s41698-024-00763-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Accepted: 11/11/2024] [Indexed: 11/24/2024]
Abstract
Recent research highlights the significant impact of methionine metabolism on glioma progression, and a growing body of compelling evidence links long non-coding RNAs to abnormal metabolism in gliomas. However, the specific role of long non-coding RNAs in methionine metabolism-regulated glioma progression remains unclear. This study shows that methionine deprivation inhibits the proliferation, migration, and invasion of gliomas. Interestingly, the expression of TP53TG1, a long non-coding RNA, is also suppressed under methionine deprivation. TP53TG1 is highly expressed in gliomas and associated with poor patient outcomes. Our data further show that inhibition of TP53TG1 suppresses glioma cell proliferation and the epithelial-mesenchymal transition process both in vitro and in vivo. Mechanistically, we found a competing endogenous RNA regulatory network in which TP53TG1 modulates the target protein STK17B by competitively binding to miR-96-5p, thereby regulating glioma progression. These findings suggest that methionine deprivation could be a promising approach for the clinical treatment of glioma.
Affiliation(s)
- Jiafeng Li
- Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, 150001, Heilongjiang Province, China
- Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Ruijie Liu
- Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, 150001, Heilongjiang Province, China
- Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Hong Hu
- Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, 150001, Heilongjiang Province, China
- Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Yishuai Huang
- Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, 150001, Heilongjiang Province, China
- Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Ying Shi
- Departments of Magnetic Resonance, The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Honglei Li
- Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, 150001, Heilongjiang Province, China
- Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Hao Chen
- Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, 150001, Heilongjiang Province, China
- Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Meng Cai
- Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, 150001, Heilongjiang Province, China
- Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Ning Wang
- Department of Critical Care Medicine, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China
- Tao Yan
- Central Laboratory, Linyi People's Hospital, Linyi, 276000, Shandong Province, China
- Linyi Key Laboratory of Neurophysiology, Linyi People's Hospital, Linyi, 276000, Shandong Province, China
- Kaikai Wang
- Department of Neurosurgery, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Zhejiang Province, Hangzhou, PR China.
- Huailei Liu
- Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang Province, China.
- Key Colleges and Universities Laboratory of Neurosurgery in Heilongjiang Province, Harbin, 150001, Heilongjiang Province, China.
- Institute of Neuroscience, Sino-Russian Medical Research Center, Harbin Medical University, Harbin, 150001, Heilongjiang Province, China.
14
Shakoori A, Hosseinzadeh A, Nafisi N, Omranipour R, Sahebi L, Hosseinkhan N, Ahmadi M, Ghafouri-Fard S, Abtin M. Importance of LINC00852/miR-145-5p in breast cancer: a bioinformatics and experimental study. Discov Oncol 2024; 15:672. [PMID: 39557729 PMCID: PMC11574217 DOI: 10.1007/s12672-024-01553-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 11/07/2024] [Indexed: 11/20/2024]
Abstract
PURPOSE We aimed to examine the importance of a lncRNA, LINC00852, in the pathogenesis of breast cancer. MATERIALS AND METHODS In the current study, we first used several online tools to examine the importance of LINC00852 in breast cancer. We then examined these findings in 50 pairs of breast cancer tissues and adjacent non-cancerous tissues. We also re-evaluated the miR-145-5p signature data from our recent study. RESULTS While in silico tools indicated down-regulation of LINC00852 in breast cancer samples, expression assays showed significant up-regulation of this lncRNA in breast cancer samples compared with matched control samples from Iranian patients. miR-145-5p was under-expressed in breast cancer samples compared with non-cancerous samples. LINC00852 could distinguish breast cancer tissues from adjacent non-malignant tissues with an AUC of 0.7218 (P < 0.001). CONCLUSION The current study points to the LINC00852/miR-145-5p axis as a possible contributor to the pathogenesis of breast cancer.
Affiliation(s)
- Abbas Shakoori
- Department of Medical Genetics, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Department of Medical Genetics, Cancer Institute of Iran, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Dr. Qarib St., Keshavarz Blvd, Tehran, Iran
- Nahid Nafisi
- Surgery Department, Rasoul Akram Hospital, Clinical Research Development Center (RCRDC), Iran University of Medical Sciences, Tehran, Iran
- Ramesh Omranipour
- Breast Disease Research Center (BDRC), Tehran University of Medical Sciences, Tehran, Iran
- Department of Surgical Oncology, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
- Leyla Sahebi
- Maternal, Fetal and Neonatal Research Center, Family Health Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Nazanin Hosseinkhan
- Endocrine Research Center, Institute of Endocrinology and Metabolism, Iran University of Medical Sciences (IUMS), Tehran, Iran
- Mohsen Ahmadi
- Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Soudeh Ghafouri-Fard
- Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Maryam Abtin
- Department of Medical Genetics, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.
- Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
15
Ren J, Guo Z, Qi Y, Zhang Z, Liu L. Prediction of YY1 loop anchor based on multi-omics features. Methods 2024; 232:96-106. [PMID: 39521361 DOI: 10.1016/j.ymeth.2024.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 10/22/2024] [Accepted: 11/06/2024] [Indexed: 11/16/2024]
Abstract
The three-dimensional structure of chromatin is crucial for the regulation of gene expression. YY1 promotes enhancer-promoter interactions in a manner analogous to CTCF-mediated chromatin interactions. However, little is known about which YY1 binding sites can form loop anchors. In this study, a LightGBM model was used to predict YY1-loop anchors by integrating multi-omics data. Because the numbers of positive and negative samples are highly imbalanced, we used the area under the precision-recall curve (AUPRC) to assess classifier quality. The results show that the LightGBM model has strong predictive performance (AUPRC ≥ 0.93). To verify the robustness of the model, the dataset was divided into training and test sets at a 4:1 ratio. The model performed well for YY1-loop anchor prediction on both the training set and the independent test set. Additionally, ranking feature importance showed that the formation of YY1-loop anchors is primarily influenced by the co-binding of transcription factors CTCF, SMC3, and RAD21, as well as by histone modifications and sequence context.
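AUPRC suits imbalanced data for a simple reason. A random scorer's expected AUPRC is roughly the fraction of positives, while its AUROC stays near 0.5. As a result, AUPRC is more sensitive to how well the rare positive class is ranked.

The plain-Python sketch below computes AUPRC as average precision and performs a seeded 4:1 train/test split. Both helpers are hypothetical illustrations, not the authors' code. The LightGBM model itself is not reproduced. Ties in score are broken by input order for simplicity.

```python
import random


def average_precision(y_true, scores):
    """AUPRC estimated as average precision: the mean of precision@rank taken at
    the rank of every positive sample. Ties keep input order (a simplification)."""
    # sorted() is stable, so equal scores keep their original relative order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this cutoff
    return precision_sum / n_pos


def split_4_to_1(items, seed=0):
    """Shuffle with a fixed seed and split into 80% training / 20% test (a 4:1 ratio)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 4 // 5
    return items[:cut], items[cut:]
```

With labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, the positives sit at ranks 1 and 3. Average precision is therefore (1/1 + 2/3) / 2 = 5/6.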
Affiliation(s)
- Jun Ren
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Zhiling Guo
- Beidahuang Industry Group General Hospital, Harbin, China
- Yixuan Qi
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China; School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zheng Zhang
- Computer Science and Information Systems, Murray State University, Murray, USA
- Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
16
Du C, Fan W, Zhou Y. Integrated Biochemical and Computational Methods for Deciphering RNA-Processing Codes. WILEY INTERDISCIPLINARY REVIEWS. RNA 2024; 15:e1875. [PMID: 39523464 DOI: 10.1002/wrna.1875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 09/23/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
RNA processing involves steps such as capping, splicing, polyadenylation, modification, and nuclear export. These steps are essential for transforming genetic information in DNA into proteins and contribute to RNA diversity and complexity. Many biochemical methods have been developed to profile and quantify RNAs, as well as to identify the interactions between RNAs and RNA-binding proteins (RBPs), especially when coupled with high-throughput sequencing technologies. With the rapid accumulation of diverse data, it is crucial to develop computational methods to convert the big data into biological knowledge. In particular, machine learning and deep learning models are commonly utilized to learn the rules or codes governing the transformation from DNA sequences to intriguing RNAs based on manually designed or automatically extracted features. When precise enough, the RNA codes can be incredibly useful for predicting RNA products, decoding the molecular mechanisms, forecasting the impact of disease variants on RNA processing events, and identifying driver mutations. In this review, we systematically summarize the biochemical and computational methods for deciphering five important RNA codes related to alternative splicing, alternative polyadenylation, RNA localization, RNA modifications, and RBP binding. For each code, we review the main types of experimental methods used to generate training data, as well as the key features, strategic model structures, and advantages of representative tools. We also discuss the challenges encountered in developing predictive models using large language models and extensive domain knowledge. Additionally, we highlight useful resources and propose ways to improve computational tools for studying RNA codes.
Affiliation(s)
- Chen Du
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
- Weiliang Fan
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
- Yu Zhou
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
- Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, China
- State Key Laboratory of Virology, Wuhan University, Wuhan, China
17
Luo Z, Yu L, Xu Z, Liu K, Gu L. Comprehensive Review and Assessment of Computational Methods for Prediction of N6-Methyladenosine Sites. BIOLOGY 2024; 13:777. [PMID: 39452086 PMCID: PMC11504118 DOI: 10.3390/biology13100777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Revised: 09/19/2024] [Accepted: 09/23/2024] [Indexed: 10/26/2024]
Abstract
N6-methyladenosine (m6A) plays a crucial regulatory role in the control of cellular functions and gene expression. Recent advances in sequencing techniques for transcriptome-wide m6A mapping have accelerated the accumulation of m6A site information at a single-nucleotide level, providing more high-confidence training data to develop computational approaches for m6A site prediction. However, it is still a major challenge to precisely predict m6A sites using in silico approaches. To advance the computational support for m6A site identification, here, we curated 13 up-to-date benchmark datasets from nine different species (i.e., H. sapiens, M. musculus, Rat, S. cerevisiae, Zebrafish, A. thaliana, Pig, Rhesus, and Chimpanzee). This will assist the research community in conducting an unbiased evaluation of alternative approaches and support future research on m6A modification. We revisited 52 computational approaches published since 2015 for m6A site identification, including 30 traditional machine learning-based, 14 deep learning-based, and 8 ensemble learning-based methods. We comprehensively reviewed these computational approaches in terms of their training datasets, calculated features, computational methodologies, performance evaluation strategy, and webserver/software usability. Using these benchmark datasets, we benchmarked nine predictors with available online websites or stand-alone software and assessed their prediction performance. We found that deep learning and traditional machine learning approaches generally outperformed scoring function-based approaches. In summary, the curated benchmark dataset repository and the systematic assessment in this study serve to inform the design and implementation of state-of-the-art computational approaches for m6A identification and facilitate more rigorous comparisons of new methods in the future.
Affiliation(s)
- Zhengtao Luo
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Key Laboratory of Smart Agriculture Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- Liyi Yu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Zhaochun Xu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin 150076, China
- Kening Liu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Lichuan Gu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Key Laboratory of Smart Agriculture Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
18
Li X, Li H, Yang Z, Wang L. Distribution rules of 8-mer spectra and characterization of evolution state in animal genome sequences. BMC Genomics 2024; 25:855. [PMID: 39266973 PMCID: PMC11391722 DOI: 10.1186/s12864-024-10786-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 09/09/2024] [Indexed: 09/14/2024]
Abstract
BACKGROUND Studying the composition rules and evolutionary mechanisms of genome sequences is a core issue in the post-genomic era, and k-mer spectrum analysis of genome sequences is an effective means of addressing it. RESULT We divided the 8-mers of genome sequences into 16 XY types according to the number of XY dinucleotides in each 8-mer. Previous work showed that independent unimodal distributions are observed only in the three CG-type 8-mer spectra, whereas non-CG-type 8-mer spectra do not show this phenomenon universally from prokaryotes to eukaryotes. On this basis, we analyzed the distribution variation of non-CG-type 8-mer spectra across 889 animal genome sequences. Following the evolutionary order of animals from primitive to more complex, we found that the spectrum distributions gradually transition from unimodal to tri-modal. The relative distance from the average frequency of each non-CG-type 8-mer to the center frequency differs both within a species and among species. We further divided the 8-mers containing CG dinucleotides into 16 subsets, called XY1_CG1 subsets, in which each 8-mer contains both CG and XY dinucleotides. We found that the separability values of the XY1_CG1 spectra are closely related to the evolution and specificity of animals. Considering the constraint of Chargaff's second parity rule, we finally obtained 10 separability values as the feature set to characterize the evolutionary state of genome sequences. To verify the rationality of the feature set, we used 14 common classification algorithms to perform binary classification tests. The accuracy (Acc) ranged from 83.88% to 98.70% among birds, other vertebrates, and mammals. CONCLUSION We proposed a credible feature set that characterizes the evolutionary state of genomes and obtained satisfactory results with it in large-scale classification of animals.
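One reading of the XY-type grouping is sketched below under stated assumptions. Each 8-mer is assigned the number of copies of a given dinucleotide it contains, and the 4^8 = 65,536 possible 8-mers are grouped by that count. Overlapping occurrences are counted here; the abstract does not specify this.

The same helpers show why Chargaff's second parity rule plausibly reduces 16 per-dinucleotide values to 10. Pairing each dinucleotide with its reverse complement leaves six pairs plus four self-complementary dinucleotides (AT, TA, CG, GC). All function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_dinucleotide(kmer: str, xy: str) -> int:
    """Count occurrences of dinucleotide xy in kmer (overlaps counted: an assumption)."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == xy)


def group_by_xy_count(xy: str, k: int = 8) -> dict:
    """Group all 4**k k-mers by how many copies of xy each one contains."""
    groups = {}
    for letters in product(BASES, repeat=k):
        kmer = "".join(letters)
        groups.setdefault(count_dinucleotide(kmer, xy), []).append(kmer)
    return groups


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parity_classes() -> set:
    """Collapse the 16 dinucleotides into reverse-complement classes, each represented
    by its lexicographically smaller member (Chargaff's second parity rule: a word and
    its reverse complement occur at similar frequencies on a single strand)."""
    dinucs = ["".join(p) for p in product(BASES, repeat=2)]
    return {min(d, reverse_complement(d)) for d in dinucs}
```

For CG, the 8-mer counts run from 0 up to 4 (CGCGCGCG). `parity_classes()` returns 10 representatives, matching the count of 10 separability values in the abstract.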
Affiliation(s)
- Xiaolong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Hong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China.
- Zhenhua Yang
- School of Economics and Management, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Lu Wang
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
19
Hu SL, Chen YL, Zhang LQ, Bai H, Yang JH, Li QZ. LncSTPred: a predictive model of lncRNA subcellular localization and decipherment of the biological determinants influencing localization. Front Mol Biosci 2024; 11:1452142. [PMID: 39301172 PMCID: PMC11411566 DOI: 10.3389/fmolb.2024.1452142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 08/19/2024] [Indexed: 09/22/2024]
Abstract
Introduction Long non-coding RNAs (lncRNAs) play crucial roles as genetic markers and in genome rearrangement, chromatin modification, and other biological processes. Increasing evidence suggests that lncRNA functions are closely related to their subcellular localization. However, the distribution of lncRNAs across subcellular localizations is imbalanced: more than ten times as many lncRNAs are located in the nucleus as in the exosome. Methods In this study, we propose a new oversampling method to construct a predictive dataset and develop a predictive model called LncSTPred. The model uses an improved AdaBoost algorithm with 3-mer, 3-RF sequence, and minimum free energy structure features to predict subcellular localization. Results and Discussion Using our improved AdaBoost algorithm, we obtained better prediction accuracy for lncRNA subcellular localization. In addition, we evaluated feature importance using the F-score and analyzed the influence of highly relevant features on lncRNAs. Our study shows that the ANA features may be a key factor for predicting lncRNA subcellular localization, which correlates with the composition of stems and loops in the secondary structure of lncRNAs.
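The abstract names 3-mer sequence features and F-score-based feature ranking. The sketch below computes a 64-dimensional 3-mer frequency vector and the two-class F-score commonly attributed to Chen and Lin, which divides the between-class separation of a feature by its within-class variance.

It is an assumption that LncSTPred uses exactly this F-score definition. Multi-class localization would need a one-vs-rest or multi-class generalization. The 3-RF and minimum-free-energy features are not reproduced.

```python
from itertools import product
from statistics import mean


def kmer_frequencies(seq: str, k: int = 3) -> list:
    """Normalized frequencies of all 4**k k-mers over A, C, G, U in lexicographic order.
    T is read as U; windows containing other symbols are skipped."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def f_score(pos: list, neg: list) -> float:
    """Two-class F-score of a single feature (each class needs >= 2 samples).

    Numerator: squared distances of each class mean from the overall mean.
    Denominator: sum of the two unbiased within-class variances.
    Larger values indicate a more discriminative feature.
    """
    m, m_pos, m_neg = mean(pos + neg), mean(pos), mean(neg)
    between = (m_pos - m) ** 2 + (m_neg - m) ** 2
    within = (sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1))
    return between / within if within else float("inf")
```

Ranking features then amounts to computing `f_score` for each column of the feature matrix, split by class, and sorting in descending order.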
Affiliation(s)
- Si-Le Hu
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Ying-Li Chen
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Lu-Qiang Zhang
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Hui Bai
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Jia-Hong Yang
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Qian-Zhong Li
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot, China
20
Miller JR, Yi W, Adjeroh DA. Evaluation of machine learning models that predict lncRNA subcellular localization. NAR Genom Bioinform 2024; 6:lqae125. [PMID: 39296930 PMCID: PMC11409063 DOI: 10.1093/nargab/lqae125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 08/17/2024] [Accepted: 09/02/2024] [Indexed: 09/21/2024]
Abstract
The lncATLAS database quantifies the relative cytoplasmic versus nuclear abundance of long non-coding RNAs (lncRNAs) observed in 15 human cell lines. The literature describes several machine learning models trained and evaluated on these and similar datasets. These reports showed moderate performance, e.g. 72-74% accuracy, on test subsets of the data withheld from training. In all these reports, the datasets were filtered to include genes with extreme values while excluding genes with values in the middle range and the filters were applied prior to partitioning the data into training and testing subsets. Using several models and lncATLAS data, we show that this 'middle exclusion' protocol boosts performance metrics without boosting model performance on unfiltered test data. We show that various models achieve only about 60% accuracy when evaluated on unfiltered lncRNA data. We suggest that the problem of predicting lncRNA subcellular localization from nucleotide sequences is more challenging than currently perceived. We provide a basic model and evaluation procedure as a benchmark for future studies of this problem.
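The "middle exclusion" issue is one of ordering. Filtering out mid-range genes before partitioning removes the hard cases from the test set as well. The sketch below shows the safer order: split first, then filter only the training portion.

Several details are illustrative assumptions, not the procedure from the paper:
- the score convention (a larger value means more cytoplasmic);
- the `low`/`high` thresholds;
- the midpoint cut used to label every test gene.

```python
import random


def label_with_exclusion(rci, low, high):
    """Binary label for one gene, or None if it falls in the excluded middle band.
    Assumes larger rci means more cytoplasmic (hypothetical convention)."""
    if rci <= low:
        return 0  # nuclear-leaning
    if rci >= high:
        return 1  # cytoplasmic-leaning
    return None  # middle range: dropped, but only from training


def split_then_filter(genes, low, high, test_frac=0.2, seed=0):
    """Partition first, then apply middle exclusion to the training set only.

    genes: list of (gene_id, rci) pairs. Every test gene is kept and labeled with a
    single cut at the midpoint of [low, high], so evaluation covers the full range.
    """
    genes = list(genes)
    random.Random(seed).shuffle(genes)
    n_test = int(len(genes) * test_frac)
    test, train = genes[:n_test], genes[n_test:]

    train_labeled = []
    for gene_id, rci in train:
        lab = label_with_exclusion(rci, low, high)
        if lab is not None:
            train_labeled.append((gene_id, lab))

    cut = (low + high) / 2
    test_labeled = [(gene_id, int(rci >= cut)) for gene_id, rci in test]
    return train_labeled, test_labeled
```

Scoring a model on `test_labeled` reflects the unfiltered setting the abstract describes. Applying the same exclusion to the test set would reproduce the inflated protocol.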
Affiliation(s)
- Jason R Miller
- Department of Computer Science and Information Technology; Hood College, Frederick, MD 21701, USA
- Lane Department of Computer Science and Electrical Engineering; West Virginia University, Morgantown, WV 26506, USA
- Weijun Yi
- Lane Department of Computer Science and Electrical Engineering; West Virginia University, Morgantown, WV 26506, USA
- Donald A Adjeroh
- Lane Department of Computer Science and Electrical Engineering; West Virginia University, Morgantown, WV 26506, USA
21
Zheng Y, Li H, Lin S. m7GRegpred: substrate prediction of N7-methylguanosine (m7G) writers and readers based on sequencing features. Front Genet 2024; 15:1469011. [PMID: 39262420 PMCID: PMC11387174 DOI: 10.3389/fgene.2024.1469011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Accepted: 08/19/2024] [Indexed: 09/13/2024]
Abstract
N7-methylguanosine (m7G) is an important RNA modification found at internal positions and in the 5' cap structure of messenger RNA. It is essential for RNA stability, translation efficiency, and various intracellular RNA processing pathways. Given the significance of the m7G modification, numerous studies have been conducted to predict m7G sites. To further elucidate the regulatory mechanisms surrounding m7G, we introduce a novel bioinformatics framework, m7GRegpred. It is designed to forecast, for the first time, the targets of the m7G methyltransferases METTL1 and WDR4 and of the m7G readers QKI5, QKI6, and QKI7. We integrated different features to build predictors, achieving AUROC scores of 0.856, 0.857, 0.780, 0.776, and 0.818 for METTL1, WDR4, QKI5, QKI6, and QKI7, respectively. In addition, the effects of window length and algorithm were systematically evaluated in this work. The final model is available as a user-friendly webserver: http://modinfor.com/m7GRegpred/. Our research indicates that the substrates of m7G regulators can be identified, which may help advance the study of m7G regulators under specific conditions.
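The predictors above are compared by AUROC. To make that number concrete, the sketch below computes AUROC directly as the Mann-Whitney probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. It is a quadratic-time illustration, not the evaluation code behind m7GRegpred.

```python
def auroc(y_true, scores):
    """AUROC as P(score of random positive > score of random negative), ties = 1/2.
    O(n_pos * n_neg); fine for illustration, not for very large test sets."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On this scale, 0.5 is chance level and 1.0 is perfect ranking. A score of 0.856 therefore means that a true METTL1 substrate outranks a non-substrate about 86% of the time.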
Affiliation(s)
- Yu Zheng
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
- School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, China
- Haipeng Li
- Graduate School of Fujian Medical University, Fuzhou, Fujian, China
- Department of Operating Room, Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Shaofeng Lin
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
- School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, China
22
Sen S, Mukhopadhyay D. A Holistic Analysis of Alzheimer's Disease-Associated lncRNA Communities Reveals Enhanced lncRNA-miRNA-RBP Regulatory Triad Formation Within Functionally Segregated Clusters. J Mol Neurosci 2024; 74:77. [PMID: 39143264 PMCID: PMC11324768 DOI: 10.1007/s12031-024-02244-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 07/04/2024] [Indexed: 08/16/2024]
Abstract
Recent studies on the regulatory networks implicated in Alzheimer's disease (AD) point to long non-coding RNAs (lncRNAs) as crucial regulatory players, although their mechanism remains poorly understood. By analyzing differential gene expression in RNA-seq data from post-mortem AD brain hippocampus, we grouped AD-dysregulated lncRNA transcripts into functionally similar communities based on their k-mer profiles. Their subcellular localizations were mapped using machine-learning-based algorithms. We further explored the functional relevance of each community through analyses of AD-dysregulated miRNA and RNA-binding protein (RBP) interactors and pathway enrichment. Investigation of the miRNA-lncRNA and RBP-lncRNA networks from each community revealed the top RBPs, miRNAs, and lncRNAs for each cluster. The experimental validation community yielded ELAVL4 and miR-16-5p as the predominant RBP and miRNA, respectively. Five lncRNAs emerged as the top-ranking candidates from the RBP/miRNA-lncRNA networks. Further analyses of these networks revealed multiple regulatory triads in which RBP-lncRNA interactions could be augmented by enhanced miRNA-lncRNA interactions. Our results advance the understanding of how lncRNAs regulate AD through their interacting partners. They also demonstrate how these functionally segregated but overlapping regulatory networks can modulate the disease holistically.
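The abstract groups lncRNAs into communities from their k-mer profiles without naming the procedure. As a stand-in, the sketch below does three things:
- builds 3-mer count profiles;
- links transcripts whose profiles have cosine similarity at or above a threshold;
- returns the connected components using union-find.

The choice of k, cosine similarity, the 0.8 threshold, and connected components are all assumptions. The study's actual community detection may differ.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list:
    """Raw k-mer counts over A, C, G, U (T read as U), in lexicographic order."""
    seq = seq.upper().replace("T", "U")
    index = {"".join(p): i for i, p in enumerate(product("ACGU", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    return counts


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def communities(seqs: dict, k: int = 3, threshold: float = 0.8) -> list:
    """Connected components of the graph linking transcripts whose k-mer
    profiles have cosine similarity >= threshold (hypothetical stand-in)."""
    names = list(seqs)
    profiles = {n: kmer_profile(seqs[n], k) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

Consider two transcripts built from repeated `ACG` and one poly-U transcript. The first two have nearly identical profiles (cosine about 0.9995) and fall into one community, while the poly-U transcript forms its own.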
Affiliation(s)
- Somenath Sen
- Biophysics and Structural Genomics Division, Saha Institute of Nuclear Physics, A CI of Homi Bhabha National Institute, Kolkata, 700 064, India
- Debashis Mukhopadhyay
- Biophysics and Structural Genomics Division, Saha Institute of Nuclear Physics, A CI of Homi Bhabha National Institute, Kolkata, 700 064, India
23
Liu T, Qiao H, Wang Z, Yang X, Pan X, Yang Y, Ye X, Sakurai T, Lin H, Zhang Y. CodLncScape Provides a Self-Enriching Framework for the Systematic Collection and Exploration of Coding LncRNAs. Adv Sci (Weinh) 2024; 11:e2400009. [PMID: 38602457 PMCID: PMC11165466 DOI: 10.1002/advs.202400009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 03/19/2024] [Indexed: 04/12/2024]
Abstract
Recent studies have revealed that numerous lncRNAs can be translated into proteins under specific conditions and perform diverse biological functions; these are termed coding lncRNAs. Their comprehensive landscape, however, remains elusive because the field is still preliminary and its findings are dispersed. This study introduces codLncScape, a framework for coding lncRNA exploration consisting of codLncDB, codLncFlow, codLncWeb, and codLncNLP. Specifically, it contains a manually compiled knowledge base, codLncDB, encompassing 353 experimentally validated coding lncRNA entries. Building on codLncDB, codLncFlow investigates the expression characteristics of these lncRNAs and their diagnostic potential in a pan-cancer context, as well as their association with spermatogenesis. codLncWeb serves as a platform for storing, browsing, and accessing knowledge about coding lncRNAs from various programming environments. Finally, codLncNLP is a knowledge-mining tool that supports the timely inclusion of new content and updates to codLncDB. In summary, this study offers a well-functioning, content-rich ecosystem for coding lncRNA research, aiming to accelerate systematic studies in this field.
Affiliation(s)
- Tianyuan Liu
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Huiyuan Qiao
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Xinyan Yang
- Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Xianrun Pan
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Yu Yang
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
- Xiucai Ye
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
24
Kulaeva ED, Muzlaeva ES, Mashkina EV. mRNA-lncRNA gene expression signature in HPV-associated neoplasia and cervical cancer. Vavilovskii Zhurnal Genet Selektsii 2024; 28:342-350. [PMID: 38946889 PMCID: PMC11211991 DOI: 10.18699/vjgb-24-39] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 01/18/2024] [Accepted: 01/19/2024] [Indexed: 07/02/2024]
Abstract
Cervical cancer is one of the most common cancers in women and is associated with human papillomavirus (HPV) in 70% of cases. Cervical cancer arises through the progression of low-differentiated cervical intraepithelial neoplasia through grade 2 and 3 lesions. Along with protein-coding genes, long noncoding RNAs (lncRNAs) play an important role in malignant cell transformation. Although HPV is widespread, there is currently no well-characterized transcriptomic signature to predict whether cancer will develop in the presence of HPV-associated neoplastic changes in the cervical epithelium. Changes in gene activity in tumors reflect the biological diversity of cellular phenotypes and physiological functions and can serve as important diagnostic markers. We performed a comparative transcriptome analysis of open RNA sequencing data to identify differentially expressed genes among normal tissue, neoplastic epithelium, and cervical cancer. Raw data were preprocessed on the Galaxy platform. Batch effect correction, identification of differentially expressed genes, and gene set enrichment analysis (GSEA) were performed with R packages. Subcellular localization of lncRNAs was analyzed using the Locate-R and iLoc-LncRNA 2.0 web services. In total, 1,572 differentially expressed genes (DEGs) were found in the "cancer vs. control" comparison and 1,260 DEGs in the "cancer vs. neoplasia" comparison, whereas only two genes were differentially expressed in the "neoplasia vs. control" comparison. Searching for genes shared among the most strongly differentially expressed genes across all comparison groups identified an expression signature consisting of the CCL20, CDKN2A, CTCFL, piR-55219, TRH, SLC27A6 and EPHA5 genes. Transcription of CCL20 and CDKN2A increases at the stage of neoplastic epithelial changes and remains elevated in cervical cancer. Validation on an independent microarray dataset showed that the differential expression patterns of CDKN2A and SLC27A6 were conserved in the corresponding between-group comparisons.
Affiliation(s)
- E D Kulaeva
- Southern Federal University, Rostov-on-Don, Russia
- E S Muzlaeva
- Southern Federal University, Rostov-on-Don, Russia
- E V Mashkina
- Southern Federal University, Rostov-on-Don, Russia
25
Zhang ZY, Zhang Z, Ye X, Sakurai T, Lin H. A BERT-based model for the prediction of lncRNA subcellular localization in Homo sapiens. Int J Biol Macromol 2024; 265:130659. [PMID: 38462114 DOI: 10.1016/j.ijbiomac.2024.130659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/19/2024] [Accepted: 03/04/2024] [Indexed: 03/12/2024]
Abstract
Understanding the subcellular localization of lncRNAs is crucial for comprehending their regulatory activities. Conventional detection of lncRNA subcellular location usually relies on in situ detection techniques, which are resource-intensive. Several machine learning-based algorithms have been proposed for lncRNA subcellular location prediction in mammals. However, because lncRNA sequences are poorly conserved, the performance of cross-species models remains unsatisfactory. In this study, we curated a novel dataset containing subcellular location information for lncRNAs in Homo sapiens. Based on the BERT pre-trained language model, we then developed a model for lncRNA subcellular location prediction. Our model achieved a micro-averaged area under the receiver operating characteristic curve (AUROC) of 0.791 on the training set and an AUROC of 0.700 on the testing nucleus set. Additionally, we conducted cross-species validation and motif discovery to further investigate underlying patterns. In summary, our study provides valuable guidance and computational tools for exploring the mechanisms of lncRNA subcellular localization and the dynamic spatial changes of RNA in abnormal physiological states.
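A micro-averaged AUROC pools every (transcript, compartment) prediction into a single binary ranking problem before computing the area under the ROC curve. The sketch below illustrates that definition; it is not the authors' evaluation code. It uses the Mann-Whitney formulation (the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as half), and the helper names are hypothetical.

```python
def auroc(labels, scores):
    """Binary AUROC via the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auroc(label_matrix, score_matrix):
    """Micro-average: flatten the (samples x compartments) matrices, then compute one AUROC."""
    flat_labels = [y for row in label_matrix for y in row]
    flat_scores = [s for row in score_matrix for s in row]
    return auroc(flat_labels, flat_scores)
```

The pairwise loop is O(P x N) in the number of positives and negatives; on real data a rank-based implementation such as scikit-learn's `roc_auc_score(y_true, y_score, average="micro")` is the usual choice.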
Affiliation(s)
- Zhao-Yue Zhang
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Zheng Zhang
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Hao Lin
- Center for Information Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
26
Wang J, Horlacher M, Cheng L, Winther O. DeepLocRNA: an interpretable deep learning model for predicting RNA subcellular localization with domain-specific transfer-learning. Bioinformatics 2024; 40:btae065. [PMID: 38317052 PMCID: PMC10879750 DOI: 10.1093/bioinformatics/btae065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 01/22/2024] [Accepted: 02/01/2024] [Indexed: 02/07/2024]
Abstract
MOTIVATION Accurate prediction of RNA subcellular localization plays an important role in understanding cellular processes and functions. Although post-transcriptional processes are governed by trans-acting RNA binding proteins (RBPs) through interaction with cis-regulatory RNA motifs, current methods do not incorporate RBP-binding information. RESULTS In this article, we propose DeepLocRNA, an interpretable deep-learning model that leverages a pre-trained multi-task RBP-binding prediction model to predict the subcellular localization of RNA molecules via fine-tuning. We constructed DeepLocRNA using a comprehensive dataset covering various RNA types and evaluated it on a held-out dataset. Our model achieved state-of-the-art performance in predicting the subcellular localization of mRNA and miRNA. It also demonstrated strong generalization, performing well on both human and mouse RNA. Additionally, a motif analysis was performed to enhance the interpretability of the model, highlighting signal factors that contributed to the predictions. The proposed model provides general and powerful prediction abilities for different RNA types and species, offering valuable insights into the localization patterns of RNA molecules and contributing to our understanding of cellular processes at the molecular level. A user-friendly web server is available at: https://biolib.com/KU/DeepLocRNA/.
Affiliation(s)
- Jun Wang
- Bioinformatics Centre, Department of Biology, University of Copenhagen, København Ø 2100, Denmark
- Marc Horlacher
- Computational Health Center, Helmholtz Center Munich, Neuherberg 85764, Germany
- Lixin Cheng
- Shenzhen People’s Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
- Ole Winther
- Bioinformatics Centre, Department of Biology, University of Copenhagen, København Ø 2100, Denmark
- Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen 2100, Denmark
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby 2800, Denmark
27
Li B, Feng C, Zhang W, Sun S, Yue D, Zhang X, Yang X. Comprehensive non-coding RNA analysis reveals specific lncRNA/circRNA-miRNA-mRNA regulatory networks in the cotton response to drought stress. Int J Biol Macromol 2023; 253:126558. [PMID: 37659489 DOI: 10.1016/j.ijbiomac.2023.126558] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/06/2023] [Revised: 07/29/2023] [Accepted: 08/20/2023] [Indexed: 09/04/2023]
Abstract
Roots and leaves are essential plant organs for sensing and responding to drought stress. However, comparative knowledge of non-coding RNAs (ncRNAs) in root and leaf tissues in the regulation of the drought response in cotton is limited. Here, we used deep sequencing data from leaf and root tissues of drought-resistant and drought-sensitive cotton varieties to identify miRNAs, lncRNAs and circRNAs. A total of 1531 differentially expressed (DE) ncRNAs were identified, including 77 DE miRNAs, 1393 DE lncRNAs and 61 DE circRNAs. Tissue-specific and variety-specific competing endogenous RNA (ceRNA) networks of DE lncRNA-miRNA-mRNA in response to drought were constructed. Furthermore, the novel drought-responsive lncRNA 1 (DRL1), specifically and differentially expressed in root, was verified to positively affect phenotypes of cotton seedlings under drought stress by competitively binding to miR477b with GhNAC1 and GhSCL3. In addition, we constructed another ceRNA network consisting of 18 DE circRNAs, 26 DE miRNAs and 368 DE mRNAs. Fourteen circRNAs were characterized, and a novel molecular regulatory system, circ125-miR7484b/miR7450b, was proposed under drought stress. Our findings revealed tissue- and variety-specific patterns of ncRNA expression involved in the response to drought stress, and uncovered novel regulatory pathways and potentially effective molecules for the genetic improvement of crop drought resistance.
Affiliation(s)
- Baoqi Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
- Cheng Feng
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
- Wenhao Zhang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
- Simin Sun
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
- Dandan Yue
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
- Xianlong Zhang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
- Xiyan Yang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
28
Zeng M, Wu Y, Li Y, Yin R, Lu C, Duan J, Li M. LncLocFormer: a Transformer-based deep learning model for multi-label lncRNA subcellular localization prediction by using localization-specific attention mechanism. Bioinformatics 2023; 39:btad752. [PMID: 38109668 PMCID: PMC10749772 DOI: 10.1093/bioinformatics/btad752] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/26/2023] [Revised: 11/13/2023] [Accepted: 12/17/2023] [Indexed: 12/20/2023]
Abstract
MOTIVATION There is mounting evidence that the subcellular localization of lncRNAs can provide valuable insights into their biological functions. In real transcriptomes, lncRNAs are often found in multiple subcellular compartments. Furthermore, lncRNAs exhibit distinct localization patterns for different subcellular compartments. Although several computational methods have been developed to predict the subcellular localization of lncRNAs, few of them are designed for lncRNAs that have multiple subcellular localizations, and none of them take motif specificity into consideration. RESULTS In this study, we proposed a novel deep learning model, called LncLocFormer, which uses only lncRNA sequences to predict multi-label lncRNA subcellular localization. LncLocFormer uses eight Transformer blocks to model long-range dependencies within the lncRNA sequence and to share information across the whole sequence. To exploit the relationships between different subcellular localizations and find distinct localization patterns for each of them, LncLocFormer employs a localization-specific attention mechanism. The results demonstrate that LncLocFormer outperforms existing state-of-the-art predictors on the hold-out test set. Furthermore, a motif analysis showed that LncLocFormer can capture known motifs. Ablation studies confirmed the contribution of the localization-specific attention mechanism to the prediction performance. AVAILABILITY AND IMPLEMENTATION The LncLocFormer web server is available at http://csuligroup.com:9000/LncLocFormer. The source code can be obtained from https://github.com/CSUBioGroup/LncLocFormer.
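One common way to realize localization-specific attention in a multi-label model is label-wise attention pooling: each localization label owns a query vector, attends over the per-position sequence representations, and produces its own context vector and sigmoid score. The sketch below illustrates that generic technique in plain Python. Whether LncLocFormer computes its attention exactly this way is an assumption, and all weights here are hypothetical inputs rather than trained parameters.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def label_wise_attention(H, queries, out_w, out_b):
    """
    H:       T x d token representations (e.g. Transformer outputs), as lists of floats
    queries: L x d, one hypothetical query vector per localization label
    out_w:   L x d per-label output weights; out_b: L per-label biases
    Returns (probabilities, attention): one sigmoid probability and one
    length-T attention distribution per label.
    """
    T, d = len(H), len(H[0])
    probs, attention = [], []
    for q, w, b in zip(queries, out_w, out_b):
        # each label scores every position with its own query, then normalizes over positions
        a = softmax([sum(qi * hi for qi, hi in zip(q, h)) for h in H])
        # label-specific context vector: attention-weighted sum of token vectors
        ctx = [sum(a[t] * H[t][j] for t in range(T)) for j in range(d)]
        logit = sum(wi * ci for wi, ci in zip(w, ctx)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))  # independent sigmoid per label (multi-label)
        attention.append(a)
    return probs, attention
```

Because each label gets its own attention distribution, `attention[l]` can be inspected to see which positions drive label `l`, which is what makes per-localization motif inspection possible with this kind of design.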
Affiliation(s)
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yifan Wu
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yiming Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Rui Yin
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32603, United States
- Chengqian Lu
- School of Computer Science, Key Laboratory of Intelligent Computing and Information Processing, Xiangtan University, Xiangtan, Hunan 411105, China
- Junwen Duan
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
29
Fu X, Chen Y, Tian S. DlncRNALoc: A discrete wavelet transform-based model for predicting lncRNA subcellular localization. Math Biosci Eng 2023; 20:20648-20667. [PMID: 38124569 DOI: 10.3934/mbe.2023913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
The prediction of long non-coding RNA (lncRNA) subcellular localization is essential for understanding lncRNA function and involvement in cellular regulation. Traditional biological experimental methods are costly and time-consuming, making computational methods the preferred approach for predicting lncRNA subcellular localization (LSL). However, existing computational methods are limited by the structural characteristics of lncRNAs and the uneven distribution of data across subcellular compartments. We propose a discrete wavelet transform (DWT)-based model for predicting LSL, called DlncRNALoc. We construct a physicochemical property matrix of 2-tuple bases (dinucleotides) from lncRNA sequences and introduce a DWT-based lncRNA feature extraction method. We use the Synthetic Minority Over-sampling Technique (SMOTE) for oversampling and the local Fisher discriminant analysis (LFDA) algorithm to optimize the feature information. The optimized feature vectors are fed into a support vector machine (SVM) to construct the predictive model. DlncRNALoc was evaluated by five-fold cross-validation on three benchmark datasets. Extensive experiments demonstrate the superiority and effectiveness of DlncRNALoc in predicting LSL.
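To make the DWT feature step concrete, here is a minimal sketch under stated assumptions: the sequence is converted to a numeric signal using one value per overlapping dinucleotide (placeholder values, not the paper's physicochemical property matrix), a multi-level orthonormal Haar DWT is applied, and simple statistics of the detail coefficients form a fixed-length feature vector. The wavelet family, number of levels, and summary statistics used by DlncRNALoc are not given in the abstract, and the SMOTE, LFDA and SVM stages are omitted.

```python
import math

# Placeholder values for the 16 dinucleotides (AA=0.0, AC=1.0, ..., TT=15.0).
# These are hypothetical; DlncRNALoc uses real physicochemical properties instead.
DINUC_PROP = {a + b: float(i) for i, (a, b) in enumerate((x, y) for x in "ACGT" for y in "ACGT")}


def encode(seq):
    """Map a sequence to a numeric signal, one value per overlapping dinucleotide."""
    seq = seq.upper().replace("U", "T")
    return [DINUC_PROP[seq[i:i + 2]] for i in range(len(seq) - 1) if seq[i:i + 2] in DINUC_PROP]


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(x) % 2:
        x = x + [x[-1]]  # pad odd-length signals by repeating the last sample
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail


def dwt_features(seq, levels=3):
    """Fixed-length vector: mean, max, min and energy of the detail coefficients per level."""
    x = encode(seq)
    feats = []
    for _ in range(levels):
        if len(x) < 2:
            feats.extend([0.0] * 4)  # signal too short for this level: pad with zeros
            continue
        x, d = haar_step(x)  # the approximation feeds the next level
        feats.extend([sum(d) / len(d), max(d), min(d), sum(v * v for v in d)])
    return feats
```

The orthonormal Haar step preserves signal energy (the squared approximation and detail coefficients sum to the squared input), which is one reason it is a convenient default for a sketch like this.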
Affiliation(s)
- Xiangzheng Fu
- Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, China
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
- Department of Basic Biology, Changsha Medical College, Changsha, Hunan, China
- Yifan Chen
- College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
- Department of Basic Biology, Changsha Medical College, Changsha, Hunan, China
- Sha Tian
- Department of Internal Medicine, College of Integrated Chinese and Western Medicine, Hunan University of Chinese Medicine, Changsha, Hunan, China
30
Wang X, Bi J, Yang C, Li Y, Yang Y, Deng J, Wang L, Gao X, Lin Y, Liu J, Yin G. Long non-coding RNA LOC103222771 promotes infection of porcine reproductive and respiratory syndrome virus in Marc-145 cells by downregulating Claudin-4. Vet Microbiol 2023; 286:109890. [PMID: 37857013 DOI: 10.1016/j.vetmic.2023.109890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 09/12/2023] [Accepted: 10/12/2023] [Indexed: 10/21/2023]
Abstract
Porcine reproductive and respiratory syndrome (PRRS) is an important swine disease caused by infection with porcine reproductive and respiratory syndrome virus (PRRSV), which leads to huge losses in the swine industry. Effective control of PRRS remains challenging. Long non-coding RNAs (lncRNAs) are key regulators of viral infections and antiviral immune responses; therefore, a better understanding of lncRNAs will aid the identification of novel regulators of viral infection and the design of prevention and control strategies for viral infection-related diseases and immune disorders. We demonstrated that PRRSV infection upregulated the expression of lncRNA LOC103222771 in Marc-145 cells and porcine alveolar macrophages (PAMs) and that LOC103222771 is mainly located in the cytoplasm. Knockdown of LOC103222771 inhibited PRRSV infection in Marc-145 cells. RNA-seq analysis and subsequent validation revealed increased expression of Claudin-4 (CLDN4) in Marc-145 cells when LOC103222771 was specifically downregulated, suggesting that LOC103222771 might be an upstream regulator of CLDN4, an important component of tight junctions that establishes the paracellular barrier controlling the flow of molecules in the intercellular space between epithelial cells. We and others have shown that downregulation of CLDN4 can boost PRRSV infection. Collectively, the LOC103222771/CLDN4 signaling axis might be a novel mechanism of PRRSV pathogenesis, implying a potential therapeutic target against PRRSV infection.
Affiliation(s)
- Xinxian Wang
- College of Animal Veterinary Medicine, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Junlong Bi
- College of Animal Veterinary Medicine, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Chao Yang
- College of Animal Veterinary Medicine, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Yongneng Li
- College of Animal Veterinary Medicine, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Ying Yang
- College of Animal Veterinary Medicine, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Junwen Deng
- College of Animal Veterinary Medicine, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Lei Wang
- College of Animal Veterinary Medicine, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Xiaolin Gao
- College of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan 650201, China
- Yingbo Lin
- Department of Oncology-Pathology, Karolinska Institutet, Stockholm 17176, Sweden
- Jianping Liu
- Department of Gastroenterology, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Gefen Yin
- College of Animal Veterinary Medicine, Yunnan Agricultural University, Kunming, Yunnan 650201, China
31
Wang J, Horlacher M, Cheng L, Winther O. RNA trafficking and subcellular localization-a review of mechanisms, experimental and predictive methodologies. Brief Bioinform 2023; 24:bbad249. [PMID: 37466130 PMCID: PMC10516376 DOI: 10.1093/bib/bbad249] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/17/2023] [Revised: 05/30/2023] [Accepted: 06/16/2023] [Indexed: 07/20/2023]
Abstract
RNA localization is essential for regulating spatial translation, where RNAs are trafficked to their target locations via various biological mechanisms. In this review, we discuss RNA localization in the context of molecular mechanisms, experimental techniques and machine learning-based prediction tools. Three main types of molecular mechanisms that control the localization of RNA to distinct cellular compartments are reviewed, including directed transport, protection from mRNA degradation, as well as diffusion and local entrapment. Advances in experimental methods, both image and sequence based, provide substantial data resources, which allow for the design of powerful machine learning models to predict RNA localizations. We review the publicly available predictive tools to serve as a guide for users and inspire developers to build more effective prediction models. Finally, we provide an overview of multimodal learning, which may provide a new avenue for the prediction of RNA localization.
Affiliation(s)
- Jun Wang
- Bioinformatics Centre, Department of Biology, University of Copenhagen, København Ø 2100, Denmark
- Marc Horlacher
- Computational Health Center, Helmholtz Center, Munich, Germany
- Lixin Cheng
- Shenzhen People’s Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
- Ole Winther
- Bioinformatics Centre, Department of Biology, University of Copenhagen, København Ø 2100, Denmark
- Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen 2100, Denmark
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby 2800, Denmark
32
Charoenkwan P, Schaduangrat N, Shoombuatong W. StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens. BMC Bioinformatics 2023; 24:301. [PMID: 37507654 PMCID: PMC10386778 DOI: 10.1186/s12859-023-05421-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/11/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
BACKGROUND The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and for exploiting their potential in anticancer vaccine development. In this context, TTCAs are highly promising. However, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need for a robust model that achieves higher accuracy and precision. RESULTS In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. First, we constructed 156 different baseline models using 12 different feature encoding schemes and 13 popular ML algorithms. Second, these baseline models were trained and used to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined using a feature selection strategy and then used to construct our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and existing methods on the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866. CONCLUSIONS In summary, the proposed stacking ensemble learning-based framework StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server (http://2pmlab.camt.cmu.ac.th/StackTTCA) to maximize user convenience for high-throughput screening of novel TTCAs.
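The stacking idea above can be sketched in a few lines: every baseline model (12 encodings times 13 algorithms gives the 156 models mentioned) maps a sample to a probability, and those probabilities become the meta-feature vector for the final classifier. The sketch also shows how the reported Matthews correlation coefficient is computed from confusion-matrix counts. The function names and the representation of baseline models as plain callables are hypothetical.

```python
import math


def probabilistic_feature_vector(x, baseline_models):
    """Meta-features for one sample: the positive-class probability from each baseline model."""
    return [model(x) for model in baseline_models]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

In general stacking practice, the meta-features used to train the final classifier are taken from out-of-fold (cross-validated) predictions of the baseline models, so that the meta-learner is not fit on predictions the baselines made for their own training data.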
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
33
Fan Y, Xiong H, Sun G. DeepASDPred: a CNN-LSTM-based deep learning method for Autism spectrum disorders risk RNA identification. BMC Bioinformatics 2023; 24:261. [PMID: 37349705 DOI: 10.1186/s12859-023-05378-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/05/2023] [Accepted: 06/06/2023] [Indexed: 06/24/2023]
Abstract
BACKGROUND Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders characterized by difficulty communicating and interacting with others, behavioral difficulties, and atypical information processing in the brain. Genetics has a strong impact on ASD, which is associated with early onset and distinctive signs. Currently, all known ASD risk genes encode proteins, and some de novo mutations disrupting protein-coding genes have been shown to cause ASD. Next-generation sequencing technology enables high-throughput identification of ASD risk RNAs. However, these efforts are time-consuming and expensive, so an efficient computational model for ASD risk gene prediction is needed. RESULTS In this study, we propose DeepASDPred, a deep learning-based predictor of ASD risk RNAs. First, we use k-mers to encode the RNA transcript sequences as features and fuse them with the corresponding gene expression values to construct a feature matrix. After combining the chi-square test and logistic regression to select the best feature subset, we feed the selected features into a binary classification model built from a convolutional neural network and long short-term memory for training and classification. The results of tenfold cross-validation show that our method outperforms state-of-the-art methods. The dataset and source code are freely available at https://github.com/Onebear-X/DeepASDPred. CONCLUSIONS Our experimental results show that DeepASDPred has outstanding performance in identifying ASD risk RNA genes.
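The chi-square part of the feature-selection step can be illustrated as follows. This is a minimal sketch that assumes each k-mer feature is binarized (present if its value is greater than zero) and scored with the 2 x 2 contingency-table chi-square statistic against the binary label. The paper's exact chi-square variant, the subset size, and the logistic-regression stage are not described in the abstract and are not reproduced here; the function names are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Chi-square statistic of a 2 x 2 table.
    a: feature present & positive, b: present & negative,
    c: absent & positive,          d: absent & negative."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def select_top_features(X, y, k):
    """X: samples as lists of non-negative features (e.g. k-mer counts); y: 0/1 labels.
    Returns the indices of the k features with the highest chi-square scores."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = sum(1 for row, lab in zip(X, y) if row[j] > 0 and lab == 1)
        b = sum(1 for row, lab in zip(X, y) if row[j] > 0 and lab == 0)
        c = sum(1 for row, lab in zip(X, y) if row[j] <= 0 and lab == 1)
        d = sum(1 for row, lab in zip(X, y) if row[j] <= 0 and lab == 0)
        scores.append(chi2_2x2(a, b, c, d))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
```

For non-binarized count features, scikit-learn's `sklearn.feature_selection.chi2` scores non-negative features directly and is the more typical off-the-shelf choice.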
Affiliation(s)
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
- Hui Xiong
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
- Guicong Sun
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
34
Bai T, Yan K, Liu B. DAmiRLocGNet: miRNA subcellular localization prediction by combining miRNA-disease associations and graph convolutional networks. Brief Bioinform 2023:bbad212. [PMID: 37332057 DOI: 10.1093/bib/bbad212] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/13/2023] [Revised: 05/17/2023] [Accepted: 05/18/2023] [Indexed: 06/20/2023]
Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators in humans that are involved in regulating various physiological processes by modulating gene expression. The subcellular localization of miRNAs plays a crucial role in the discovery of their biological functions. Although several computational methods based on miRNA functional similarity networks have been presented to identify the subcellular localization of miRNAs, it remains difficult for these approaches to extract informative miRNA functional representations because miRNA-disease associations and disease semantics are insufficiently represented. Currently, there is a substantial body of research on miRNA-disease associations, making it possible to address the issue of insufficient miRNA functional representation. In this work, we establish a novel model, named DAmiRLocGNet, based on a graph convolutional network (GCN) and an autoencoder (AE), for identifying the subcellular localizations of miRNAs. DAmiRLocGNet constructs features from miRNA sequence information, miRNA-disease association information and disease semantic information. The GCN is used to aggregate information from neighboring nodes and to capture the implicit information of network structures from miRNA-disease association information and disease semantic information. The AE is employed to capture sequence semantics from sequence similarity networks. The evaluation demonstrates that DAmiRLocGNet outperforms other competing computational approaches, benefiting from the implicit features captured by the GCN. DAmiRLocGNet has the potential to be applied to identifying the subcellular localization of other non-coding RNAs. Moreover, it can facilitate further investigation into the functional mechanisms underlying miRNA localization. The source code and datasets can be accessed at http://bliulab.net/DAmiRLocGNet.
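The neighbor-aggregation step that a GCN performs can be illustrated with a single propagation layer in the widely used symmetric-normalized form H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where A is the adjacency matrix, I the identity, D the degree matrix of A + I, H the node features and W a weight matrix. This is a generic sketch, not DAmiRLocGNet's implementation: the abstract does not state which normalization the model uses, and the toy graph (whose nodes could stand for miRNAs and diseases linked by associations), the features and the weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # every degree is >= 1 thanks to the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: aggregate normalized neighbor features, project, apply ReLU."""
    aggregated = matmul(normalized_adjacency(adj), h)
    projected = matmul(aggregated, w)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical 3-node graph: node 0 is linked to nodes 1 and 2.
adj = [[0.0, 1.0, 1.0],
       [1.0, 0.0, 0.0],
       [1.0, 0.0, 0.0]]
h = [[1.0, 0.0],  # two input features per node
     [0.0, 1.0],
     [1.0, 1.0]]
w = [[1.0], [1.0]]  # project two features down to one
print(gcn_layer(adj, h, w))
```

The self-loop keeps each node's own features in the aggregate, and the symmetric normalization stops high-degree nodes from dominating their neighbors; stacking several such layers lets information flow across multi-hop association paths.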
Affiliation(s)
- Tao Bai
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- School of Mathematics & Computer Science, Yan'an University, Shaanxi 716000, China
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
35
Li J, Zou Q, Yuan L. A review from biological mapping to computation-based subcellular localization. MOLECULAR THERAPY. NUCLEIC ACIDS 2023; 32:507-521. [PMID: 37215152 PMCID: PMC10192651 DOI: 10.1016/j.omtn.2023.04.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 05/24/2023]
Abstract
Subcellular localization is crucial to the study of viruses and diseases. Specifically, research on protein subcellular localization can help reveal clues about the interactions between viruses and host cells, which can aid in the design of targeted drugs. Research on RNA subcellular localization is significant for human diseases (such as Alzheimer's disease and colon cancer). To date, published reviews have addressed only protein subcellular localization and are now outdated as references, while reviews of RNA subcellular localization are not comprehensive. Therefore, we collated the most up-to-date literature on protein and RNA subcellular localization to help researchers understand how the field has changed. Extensive and complete methods for constructing subcellular localization models are also summarized, which can help readers understand the changing applications of biotechnology and computer science in subcellular localization research and explore how to use biological data to construct improved subcellular localization models. This paper is the first review to cover both protein and RNA subcellular localization. We urge researchers in biology and computational biology to jointly pay attention to the transformation patterns, interrelationships, differences, and causality of protein and RNA subcellular localization.
Affiliation(s)
- Jing Li
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324000, China
- School of Biomedical Sciences, University of Hong Kong, Hong Kong, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324000, China
- Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100 Minjiang Main Road, Quzhou, Zhejiang 324000, China
36
Dieter C, Lemos NE, Girardi E, Ramos DT, Corrêa NRDF, Canani LH, Bauer AC, Assmann TS, Crispim D. The lncRNA MALAT1 is upregulated in urine of type 1 diabetes mellitus patients with diabetic kidney disease. Genet Mol Biol 2023; 46:e20220291. [PMID: 37272835 DOI: 10.1590/1678-4685-gmb-2022-0291] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/11/2022] [Accepted: 01/27/2023] [Indexed: 06/06/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are RNAs longer than 200 nucleotides that do not encode proteins and are involved in regulating gene expression. LncRNAs have key roles in many physiological and pathological processes and, consequently, they have been associated with several human diseases, including chronic diabetic complications such as diabetic kidney disease (DKD). In this context, some studies have identified dysregulation of the lncRNAs MALAT1 and TUG1 in patients with DKD; nevertheless, the available data are still contradictory. Thus, the objective of this study was to compare MALAT1 and TUG1 expression in the urine of patients with type 1 diabetes mellitus (T1DM) categorized according to the presence of DKD. This study comprised 18 T1DM patients with DKD (cases) and 9 patients with long-duration T1DM without DKD (controls). MALAT1 and TUG1 expression was analyzed using qPCR. Bioinformatics analyses were performed to identify the target genes of both lncRNAs and the signaling pathways under their regulation. The lncRNA MALAT1 was upregulated in the urine of T1DM patients with DKD compared with T1DM controls (P = 0.007). The expression of lncRNA TUG1 did not differ between groups (P = 0.815). Bioinformatics analysis showed that these two lncRNAs take part in metabolism-related pathways. The present study shows that the lncRNA MALAT1 is upregulated in T1DM patients presenting with DKD.
Affiliation(s)
- Cristine Dieter
- Hospital de Clínicas de Porto Alegre, Serviço de Endocrinologia e Metabologia, Porto Alegre, RS, Brazil
- Universidade Federal do Rio Grande do Sul, Faculdade de Medicina, Departamento de Medicina Interna, Programa de Pós-Graduação em Ciências Médicas: Endocrinologia, Porto Alegre, RS, Brazil
- Natália Emerim Lemos
- Hospital de Clínicas de Porto Alegre, Serviço de Endocrinologia e Metabologia, Porto Alegre, RS, Brazil
- Universidade de São Paulo, Instituto de Química, Departamento de Bioquímica, São Paulo, SP, Brazil
- Eliandra Girardi
- Hospital de Clínicas de Porto Alegre, Serviço de Endocrinologia e Metabologia, Porto Alegre, RS, Brazil
- Denise Taurino Ramos
- Hospital de Clínicas de Porto Alegre, Serviço de Endocrinologia e Metabologia, Porto Alegre, RS, Brazil
- Luís Henrique Canani
- Hospital de Clínicas de Porto Alegre, Serviço de Endocrinologia e Metabologia, Porto Alegre, RS, Brazil
- Universidade Federal do Rio Grande do Sul, Faculdade de Medicina, Departamento de Medicina Interna, Programa de Pós-Graduação em Ciências Médicas: Endocrinologia, Porto Alegre, RS, Brazil
- Andrea Carla Bauer
- Hospital de Clínicas de Porto Alegre, Serviço de Endocrinologia e Metabologia, Porto Alegre, RS, Brazil
- Universidade Federal do Rio Grande do Sul, Faculdade de Medicina, Departamento de Medicina Interna, Programa de Pós-Graduação em Ciências Médicas: Endocrinologia, Porto Alegre, RS, Brazil
- Hospital de Clínicas de Porto Alegre, Serviço de Nefrologia, Porto Alegre, RS, Brazil
- Taís Silveira Assmann
- Hospital de Clínicas de Porto Alegre, Serviço de Endocrinologia e Metabologia, Porto Alegre, RS, Brazil
- Universidade Federal do Rio Grande do Sul, Faculdade de Medicina, Departamento de Medicina Interna, Programa de Pós-Graduação em Ciências Médicas: Endocrinologia, Porto Alegre, RS, Brazil
- Daisy Crispim
- Hospital de Clínicas de Porto Alegre, Serviço de Endocrinologia e Metabologia, Porto Alegre, RS, Brazil
- Universidade Federal do Rio Grande do Sul, Faculdade de Medicina, Departamento de Medicina Interna, Programa de Pós-Graduação em Ciências Médicas: Endocrinologia, Porto Alegre, RS, Brazil
37
Palos K, Yu L, Railey CE, Nelson Dittrich AC, Nelson ADL. Linking discoveries, mechanisms, and technologies to develop a clearer perspective on plant long noncoding RNAs. THE PLANT CELL 2023; 35:1762-1786. [PMID: 36738093 PMCID: PMC10226578 DOI: 10.1093/plcell/koad027] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 09/02/2022] [Revised: 12/19/2022] [Accepted: 12/22/2022] [Indexed: 05/30/2023]
Abstract
Long noncoding RNAs (lncRNAs) are a large and diverse class of genes in eukaryotic genomes that contribute to a variety of regulatory processes. Functionally characterized lncRNAs play critical roles in plants, ranging from regulating flowering to controlling lateral root formation. However, findings from the past decade have revealed that thousands of lncRNAs are present in plant transcriptomes, and characterization has lagged far behind identification. In this setting, distinguishing function from noise is challenging. Nevertheless, the plant community has been at the forefront of discovery in lncRNA biology, providing many functional and mechanistic insights that have increased our understanding of this gene class. In this review, we examine the key discoveries and insights made in plant lncRNA biology over the past two and a half decades. We describe how discoveries made in the pregenomics era have informed efforts to identify and functionally characterize lncRNAs in the subsequent decades. We provide an overview of the functional archetypes into which characterized plant lncRNAs fit and speculate on new avenues of research that may uncover yet more archetypes. Finally, this review discusses the challenges facing the field and some exciting new molecular and computational approaches that may help inform lncRNA comparative and functional analyses.
Affiliation(s)
- Kyle Palos
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Li’ang Yu
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Caylyn E Railey
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- Plant Biology Graduate Field, Cornell University, Ithaca, NY 14853, USA
38
Liu Z, Guo T, Yin Z, Zeng Y, Liu H, Yin H. Functional inference of long non-coding RNAs through exploration of highly conserved regions. Front Genet 2023; 14:1177259. [PMID: 37260771 PMCID: PMC10229068 DOI: 10.3389/fgene.2023.1177259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 04/28/2023] [Indexed: 06/02/2023] Open
Abstract
Background: Long non-coding RNAs (lncRNAs), which are generally less functionally characterized and less annotated, evolve more rapidly than mRNAs and show substantially less sequence conservation than protein-coding genes across divergent species. It is commonly assumed that functional inference can focus on evolutionarily conserved lncRNAs, as these are the most likely to be functional. In past decades, substantial progress has been made in studying the evolutionary conservation of non-coding genomic regions from multiple perspectives. However, our understanding of their conservation, and of the functions associated with sequence conservation in relation to phenotypic variability or disorders, remains incomplete. Results: Accordingly, we defined highly conserved regions (HCRs) to assess sequence conservation among lncRNAs and systematically profiled homologous lncRNA clusters in humans and mice based on the detection of HCRs. Moreover, based on homolog clustering, we explored potential functional inference via HCRs on representative lncRNAs. For lncRNA XACT, we investigated the potential functional competition between XACT and lncRNA XIST through the recruitment of miRNA-29a, thereby regulating downstream target genes. For lncRNA LINC00461, we examined the interaction between LINC00461 and SND1, which may be perturbed during the progression of glioma. In addition, we have constructed a website with a user-friendly interface for searching, analyzing, and downloading the homologous clusters of humans and mice.
Conclusion: Collectively, homolog clustering via the definition and detection of highly conserved regions in lncRNAs, together with the functional exploration of representative sequences in our research, provides new evidence for the potential functions of lncRNAs. Our results on the roles of lncRNAs may provide a new theoretical basis and candidate diagnostic indicators for tumors.
Affiliation(s)
- Zhongpeng Liu
- Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources, College of Tropical Crops, Hainan University, Haikou, China
- Tianbin Guo
- Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources, College of Tropical Crops, Hainan University, Haikou, China
- Zhuoda Yin
- TJ-YZ School of Network Science, Haikou University of Economics, Haikou, China
- Yanluo Zeng
- Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources, College of Tropical Crops, Hainan University, Haikou, China
- Haiwen Liu
- Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources, College of Tropical Crops, Hainan University, Haikou, China
- Hongyan Yin
- Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources, College of Tropical Crops, Hainan University, Haikou, China
39
Kumar R, Mondal R, Lahiri T, Pal MK. Application of sequence semantic and integrated cellular geography approach to study alternative biogenesis of exonic circular RNA. BMC Bioinformatics 2023; 24:148. [PMID: 37069509 PMCID: PMC10108499 DOI: 10.1186/s12859-023-05279-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 04/09/2023] [Indexed: 04/19/2023] Open
Abstract
BACKGROUND The concurrent existence of lncRNA and circular RNA in both the nucleus and the cytosol of a cell, in different proportions, is well reported. Previous studies showed that circular RNAs are synthesized in the nucleus and then transported across the nuclear membrane, and that their export is primarily determined by their length. lncRNAs primarily originate through inefficient splicing and seem to use NXF1 for cytoplasmic export. However, it is not clear whether circularization of lncRNA happens only in the nucleus or also occurs in the cytoplasm. Studies indicate that circular RNAs arise when the splicing apparatus undergoes back-splicing. Minor spliceosome (U12-type)-mediated splicing occurs in the cytoplasm and is responsible for splicing 0.5% of introns in human cells. Therefore, the possibility of circular RNA biogenesis mediated by the minor spliceosome in the cytoplasm cannot be ruled out. Secondly, information on genes transcribing both circular RNAs and lncRNAs, along with the total number of RBP binding sites for both of these RNA types, is extractable from databases. This study showed how these apparently unconnected reports could be put together to build a model for exploring the biogenesis of circular RNA. RESULTS As a result of this study, a model was built under the premise that sequences with special semantics were molecular precursors in the biogenesis of circular RNA, which occurred through the catalytic role of some specific RBPs. The model outcome was further strengthened by the fulfillment of three logical lemmas, which were extracted and assimilated in this work using a novel data analytic approach, Integrated Cellular Geography. The results of the study were in good agreement with the proposed model. Furthermore, this study also indicated that biogenesis of circular RNA was a post-transcriptional event.
CONCLUSIONS Overall, this study provides a novel systems biology-based model, under the paradigm of Integrated Cellular Geography, that can assimilate independently performed experimental results and data published by researchers worldwide on RNA biology to provide important information on the biogenesis of circular RNAs, considering lncRNAs as precursor molecules. This study also suggests possible RBP-mediated circularization of RNA in the cytoplasm through back-splicing by the minor spliceosome.
Affiliation(s)
- Rajnish Kumar
- Department of Pathology and Laboratory Medicine, Medical Center, University of Kansas, Kansas City, 66160, USA
- Rajkrishna Mondal
- Department of Biotechnology, Nagaland University, Dimapur, Nagaland, 797112, India
- Tapobrata Lahiri
- Room No. 4302, Department of Applied Sciences, Computer Centre - II, Indian Institute of Information Technology-Allahabad, Allahabad, 211015, India
- Manoj Kumar Pal
- Faculty of Engineering and Technology, United University Prayagraj, Prayagraj, UP, 211012, India
40
Qian M, Xiao S, Yang Y, Yu F, Wen J, Lu L, Wang H. Screening and identification of cyprinid herpesvirus 2 (CyHV-2) ORF55-interacting proteins by phage display. Virol J 2023; 20:66. [PMID: 37046316 PMCID: PMC10091560 DOI: 10.1186/s12985-023-02026-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/14/2023] [Accepted: 04/01/2023] [Indexed: 04/14/2023] Open
Abstract
BACKGROUND Cyprinid herpesvirus 2 (CyHV-2) is a pathogenic fish virus belonging to the family Alloherpesviridae. The CyHV-2 gene encoding thymidine kinase (TK) is an important virulence-associated factor. Therefore, we aimed to investigate the biological function of open reading frame 55 (ORF55), which encodes TK, in viral replication. METHODS Purified CyHV-2 ORF55 protein was obtained by prokaryotic expression, and interacting peptides were screened using phage display. Host interacting proteins were then predicted and validated. RESULTS ORF55 was efficiently expressed in the prokaryotic expression system. Protein-peptide interaction prediction and a dot-blot overlay assay confirmed that the peptides identified by phage display could interact with the ORF55 protein. Comparing the peptides against the National Center for Biotechnology Information database revealed four potential interacting proteins. Reverse transcription quantitative PCR showed high expression of an actin-binding Rho-activating protein in the later stages of infection in virus-infected cells, and molecular docking, cell transfection and coimmunoprecipitation experiments confirmed that it interacts with the ORF55 protein. CONCLUSION During viral infection, the ORF55 protein exerts its biological function through interactions with host proteins. The specific mechanisms remain to be explored.
Affiliation(s)
- Min Qian
- National Pathogen Collection Center for Aquatic Animals, Shanghai Ocean University, Shanghai, 201306, China
- National Demonstration Center for Experimental Fisheries Science Education, Shanghai Ocean University, Shanghai, 201306, China
- Simin Xiao
- National Pathogen Collection Center for Aquatic Animals, Shanghai Ocean University, Shanghai, 201306, China
- Key Laboratory of Freshwater Aquatic Genetic Resources, Ministry of Agriculture, Shanghai Ocean University, Shanghai, 201306, China
- Yapeng Yang
- National Pathogen Collection Center for Aquatic Animals, Shanghai Ocean University, Shanghai, 201306, China
- Key Laboratory of Freshwater Aquatic Genetic Resources, Ministry of Agriculture, Shanghai Ocean University, Shanghai, 201306, China
- Fei Yu
- Institute of Marine Biology, College of Oceanography, Hohai University, Nanjing, 210098, China
- Jinxuan Wen
- National Pathogen Collection Center for Aquatic Animals, Shanghai Ocean University, Shanghai, 201306, China
- Key Laboratory of Freshwater Aquatic Genetic Resources, Ministry of Agriculture, Shanghai Ocean University, Shanghai, 201306, China
- Liqun Lu
- National Pathogen Collection Center for Aquatic Animals, Shanghai Ocean University, Shanghai, 201306, China
- Key Laboratory of Freshwater Aquatic Genetic Resources, Ministry of Agriculture, Shanghai Ocean University, Shanghai, 201306, China
- Hao Wang
- National Pathogen Collection Center for Aquatic Animals, Shanghai Ocean University, Shanghai, 201306, China
- National Demonstration Center for Experimental Fisheries Science Education, Shanghai Ocean University, Shanghai, 201306, China
- Key Laboratory of Freshwater Aquatic Genetic Resources, Ministry of Agriculture, Shanghai Ocean University, Shanghai, 201306, China
41
Lone IM, Midlej K, Nun NB, Iraqi FA. Intestinal cancer development in response to oral infection with high-fat diet-induced Type 2 diabetes (T2D) in collaborative cross mice under different host genetic background effects. Mamm Genome 2023; 34:56-75. [PMID: 36757430 DOI: 10.1007/s00335-023-09979-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 01/20/2023] [Indexed: 02/10/2023]
Abstract
Type 2 diabetes (T2D) is a metabolic disease characterized by an imbalance in blood glucose concentration. A substantial body of current research shows an association between T2D and intestinal cancer development. A high-fat diet (HFD) plays a part in the development of T2D, intestinal cancer and infectious diseases through many biological mechanisms, including but not limited to inflammation. Understanding the systems genetics of the multimorbidity of these diseases will provide important knowledge and a platform for dissecting their complexity. In this study, we also used machine learning (ML) models to explore further aspects of diabetes mellitus. The ultimate aim of this project is to study the genetic factors underlying T2D development associated with intestinal cancer in response to HFD consumption and oral coinfection, jointly or separately, on the same host genetic background. A cohort of 307 mice from eight different collaborative cross (CC) mouse lines in four experimental groups was assessed. The mice were maintained on either HFD or chow diet (CHD) for a 12-week period, and half of each dietary group was coinfected with oral bacteria while the other half was left uninfected. Host response to a glucose load and its clearance were assessed using an intraperitoneal glucose tolerance test (IPGTT) at two time points (weeks 6 and 12) during the experimental period, and the results were then converted to area under the curve (AUC) values. At week 5 of the experiment, mice in groups two and four were coinfected with Porphyromonas gingivalis (Pg) and Fusobacterium nucleatum (Fn) strains three times a week, while the other, uninfected mice served as a control group. At week 12, the mice were killed, the small intestines and colon were extracted, and the polyp counts were assessed; the intestine lengths and sizes were also measured.
Our results show significant variation in polyp number across CC lines, ranging from 2.5 to 12.8 total polyps on average. There was a significant correlation between AUC and intestinal measurements, including polyp counts, length and size. In addition, our results show a significant sex effect on polyp development and glucose tolerance, with males more susceptible to HFD than females, as indicated by higher AUC in the glucose tolerance test. The ML results showed that classification with random forest reached the highest accuracy when all attributes were used. These results provide an excellent platform for understanding the nature of the genes involved in resistance to, and the rate of development of, intestinal cancer and T2D induced by HFD and oral coinfection. Once obtained, such data can be used to predict individual risk of developing these diseases and to establish genetically based strategies for their prevention and treatment.
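Converting IPGTT measurements into AUC values, as described above, is commonly done with the trapezoidal rule. The sketch below assumes that approach; the sampling times (0, 15, 30, 60 and 120 min) and glucose readings are hypothetical, `trapezoid_auc` is an illustrative name, and the abstract does not state whether total or baseline-corrected AUC was used (this computes total AUC).

```python
from typing import Sequence


def trapezoid_auc(times: Sequence[float], values: Sequence[float]) -> float:
    """Total area under a sampled curve using the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired time/value samples")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("sampling times must be strictly increasing")
        # area of the trapezoid between two consecutive samples
        area += dt * (values[i] + values[i - 1]) / 2
    return area


# Hypothetical IPGTT readings for one mouse (minutes, mg/dL).
minutes = [0, 15, 30, 60, 120]
glucose = [100, 300, 250, 200, 120]
print(trapezoid_auc(minutes, glucose))  # 23475.0
```

A larger AUC means blood glucose stays elevated for longer, i.e. poorer glucose tolerance, which is the sense in which higher AUC marks the more HFD-susceptible mice.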
Affiliation(s)
- Iqbal M Lone
- Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv, 69978, Tel-Aviv, Israel
- Kareem Midlej
- Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv, 69978, Tel-Aviv, Israel
- Nadav Ben Nun
- Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv, 69978, Tel-Aviv, Israel
- Fuad A Iraqi
- Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv, 69978, Tel-Aviv, Israel
42
Schaduangrat N, Anuwongcharoen N, Moni MA, Lio' P, Charoenkwan P, Shoombuatong W. StackPR is a new computational approach for large-scale identification of progesterone receptor antagonists using the stacking strategy. Sci Rep 2022; 12:16435. [PMID: 36180453 PMCID: PMC9525257 DOI: 10.1038/s41598-022-20143-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/18/2022] [Accepted: 09/09/2022] [Indexed: 11/24/2022] Open
Abstract
Progesterone receptors (PRs) are implicated in various cancers, since their presence or absence can determine clinical outcomes. Overstimulation by progesterone can facilitate oncogenesis, so modulating it through PR inhibition is urgently needed. To address this issue, a novel stacked ensemble learning approach (termed StackPR) is presented for fast, accurate, and large-scale identification of PR antagonists using only SMILES notation, without the need for 3D structural information. We employed six popular machine learning (ML) algorithms (i.e., logistic regression, partial least squares, k-nearest neighbor, support vector machine, extremely randomized trees, and random forest) coupled with twelve conventional molecular descriptors to create 72 baseline models. Then, a genetic algorithm in conjunction with the self-assessment-report approach was used to select m of the 72 baseline models for developing the final meta-predictor, using the stacking strategy and a tenfold cross-validation test. Experimental results on the independent test dataset show that StackPR achieved strong predictive performance, with an accuracy of 0.966 and a Matthews correlation coefficient (MCC) of 0.925. In addition, analysis based on the SHapley Additive exPlanations algorithm and molecular docking indicates that aliphatic hydrocarbons and nitrogen-containing substructures were the most important features for PR antagonist activity. Finally, we implemented an online webserver for StackPR, which is freely accessible at http://pmlabstack.pythonanywhere.com/StackPR . StackPR is anticipated to be a powerful computational tool for the large-scale identification of unknown PR antagonist candidates for follow-up experimental validation.
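The Matthews correlation coefficient reported above summarizes all four cells of a binary confusion matrix as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction). The helper below is a generic sketch, not StackPR's evaluation code; the labels are hypothetical, and returning 0 when a marginal total is zero is a common convention that the abstract does not specify.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = PR antagonist."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels: 1 = antagonist, 0 = non-antagonist.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(mcc(*confusion_counts(y_true, y_pred)))  # 0.5
```

Unlike plain accuracy, MCC stays informative when the active and inactive classes are imbalanced, which is why it is often reported alongside accuracy for screening models.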
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nuttapat Anuwongcharoen
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Mohammad Ali Moni
- Artificial Intelligence & Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
43
Yang Z, Song Y, Li Y, Mao Y, Du G, Tan B, Zhang H. Integrative analyses of prognosis, tumor immunity, and ceRNA network of the ferroptosis-associated gene FANCD2 in hepatocellular carcinoma. Front Genet 2022; 13:955225. [PMID: 36246623 PMCID: PMC9557971 DOI: 10.3389/fgene.2022.955225] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/28/2022] [Accepted: 09/05/2022] [Indexed: 11/24/2022] Open
Abstract
Extensive evidence has revealed that ferroptosis plays a vital role in the development and progression of hepatocellular carcinoma (HCC). Fanconi anemia complementation group D2 (FANCD2) has been reported to be a ferroptosis-associated gene closely related to tumorigenesis and drug resistance. However, the FANCD2-related immune response in HCC and its mechanisms remain incompletely understood. In the current research, we evaluated the prognostic significance and immune-associated mechanism of FANCD2 using multiple bioinformatics methods and databases. The results demonstrated that FANCD2 was upregulated in 15 of 33 tumor types, and only in HCC was high FANCD2 expression closely correlated with worse clinical outcomes in overall survival (OS) and disease-free survival (DFS) analyses. Moreover, ncRNAs, including the two major types miRNAs and lncRNAs, were closely involved in mediating FANCD2 upregulation in HCC, and a ceRNA network was established by performing various in silico analyses. The DUXAP8-miR-29c-FANCD2 and LINC00511-miR-29c-FANCD2 axes were identified as the most likely ncRNA-associated upstream regulatory axes of FANCD2 in HCC. Finally, FANCD2 expression was confirmed to be positively related to HCC immune cell infiltration, immune checkpoints, and immunophenoscore (IPS) analysis, and gene set enrichment analysis (GSEA) also revealed that this ferroptosis-associated gene was primarily involved in cancer-associated pathways in HCC. In conclusion, our investigations indicate that ncRNA-mediated overexpression of FANCD2 might serve as a promising prognostic and immunotherapeutic target in HCC.
Affiliation(s)
- Zhihao Yang
- Department of Oncology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- Tianjin Key Laboratory of Medical Epigenetics, Key Laboratory of Breast Cancer Prevention and Therapy (Ministry of Education), Department of Biochemistry and Molecular Biology, Tianjin Medical University, Tianjin, China
- Yaoshu Song
- Department of Oncology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- North Sichuan Medical College, Nanchong, China
- Ya Li
- Department of Pathology and Medical Research Center, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Yiming Mao
- Suzhou Kowloon Hospital, Shanghai Jiao Tong University School of Medicine, Suzhou, China
- Guobo Du
- Department of Oncology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- North Sichuan Medical College, Nanchong, China
- Bangxian Tan
- Department of Oncology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- North Sichuan Medical College, Nanchong, China
- *Correspondence: Bangxian Tan; Hongpan Zhang
- Hongpan Zhang
- Department of Oncology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- North Sichuan Medical College, Nanchong, China

44
Arif M, Kabir M, Ahmed S, Khan A, Ge F, Khelifi A, Yu DJ. DeepCPPred: A Deep Learning Framework for the Discrimination of Cell-Penetrating Peptides and Their Uptake Efficiencies. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2749-2759. [PMID: 34347603 DOI: 10.1109/tcbb.2021.3102133] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 06/13/2023]
Abstract
Cell-penetrating peptides (CPPs) are special peptides capable of carrying a variety of bioactive molecules, such as genetic material, short interfering RNAs and nanoparticles, into cells. Research on CPPs has recently attracted substantial interest, and their biological mechanisms have been assessed in the context of safe drug delivery agents and therapeutic applications. Correctly identifying and synthesizing CPPs with traditional biochemical methods is extremely slow, expensive and laborious, particularly given the large volume of unannotated peptide sequences accumulating in the World Bank repository. Hence, a powerful bioinformatics predictor that rapidly identifies CPPs with a high recognition rate is urgently needed. To date, numerous computational methods have been developed for CPP prediction. However, the available machine-learning (ML) tools cannot identify both CPPs and their uptake efficiencies. This study aimed to develop a two-layer deep learning framework named DeepCPPred that identifies CPPs in the first phase and peptide uptake efficiency in the second phase. DeepCPPred first uses four types of descriptors that cover evolutionary, energy estimation, reduced sequence and amino-acid contact information. The extracted features are then optimized with the elastic net algorithm and fed into a cascade deep forest algorithm to build the final CPP model. Using 5-fold cross-validation, the proposed method achieved 99.45 percent overall accuracy on the CPP924 benchmark dataset in the first layer and 95.43 percent accuracy on the CPPSite3 dataset in the second layer. Thus, the proposed tool surpassed all existing state-of-the-art sequence-based CPP approaches.
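The abstract states that DeepCPPred optimizes the extracted features with the elastic net before the cascade deep forest. For reference, the standard elastic-net objective is shown below in its squared-error form, using the parameterization of glmnet and of scikit-learn's `ElasticNet` (where scikit-learn calls λ `alpha` and α `l1_ratio`). The abstract does not state the loss, the mixing weight α, or the penalty strength λ the authors used, so these are assumptions. Here X is the n × p feature matrix, y the target vector and β the coefficient vector; features whose coefficient β_j is driven exactly to zero can be discarded, which is one common way an elastic net acts as a feature selector.

```latex
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^{p}}
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \left( \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right),
\qquad \lambda \ge 0,\; 0 \le \alpha \le 1
```

Setting α = 1 recovers the lasso and α = 0 recovers ridge regression; intermediate values keep the lasso's sparsity while the ridge term stabilizes selection among correlated descriptors.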

45
Wei A, Wang L. Prediction of Synaptically Localized RNAs in Human Neurons Using Developmental Brain Gene Expression Data. Genes (Basel) 2022; 13:1488. [PMID: 36011399 PMCID: PMC9408096 DOI: 10.3390/genes13081488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/16/2022] [Accepted: 08/19/2022] [Indexed: 11/16/2022]
Abstract
In the nervous system, synapses are special and pervasive structures between axonal and dendritic terminals, which facilitate electrical and chemical communications among neurons. Extensive studies have been conducted in mice and rats to explore the RNA pool at synapses and investigate RNA transport, local protein synthesis, and synaptic plasticity. However, owing to the experimental difficulties of studying human synaptic transcriptomes, the full pool of human synaptic RNAs remains largely unclear. We developed a new machine learning method, called PredSynRNA, to predict the synaptic localization of human RNAs. Training instances of dendritically localized RNAs were compiled from previous rodent studies, overcoming the shortage of empirical instances of human synaptic RNAs. Using RNA sequence and gene expression data as features, various models with different learning algorithms were constructed and evaluated. Strikingly, the models using the developmental brain gene expression features achieved superior performance for predicting synaptically localized RNAs. We examined the relevant expression features learned by PredSynRNA and used an independent test dataset to further validate the model performance. PredSynRNA models were then applied to the prediction and prioritization of candidate RNAs localized to human synapses, providing valuable targets for experimental investigations into neuronal mechanisms and brain disorders.
Affiliation(s)
- Anqi Wei
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- Center for Human Genetics, Clemson University, Greenwood, SC 29646, USA
- Liangjiang Wang
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- Center for Human Genetics, Clemson University, Greenwood, SC 29646, USA

46
Jiang L, Yang J, Xu Q, Lv K, Cao Y. Machine learning for the micropeptide encoded by LINC02381 regulates ferroptosis through the glucose transporter SLC2A10 in glioblastoma. BMC Cancer 2022; 22:882. [PMID: 35962317 PMCID: PMC9373536 DOI: 10.1186/s12885-022-09972-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/24/2022] [Accepted: 08/03/2022] [Indexed: 11/10/2022]
Abstract
Glioblastoma (GBM) is the most common primary intracranial tumor of the central nervous system, and resistance to temozolomide is an important reason for the failure of GBM treatment. Our screening showed that Solute Carrier Family 2 Member 10 (SLC2A10) is significantly overexpressed in GBM, is associated with poor prognosis, and is enriched in the NF-E2 p45-related factor 2 (NRF2) signalling pathway, an important defence mechanism against ferroptosis. LINC02381, which is related to SLC2A10, is highly expressed in GBM and localized in the cytoplasm/exosomes, and the micropeptides encoded by LINC02381 are localized in exosomes. The micropeptide encoded by LINC02381 may offer a potential treatment strategy for GBM, but the mechanism underlying its function remains unclear. We therefore put forward the hypothesis: “The micropeptide encoded by LINC02381 regulates ferroptosis through the glucose transporter SLC2A10 in GBM.” This study applied machine learning to the micropeptide to support personalized diagnosis and treatment plans for the precise treatment of GBM, thereby promoting the development of translational medicine. The study aimed to help identify new diagnostic and prognostic biomarkers and to provide a new strategy for experimental scientists designing downstream validation experiments.
Affiliation(s)
- Lan Jiang
- Key Laboratory of Non-Coding RNA Transformation Research of Anhui Higher Education Institution, Yijishan Hospital of Wannan Medical College, Wuhu, China
- Central Laboratory, Yijishan Hospital of Wannan Medical College, Wuhu, China
- Anhui Provincial Clinical Research Center for Critical Respiratory Disease, Wuhu, China
- Jianke Yang
- School of Preclinical Medicine, Wannan Medical College, Wuhu, China
- Qiancheng Xu
- Anhui Provincial Clinical Research Center for Critical Respiratory Disease, Wuhu, China
- Kun Lv
- Key Laboratory of Non-Coding RNA Transformation Research of Anhui Higher Education Institution, Yijishan Hospital of Wannan Medical College, Wuhu, China
- Central Laboratory, Yijishan Hospital of Wannan Medical College, Wuhu, China
- Anhui Provincial Clinical Research Center for Critical Respiratory Disease, Wuhu, China
- Yunpeng Cao
- Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China

47
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Circ-LocNet: A Computational Framework for Circular RNA Sub-Cellular Localization Prediction. Int J Mol Sci 2022; 23:ijms23158221. [PMID: 35897818 PMCID: PMC9329987 DOI: 10.3390/ijms23158221] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/16/2022] [Revised: 07/15/2022] [Accepted: 07/20/2022] [Indexed: 02/04/2023]
Abstract
Circular ribonucleic acids (circRNAs) are novel non-coding RNAs that emanate from alternative splicing of precursor mRNA in reversed order across exons. Despite the abundance of circRNAs in human genes and their involvement in diverse physiological processes, the function of most circRNAs remains a mystery. As with other non-coding RNAs, knowledge of circRNA sub-cellular localization can help clarify the influence of circRNAs on protein synthesis, degradation and destination, their association with different diseases, and their potential for drug development. To date, wet-lab experimental approaches have been used to detect the sub-cellular locations of circular RNAs. These approaches help to elucidate the role of circRNAs as protein scaffolds, RNA-binding protein (RBP) sponges, micro-RNA (miRNA) sponges, parental gene expression modifiers, alternative splicing regulators, and transcription regulators. To complement wet-lab experiments, and considering the progress machine learning approaches have made in determining the sub-cellular localization of other non-coding RNAs, this paper develops a computational framework, Circ-LocNet, to precisely detect circRNA sub-cellular localization. Circ-LocNet performs a comprehensive extrinsic evaluation of 7 sequence descriptors (residue frequency-based, residue order and frequency-based, and physicochemical property-based) using the five most widely used machine learning classifiers. Further, it explores the performance impact of K-order sequence descriptor fusion, which ensembles similar as well as dissimilar genres of statistical representation learning approaches to reap their combined benefits. Considering the diversity of statistical representation learning schemes, it assesses second-order, third-order, and all higher orders up to seventh-order sequence descriptor fusion.
A comprehensive empirical evaluation of Circ-LocNet over a newly developed benchmark dataset under different settings reveals that standalone residue frequency-based sequence descriptors and tree-based classifiers are more suitable for predicting the sub-cellular localization of circular RNAs. Further, the fusion of K-order heterogeneous sequence descriptors combined with tree-based classifiers predicts the sub-cellular localization of circular RNAs most accurately. We anticipate that this study will serve as a rich baseline and drive the development of robust computational methodologies for accurately determining the sub-cellular localization of novel circRNAs.
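Circ-LocNet's best standalone descriptors are residue frequency-based, and its fusion step combines several descriptors. The sketch below illustrates both ideas under stated assumptions: overlapping k-mer frequencies stand in for a residue frequency-based descriptor, and plain concatenation of the k = 1, 2, 3 descriptors stands in for a third-order fusion. The names `kmer_frequency` and `fuse_descriptors` are hypothetical, and the abstract does not say that Circ-LocNet fuses descriptors by concatenation.

```python
from itertools import product
from typing import List, Sequence

# Assumption: RNA alphabet; DNA-style input containing T is converted to U.
ALPHABET = "ACGU"


def kmer_frequency(seq: str, k: int) -> List[float]:
    """Hypothetical residue frequency descriptor: normalized counts of all
    4**k overlapping k-mers, listed in lexicographic order (A < C < G < U)."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # k-mers containing ambiguous bases (e.g. N) are skipped
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def fuse_descriptors(seq: str, ks: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Hypothetical K-order fusion: concatenate one descriptor per k."""
    fused: List[float] = []
    for k in ks:
        fused.extend(kmer_frequency(seq, k))
    return fused


if __name__ == "__main__":
    vec = fuse_descriptors("AUGGCUACGUAGCUAGCUAG")
    print(len(vec))  # 84 = 4 + 16 + 64
```

For k = 1, 2, 3 the fused vector has 4 + 16 + 64 = 84 entries, and each k-mer block sums to 1 whenever the sequence contains at least one unambiguous k-mer. A vector like this could be passed to a tree-based classifier, the family the abstract reports as most suitable.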
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Corresponding author
- Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Imran Malik
- School of Computer Science & Electrical Engineering, National University of Sciences and Technology, Islamabad 44000, Pakistan
- Andreas Dengel
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Sheraz Ahmed
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany

48
Le P, Ahmed N, Yeo GW. Illuminating RNA biology through imaging. Nat Cell Biol 2022; 24:815-824. [PMID: 35697782 PMCID: PMC11132331 DOI: 10.1038/s41556-022-00933-9] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 11/28/2021] [Accepted: 05/06/2022] [Indexed: 12/14/2022]
Abstract
RNA processing plays a central role in accurately transmitting genetic information into functional RNA and protein regulators. To fully appreciate the RNA life-cycle, tools to observe RNA with high spatial and temporal resolution are critical. Here we review recent advances in RNA imaging and highlight how they will propel the field of RNA biology. We discuss current trends in RNA imaging and their potential to elucidate unanswered questions in RNA biology.
Affiliation(s)
- Phuong Le
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Stem Cell Program, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Noorsher Ahmed
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Stem Cell Program, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA
- Gene W Yeo
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Stem Cell Program, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA

49
Wang Y, Zhu X, Yang L, Hu X, He K, Yu C, Jiao S, Chen J, Guo R, Yang S. IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions. Interdiscip Sci 2022; 14:409-420. [PMID: 35192174 DOI: 10.1007/s12539-021-00497-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2021] [Revised: 12/16/2021] [Accepted: 12/20/2021] [Indexed: 06/14/2023]
Abstract
Long non-coding RNAs (lncRNAs) play a crucial role in many cellular processes, such as genetic markers, RNA splicing, signaling, and protein regulation. Because identifying lncRNA localization in the cell through experimental methods is complicated, hard to reproduce, and expensive, we propose a novel method named IDDLncLoc, which adopts an ensemble model to predict lncRNA subcellular localization. In the proposed model, dinucleotide-based auto-cross covariance features, k-mer nucleotide composition features, and composition, transition, and distribution features are introduced to encode a raw RNA sequence into a vector. To screen out reliable features, feature selection based on the binomial distribution and recursive feature elimination is employed. Furthermore, mini-batch oversampling, random sampling, and stacking ensemble strategies are customized to overcome data imbalance in the benchmark dataset. Finally, compared with the latest methods, IDDLncLoc achieves an accuracy of 94.96% on the benchmark dataset, 2.59% higher than the best competing method, further demonstrating its strong performance on lncRNA subcellular localization. In addition, a user-friendly web server is available at http://lncloc.club .
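The abstract does not define "oversampling in mini-batch". One plausible reading, sketched below purely as an assumption, is class-balanced batch construction: each mini-batch draws the same number of examples from every localization class, sampling with replacement so that rare classes are repeated. The helper `balanced_minibatch` is hypothetical and is not taken from the IDDLncLoc code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def balanced_minibatch(labels: Sequence[int], batch_size: int,
                       rng: random.Random) -> List[int]:
    """Hypothetical helper: indices for one class-balanced mini-batch.

    Assumption: every class contributes batch_size // n_classes indices,
    drawn with replacement, so minority classes are oversampled.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    per_class = batch_size // len(by_class)
    if per_class == 0:
        raise ValueError("batch_size must be at least the number of classes")
    batch: List[int] = []
    for label in sorted(by_class):
        # random.Random.choices samples with replacement
        batch.extend(rng.choices(by_class[label], k=per_class))
    rng.shuffle(batch)  # avoid presenting classes in contiguous blocks
    return batch


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 8 + [2] * 2  # toy, heavily imbalanced labels
    batch = balanced_minibatch(labels, batch_size=30, rng=random.Random(0))
    print([sum(labels[i] == c for i in batch) for c in range(3)])  # [10, 10, 10]
```

With 90, 8 and 2 examples in three classes and a batch size of 30, each class contributes exactly 10 indices, so the two-example class is necessarily repeated. Balancing per batch leaves the stored dataset unchanged; whether this matches the authors' scheme is not stated in the abstract.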
Affiliation(s)
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- School of Artificial Intelligence, Jilin University, Changchun, China
- Xiaopeng Zhu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Lili Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Department of Obstetrics, The First Hospital of Jilin University, Changchun, China
- Xuemei Hu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Kai He
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Cuinan Yu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Shaoqing Jiao
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Jiali Chen
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Rui Guo
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Sen Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China

50
Han YC, Xie HZ, Lu B, Xiang RL, Li JY, Qian H, Zhang SY. Effect of berberine on global modulation of lncRNAs and mRNAs expression profiles in patients with stable coronary heart disease. BMC Genomics 2022; 23:400. [PMID: 35619068 PMCID: PMC9134690 DOI: 10.1186/s12864-022-08641-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/08/2021] [Accepted: 05/16/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Berberine (BBR) is an isoquinoline alkaloid found in Berberis species that has been shown to have protective effects in cardiovascular diseases. Here, we investigated the regulatory function of long noncoding RNAs (lncRNAs) during BBR treatment of stable coronary heart disease (CHD). We performed microarray analyses to identify differentially expressed (DE) lncRNAs and mRNAs between whole blood samples from 5 patients with stable CHD taking BBR and 5 volunteers not taking BBR. DE lncRNAs and mRNAs were validated by quantitative real-time PCR. RESULTS A total of 1703 DE lncRNAs and 912 DE mRNAs were identified. Kyoto Encyclopedia of Genes and Genomes pathway analysis indicated that DE mRNAs might be associated with the mammalian target of rapamycin and mitogen-activated protein kinase pathways, which may be involved in the healing process after CHD. To study the relationship between mRNAs encoding transcription factors (DNA damage inducible transcript 3, sal-like protein 4 and estrogen receptor alpha gene) and CHD-related DE mRNAs, we performed protein-protein interaction analysis on their corresponding proteins. The AKT and apoptosis pathways were significantly enriched in the protein-protein interaction network. BBR may affect downstream apoptosis pathways through DNA damage inducible transcript 3, sal-like protein 4 and estrogen receptor alpha gene. Growth arrest-specific transcript 5 might regulate CHD-related mRNAs through a competing endogenous RNA mechanism and may be a downstream target gene regulated by BBR. Through quantitative real-time PCR validation, we identified 8 DE lncRNAs that may be related to CHD. We performed coding and non-coding co-expression and competing endogenous RNA mechanism analyses of these 8 DE lncRNAs and CHD-related DE mRNAs, and predicted their subcellular localization and N6-methyladenosine modification sites.
CONCLUSION Our research found that BBR may affect the mammalian target of rapamycin, mitogen-activated protein kinase and apoptosis pathways, as well as growth arrest-specific transcript 5, in the process of CHD. These pathways may be involved in the healing process after CHD. Our research might provide novel insights for functional research on BBR.
Affiliation(s)
- Ye-Chen Han
- Department of Cardiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No.1 Shuaifuyuan, Dongcheng District, Beijing, 100730, China
- Hong-Zhi Xie
- Department of Cardiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No.1 Shuaifuyuan, Dongcheng District, Beijing, 100730, China
- Bo Lu
- Department of Gastroenterology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
- Ruo-Lan Xiang
- Department of Physiology and Pathophysiology, Peking University School of Basic Medical Sciences, Beijing, 100191, China
- Jing-Yi Li
- Department of Cardiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No.1 Shuaifuyuan, Dongcheng District, Beijing, 100730, China
- Hao Qian
- Department of Cardiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No.1 Shuaifuyuan, Dongcheng District, Beijing, 100730, China
- Shu-Yang Zhang
- Department of Cardiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No.1 Shuaifuyuan, Dongcheng District, Beijing, 100730, China