1
|
Zhang X, Ma H, Wang S, Wu H, Jiang Y, Liu Q. NPI-HGNN: A Heterogeneous Graph Neural Network-Based Approach for Predicting ncRNA-Protein Interactions. Interdiscip Sci 2025:10.1007/s12539-025-00689-4. [PMID: 39982679 DOI: 10.1007/s12539-025-00689-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 01/07/2025] [Accepted: 01/09/2025] [Indexed: 02/22/2025]
Abstract
Accurate identification of ncRNA-protein interactions (NPIs) is critical for understanding various cellular activities and biological functions of ncRNAs and proteins. Many sequence- and/or structure- and graph-based computational approaches have been developed to identify NPIs from large-scale ncRNA and protein data in a high-throughput manner. However, these approaches often ignore either the topological information in NPIs or the influence of other molecular networks on NPI prediction. In this work, we propose NPI-HGNN, an end-to-end graph neural network (GNN)-based approach for the identification of NPIs from a large heterogeneous network consisting of the ncRNA-protein interaction network, the ncRNA-ncRNA similarity network, and the protein-protein interaction network. To our knowledge, NPI-HGNN is the first GNN-based predictor that integrates related heterogeneous networks for NPI prediction. Experiments on five benchmark datasets demonstrate that NPI-HGNN outperformed several state-of-the-art sequence- and/or structure- and graph-based predictors. In addition, we showcased the predictive power of NPI-HGNN by identifying 12 interacting ncRNAs of the pre-mRNA 3' end processing protein, which indicates the effectiveness of the proposed model. The source code of NPI-HGNN is freely available for academic purposes at https://github.com/zhangxin11111/NPI-HGNN .
Affiliation(s)
- Xin Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Haofeng Ma
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Sizhe Wang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Hao Wu
  - School of Software, Shandong University, Jinan, 250100, China
- Yu Jiang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
  - Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, 712100, China
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Engineering Research Center of Agricultural Information Intelligent Perception and Analysis, Northwest A&F University, Yangling, 712100, China
|
2
|
Zhang X, Liu M, Li Z, Zhuo L, Fu X, Zou Q. Fusion of multi-source relationships and topology to infer lncRNA-protein interactions. Mol Ther Nucleic Acids 2024; 35:102187. [PMID: 38706631 PMCID: PMC11066462 DOI: 10.1016/j.omtn.2024.102187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 05/07/2024]
Abstract
Long non-coding RNAs (lncRNAs) are important factors involved in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying lncRNA's functions and pathogenic mechanisms. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have seen rapid development, excelling in tasks like link prediction and node classification. We employed GAE technology for LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies and thereby achieving satisfactory outcomes. This represents the first known integration of path masking and degree regression strategies into the GAE framework for potential LPI inference. The effectiveness of our FMSRT-LPI model primarily relies on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with LPN's topological data. Second, the implemented masking strategy efficiently identifies LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representation. Fourth, the PolyLoss function we introduced is more appropriate for LPI prediction tasks. The results on multiple public datasets further demonstrate our model's potential in LPI prediction.
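As a rough illustration of the PolyLoss idea mentioned in this abstract, the sketch below implements the commonly used Poly-1 form, which adds one polynomial term to binary cross-entropy. The abstract does not state which PolyLoss variant or coefficient FMSRT-LPI uses, so the function names and the `epsilon` default here are assumptions for illustration only.

```python
import math

def poly1_bce(p, y, epsilon=1.0):
    """Poly-1 loss for a single binary link prediction (illustrative sketch).

    p: predicted probability that the lncRNA-protein link exists (0 < p < 1)
    y: true label, 1 for an interacting pair and 0 otherwise
    epsilon: weight of the extra polynomial term (assumed value, not from the paper)
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    cross_entropy = -math.log(p_t)          # standard binary cross-entropy
    return cross_entropy + epsilon * (1.0 - p_t)

def mean_poly1(probs, labels, epsilon=1.0):
    """Average the Poly-1 loss over a batch of predictions."""
    return sum(poly1_bce(p, y, epsilon) for p, y in zip(probs, labels)) / len(probs)
```

With `epsilon=0` the function reduces to plain cross-entropy, which makes it easy to compare the two losses on the same predictions.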
Affiliation(s)
- Xinyu Zhang
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Mingzhe Liu
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Zhen Li
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou 510000, China
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
|
3
|
Sun DZ, Sun ZL, Liu M, Yong SH. LPI-SKMSC: Predicting LncRNA-Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering. Interdiscip Sci 2024; 16:378-391. [PMID: 38206558 DOI: 10.1007/s12539-023-00598-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 11/25/2023] [Accepted: 12/05/2023] [Indexed: 01/12/2024]
Abstract
Long noncoding RNAs (lncRNAs) have significant regulatory roles in gene expression. Interactions with proteins are one of the ways lncRNAs play their roles. Since experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. In the LPIs prediction problem, there commonly exists the imbalance in the distribution of positive and negative samples. However, there are few existing methods that give specific consideration to this problem. In this paper, we proposed a new clustering-based LPIs prediction method using segmented k-mer frequencies and multi-space clustering (LPI-SKMSC). It was dedicated to handling the imbalance of positive and negative samples. We constructed segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Then, the multi-space clustering was applied to LPI-SKMSC. The convolutional neural network (CNN)-based encoders were used to map different features of a sample to different spaces. It used multiple spaces to jointly constrain the classification of samples. Finally, the distances between the output features of the encoder and the cluster center in each space were calculated. The sum of distances in all spaces was compared with the cluster radius to predict the LPIs. We performed cross-validation on 3 public datasets and LPI-SKMSC showed the best performance compared to other existing methods. Experimental results showed that LPI-SKMSC could predict LPIs more effectively when faced with imbalanced positive and negative samples. In addition, we illustrated that our model was better at uncovering potential lncRNA-protein interaction pairs.
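One plausible reading of "segmented k-mer frequencies" is sketched below: k-mer frequencies are computed once over the whole sequence (global features) and once per equal-length segment (local features), then concatenated. The abstract does not give the exact segmentation scheme, so the function names, the equal-length split, and the default parameters are assumptions.

```python
from itertools import product

def kmer_frequencies(seq, k, alphabet="ACGU"):
    """Normalized k-mer frequency vector of seq, in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:                  # skip k-mers with unknown symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def segmented_kmer_features(seq, k=2, n_segments=2, alphabet="ACGU"):
    """Global k-mer frequencies followed by per-segment (local) frequencies."""
    features = kmer_frequencies(seq, k, alphabet)           # global view
    seg_len = len(seq) // n_segments
    for s in range(n_segments):
        start = s * seg_len
        # the last segment absorbs any remainder of the sequence
        end = len(seq) if s == n_segments - 1 else start + seg_len
        features += kmer_frequencies(seq[start:end], k, alphabet)
    return features
```

For an RNA sequence with `k=2` and `n_segments=2`, the vector has 16 global entries plus 16 entries per segment, so 48 values in total. For protein sequences, a different `alphabet` would be passed.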
Affiliation(s)
- Dian-Zheng Sun
  - School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
- Zhan-Li Sun
  - School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
- Mengya Liu
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, China
- Shuang-Hao Yong
  - School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
|
4
|
Shen C, Mao D, Tang J, Liao Z, Chen S. Prediction of LncRNA-Protein Interactions Based on Kernel Combinations and Graph Convolutional Networks. IEEE J Biomed Health Inform 2024; 28:1937-1948. [PMID: 37327093 DOI: 10.1109/jbhi.2023.3286917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
The complexes of long non-coding RNAs bound to proteins can be involved in regulating life activities at various stages of organisms. However, in the face of the growing number of lncRNAs and proteins, verifying lncRNA-protein interactions (LPIs) through traditional biological experiments is time-consuming and laborious. With the improvement of computing power, LPI prediction has therefore met new development opportunities. Building on state-of-the-art works, this article proposes a framework called LncRNA-Protein Interactions based on Kernel Combinations and Graph Convolutional Networks (LPI-KCGCN). We first construct kernel matrices for both lncRNAs and proteins from sequence features, sequence similarity features, expression features, and gene ontology. We then reconstruct the existing kernel matrices as the input of the next step. Combined with known LPIs, the reconstructed similarity matrices, which can be used as features of the topology map of the LPI network, are exploited to extract potential representations in the lncRNA and protein spaces using a two-layer graph convolutional network. The predicted matrix is finally obtained by training the network to produce scoring matrices w.r.t. lncRNAs and proteins. Different LPI-KCGCN variants are ensembled to derive the final prediction results and are tested on balanced and unbalanced datasets. The 5-fold cross-validation shows that the optimal feature information combination on a dataset with 15.5% positive samples has an AUC value of 0.9714 and an AUPR value of 0.9216. On another highly unbalanced dataset with only 5% positive samples, LPI-KCGCN also outperformed state-of-the-art works, achieving an AUC value of 0.9907 and an AUPR value of 0.9267.
|
5
|
Yan J, Qu W, Li X, Wang R, Tan J. GATLGEMF: A graph attention model with line graph embedding multi-complex features for ncRNA-protein interactions prediction. Comput Biol Chem 2024; 108:108000. [PMID: 38070456 DOI: 10.1016/j.compbiolchem.2023.108000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 11/27/2023] [Accepted: 12/03/2023] [Indexed: 01/22/2024]
Abstract
Non-coding RNA (ncRNA) plays an important role in many fundamental biological processes, and it may be closely associated with many complex human diseases. NcRNAs exert their functions by interacting with proteins. Therefore, identifying novel ncRNA-protein interactions (NPIs) is important for understanding the mechanisms by which ncRNAs act. Computational approaches have the advantage of low cost and high efficiency. Machine learning and deep learning have achieved great success by making full use of sequence and structure information. Graph neural networks (GNNs) are deep learning algorithms for complex network link prediction that can extract and discover features in graph topology data. In this study, we propose a new computational model called GATLGEMF. We used a line graph transformation strategy to obtain the most valuable feature information and input this feature information into the attention network to predict NPIs. The results on four benchmark datasets show that our method achieves superior performance. We further compare GATLGEMF with state-of-the-art existing methods to evaluate the model performance. GATLGEMF shows the best performance, with an area under the curve (AUC) of 92.41% and 98.93% on the RPI2241 and NPInter v2.0 datasets, respectively. In addition, a case study shows that GATLGEMF has the ability to predict new interactions based on known interactions. The source code is available at https://github.com/JianjunTan-Beijing/GATLGEMF.
Affiliation(s)
- Jing Yan
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Wenyan Qu
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Xiaoyi Li
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Ruobing Wang
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Jianjun Tan
  - Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
|
6
|
Huiwen J, Kai S. Prediction of LncRNA-protein Interactions Using Auto-Encoder, SE-ResNet Models and Transfer Learning. Microrna 2024; 13:155-165. [PMID: 38591194 DOI: 10.2174/0122115366288068240322064431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 02/26/2024] [Accepted: 03/09/2024] [Indexed: 04/10/2024]
Abstract
BACKGROUND Long non-coding RNA (lncRNA) plays a crucial role in various biological processes, and mutations or imbalances of lncRNAs can lead to several diseases, including cancer, Prader-Willi syndrome, autism, Alzheimer's disease, cartilage-hair hypoplasia, and hearing loss. Understanding lncRNA-protein interactions (LPIs) is vital for elucidating basic cellular processes, human diseases, viral replication, transcription, and plant pathogen resistance. Despite the development of several LPI calculation methods, predicting LPI remains challenging, with the selection of variables and deep learning structure being the focus of LPI research. METHODS We propose a deep learning framework called AR-LPI, which extracts sequence and secondary structure features of proteins and lncRNAs. The framework utilizes an auto-encoder for feature extraction and employs SE-ResNet for prediction. Additionally, we apply transfer learning to the deep neural network SE-ResNet for predicting small-sample datasets. RESULTS Through comprehensive experimental comparison, we demonstrate that the AR-LPI architecture performs better in LPI prediction. Specifically, the accuracy of AR-LPI increases by 2.86% to 94.52%, while the F-value of AR-LPI increases by 2.71% to 94.73%. CONCLUSION Our experimental results show that the overall performance of AR-LPI is better than that of other LPI prediction tools.
Affiliation(s)
- Jiang Huiwen
  - School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong, China
- Song Kai
  - School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong, China
|
7
|
Xie M, Xie R, Wang H. LPI-IBWA: Predicting lncRNA-protein interactions based on an improved Bi-Random walk algorithm. Methods 2023; 220:98-105. [PMID: 37972912 DOI: 10.1016/j.ymeth.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 10/14/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023] Open
Abstract
Many studies have shown that long-chain noncoding RNAs (lncRNAs) are involved in a variety of biological processes such as post-transcriptional gene regulation, splicing, and translation by combining with corresponding proteins. Predicting lncRNA-protein interactions is an effective approach to infer the functions of lncRNAs. The paper proposes a new computational model named LPI-IBWA. At first, LPI-IBWA uses similarity kernel fusion (SKF) to integrate various types of biological information to construct lncRNA and protein similarity networks. Then, a bounded matrix completion model and a weighted k-nearest known neighbors algorithm are utilized to update the initial sparse lncRNA-protein interaction matrix. Based on the updated lncRNA-protein interaction matrix, the lncRNA similarity network and the protein similarity network are integrated into a heterogeneous network. Finally, an improved Bi-Random walk algorithm is used to predict novel latent lncRNA-protein interactions. 5-fold cross-validation experiments on a benchmark dataset showed that the AUC and AUPR of LPI-IBWA reach 0.920 and 0.736, respectively, which are higher than those of other state-of-the-art methods. Furthermore, the experimental results of case studies on a novel dataset also illustrated that LPI-IBWA could efficiently predict potential lncRNA-protein interactions.
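The final step of the pipeline above, a random walk on both similarity networks, can be sketched in a simplified form. The code below is a plain bi-random walk with restart, not the improved variant proposed in the paper: each iteration propagates scores through the lncRNA similarity network on one side and the protein similarity network on the other, then mixes in the known interactions. The function names, `alpha`, and `steps` are assumptions for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def row_normalize(m):
    """Scale each row to sum to 1 (rows of zeros are left as zeros)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def bi_random_walk(lnc_sim, prot_sim, interactions, alpha=0.8, steps=3):
    """Score lncRNA-protein pairs by walking on both similarity networks.

    lnc_sim: n x n lncRNA similarity matrix
    prot_sim: m x m protein similarity matrix
    interactions: n x m matrix of known interactions (1 = known, 0 = unknown)
    """
    total = sum(map(sum, interactions))
    if total == 0:
        raise ValueError("at least one known interaction is required")
    lnc_walk = row_normalize(lnc_sim)
    prot_walk = row_normalize(prot_sim)
    start = [[v / total for v in row] for row in interactions]
    scores = start
    for _ in range(steps):
        walked = matmul(matmul(lnc_walk, scores), prot_walk)
        # restart term keeps the scores anchored to the known interactions
        scores = [[alpha * w + (1 - alpha) * s for w, s in zip(wr, sr)]
                  for wr, sr in zip(walked, start)]
    return scores
```

In this sketch, a pair with no known interaction receives a positive score whenever a similar lncRNA or a similar protein already has one, which is the intuition behind ranking candidate pairs by the final score matrix.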
Affiliation(s)
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, China
- Ruijie Xie
  - College of Information Science and Engineering, Hunan Normal University, China
- Hao Wang
  - College of Information Science and Engineering, Hunan Normal University, China
|
8
|
Lv G, Xia Y, Qi Z, Zhao Z, Tang L, Chen C, Yang S, Wang Q, Gu L. LncRNA-protein interaction prediction with reweighted feature selection. BMC Bioinformatics 2023; 24:410. [PMID: 37904080 PMCID: PMC10617115 DOI: 10.1186/s12859-023-05536-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
LncRNA-protein interactions are ubiquitous in organisms and play a crucial role in a variety of biological processes and complex diseases. Many computational methods have been reported for lncRNA-protein interaction prediction. However, the experimental techniques to detect lncRNA-protein interactions are laborious and time-consuming. Therefore, to address this challenge, this paper proposes a reweighting boosting feature selection (RBFS) method to select key features. Specifically, the reweighting approach adjusts the contribution of each observational sample to model fitting, so that samples with higher weights have more influence than those with lower weights. Feature selection with boosting iteratively ranks important features to obtain the optimal feature subset. In the experiments, the RBFS method is applied to the prediction of lncRNA-protein interactions. The experimental results demonstrate that our method achieves higher accuracy and less redundancy with fewer features.
Affiliation(s)
- Guohao Lv
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Yingchun Xia
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zhao Qi
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zihao Zhao
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Lianggui Tang
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Cheng Chen
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Shuai Yang
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Qingyong Wang
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Lichuan Gu
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
|
9
|
Su Z, Lu H, Wu Y, Li Z, Duan L. Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM. Front Genet 2023; 14:1238095. [PMID: 37655066 PMCID: PMC10466784 DOI: 10.3389/fgene.2023.1238095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Accepted: 07/19/2023] [Indexed: 09/02/2023] Open
Abstract
Introduction: Lung cancer is one of the most frequent neoplasms worldwide, with approximately 2.2 million new cases and 1.8 million deaths each year. The expression levels of programmed death ligand-1 (PDL1) demonstrate a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor that mainly affects children. Identification of new biomarkers for these two diseases can significantly promote their diagnosis and therapy. However, in vivo experiments to discover potential biomarkers are costly and laborious. Consequently, artificial intelligence technologies, especially machine learning methods, provide a powerful avenue to find new biomarkers for various diseases. Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network, graph attention network, and convolutional neural network to learn the biological features of the lncRNAs and diseases based on their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA-disease associations (LDAs). Finally, the proposed LDAenDL method is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma. Results: The experimental results show that LDAenDL computed the best AUCs of 0.8701, 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA-disease pairs on Dataset 1, respectively, and 0.9490, 0.9157, and 0.9708 on Dataset 2, respectively.
Furthermore, AUPRs of 0.8903, 0.9061, and 0.9166 under three cross-validations were obtained on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed the other four classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCAF, and LDASR). Case studies demonstrate that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, SNHG3 may associate with PDL1 for lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma. Conclusion: We hope that the proposed LDAenDL method can help the development of targeted therapies for these two diseases.
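The Gaussian kernel similarity named in the Methods can be illustrated with the widely used Gaussian interaction profile kernel, where each lncRNA (or disease) is represented by its row (or column) of the known association matrix. The abstract does not give the exact formula, so the bandwidth normalization by the mean squared profile norm and the names below are assumptions.

```python
import math

def gaussian_kernel_similarity(profiles, gamma_prime=1.0):
    """Pairwise Gaussian kernel similarity between association profiles.

    profiles: list of rows, one binary association profile per entity
    gamma_prime: bandwidth scaling factor (assumed default)
    """
    n = len(profiles)
    # normalize the bandwidth by the average squared norm of the profiles
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim
```

Entities with identical association profiles get similarity 1, and the similarity decays toward 0 as the profiles diverge.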
Affiliation(s)
- Zhenguo Su
  - Clinical Lab, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, China
- Huihui Lu
  - Department of Thoracic Cardiovascular Surgery, Hunan Province Directly Affiliated TCM Hospital, Zhuzhou, China
- Yan Wu
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lian Duan
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
|
10
|
Han Y, Zhang SW. ncRPI-LGAT: Prediction of ncRNA-Protein Interactions with Line Graph Attention Network Framework. Comput Struct Biotechnol J 2023; 21:2286-2295. [PMID: 37035546 PMCID: PMC10073990 DOI: 10.1016/j.csbj.2023.03.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2022] [Revised: 03/11/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023] Open
Abstract
Identification of ncRNA-protein interactions (ncRPIs) through wet experiments is still time-consuming and costly. Although several computational approaches have been developed to predict ncRPIs using the structure and sequence information of ncRNAs and proteins, the prediction accuracy needs to be improved, and the results lack interpretability. In this work, we proposed a novel computational method (called ncRPI-LGAT) to predict ncRNA-protein interactions by transforming the link prediction (i.e., subgraph classification) task into a node classification task in the line network and introducing a Line Graph ATtention network framework. ncRPI-LGAT first extracts the ncRNA/protein attributes using node2vec, and then generates the local enclosing subgraph of a target ncRNA-protein pair with SEAL. Because using pooling operations in local enclosing subgraphs to learn a fixed-size feature vector for representing ncRNAs/proteins causes information loss, ncRPI-LGAT converts the local enclosing subgraphs into their corresponding line graphs, in which each node corresponds to an edge (i.e., ncRNA-protein pair) of the local enclosing subgraph. Then, the attention mechanism-based graph neural network GATv2 is used on these line graphs to efficiently learn the embedding features of the target nodes (i.e., ncRNA-protein pairs) by focusing on learning the significance of one ncRNA-protein pair to another ncRNA-protein pair. The embedding features of one ncRNA-protein pair obtained from multi-head attention are concatenated and then fed into a fully connected network to predict ncRPIs. Compared with other state-of-the-art methods in the 5CV test, ncRPI-LGAT shows superior performance on three benchmark datasets, demonstrating the effectiveness of our ncRPI-LGAT method in predicting ncRNA-protein interactions.
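The line graph conversion described above can be shown with a small sketch: every edge of the original graph (an ncRNA-protein pair) becomes a node, and two such nodes are connected when the original edges share an endpoint. This covers only the graph transformation, not the node2vec, SEAL, or GATv2 steps, and the function name and node labels are hypothetical.

```python
def line_graph(edges):
    """Return the line graph of an undirected graph given as an edge list.

    The result maps each original edge to the set of edges that share
    at least one endpoint with it.
    """
    nodes = [tuple(e) for e in edges]
    adjacency = {e: set() for e in nodes}
    for i, e1 in enumerate(nodes):
        for e2 in nodes[i + 1:]:
            if set(e1) & set(e2):           # the two edges share an endpoint
                adjacency[e1].add(e2)
                adjacency[e2].add(e1)
    return adjacency

# Hypothetical enclosing subgraph with two ncRNAs (r1, r2) and two proteins (p1, p2)
example = line_graph([("r1", "p1"), ("r1", "p2"), ("r2", "p2")])
```

In `example`, the pair `("r1", "p2")` is adjacent to both other pairs, so classifying it as a node can draw directly on the features of neighboring pairs, which is why link prediction turns into node classification on the line graph.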
|
11
|
Constructing discriminative feature space for LncRNA-protein interaction based on deep autoencoder and marginal fisher analysis. Comput Biol Med 2023; 157:106711. [PMID: 36924738 DOI: 10.1016/j.compbiomed.2023.106711] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/16/2022] [Revised: 01/26/2023] [Accepted: 02/26/2023] [Indexed: 03/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles by regulating proteins in many biological processes and life activities. To uncover molecular mechanisms of lncRNA, it is very necessary to identify interactions of lncRNA with proteins. Recently, some machine learning methods were proposed to detect lncRNA-protein interactions according to the distribution of known interactions. The performances of these methods were largely dependent upon: (1) how exactly the distribution of known interactions was characterized by feature space; (2) how discriminative the feature space was for distinguishing lncRNA-protein interactions. Because the known interactions may be multiple and complex model, it remains a challenge to construct discriminative feature space for lncRNA-protein interactions. To resolve this problem, a novel method named DFRPI was developed based on deep autoencoder and marginal fisher analysis in this paper. Firstly, some initial features of lncRNA-protein interactions were extracted from the primary sequences and secondary structures of lncRNA and protein. Secondly, a deep autoencoder was exploited to learn encode parameters of the initial features to describe the known interactions precisely. Next, the marginal fisher analysis was employed to optimize the encode parameters of features to characterize a discriminative feature space of the lncRNA-protein interactions. Finally, a random forest-based predictor was trained on the discriminative feature space to detect lncRNA-protein interactions. Verified by a series of experiments, the results showed that our predictor achieved the precision of 0.920, recall of 0.916, accuracy of 0.918, MCC of 0.836, specificity of 0.920, sensitivity of 0.916 and AUC of 0.906 respectively, which outperforms the concerned methods for predicting lncRNA-protein interaction. 
It may be suggested that the proposed method can generate a reasonable and effective feature space for distinguishing lncRNA-protein interactions accurately. The code and data are available on https://github.com/D0ub1e-D/DFRPI.
|
12
|
Wei MM, Yu CQ, Li LP, You ZH, Ren ZH, Guan YJ, Wang XF, Li YC. LPIH2V: LncRNA-protein interactions prediction using HIN2Vec based on heterogeneous networks model. Front Genet 2023; 14:1122909. [PMID: 36845392 PMCID: PMC9950107 DOI: 10.3389/fgene.2023.1122909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 01/30/2023] [Indexed: 02/12/2023] Open
Abstract
LncRNA-protein interactions play an important role in the development and treatment of many human diseases. Because the experimental approaches to determine lncRNA-protein interactions are expensive and time-consuming, and few computational methods are available, it is urgent to develop efficient and accurate methods to predict lncRNA-protein interactions. In this work, a meta-path-based model for heterogeneous network embedding, namely LPIH2V, is proposed. The heterogeneous network is composed of lncRNA similarity networks, protein similarity networks, and known lncRNA-protein interaction networks. The behavioral features are extracted from the heterogeneous network using the HIN2Vec network embedding method. The results showed that LPIH2V obtains an AUC of 0.97 and an ACC of 0.95 in the 5-fold cross-validation test. The model showed superiority and good generalization ability. Compared to other models, LPIH2V not only extracts attribute characteristics by similarity, but also acquires behavior properties by meta-path wandering in heterogeneous networks. LPIH2V would be beneficial in forecasting interactions between lncRNAs and proteins.
Affiliation(s)
- Meng-Meng Wei
- School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an, China
- *Correspondence: Chang-Qing Yu; Li-Ping Li
- Li-Ping Li
- School of Information Engineering, Xijing University, Xi’an, China
- College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi, China
- *Correspondence: Chang-Qing Yu; Li-Ping Li
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an, China
- Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an, China
13
Ma Y, Zhang H, Jin C, Kang C. Predicting lncRNA-protein interactions with bipartite graph embedding and deep graph neural networks. Front Genet 2023; 14:1136672. [PMID: 36845380 PMCID: PMC9948011 DOI: 10.3389/fgene.2023.1136672] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/03/2023] [Accepted: 01/30/2023] [Indexed: 02/11/2023]
Abstract
Background: Long non-coding RNAs (lncRNAs) play crucial roles in numerous biological processes. Investigating lncRNA-protein interactions helps to uncover undetected molecular functions of lncRNAs. In recent years, computational approaches have increasingly replaced the traditional time-consuming experiments used to identify unknown associations. However, the heterogeneity involved in predicting associations between lncRNAs and proteins has not been adequately explored. It remains challenging to integrate the heterogeneity of lncRNA-protein interactions with graph neural network algorithms. Methods: In this paper, we constructed a deep GNN-based architecture called BiHo-GNN, which is the first to integrate the properties of homogeneous and heterogeneous networks through bipartite graph embedding. Unlike previous research, BiHo-GNN can capture the mechanism of molecular association through the data encoder of heterogeneous networks. In addition, we design a process of mutual optimization between homogeneous and heterogeneous networks, which improves the robustness of BiHo-GNN. Results: We collected four datasets for predicting lncRNA-protein interactions and compared the performance of current prediction models on the benchmarking datasets. BiHo-GNN outperforms existing bipartite graph-based methods. Conclusion: BiHo-GNN integrates the bipartite graph with homogeneous graph networks. Based on this model structure, lncRNA-protein interactions and potential associations can be predicted and discovered accurately.
Affiliation(s)
- Yuzhou Ma
- College of Artificial Intelligence, Nankai University, Tianjin, China
- Han Zhang
- College of Artificial Intelligence, Nankai University, Tianjin, China
- *Correspondence: Han Zhang
- Chen Jin
- College of Computer Science, Nankai University, Tianjin, China
- Chuanze Kang
- College of Artificial Intelligence, Nankai University, Tianjin, China
14
Wang JX, Zhao X, Xu SQ. Screening Key lncRNAs of Ankylosing Spondylitis Using Bioinformatics Analysis. J Inflamm Res 2022; 15:6087-6096. [PMID: 36386591 PMCID: PMC9642369 DOI: 10.2147/jir.s387258] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/31/2022] [Accepted: 10/26/2022] [Indexed: 04/08/2025]
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) are important regulators in ankylosing spondylitis (AS). Few studies have examined the lncRNA-RNA binding protein (RBP) interaction in AS. This study performed bioinformatics analysis and clinical verification to identify key lncRNAs and propose their RBP interactions. METHODS: Three GEO datasets of AS were analyzed by differential expression analysis. The lncRNAs differentially expressed between the AS and control groups were screened out, and the intersecting lncRNAs were regarded as target lncRNAs. Functional analysis of the target lncRNAs was performed by enrichment analysis, co-expressed RNA analysis, and lncRNA-RBP interaction analysis. Finally, this study analyzed the differential expression level and clinical value of lncRNAs between the AS and control groups. RESULTS: Linc00304, linc00926, and MIAT were differentially expressed and upregulated. Enrichment analysis indicated that the key KEGG terms were the T-cell receptor signaling pathway and B-cell receptor signaling pathway. The key molecular function term was protein binding, and the key biological process term was adaptive immune response. In the qRT-PCR analysis, 44 samples were validated. Linc00304 expression was positively correlated with the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), the Bath Ankylosing Spondylitis Functional Index (BASFI), erythrocyte sedimentation rate (ESR), and C-reactive protein (CRP). Linc00926 expression was only positively correlated with ESR, whereas MIAT expression was positively correlated with BASFI, ESR, and CRP. Logistic regression revealed that linc00304, ESR, and CRP were the independent risk factors for BASDAI activation. The area under the curve (AUC) of serum linc00304 level in the diagnosis of AS was 0.687 (cutoff value: 0.413, specificity: 0.423, sensitivity: 0.900). The AUC of linc00926 was 0.664 (cutoff value: 0.299, sensitivity: 0.882, specificity: 0.417). The AUC of MIAT was 0.623 (cutoff value: 0.432, specificity: 0.443, sensitivity: 0.890) (all P < 0.05). CONCLUSION: Overall, this study uncovered three novel lncRNAs that were upregulated in AS and proposed a new lncRNA-RBP-mRNA interaction that might regulate adaptive immune response.
Affiliation(s)
- Jian-Xiong Wang
- Department of Rheumatology & Immunology, The First Affiliated Hospital of Anhui Medical University, Hefei, People’s Republic of China
- Xu Zhao
- Department of Rheumatology & Immunology, The First Affiliated Hospital of Anhui Medical University, Hefei, People’s Republic of China
- Sheng-Qian Xu
- Department of Rheumatology & Immunology, The First Affiliated Hospital of Anhui Medical University, Hefei, People’s Republic of China
15
Peng L, Wang C, Tian X, Zhou L, Li K. Finding lncRNA-Protein Interactions Based on Deep Learning With Dual-Net Neural Architecture. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3456-3468. [PMID: 34587091 DOI: 10.1109/tcbb.2021.3116232] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 06/13/2023]
Abstract
The identification of lncRNA-protein interactions (LPIs) is important for understanding the biological functions and molecular mechanisms of lncRNAs. However, most computational models are evaluated on a single dataset, which results in prediction bias. Furthermore, previous models have not uncovered potential proteins (or lncRNAs) interacting with a new lncRNA (or protein). Finally, the performance of these models can be improved. In this study, we develop a Deep Learning framework with Dual-net Neural architecture to find potential LPIs (LPI-DLDN). First, five LPI datasets are collected. Second, the features of lncRNAs and proteins are extracted by Pyfeat and BioTriangle, respectively. Third, these features are concatenated as a vector after dimension reduction. Finally, a deep learning model with dual-net neural architecture is designed to classify lncRNA-protein pairs. LPI-DLDN is compared with six state-of-the-art LPI prediction methods (LPI-XGBoost, LPI-HeteSim, LPI-NRLMF, PLIPCOM, LPI-CNNCP, and Capsule-LPI) under four cross validations. The results demonstrate the strong LPI classification performance of LPI-DLDN. Case study analyses show that there may be interactions between RP11-439E19.10 and Q15717, and between RP11-196G18.22 and Q9NUL5. The novelty of LPI-DLDN lies in integrating various biological features, designing a novel deep learning-based LPI identification framework, and selecting the optimal LPI feature subset based on feature importance ranking.
16
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022]
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a major challenge in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are becoming available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods developed in this broad research area, highlighting the feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
- R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
- PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
17
Multi-feature Fusion Method Based on Linear Neighborhood Propagation Predict Plant LncRNA-Protein Interactions. Interdiscip Sci 2022; 14:545-554. [PMID: 35040094 DOI: 10.1007/s12539-022-00501-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/29/2021] [Revised: 12/28/2021] [Accepted: 01/04/2022] [Indexed: 12/31/2022]
Abstract
Long non-coding RNAs (lncRNAs) have attracted extensive attention due to their important roles in various biological processes, among which lncRNA-protein interactions play an important regulatory role in plant immunity and life activities. Laboratory methods are time-consuming and labor-intensive, so many computational methods have gradually emerged as auxiliary tools to assist relevant research. However, there are relatively few methods for predicting lncRNA-protein interactions in plants. Due to the lack of experimentally verified interaction data, there is an imbalance between known and unknown interaction samples in plant datasets. In this study, a multi-feature fusion method based on linear neighborhood propagation, called MPLPLNP, is developed to predict unobserved plant lncRNA-protein interaction pairs from known interaction pairs. The linear neighborhood similarity of the feature space is calculated and the results are predicted by label propagation. Meanwhile, multiple feature training is integrated to better explore the potential interaction information in the data. The experimental results show that the proposed multi-feature fusion method can improve the performance of the model and is superior to other state-of-the-art approaches. Moreover, the proposed approach has better performance and generalization ability on various plant datasets, which is expected to facilitate related research in plant molecular biology.
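The label propagation step mentioned in this abstract can be illustrated with a generic sketch. It is not the MPLPLNP implementation: the update rule F <- alpha * W * F + (1 - alpha) * Y, the function name `propagate_labels`, the parameter values, and the toy matrices are all assumptions chosen for illustration, and the similarity matrix `W` is simply given rather than computed by linear neighborhood similarity.

```python
def propagate_labels(W, Y, alpha=0.5, iterations=50):
    """Generic label propagation: repeat F <- alpha * W @ F + (1 - alpha) * Y.

    W is an n x n row-normalized similarity matrix (assumed given),
    Y is an n x m matrix of known interactions (1.0 = known, 0.0 = unknown).
    """
    n, m = len(Y), len(Y[0])
    F = [row[:] for row in Y]  # start from the known labels
    for _ in range(iterations):
        F = [
            [
                alpha * sum(W[i][k] * F[k][j] for k in range(n))
                + (1 - alpha) * Y[i][j]
                for j in range(m)
            ]
            for i in range(n)
        ]
    return F


# Hypothetical toy data: 3 lncRNAs, 2 proteins.
# lncRNA 2 has no known interactions but is similar to lncRNAs 0 and 1.
W = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
Y = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 0.0],
]
scores = propagate_labels(W, Y)
```

In this toy case the unlabeled lncRNA inherits a positive score for the protein its neighbors interact with, while the protein with no known partners keeps a score of zero.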
18
Zhao C, Xie W, Zhu H, Zhao M, Liu W, Wu Z, Wang L, Zhu B, Li S, Zhou Y, Jiang X, Xu Q, Ren C. LncRNAs and their RBPs: How to influence the fate of stem cells? Stem Cell Res Ther 2022; 13:175. [PMID: 35505438 PMCID: PMC9066789 DOI: 10.1186/s13287-022-02851-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/20/2021] [Accepted: 04/12/2022] [Indexed: 12/12/2022]
Abstract
Stem cells are distinctive cells that have self-renewal potential and a unique ability to differentiate into multiple functional cells. Stem cell research is a frontier field of life science and has always been a hot spot in biomedical research. Recent studies have shown that long non-coding RNAs (lncRNAs) have irreplaceable roles in stem cell self-renewal and differentiation. LncRNAs play crucial roles in stem cells through a variety of regulatory mechanisms, including the recruitment of RNA-binding proteins (RBPs) to affect the stability of their mRNAs or the expression of downstream genes. RBPs interact with different RNAs to regulate gene expression at transcriptional and post-transcriptional levels and play important roles in determining the fate of stem cells. In this review, the functions of lncRNAs and their RBPs in the self-renewal and differentiation of stem cells are summarized. We focus on the four regulatory mechanisms by which lncRNAs and their RBPs are involved in epigenetic regulation, signaling pathway regulation, splicing, mRNA stability and subcellular localization, and further discuss other noncoding RNAs (ncRNAs) and their RBPs in the fate of stem cells. This work provides a more comprehensive understanding of the roles of lncRNAs in determining the fate of stem cells, and a further understanding of their regulatory mechanisms will provide a theoretical basis for the development of clinical regenerative medicine.
Affiliation(s)
- Cong Zhao
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, School of Basic Medicine, Central South University, Changsha, 410008, China
- Wen Xie
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, School of Basic Medicine, Central South University, Changsha, 410008, China
- Hecheng Zhu
- Changsha Kexin Cancer Hospital, Changsha, 410205, China
- Ming Zhao
- Changsha Kexin Cancer Hospital, Changsha, 410205, China
- Weidong Liu
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, School of Basic Medicine, Central South University, Changsha, 410008, China
- Zhaoping Wu
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, 410008, China
- Lei Wang
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, School of Basic Medicine, Central South University, Changsha, 410008, China
- Bin Zhu
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, School of Basic Medicine, Central South University, Changsha, 410008, China
- Shasha Li
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, School of Basic Medicine, Central South University, Changsha, 410008, China
- Yao Zhou
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, School of Basic Medicine, Central South University, Changsha, 410008, China
- Xingjun Jiang
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, 410008, China
- Qiang Xu
- Department of Orthopedics, The Affiliated Zhuzhou Hospital of Xiangya Medical College, Central South University, Zhuzhou, 412007, China.,School of Materials Science and Engineering, Central South University, Changsha, 410083, China
- Caiping Ren
- Cancer Research Institute, Department of Neurosurgery, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, 410008, China.,The Key Laboratory of Carcinogenesis of the Chinese Ministry of Health and the Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, School of Basic Medicine, Central South University, Changsha, 410008, China
19
Lombardo SD, Wangsaputra IF, Menche J, Stevens A. Network Approaches for Charting the Transcriptomic and Epigenetic Landscape of the Developmental Origins of Health and Disease. Genes (Basel) 2022; 13:764. [PMID: 35627149 PMCID: PMC9141211 DOI: 10.3390/genes13050764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2022] [Revised: 04/04/2022] [Accepted: 04/13/2022] [Indexed: 02/04/2023]
Abstract
The early developmental phase is of critical importance for human health and disease later in life. To decipher the molecular mechanisms at play, current biomedical research is increasingly relying on large quantities of diverse omics data. The integration and interpretation of the different datasets pose a critical challenge towards the holistic understanding of the complex biological processes that are involved in early development. In this review, we outline the major transcriptomic and epigenetic processes and the respective datasets that are most relevant for studying the periconceptional period. We cover both basic data processing and analysis steps, as well as more advanced data integration methods. A particular focus is given to network-based methods. Finally, we review the medical applications of such integrative analyses.
Affiliation(s)
- Salvo Danilo Lombardo
- Max Perutz Labs, Department of Structural and Computational Biology, University of Vienna, 1030 Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1030 Vienna, Austria
- Ivan Fernando Wangsaputra
- Maternal and Fetal Health Research Group, Division of Developmental Biology and Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9WL, UK
- Jörg Menche
- Max Perutz Labs, Department of Structural and Computational Biology, University of Vienna, 1030 Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1030 Vienna, Austria
- Faculty of Mathematics, University of Vienna, 1030 Vienna, Austria
- Adam Stevens
- Maternal and Fetal Health Research Group, Division of Developmental Biology and Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9WL, UK
20
Peng L, Tan J, Tian X, Zhou L. EnANNDeep: An Ensemble-based lncRNA-protein Interaction Prediction Framework with Adaptive k-Nearest Neighbor Classifier and Deep Models. Interdiscip Sci 2022; 14:209-232. [PMID: 35006529 DOI: 10.1007/s12539-021-00483-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/06/2021] [Revised: 09/14/2021] [Accepted: 09/15/2021] [Indexed: 01/08/2023]
Abstract
Predicting lncRNA-protein interactions (LPIs) can deepen the understanding of many important biological processes. Artificial intelligence methods have reported many possible LPIs. However, most computational techniques were evaluated mainly on one dataset, which may produce prediction bias. More importantly, they were validated only under cross validation on lncRNA-protein pairs and did not consider performance under cross validations on lncRNAs and proteins, and thus fail to find related proteins/lncRNAs for a new lncRNA/protein. Under an ensemble learning framework (EnANNDeep) composed of an adaptive k-nearest neighbor classifier and deep models, this study focuses on systematically finding underlying linkages between lncRNAs and proteins. First, five LPI-related datasets are arranged. Second, multiple source features are integrated to depict an lncRNA-protein pair. Third, an adaptive k-nearest neighbor classifier, a deep neural network, and a deep forest are each designed to score unknown lncRNA-protein pairs. Finally, interaction probabilities from the three predictors are integrated using a soft voting technique. Compared with five classical LPI identification models (SFPEL, PMDKN, CatBoost, PLIPCOM, and LPI-SKF) under fivefold cross validations on lncRNAs, proteins, and LPIs, EnANNDeep computes the best average AUCs of 0.8660, 0.8775, and 0.9166, respectively, and the best average AUPRs of 0.8545, 0.8595, and 0.9054, respectively, indicating its superior LPI prediction ability. Case study analyses indicate that SNHG10 may have dense linkage with Q15717. In the ensemble framework, the adaptive k-nearest neighbor classifier can separately pick the most appropriate k for each query lncRNA-protein pair. More importantly, deep models including the deep neural network and deep forest can effectively learn the representative features of lncRNAs and proteins.
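The soft voting step described in this abstract can be sketched as follows. This is a minimal illustration and not the EnANNDeep code: the function name `soft_vote`, the equal default weights, the 0.5 decision threshold, and the probability values are all hypothetical.

```python
def soft_vote(probabilities, weights=None):
    """Fuse per-model interaction probabilities by (weighted) averaging.

    probabilities: one list per predictor, each holding one probability
    per lncRNA-protein pair. Equal weights are assumed unless given.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_pairs = len(probabilities[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n_pairs)
    ]


# Hypothetical scores for three lncRNA-protein pairs from three predictors.
knn_scores = [0.9, 0.2, 0.6]
dnn_scores = [0.8, 0.4, 0.3]
forest_scores = [0.7, 0.3, 0.45]

fused = soft_vote([knn_scores, dnn_scores, forest_scores])
predicted = [p >= 0.5 for p in fused]  # assumed decision threshold
```

Averaging probabilities rather than counting hard votes lets a confident predictor outweigh two uncertain ones, which is the usual motivation for soft voting.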
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
- Jingwei Tan
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xiongfei Tian
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
21
Ren ZH, Yu CQ, Li LP, You ZH, Guan YJ, Li YC, Pan J. SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information. Front Genet 2022; 13:839540. [PMID: 35360836 PMCID: PMC8963817 DOI: 10.3389/fgene.2022.839540] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/20/2021] [Accepted: 02/07/2022] [Indexed: 11/13/2022]
Abstract
Non-coding RNAs (ncRNAs) have essential effects on biological processes such as gene regulation. One critical way in which ncRNAs execute biological functions is through interactions with RNA-binding proteins (RBPs). Identifying the proteins involved in ncRNA-protein interactions helps in understanding the function of ncRNAs. Many high-throughput experiments have been applied to recognize these interactions. Because these approaches are time- and labor-consuming, a great number of computational methods have been developed to advance ncRNA-protein interaction research. However, these methods may not be applicable to all RNAs and proteins, particularly new RNAs and proteins. Additionally, most of them cannot handle long sequences well. In this work, a computational method, SAWRPI, is proposed to predict ncRNA-protein interactions from sequence information. More specifically, the raw features of proteins and ncRNAs are first extracted through a k-mer sparse matrix with SVD reduction and through learning nucleic acid symbols by natural language processing with a local fusion strategy, respectively. Then, to make classification easier, the Hilbert transformation is used to map the raw feature data to a new feature space. Finally, a stacking ensemble strategy is adopted to learn high-level abstraction features automatically and generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are used. In comparison with state-of-the-art methods and with other classification or feature extraction strategies, SAWRPI achieved high performance on three datasets containing two kinds of lncRNA-protein interactions. Based on our findings, SAWRPI is a trustworthy, robust, yet simple method that can serve as a useful supplement for the task of predicting ncRNA-protein interactions.
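The k-mer feature extraction mentioned in this abstract can be illustrated with a small sketch. It covers only sparse k-mer counting; the SVD reduction, Hilbert transformation, and stacking steps are omitted, and the function names, the choice of k = 3, and the toy protein sequence are assumptions rather than details taken from SAWRPI.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers; the Counter acts as a sparse feature row."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def normalized_kmer_profile(sequence, k=3):
    """Convert k-mer counts to frequencies so sequence length matters less."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: count / total for kmer, count in counts.items()}


# Hypothetical toy protein sequence (one-letter amino acid codes).
profile = kmer_counts("MKVLMKVA", k=3)
frequencies = normalized_kmer_profile("MKVLMKVA", k=3)
```

Only the k-mers that occur are stored, which is why such features are naturally sparse: with 20 amino acids there are 8,000 possible 3-mers, and a short sequence touches very few of them.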
Affiliation(s)
- Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an, China
- Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an, China
- *Correspondence: Li-Ping Li; Chang-Qing Yu
- Li-Ping Li
- School of Information Engineering, Xijing University, Xi’an, China
- *Correspondence: Li-Ping Li; Chang-Qing Yu
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an, China
- Yue-Chao Li
- School of Information Engineering, Xijing University, Xi’an, China
- Jie Pan
- School of Information Engineering, Xijing University, Xi’an, China
22
Zhao G, Li P, Qiao X, Han X, Liu ZP. Predicting lncRNA–Protein Interactions by Heterogenous Network Embedding. Front Genet 2022; 12:814073. [PMID: 35186016 PMCID: PMC8854746 DOI: 10.3389/fgene.2021.814073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/12/2021] [Accepted: 12/27/2021] [Indexed: 12/25/2022]
Abstract
lncRNA–protein interactions play essential roles in a variety of cellular processes. However, experimental methods for systematically mapping lncRNA–protein interactions remain time-consuming and expensive. Therefore, reliable computational methods for predicting lncRNA–protein interactions are urgently needed. In this study, we propose a computational method called LncPNet to predict potential lncRNA–protein interactions by embedding an lncRNA–protein heterogenous network. The experimental results indicate that LncPNet achieves promising performance on benchmark datasets extracted from the NPInter database, with an accuracy of 0.930 and an area under the ROC curve (AUC) of 0.971. In addition, we compare our method with eight other state-of-the-art methods, and the results show that our method achieves superior prediction performance. LncPNet provides an effective method, through a new way of representing the lncRNA–protein heterogenous network, that will greatly benefit the prediction of lncRNA–protein interactions.
Affiliation(s)
- Guoqing Zhao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Pengpai Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Xu Qiao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Xianhua Han
- Faculty of Science, Yamaguchi University, Yamaguchi, Japan
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- *Correspondence: Zhi-Ping Liu
23
V SKP, Thahsin A, M M, G G. A Heterogeneous Information Network Model for Long Non-Coding RNA Function Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:255-266. [PMID: 32750859 DOI: 10.1109/tcbb.2020.3000518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Exciting information on the functional roles played by long non-coding RNAs (lncRNAs) has drawn substantial research attention. With the advent of techniques such as RNA-Seq, thousands of lncRNAs are identified in very short time spans. However, due to the poor annotation rate, only a few of them are functionally characterised. Wet lab experiments to elucidate lncRNA functions are challenging, slow, and sometimes prohibitively expensive. This work attempts to solve the crucial problem of developing computational methods to predict lncRNA functions. The model presented here predicts the functions of lncRNAs by applying a meta-path-based measure, AvgSim, to a Heterogeneous Information Network (HIN). The network is constructed from existing protein and function association data of lncRNAs, lncRNA co-expression data, and protein-protein interaction data. Of the 2,758 lncRNAs considered for the experiment, the proposed method predicts possible functions for 2,695 lncRNAs with an accuracy of 73.68 percent and is found to perform better than other state-of-the-art approaches on an independent test set. A case study of two well-known lncRNAs (HOTAIR and H19) is conducted and the associated functions are identified. The results were validated using experimental evidence from the literature. The script and data used for the implementation of the model are freely available at: http://bdbl.nitc.ac.in/LncFunPred/index.html.
24
Peng L, Yuan R, Shen L, Gao P, Zhou L. LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Min 2021; 14:50. [PMID: 34861891 PMCID: PMC8642957 DOI: 10.1186/s13040-021-00277-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/29/2021] [Accepted: 08/22/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) have dense linkages with various biological processes. Identifying interacting lncRNA-protein pairs contributes to understanding the functions and mechanisms of lncRNAs. Wet experiments are costly and time-consuming. Most computational methods failed to account for the imbalanced characteristics of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which produced prediction bias. RESULTS In this study, we develop an Ensemble framework (LPI-EnEDT) with Extra tree and Decision Tree classifiers to implement imbalanced LPI data classification. First, five LPI datasets are arranged. Second, lncRNAs and proteins are separately characterized based on Pyfeat and BioTriangle and concatenated as a vector to represent each lncRNA-protein pair. Finally, an ensemble framework with Extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. The comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively. The average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest that there are underlying associations between HOTTIP and Q9Y6M1, and between NRON and Q15717. CONCLUSIONS By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with Extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as interaction information inference for a new lncRNA (or protein).
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China; College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Ruya Yuan
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Ling Shen
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Pengfei Gao
- College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
|
25
|
LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification. BMC Bioinformatics 2021; 22:568. [PMID: 34836494 PMCID: PMC8620196 DOI: 10.1186/s12859-021-04485-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/24/2021] [Accepted: 11/09/2021] [Indexed: 12/03/2022] Open
Abstract
Background Long noncoding RNAs (lncRNAs) have dense linkages with a plethora of important cellular activities. lncRNAs exert functions by linking with corresponding RNA-binding proteins. Since experimental techniques to detect lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods have been reported for LPI prediction. However, computation-based LPI identification methods have the following limitations: (1) Most methods were evaluated on a single dataset, and researchers may thus fail to measure their generalization ability. (2) The majority of methods were validated under cross validation on lncRNA-protein pairs and did not investigate the performance under other cross validations, especially cross validation on independent lncRNAs and independent proteins. (3) lncRNAs and proteins have abundant biological information, and how to select informative features needs further investigation. Results Under a hybrid framework (LPI-HyADBS) integrating feature selection based on AdaBoost, and classification models including deep neural network (DNN), extreme gradient boosting (XGBoost), and SVM with a penalty Coefficient of misclassification (C-SVM), this work focuses on finding new LPIs. First, five datasets are arranged. Each dataset contains lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are acquired based on Pyfeat. Third, the obtained features of lncRNAs and proteins are selected based on AdaBoost and concatenated to depict each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework is developed to integrate the classification results from the above three classifiers. LPI-HyADBS is compared to six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validations on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show LPI-HyADBS has the best LPI prediction performance under four different cross validations. In particular, LPI-HyADBS obtains better classification ability than the other six approaches under the constructed independent dataset. Case analyses suggest that there is relevance between ZNF667-AS1 and Q15717. Conclusions Integrating a feature selection approach based on AdaBoost and three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new linkages between lncRNAs and proteins. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04485-x.
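The difference between cross validation on lncRNAs (or proteins) and cross validation on lncRNA-protein pairs can be made concrete with a small sketch. It is only an illustration under stated assumptions: the helper names `folds_by_entity` and `folds_by_pair` and the toy pairs are hypothetical, and folds are assigned round-robin in sorted order for reproducibility, whereas a real experiment would normally shuffle first.

```python
def folds_by_entity(pairs, index, k):
    """Split pairs into k folds so that all pairs sharing the entity at
    position `index` (0 = lncRNA, 1 = protein) land in the same fold."""
    entities = sorted({p[index] for p in pairs})
    fold_of = {e: i % k for i, e in enumerate(entities)}
    folds = [[] for _ in range(k)]
    for p in pairs:
        folds[fold_of[p[index]]].append(p)
    return folds


def folds_by_pair(pairs, k):
    """Split individual pairs into k folds; the same lncRNA or protein
    may then appear in both training and test folds."""
    folds = [[] for _ in range(k)]
    for i, p in enumerate(sorted(pairs)):
        folds[i % k].append(p)
    return folds


# Hypothetical toy interaction list: (lncRNA, protein).
pairs = [("L1", "P1"), ("L1", "P2"), ("L2", "P1"),
         ("L3", "P3"), ("L4", "P2"), ("L4", "P3")]

lnc_folds = folds_by_entity(pairs, index=0, k=2)
pair_folds = folds_by_pair(pairs, k=2)
print(lnc_folds)
print(pair_folds)
```

Splitting by entity means every test lncRNA (or protein) is unseen during training, which is the harder setting and is consistent with the lower AUCs that several abstracts in this list report for cross validation on proteins.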
|
26
|
Yu H, Shen ZA, Du PF. NPI-RGCNAE: Fast predicting ncRNA-protein interactions using the Relational Graph Convolutional Network Auto-Encoder. IEEE J Biomed Health Inform 2021; 26:1861-1871. [PMID: 34699377 DOI: 10.1109/jbhi.2021.3122527] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
Abstract
ncRNAs play important roles in a variety of biological processes by interacting with RNA-binding proteins. Therefore, identifying ncRNA-protein interactions is important to understanding the biological functions of ncRNAs. Since experimental methods to determine ncRNA-protein interactions are always costly and time-consuming, computational methods have been proposed as alternative approaches. We developed a novel method, NPI-RGCNAE (predicting ncRNA-Protein Interactions by the Relational Graph Convolutional Network Auto-Encoder). With a reliable negative sample selection strategy, we applied the Relational Graph Convolutional Network encoder and the DistMult decoder to predict ncRNA-protein interactions in an accurate and efficient way. Using 5-fold cross-validation, we found that our method achieved performance comparable to all state-of-the-art methods. Our method requires less than 10% of the training time of all state-of-the-art methods. It is a more efficient choice for large datasets in practice. All datasets and source codes of NPI-RGCNAE have been deposited in a public GitHub repository (https://github.com/Angelia0hh/NPI-RGCNAE).
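The DistMult decoder named in this abstract scores a (head, relation, tail) triple as the sum of the element-wise product of three vectors. The sketch below shows only that scoring step on hand-written toy embeddings; in the actual method the embeddings would come from the R-GCN encoder, which is not reproduced here. The function names and the numeric values are hypothetical.

```python
import math


def distmult_score(head, relation, tail):
    """DistMult score: sum_i head[i] * relation[i] * tail[i]."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def sigmoid(x):
    """Map a raw score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical 3-dimensional embeddings (illustrative values only).
rna_vec = [1.0, 0.5, -1.0]
interacts_rel = [1.0, 2.0, 0.5]
protein_vec = [0.5, 1.0, 1.0]

score = distmult_score(rna_vec, interacts_rel, protein_vec)
print(score, sigmoid(score))
```

Because the relation acts as a diagonal weighting, the score is unchanged when head and tail are swapped, which suits an undirected interaction such as ncRNA-protein binding.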
|
27
|
Zhou L, Wang Z, Tian X, Peng L. LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA-protein interaction identification. BMC Bioinformatics 2021; 22:479. [PMID: 34607567 PMCID: PMC8489074 DOI: 10.1186/s12859-021-04399-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 04/29/2021] [Accepted: 07/14/2021] [Indexed: 12/31/2022] Open
Abstract
Background Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovery of lncRNA–protein interactions (LPIs) contributes to understanding the biological functions and mechanisms of lncRNAs. Although wet experiments find a few interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Therefore, computational methods are increasingly exploited to uncover the possible associations. However, existing computational methods have several limitations. First, the majority of them were evaluated on one simple dataset, which may result in prediction bias. Second, few of them are applied to identify relevant data for new lncRNAs (or proteins). Finally, they failed to utilize diverse biological information of lncRNAs and proteins. Results Under the feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT), this work focuses on classifying unobserved LPIs. First, three human LPI datasets and two plant LPI datasets are arranged. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Third, the features are dimensionally reduced and concatenated as a vector to represent an lncRNA–protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying linkages between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross validations on lncRNAs, proteins, and lncRNA–protein pairs, respectively. It obtains the best average AUC and AUPR values under the majority of situations, significantly outperforming the other five LPI identification methods. That is, AUCs computed by LPI-deepGBDT are 0.8321, 0.6815, and 0.9073, respectively, and AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the powerful classification ability of LPI-deepGBDT. Case study analyses show that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637. Conclusions Integrating ensemble learning and hierarchical distributed representations and building a multiple-layered deep architecture, this work improves LPI prediction performance as well as effectively probes interaction data for new lncRNAs/proteins.
Affiliation(s)
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Zhao Wang
- School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Xiongfei Tian
- School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Lihong Peng
- School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China; College of Life Sciences and Chemistry, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China
|
28
|
Tian X, Shen L, Wang Z, Zhou L, Peng L. A novel lncRNA-protein interaction prediction method based on deep forest with cascade forest structure. Sci Rep 2021; 11:18881. [PMID: 34556758 PMCID: PMC8460650 DOI: 10.1038/s41598-021-98277-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/05/2021] [Accepted: 08/18/2021] [Indexed: 02/08/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) regulate many biological processes by interacting with corresponding RNA-binding proteins. The identification of lncRNA-protein interactions (LPIs) is important for characterizing the biological functions and mechanisms of lncRNAs. Existing computational methods have been effectively applied to LPI prediction. However, the majority of them were evaluated on only one LPI dataset, thereby resulting in prediction bias. More importantly, some models did not discover possible LPIs for new lncRNAs (or proteins). In addition, the prediction performance remains limited. To solve the above problems, in this study we develop a Deep Forest-based LPI prediction method (LPIDF). First, five LPI datasets are obtained and the corresponding sequence information of lncRNAs and proteins is collected. Second, features of lncRNAs and proteins are constructed based on four-nucleotide composition and BioSeq2vec with encoder-decoder structure, respectively. Finally, a deep forest model with cascade forest structure is developed to find new LPIs. We compare LPIDF with four classical association prediction models based on three fivefold cross validations on lncRNAs, proteins, and LPIs. LPIDF obtains better average AUCs of 0.9012, 0.6937 and 0.9457, and the best average AUPRs of 0.9022, 0.6860, and 0.9382, respectively, for the three CVs, significantly outperforming other methods. The results show that the lncRNA FTX may interact with the protein P35637, a prediction that needs further validation.
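As an illustration of the lncRNA feature step, the sketch below computes a normalized 4-mer frequency vector. It assumes that "four-nucleotide composition" means the relative frequency of each of the 256 possible 4-mers over the alphabet A, C, G, U; the abstract does not define the term further. The function name `kmer_composition` and the example sequence are hypothetical.

```python
from itertools import product


def kmer_composition(seq, k=4, alphabet="ACGU"):
    """Return the relative frequency of every k-mer over `alphabet`,
    in lexicographic product order (256 values for k=4)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases
            counts[window] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


features = kmer_composition("ACGUACGUAC")
print(len(features), sum(features))
```

The fixed length of the vector, independent of sequence length, is what lets such features be fed to a tree-based model such as a cascade forest.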
Affiliation(s)
- Xiongfei Tian
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Ling Shen
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Zhenwu Wang
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China
|
29
|
Sun X, Cheng L, Liu J, Xie C, Yang J, Li F. Predicting lncRNA-Protein Interaction With Weighted Graph-Regularized Matrix Factorization. Front Genet 2021; 12:690096. [PMID: 34335693 PMCID: PMC8322775 DOI: 10.3389/fgene.2021.690096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2021] [Accepted: 05/21/2021] [Indexed: 11/13/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) have attracted wide attention because of their close associations with many key biological activities. Though the precise functions of most lncRNAs are unknown, research shows that lncRNAs usually exert biological functions by interacting with the corresponding proteins. The experimental validation of interactions between lncRNAs and proteins is costly and time-consuming. In this study, we developed a weighted graph-regularized matrix factorization (LPI-WGRMF) method to find unobserved lncRNA-protein interactions (LPIs) based on an lncRNA similarity matrix, a protein similarity matrix, and known LPIs. We compared our proposed LPI-WGRMF method with five classical LPI prediction methods, that is, LPBNI, LPI-IBNRA, LPIHN, RWR, and collaborative filtering (CF). The results demonstrate that the LPI-WGRMF method can produce high-accuracy performance, obtaining an AUC score of 0.9012 and AUPR of 0.7324. The case study showed that SFPQ, SNHG3, and PRPF31 may associate with Q9NUL5, Q9NUL5, and Q9UKV8, respectively, with the highest linking probabilities; these predictions need further experimental validation.
Affiliation(s)
- Xibo Sun
- Yidu Central Hospital of Weifang, Weifang, China
- Jinyang Liu
- Geneis Beijing Co., Ltd., Beijing, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Cuinan Xie
- Geneis Beijing Co., Ltd., Beijing, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Jiasheng Yang
- Academician Workstation, Changsha Medical University, Changsha, China
- Fu Li
- Department of Thoracic Surgery, The Second Affiliated Hospital of Hainan Medical University, Haikou, China
|
30
|
Yu H, Shen ZA, Zhou YK, Du PF. Recent advances in predicting protein-lncRNA interactions using machine learning methods. Curr Gene Ther 2021; 22:228-244. [PMID: 34254917 DOI: 10.2174/1566523221666210712190718] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/18/2021] [Revised: 05/01/2021] [Accepted: 05/31/2021] [Indexed: 11/22/2022]
Abstract
Long non-coding RNAs (LncRNAs) are a type of RNA with little or no protein-coding ability. Their length is more than 200 nucleotides. A large number of studies have indicated that lncRNAs play a significant role in various biological processes, including chromatin organizations, epigenetic programmings, transcriptional regulations, post-transcriptional processing, and circadian mechanism at the cellular level. Since lncRNAs perform vast functions through their interactions with proteins, identifying lncRNA-protein interaction is crucial to the understandings of the lncRNA molecular functions. However, due to the high cost and time-consuming disadvantage of experimental methods, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter category can be further classified into the deep learning-based method, the ensemble learning-based method, and the hybrid method. In this paper, we focused on supervised learning methods. We summarized the state-of-the-art methods in predicting lncRNA-protein interactions. Furthermore, the performance and the characteristics of different methods have also been compared in this work. Considering the limits of the existing models, we analyzed the problems and discussed future research potentials.
Affiliation(s)
- Han Yu
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Zi-Ang Shen
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Yuan-Ke Zhou
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
|
31
|
Li Y, Sun H, Feng S, Zhang Q, Han S, Du W. Capsule-LPI: a LncRNA-protein interaction predicting tool based on a capsule network. BMC Bioinformatics 2021; 22:246. [PMID: 33985444 PMCID: PMC8120853 DOI: 10.1186/s12859-021-04171-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/22/2020] [Accepted: 05/05/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) play important roles in multiple biological processes. Identifying LncRNA-protein interactions (LPIs) is key to understanding lncRNA functions. Although some LPIs computational methods have been developed, the LPIs prediction problem remains challenging. How to integrate multimodal features from more perspectives and build deep learning architectures with better recognition performance have always been the focus of research on LPIs. RESULTS We present a novel multichannel capsule network framework to integrate multimodal features for LPI prediction, Capsule-LPI. Capsule-LPI integrates four groups of multimodal features, including sequence features, motif information, physicochemical properties and secondary structure features. Capsule-LPI is composed of four feature-learning subnetworks and one capsule subnetwork. Through comprehensive experimental comparisons and evaluations, we demonstrate that both multimodal features and the architecture of the multichannel capsule network can significantly improve the performance of LPI prediction. The experimental results show that Capsule-LPI performs better than the existing state-of-the-art tools. The precision of Capsule-LPI is 87.3%, which represents a 1.7% improvement. The F-value of Capsule-LPI is 92.2%, which represents a 1.4% improvement. CONCLUSIONS This study provides a novel and feasible LPI prediction tool based on the integration of multimodal features and a capsule network. A webserver ( http://csbg-jlu.site/lpc/predict ) is developed to be convenient for users.
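The relation between the reported precision and F-value can be worked through numerically. The sketch below assumes that "F-value" means the balanced F1 score, the harmonic mean of precision and recall; the abstract does not state this, so the recall it derives is an inference and not a reported result. The function names are hypothetical.

```python
def f1_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(precision, f1_value):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1_value * precision / (2 * precision - f1_value)


# Values quoted in the abstract, expressed as fractions.
precision = 0.873
f_value = 0.922

implied_recall = recall_from_f1(precision, f_value)
print(round(implied_recall, 4))
```

Under that assumption the implied recall is about 0.977, noticeably higher than the precision, so the tool would miss few true interactions at the cost of some false positives.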
Affiliation(s)
- Ying Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Hang Sun
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Shiyao Feng
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Qi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Siyu Han
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Department of Computer Science, Faculty of Engineering, University of Bristol, Bristol, BS8 1UB, UK
- Wei Du
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
|
32
|
Chen Y, Fu X, Li Z, Peng L, Zhuo L. Prediction of lncRNA-Protein Interactions via the Multiple Information Integration. Front Bioeng Biotechnol 2021; 9:647113. [PMID: 33718346 PMCID: PMC7947871 DOI: 10.3389/fbioe.2021.647113] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/29/2020] [Accepted: 01/19/2021] [Indexed: 01/09/2023] Open
Abstract
Long non-coding RNA (lncRNA)-protein interactions play an important role in post-transcriptional gene regulation, such as RNA splicing, translation, signaling, and the development of complex diseases. Research on predicting lncRNA-protein interactions helps uncover the mechanisms by which lncRNAs function and act. Traditional experimental methods for detecting lncRNA-protein interactions are expensive and time-consuming. Therefore, computational methods provide many effective strategies to deal with this problem. In recent years, most computational methods have used only lncRNA-lncRNA or protein-protein similarity information and cannot fully capture all features needed to identify the interactions. In this paper, we propose a novel computational model for lncRNA-protein interaction prediction on the basis of machine learning methods. First, a feature method is proposed for representing the network topological properties of lncRNA and protein interactions. Second, the basic composition features and evolutionary information of proteins, the lncRNA sequence features, and the lncRNA expression profile information are extracted. Finally, the above feature information is fused, and the feature vector is optimized with the recursive feature elimination algorithm. The optimized feature vectors are input to the support vector machine (SVM) model. Experimental results show that the proposed method is effective and accurate in lncRNA-protein interaction prediction.
Affiliation(s)
- Yifan Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
- School of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
- Xiangzheng Fu
- College of Information Science and Engineering, Hunan University, Changsha, China
- Zejun Li
- School of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
- Li Peng
- College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
- Linlin Zhuo
- Department of Mathematics and Information Engineering, Wenzhou University Oujiang College, Wenzhou, China
|
33
|
Shaw D, Chen H, Xie M, Jiang T. DeepLPI: a multimodal deep learning method for predicting the interactions between lncRNAs and protein isoforms. BMC Bioinformatics 2021; 22:24. [PMID: 33461501 PMCID: PMC7814738 DOI: 10.1186/s12859-020-03914-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/15/2020] [Accepted: 11/30/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) regulate diverse biological processes via interactions with proteins. Since the experimental methods to identify these interactions are expensive and time-consuming, many computational methods have been proposed. Although these computational methods have achieved promising prediction performance, they neglect the fact that a gene may encode multiple protein isoforms and different isoforms of the same gene may interact differently with the same lncRNA. RESULTS In this study, we propose a novel method, DeepLPI, for predicting the interactions between lncRNAs and protein isoforms. Our method uses sequence and structure data to extract intrinsic features and expression data to extract topological features. To combine these different data, we adopt a hybrid framework by integrating a multimodal deep learning neural network and a conditional random field. To overcome the lack of known interactions between lncRNAs and protein isoforms, we apply a multiple instance learning (MIL) approach. In our experiment concerning the human lncRNA-protein interactions in the NPInter v3.0 database, DeepLPI improved the prediction performance by 4.7% in terms of AUC and 5.9% in terms of AUPRC over the state-of-the-art methods. Our further correlation analyses between interactive lncRNAs and protein isoforms also illustrated that their co-expression information helped predict the interactions. Finally, we give some examples where DeepLPI was able to outperform the other methods in predicting mouse lncRNA-protein interactions and novel human lncRNA-protein interactions. CONCLUSION Our results demonstrated that the use of isoforms and MIL contributed significantly to the improvement of performance in predicting lncRNA and protein interactions. We believe that such an approach would find more applications in predicting other functional roles of RNAs and proteins.
Affiliation(s)
- Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
|
34
|
Zhou YK, Hu J, Shen ZA, Zhang WY, Du PF. LPI-SKF: Predicting lncRNA-Protein Interactions Using Similarity Kernel Fusions. Front Genet 2020; 11:615144. [PMID: 33362868 PMCID: PMC7758075 DOI: 10.3389/fgene.2020.615144] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/08/2020] [Accepted: 11/16/2020] [Indexed: 01/24/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) play an important role in several biological activities, including transcription, splicing, translation, and some other cellular regulation processes. lncRNAs perform their biological functions by interacting with various proteins. The studies on lncRNA-protein interactions are of great value to the understanding of lncRNA functional mechanisms. In this paper, we proposed a novel model to predict potential lncRNA-protein interactions using the SKF (similarity kernel fusion) and LapRLS (Laplacian regularized least squares) algorithms. We named this method LPI-SKF. Various similarities of both lncRNAs and proteins were integrated into LPI-SKF. LPI-SKF can be applied in predicting potential interactions involving novel proteins or lncRNAs. We obtained an AUROC (area under receiver operating curve) of 0.909 in a 5-fold cross-validation, which outperforms other state-of-the-art methods. A total of 19 out of the top 20 ranked interaction predictions were verified by existing data, which implied that LPI-SKF had great potential in discovering unknown lncRNA-protein interactions accurately. All data and codes of this work can be downloaded from a GitHub repository (https://github.com/zyk2118216069/LPI-SKF).
Affiliation(s)
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, China
|
35
|
Peng L, Liu F, Yang J, Liu X, Meng Y, Deng X, Peng C, Tian G, Zhou L. Probing lncRNA-Protein Interactions: Data Repositories, Models, and Algorithms. Front Genet 2020; 10:1346. [PMID: 32082358 PMCID: PMC7005249 DOI: 10.3389/fgene.2019.01346] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 09/27/2019] [Accepted: 12/09/2019] [Indexed: 12/31/2022] Open
Abstract
Identifying lncRNA-protein interactions (LPIs) is vital to understanding various key biological processes. Wet experiments found a few LPIs, but experimental methods are costly and time-consuming. Therefore, computational methods are increasingly exploited to capture LPI candidates. We introduced relevant data repositories, focused on two types of LPI prediction models: network-based methods and machine learning-based methods. Machine learning-based methods contain matrix factorization-based techniques and ensemble learning-based techniques. To detect the performance of computational methods, we compared parts of LPI prediction models on Leave-One-Out cross-validation (LOOCV) and fivefold cross-validation. The results show that SFPEL-LPI obtained the best performance of AUC. Although computational models have efficiently unraveled some LPI candidates, there are many limitations involved. We discussed future directions to further boost LPI predictive performance.
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Fuxing Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Jialiang Yang
- Department of Sciences, Genesis (Beijing) Co. Ltd., Beijing, China
- Xiaojun Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yajie Meng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiaojun Deng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Cheng Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Geng Tian
- Department of Sciences, Genesis (Beijing) Co. Ltd., Beijing, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
|
36
|
Zhou YK, Shen ZA, Yu H, Luo T, Gao Y, Du PF. Predicting lncRNA-Protein Interactions With miRNAs as Mediators in a Heterogeneous Network Model. Front Genet 2020; 10:1341. [PMID: 32038709 PMCID: PMC6988623 DOI: 10.3389/fgene.2019.01341] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/09/2019] [Accepted: 12/09/2019] [Indexed: 01/20/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles in various biological processes, where lncRNA–protein interactions are usually involved. Therefore, identifying lncRNA–protein interactions is of great significance to understand the molecular functions of lncRNAs. Since the experiments to identify lncRNA–protein interactions are always costly and time consuming, computational methods are developed as alternative approaches. However, existing lncRNA–protein interaction predictors usually require prior knowledge of lncRNA–protein interactions with experimental evidences. Their performances are limited due to the number of known lncRNA–protein interactions. In this paper, we explored a novel way to predict lncRNA–protein interactions without direct prior knowledge. MiRNAs were picked up as mediators to estimate potential interactions between lncRNAs and proteins. By validating our results based on known lncRNA–protein interactions, our method achieved an AUROC (Area Under Receiver Operating Curve) of 0.821, which is comparable to the state-of-the-art methods. Moreover, our method achieved an improved AUROC of 0.852 by further expanding the training dataset. We believe that our method can be a useful supplement to the existing methods, as it provides an alternative way to estimate lncRNA–protein interactions in a heterogeneous network without direct prior knowledge. All data and codes of this work can be downloaded from GitHub (https://github.com/zyk2118216069/LncRNA-protein-interactions-prediction).
Affiliation(s)
- Yuan-Ke Zhou: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Zi-Ang Shen: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Han Yu: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Tao Luo: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yang Gao: School of Medicine, Nankai University, Tianjin, China
- Pu-Feng Du: College of Intelligence and Computing, Tianjin University, Tianjin, China
37
Zhang T, Wang M, Xi J, Li A. LPGNMF: Predicting Long Non-Coding RNA and Protein Interaction Using Graph Regularized Nonnegative Matrix Factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:189-197. [PMID: 30059315 DOI: 10.1109/tcbb.2018.2861009] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/08/2023]
Abstract
Long non-coding RNAs (lncRNA) play crucial roles in a variety of biological processes and complex diseases. Massive studies have indicated that lncRNAs interact with related proteins to exert regulation of cellular biological processes. Because it is time-consuming and expensive to determine lncRNA-protein interaction by experiment, more accurate predictions of interaction by computational methods are imperative. We propose a novel computational approach, predicting lncRNA-protein interaction using graph regularized nonnegative matrix factorization (LPGNMF), to discover unobserved lncRNA-protein association. First, we calculate lncRNA similarity and protein similarity by integrating the lncRNA expression information and gene ontology information. Subsequently, we utilize graph regularized nonnegative matrix factorization framework to predict potential interactions for all lncRNA simultaneously. In the cross validation test, LPGNMF achieves an AUC of 85.2 percent, higher than those of other compared methods. In addition, novel lncRNA-protein interactions detected by LPGNMF are validated by literatures or database. The results indicate that our method is effective to discover potential lncRNA-protein interaction.
38
LPI-BLS: Predicting lncRNA–protein interactions with a broad learning system-based stacked ensemble classifier. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.08.084] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 01/20/2023]
39
Ma Y, He T, Jiang X. Projection-Based Neighborhood Non-Negative Matrix Factorization for lncRNA-Protein Interaction Prediction. Front Genet 2019; 10:1148. [PMID: 31824563 PMCID: PMC6880730 DOI: 10.3389/fgene.2019.01148] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/08/2019] [Accepted: 10/21/2019] [Indexed: 12/25/2022]
Abstract
Many long ncRNAs (lncRNA) make their effort by interacting with the corresponding RNA-binding proteins, and identifying the interactions between lncRNAs and proteins is important to understand the functions of lncRNA. Compared with the time-consuming and laborious experimental methods, more and more computational models are proposed to predict lncRNA-protein interactions. However, few models can effectively utilize the biological network topology of lncRNA (protein) and combine its sequence structure features, and most models cannot effectively predict new proteins (lncRNA) that do not interact with any lncRNA (proteins). In this study, we proposed a projection-based neighborhood non-negative matrix decomposition model (PMKDN) to predict potential lncRNA-protein interactions by integrating multiple biological features of lncRNAs (proteins). First, according to lncRNA (protein) sequences and lncRNA expression profile data, we extracted multiple features of lncRNA (protein). Second, based on protein GO ontology annotation, lncRNA sequences, lncRNA(protein) feature information, and modified lncRNA-protein interaction network, we calculated multiple similarities of lncRNA (protein), and fused them to obtain a more accurate lncRNA(protein) similarity network. Finally, combining the similarity and various feature information of lncRNA (protein), as well as the modified interaction network, we proposed a projection-based neighborhood non-negative matrix decomposition algorithm to predict the potential lncRNA-protein interactions. On two benchmark datasets, PMKDN showed better performance than other state-of-the-art methods for the prediction of new lncRNA-protein interactions, new lncRNAs, and new proteins. Case study further indicates that PMKDN can be used as an effective tool for lncRNA-protein interaction prediction.
Affiliation(s)
- Yingjun Ma: School of Mathematics & Statistics, Central China Normal University, Wuhan, China; Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China
- Tingting He: Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China; School of Computer, Central China Normal University, Wuhan, China
- Xingpeng Jiang: Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China; School of Computer, Central China Normal University, Wuhan, China
40
Chen X, Shi W, Deng L. Prediction of Disease Comorbidity Using HeteSim Scores based on Multiple Heterogeneous Networks. Curr Gene Ther 2019; 19:232-241. [DOI: 10.2174/1566523219666190917155959] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/30/2019] [Revised: 06/14/2019] [Accepted: 06/16/2019] [Indexed: 12/25/2022]
Abstract
Background: Accumulating experimental studies have indicated that disease comorbidity causes additional pain to patients and leads to the failure of standard treatments compared to patients who have a single disease. Therefore, accurate prediction of potential comorbidity is essential to design more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic.
Objective: In this work, we propose PCHS, an effective computational method for predicting disease comorbidity.
Materials and Methods: We utilized the HeteSim measure to calculate the relatedness score for different disease pairs in the global heterogeneous network, which integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions and associations among them. We built the prediction model using the Support Vector Machine (SVM) based on the HeteSim scores.
Results and Conclusion: The results showed that PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC score of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in literatures, indicating the effectiveness of our method.
Affiliation(s)
- Xuegong Chen: School of Computer Science and Engineering, Central South University, Changsha, 410075, China
- Wanwan Shi: School of Computer Science and Engineering, Central South University, Changsha, 410075, China
- Lei Deng: School of Computer Science and Engineering, Central South University, Changsha, 410075, China
41
Zhang H, Ming Z, Fan C, Zhao Q, Liu H. A path-based computational model for long non-coding RNA-protein interaction prediction. Genomics 2019; 112:1754-1760. [PMID: 31639442 DOI: 10.1016/j.ygeno.2019.09.018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/15/2019] [Revised: 09/20/2019] [Accepted: 09/24/2019] [Indexed: 10/25/2022]
Abstract
Recently, lncRNAs have attracted accumulating attentions because more and more experimental researches have shown lncRNA can play critical roles in many biological processes. Predicting potential interactions between lncRNAs and proteins are key to understand the lncRNAs biological functions. But traditional biological experiments are expensive and time-consuming, network similarity methods provide a powerful solution to computationally predict lncRNA-protein interactions. In this work, a novel path-based lncRNA-protein interaction (PBLPI) prediction model is proposed by integrating protein semantic similarity, lncRNA functional similarity, known human lncRNA-protein interactions, and Gaussian interaction profile kernel similarity. PBLPI model utilizes three interlinked sub-graphs to construct a heterogeneous graph, and then infers potential lncRNA-protein interactions through depth-first search algorithm. Consequently, PBLPI achieves reliable performance in the frameworks of 5-fold cross validation (average AUC is 0.9244 and AUPR is 0.6478). In the case study, we use "Mus musculus" data to further validate the reliability of PBLPI method. It is anticipated that PBLPI would become a useful tool to identify potential lncRNA-protein interactions.
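Note: the PBLPI abstract above names Gaussian interaction profile kernel similarity as one of its inputs but does not give the formula. The following is a minimal sketch of the commonly used definition, assuming K(i, j) = exp(-γ‖IP_i − IP_j‖²) with γ set to a bandwidth divided by the mean squared norm of the interaction profiles. The function name `gip_kernel`, the `bandwidth` parameter, and the toy matrix are illustrative assumptions, not taken from the paper.

```python
import math

def gip_kernel(adj, bandwidth=1.0):
    """Gaussian interaction profile kernel between the rows of adj.

    adj[i] is the binary interaction profile of entity i (for example,
    one lncRNA against all proteins). Assumes at least one known interaction.
    """
    n = len(adj)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in row) for row in adj) / n
    gamma = bandwidth / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(adj[i], adj[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Hypothetical toy data: 3 lncRNAs (rows) x 3 proteins (columns).
toy_adj = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
]
toy_kernel = gip_kernel(toy_adj)
```

The resulting matrix is symmetric with ones on the diagonal, so it can be used directly as a similarity network alongside the other similarity measures the abstract lists.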
Affiliation(s)
- Hui Zhang: School of Life Science, Liaoning University, Shenyang 110036, China
- Zhong Ming: National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Chunlong Fan: College of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
- Qi Zhao: College of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
- Hongsheng Liu: School of Life Science, Liaoning University, Shenyang 110036, China; Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang 110036, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang 110036, China
42
Wekesa JS, Luan Y, Chen M, Meng J. A Hybrid Prediction Method for Plant lncRNA-Protein Interaction. Cells 2019; 8:E521. [PMID: 31151273 PMCID: PMC6627874 DOI: 10.3390/cells8060521] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/26/2019] [Revised: 05/22/2019] [Accepted: 05/29/2019] [Indexed: 01/23/2023]
Abstract
Long non-protein-coding RNAs (lncRNAs) identification and analysis are pervasive in transcriptome studies due to their roles in biological processes. In particular, lncRNA-protein interaction has plausible relevance to gene expression regulation and in cellular processes such as pathogen resistance in plants. While lncRNA-protein interaction has been studied in animals, there has yet to be extensive research in plants. In this paper, we propose a novel plant lncRNA-protein interaction prediction method, namely PLRPIM, which combines deep learning and shallow machine learning methods. The selection of an optimal feature subset and subsequent efficient compression are significant challenges for deep learning models. The proposed method adopts k-mer and extracts high-level abstraction sequence-based features using stacked sparse autoencoder. Based on the extracted features, the fusion of random forest (RF) and light gradient boosting machine (LGBM) is used to build the prediction model. The performances are evaluated on Arabidopsis thaliana and Zea mays datasets. Results from experiments demonstrate PLRPIM's superiority compared with other prediction tools on the two datasets. Based on 5-fold cross-validation, we obtain 89.98% and 93.44% accuracy, 0.954 and 0.982 AUC for Arabidopsis thaliana and Zea mays, respectively. PLRPIM predicts potential lncRNA-protein interaction pairs effectively, which can facilitate lncRNA related research including function prediction.
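Note: the PLRPIM abstract above mentions k-mer sequence features without describing how they are computed. The following is a minimal sketch of one common way to build such a feature vector, assuming normalized k-mer frequencies over the RNA alphabet. The function name `kmer_frequencies`, the default `k`, and the example sequence are illustrative assumptions, and the autoencoder and classifier stages of the paper are not shown.

```python
from itertools import product

def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]

# Hypothetical example sequence; with k=2 the vector has 4**2 = 16 entries.
features = kmer_frequencies("AUGGCUAUGG", k=2)
```

A fixed k-mer ordering keeps the vector layout identical across sequences, which is what lets a downstream model treat each position as the same feature.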
Affiliation(s)
- Jael Sanyanda Wekesa: School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China; Department of Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000-00200, Kenya
- Yushi Luan: School of Bioengineering, Dalian University of Technology, Dalian 116023, Liaoning, China
- Ming Chen: College of Life Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jun Meng: School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China
43
Xie G, Wu C, Sun Y, Fan Z, Liu J. LPI-IBNRA: Long Non-coding RNA-Protein Interaction Prediction Based on Improved Bipartite Network Recommender Algorithm. Front Genet 2019; 10:343. [PMID: 31057602 PMCID: PMC6482170 DOI: 10.3389/fgene.2019.00343] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/26/2019] [Accepted: 03/29/2019] [Indexed: 12/26/2022]
Abstract
According to the latest research, lncRNAs (long non-coding RNAs) play a broad and important role in various biological processes by interacting with proteins. However, identifying whether proteins interact with a specific lncRNA through biological experimental methods is difficult, costly, and time-consuming. Thus, many bioinformatics computational methods have been proposed to predict lncRNA-protein interactions. In this paper, we proposed a novel approach called Long non-coding RNA-Protein Interaction Prediction based on Improved Bipartite Network Recommender Algorithm (LPI-IBNRA). In the proposed method, we implemented a two-round resource allocation and eliminated the second-order correlations appropriately on the bipartite network. Experimental results illustrate that LPI-IBNRA outperforms five previous methods, with the AUC values of 0.8932 in leave-one-out cross validation (LOOCV) and 0.8819 ± 0.0052 in 10-fold cross validation, respectively. In addition, case studies on four lncRNAs were carried out to show the predictive power of LPI-IBNRA.
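Note: the LPI-IBNRA abstract above refers to resource allocation on the lncRNA-protein bipartite network. The following is a minimal sketch of the basic two-step resource allocation that bipartite recommender methods build on, assuming a binary interaction matrix. It is not the improved algorithm from the paper: the second allocation round and the removal of second-order correlations are not shown, and the name `resource_allocation_scores` and the toy matrix are illustrative assumptions.

```python
def resource_allocation_scores(adj, target):
    """Score every protein for one lncRNA by two-step resource allocation.

    adj[l][p] is 1 if lncRNA l is known to interact with protein p, else 0.
    """
    n_lnc, n_prot = len(adj), len(adj[0])
    lnc_degree = [sum(row) for row in adj]
    prot_degree = [sum(adj[l][p] for l in range(n_lnc)) for p in range(n_prot)]

    # Initial resource: one unit on each protein linked to the target lncRNA.
    initial = [float(x) for x in adj[target]]

    # Step 1: every protein splits its resource equally among its lncRNAs.
    on_lnc = [
        sum(adj[l][p] * initial[p] / prot_degree[p]
            for p in range(n_prot) if prot_degree[p])
        for l in range(n_lnc)
    ]

    # Step 2: every lncRNA splits what it received equally among its proteins.
    return [
        sum(adj[l][p] * on_lnc[l] / lnc_degree[l]
            for l in range(n_lnc) if lnc_degree[l])
        for p in range(n_prot)
    ]

# Hypothetical toy data: 3 lncRNAs (rows) x 3 proteins (columns).
toy_adj = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
]
scores = resource_allocation_scores(toy_adj, target=0)
```

In this toy case the third protein has no known link to the first lncRNA but still receives a non-zero score through the shared second protein, which is how the scheme ranks candidate interactions.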
Affiliation(s)
- Guobo Xie: School of Computers, Guangdong University of Technology, Guangzhou, China
- Cuiming Wu: School of Computers, Guangdong University of Technology, Guangzhou, China
- Yuping Sun: School of Computers, Guangdong University of Technology, Guangzhou, China
- Zhiliang Fan: School of Computers, Guangdong University of Technology, Guangzhou, China
- Jianghui Liu: Department of Emergency, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
44
Long Noncoding RNA and Protein Interactions: From Experimental Results to Computational Models Based on Network Methods. Int J Mol Sci 2019; 20:ijms20061284. [PMID: 30875752 PMCID: PMC6471543 DOI: 10.3390/ijms20061284] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/27/2019] [Revised: 03/09/2019] [Accepted: 03/11/2019] [Indexed: 01/13/2023]
Abstract
Non-coding RNAs with a length of more than 200 nucleotides are long non-coding RNAs (lncRNAs), which have gained tremendous attention in recent decades. Many studies have confirmed that lncRNAs have important influence in post-transcriptional gene regulation; for example, lncRNAs affect the stability and translation of splicing factor proteins. The mutations and malfunctions of lncRNAs are closely related to human disorders. As lncRNAs interact with a variety of proteins, predicting the interaction between lncRNAs and proteins is a significant way to depth exploration functions and enrich annotations of lncRNAs. Experimental approaches for lncRNA–protein interactions are expensive and time-consuming. Computational approaches to predict lncRNA–protein interactions can be grouped into two broad categories. The first category is based on sequence, structural information and physicochemical property. The second category is based on network method through fusing heterogeneous data to construct lncRNA related heterogeneous network. The network-based methods can capture the implicit feature information in the topological structure of related biological heterogeneous networks containing lncRNAs, which is often ignored by sequence-based methods. In this paper, we summarize and discuss the materials, interaction score calculation algorithms, advantages and disadvantages of state-of-the-art algorithms of lncRNA–protein interaction prediction based on network methods to assist researchers in selecting a suitable method for acquiring more dependable results. All the related different network data are also collected and processed in convenience of users, and are available at https://github.com/HAN-Siyu/APINet/.
45
Shen C, Ding Y, Tang J, Guo F. Multivariate Information Fusion With Fast Kernel Learning to Kernel Ridge Regression in Predicting LncRNA-Protein Interactions. Front Genet 2019; 9:716. [PMID: 30697228 PMCID: PMC6340980 DOI: 10.3389/fgene.2018.00716] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/22/2018] [Accepted: 12/21/2018] [Indexed: 12/31/2022]
Abstract
Long non-coding RNAs (lncRNAs) constitute a large class of transcribed RNA molecules. They have a characteristic length of more than 200 nucleotides which do not encode proteins. They play an important role in regulating gene expression by interacting with the homologous RNA-binding proteins. Due to the laborious and time-consuming nature of wet experimental methods, more researchers should pay great attention to computational approaches for the prediction of lncRNA-protein interaction (LPI). An in-depth literature review in the state-of-the-art in silico investigations, leads to the conclusion that there is still room for improving the accuracy and velocity. This paper propose a novel method for identifying LPI by employing Kernel Ridge Regression, based on Fast Kernel Learning (LPI-FKLKRR). This approach, uses four distinct similarity measures for lncRNA and protein space, respectively. It is remarkable, that we extract Gene Ontology (GO) with proteins, in order to improve the quality of information in protein space. The process of heterogeneous kernels integration, applies Fast Kernel Learning (FastKL) to deal with weight optimization. The extrapolation model is obtained by gaining the ultimate prediction associations, after using Kernel Ridge Regression (KRR). Experimental outcomes show that the ability of modeling with LPI-FKLKRR has extraordinary performance compared with LPI prediction schemes. On benchmark dataset, it has been observed that the best Area Under Precision Recall Curve (AUPR) of 0.6950 is obtained by our proposed model LPI-FKLKRR, which outperforms the integrated LPLNP (AUPR: 0.4584), RWR (AUPR: 0.2827), CF (AUPR: 0.2357), LPIHN (AUPR: 0.2299), and LPBNI (AUPR: 0.3302). Also, combined with the experimental results of a case study on a novel dataset, it is anticipated that LPI-FKLKRR will be a useful tool for LPI prediction.
Affiliation(s)
- Cong Shen: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yijie Ding: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Fei Guo: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
46
Zhu R, Li G, Liu JX, Dai LY, Guo Y. ACCBN: ant-Colony-clustering-based bipartite network method for predicting long non-coding RNA-protein interactions. BMC Bioinformatics 2019; 20:16. [PMID: 30626319 PMCID: PMC6327428 DOI: 10.1186/s12859-018-2586-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/17/2018] [Accepted: 12/17/2018] [Indexed: 12/24/2022]
Abstract
BACKGROUND Long non-coding RNA (lncRNA) studies play an important role in the development, invasion, and metastasis of the tumor. The analysis and screening of the differential expression of lncRNAs in cancer and corresponding paracancerous tissues provides new clues for finding new cancer diagnostic indicators and improving the treatment. Predicting lncRNA-protein interactions is very important in the analysis of lncRNAs. This article proposes an Ant-Colony-Clustering-Based Bipartite Network (ACCBN) method and predicts lncRNA-protein interactions. The ACCBN method combines ant colony clustering and bipartite network inference to predict lncRNA-protein interactions. RESULTS A five-fold cross-validation method was used in the experimental test. The results show that the values of the evaluation indicators of ACCBN on the test set are significantly better after comparing the predictive ability of ACCBN with RWR, ProCF, LPIHN, and LPBNI method. CONCLUSIONS With the continuous development of biology, besides the research on the cellular process, the research on the interaction function between proteins becomes a new key topic of biology. The studies on protein-protein interactions had important implications for bioinformatics, clinical medicine, and pharmacology. However, there are many kinds of proteins, and their functions of interactions are complicated. Moreover, the experimental methods require time to be confirmed because it is difficult to estimate. Therefore, a viable solution is to predict protein-protein interactions efficiently with computers. The ACCBN method has a good effect on the prediction of protein-protein interactions in terms of sensitivity, precision, accuracy, and F1-score.
Affiliation(s)
- Rong Zhu: School of Information Science and Engineering, Central South University, Changsha, 410083, China; School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Guangshun Li: School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu: School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Ling-Yun Dai: School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Ying Guo: School of Information Science and Engineering, Central South University, Changsha, 410083, China
47
Zhang W, Yue X, Tang G, Wu W, Huang F, Zhang X. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions. PLoS Comput Biol 2018; 14:e1006616. [PMID: 30533006 PMCID: PMC6331124 DOI: 10.1371/journal.pcbi.1006616] [Citation(s) in RCA: 108] [Impact Index Per Article: 15.4] [Received: 06/19/2018] [Revised: 01/14/2019] [Accepted: 11/02/2018] [Indexed: 01/12/2023]
Abstract
LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. Existing computational methods utilize multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but features are not available for all lncRNAs or proteins; most of existing methods are not capable of predicting interacting proteins (or lncRNAs) for new lncRNAs (or proteins), which don’t have known interactions. In this paper, we propose the sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts lncRNA sequence-based features and protein sequence-based features. Second, SFPEL-LPI calculates multiple lncRNA-lncRNA similarities and protein-protein similarities by using lncRNA sequences, protein sequences and known lncRNA-protein interactions. Then, SFPEL-LPI combines multiple similarities and multiple features with a feature projection ensemble learning frame. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein associations and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). The case studies demonstrate that our method can find out novel lncRNA-protein interactions, which are confirmed by literature. Finally, we construct a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.
LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. In this paper, we propose a novel computational method “SFPEL-LPI” to predict lncRNA-protein interactions. SFPEL-LPI makes use of lncRNA sequences, protein sequences and known lncRNA-protein associations to extract features and calculate similarities for lncRNAs and proteins, and then combines them with a feature projection ensemble learning frame. SFPEL-LPI can predict unobserved interactions between lncRNAs and proteins, and also can make predictions for new lncRNAs (or proteins), which have no interactions with any proteins (or lncRNAs). SFPEL-LPI produces high-accuracy performances on the benchmark dataset when evaluated by five-fold cross validation, and outperforms state-of-the-art methods. The case studies demonstrate that SFPEL-LPI can find out novel associations, which are confirmed by literature. To facilitate the lncRNA-protein interaction prediction, we develop a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.
Affiliation(s)
- Wen Zhang (corresponding author): College of Informatics, Huazhong Agricultural University, Wuhan, China; School of Computer Science, Wuhan University, Wuhan, China
- Xiang Yue: Department of Computer Science and Engineering, The Ohio State University, Columbus, United States of America
- Guifeng Tang: School of Computer Science, Wuhan University, Wuhan, China
- Wenjian Wu: Electronic Information School, Wuhan University, Wuhan, China
- Feng Huang: School of Computer Science, Wuhan University, Wuhan, China
- Xining Zhang (corresponding author): School of Computer Science, Wuhan University, Wuhan, China
48
Deng L, Wang J, Xiao Y, Wang Z, Liu H. Accurate prediction of protein-lncRNA interactions by diffusion and HeteSim features across heterogeneous network. BMC Bioinformatics 2018; 19:370. [PMID: 30309340 PMCID: PMC6182872 DOI: 10.1186/s12859-018-2390-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 02/05/2018] [Accepted: 09/19/2018] [Indexed: 12/12/2022]
Abstract
Background Identifying the interactions between proteins and long non-coding RNAs (lncRNAs) is of great importance to decipher the functional mechanisms of lncRNAs. However, current experimental techniques for detection of lncRNA-protein interactions are limited and inefficient. Many methods have been proposed to predict protein-lncRNA interactions, but few studies make use of the topological information of heterogenous biological networks associated with the lncRNAs. Results In this work, we propose a novel approach, PLIPCOM, using two groups of network features to detect protein-lncRNA interactions. In particular, diffusion features and HeteSim features are extracted from protein-lncRNA heterogenous network, and then combined to build the prediction model using the Gradient Tree Boosting (GTB) algorithm. Our study highlights that the topological features of the heterogeneous network are crucial for predicting protein-lncRNA interactions. The cross-validation experiments on the benchmark dataset show that PLIPCOM method substantially outperformed previous state-of-the-art approaches in predicting protein-lncRNA interactions. We also prove the robustness of the proposed method on three unbalanced data sets. Moreover, our case studies demonstrate that our method is effective and reliable in predicting the interactions between lncRNAs and proteins. Availability The source code and supporting files are publicly available at: http://denglab.org/PLIPCOM/.
Affiliation(s)
- Lei Deng: School of Software, Central South University, Changsha, 410075, China
- Junqiang Wang: School of Software, Central South University, Changsha, 410075, China
- Yun Xiao: School of Software, Central South University, Changsha, 410075, China
- Zixiang Wang: School of Software, Central South University, Changsha, 410075, China
- Hui Liu: Lab of Information Management, Changzhou University, Jiangsu, 213164, China
49
The Bipartite Network Projection-Recommended Algorithm for Predicting Long Non-coding RNA-Protein Interactions. Molecular Therapy. Nucleic Acids 2018; 13:464-471. [PMID: 30388620 PMCID: PMC6205413 DOI: 10.1016/j.omtn.2018.09.020] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 05/23/2018] [Revised: 09/25/2018] [Accepted: 09/25/2018] [Indexed: 01/23/2023]
Abstract
With the development of science and biotechnology, growing evidence shows that ncRNAs play an important role in key biological processes, including chromatin modification, cell differentiation and proliferation, RNA processing, and human diseases. Moreover, lncRNAs account for the majority of ncRNAs, and they carry out their functions through the RNA-binding proteins they interact with. Experimental verification of lncRNA-protein relationships is well known to be time-consuming and expensive, so many faster and cheaper computational methods have been proposed to uncover potential lncRNA-protein interactions. In this work, we propose a novel computational method, the bipartite network projection recommended algorithm (LPI-BNPRA), to predict potential lncRNA-protein interactions. Our approach is a semi-supervised method based on the lncRNA similarity matrix, the protein similarity matrix, and the lncRNA-protein interaction matrix. Compared with three previous methods under leave-one-out cross-validation, our model gives more reliable results, with an AUC of 0.8754 and an AUPR of 0.6283. We also carry out case studies on the Mus musculus dataset to further demonstrate the reliability of our approach. These results suggest that LPI-BNPRA will be a reliable computational method for uncovering lncRNA-protein interactions in biomedical research.
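As a rough illustration of the idea behind bipartite network projection for recommendation, the sketch below applies the generic two-step resource-allocation scheme to a toy lncRNA-protein interaction matrix. It is not the LPI-BNPRA implementation: it uses only the interaction matrix and ignores the lncRNA and protein similarity matrices mentioned in the abstract, and the function name `projection_scores` and the toy matrix `A` are hypothetical.

```python
def projection_scores(A):
    """Score every lncRNA-protein pair by two-step resource allocation.

    A[i][j] is 1 if lncRNA i is known to interact with protein j, else 0.
    Generic bipartite projection sketch; assumed, not the published method.
    """
    n_rna, n_prot = len(A), len(A[0])
    rna_deg = [sum(row) for row in A]
    prot_deg = [sum(A[i][j] for i in range(n_rna)) for j in range(n_prot)]

    # W[p][q]: share of the resource placed on protein q that reaches
    # protein p after flowing protein -> lncRNA -> protein.
    W = [[0.0] * n_prot for _ in range(n_prot)]
    for p in range(n_prot):
        for q in range(n_prot):
            if prot_deg[q] == 0:
                continue  # isolated protein: nothing to redistribute
            shared = sum(
                A[i][p] * A[i][q] / rna_deg[i]
                for i in range(n_rna)
                if rna_deg[i] > 0
            )
            W[p][q] = shared / prot_deg[q]

    # Each lncRNA starts with one unit of resource on each known partner.
    return [
        [sum(W[p][q] * A[i][q] for q in range(n_prot)) for p in range(n_prot)]
        for i in range(n_rna)
    ]


# Toy data: 3 lncRNAs x 3 proteins (hypothetical).
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
]
scores = projection_scores(A)
# Rank the proteins not yet linked to lncRNA 2 by their projected score.
candidates = sorted(
    (j for j in range(3) if A[2][j] == 0),
    key=lambda j: scores[2][j],
    reverse=True,
)
print(candidates)
```

In this scheme each column of `W` that belongs to a connected protein sums to one, so the resource is only redistributed, never created; unlinked proteins with higher scores are the stronger candidates for a given lncRNA.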
|
50
|
Zhao Q, Zhang Y, Hu H, Ren G, Zhang W, Liu H. IRWNRLPI: Integrating Random Walk and Neighborhood Regularized Logistic Matrix Factorization for lncRNA-Protein Interaction Prediction. Front Genet 2018; 9:239. [PMID: 30023002 PMCID: PMC6040094 DOI: 10.3389/fgene.2018.00239] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 05/14/2018] [Accepted: 06/15/2018] [Indexed: 11/13/2022]
Abstract
Long non-coding RNAs (lncRNAs) play an important role in many biological processes and have attracted widespread attention. Although the precise functions and mechanisms of most lncRNAs are still unknown, it is clear that lncRNAs usually perform their functions by interacting with the corresponding RNA-binding proteins. For example, lncRNA-protein interactions play an important role in post-transcriptional gene regulation, such as splicing, translation, and signaling, and in the progression of complex diseases. However, experimental verification of lncRNA-protein interactions is time-consuming and laborious. In this work, we propose a computational method, named IRWNRLPI, to find potential associations between lncRNAs and proteins. IRWNRLPI integrates two algorithms, random walk and neighborhood regularized logistic matrix factorization, and performs considerably better than either algorithm alone. Moreover, the method is semi-supervised and does not require negative samples. Under leave-one-out cross-validation, we obtain an AUC of 0.9150 and an AUPR of 0.7138, demonstrating its reliable performance. In addition, in a case study on Mus musculus, many lncRNA-protein interactions predicted by our method are confirmed by experiments. This suggests that IRWNRLPI will be a useful bioinformatics resource in biomedical research.
Affiliation(s)
- Qi Zhao: School of Mathematics, Liaoning University, Shenyang, China; Research Center for Computer Simulating and Information Processing of Bio-Macromolecules of Liaoning Province, Shenyang, China
- Yue Zhang: School of Mathematics, Liaoning University, Shenyang, China
- Huan Hu: School of Life Science, Liaoning University, Shenyang, China
- Guofei Ren: School of Information, Liaoning University, Shenyang, China
- Wen Zhang: School of Computer, Wuhan University, Wuhan, China
- Hongsheng Liu: Research Center for Computer Simulating and Information Processing of Bio-Macromolecules of Liaoning Province, Shenyang, China; School of Life Science, Liaoning University, Shenyang, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, China
|