1
Li P, Liu ZP. Structure-Based Prediction of lncRNA-Protein Interactions by Deep Learning. Methods Mol Biol 2025; 2883:363-376. [PMID: 39702717 DOI: 10.1007/978-1-0716-4290-0_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
The interactions between long noncoding RNA (lncRNA) and protein play crucial roles in various biological processes. Computational methods are essential for predicting lncRNA-protein interactions and deciphering their mechanisms. In this chapter, we aim to introduce the fundamental framework for predicting lncRNA-protein interactions based on three-dimensional structure information. With the increasing availability of lncRNA and protein molecular tertiary structures, the feasibility of using deep learning methods for automatic representation and learning has become evident. This chapter outlines the key steps in predicting lncRNA-protein interactions using deep learning, including three common non-Euclidean data representations for lncRNA and proteins, as well as neural networks tailored to these specific data characteristics. We also highlight the advantages and challenges of structure-based prediction of lncRNA-protein interactions with geometric deep learning methods.
Affiliation(s)
- Pengpai Li
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
2
Basu S, Yu J, Kihara D, Kurgan L. Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences. Brief Bioinform 2024; 26:bbaf016. [PMID: 39833102 PMCID: PMC11745544 DOI: 10.1093/bib/bbaf016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 12/24/2024] [Accepted: 01/06/2025] [Indexed: 01/22/2025] Open
Abstract
Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area.
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
- Jing Yu
  - Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, 915 Mitch Daniels Boulevard, West Lafayette, IN 47907, United States
  - Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, United States
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, United States
3
Naha S, Kaur S, Bhattacharya R, Cheemanapalli S, Iyyappan Y. ANPS: machine learning based server for identification of anti-nutritional proteins in plants. Funct Integr Genomics 2024; 24:201. [PMID: 39453508 DOI: 10.1007/s10142-024-01474-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Revised: 10/09/2024] [Accepted: 10/10/2024] [Indexed: 10/26/2024]
Abstract
Anti-nutrient factors are inherently present in almost all major crops and impede the absorption of crucial vitamins and minerals upon human consumption. The anti-nutrients commonly found in food crops include saponins, tannins, lectins, and phytates. Currently, there is a lack of computational servers for identifying proteins that encode anti-nutritional factors in plants. Consequently, this study presents a computational approach aimed at distinguishing between proteins encoding anti-nutritional factors and those providing essential nutrients. In this work, machine learning algorithms have been employed to identify plant-specific anti-nutrient factor proteins from protein sequences by using compositional features. Extreme gradient boosting achieved a five-fold cross-validation training performance of 94.34% AUC-ROC and 94.13% AUC-PR, surpassing other methods such as support vector machine, random forest, and adaptive boosting. These results suggest the proposed approach is highly reliable in predicting plant-specific anti-nutritional factor proteins. The resulting prediction models have led to the development of an online server named ANPS, freely available at https://nipb-bi.icar.gov.in .
Affiliation(s)
- Sanchita Naha
  - Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, Pusa, New Delhi, 110012, India
- Sarvjeet Kaur
  - ICAR-National Institute for Plant Biotechnology, Pusa, New Delhi, 110012, India
- Yuvaraj Iyyappan
  - ICAR-National Institute for Plant Biotechnology, Pusa, New Delhi, 110012, India
4
Yang Y, Zhang J. Ascites-derived hsa-miR-181a-5p serves as a prognostic marker for gastric cancer-associated malignant ascites. BMC Genomics 2024; 25:628. [PMID: 38914980 PMCID: PMC11194912 DOI: 10.1186/s12864-024-10359-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 04/29/2024] [Indexed: 06/26/2024] Open
Abstract
BACKGROUND: Peritoneal carcinomatosis is the main cause of gastric cancer (GC)-related death. We aimed to explore the roles of dysregulated microRNAs (miRNAs) and related immune regulation activities in GC-associated malignant ascites. METHODS: The GSE126399 dataset was downloaded from the GEO database. Differentially expressed miRNAs in GC ascites samples were first screened, and critical miRNAs were further investigated by LASSO (least absolute shrinkage and selection operator) logistic regression and the random forest (RF) algorithm. Receiver operating characteristic curves of the critical miRNAs were also constructed. Moreover, functional analysis and immune cell infiltration associated with differentially expressed mRNAs were further analyzed. After key modules were selected by weighted gene co-expression network analysis, mRNAs related to survival were identified and a transcription factor (TF)-miRNA-mRNA network was constructed. RESULTS: Hsa-miR-181b-5p was confirmed as a critical differentially expressed miRNA in GC ascites. The tumor samples were then divided into high- and low-expression groups by the mean expression level of hsa-miR-181b-5p, and subjects with high hsa-miR-181b-5p levels had better survival outcomes. In total, 197 differentially expressed mRNAs associated with hsa-miR-181b-5p levels were obtained, and these mRNAs were mainly enriched in muscle activity and vascular smooth muscle contraction. Hsa-miR-181b-5p was positively related to activated CD4 T cells and negatively related to eosinophils. Seventeen mRNAs, such as PDK4 and RAMP1, were significantly related to the prognosis of GC. Finally, 75 TF-miRNA-mRNA relationships were obtained, including 15 TFs, hsa-miR-181b-5p, and five mRNAs. CONCLUSION: Our data suggest that the differentially expressed hsa-miR-181b-5p in ascites samples of GC patients may be a valuable prognostic marker and a potential target for therapeutic intervention, which should be validated in the near future.
Affiliation(s)
- Yongchao Yang
  - Department of General Surgery 1, Sunshine Union Hospital, Weifang City, 261072, Shandong Province, China
- Junliang Zhang
  - Department of Emergency Medicine, Sunshine Union Hospital, No. 9000, Yingqian Street, High-tech Zone, Weifang City, 261072, Shandong Province, China
5
Sun C, Feng Y. EPDRNA: A Model for Identifying DNA-RNA Binding Sites in Disease-Related Proteins. Protein J 2024; 43:513-521. [PMID: 38491248 DOI: 10.1007/s10930-024-10183-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/02/2024] [Indexed: 03/18/2024]
Abstract
Protein-DNA and protein-RNA interactions are involved in many biological processes and regulate many cellular functions. Moreover, they are related to many human diseases. To understand the molecular mechanisms of protein-DNA and protein-RNA binding, it is important to identify which residues in the protein sequence bind to DNA and RNA. At present, there are few methods for specifically identifying the DNA- and RNA-binding sites of disease-related proteins. In this study, we combined four machine learning algorithms into an ensemble classifier (EPDRNA) to predict DNA and RNA binding sites in disease-related proteins. The dataset used in the model was collated from the UniProt and PDB databases, and PSSM, physicochemical properties, and amino acid type were used as features. EPDRNA adopted soft voting and achieved a best AUC of 0.73 for DNA binding sites and a best AUC of 0.71 for RNA binding sites in 10-fold cross-validation on the training sets. To further verify the performance of the model, we assessed EPDRNA on the prediction of DNA-binding sites and RNA-binding sites using an independent test dataset. EPDRNA achieved 85% recall and 25% precision on the protein-DNA interaction independent test set, and 82% recall and 27% precision on the protein-RNA interaction independent test set. The online EPDRNA webserver is freely available at http://www.s-bioinformatics.cn/epdrna .
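The soft voting mentioned in this abstract can be illustrated with a minimal sketch. It is not the EPDRNA implementation: the function name `soft_vote`, the 0.5 threshold, and the probability values are assumptions made for illustration only. Each base classifier outputs a per-residue binding probability, the probabilities are averaged, and the average is thresholded.

```python
from typing import List, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average per-residue probabilities across classifiers, then threshold.

    probabilities[k][i] is classifier k's probability that residue i binds.
    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    n_models = len(probabilities)
    n_residues = len(probabilities[0])
    labels = []
    for i in range(n_residues):
        mean_p = sum(model[i] for model in probabilities) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


# Four hypothetical base classifiers scoring three residues.
scores = [
    [0.9, 0.2, 0.95],
    [0.8, 0.1, 0.45],
    [0.7, 0.3, 0.45],
    [0.6, 0.4, 0.45],
]
print(soft_vote(scores))
```

For the third residue, only one of the four classifiers exceeds 0.5, so a majority (hard) vote would reject it, while the averaged probability of about 0.575 accepts it. This is the practical difference between soft and hard voting.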
Affiliation(s)
- CanZhuang Sun
  - College of Science, Inner Mongolia Agriculture University, Hohhot, 010018, People's Republic of China
- YongE Feng
  - College of Science, Inner Mongolia Agriculture University, Hohhot, 010018, People's Republic of China
6
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
7
Wu H, Zhou B, Zhou H, Zhang P, Wang M. Be-1DCNN: a neural network model for chromatin loop prediction based on bagging ensemble learning. Brief Funct Genomics 2023; 22:475-484. [PMID: 37133976 DOI: 10.1093/bfgp/elad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 03/10/2023] [Accepted: 03/29/2023] [Indexed: 05/04/2023] Open
Abstract
The chromatin loops in the three-dimensional (3D) structure of chromosomes are essential for the regulation of gene expression. Despite the fact that high-throughput chromatin capture techniques can identify the 3D structure of chromosomes, chromatin loop detection utilizing biological experiments is arduous and time-consuming. Therefore, a computational method is required to detect chromatin loops. Deep neural networks can form complex representations of Hi-C data and provide the possibility of processing biological datasets. Therefore, we propose a bagging ensemble one-dimensional convolutional neural network (Be-1DCNN) to detect chromatin loops from genome-wide Hi-C maps. First, to obtain accurate and reliable chromatin loops in genome-wide contact maps, the bagging ensemble learning method is utilized to synthesize the prediction results of multiple 1DCNN models. Second, each 1DCNN model consists of three 1D convolutional layers for extracting high-dimensional features from input samples and one dense layer for producing the prediction results. Finally, the prediction results of Be-1DCNN are compared to those of the existing models. The experimental results indicate that Be-1DCNN predicts high-quality chromatin loops and outperforms the state-of-the-art methods using the same evaluation metrics. The source code of Be-1DCNN is available for free at https://github.com/HaoWuLab-Bioinformatics/Be1DCNN.
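The bagging idea described in this abstract can be sketched in a few lines. This is not the Be-1DCNN code: a toy threshold classifier stands in for each 1D CNN, and the names `fit_threshold`, `bagging_fit`, `bagging_predict`, the number of models, and the data are all hypothetical. Each base model is trained on a bootstrap resample of the training set, and the ensemble averages their predictions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature, label); stand-in for a Hi-C feature vector


def fit_threshold(data: Sequence[Sample]) -> Callable[[float], float]:
    """Toy base learner (stand-in for one 1D CNN): cut halfway between class means.

    Assumes both classes are present and positives have larger feature values.
    """
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x >= cut else 0.0


def bagging_fit(data: Sequence[Sample], n_models: int = 5,
                seed: int = 0) -> List[Callable[[float], float]]:
    """Train each base model on a bootstrap resample (sampling with replacement)."""
    if {y for _, y in data} != {0, 1}:
        raise ValueError("training data must contain both classes")
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        boot = rng.choices(data, k=len(data))
        if {y for _, y in boot} != {0, 1}:
            continue  # redraw resamples that lost a class
        models.append(fit_threshold(boot))
    return models


def bagging_predict(models: Sequence[Callable[[float], float]], x: float) -> float:
    """Ensemble score: mean of the base models' predictions."""
    return sum(m(x) for m in models) / len(models)


train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(train)
print(bagging_predict(models, 0.95), bagging_predict(models, 0.05))
```

Because every base model sees a slightly different resample, their decision boundaries differ, and averaging them tends to reduce the variance of the final prediction.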
Affiliation(s)
- Hao Wu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
  - School of Software, Shandong University, Jinan, 250101 Shandong, China
- Bing Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Haoru Zhou
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Pengyu Zhang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
- Meili Wang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100 Shaanxi, China
8
Shen Z, Liu W, Zhao S, Zhang Q, Wang S, Yuan L. Nucleotide-level prediction of CircRNA-protein binding based on fully convolutional neural network. Front Genet 2023; 14:1283404. [PMID: 37867600 PMCID: PMC10587422 DOI: 10.3389/fgene.2023.1283404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 09/21/2023] [Indexed: 10/24/2023] Open
Abstract
Introduction: CircRNA-protein binding plays a critical role in complex biological activity and disease. Various deep learning-based algorithms have been proposed to identify CircRNA-protein binding sites. These methods predict at the sequence level whether a CircRNA sequence includes protein binding sites, and primarily concentrate on analysing the sequence specificity of CircRNA-protein binding. In terms of model performance, these methods are unsatisfactory at accurately predicting motif sites that have special functions in gene expression. Methods: In this study, building on deep learning models that implement pixel-level binary classification in computer vision, we treated CircRNA-protein binding site prediction as a nucleotide-level binary classification task and used a fully convolutional neural network to identify CircRNA-protein binding motif sites (CPBFCN). Results: CPBFCN provides a new path to predict CircRNA motifs. Using the MEME tool and existing CircRNA-related and protein-related databases, we analysed the functions of the motifs discovered by CPBFCN. We also investigated the correlation between CircRNA sponge and motif distribution. Furthermore, by comparing the motif distribution with different input sequence lengths, we found that some motifs in the flanking sequences of the CircRNA-protein binding region may contribute to CircRNA-protein binding. Conclusion: This study contributes to identifying circRNA-protein binding and helps in understanding the role of circRNA-protein binding in gene expression regulation.
Affiliation(s)
- Zhen Shen
  - School of Computer and Software, Nanyang Institute of Technology, Nanyang, Henan, China
- Wei Liu
  - School of Computer and Software, Nanyang Institute of Technology, Nanyang, Henan, China
- ShuJun Zhao
  - School of Computer and Software, Nanyang Institute of Technology, Nanyang, Henan, China
- QinHu Zhang
  - EIT Institute for Advanced Study, Ningbo, Zhejiang, China
- SiGuo Wang
  - EIT Institute for Advanced Study, Ningbo, Zhejiang, China
- Lin Yuan
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
9
Agarwal A, Kant S, Bahadur RP. Efficient mapping of RNA-binding residues in RNA-binding proteins using local sequence features of binding site residues in protein-RNA complexes. Proteins 2023; 91:1361-1379. [PMID: 37254800 DOI: 10.1002/prot.26528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 04/13/2023] [Accepted: 05/02/2023] [Indexed: 06/01/2023]
Abstract
Protein-RNA interactions play vital roles in a plethora of biological processes such as regulation of gene expression, protein synthesis, mRNA processing and biogenesis. Identification of RNA-binding residues (RBRs) in proteins is essential to understand RNA-mediated protein functioning, to perform site-directed mutagenesis and to develop novel targeted drug therapies. Moreover, the extensive gap between sequence and structural data restricts the identification of binding sites in unsolved structures. However, efficient use of computational methods that require only sequence to identify binding residues can bridge this huge sequence-structure gap. In this study, we have extensively studied the protein-RNA interface in known RNA-binding proteins (RBPs). We find that the interface is highly enriched in basic and polar residues, with Gly being the most common interface neighbor. We investigated several amino acid features and developed a method to predict putative RBRs from amino acid sequence. We have implemented a balanced random forest (BRF) classifier with local residue features of protein sequences for prediction. With 5-fold cross-validation, the sequence pattern derived dipeptide composition based BRF model (DCP-BRF) resulted in an accuracy of 87.9%, specificity of 88.8%, sensitivity of 82.2%, Matthews correlation coefficient of 0.60 and AUC of 0.93, performing better than a few existing methods. We further validated our prediction model on known human RBPs through RBR prediction and could map ~54% of them. Further, knowledge of binding site preferences obtained from computational predictions combined with experimental validations of potential RNA binding sites can enhance our understanding of protein-RNA interactions. This may serve to accelerate investigations on functional roles of many novel RBPs.
Affiliation(s)
- Ankita Agarwal
  - School of Bio Science, Indian Institute of Technology Kharagpur, Kharagpur, India
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Shri Kant
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit Prasad Bahadur
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
10
Li P, Liu ZP. GeoBind: segmentation of nucleic acid binding interface on protein surface with geometric deep learning. Nucleic Acids Res 2023; 51:e60. [PMID: 37070217 PMCID: PMC10250245 DOI: 10.1093/nar/gkad288] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/09/2023] [Revised: 03/21/2023] [Accepted: 04/06/2023] [Indexed: 04/19/2023] Open
Abstract
Unveiling the nucleic acid binding sites of a protein helps reveal its regulatory functions in vivo. Current methods encode protein sites from the handcrafted features of their local neighbors and recognize them via classification, which limits their expressive ability. Here, we present GeoBind, a geometric deep learning method for predicting nucleic acid binding sites on the protein surface in a segmentation manner. GeoBind takes the whole point cloud of the protein surface as input and learns high-level representations based on the aggregation of neighboring points in local reference frames. Testing GeoBind on benchmark datasets, we demonstrate that GeoBind is superior to state-of-the-art predictors. Specific case studies are performed to show the powerful ability of GeoBind to explore molecular surfaces when deciphering proteins with multimer formation. To show the versatility of GeoBind, we further extend it to five other types of ligand binding site prediction tasks and achieve competitive performance.
Affiliation(s)
- Pengpai Li
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
11
Zhang F, Li M, Zhang J, Kurgan L. HybridRNAbind: prediction of RNA interacting residues across structure-annotated and disorder-annotated proteins. Nucleic Acids Res 2023; 51:e25. [PMID: 36629262 PMCID: PMC10018345 DOI: 10.1093/nar/gkac1253] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/14/2022] [Revised: 11/22/2022] [Accepted: 12/15/2022] [Indexed: 01/12/2023] Open
Abstract
The sequence-based predictors of RNA-binding residues (RBRs) are trained on either structure-annotated or disorder-annotated binding regions. A recent study of predictors of protein-binding residues shows that they are plagued by high levels of cross-predictions (protein binding residues are predicted as nucleic acid binding) and that structure-trained predictors perform poorly for the disorder-annotated regions and vice versa. Consequently, we analyze a representative set of the structure and disorder trained predictors of RBRs to comprehensively assess quality of their predictions. Our empirical analysis that relies on a new and low-similarity benchmark dataset reveals that the structure-trained predictors of RBRs perform well for the structure-annotated proteins while the disorder-trained predictors provide accurate results for the disorder-annotated proteins. However, these methods work only modestly well on the opposite types of annotations, motivating the need for new solutions. Using an empirical approach, we design HybridRNAbind meta-model that generates accurate predictions and low amounts of cross-predictions when tested on data that combines structure and disorder-annotated RBRs. We release this meta-model as a convenient webserver which is available at https://www.csuligroup.com/hybridRNAbind/.
Affiliation(s)
- Fuhao Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
12
Sun Z, Zheng S, Zhao H, Niu Z, Lu Y, Pan Y, Yang Y. To Improve Prediction of Binding Residues With DNA, RNA, Carbohydrate, and Peptide Via Multi-Task Deep Neural Networks. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3735-3743. [PMID: 34637380 DOI: 10.1109/tcbb.2021.3118916] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
MOTIVATION: The interactions of proteins with DNA, RNA, peptides, and carbohydrates play key roles in various biological processes. Studies of uncharacterized protein-molecule interactions could be aided by accurate predictions of the residues that bind partner molecules. However, existing methods for predicting binding residues on proteins remain relatively inaccurate because of the limited number of complex structures in databases. As different types of molecules partially share chemical mechanisms, the predictions for each molecular type should benefit from binding information for the other molecule types. RESULTS: In this study, we employed a multi-task deep learning strategy to develop a new sequence-based method, named MTDsite, for simultaneously predicting binding residues/sites for multiple important molecule types. By combining four training sets for DNA-, RNA-, peptide-, and carbohydrate-binding proteins, our method yielded accurate and robust predictions with AUC values of 0.852, 0.836, 0.758, and 0.776 on their respective independent test sets, which are 0.52 to 6.6% better than other state-of-the-art methods. To the best of our knowledge, this is the first method to use a multi-task framework to predict multiple molecular binding sites simultaneously.
13
Bheemireddy S, Sandhya S, Srinivasan N, Sowdhamini R. Computational tools to study RNA-protein complexes. Front Mol Biosci 2022; 9:954926. [PMID: 36275618 PMCID: PMC9585174 DOI: 10.3389/fmolb.2022.954926] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/27/2022] [Accepted: 09/20/2022] [Indexed: 11/19/2022] Open
Abstract
RNA is a key player in many cellular processes such as signal transduction, replication, transport, cell division, transcription, and translation. These diverse functions are accomplished through interactions of RNA with proteins. However, protein–RNA interactions are still poorly understood in contrast to protein–protein and protein–DNA interactions. This knowledge gap can be attributed to the limited availability of protein–RNA structures along with the experimental difficulties in studying these complexes. Recent progress in computational resources has expanded the number of tools available for studying protein–RNA interactions at various molecular levels. These include tools for predicting interacting residues from primary sequences, modelling protein–RNA complexes, predicting hotspots in these complexes, and gaining insights into the dynamics of their interactions. Each of these tools has its strengths and limitations, which makes it important to select an optimal approach for the question of interest. Here we present a mini review of computational tools to study different aspects of protein–RNA interactions, with a focus on overall application, development of the field, and future perspectives.
Affiliation(s)
- Sneha Bheemireddy
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Sankaran Sandhya
  - Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
  - *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
- Ramanathan Sowdhamini
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
  - National Centre for Biological Sciences, TIFR, GKVK Campus, Bangalore, India
  - Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
  - *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
14
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a challenge of main interest in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are starting to be available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods, developed in this broad research area, highlighting feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| |
|
15
|
Li P, Liu ZP. PST-PRNA: prediction of RNA-binding sites using protein surface topography and deep learning. Bioinformatics 2022; 38:2162-2168. [PMID: 35150250 DOI: 10.1093/bioinformatics/btac078] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/05/2021] [Revised: 01/20/2022] [Accepted: 02/05/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Protein-RNA interactions play essential roles in many biological processes, including pre-mRNA processing, post-transcriptional gene regulation and RNA degradation. Accurate identification of binding sites on RNA-binding proteins (RBPs) is important for functional annotation and site-directed mutagenesis. Experimental assays to sparse RBPs are precise and convincing but also costly and time consuming. Therefore, flexible and reliable computational methods are required to recognize RNA-binding residues. RESULTS: In this work, we propose PST-PRNA, a novel model for predicting RNA-binding sites (PRNA) based on protein surface topography (PST). Taking full advantage of the 3D structural information of protein, PST-PRNA creates representative topography images of the entire protein surface by mapping it onto a unit spherical surface. Four kinds of descriptors are encoded to represent residues on the surface. Then, the potential features are integrated and optimized by using deep learning models. We compile a comprehensive non-redundant RBP dataset to train and test PST-PRNA using 10-fold cross-validation. Numerous experiments demonstrate that PST-PRNA successfully learns the latent structural information of the protein surface. On the non-redundant dataset with sequence identity of 0.3, PST-PRNA achieves an area under the receiver operating characteristic curve (AUC) value of 0.860 and a Matthews correlation coefficient value of 0.420. Furthermore, we construct a completely independent test dataset for justification and comparison. PST-PRNA achieves an AUC value of 0.913 on the independent dataset, which is superior to the other state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: The code and data are available at https://www.github.com/zpliulab/PST-PRNA. A web server is freely available at http://www.zpliulab.cn/PSTPRNA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pengpai Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| |
|
16
|
PRIP: A Protein-RNA Interface Predictor Based on Semantics of Sequences. Life (Basel) 2022; 12:life12020307. [PMID: 35207594 PMCID: PMC8879494 DOI: 10.3390/life12020307] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/05/2022] [Revised: 01/28/2022] [Accepted: 02/04/2022] [Indexed: 01/08/2023]
Abstract
RNA–protein interactions play an indispensable role in many biological processes. Growing evidence has indicated that aberration of the RNA–protein interaction is associated with many serious human diseases. The precise and quick detection of RNA–protein interactions is crucial to finding new functions and to uncovering the mechanism of interactions. Although many methods have been presented to recognize RNA-binding sites, there is much room left for the improvement of predictive accuracy. We present a sequence semantics-based method (called PRIP) for predicting RNA-binding interfaces. The PRIP extracted semantic embedding by pre-training the Word2vec with the corpus. Extreme gradient boosting was employed to train a classifier. The PRIP obtained a SN of 0.73 over the five-fold cross validation and a SN of 0.67 over the independent test, outperforming the state-of-the-art methods. Compared with other methods, this PRIP learned the hidden relations between words in the context. The analysis of the semantics relationship implied that the semantics of some words were specific to RNA-binding interfaces. This method is helpful to explore the mechanism of RNA–protein interactions from a semantics point of view.
|
17
|
A comparative analysis of machine learning classifiers for predicting protein-binding nucleotides in RNA sequences. Comput Struct Biotechnol J 2022; 20:3195-3207. [PMID: 35832617 PMCID: PMC9249596 DOI: 10.1016/j.csbj.2022.06.036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/23/2022] [Revised: 06/14/2022] [Accepted: 06/14/2022] [Indexed: 11/24/2022]
Abstract
RNA are master players in various cellular and biological processes and RNA-protein interactions are vital for proper functioning of cellular machineries. Knowledge of binding sites is crucial to decipher their functional implications. RNA NC-triplet and NC-quartet features could give reasonably high performance. RF model outperformed other machine learning classifiers with 85% accuracy and 0.93 AUC and performed better than few existing methods. An online webserver “Nucpred” is developed with trained model and freely accessible for scientific community.
RNA-protein interactions play vital roles in driving the cellular machineries. Despite significant involvement in several biological processes, the underlying molecular mechanism of RNA-protein interactions is still elusive. This may be due to the experimental difficulties in solving co-crystallized RNA-protein complexes. Inherent flexibility of RNA molecules to adopt different conformations makes them functionally diverse. Their interactions with protein have implications in RNA disease biology. Thus, study of binding interfaces can provide a mechanistic insight of the molecular functioning and aberrations caused due to altered interactions. Moreover, high-throughput sequencing technologies have generated huge sequence data compared to available structural data of RNA-protein complexes. In such a scenario, efficient computational algorithms are required for identification of protein-binding interfaces of RNA in the absence of known structures. We have investigated several machine learning classifiers and various features derived from nucleotide sequences to identify protein-binding nucleotides in RNA. We achieve best performance with nucleotide-triplet and nucleotide-quartet feature-based random forest models. An overall accuracy of 84.8%, sensitivity of 83.2%, specificity of 86.1%, MCC of 0.70 and AUC of 0.93 is achieved. We have further implemented the developed models in a user-friendly webserver “Nucpred”, which is freely accessible at “http://www.csb.iitkgp.ac.in/applications/Nucpred/index”.
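As a rough illustration of the nucleotide-triplet and nucleotide-quartet features mentioned above, the sketch below computes k-mer frequencies for an RNA sequence. The abstract does not define the features precisely, so treating them as normalized k-mer composition is an assumption, and the function name `kmer_frequencies` is hypothetical rather than part of Nucpred.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the normalized k-mer composition of an RNA sequence.

    Assumption: features are plain k-mer frequencies (k=3 for triplets,
    k=4 for quartets); the published feature definition may differ.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases
            counts[window] += 1
            total += 1
    # Fixed ordering (lexicographic over the alphabet) gives 4**k features.
    return [counts[m] / total if total else 0.0 for m in kmers]
```

A vector like this (64 values for triplets, 256 for quartets) could then be passed to any classifier, such as a random forest.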
|
18
|
Oldfield CJ, Peng Z, Kurgan L. Disordered RNA-Binding Region Prediction with DisoRDPbind. Methods Mol Biol 2021; 2106:225-239. [PMID: 31889261 DOI: 10.1007/978-1-0716-0231-7_14] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
RNA chaperone activity is one of the many functions of intrinsically disordered regions (IDRs). IDRs function without the prerequisite of a stable structure. Instead, their functions arise from structural ensembles. A common theme in IDR function is molecular recognition; IDRs mediate interactions with other proteins, RNA, and DNA. Many computational methods are available to predict IDRs from protein sequence, but relatively few are available for predicting IDR functions. Available methods primarily focus on protein-protein interactions. DisoRDPbind was developed to predict several protein functions including interactions with RNA. This method is available as a user-friendly web interface, located at http://biomine.cs.vcu.edu/servers/DisoRDPbind/ . The development and architecture of DisoRDPbind is briefly presented, and its accuracy relative to other RNA-binding residue predictors is discussed. We explain usage of the web interface in detail and provide an example of prediction results and interpretation. While DisoRDPbind does not identify RNA chaperones directly, we provide a case study of an RNA chaperone, HCV core protein, as an example of the method's utility in the study of RNA chaperones.
Affiliation(s)
| | - Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, People's Republic of China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
| |
|
19
|
Ju R, Zhou P, Wen S, Wei W, Xue Y, Huang X, Yang X. 3D-CNN-SPP: A Patient Risk Prediction System From Electronic Health Records via 3D CNN and Spatial Pyramid Pooling. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2021. [DOI: 10.1109/tetci.2019.2960474] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
|
20
|
Wang Y, Zhou M, Zou Q, Xu L. Machine learning for phytopathology: from the molecular scale towards the network scale. Brief Bioinform 2021; 22:6204793. [PMID: 33787847 DOI: 10.1093/bib/bbab037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/15/2020] [Revised: 01/09/2021] [Accepted: 01/26/2021] [Indexed: 01/16/2023]
Abstract
With the increasing volume of high-throughput sequencing data from a variety of omics techniques in the field of plant-pathogen interactions, sorting, retrieving, processing and visualizing biological information have become a great challenge. Within the explosion of data, machine learning offers powerful tools to process these complex omics data by various algorithms, such as Bayesian reasoning, support vector machine and random forest. Here, we introduce the basic frameworks of machine learning in dissecting plant-pathogen interactions and discuss the applications and advances of machine learning in plant-pathogen interactions from molecular to network biology, including the prediction of pathogen effectors, plant disease resistance protein monitoring and the discovery of protein-protein networks. The aim of this review is to provide a summary of advances in plant defense and pathogen infection and to indicate the important developments of machine learning in phytopathology.
Affiliation(s)
- Yansu Wang
- Postdoctoral Innovation Practice Base, Shenzhen Polytechnic, China
| | | | - Quan Zou
- University of Electronic Science and Technology of China
| | - Lei Xu
- Shenzhen Polytechnic, China
| |
|
21
|
Yang C, Ding Y, Meng Q, Tang J, Guo F. Granular multiple kernel learning for identifying RNA-binding protein residues via integrating sequence and structure information. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05573-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
22
|
Zhang J, Chen Q, Liu B. NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning. Brief Bioinform 2021; 22:6102667. [PMID: 33454744 DOI: 10.1093/bib/bbaa397] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 10/01/2020] [Revised: 11/05/2020] [Accepted: 12/03/2020] [Indexed: 01/01/2023]
Abstract
The interactions between proteins and nucleic acid sequences play many important roles in gene expression and some cellular activities. Accurate prediction of the nucleic acid binding residues in proteins will facilitate research on protein functions, gene expression, drug design, etc. In this regard, several computational methods have been proposed to predict the nucleic acid binding residues in proteins. However, these methods cannot satisfactorily measure the global interactions among the residues along the protein. Furthermore, these methods suffer from the cross-prediction problem, and new strategies should be explored to solve it. In this study, a new computational method called NCBRPred was proposed to predict the nucleic acid binding residues based on the multilabel sequence labeling model. NCBRPred uses bidirectional Gated Recurrent Units (BiGRUs) to capture the global interactions among the residues, and treats this task as a multilabel learning task. Experimental results on three widely used benchmark datasets and an independent dataset showed that NCBRPred achieved higher predictive results with lower cross-prediction, outperforming 10 existing state-of-the-art predictors. The web-server and a stand-alone package of NCBRPred are freely available at http://bliulab.net/NCBRPred. It is anticipated that NCBRPred will become a very useful tool for identifying nucleic acid binding residues.
Affiliation(s)
- Jun Zhang
- Computer Science and Technology with Harbin Institute of Technology, Shenzhen, China
| | - Qingcai Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
| |
|
23
|
Liu Y, Gong W, Yang Z, Li C. SNB-PSSM: A spatial neighbor-based PSSM used for protein-RNA binding site prediction. J Mol Recognit 2021; 34:e2887. [PMID: 33442949 DOI: 10.1002/jmr.2887] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/21/2020] [Revised: 12/22/2020] [Accepted: 12/23/2020] [Indexed: 01/09/2023]
Abstract
Protein-RNA interactions play essential roles in a wide variety of biological processes. Recognition of RNA-binding residues on proteins has been a challenging problem. Most methods utilize the position-specific scoring matrix (PSSM). It has been found that considering the evolutionary information of sequence-neighboring residues can improve the prediction. In this work, we introduce a novel method, SNB-PSSM (spatial neighbor-based PSSM), combined with the structure window scheme, where the evolutionary information of spatially neighboring residues is considered. The results show that our method consistently outperforms the standard and smoothed PSSM methods. Tested on multiple datasets, this approach shows an encouraging performance compared with RNABindRPlus, BindN+, PPRInt, xypan, Predict_RBP, SpaPF, PRNA, and KYG, although it is inferior to RNAProSite, RBscore, and aaRNA. In addition, since our method is not sensitive to protein structure changes, it can be applied well to binding site predictions on modeled structures. Thus, the result also suggests that the evolution of binding sites is spatially cooperative. The proposed method, as an effective tool for considering evolutionary information, can be widely used for nucleic acid-/protein-binding site prediction and functional motif finding.
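The idea of pooling evolutionary information over spatial rather than sequence neighbors can be sketched as follows. This is a minimal illustration only: the abstract does not give the neighbor definition or weighting, so the distance cutoff, the unweighted mean, and the function name `spatial_neighbor_pssm` are all assumptions rather than the published SNB-PSSM procedure.

```python
import math


def spatial_neighbor_pssm(pssm, coords, cutoff=10.0):
    """Average each residue's PSSM row with the rows of residues within `cutoff`.

    pssm   : list of per-residue score rows (all the same width)
    coords : list of (x, y, z) tuples, one representative atom per residue
    cutoff : hypothetical distance threshold in angstroms (an assumption)
    """
    n = len(pssm)
    smoothed = []
    for i in range(n):
        # A residue is always its own neighbor (distance 0), so the list is never empty.
        neighbors = [j for j in range(n)
                     if math.dist(coords[i], coords[j]) <= cutoff]
        width = len(pssm[i])
        smoothed.append([
            sum(pssm[j][c] for j in neighbors) / len(neighbors)
            for c in range(width)
        ])
    return smoothed
```

Replacing the distance test with a fixed sequence window would give the conventional smoothed PSSM that the abstract uses as a comparison.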
Affiliation(s)
- Yang Liu
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
| | - Weikang Gong
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
| | - Zhen Yang
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
| | - Chunhua Li
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China
| |
|
24
|
Abstract
Biological processes are often mediated by complexes formed between proteins and various biomolecules. The 3D structures of such protein-biomolecule complexes provide insights into the molecular mechanism of their action. The structure of these complexes can be predicted by various computational methods. Choosing an appropriate method for modelling depends on the category of biomolecule that a protein interacts with and the availability of structural information about the protein and its interacting partner. We intend for the contents of this chapter to serve as a guide as to what software would be the most appropriate for the type of data at hand and the kind of 3D complex structure required. Particularly, we have dealt with protein-small molecule ligand, protein-peptide, protein-protein, and protein-nucleic acid interactions. Most, if not all, model building protocols perform some sampling and scoring. Typically, several alternate conformations and configurations of the interactors are sampled. Each such sample is then scored for optimization. To boost the confidence in these predicted models, their assessment using other independent scoring schemes besides the inbuilt/default ones would prove to be helpful. This chapter also lists such software and serves as a guide to gauge the fidelity of modelled structures of biomolecular complexes.
|
25
|
Wang H, Zhao Y. RBinds: A user-friendly server for RNA binding site prediction. Comput Struct Biotechnol J 2020; 18:3762-3765. [PMID: 34136090 PMCID: PMC8164131 DOI: 10.1016/j.csbj.2020.10.043] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/17/2020] [Revised: 10/27/2020] [Accepted: 10/31/2020] [Indexed: 12/03/2022]
Abstract
RNA performs various biological functions by interacting with other molecules. The knowledge of RNA binding sites is essential for the understanding of RNA-protein or RNA-ligand complex structures and their mechanisms. However, the RNA binding site prediction study requires tedious programming scripts and manual handling. One user-friendly bioinformatics tool for RNA binding site prediction has been missing. This limitation motivated us to develop the RBinds, a user-friendly web server, to predict the RNA binding site using a simple graphical user interface. Some advanced features implemented in RBinds are (1) transforming the RNA structure to a network automatically; (2) analyzing the structural network properties to predict binding site; (3) constructing one annotated force-directed network; (4) providing a visualization tool for users to scale and rotate the structure; (5) offering the related tools to predict or simulate RNA structures. RBinds web server is a reliable and user-friendly tool and facilitates the RNA binding site study without installing programs locally. RBinds is freely accessible at http://zhaoserver.com.cn/RBinds/RBinds.html.
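Features (1) and (2) above, turning an RNA structure into a network and analyzing its properties, can be sketched as below. This is not the RBinds implementation: the contact cutoff, the use of one coordinate per nucleotide, the choice of closeness centrality as the network property, and both function names are assumptions made for illustration.

```python
import math
from collections import deque


def contact_network(coords, cutoff=8.0):
    """Build an undirected graph linking nucleotides closer than `cutoff`.

    coords : list of (x, y, z) tuples, one per nucleotide (assumed input)
    cutoff : hypothetical contact distance in angstroms
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def closeness(adj, source):
    """Closeness centrality of `source` within its connected component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest path lengths
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

Ranking nucleotides by a centrality score like this is one plausible way to shortlist candidate binding positions from the structural network.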
Affiliation(s)
- Huiwen Wang
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
| | - Yunjie Zhao
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
| |
|
26
|
Wang K, Hu G, Wu Z, Su H, Yang J, Kurgan L. Comprehensive Survey and Comparative Assessment of RNA-Binding Residue Predictions with Analysis by RNA Type. Int J Mol Sci 2020; 21:E6879. [PMID: 32961749 PMCID: PMC7554811 DOI: 10.3390/ijms21186879] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/16/2020] [Revised: 09/15/2020] [Accepted: 09/17/2020] [Indexed: 02/07/2023]
Abstract
With close to 30 sequence-based predictors of RNA-binding residues (RBRs), this comparative survey aims to help with understanding and selection of the appropriate tools. We discuss past reviews on this topic, survey a comprehensive collection of predictors, and comparatively assess six representative methods. We provide a novel and well-designed benchmark dataset and we are the first to report and compare protein-level and datasets-level results, and to contextualize performance to specific types of RNAs. The methods considered here are well-cited and rely on machine learning algorithms on occasion combined with homology-based prediction. Empirical tests reveal that they provide relatively accurate predictions. Virtually all methods perform well for the proteins that interact with rRNAs, some generate accurate predictions for mRNAs, snRNA, SRP and IRES, while proteins that bind tRNAs are predicted poorly. Moreover, except for DRNApred, they confuse DNA and RNA-binding residues. None of the six methods consistently outperforms the others when tested on individual proteins. This variable and complementary protein-level performance suggests that users should not rely on applying just the single best dataset-level predictor. We recommend that future work should focus on the development of approaches that facilitate protein-level selection of accurate predictors and the consensus-based prediction of RBRs.
Affiliation(s)
- Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China; (K.W.); (Z.W.); (H.S.); (J.Y.)
| | - Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China;
| | - Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China; (K.W.); (Z.W.); (H.S.); (J.Y.)
| | - Hong Su
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China; (K.W.); (Z.W.); (H.S.); (J.Y.)
| | - Jianyi Yang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China; (K.W.); (Z.W.); (H.S.); (J.Y.)
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
27
|
Yang H, Deng Z, Pan X, Shen HB, Choi KS, Wang L, Wang S, Wu J. RNA-binding protein recognition based on multi-view deep feature and multi-label learning. Brief Bioinform 2020; 22:5893431. [PMID: 32808039 DOI: 10.1093/bib/bbaa174] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/06/2020] [Revised: 06/17/2020] [Accepted: 07/09/2020] [Indexed: 12/28/2022]
Abstract
RNA-binding protein (RBP) is a class of proteins that bind to and accompany RNAs in regulating biological processes. An RBP may have multiple target RNAs, and its aberrant expression can cause multiple diseases. Methods have been designed to predict whether a specific RBP can bind to an RNA and the position of the binding site using binary classification model. However, most of the existing methods do not take into account the binding similarity and correlation between different RBPs. While methods employing multiple labels and Long Short Term Memory Network (LSTM) are proposed to consider binding similarity between different RBPs, the accuracy remains low due to insufficient feature learning and multi-label learning on RNA sequences. In response to this challenge, the concept of RNA-RBP Binding Network (RRBN) is proposed in this paper to provide theoretical support for multi-label learning to identify RBPs that can bind to RNAs. It is experimentally shown that the RRBN information can significantly improve the prediction of unknown RNA-RBP interactions. To further improve the prediction accuracy, we present the novel computational method iDeepMV which integrates multi-view deep learning technology under the multi-label learning framework. iDeepMV first extracts data from the views of amino acid sequence and dipeptide component based on the RNA sequences as the original view. Deep neural network models are then designed for the respective views to perform deep feature learning. The extracted deep features are fed into multi-label classifiers which are trained with the RNA-RBP interaction information for the three views. Finally, a voting mechanism is designed to make comprehensive decision on the results of the multi-label classifiers. Our experimental results show that the prediction performance of iDeepMV, which combines multi-view deep feature learning models with RNA-RBP interaction information, is significantly better than that of the state-of-the-art methods. 
iDeepMV is freely available at http://www.csbio.sjtu.edu.cn/bioinf/iDeepMV for academic use. The code is freely available at http://github.com/uchihayht/iDeepMV.
Affiliation(s)
| | - Zhaohong Deng
- School of Artificial Intelligence and Computer Science of Jiangnan University, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (LCNBI) and ZJLab
| | - Xiaoyong Pan
- Department of Automation of Shanghai Jiao Tong University
| | | | | | - Lei Wang
- School of Biotechnology and Key Laboratory of Industrial Biotechnology Ministry in Jiangnan University
| | - Shitong Wang
- School of Artificial Intelligence and Computer Science of Jiangnan University
| | - Jing Wu
- School of Biotechnology and Key Laboratory of Industrial Biotechnology Ministry in Jiangnan University
| |
|
28
|
Wang YG, Huang SY, Wang LN, Zhou ZY, Qiu JD. Accurate prediction of species-specific 2-hydroxyisobutyrylation sites based on machine learning frameworks. Anal Biochem 2020; 602:113793. [DOI: 10.1016/j.ab.2020.113793] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/18/2020] [Revised: 04/25/2020] [Accepted: 05/20/2020] [Indexed: 12/17/2022]
|
29
|
Identification of common and dissimilar biomarkers for different cancer types from gene expressions of RNA-sequencing data. GENE REPORTS 2020. [DOI: 10.1016/j.genrep.2020.100654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
30
|
Zheng N, Wang K, Zhan W, Deng L. Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches. Curr Drug Metab 2019; 20:177-184. [PMID: 30156155 DOI: 10.2174/1389200219666180829121038] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 01/15/2023]
Abstract
BACKGROUND: Targeting critical viral-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. METHODS: In this review, a variety of computational methods for virus-host PPI prediction have been surveyed. These methods are categorized based on the features they utilize and the machine learning algorithms they use, including classical and novel methods. RESULTS: We describe the pivotal and representative features extracted from relevant sources of biological data, which mainly include sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their abilities, weaknesses and future directions. CONCLUSION: The findings of this review confirm the importance of computational methods for finding the potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is a lot of room for improvement in virus-host PPI prediction.
Affiliation(s)
- Nantao Zheng
- School of Software, Central South University, Changsha, 410075, China
| | - Kairou Wang
- School of Software, Central South University, Changsha, 410075, China
| | - Weihua Zhan
- School of Electronics and Computer Science, Zhejiang Wanli University, Ningbo 315100, China
| | - Lei Deng
- School of Software, Central South University, Changsha, 410075, China.,Shanghai Key Lab of Intelligent Information Processing, Shanghai 200433, China
| |
|
31
|
Ju Y, Yuan L, Yang Y, Zhao H. CircSLNN: Identifying RBP-Binding Sites on circRNAs via Sequence Labeling Neural Networks. Front Genet 2019; 10:1184. [PMID: 31824574 PMCID: PMC6886371 DOI: 10.3389/fgene.2019.01184] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 08/24/2019] [Accepted: 10/25/2019] [Indexed: 11/28/2022]
Abstract
The interactions between RNAs and RNA binding proteins (RBPs) are crucial for understanding post-transcriptional regulation mechanisms. Many computational tools have been developed to automatically predict the binding relationship between RNAs and RBPs. However, most of the methods can only predict the presence or absence of binding sites for a sequence fragment, without providing specific information on the position or length of the binding sites. Besides, the existing tools focus on the interaction between RBPs and linear RNAs, while the binding sites on circular RNAs (circRNAs) have been rarely studied. In this study, we model the prediction of binding sites on RNAs as a sequence labeling problem, and propose a new model called circSLNN to identify the specific location of RBP-binding sites on circRNAs. CircSLNN is driven by pretrained RNA embedding vectors and a composite labeling model. On our constructed circRNA datasets, our model has an average F1 score of 0.790. When assessed on full-length RNA sequences, the proposed model outperforms previous classification-based models by a large margin.
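To make the contrast with fragment-level classification concrete, the sketch below shows how a per-nucleotide label sequence yields the position and length of each binding site. It covers only this decoding step; the 0/1 label encoding and the function name `labels_to_sites` are assumptions, and the circSLNN network that would produce the labels is not reproduced here.

```python
def labels_to_sites(labels):
    """Convert per-position labels (1 = bound, anything else = unbound)
    into a list of (start, length) binding sites, using 0-based positions."""
    sites = []
    start = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a new site opens here
        elif label != 1 and start is not None:
            sites.append((start, i - start))  # the open site closes
            start = None
    if start is not None:  # a site running to the end of the sequence
        sites.append((start, len(labels) - start))
    return sites
```

For instance, the labels `[0, 1, 1, 0, 0, 1]` decode to two sites: one of length 2 starting at position 1 and one of length 1 at position 5.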
Collapse
Affiliation(s)
- Yuqi Ju
- Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Liangliang Yuan
- Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Yang Yang
- Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China
- Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
| | - Hai Zhao
- Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China
- Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
| |
|
32
|
Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019; 26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract
The interactions between RNAs and proteins play critical roles in many biological processes. Therefore, characterizing these interactions is critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine RNA-protein interactions in multiple aspects. However, because RNA-protein interactions are tissue-specific and condition-specific, and because these interactions are weak and frequently compete with each other, those experimental techniques cannot be used to their full extent to discover the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques to study the interactions between RNAs and proteins. Much important progress has been achieved with the application of novel techniques and strategies, such as machine learning. In particular, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interactions under specific biological conditions are available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress on computational predictors for RNA-protein interactions is summarized in the following aspects: datasets, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.
Affiliation(s)
- Amit Sagar
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
| | - Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
| |
|
33
|
Westerhout J, Krone T, Snippe A, Babé L, McClain S, Ladics GS, Houben GF, Verhoeckx KCM. Allergenicity prediction of novel and modified proteins: Not a mission impossible! Development of a Random Forest allergenicity prediction model. Regul Toxicol Pharmacol 2019; 107:104422. [DOI: 10.1016/j.yrtph.2019.104422] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/23/2018] [Revised: 06/21/2019] [Accepted: 07/12/2019] [Indexed: 11/29/2022]
|
34
|
Wang X, Yu B, Ma A, Chen C, Liu B, Ma Q. Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique. Bioinformatics 2019; 35:2395-2402. [PMID: 30520961 PMCID: PMC6612859 DOI: 10.1093/bioinformatics/bty995] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 06/24/2018] [Revised: 11/19/2018] [Accepted: 12/03/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The prediction of protein-protein interaction (PPI) sites is key to mutation design, catalytic reaction and the reconstruction of PPI networks. It is a challenging task considering the large number of sequences and the class imbalance in samples. RESULTS: A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI sites prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and SMOTE was applied to oversample interface residues in the feature space to address the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and select a feature subset. Finally, Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were fed into EL-SMURF to predict PPI sites. The performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2-15.7% and 6.1-18.9% higher than those of other existing tools, respectively. AVAILABILITY AND IMPLEMENTATION: The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaoying Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, China
- School of Mathematics, Shandong University, Jinan, China
- Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, China
| | - Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, China
- Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, China
- School of Life Sciences, University of Science and Technology of China, Hefei, China
| | - Anjun Ma
- Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
| | - Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, China
- Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, China
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, China
| | - Qin Ma
- Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
| |
|
35
|
Pan Y, Wang Z, Zhan W, Deng L. Computational identification of binding energy hot spots in protein-RNA complexes using an ensemble approach. Bioinformatics 2019; 34:1473-1480. [PMID: 29281004 DOI: 10.1093/bioinformatics/btx822] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 07/12/2017] [Accepted: 12/19/2017] [Indexed: 11/12/2022]
Abstract
Motivation: Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. Yet, the limited availability of experimentally recognized energy hot spots in protein-RNA crystal structures makes it difficult to develop empirical identification approaches. Computational prediction of RNA-binding hot spot residues is still in its infancy. Results: Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers. Residue interaction network features and new solvent exposure characteristics are combined and selected for classification with the Boruta algorithm. In particular, two new reference datasets (benchmark and independent) have been generated containing 107 hot spots from 47 known protein-RNA complex structures. In 10-fold cross-validation on the training dataset, PrabHot achieves promising performance with an AUC score of 0.86 and a sensitivity of 0.78, which are significantly better than those of the pioneering RNA-binding hot spot prediction method HotSPRing. We also demonstrate the capability of our proposed method on the independent test dataset, where it shows a competitive advantage. Availability and implementation: The PrabHot webserver is freely available at http://denglab.org/PrabHot/. Contact: leideng@csu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuliang Pan
- School of Software, Central South University, Changsha 410075, China
| | - Zixiang Wang
- School of Software, Central South University, Changsha 410075, China
| | - Weihua Zhan
- School of Electronics and Computer Science, Zhejiang Wanli University, Ningbo 315100, China
| | - Lei Deng
- School of Software, Central South University, Changsha 410075, China
- Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China
| |
|
36
|
Hu W, Qin L, Li M, Pu X, Guo Y. Individually double minimum-distance definition of protein-RNA binding residues and application to structure-based prediction. J Comput Aided Mol Des 2018; 32:1363-1373. [PMID: 30478757 DOI: 10.1007/s10822-018-0177-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/29/2018] [Accepted: 11/14/2018] [Indexed: 01/01/2023]
Abstract
Identifying protein-RNA binding residues is essential for understanding the mechanism of protein-RNA interactions. So far, rigid distance thresholds have commonly been used to define protein-RNA binding residues. However, after investigating 182 non-redundant protein-RNA complexes, we find that such thresholds are unsuitable for a number of complexes, since the distances between proteins and RNAs vary widely. In this work, a novel definition method is proposed based on a flexible distance cutoff. This method fully considers the individual differences among complexes by setting a variable tolerance limit of protein-RNA interactions, i.e. the double minimum-distance, by which different distance thresholds are obtained for different complexes. To validate our method, a comprehensive comparison between our flexible method and traditional rigid methods was carried out in terms of interface structure, amino acid composition, interface area, interaction force, etc. The results indicate that this method is more reasonable because it incorporates the specificity of different complexes by recovering important residues lost by rigid distance methods and discarding some redundant residues. Finally, to further test our double minimum-distance definition strategy, we developed a classifier to predict the binding sites derived from our new method using structural features and a random forest machine learning algorithm. The model achieved satisfactory prediction performance, and its accuracy on independent data sets reaches 85.0%. To the best of our knowledge, it is the first prediction model to define positive and negative samples using a flexible cutoff. The comparison analysis and modeling results demonstrate that our method is a very promising strategy for more precisely defining protein-RNA binding sites.
Affiliation(s)
- Wen Hu
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
| | - Liu Qin
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
| | - Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China
| | - Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China.
| |
|
37
|
Xu Y, Yang Y, Ding J, Li C. iGlu-Lys: A Predictor for Lysine Glutarylation Through Amino Acid Pair Order Features. IEEE Trans Nanobioscience 2018; 17:394-401. [DOI: 10.1109/tnb.2018.2848673] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
|
38
|
Niu M, Li Y, Wang C, Han K. RFAmyloid: A Web Server for Predicting Amyloid Proteins. Int J Mol Sci 2018; 19:2071. [PMID: 30013015 PMCID: PMC6073578 DOI: 10.3390/ijms19072071] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 06/12/2018] [Revised: 07/10/2018] [Accepted: 07/12/2018] [Indexed: 12/22/2022]
Abstract
Amyloid is an insoluble fibrous protein, and its mis-aggregation can lead to diseases such as Alzheimer’s disease and Creutzfeldt–Jakob disease. Therefore, the identification of amyloid is essential for the discovery and understanding of disease. We established a novel predictor called RFAmy, based on random forest, to identify amyloid. It employs the SVMProt 188-D feature extraction method, based on protein composition and physicochemical properties, and the pse-in-one feature extraction method, based on amino acid composition, autocorrelation pseudo acid composition, profile-based features and predicted structure features. In the ten-fold cross-validation test, RFAmy’s overall accuracy was 89.19% and its F-measure was 0.891. Results were obtained by comparison experiments with other features, classifiers, and existing methods. This shows the effectiveness of RFAmy in predicting amyloid proteins. The RFAmy proposed in this paper can be accessed through the URL http://server.malab.cn/RFAmyloid/.
Affiliation(s)
- Mengting Niu
- School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
| | - Yanjuan Li
- School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150040, China.
| | - Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150040, China.
| |
|
39
|
Chowdhury S, Zhang J, Kurgan L. In Silico Prediction and Validation of Novel RNA Binding Proteins and Residues in the Human Proteome. Proteomics 2018; 18:e1800064. [PMID: 29806170 DOI: 10.1002/pmic.201800064] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/27/2018] [Revised: 05/05/2018] [Indexed: 12/22/2022]
Abstract
Deciphering a complete landscape of protein-RNA interactions in the human proteome remains an elusive challenge. We computationally elucidate RNA binding proteins (RBPs) using an approach that complements previous efforts. We employ two modern complementary sequence-based methods that provide accurate predictions from the structured and the intrinsically disordered sequences, even in the absence of sequence similarity to the known RBPs. We generate and analyze putative RNA binding residues on the whole proteome scale. Using a conservative setting that ensures low, 5% false positive rate, we identify 1511 putative RBPs that include 281 known RBPs and 166 RBPs that were previously predicted. We empirically demonstrate that these overlaps are statistically significant. We also validate the putative RBPs based on two major hallmarks of their RNA binding residues: high levels of evolutionary conservation and enrichment in charged amino acids. Moreover, we show that the novel RBPs are significantly under-annotated functionally which coincides with the fact that they were not yet found to interact with RNAs. We provide two examples of our novel putative RBPs for which there is recent evidence of their interactions with RNAs. The dataset of novel putative RBPs and RNA binding residues for the future hypothesis generation is provided in the Supporting Information.
Affiliation(s)
- Shomeek Chowdhury
- Dr. Vikram Sarabhai Institute of Cell and Molecular Biology, Maharaja Sayajirao University of Baroda, Gujarat, 390005, India
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| | - Jian Zhang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, 464000, P. R. China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| |
|
40
|
Improving conditional random field model for prediction of protein-RNA residue-base contacts. Quant Biol 2018. [DOI: 10.1007/s40484-018-0136-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
|
41
|
Cheng L, Leung KS. Quantification of non-coding RNA target localization diversity and its application in cancers. J Mol Cell Biol 2018; 10:130-138. [DOI: 10.1093/jmcb/mjy006] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 10/08/2017] [Accepted: 01/24/2018] [Indexed: 12/13/2022]
Affiliation(s)
- Lixin Cheng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
| | - Kwong-Sak Leung
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
| |
|
42
|
Shen WJ, Cui W, Chen D, Zhang J, Xu J. RPiRLS: Quantitative Predictions of RNA Interacting with Any Protein of Known Sequence. Molecules 2018; 23:540. [PMID: 29495575 PMCID: PMC6017498 DOI: 10.3390/molecules23030540] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/06/2018] [Revised: 02/24/2018] [Accepted: 02/25/2018] [Indexed: 02/05/2023]
Abstract
RNA-protein interactions (RPIs) have critical roles in numerous fundamental biological processes, such as post-transcriptional gene regulation, viral assembly, cellular defence and protein synthesis. As the amount of available RNA-protein binding experimental data has increased rapidly due to high-throughput sequencing methods, it is now possible to measure and understand RNA-protein interactions by computational methods. In this study, we integrate a sequence-based derived kernel with regularized least squares to perform prediction. The derived kernel exploits the contextual information around an amino acid or a nucleic acid as well as the repetitive conserved motif information. We propose a novel machine learning method, called RPiRLS, to predict the interaction between any RNA and protein of known sequences. For the RPiRLS classifier, each protein sequence comprises up to 20 diverse amino acids, but for the RPiRLS-7G classifier, each protein sequence is represented using a 7-letter reduced alphabet based on physicochemical properties. We evaluated both methods on a number of benchmark data sets and compared their performance with two newly developed, state-of-the-art methods, RPI-Pred and IPMiner. On the non-redundant benchmark test sets extracted from the PRIDB, the RPiRLS method outperformed RPI-Pred and IPMiner in terms of accuracy, specificity and sensitivity. Further, RPiRLS achieved an accuracy of 92% on the prediction of lncRNA-protein interactions. The proposed method can also be extended to construct RNA-protein interaction networks. The RPiRLS web server is freely available at http://bmc.med.stu.edu.cn/RPiRLS.
Affiliation(s)
- Wen-Jun Shen
- Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China.
| | - Wenjuan Cui
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
| | - Danze Chen
- Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China.
| | - Jieming Zhang
- Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China.
| | - Jianzhen Xu
- Department of Bioinformatics, Shantou University Medical College, Shantou 515000, Guangdong, China.
| |
|
43
|
Ding L, Wang M, Sun D, Li A. TPGLDA: Novel prediction of associations between lncRNAs and diseases via lncRNA-disease-gene tripartite graph. Sci Rep 2018; 8:1065. [PMID: 29348552 PMCID: PMC5773503 DOI: 10.1038/s41598-018-19357-3] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 06/28/2017] [Accepted: 12/28/2017] [Indexed: 12/29/2022]
Abstract
Accumulating evidence indicates that lncRNAs play an important role in various complex human diseases. However, known disease-related lncRNAs are still comparatively small in number, and experimental identification is time-consuming and labor-intensive. Therefore, developing a useful computational method for inferring potential associations between lncRNAs and diseases has become a hot topic, which can significantly help researchers explore complex human diseases at the molecular level and effectively advance the quality of disease diagnostics, therapy, prognosis and prevention. In this paper, we propose a novel method for predicting lncRNA-disease associations via an lncRNA-disease-gene tripartite graph (TPGLDA), which integrates gene-disease associations with lncRNA-disease associations. Compared to previous studies, TPGLDA can better delineate the heterogeneity of coding-non-coding gene-disease associations and can effectively identify potential lncRNA-disease associations. Under leave-one-out cross validation, TPGLDA achieves an AUC value of 93.9%, which demonstrates its good predictive performance. Moreover, the top 5 predictions for lung cancer, hepatocellular carcinoma and ovarian cancer are manually confirmed against relevant databases and the literature, affording convincing evidence of the good performance and potential value of TPGLDA in identifying potential lncRNA-disease associations. Matlab and R code for TPGLDA can be found at https://github.com/USTC-HIlab/TPGLDA.
Affiliation(s)
- Liang Ding
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
| | - Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China.
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China.
| | - Dongdong Sun
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China
| |
|
44
|
Zhang J, Ma Z, Kurgan L. Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains. Brief Bioinform 2017; 20:1250-1268. [DOI: 10.1093/bib/bbx168] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 08/30/2017] [Revised: 11/15/2017] [Indexed: 11/13/2022]
Abstract
Proteins interact with a variety of molecules including proteins and nucleic acids. We review a comprehensive collection of over 50 studies that analyze and/or predict these interactions. While the majority of these studies address solely protein–DNA or protein–RNA binding, only a few have a wider scope that covers both protein–protein and protein–nucleic acid binding. Our analysis reveals that binding residues are typically characterized by three hallmarks: relative solvent accessibility (RSA), evolutionary conservation and propensity of amino acids (AAs) for binding. Motivated by drawbacks of the prior studies, we perform a large-scale analysis to quantify and contrast the three hallmarks for DNA-, RNA- and protein-binding residues and (for the first time) multi-ligand-binding residues that interact with DNA and proteins, and with RNA and proteins. Results generated on a well-annotated data set of over 23 000 proteins show that conservation of binding residues is higher for nucleic acid- than protein-binding residues. Multi-ligand-binding residues are more conserved and have higher RSA than single-ligand-binding residues. We empirically show that each hallmark, even predicted RSA, discriminates between binding and nonbinding residues, and that combining them improves discriminatory power for each of the five types of interactions. Linear scoring functions that combine these hallmarks offer good predictive performance of residue-level propensity for binding and provide intuitive interpretation of predictions. Better understanding of these residue-level interactions will facilitate development of methods that accurately predict binding in the exponentially growing databases of protein sequences.
|
45
|
Tang Y, Liu D, Wang Z, Wen T, Deng L. A boosting approach for prediction of protein-RNA binding residues. BMC Bioinformatics 2017; 18:465. [PMID: 29219069 PMCID: PMC5773889 DOI: 10.1186/s12859-017-1879-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 01/18/2023]
Abstract
Background: RNA-binding proteins play important roles in post-transcriptional RNA processing and transcriptional regulation. Distinguishing the RNA-binding residues in proteins is crucial for understanding how protein and RNA recognize each other and function together as a complex. Results: We propose PredRBR, an effective computational approach to predict RNA-binding residues. PredRBR is built with gradient tree boosting and an optimal feature set selected from a large number of sequence and structure characteristics and two categories of structural neighborhood properties. Cross-validation experiments on the RBP170 data set show that PredRBR achieves an overall accuracy of 0.84, a sensitivity of 0.85, an MCC of 0.55 and an AUC of 0.92, which are significantly better than those of other widely used machine learning algorithms such as Support Vector Machine, Random Forest, and Adaboost. We further calculate the feature importance of different feature categories and find that structural neighborhood characteristics are critical in the recognition of RNA-binding residues. Also, PredRBR yields significantly better prediction accuracy on an independent test set (RBP101) in comparison with other state-of-the-art methods. Conclusions: The superior performance over existing RNA-binding residue prediction methods indicates the importance of the gradient tree boosting algorithm combined with the optimally selected features.
Affiliation(s)
- Yongjun Tang
- Department of Clinical Pharmacology, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, 410008, China
- Institute of Clinical Pharmacology, Hunan Key Laboratory of Pharmacogenetics, Central South University, 87 Xiangya Road, Changsha, 410008, China
- Department of Pediatrics, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, 410008, China
| | - Diwei Liu
- School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
| | - Zixiang Wang
- School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
| | - Ting Wen
- School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China
| | - Lei Deng
- School of Software, Central South University, No.22 Shaoshan South Road, Changsha, 410075, China.
| |
|
46
|
Luo J, Liu L, Venkateswaran S, Song Q, Zhou X. RPI-Bind: a structure-based method for accurate identification of RNA-protein binding sites. Sci Rep 2017; 7:614. [PMID: 28377624 PMCID: PMC5429624 DOI: 10.1038/s41598-017-00795-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 11/24/2016] [Accepted: 03/13/2017] [Indexed: 01/11/2023]
Abstract
RNA-protein interactions play crucial roles in multiple biological processes, and these interactions are significantly influenced by the structures and sequences of the protein and RNA molecules. In this study, we first performed an analysis of RNA-protein interacting complexes and identified interface properties of sequences and structures, which reveal the diverse nature of the binding sites. Based on these observations, we built a three-step prediction model, namely RPI-Bind, for the identification of RNA-protein binding regions using the sequences and structures of both proteins and RNAs. The three steps include 1) the prediction of RNA-binding regions on the protein, 2) the prediction of protein-binding regions on the RNA, and 3) the prediction of interacting regions on both RNA and protein simultaneously, using the results from steps 1) and 2). Compared with existing methods, most of which employ only sequences, our model significantly improves the prediction accuracy at each of the three steps. In particular, our model outperforms catRAPID by >20% at the third step. All of these results indicate the importance of structures in RNA-protein interactions and suggest that the RPI-Bind model is a powerful theoretical framework for studying RNA-protein interactions.
Affiliation(s)
- Jiesi Luo
- Center for Bioinformatics and Systems Biology and Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA
| | - Liang Liu
- Center for Bioinformatics and Systems Biology and Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA
| | - Suresh Venkateswaran
- Center for Bioinformatics and Systems Biology and Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA
| | - Qianqian Song
- Center for Bioinformatics and Systems Biology and Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA
| | - Xiaobo Zhou
- Center for Bioinformatics and Systems Biology and Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA.
| |
|
47
|
Cheng Z, Huang K, Wang Y, Liu H, Guan J, Zhou S. Selecting high-quality negative samples for effectively predicting protein-RNA interactions. BMC Syst Biol 2017; 11:9. [PMID: 28361676 PMCID: PMC5374704 DOI: 10.1186/s12918-017-0390-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]
Abstract
Background: The identification of protein-RNA interactions (PRIs) is important for understanding cell activities. Recently, several machine learning-based methods have been developed for identifying PRIs. However, the performance of these methods is unsatisfactory. One major reason is that they usually use unreliable negative samples in the training process. Methods: To boost the performance of PRI prediction, we propose a novel method to generate reliable negative samples. Concretely, we first collect the known PRIs as positive samples for generating positive sets. For each positive set, we construct two corresponding negative sets, one generated by our method and the other by random selection. Each positive set is combined with a negative set to form a dataset for model training and performance evaluation. This yields 18 datasets covering different species and different ratios of negative to positive samples. Second, sequence-based features are extracted to represent each PRI and each protein-RNA pair in the datasets. A filter-based method is used to reduce the dimensionality of the feature vectors and thereby the computational cost. Finally, the performance of support vector machine (SVM), random forest (RF) and naive Bayes (NB) classifiers is evaluated on the 18 generated datasets. Results: Extensive experiments show that, compared with using randomly generated negative samples, all classifiers achieve substantial performance improvement when using negative samples selected by our method. The improvements in accuracy and geometric mean are as high as 204.5% and 68.7% for the SVM classifier, 174.5% and 53.9% for the RF classifier, and 80.9% and 54.3% for the NB classifier, respectively. Conclusion: Our method is useful for the identification of PRIs.
Affiliation(s)
- Zhanzhan Cheng
- School of Computer Science, Fudan University, Handan Road, Shanghai, 200433, China
| | - Kai Huang
- School of Computer Science, Fudan University, Handan Road, Shanghai, 200433, China
| | - Yang Wang
- School of Computer Science, Jiangxi Normal University, Nanchang, 330022, China
| | - Hui Liu
- The Bioinformatics Lab at Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China; Lab of Information Management, Changzhou University, Changzhou, 213164, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China
| | - Shuigeng Zhou
- School of Computer Science, Fudan University, Handan Road, Shanghai, 200433, China; The Bioinformatics Lab at Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China.
| |
|
48
|
Xu Y, Li L, Ding J, Wu LY, Mai G, Zhou F. Gly-PseAAC: Identifying protein lysine glycation through sequences. Gene 2017; 602:1-7. [DOI: 10.1016/j.gene.2016.11.021] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 05/19/2016] [Revised: 08/29/2016] [Accepted: 11/10/2016] [Indexed: 11/29/2022]
|
49
|
A Brief Review of RNA-Protein Interaction Database Resources. Noncoding RNA 2017; 3:ncrna3010006. [PMID: 29657278 PMCID: PMC5832006 DOI: 10.3390/ncrna3010006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/27/2016] [Accepted: 01/23/2017] [Indexed: 12/25/2022]
Abstract
RNA–protein interactions play critical roles in various biological processes. By collecting and analyzing RNA–protein interactions and binding sites from experiments and predictions, RNA–protein interaction databases have become an essential resource for exploring transcriptional and post-transcriptional regulatory networks. Here, we briefly review several widely used RNA–protein interaction database resources developed in recent years to provide a guide to these databases. The content and major functions of the databases are presented. The brief description of each database helps users quickly choose the database containing the information they are interested in. In short, these RNA–protein interaction database resources are continually updated, and their current state reflects the efforts made to identify and analyze the large number of RNA–protein interactions.
|
50
|
Liu ZP, Liu S, Chen R, Huang X, Wu LY. Structure alignment-based classification of RNA-binding pockets reveals regional RNA recognition motifs on protein surfaces. BMC Bioinformatics 2017; 18:27. [PMID: 28077065 PMCID: PMC5225598 DOI: 10.1186/s12859-016-1410-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/21/2016] [Accepted: 12/07/2016] [Indexed: 11/23/2022]
Abstract
Background: Many critical biological processes are strongly related to protein-RNA interactions. Revealing the protein structure motifs for RNA binding will provide valuable information for deciphering protein-RNA recognition mechanisms and benefit complementary structural design in bioengineering. RNA-binding events often take place at pockets on protein surfaces. The structural classification of local binding pockets determines the major patterns of RNA recognition. Results: In this work, we provide a novel framework for systematically identifying the structure motifs of protein-RNA binding sites in the form of pockets on regional protein surfaces via a structure alignment-based method. We first construct a similarity network of RNA-binding pockets based on a non-sequential-order structure alignment method for local structure alignment. By using network community decomposition, the RNA-binding pockets on protein surfaces are clustered into groups with structural similarity. With a multiple structure alignment strategy, the consensus RNA-binding pockets in each group are identified. The crucial recognition patterns, as well as the protein-RNA binding motifs, are then identified and analyzed. Conclusions: Large-scale RNA-binding pockets on protein surfaces are grouped by measuring their structural similarities. This similarity network-based framework provides a convenient method for modeling the structural relationships of functional pockets. The local structural patterns identified serve as structure motifs for RNA recognition on protein surfaces.
Affiliation(s)
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
| | - Shutang Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
| | - Ruitang Chen
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
| | - Xiaopeng Huang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Ling-Yun Wu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China.
| |
|