1
|
Pradhan UK, Naha S, Das R, Gupta A, Parsad R, Meher PK. RBProkCNN: Deep learning on appropriate contextual evolutionary information for RNA binding protein discovery in prokaryotes. Comput Struct Biotechnol J 2024; 23:1631-1640. [PMID: 38660008 PMCID: PMC11039349 DOI: 10.1016/j.csbj.2024.04.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2024] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 04/26/2024] Open
Abstract
RNA-binding proteins (RBPs) are central to key functions such as post-transcriptional regulation, mRNA stability, and adaptation to varied environmental conditions in prokaryotes. While the majority of research has concentrated on eukaryotic RBPs, recent developments underscore the crucial involvement of prokaryotic RBPs. Although computational methods have emerged in recent years to identify RBPs, they have fallen short in accurately identifying prokaryotic RBPs due to their generic nature. To bridge this gap, we introduce RBProkCNN, a novel machine learning-driven computational model meticulously designed for the accurate prediction of prokaryotic RBPs. The prediction process involves the utilization of eight shallow learning algorithms and four deep learning models, incorporating PSSM-based evolutionary features. By leveraging a convolutional neural network (CNN) and evolutionarily significant features selected through extreme gradient boosting variable importance measure, RBProkCNN achieved the highest accuracy in five-fold cross-validation, yielding 98.04% auROC and 98.19% auPRC. Furthermore, RBProkCNN demonstrated robust performance with an independent dataset, showcasing a commendable 95.77% auROC and 95.78% auPRC. Noteworthy is its superior predictive accuracy when compared to several state-of-the-art existing models. RBProkCNN is available as an online prediction tool (https://iasri-sg.icar.gov.in/rbprokcnn/), offering free access to interested users. This tool represents a substantial contribution, enriching the array of resources available for the accurate and efficient prediction of prokaryotic RBPs.
Collapse
Affiliation(s)
- Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Ritwika Das
- Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| |
Collapse
|
2
|
Wang J, Chen S, Yuan Q, Chen J, Li D, Wang L, Yang Y. Predicting the effects of mutations on protein solubility using graph convolution network and protein language model representation. J Comput Chem 2024; 45:436-445. [PMID: 37933773 DOI: 10.1002/jcc.27249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2023] [Revised: 10/11/2023] [Accepted: 10/21/2023] [Indexed: 11/08/2023]
Abstract
Solubility is one of the most important properties of protein. Protein solubility can be greatly changed by single amino acid mutations and the reduced protein solubility could lead to diseases. Since experimental methods to determine solubility are time-consuming and expensive, in-silico methods have been developed to predict the protein solubility changes caused by mutations mostly through protein evolution information. However, these methods are slow since it takes long time to obtain evolution information through multiple sequence alignment. In addition, these methods are of low performance because they do not fully utilize protein 3D structures due to a lack of experimental structures for most proteins. Here, we proposed a sequence-based method DeepMutSol to predict solubility change from residual mutations based on the Graph Convolutional Neural Network (GCN), where the protein graph was initiated according to predicted protein structure from Alphafold2, and the nodes (residues) were represented by protein language embeddings. To circumvent the small data of solubility changes, we further pretrained the model over absolute protein solubility. DeepMutSol was shown to outperform state-of-the-art methods in benchmark tests. In addition, we applied the method to clinically relevant genes from the ClinVar database and the predicted solubility changes were shown able to separate pathogenic mutations. All of the data sets and the source code are available at https://github.com/biomed-AI/DeepMutSol.
Collapse
Affiliation(s)
- Jing Wang
- Guangzhou institute of technology, Xidian University, Guangzhou, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Sheng Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Jianwen Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Danping Li
- School of Telecommunications Engineering, Xidian University, Xi'an, China
| | - Lei Wang
- School of Electronic Engineering, Xidian University, Xi'an, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| |
Collapse
|
3
|
Fang Y, Jiang Y, Wei L, Ma Q, Ren Z, Yuan Q, Wei DQ. DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model. Bioinformatics 2023; 39:btad718. [PMID: 38015872 PMCID: PMC10723037 DOI: 10.1093/bioinformatics/btad718] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2023] [Revised: 11/04/2023] [Accepted: 11/27/2023] [Indexed: 11/30/2023] Open
Abstract
MOTIVATION Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS In this study, DeepProSite is presented as a new framework for identifying protein binding site that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses Graph Transformer and formulates binding site predictions as graph node classifications. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.
Collapse
Affiliation(s)
- Yitian Fang
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
- Peng Cheng Laboratory, Shenzhen 518055, China
| | - Yi Jiang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Leyi Wei
- School of Software, Shandong University, Jinan, Shandong 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | | | - Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
- Peng Cheng Laboratory, Shenzhen 518055, China
| |
Collapse
|
4
|
Pradhan UK, Meher PK, Naha S, Pal S, Gupta S, Gupta A, Parsad R. RBPLight: a computational tool for discovery of plant-specific RNA-binding proteins using light gradient boosting machine and ensemble of evolutionary features. Brief Funct Genomics 2023; 22:401-410. [PMID: 37158175 DOI: 10.1093/bfgp/elad016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 04/12/2023] [Accepted: 04/21/2023] [Indexed: 05/10/2023] Open
Abstract
RNA-binding proteins (RBPs) are essential for post-transcriptional gene regulation in eukaryotes, including splicing control, mRNA transport and decay. Thus, accurate identification of RBPs is important to understand gene expression and regulation of cell state. In order to detect RBPs, a number of computational models have been developed. These methods made use of datasets from several eukaryotic species, specifically from mice and humans. Although some models have been tested on Arabidopsis, these techniques fall short of correctly identifying RBPs for other plant species. Therefore, the development of a powerful computational model for identifying plant-specific RBPs is needed. In this study, we presented a novel computational model for locating RBPs in plants. Five deep learning models and ten shallow learning algorithms were utilized for prediction with 20 sequence-derived and 20 evolutionary feature sets. The highest repeated five-fold cross-validation accuracy, 91.24% AU-ROC and 91.91% AU-PRC, was achieved by light gradient boosting machine. While evaluated using an independent dataset, the developed approach achieved 94.00% AU-ROC and 94.50% AU-PRC. The proposed model achieved significantly higher accuracy for predicting plant-specific RBPs as compared to the currently available state-of-art RBP prediction models. Despite the fact that certain models have already been trained and assessed on the model organism Arabidopsis, this is the first comprehensive computer model for the discovery of plant-specific RBPs. The web server RBPLight was also developed, which is publicly accessible at https://iasri-sg.icar.gov.in/rbplight/, for the convenience of researchers to identify RBPs in plants.
Collapse
Affiliation(s)
- Upendra K Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Prabina K Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Soumen Pal
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Sagar Gupta
- CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur (HP) 176061, India
| | - Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| |
Collapse
|
5
|
Song Y, Yuan Q, Zhao H, Yang Y. Accurately identifying nucleic-acid-binding sites through geometric graph learning on language model predicted structures. Brief Bioinform 2023; 24:bbad360. [PMID: 37824738 DOI: 10.1093/bib/bbad360] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2023] [Revised: 09/18/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023] Open
Abstract
The interactions between nucleic acids and proteins are important in diverse biological processes. The high-quality prediction of nucleic-acid-binding sites continues to pose a significant challenge. Presently, the predictive efficacy of sequence-based methods is constrained by their exclusive consideration of sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Though protein structures predicted by AlphaFold2 could be used, the extensive computing requirement of AlphaFold2 hinders its use for genome-wide applications. Based on the recent breakthrough of ESMFold for fast prediction of protein structures, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold predicted structures. Here, the predicted protein structures are employed to construct protein structural graph with residues as nodes and spatially neighboring residue pairs for edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Finally, these characteristics were input into two fully connected layers to predict binding sites with DNA and RNA, respectively. Through comprehensive tests on DNA/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and be comparable with structure-based methods. Moreover, the prediction was shown useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. The datasets, codes, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
Collapse
Affiliation(s)
- Yidong Song
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Qianmu Yuan
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Huiying Zhao
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Yuedong Yang
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| |
Collapse
|
6
|
Zhang C, Shine M, Pyle AM, Zhang Y. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat Methods 2022; 19:1109-1115. [PMID: 36038728 DOI: 10.1038/s41592-022-01585-1] [Citation(s) in RCA: 163] [Impact Index Per Article: 54.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Accepted: 07/19/2022] [Indexed: 11/09/2022]
Abstract
Structure comparison and alignment are of fundamental importance in structural biology studies. We developed the first universal platform, US-align, to uniformly align monomer and complex structures of different macromolecules-proteins, RNAs and DNAs. The pipeline is built on a uniform TM-score objective function coupled with a heuristic alignment searching algorithm. Large-scale benchmarks demonstrated consistent advantages of US-align over state-of-the-art methods in pairwise and multiple structure alignments of different molecules. Detailed analyses showed that the main advantage of US-align lies in the extensive optimization of the unified objective function powered by efficient heuristic search iterations, which substantially improve the accuracy and speed of the structural alignment process. Meanwhile, the universal protocol fusing different molecular and structural types helps facilitate the heterogeneous oligomer structure comparison and template-based protein-protein and protein-RNA/DNA docking.
Collapse
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.,Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, USA.,Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Morgan Shine
- Yale Combined Program in the Biological and Biomedical Sciences, Yale University, New Haven, CT, USA
| | - Anna Marie Pyle
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.,Yale Combined Program in the Biological and Biomedical Sciences, Yale University, New Haven, CT, USA.,Department of Chemistry, Yale University, New Haven, CT, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. .,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
| |
Collapse
|
7
|
Ribonomics Approaches to Identify RBPome in Plants and Other Eukaryotes: Current Progress and Future Prospects. Int J Mol Sci 2022; 23:ijms23115923. [PMID: 35682602 PMCID: PMC9180120 DOI: 10.3390/ijms23115923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Revised: 05/16/2022] [Accepted: 05/20/2022] [Indexed: 02/01/2023] Open
Abstract
RNA-binding proteins (RBPs) form complex interactions with RNA to regulate the cell’s activities including cell development and disease resistance. RNA-binding proteome (RBPome) aims to profile and characterize the RNAs and proteins that interact with each other to carry out biological functions. Generally, RNA-centric and protein-centric ribonomic approaches have been successfully developed to profile RBPome in different organisms including plants and animals. Further, more and more novel methods that were firstly devised and applied in mammalians have shown great potential to unravel RBPome in plants such as RNA-interactome capture (RIC) and orthogonal organic phase separation (OOPS). Despise the development of various robust and state-of-the-art ribonomics techniques, genome-wide RBP identifications and characterizations in plants are relatively fewer than those in other eukaryotes, indicating that ribonomics techniques have great opportunities in unraveling and characterizing the RNA–protein interactions in plant species. Here, we review all the available approaches for analyzing RBPs in living organisms. Additionally, we summarize the transcriptome-wide approaches to characterize both the coding and non-coding RBPs in plants and the promising use of RBPome for booming agriculture.
Collapse
|
8
|
Longo LM, Kolodny R, McGlynn SE. Evidence for the emergence of β-trefoils by 'Peptide Budding' from an IgG-like β-sandwich. PLoS Comput Biol 2022; 18:e1009833. [PMID: 35157697 PMCID: PMC8880906 DOI: 10.1371/journal.pcbi.1009833] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Revised: 02/25/2022] [Accepted: 01/13/2022] [Indexed: 12/02/2022] Open
Abstract
As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both 'β-trefoil bridging themes' (evolutionarily-related sequence segments) and 'β-trefoil-like motifs' (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself-namely, that it is a derived fold formed by 'budding' from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the evolution of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature's sewing table.
Collapse
Affiliation(s)
- Liam M. Longo
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Blue Marble Space Institute of Science, Seattle, Washington, United States of America
| | - Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa, Israel
| | - Shawn E. McGlynn
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Blue Marble Space Institute of Science, Seattle, Washington, United States of America
| |
Collapse
|
9
|
Zhang W, Huang J. EViS: An Enhanced Virtual Screening Approach Based on Pocket-Ligand Similarity. J Chem Inf Model 2022; 62:498-510. [PMID: 35084171 DOI: 10.1021/acs.jcim.1c00944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Virtual screening (VS) is a popular technology in drug discovery to identify a new scaffold of actives for a specific drug target, which can be classified into ligand-based and structure-based approaches. As the number of protein-ligand complex structures available in public databases increases, it would be possible to develop a template searching-based VS approach that utilizes such information. In this work, we proposed an enhanced VS approach, which is termed EViS, to integrate ligand docking, protein pocket template searching, and ligand template shape similarity calculation. A novel and simple PL-score to characterize local pocket-ligand template similarity was used to evaluate the screening compounds. Benchmark tests were performed on three datasets including DUDE, LIT-PCBA, and DEKOIS. EViS achieved the average enrichment factors (EFs) of 27.8 and 23.4 at a 1% cutoff for experimental and predicted structures on the widely used DUDE dataset, respectively. Detailed data analysis shows that EViS benefits from obtaining favorable ligand poses from docking and using such ligand geometric information to perform three-dimensional (3D) ligand similarity calculations, and the PL-score is efficient to screen compounds based on template searching in the protein-ligand structure database.
Collapse
Affiliation(s)
- Wenyi Zhang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China.,Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China.,Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
| | - Jing Huang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China.,Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China.,Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
| |
Collapse
|
10
|
Mirzaei S, Razmara J, Lotfi S. GADP-align: A genetic algorithm and dynamic programming-based method for structural alignment of proteins. BIOIMPACTS 2020; 11:271-279. [PMID: 34631489 PMCID: PMC8494253 DOI: 10.34172/bi.2021.37] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/03/2020] [Revised: 06/10/2020] [Accepted: 06/16/2020] [Indexed: 11/16/2022]
Abstract
![]()
Introduction: Similarity analysis of protein structure is considered as a fundamental step to give insight into the relationships between proteins. The primary step in structural alignment is looking for the optimal correspondence between residues of two structures to optimize the scoring function. An exhaustive search for finding such a correspondence between two structures is intractable.
Methods: In this paper, a hybrid method is proposed, namely GADP-align, for pairwise protein structure alignment. The proposed method looks for an optimal alignment using a hybrid method based on a genetic algorithm and an iterative dynamic programming technique. To this end, the method first creates an initial map of correspondence between secondary structure elements (SSEs) of two proteins. Then, a genetic algorithm combined with an iterative dynamic programming algorithm is employed to optimize the alignment.
Results: The GADP-align algorithm was employed to align 10 ‘difficult to align’ protein pairs in order to evaluate its performance. The experimental study shows that the proposed hybrid method produces highly accurate alignments in comparison with the methods using exactly the dynamic programming technique. Furthermore, the proposed method prevents the local optimal traps caused by the unsuitable initial guess of the corresponding residues.
Conclusion: The findings of this paper demonstrate that employing the genetic algorithm along with the dynamic programming technique yields highly accurate alignments between a protein pair by exploring the global alignment and avoiding trapping in local alignments.
Collapse
Affiliation(s)
- Soraya Mirzaei
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
| | - Jafar Razmara
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
| | - Shahriar Lotfi
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
| |
Collapse
|
11
|
Chen S, Sun Z, Lin L, Liu Z, Liu X, Chong Y, Lu Y, Zhao H, Yang Y. To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map. J Chem Inf Model 2019; 60:391-399. [PMID: 31800243 DOI: 10.1021/acs.jcim.9b00438] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Protein sequence profile prediction aims to generate multiple sequences from structural information to advance the protein design. Protein sequence profile can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, has achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties that are not sufficient to represent three-dimensional (3D) structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning frame. To our best knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found the sequence recovery increased with the number of their neighbored residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such network architecture using a 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction. The online server and the source code is available at http://biomed.nscc-gz.cn and https://github.com/biomed-AI/SPROF , respectively.
Collapse
Affiliation(s)
- Sheng Chen
- School of Data and Computer Science , Sun Yat-sen University , Guangzhou 510000 , China
| | - Zhe Sun
- School of Data and Computer Science , Sun Yat-sen University , Guangzhou 510000 , China
| | - Lihua Lin
- School of Data and Computer Science , Sun Yat-sen University , Guangzhou 510000 , China
| | - Zifeng Liu
- Third Affiliated Hospital of Sun Yat-sen University , Guangzhou 510000 , China
| | - Xun Liu
- Third Affiliated Hospital of Sun Yat-sen University , Guangzhou 510000 , China
| | - Yutian Chong
- Third Affiliated Hospital of Sun Yat-sen University , Guangzhou 510000 , China
| | - Yutong Lu
- School of Data and Computer Science , Sun Yat-sen University , Guangzhou 510000 , China
| | - Huiying Zhao
- Sun Yat-sen Memorial Hospital , Sun Yat-sen University , Guangzhou 510000 , China
| | - Yuedong Yang
- School of Data and Computer Science , Sun Yat-sen University , Guangzhou 510000 , China.,Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of the Ministry of Education , Guangzhou 510000 , China
| |
Collapse
|
12
|
Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019; 26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract
The interactions between RNAs and proteins play critical roles in many biological processes. Therefore, characterizing these interactions becomes critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine RNA-protein interactions in multiple aspects. However, due to the facts that RNA-protein interactions are tissuespecific and condition-specific, as well as these interactions are weak and frequently compete with each other, those experimental techniques can not be made full use of to discover the complete spectrum of RNA-protein interactions. To moderate these issues, continuous efforts have been devoted to developing high quality computational techniques to study the interactions between RNAs and proteins. Many important progresses have been achieved with the application of novel techniques and strategies, such as machine learning techniques. Especially, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interaction under specific biological conditions are available. These CLIP data altogether provide a rich source for developing advanced machine learning predictors. In this review, recent progresses on computational predictors for RNA-protein interaction were summarized in the following aspects: dataset, prediction strategies, and input features. Possible future developments were also discussed at the end of the review.
Collapse
Affiliation(s)
- Amit Sagar
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
| | - Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
| |
Collapse
|
13
|
Joung I, Kim JY, Joo K, Lee J. Non-sequential protein structure alignment by conformational space annealing and local refinement. PLoS One 2019; 14:e0210177. [PMID: 30699145 PMCID: PMC6353097 DOI: 10.1371/journal.pone.0210177] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2018] [Accepted: 12/18/2018] [Indexed: 11/18/2022] Open
Abstract
Protein structure alignment is an important tool for studying evolutionary biology and protein modeling. A tool which intensively searches for the globally optimal non-sequential alignments is rarely found. We propose ALIGN-CSA which shows improvement in scores, such as DALI-score, SP-score, SO-score and TM-score over the benchmark set including 286 cases. We performed benchmarking of existing popular alignment scoring functions, where the dependence of the search algorithm was effectively eliminated by using ALIGN-CSA. For the benchmarking, we set the minimum block size to 4 to prevent much fragmented alignments where the biological relevance of small alignment blocks is hard to interpret. With this condition, globally optimal alignments were searched by ALIGN-CSA using the four scoring functions listed above, and TM-score is found to be the most effective in generating alignments with longer match lengths and smaller RMSD values. However, DALI-score is the most effective in generating alignments similar to the manually curated reference alignments, which implies that DALI-score is more biologically relevant score. Due to the high demand on computational resources of ALIGN-CSA, we also propose a relatively fast local refinement method, which can control the minimum block size and whether to allow the reverse alignment. ALIGN-CSA can be used to obtain much improved alignment at the cost of relatively more extensive computation. For faster alignment, we propose a refinement protocol that improves the score of a given alignment obtained by various external tools. All programs are available from http://lee.kias.re.kr.
Collapse
Affiliation(s)
- InSuk Joung
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
| | - Jong Yun Kim
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
| | - Keehyoung Joo
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, Korea
- * E-mail:
| |
Collapse
|
14
|
Litfin T, Yang Y, Zhou Y. SPOT-Peptide: Template-Based Prediction of Peptide-Binding Proteins and Peptide-Binding Sites. J Chem Inf Model 2019; 59:924-930. [DOI: 10.1021/acs.jcim.8b00777] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Affiliation(s)
- Thomas Litfin
- School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
| | - Yuedong Yang
- School of Data and Computer Science, Sun-Yat Sen University, Guangzhou, Guangdong 510006, China
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
- Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
| |
Collapse
|
15
|
Poursheikhali Asghari M, Abdolmaleki P. Prediction of RNA- and DNA-Binding Proteins Using Various Machine Learning Classifiers. Avicenna J Med Biotechnol 2019; 11:104-111. [PMID: 30800250 PMCID: PMC6359699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Nucleic acid-binding proteins play major roles in different biological processes, such as transcription, splicing and translation. Therefore, the nucleic acid-binding function prediction of proteins is a step toward full functional annotation of proteins. The aim of our research was the improvement of nucleic-acid binding function prediction. METHODS In the current study, nine machine-learning algorithms were used to predict RNA- and DNA-binding proteins and also to discriminate between RNA-binding proteins and DNA-binding proteins. The electrostatic features were utilized for prediction of each function in corresponding adapted protein datasets. The leave-one-out cross-validation process was used to measure the performance of employed classifiers. RESULTS Radial basis function classifier gave the best results in predicting RNA- and DNA-binding proteins in comparison with other classifiers applied. In discriminating between RNA- and DNA-binding proteins, multilayer perceptron classifier was the best one. CONCLUSION Our findings show that the prediction of nucleic acid-binding function based on these simple electrostatic features can be improved by applied classifiers. Moreover, a reasonable progress to distinguish between RNA- and DNA-binding proteins has been achieved.
Collapse
Affiliation(s)
| | - Parviz Abdolmaleki
- Corresponding author: Parviz Abdolmaleki, Ph.D., Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran, Tel: +98 21 82883404, Fax: +98 21 82884457, E-mail:,
| |
Collapse
|
16
|
Chen J, Sun J, Liu X, Liu F, Liu R, Wang J. Structure-based prediction of West Nile virus-human protein-protein interactions. J Biomol Struct Dyn 2018; 37:2310-2321. [PMID: 30044201 DOI: 10.1080/07391102.2018.1479659] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
In recent years, West Nile virus (WNV) has posed a great threat to global human health due to its explosive spread. Studying the protein-protein interactions (PPIs) between WNV and human is beneficial for understanding the pathogenesis of WNV and the immune response mechanism of human against WNV infection at the molecular level. In this study, we identified the human target proteins which interact with WNV based on protein structure similarity, and then the interacting pairs were filtered by the subcellular co-localization information. As a result, a network of 3346 interactions was constructed, involving 6 WNV proteins and 1970 human target proteins. To our knowledge, this is the first predicted interactome for WNV-human. By analyzing the topological properties and evolutionary rates of the human target proteins, it was demonstrated that these proteins tend to be the hub and bottleneck proteins in the human PPI network and are more conserved than the non-target ones. Triplet analysis showed that the target proteins are adjacent to each other in the human PPI network, suggesting that these proteins may have similar biological functions. Further, the functional enrichment analysis indicated that the target proteins are mainly involved in virus process, transcription regulation, cell adhesion, and so on. In addition, the common and specific targets were identified and compared based on the networks between WNV-human and Dengue virus II (DENV2)-human. Finally, by combining topological features and existing drug target information, we identified 30 potential anti-WNV human targets, among which 11 ones were reported to be associated with WNV infection. Communicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Jing Chen
- a Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics , Huazhong Agricultural University , Wuhan , People's Republic of China
| | - Jun Sun
- a Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics , Huazhong Agricultural University , Wuhan , People's Republic of China
| | - Xiangming Liu
- b Gongqing Institute of Science and Technology , Gongqing , People's Republic of China
| | - Feng Liu
- a Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics , Huazhong Agricultural University , Wuhan , People's Republic of China
| | - Rong Liu
- a Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics , Huazhong Agricultural University , Wuhan , People's Republic of China
| | - Jia Wang
- a Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics , Huazhong Agricultural University , Wuhan , People's Republic of China
| |
Collapse
|
17
|
Deep-RBPPred: Predicting RNA binding proteins in the proteome scale based on deep learning. Sci Rep 2018; 8:15264. [PMID: 30323214 PMCID: PMC6189057 DOI: 10.1038/s41598-018-33654-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2018] [Accepted: 09/28/2018] [Indexed: 12/22/2022] Open
Abstract
RNA binding protein (RBP) plays an important role in cellular processes. Identifying RBPs by computation and experiment are both essential. Recently, an RBP predictor, RBPPred, is proposed in our group to predict RBPs. However, RBPPred is too slow for that it needs to generate PSSM matrix as its feature. Herein, based on the protein feature of RBPPred and Convolutional Neural Network (CNN), we develop a deep learning model called Deep-RBPPred. With the balance and imbalance training set, we obtain Deep-RBPPred-balance and Deep-RBPPred-imbalance models. Deep-RBPPred has three advantages comparing to previous methods. (1) Deep-RBPPred only needs few physicochemical properties based on protein sequences. (2) Deep-RBPPred runs much faster. (3) Deep-RBPPred has a good generalization ability. In the meantime, Deep-RBPPred is still as good as the state-of-the-art method. Testing in A. thaliana, S. cerevisiae and H. sapiens proteomes, MCC values are 0.82 (0.82), 0.65 (0.69) and 0.85 (0.80) for balance model (imbalance model) when the score cutoff is set to 0.5, respectively. In the same testing dataset, different machine learning algorithms (CNN and SVM) are also compared. The results show that CNN-based model can identify more RBPs than SVM-based. In comparing the balance and imbalance model, both CNN-base and SVM-based tend to favor the majority class in the imbalance set. Deep-RBPPred forecasts 280 (balance model) and 265 (imbalance model) of 299 new RBP. The sensitivity of balance model is about 7% higher than the state-of-the-art method. We also apply deep-RBPPred to 30 eukaryotes and 109 bacteria proteomes downloaded from Uniprot to estimate all possible RBPs. The estimating result shows that rates of RBPs in eukaryote proteomes are much higher than bacteria proteomes.
Collapse
|
18
|
Tiralongo J, Cooper O, Litfin T, Yang Y, King R, Zhan J, Zhao H, Bovin N, Day CJ, Zhou Y. YesU from Bacillus subtilis preferentially binds fucosylated glycans. Sci Rep 2018; 8:13139. [PMID: 30177739 PMCID: PMC6120924 DOI: 10.1038/s41598-018-31241-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2018] [Accepted: 08/10/2018] [Indexed: 12/25/2022] Open
Abstract
The interaction of carbohydrate-binding proteins (CBPs) with their corresponding glycan ligands is challenging to study both experimentally and computationally. This is in part due to their low binding affinity, high flexibility, and the lack of a linear sequence in carbohydrates, as exists in nucleic acids and proteins. We recently described a function-prediction technique called SPOT-Struc that identifies CBPs by global structural alignment and binding-affinity prediction. Here we experimentally determined the carbohydrate specificity and binding affinity of YesU (RCSB PDB ID: 1oq1), an uncharacterized protein from Bacillus subtilis that SPOT-Struc predicted would bind high mannose-type glycans. Glycan array analyses however revealed glycan binding patterns similar to those exhibited by fucose (Fuc)-binding lectins, with SPR analysis revealing high affinity binding to Lewisx and lacto-N-fucopentaose III. Structure based alignment of YesU revealed high similarity to the legume lectins UEA-I and GS-IV, and docking of Lewisx into YesU revealed a complex structure model with predicted binding affinity of −4.3 kcal/mol. Moreover the adherence of B. subtilis to intestinal cells was significantly inhibited by Lex and Ley but by not non-fucosylated glycans, suggesting the interaction of YesU to fucosylated glycans may be involved in the adhesion of B. subtilis to the gastrointestinal tract of mammals.
Collapse
Affiliation(s)
- Joe Tiralongo
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia.
| | - Oren Cooper
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
| | - Tom Litfin
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, People's Republic of China
| | - Rebecca King
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
| | - Jian Zhan
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
| | - Huiying Zhao
- Queensland Institute of Medical Research, Brisbane, Queensland, Australia
| | - Nicolai Bovin
- Shemyakin Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
| | - Christopher J Day
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia
| | - Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast Campus, QLD 4222, Australia.
| |
Collapse
|
19
|
Gao J, Yang Y, Zhou Y. Grid-based prediction of torsion angle probabilities of protein backbone and its application to discrimination of protein intrinsic disorder regions and selection of model structures. BMC Bioinformatics 2018; 19:29. [PMID: 29390958 PMCID: PMC5796405 DOI: 10.1186/s12859-018-2031-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2017] [Accepted: 01/17/2018] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Protein structure can be described by backbone torsion angles: rotational angles about the N-Cα bond (φ) and the Cα-C bond (ψ) or the angle between Cαi-1-Cαi-Cαi + 1 (θ) and the rotational angle about the Cαi-Cαi + 1 bond (τ). Thus, their accurate prediction is useful for structure prediction and model refinement. Early methods predicted torsion angles in a few discrete bins whereas most recent methods have focused on prediction of angles in real, continuous values. Real value prediction, however, is unable to provide the information on probabilities of predicted angles. RESULTS Here, we propose to predict angles in fine grids of 5° by using deep learning neural networks. We found that this grid-based technique can yield 2-6% higher accuracy in predicting angles in the same 5° bin than existing prediction techniques compared. We further demonstrate the usefulness of predicted probabilities at given angle bins in discrimination of intrinsically disorder regions and in selection of protein models. CONCLUSIONS The proposed method may be useful for characterizing protein structure and disorder. The method is available at http://sparks-lab.org/server/SPIDER2/ as a part of SPIDER2 package.
Collapse
Affiliation(s)
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071 People’s Republic of China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510000 People’s Republic of China
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222 Australia
| |
Collapse
|
20
|
Cao H, Wei D, Yang Y, Shang Y, Li G, Zhou Y, Ma Q, Xu Y. Systems-level understanding of ethanol-induced stresses and adaptation in E. coli. Sci Rep 2017; 7:44150. [PMID: 28300180 PMCID: PMC5353561 DOI: 10.1038/srep44150] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2016] [Accepted: 02/02/2017] [Indexed: 01/10/2023] Open
Abstract
Understanding ethanol-induced stresses and responses in biofuel-producing bacteria at systems level has significant implications in engineering more efficient biofuel producers. We present a computational study of transcriptomic and genomic data of both ethanol-stressed and ethanol-adapted E. coli cells with computationally predicated ethanol-binding proteins and experimentally identified ethanol tolerance genes. Our analysis suggests: (1) ethanol damages cell wall and membrane integrity, causing increased stresses, particularly reactive oxygen species, which damages DNA and reduces the O2 level; (2) decreased cross-membrane proton gradient from membrane damage, coupled with hypoxia, leads to reduced ATP production by aerobic respiration, driving cells to rely more on fatty acid oxidation, anaerobic respiration and fermentation for ATP production; (3) the reduced ATP generation results in substantially decreased synthesis of macromolecules; (4) ethanol can directly bind 213 proteins including transcription factors, altering their functions; (5) all these changes together induce multiple stress responses, reduced biosynthesis, cell viability and growth; and (6) ethanol-adapted E. coli cells restore the majority of these reduced activities through selection of specific genomic mutations and alteration of stress responses, ultimately restoring normal ATP production, macromolecule biosynthesis, and growth. These new insights into the energy and mass balance will inform design of more ethanol-tolerant strains.
Collapse
Affiliation(s)
- Huansheng Cao
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- BioEnergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
| | - Du Wei
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD 4222, Australia
| | - Yu Shang
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Gaoyang Li
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr., Southport, QLD 4222, Australia
| | - Qin Ma
- Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD 57007, USA
- BioSNTR, Brookings, SD, 57007, USA
| | - Ying Xu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, the University of Georgia, Athens, GA 30602, USA
- BioEnergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| |
Collapse
|
21
|
Chan YH, Venev SV, Zeldovich KB, Matthews CR. Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints. Nat Commun 2017; 8:14614. [PMID: 28262665 PMCID: PMC5343507 DOI: 10.1038/ncomms14614] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2016] [Accepted: 01/11/2017] [Indexed: 02/07/2023] Open
Abstract
Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. The TIM barrel fold is an evolutionarily conserved motif found in proteins with a variety of enzymatic functions. Here the authors explore the fitness landscape of the TIM barrel protein IGPS and uncover evolutionary constraints on both sequence and structure, accompanied by long range allosteric interactions.
Collapse
Affiliation(s)
- Yvonne H Chan
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
| | - Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts 01605, USA
| | - Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts 01605, USA
| | - C Robert Matthews
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
| |
Collapse
|
22
|
MetQ of Neisseria gonorrhoeae Is a Surface-Expressed Antigen That Elicits Bactericidal and Functional Blocking Antibodies. Infect Immun 2017; 85:IAI.00898-16. [PMID: 27895130 PMCID: PMC5278169 DOI: 10.1128/iai.00898-16] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2016] [Accepted: 11/19/2016] [Indexed: 01/30/2023] Open
Abstract
Neisseria gonorrhoeae, the causative agent of the sexually transmitted infection (STI) gonorrhea, is a growing public health threat for which a vaccine is urgently needed. We characterized the functional role of the gonococcal MetQ protein, which is the methionine binding component of an ABC transporter system, and assessed its potential as a candidate antigen for inclusion in a gonococcal vaccine. MetQ has been found to be highly conserved in all strains investigated to date, it is localized on the bacterial surface, and it binds l-methionine with a high affinity. MetQ is also involved in gonococcal adherence to cervical epithelial cells. Mutants lacking MetQ have impaired survival in human monocytes, macrophages, and serum. Furthermore, antibodies raised against MetQ are bactericidal and are able to block gonococcal adherence to epithelial cells. These data suggest that MetQ elicits both bactericidal and functional blocking antibodies and is a valid candidate antigen for additional investigation and possible inclusion in a vaccine for prevention of gonorrhea.
Collapse
|
23
|
CRHunter: integrating multifaceted information to predict catalytic residues in enzymes. Sci Rep 2016; 6:34044. [PMID: 27665935 PMCID: PMC5036049 DOI: 10.1038/srep34044] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2016] [Accepted: 09/07/2016] [Indexed: 11/08/2022] Open
Abstract
A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies and further considered whether their combination could improve the prediction performance. Herein, we developed an integrative algorithm named CRHunter by simultaneously using the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we invented two support vector machine feature predictors based on both structural and sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homological relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves optimal performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.
Collapse
|
24
|
Sun M, Wang X, Zou C, He Z, Liu W, Li H. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors. BMC Bioinformatics 2016; 17:231. [PMID: 27266516 PMCID: PMC4897909 DOI: 10.1186/s12859-016-1110-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2016] [Accepted: 06/02/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. RESULTS In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. CONCLUSIONS The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .
Collapse
Affiliation(s)
- Meijian Sun
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Xia Wang
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Chuanxin Zou
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Zenghui He
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Wei Liu
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
| | - Honglin Li
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China.
| |
Collapse
|
25
|
Yang Y, Zhan J, Zhou Y. SPOT‐Ligand: Fast and effective structure‐based virtual screening by binding homology search according to ligand and receptor similarity. J Comput Chem 2016; 37:1734-9. [DOI: 10.1002/jcc.24380] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2015] [Revised: 01/12/2016] [Accepted: 03/05/2016] [Indexed: 12/11/2022]
Affiliation(s)
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication TechnologyGriffith UniversityParklands DrSouthport QLD4222 Australia
| | - Jian Zhan
- Institute for Glycomics and School of Information and Communication TechnologyGriffith UniversityParklands DrSouthport QLD4222 Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication TechnologyGriffith UniversityParklands DrSouthport QLD4222 Australia
| |
Collapse
|
26
|
Cui X, Naveed H, Gao X. Finding optimal interaction interface alignments between biological complexes. Bioinformatics 2015; 31:i133-41. [PMID: 26072475 PMCID: PMC4765866 DOI: 10.1093/bioinformatics/btv242] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
Motivation: Biological molecules perform their functions through interactions with other molecules. Structure alignment of interaction interfaces between biological complexes is an indispensable step in detecting their structural similarities, which are keys to understanding their evolutionary histories and functions. Although various structure alignment methods have been developed to successfully access the similarities of protein structures or certain types of interaction interfaces, existing alignment tools cannot directly align arbitrary types of interfaces formed by protein, DNA or RNA molecules. Specifically, they require a ‘blackbox preprocessing’ to standardize interface types and chain identifiers. Yet their performance is limited and sometimes unsatisfactory. Results: Here we introduce a novel method, PROSTA-inter, that automatically determines and aligns interaction interfaces between two arbitrary types of complex structures. Our method uses sequentially remote fragments to search for the optimal superimposition. The optimal residue matching problem is then formulated as a maximum weighted bipartite matching problem to detect the optimal sequence order-independent alignment. Benchmark evaluation on all non-redundant protein–DNA complexes in PDB shows significant performance improvement of our method over TM-align and iAlign (with the ‘blackbox preprocessing’). Two case studies where our method discovers, for the first time, structural similarities between two pairs of functionally related protein–DNA complexes are presented. We further demonstrate the power of our method on detecting structural similarities between a protein–protein complex and a protein–RNA complex, which is biologically known as a protein–RNA mimicry case. Availability and implementation: The PROSTA-inter web-server is publicly available at http://www.cbrc.kaust.edu.sa/prosta/. Contact:xin.gao@kaust.edu.sa
Collapse
Affiliation(s)
- Xuefeng Cui
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Hammad Naveed
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| |
Collapse
|
27
|
Jin X, Awale M, Zasso M, Kostro D, Patiny L, Reymond JL. PDB-Explorer: a web-based interactive map of the protein data bank in shape space. BMC Bioinformatics 2015; 16:339. [PMID: 26493835 PMCID: PMC4619230 DOI: 10.1186/s12859-015-0776-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2015] [Accepted: 10/14/2015] [Indexed: 11/17/2022] Open
Abstract
Background The RCSB Protein Data Bank (PDB) provides public access to experimentally determined 3D-structures of biological macromolecules (proteins, peptides and nucleic acids). While various tools are available to explore the PDB, options to access the global structural diversity of the entire PDB and to perceive relationships between PDB structures remain very limited. Methods A 136-dimensional atom pair 3D-fingerprint for proteins (3DP) counting categorized atom pairs at increasing through-space distances was designed to represent the molecular shape of PDB-entries. Nearest neighbor searches examples were reported exemplifying the ability of 3DP-similarity to identify closely related biomolecules from small peptides to enzyme and large multiprotein complexes such as virus particles. The principle component analysis was used to obtain the visualization of PDB in 3DP-space. Results The 3DP property space groups proteins and protein assemblies according to their 3D-shape similarity, yet shows exquisite ability to distinguish between closely related structures. An interactive website called PDB-Explorer is presented featuring a color-coded interactive map of PDB in 3DP-space. Each pixel of the map contains one or more PDB-entries which are directly visualized as ribbon diagrams when the pixel is selected. The PDB-Explorer website allows performing 3DP-nearest neighbor searches of any PDB-entry or of any structure uploaded as protein-type PDB file. All functionalities on the website are implemented in JavaScript in a platform-independent manner and draw data from a server that is updated daily with the latest PDB additions, ensuring complete and up-to-date coverage. The essentially instantaneous 3DP-similarity search with the PDB-Explorer provides results comparable to those of much slower 3D-alignment algorithms, and automatically clusters proteins from the same superfamilies in tight groups. Conclusion A chemical space classification of PDB based on molecular shape was obtained using a new atom-pair 3D-fingerprint for proteins and implemented in a web-based database exploration tool comprising an interactive color-coded map of the PDB chemical space and a nearest neighbor search tool. The PDB-Explorer website is freely available at www.cheminfo.org/pdbexplorer and represents an unprecedented opportunity to interactively visualize and explore the structural diversity of the PDB. ᅟ ᅟMaps of PDB in 3DP-space color-coded by heavy atom count and shape. ![]()
Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0776-9) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Xian Jin
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012, Berne, Switzerland.
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012, Berne, Switzerland.
| | - Michaël Zasso
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering (ISIC), Lausanne, 1015, Switzerland.
| | - Daniel Kostro
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering (ISIC), Lausanne, 1015, Switzerland.
| | - Luc Patiny
- Ecole Polytechnique Fédérale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering (ISIC), Lausanne, 1015, Switzerland.
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012, Berne, Switzerland.
| |
Collapse
|
28
|
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015] [Indexed: 11/08/2022]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Collapse
Affiliation(s)
- Naomi K Fox
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
| | - Steven E Brenner
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720.,Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
| | - John-Marc Chandonia
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
| |
Collapse
|
29
|
Approaching rational epitope vaccine design for hepatitis C virus with meta-server and multivalent scaffolding. Sci Rep 2015; 5:12501. [PMID: 26238798 PMCID: PMC4533164 DOI: 10.1038/srep12501] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2015] [Accepted: 06/18/2015] [Indexed: 02/06/2023] Open
Abstract
Development of a prophylactic vaccine against hepatitis C virus (HCV) has been hampered by the extraordinary viral diversity and the poor host immune response. Scaffolding, by grafting an epitope onto a heterologous protein scaffold, offers a possible solution to epitope vaccine design. In this study, we designed and characterized epitope vaccine antigens for the antigenic sites of HCV envelope glycoproteins E1 (residues 314–324) and E2 (residues 412–423), for which neutralizing antibody-bound structures are available. We first combined six structural alignment algorithms in a “scaffolding meta-server” to search for diverse scaffolds that can structurally accommodate the HCV epitopes. For each antigenic site, ten scaffolds were selected for computational design, and the resulting epitope scaffolds were analyzed using structure-scoring functions and molecular dynamics simulation. We experimentally confirmed that three E1 and five E2 epitope scaffolds bound to their respective neutralizing antibodies, but with different kinetics. We then investigated a “multivalent scaffolding” approach by displaying 24 copies of an epitope scaffold on a self-assembling nanoparticle, which markedly increased the avidity of antibody binding. Our study thus demonstrates the utility of a multi-scale scaffolding strategy in epitope vaccine design and provides promising HCV immunogens for further assessment in vivo.
Collapse
|
30
|
SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues. PLoS One 2015; 10:e0133260. [PMID: 26176857 PMCID: PMC4503397 DOI: 10.1371/journal.pone.0133260] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2015] [Accepted: 06/25/2015] [Indexed: 11/19/2022] Open
Abstract
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Collapse
|
31
|
Computational tools for epitope vaccine design and evaluation. Curr Opin Virol 2015; 11:103-12. [PMID: 25837467 DOI: 10.1016/j.coviro.2015.03.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2014] [Revised: 03/13/2015] [Accepted: 03/16/2015] [Indexed: 12/15/2022]
Abstract
Rational approaches will be required to develop universal vaccines for viral pathogens such as human immunodeficiency virus, hepatitis C virus, and influenza, for which empirical approaches have failed. The main objective of a rational vaccine strategy is to design novel immunogens that are capable of inducing long-term protective immunity. In practice, this requires structure-based engineering of the target neutralizing epitopes and a quantitative readout of vaccine-induced immune responses. Therefore, computational tools that can facilitate these two areas have played increasingly important roles in rational vaccine design in recent years. Here we review the computational techniques developed for protein structure prediction and antibody repertoire analysis, and demonstrate how they can be applied to the design and evaluation of epitope vaccines.
Collapse
|
32
|
Zhao H, Wang J, Zhou Y, Yang Y. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome. PLoS One 2014; 9:e96694. [PMID: 24792350 PMCID: PMC4008587 DOI: 10.1371/journal.pone.0096694] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2014] [Accepted: 04/10/2014] [Indexed: 12/25/2022] Open
Abstract
As more and more protein sequences are uncovered from increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than low-resolution two-state prediction of DNA-binding as most existing techniques do. The method first predicts protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
Collapse
Affiliation(s)
- Huiying Zhao
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Jihua Wang
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China
| | - Yaoqi Zhou
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China
- Institute for Glycomics and School of Information and Communication Technique, Griffith University, Southport, Queensland, Australia
- * E-mail: (YZ); (YY)
| | - Yuedong Yang
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Institute for Glycomics and School of Information and Communication Technique, Griffith University, Southport, Queensland, Australia
- * E-mail: (YZ); (YY)
| |
Collapse
|
33
|
Yang Y, Zhao H, Wang J, Zhou Y. SPOT-Seq-RNA: predicting protein-RNA complex structure and RNA-binding function by fold recognition and binding affinity prediction. Methods Mol Biol 2014; 1137:119-30. [PMID: 24573478 PMCID: PMC3937850 DOI: 10.1007/978-1-4939-0366-5_9] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
RNA-binding proteins (RBPs) play key roles in RNA metabolism and post-transcriptional regulation. Computational methods have been developed separately for prediction of RBPs and RNA-binding residues by machine-learning techniques and prediction of protein-RNA complex structures by rigid or semiflexible structure-to-structure docking. Here, we describe a template-based technique called SPOT-Seq-RNA that integrates prediction of RBPs, RNA-binding residues, and protein-RNA complex structures into a single package. This integration is achieved by combining template-based structure-prediction software, SPARKS X, with binding affinity prediction software, DRNA. This tool yields reasonable sensitivity (46 %) and high precision (84 %) for an independent test set of 215 RBPs and 5,766 non-RBPs. SPOT-Seq-RNA is computationally efficient for genome-scale prediction of RBPs and protein-RNA complex structures. Its application to human genome study has revealed a similar sensitivity and ability to uncover hundreds of novel RBPs beyond simple homology. The online server and downloadable version of SPOT-Seq-RNA are available at http://sparks-lab.org/server/SPOT-Seq-RNA/.
Collapse
Affiliation(s)
- Yuedong Yang
- School of Informatics, Indiana University Purdue University, Indianapolis, IN, USA
| | | | | | | |
Collapse
|
34
|
Zhao H, Yang Y, Zhou Y. Prediction of RNA binding proteins comes of age from low resolution to high resolution. MOLECULAR BIOSYSTEMS 2013; 9:2417-25. [PMID: 23872922 PMCID: PMC3870025 DOI: 10.1039/c3mb70167k] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Networks of protein-RNA interactions is likely to be larger than protein-protein and protein-DNA interaction networks because RNA transcripts are encoded tens of times more than proteins (e.g. only 3% of human genome coded for proteins), have diverse function and localization, and are controlled by proteins from birth (transcription) to death (degradation). This massive network is evidenced by several recent experimental discoveries of large numbers of previously unknown RNA-binding proteins (RBPs). Meanwhile, more than 400 non-redundant protein-RNA complex structures (at 25% sequence identity or less) have been deposited into the protein databank. These sequences and structural resources for RBPs provide ample data for the development of computational techniques dedicated to RBP prediction, as experimentally determining RNA-binding functions is time-consuming and expensive. This review compares traditional machine-learning based approaches with emerging template-based methods at several levels of prediction resolution ranging from two-state binding/non-binding prediction, to binding residue prediction and protein-RNA complex structure prediction. The analysis indicates that the two approaches are complementary and their combinations may lead to further improvements.
Collapse
Affiliation(s)
- Huiying Zhao
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA.
| | | | | |
Collapse
|
35
|
Hung CL, Lin YL. Implementation of a parallel protein structure alignment service on cloud. Int J Genomics 2013; 2013:439681. [PMID: 23671842 PMCID: PMC3647543 DOI: 10.1155/2013/439681] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2013] [Accepted: 02/20/2013] [Indexed: 12/20/2022] Open
Abstract
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
Collapse
Affiliation(s)
- Che-Lun Hung
- Department of Computer Science and Communication Engineering, Providence University, 200, Sec. 7, Taiwan Boulevard, Shalu Dist., Taichung 43301, Taiwan
| | - Yaw-Ling Lin
- Department of Computer Science and Information Engineering, Providence University, 200, Sec. 7, Taiwan Boulevard, Shalu Dist., Taichung 43301, Taiwan
| |
Collapse
|