1. Sagendorf JM, Mitra R, Huang J, Chen XS, Rohs R. Structure-based prediction of protein-nucleic acid binding using graph neural networks. Biophys Rev 2024; 16:297-314. [PMID: 39345796 PMCID: PMC11427629 DOI: 10.1007/s12551-024-01201-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/19/2024] [Accepted: 05/28/2024] [Indexed: 10/01/2024] Open
Abstract
Protein-nucleic acid (PNA) binding plays critical roles in the transcription, translation, regulation, and three-dimensional organization of the genome. Structural models of proteins bound to nucleic acids (NA) provide insights into the chemical, electrostatic, and geometric properties of the protein structure that give rise to NA binding but are scarce relative to models of unbound proteins. We developed a deep learning approach for predicting PNA binding given the unbound structure of a protein that we call PNAbind. Our method utilizes graph neural networks to encode the spatial distribution of physicochemical and geometric properties of protein structures that are predictive of NA binding. Using global physicochemical encodings, our models predict the overall binding function of a protein, and using local encodings, they predict the location of individual NA binding residues. Our models can discriminate between specificity for DNA or RNA binding, and we show that predictions made on computationally derived protein structures can be used to gain mechanistic understanding of chemical and structural features that determine NA recognition. Binding site predictions were validated against benchmark datasets, achieving AUROC scores in the range of 0.92-0.95. We applied our models to the HIV-1 restriction factor APOBEC3G and showed that our model predictions are consistent with and help explain experimental RNA binding data. Supplementary information The online version contains supplementary material available at 10.1007/s12551-024-01201-w.
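The AUROC range quoted in the abstract above (0.92-0.95) measures how reliably per-residue scores rank binding residues above non-binding ones. As a minimal sketch of that metric only, and not of PNAbind itself, the function below computes AUROC from the rank-sum (Mann-Whitney) identity. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data from the paper.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC via the rank-sum identity; tied scores receive average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1

    # Mann-Whitney U statistic of the positive class, normalised to [0, 1].
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Toy example: 1 = binding residue, 0 = non-binding residue (made-up values).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported benchmark values should be read.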
Affiliation(s)
- Jared M. Sagendorf
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
  - Present Address: Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158 USA
- Raktim Mitra
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Jiawei Huang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
- Xiaojiang S. Chen
  - Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089 USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089 USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089 USA
2. Jiang Z, Xiao SR, Liu R. Dissecting and predicting different types of binding sites in nucleic acids based on structural information. Brief Bioinform 2021; 23:6384399. [PMID: 34624074 PMCID: PMC8769709 DOI: 10.1093/bib/bbab411] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/07/2021] [Revised: 08/26/2021] [Accepted: 09/07/2021] [Indexed: 12/16/2022] Open
Abstract
The biological functions of DNA and RNA generally depend on their interactions with other molecules, such as small ligands, proteins and nucleic acids. However, our knowledge of the nucleic acid binding sites for different interaction partners is very limited, and identifying these critical binding regions is not trivial. Herein, we performed a comprehensive comparison between binding and nonbinding sites and among different categories of binding sites in these two nucleic acid classes. From the structural perspective, RNA may interact with ligands by forming binding pockets and may contact proteins and nucleic acids using protruding surfaces, while DNA may adopt regions closer to the middle of the chain to make contacts with other molecules. Based on structural information, we established a feature-based ensemble learning classifier to identify the binding sites by fully using the interplay among different machine learning algorithms, feature spaces and sample spaces. Meanwhile, we designed a template-based classifier by exploiting structural conservation. The complementarity between the two classifiers motivated us to build an integrative framework for improving prediction performance. Moreover, we utilized a post-processing procedure based on the random walk algorithm to further correct the integrative predictions. Our unified prediction framework yielded promising results for different binding sites and outperformed existing methods.
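The abstract above mentions random-walk post-processing of the integrated predictions but does not specify the graph or the update rule. The sketch below therefore shows only a generic random walk with restart that smooths per-nucleotide scores over a contact graph. The function name `random_walk_smooth`, the `restart` parameter, and the three-node toy graph are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def random_walk_smooth(adjacency, initial_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p <- (1 - r) * W @ p + r * p0 to convergence."""
    A = np.asarray(adjacency, dtype=float)
    p0 = np.asarray(initial_scores, dtype=float)

    # Column-normalise so each node distributes its score among its neighbours.
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep a zero column
    W = A / col_sums

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).max() < tol:
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2: the middle site has a low raw score but two
# high-scoring neighbours, so smoothing raises it (values are made up).
toy_adjacency = np.array([[0, 1, 0],
                          [1, 0, 1],
                          [0, 1, 0]])
toy_scores = np.array([1.0, 0.0, 1.0])
print(random_walk_smooth(toy_adjacency, toy_scores))
```

The `restart` value controls how strongly the smoothed scores stay anchored to the raw predictions: with `restart=1.0` the input is returned unchanged, and smaller values let neighbouring sites influence each other more.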
Affiliation(s)
- Zheng Jiang
  - College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Si-Rui Xiao
  - College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Rong Liu
  - College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
3. Tan C, Wang T, Yang W, Deng L. PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction. Molecules 2019; 25:98. [PMID: 31888057 PMCID: PMC6982935 DOI: 10.3390/molecules25010098] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/07/2019] [Revised: 12/20/2019] [Accepted: 12/21/2019] [Indexed: 11/16/2022] Open
Abstract
Interactions between proteins and DNAs play essential roles in many biological processes. DNA binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a series of cell functions such as gene expression and regulation. Single-stranded DNA-binding proteins (SSBs) are necessary for DNA replication, recombination, and repair and are responsible for binding to the single-stranded DNA. Therefore, the effective classification of DNA-binding proteins is helpful for functional annotations of proteins. In this work, we propose PredPSD, a computational method based on sequence information that accurately predicts SSBs and DSBs. It introduces three novel feature extraction algorithms. In particular, we use the autocross-covariance (ACC) transformation to transform feature matrices into fixed-length vectors. Then, we put the optimal feature subset obtained by the minimal-redundancy-maximal-relevance criterion (mRMR) feature selection algorithm into the gradient tree boosting (GTB). In 10-fold cross-validation based on a benchmark dataset, PredPSD achieves promising performances with an AUC score of 0.956 and an accuracy of 0.912, which are better than those of existing methods. Moreover, our method has significantly improved the prediction accuracy in independent testing. The experimental results show that PredPSD can significantly recognize the binding specificity and differentiate DSBs and SSBs.
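The abstract above relies on the auto-cross-covariance (ACC) transformation to turn per-residue feature matrices of varying length into fixed-length vectors, but does not spell out the formula. The sketch below uses a commonly cited formulation, assumed here rather than taken from the paper: for each lag, the covariance between column i at position j and column k at position j + lag, averaged over the sequence. The function name `acc_transform` and the random toy matrices are illustrative assumptions.

```python
import numpy as np

def acc_transform(features, max_lag):
    """Map an (L x N) per-residue feature matrix to a vector of length N * N * max_lag.

    For each lag, the diagonal of the (N x N) block holds auto-covariances and
    the off-diagonal entries hold cross-covariances between feature columns.
    """
    X = np.asarray(features, dtype=float)
    length, _ = X.shape
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")

    Xc = X - X.mean(axis=0)  # centre each feature column over the sequence
    blocks = []
    for lag in range(1, max_lag + 1):
        # Covariance between position j (rows) and position j + lag (columns).
        cov = Xc[:-lag].T @ Xc[lag:] / (length - lag)
        blocks.append(cov.ravel())
    return np.concatenate(blocks)

# Two toy "proteins" of different length with 20 feature columns each
# (random numbers standing in for, e.g., PSSM rows).
rng = np.random.default_rng(0)
short_protein = rng.normal(size=(30, 20))
long_protein = rng.normal(size=(50, 20))
print(acc_transform(short_protein, 5).shape, acc_transform(long_protein, 5).shape)
```

Because the output length depends only on the number of feature columns and the maximum lag, proteins of any length end up in the same feature space, which is what allows a downstream classifier such as gradient tree boosting to be trained on them.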
Affiliation(s)
- Changgeng Tan
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China; (C.T.); (T.W.); (W.Y.)
- Tong Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China; (C.T.); (T.W.); (W.Y.)
- Wenyi Yang
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China; (C.T.); (T.W.); (W.Y.)
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China; (C.T.); (T.W.); (W.Y.)
  - School of Software, Xinjiang University, Urumqi 830008, China
  - Correspondence: ; Tel.: +86-731-82539736
4. Qiu Z, Nakamura S, Fujimoto K. Reversible photo-cross-linking of the GCN4 peptide containing 3-cyanovinylcarbazole amino acid to double-stranded DNA. Org Biomol Chem 2019; 17:6277-6283. [PMID: 31192345 DOI: 10.1039/c9ob00372j] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
Interaction analysis in vivo greatly promotes the analysis and understanding of biological functions. The interaction between DNA and peptides or proteins is very important for reading out and amplifying information from genomic DNA. In this study, we designed and synthesized a photo-cross-linkable amino acid, l-3-cyanovinylcarbazole amino acid (l-CNVA), for photo-cross-linking to double-stranded DNA. Reversible photo-cross-linking between DNA and peptides containing CNVA, having 3-cyanovinylcarbazole moieties capable of photo-cross-linking to nucleic acids, was demonstrated. As a result, it was shown that the GCN4 peptide, containing CNVA, can be photo-cross-linked to DNA, and its adduct was photo-split into the original peptide and DNA with 312 nm irradiation. This is the first report of reversibly manipulating photo-cross-linking between double-stranded DNA and peptides. In addition, this reversible photo-cross-linking, using l-CNVA, is faster and gives a higher yield than that using diazirine and psoralen.
Affiliation(s)
- Zhiyong Qiu
  - School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Asahidai 1-1, Nomi, Ishikawa 923-1292, Japan
5. Dvir S, Argoetti A, Mandel-Gutfreund Y. Ribonucleoprotein particles: advances and challenges in computational methods. Curr Opin Struct Biol 2018; 53:124-130. [PMID: 30172766 DOI: 10.1016/j.sbi.2018.08.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/01/2018] [Accepted: 08/07/2018] [Indexed: 01/16/2023]
Abstract
RNA-binding proteins (RBPs) interact with RNA to form ribonucleoprotein particles (RNPs). The interactions between RBPs and their RNA partners are traditionally thought to be mediated by highly conserved RNA-binding domains (RBDs). Recently, high-throughput studies led to the discovery of hundreds of novel proteins and domains, many of which do not follow the classical definition of RNA binding. Despite technological innovations, experimental screenings are currently limited to the detection of specific types of RNPs, underscoring the importance of computational methods for predicting novel RBPs and RNA-interacting residues and interfaces. Here, we discuss major challenges in the computational prediction of RBPs and RBDs and outline new strategies to circumvent current limitations of experimental techniques.
Affiliation(s)
- Shlomi Dvir
  - Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Amir Argoetti
  - Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Yael Mandel-Gutfreund
  - Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
  - Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel
6. Wang W, Sun L, Zhang S, Zhang H, Shi J, Xu T, Li K. Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences. BMC Bioinformatics 2017; 18:300. [PMID: 28606086 PMCID: PMC5469069 DOI: 10.1186/s12859-017-1715-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/09/2017] [Accepted: 06/06/2017] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND DNA-binding proteins perform important functions in a great number of biological activities. DNA-binding proteins can interact with ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA), and DNA-binding proteins can be categorized as single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs). The identification of DNA-binding proteins from amino acid sequences can help to annotate protein functions and understand the binding specificity. In this study, we systematically consider a variety of schemes to represent protein sequences: OAAC (overall amino acid composition) features, dipeptide compositions, PSSM (position-specific scoring matrix profiles) and split amino acid composition (SAA), and then we adopt SVM (support vector machine) and RF (random forest) classification model to distinguish SSBs from DSBs. RESULTS Our results suggest that some sequence features can significantly differentiate DSBs and SSBs. Evaluated by 10 fold cross-validation on the benchmark datasets, our prediction method can achieve the accuracy of 88.7% and AUC (area under the curve) of 0.919. Moreover, our method has good performance in independent testing. CONCLUSIONS Using various sequence-derived features, a novel method is proposed to distinguish DSBs and SSBs accurately. The method also explores novel features, which could be helpful to discover the binding specificity of DNA-binding proteins.
Affiliation(s)
- Wei Wang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan Province 453007 China
  - Laboratory of Computation Intelligence and Information Processing, Engineering Technology Research Center for Computing Intelligence and Data Mining, Xinxiang, Henan Province 453007 China
- Lin Sun
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan Province 453007 China
- Shiguang Zhang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan Province 453007 China
- Hongjun Zhang
  - School of Aviation Engineering, Anyang University, Anyang, Henan Province 455000 China
- Jinling Shi
  - School of International Education, Xuchang University, Xuchang, Henan Province 461000 China
- Tianhe Xu
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan Province 453007 China
- Keliang Li
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan Province 453007 China
7. Protein-RNA interactions: structural biology and computational modeling techniques. Biophys Rev 2016; 8:359-367. [PMID: 28510023 DOI: 10.1007/s12551-016-0223-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/05/2016] [Accepted: 09/20/2016] [Indexed: 12/30/2022] Open
Abstract
RNA-binding proteins are functionally diverse within cells, being involved in RNA-metabolism, translation, DNA damage repair, and gene regulation at both the transcriptional and post-transcriptional levels. Much has been learnt about their interactions with RNAs through structure determination techniques and computational modeling. This review gives an overview of the structural data currently available for protein-RNA complexes, and discusses the technical issues facing structural biologists working to solve their structures. The review focuses on three techniques used to solve the 3-dimensional structure of protein-RNA complexes at atomic resolution, namely X-ray crystallography, solution nuclear magnetic resonance (NMR) and cryo-electron microscopy (cryo-EM). The review then focuses on the main computational modeling techniques that use these atomic resolution data: discussing the prediction of RNA-binding sites on unbound proteins, docking proteins, and RNAs, and modeling the molecular dynamics of the systems. In conclusion, the review looks at the future directions this field of research might take.
8. Sun M, Wang X, Zou C, He Z, Liu W, Li H. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors. BMC Bioinformatics 2016; 17:231. [PMID: 27266516 PMCID: PMC4897909 DOI: 10.1186/s12859-016-1110-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 02/21/2016] [Accepted: 06/02/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. RESULTS In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. CONCLUSIONS The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .
Affiliation(s)
- Meijian Sun
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Xia Wang
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Chuanxin Zou
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Zenghui He
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Wei Liu
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Honglin Li
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
9. Paz I, Kligun E, Bengad B, Mandel-Gutfreund Y. BindUP: a web server for non-homology-based prediction of DNA and RNA binding proteins. Nucleic Acids Res 2016; 44:W568-74. [PMID: 27198220 PMCID: PMC4987955 DOI: 10.1093/nar/gkw454] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 03/01/2016] [Accepted: 05/11/2016] [Indexed: 12/12/2022] Open
Abstract
Gene expression is a multi-step process involving many layers of regulation. The main regulators of the pathway are DNA and RNA binding proteins. While over the years a large number of DNA and RNA binding proteins have been identified and extensively studied, it is still expected that many other proteins, some with yet another known function, are waiting to be discovered. Here we present a new web server, BindUP, freely accessible through the website http://bindup.technion.ac.il/, for predicting DNA and RNA binding proteins using a non-homology-based approach. Our method is based on the electrostatic features of the protein surface and other general properties of the protein. BindUP predicts nucleic acid binding function given the protein's three-dimensional structure or a structural model. Additionally, BindUP provides information on the largest electrostatic surface patches, visualized on the server. The server was tested on several datasets of DNA and RNA binding proteins, including proteins that do not possess DNA or RNA binding domains and have no similarity to known nucleic acid binding proteins, achieving very high accuracy. BindUP is applicable in either single or batch mode and can be applied for testing hundreds of proteins simultaneously in a highly efficient manner.
Affiliation(s)
- Inbal Paz
  - Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Efrat Kligun
  - Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Barak Bengad
  - Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Yael Mandel-Gutfreund
  - Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
10. Wang W, Liu J, Sun L. Surface shapes and surrounding environment analysis of single- and double-stranded DNA-binding proteins in protein-DNA interface. Proteins 2016; 84:979-89. [PMID: 27038080 DOI: 10.1002/prot.25045] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/04/2016] [Revised: 03/15/2016] [Accepted: 03/25/2016] [Indexed: 11/12/2022]
Abstract
Protein-DNA binding is critical to many biological processes. However, the structural mechanisms underlying these interactions are not fully understood. Here, we analyzed the residue shapes (peak, flat, or valley) and the surrounding environment of double-stranded DNA-binding proteins (DSBs) and single-stranded DNA-binding proteins (SSBs) in protein-DNA interfaces. We found that the interface shapes, hydrogen bonds, and the surrounding environment differ significantly between the two kinds of proteins. Building on these results, we constructed a random forest (RF) classifier to distinguish DSBs and SSBs with satisfactory performance. In conclusion, we present a novel methodology to characterize protein interfaces, which will deepen our understanding of the specificity of proteins binding to ssDNA (single-stranded DNA) or dsDNA (double-stranded DNA).
Affiliation(s)
- Wei Wang
  - Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
  - Laboratory of Computation Intelligence and Information Processing, Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China
- Juan Liu
  - Institute of Computer Software, School of Computer, Wuhan University, Wuhan, 430072, China
- Lin Sun
  - Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
  - Laboratory of Computation Intelligence and Information Processing, Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China
11. Wang W, Liu J, Xiong Y, Zhu L, Zhou X. Analysis and classification of DNA-binding sites in single-stranded and double-stranded DNA-binding proteins using protein information. IET Syst Biol 2014; 8:176-83. [PMID: 25075531 DOI: 10.1049/iet-syb.2013.0048] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022] Open
Abstract
Single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs) play different roles in biological processes when they bind to single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA). However, the underlying binding mechanisms of SSBs and DSBs have not yet been fully understood. Here, the authors first constructed two groups of ssDNA- and dsDNA-specific binding sites from two non-redundant sets of SSBs and DSBs. They further analysed the relationship between the two classes of binding sites and a newly proposed set of features (residue charge distribution, secondary structure and spatial shape). To assess and utilise the predictive power of these features, they trained a classification model using a support vector machine to make predictions about the ssDNA and the dsDNA binding sites. The authors' analysis and prediction results indicated that the two classes of binding sites can be distinguished by the three types of features, and the final classifier using all the features achieved satisfactory performance. In conclusion, the proposed features will deepen their understanding of the specificity of proteins which bind to ssDNA or dsDNA.
Affiliation(s)
- Wei Wang
  - School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
- Juan Liu
  - School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
- Yi Xiong
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, USA
- Lida Zhu
  - School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
- Xionghui Zhou
  - School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
12. Wang W, Liu J, Zhou X. Identification of single-stranded and double-stranded DNA binding proteins based on protein structure. BMC Bioinformatics 2014; 15 Suppl 12:S4. [PMID: 25474071 PMCID: PMC4243121 DOI: 10.1186/1471-2105-15-s12-s4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/26/2023] Open
Abstract
Background Protein-DNA interactions are essential for many biological processes. However, the structural mechanisms underlying these interactions are not fully understood. DNA binding proteins can be classified into double-stranded DNA binding proteins (DSBs) and single-stranded DNA binding proteins (SSBs), and they take part in different biological functions. DSBs usually act as transcription factors to regulate gene expression, while SSBs usually play roles in DNA replication, recombination, and repair. Understanding the binding specificity of a DNA binding protein is helpful for research on protein function. Results In this paper, we investigated the differences between DSBs and SSBs in surface tunnels as well as in OB-fold domain information. We detected the largest clefts on the protein surfaces to obtain several features for distinguishing the potential interfaces of SSBs and DSBs. We also compared each protein structure with each of six OB-fold protein templates and used the maximal alignment score (TM-score) as the OB-fold feature of the protein. Based on these features, we constructed a support vector machine (SVM) classification model to automatically distinguish these two kinds of proteins, with prediction accuracies of 87%, 83% and 83% for the HOLO-set, APO-set and Mixed-set, respectively. Conclusions We found that SSBs and DSBs have different ranges of tunnel lengths and tunnel curvatures; moreover, the alignment results with OB-fold templates were also found to be a discriminative feature of SSBs and DSBs. Experimental results on 10-fold cross validation indicate that the new feature set is effective for describing DNA binding proteins. The evaluation results on both bound (DNA-bound) and non-bound (DNA-free) proteins have shown the satisfactory performance of our method.
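The abstract above uses the maximal TM-score against six OB-fold templates as a structural feature. For orientation, the sketch below evaluates the standard TM-score expression for one fixed superposition, given the distances between aligned residue pairs. A full TM-score calculation also searches over superpositions to maximise this value, which is not shown. The function name `tm_score`, the 0.5 Å floor on the distance scale, and the toy distances are assumptions made for this illustration.

```python
import numpy as np

def tm_score(aligned_distances, target_length):
    """TM-score term for one fixed superposition.

    aligned_distances: distances (in Angstrom) between aligned residue pairs.
    target_length: number of residues in the protein used for normalisation.
    """
    d = np.asarray(aligned_distances, dtype=float)
    # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    d0 = 1.24 * np.cbrt(target_length - 15) - 1.8
    d0 = max(d0, 0.5)  # assumption: floor d0 so very short chains stay well defined
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / target_length)

# Toy case: 100-residue target, 80 aligned pairs at 2 Angstrom (made-up values).
print(tm_score(np.full(80, 2.0), 100))
```

Under this scheme, the OB-fold feature described in the abstract would be the largest such score obtained across the template comparisons, so a protein that superimposes well on any one template receives a high value.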
13. Yang XX, Deng ZL, Liu R. RBRDetector: Improved prediction of binding residues on RNA-binding protein structures using complementary feature- and template-based strategies. Proteins 2014; 82:2455-71. [DOI: 10.1002/prot.24610] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 02/02/2014] [Revised: 04/28/2014] [Accepted: 05/09/2014] [Indexed: 11/05/2022]
Affiliation(s)
- Xiao-Xia Yang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province; College of Informatics; Huazhong Agricultural University; Wuhan 430070 People's Republic of China
- Zhi-Luo Deng
  - Agricultural Bioinformatics Key Laboratory of Hubei Province; College of Informatics; Huazhong Agricultural University; Wuhan 430070 People's Republic of China
- Rong Liu
  - Agricultural Bioinformatics Key Laboratory of Hubei Province; College of Informatics; Huazhong Agricultural University; Wuhan 430070 People's Republic of China
14. Hischenhuber B, Havlicek H, Todoric J, Höllrigl-Binder S, Schreiner W, Knapp B. Differential geometric analysis of alterations in MH α-helices. J Comput Chem 2013; 34:1862-79. [PMID: 23703160 PMCID: PMC3739936 DOI: 10.1002/jcc.23328] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 01/23/2013] [Revised: 04/12/2013] [Accepted: 04/13/2013] [Indexed: 01/03/2023]
Abstract
Antigen presenting cells present processed peptides via their major histocompatibility (MH) complex to the T cell receptors (TRs) of T cells. If a peptide is immunogenic, a signaling cascade can be triggered within the T cell. However, the binding of different peptides and/or different TRs to MH is also known to influence the spatial arrangement of the MH α-helices which could itself be an additional level of T cell regulation. In this study, we introduce a new methodology based on differential geometric parameters to describe MH deformations in a detailed and comparable way. For this purpose, we represent MH α-helices by curves. On the basis of these curves, we calculate in a first step the curvature and torsion to describe each α-helix independently. In a second step, we calculate the distribution parameter and the conical curvature of the ruled surface to describe the relative orientation of the two α-helices. On the basis of four different test sets, we show how these differential geometric parameters can be used to describe changes in the spatial arrangement of the MH α-helices for different biological challenges. In the first test set, we illustrate on the basis of all available crystal structures for (TR)/pMH complexes how the binding of TRs influences the MH helices. In the second test set, we show a cross evaluation of different MH alleles with the same peptide and the same MH allele with different peptides. In the third test set, we present the spatial effects of different TRs on the same peptide/MH complex. In the fourth test set, we illustrate how a severe conformational change in an α-helix can be described quantitatively. Taken together, we provide a novel structural methodology to numerically describe subtle and severe alterations in MH α-helices for a broad range of applications.
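The first step described in the abstract above characterises each helix, represented as a curve, by its curvature and torsion. As a hedged sketch of those two quantities only, the code below evaluates the textbook formulas κ = |r′ × r″| / |r′|³ and τ = (r′ × r″) · r‴ / |r′ × r″|² with finite differences on a sampled curve. The function name `curvature_torsion` and the ideal helix with a 2.3 Å radius and 5.4 Å pitch are illustrative assumptions; the paper's curve-fitting procedure and its ruled-surface parameters are not reproduced here.

```python
import numpy as np

def curvature_torsion(points, t):
    """Finite-difference curvature and torsion of a sampled 3D curve r(t)."""
    r = np.asarray(points, dtype=float)
    d1 = np.gradient(r, t, axis=0)   # r'
    d2 = np.gradient(d1, t, axis=0)  # r''
    d3 = np.gradient(d2, t, axis=0)  # r'''

    cross = np.cross(d1, d2)
    cross_norm = np.linalg.norm(cross, axis=1)
    speed = np.linalg.norm(d1, axis=1)

    kappa = cross_norm / speed ** 3
    tau = np.einsum("ij,ij->i", cross, d3) / cross_norm ** 2
    return kappa, tau

# Ideal helix r(t) = (a cos t, a sin t, b t); for this curve the analytic
# values are kappa = a / (a^2 + b^2) and tau = b / (a^2 + b^2).
a, b = 2.3, 5.4 / (2.0 * np.pi)
t = np.linspace(0.0, 4.0 * np.pi, 2001)
helix = np.column_stack([a * np.cos(t), a * np.sin(t), b * t])
kappa, tau = curvature_torsion(helix, t)
print(kappa[1000], tau[1000])
```

Values near the ends of the curve are less reliable because the finite differences become one-sided there, so comparisons are best made on interior samples. A bent or kinked helix would show up as a local departure of κ and τ from these constant values.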
Affiliation(s)
- Birgit Hischenhuber
- Center for Medical Statistics, Informatics, and Intelligent Systems, Section for Biosimulation and Bioinformatics, Medical University of Vienna, Vienna, Austria
|
15
|
Mandel-Gutfreund Y. 74 Novel geometric approaches to uniquely characterize DNA-binding interfaces. J Biomol Struct Dyn 2013. [DOI: 10.1080/07391102.2013.786508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
16
|
Dey S, Pal A, Guharoy M, Sonavane S, Chakrabarti P. Characterization and prediction of the binding site in DNA-binding proteins: improvement of accuracy by combining residue composition, evolutionary conservation and structural parameters. Nucleic Acids Res 2012; 40:7150-61. [PMID: 22641851 PMCID: PMC3424558 DOI: 10.1093/nar/gks405] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 12/25/2022] Open
Abstract
We present a set of four parameters that in combination can predict DNA-binding residues on protein structures to a high degree of accuracy. These are the number of evolutionarily conserved residues (Ncons) and their spatial clustering (ρe), hydrogen bond donor capability (Dp) and residue propensity (Rp). We first used these parameters to characterize 130 interfaces in a set of 126 DNA-binding proteins (DBPs). We then analyzed how well these parameters, individually and in combination, distinguish the true binding region from the rest of the protein surface. Rp shows the best performance, identifying the true interface with the top rank in 83% of cases. Importantly, we also used the unbound-bound test cases of the protein–DNA docking benchmark to test the efficacy of our method. When applied to the unbound form of the DBPs, Rp can distinguish the interface in 86% of cases. Finally, we applied the SVM approach to recognize the interface region using the above parameters along with the individual amino acid composition as attributes. The accuracy of prediction is 90.5% for the bound structures and 93.6% for the unbound form of the proteins.
Affiliation(s)
- Sucharita Dey
- Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India
|
17
|
Walia RR, Caragea C, Lewis BA, Towfic F, Terribilini M, El-Manzalawy Y, Dobbs D, Honavar V. Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art. BMC Bioinformatics 2012; 13:89. [PMID: 22574904 PMCID: PMC3490755 DOI: 10.1186/1471-2105-13-89] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Received: 10/14/2011] [Accepted: 05/10/2012] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction.
RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues.
CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
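The performance measures named in this abstract (MCC, Sensitivity, Specificity) can all be derived from the four counts of a binary confusion matrix. The sketch below uses the common textbook definitions and is only an illustration: the function name `binary_metrics` and the example counts are hypothetical, and the cited study's exact definitions are an open assumption here, since some interface-prediction papers use "specificity" for what is elsewhere called precision. Precision is therefore reported alongside.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute residue-level metrics from confusion-matrix counts.

    tp/fn: interface residues predicted as interface / non-interface.
    tn/fp: non-interface residues predicted as non-interface / interface.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on interface residues
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on non-interface residues
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # fraction of predicted interface that is correct

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Hypothetical counts for one protein: 50 interface and 150 non-interface residues.
metrics = binary_metrics(tp=30, fp=10, tn=140, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because interface residues are usually a minority class, MCC is often preferred as a single summary, while sensitivity and specificity expose the trade-off the abstract describes.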
Affiliation(s)
- Rasna R Walia
- Bioinformatics and Computational Biology Program, Iowa State University, USA
- Department of Computer Science, Iowa State University, USA
- Cornelia Caragea
- Center for Computational Intelligence, Learning and Discovery, Iowa State University, USA
- College of Information Sciences & Technology, The Pennsylvania State University, University Park, USA
- Benjamin A Lewis
- Bioinformatics and Computational Biology Program, Iowa State University, USA
- Department of Genetics, Development and Cell Biology, , USA
- Fadi Towfic
- Center for Computational Intelligence, Learning and Discovery, Iowa State University, USA
- The Broad Institute, USA
- Yasser El-Manzalawy
- Department of Computer Science, Iowa State University, USA
- Center for Computational Intelligence, Learning and Discovery, Iowa State University, USA
- Department of Systems & Computer Engineering, Al-Azhar University, Egypt
- Drena Dobbs
- Bioinformatics and Computational Biology Program, Iowa State University, USA
- Department of Genetics, Development and Cell Biology, , USA
- Vasant Honavar
- Bioinformatics and Computational Biology Program, Iowa State University, USA
- Department of Computer Science, Iowa State University, USA
- Center for Computational Intelligence, Learning and Discovery, Iowa State University, USA
|
18
|
Dror I, Shazman S, Mukherjee S, Zhang Y, Glaser F, Mandel-Gutfreund Y. Predicting nucleic acid binding interfaces from structural models of proteins. Proteins 2011; 80:482-9. [PMID: 22086767 DOI: 10.1002/prot.23214] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/12/2011] [Revised: 09/27/2011] [Accepted: 09/30/2011] [Indexed: 11/06/2022]
Abstract
The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Employing a combined patch approach, we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure.
Affiliation(s)
- Iris Dror
- Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel 32000
|
19
|
Computational methods for prediction of protein-RNA interactions. J Struct Biol 2011; 179:261-8. [PMID: 22019768 DOI: 10.1016/j.jsb.2011.10.001] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.9] [Received: 06/13/2011] [Revised: 09/28/2011] [Accepted: 10/04/2011] [Indexed: 12/21/2022]
Abstract
Understanding the molecular mechanism of protein-RNA recognition and complex formation is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes by X-ray crystallography and nuclear magnetic resonance spectroscopy (NMR) is tedious and difficult. Alternatively, protein-RNA interactions can be predicted by computational methods. Although less accurate than experimental observations, computational predictions can be sufficiently accurate to prompt functional hypotheses and guide experiments, e.g. to identify individual amino acid or nucleotide residues. In this article we review 10 methods for predicting protein-RNA interactions, seven of which predict RNA-binding sites from protein sequences and three from structures. We also developed a meta-predictor that uses the output of the top three sequence-based primary predictors to calculate a consensus prediction, which outperforms all the primary predictors. In order to fully cover the software for predicting protein-RNA interactions, we also describe five methods for protein-RNA docking. The article highlights the strengths and shortcomings of existing methods for the prediction of protein-RNA interactions and provides suggestions for their further development.
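The consensus idea described in this abstract, combining the outputs of several primary predictors into one prediction, can be illustrated with a very small sketch. The abstract does not state how the consensus is calculated, so the unweighted mean and the 0.5 threshold used here are assumptions, and the predictor names, scores, and function names are hypothetical.

```python
from statistics import mean


def consensus_scores(predictions):
    """Average per-residue scores across predictors (unweighted mean, an assumption).

    predictions: dict mapping predictor name -> list of per-residue scores in [0, 1],
    all lists covering the same residues in the same order.
    """
    lengths = {len(scores) for scores in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same number of residues")
    # zip(*...) groups the scores residue by residue across predictors.
    return [mean(column) for column in zip(*predictions.values())]


def call_binding(scores, threshold=0.5):
    """Label a residue as RNA-binding when its consensus score reaches the threshold."""
    return [score >= threshold for score in scores]


# Hypothetical scores from three primary predictors for a four-residue stretch.
example = {
    "predictor_a": [0.9, 0.2, 0.6, 0.1],
    "predictor_b": [0.8, 0.4, 0.3, 0.2],
    "predictor_c": [0.7, 0.1, 0.7, 0.0],
}

consensus = consensus_scores(example)
calls = call_binding(consensus)
print(consensus)
print(calls)
```

A weighted mean or a trained second-level classifier would be natural alternatives to the unweighted mean; which of these the cited meta-predictor uses is not stated in the abstract.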
|