1
Machine learning for the identification of respiratory viral attachment machinery from sequences data. PLoS One 2023; 18:e0281642. [PMID: 36862685 PMCID: PMC9980812 DOI: 10.1371/journal.pone.0281642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 01/27/2023] [Indexed: 03/03/2023] Open
Abstract
At the outset of an emergent viral respiratory pandemic, sequence data is among the first molecular information available. As viral attachment machinery is a key target for therapeutic and prophylactic interventions, rapid identification of viral "spike" proteins from sequence can significantly accelerate the development of medical countermeasures. For six families of respiratory viruses, covering the vast majority of airborne and droplet-transmitted diseases, host cell entry is mediated by the binding of viral surface glycoproteins that interact with a host cell receptor. In this report it is shown that sequence data for an unknown virus belonging to one of the six families above provides sufficient information to identify the protein(s) responsible for viral attachment. Random forest models that take as input a set of respiratory viral sequences can classify the protein as "spike" vs. non-spike based on predicted secondary structure elements alone (with 97.3% correctly classified) or in combination with N-glycosylation related features (with 97.0% correctly classified). Models were validated through 10-fold cross-validation, bootstrapping on a class-balanced set, and an out-of-sample extra-familial validation set. Surprisingly, we showed that secondary structural elements and N-glycosylation features were sufficient for model generation. The ability to rapidly identify viral attachment machinery directly from sequence data holds the potential to accelerate the design of medical countermeasures for future pandemics. Furthermore, this approach may be extendable for the identification of other potential viral targets and for viral sequence annotation in general in the future.
2
Yan TC, Yue ZX, Xu HQ, Liu YH, Hong YF, Chen GX, Tao L, Xie T. A systematic review of state-of-the-art strategies for machine learning-based protein function prediction. Comput Biol Med 2023; 154:106446. [PMID: 36680931 DOI: 10.1016/j.compbiomed.2022.106446] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2022] [Revised: 12/07/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
New drug discovery is inseparable from the discovery of drug targets, and the vast majority of the known targets are proteins. At the same time, proteins are essential structural and functional elements of living cells necessary for the maintenance of all forms of life. Therefore, protein functions have become the focus of many pharmacological and biological studies. Traditional experimental techniques are no longer adequate for rapidly growing annotation of protein sequences, and approaches to protein function prediction using computational methods have emerged and flourished. A significant trend has been to use machine learning to achieve this goal. In this review, approaches to protein function prediction based on the sequence, structure, protein-protein interaction (PPI) networks, and fusion of multi-information sources are discussed. The current status of research on protein function prediction using machine learning is considered, and existing challenges and prominent breakthroughs are discussed to provide ideas and methods for future studies.
Affiliation(s)
- Tian-Ci Yan
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Zi-Xuan Yue
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Quan Xu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Gong-Xing Chen
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian Xie
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
3
Sahni G, Mewara B, Lalwani S, Kumar R. CF-PPI: Centroid based new feature extraction approach for Protein-Protein Interaction Prediction. J EXP THEOR ARTIF IN 2022. [DOI: 10.1080/0952813x.2022.2052189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Gunjan Sahni
  - Department of Computer Science and Engineering, Career Point University, Kota, India
- Bhawna Mewara
  - Department of Computer Science and Engineering, Career Point University, Kota, India
- Soniya Lalwani
  - Department of Mathematics, Career Point University, Kota, India
- Rajesh Kumar
  - Department of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, India
4
Gao J, Zheng S, Yao M, Wu P. Precise estimation of residue relative solvent accessible area from Cα atom distance matrix using a deep learning method. Bioinformatics 2021; 38:94-98. [PMID: 34450651 DOI: 10.1093/bioinformatics/btab616] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Revised: 08/12/2021] [Accepted: 08/24/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION The solvent accessible surface is an essential structural property measure related to protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure to describe the degree of residue exposure on the protein surface or inside the protein. However, this computation will fail when the residue information is missing. RESULTS In this article, we proposed a novel method for estimating RSA using the Cα atom distance matrix with a deep learning method (EAGERER). The new method, EAGERER, achieves Pearson correlation coefficients of 0.921-0.928 on two independent test datasets. We empirically demonstrate that EAGERER can yield better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER represents the first method to estimate the solvent accessible area using limited information with a deep learning model. It could be useful for protein structure and protein function prediction. AVAILABILITY AND IMPLEMENTATION The method is freely available at https://github.com/cliffgao/EAGERER. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
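The Pearson correlation coefficients quoted in this abstract measure the linear agreement between estimated and directly computed RSA values. A minimal sketch of that statistic follows, assuming plain Python lists of per-residue values; the function name `pearson_r` and the sample numbers are illustrative and are not taken from EAGERER or its datasets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-residue RSA values (fractions between 0 and 1).
estimated_rsa = [0.10, 0.40, 0.35, 0.80]
reference_rsa = [0.12, 0.38, 0.30, 0.85]
print(round(pearson_r(estimated_rsa, reference_rsa), 3))
```

A coefficient near 1 indicates that the estimator ranks and scales residue exposure much as the direct computation does; it does not by itself show that the absolute values agree.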
Affiliation(s)
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Shuangjia Zheng
  - School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, China
- Mengting Yao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Peikun Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
5
Bhasin M, Varadarajan R. Prediction of Function Determining and Buried Residues Through Analysis of Saturation Mutagenesis Datasets. Front Mol Biosci 2021; 8:635425. [PMID: 33778004 PMCID: PMC7991590 DOI: 10.3389/fmolb.2021.635425] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/30/2020] [Accepted: 01/25/2021] [Indexed: 11/13/2022] Open
Abstract
Mutational scanning can be used to probe effects of large numbers of point mutations on protein function. Positions affected by mutation are primarily either buried residues or exposed residues directly involved in function, hereafter designated as active-site residues. In the absence of prior structural information, it has not been easy to distinguish between these two categories of residues. We curated and analyzed a set of twelve published deep mutational scanning datasets. The analysis revealed differential patterns of mutational sensitivity and substitution preferences at buried and exposed positions. Prediction of buried sites solely from the mutational sensitivity data was facilitated by incorporating predicted sequence-based accessibility values. For active-site residues we observed mean sensitivity, specificity and accuracy of 61, 90 and 88% respectively. For buried residues the corresponding figures were 59, 90 and 84% while for exposed non active-site residues these were 98, 44 and 82% respectively. We also identified positions which did not follow these general trends and might require further experimental re-validation. This analysis highlights the ability of deep mutational scans to provide important structural and functional insights, even in the absence of three-dimensional structures determined using conventional structure determination techniques, and we also discuss some limitations of the methodology.
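The sensitivity, specificity and accuracy figures in this abstract follow the usual confusion-matrix definitions. The sketch below shows how the three values relate, assuming binary per-residue labels; the function name and the counts are hypothetical and are not the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn count true members of the class (e.g. active-site residues) that
    were predicted correctly/incorrectly; tn/fp do the same for non-members.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a rare class: 10 positives among 60 residues.
sens, spec, acc = classification_metrics(tp=6, fp=5, tn=45, fn=4)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

When the class of interest is rare, accuracy is dominated by the negatives, which is one way a moderate sensitivity can coexist with a high accuracy.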
Affiliation(s)
- Munmun Bhasin
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Raghavan Varadarajan
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
  - Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
6
Zohra Smaili F, Tian S, Roy A, Alazmi M, Arold ST, Mukherjee S, Scott Hefty P, Chen W, Gao X. QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:998-1011. [PMID: 33631427 PMCID: PMC9403031 DOI: 10.1016/j.gpb.2021.02.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/11/2018] [Revised: 04/03/2019] [Accepted: 05/17/2019] [Indexed: 11/25/2022]
Abstract
The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for the understanding of how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred by protein–protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
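The abstract states that the three information sources are combined by consensus averaging but gives no formula. The sketch below shows one simple reading of that idea, under loudly stated assumptions: each source yields a confidence score per functional term, all sources are weighted equally, and a term missing from a source counts as 0. The function name and the scores are hypothetical and do not reproduce QAUST's actual procedure.

```python
def consensus_average(score_sources):
    """Average per-term scores across sources (equal weights assumed).

    score_sources: list of dicts mapping a term ID to a score in [0, 1].
    A term absent from a source contributes 0 for that source (assumption).
    """
    terms = set().union(*score_sources)
    n = len(score_sources)
    return {t: sum(src.get(t, 0.0) for src in score_sources) / n for t in terms}


# Hypothetical scores from structure, interaction and motif evidence.
structure = {"GO:0003824": 0.9, "GO:0005515": 0.3}
interaction = {"GO:0005515": 0.6}
motif = {"GO:0003824": 0.6, "GO:0005515": 0.3}

combined = consensus_average([structure, interaction, motif])
for term, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 2))
```

Treating a missing term as 0 penalizes functions supported by only one source; a different convention, such as averaging only over sources that report the term, would rank predictions differently.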
Affiliation(s)
- Fatima Zohra Smaili
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Shuye Tian
  - Department of Biology, Southern University of Science and Technology of China (SUSTC), Shenzhen 518055, China
- Ambrish Roy
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Meshari Alazmi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
  - College of Computer Science and Engineering, University of Hail, Hail 55476, Saudi Arabia
- Stefan T Arold
  - Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Srayanta Mukherjee
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- P Scott Hefty
  - Department of Molecular Bioscience, University of Kansas, Lawrence, KS 66047, USA
- Wei Chen
  - Department of Biology, Southern University of Science and Technology of China (SUSTC), Shenzhen 518055, China
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
7
Chen G, Seukep AJ, Guo M. Recent Advances in Molecular Docking for the Research and Discovery of Potential Marine Drugs. Mar Drugs 2020; 18:md18110545. [PMID: 33143025 PMCID: PMC7692358 DOI: 10.3390/md18110545] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 09/16/2020] [Revised: 10/27/2020] [Accepted: 10/28/2020] [Indexed: 12/28/2022] Open
Abstract
Marine drugs have long been used and exhibit unique advantages in clinical practice. Among the marine drugs that have been approved by the Food and Drug Administration (FDA), protein–ligand interactions, such as the cytarabine–DNA polymerase, vidarabine–adenylyl cyclase, and eribulin–tubulin complexes, are important mechanisms of action for their efficacy. However, the complex and multi-targeted components in marine medicinal resources, their bio-active chemical basis, and their mechanisms of action have so far posed huge challenges in the discovery and development of marine drugs, and need to be systematically investigated in depth. Molecular docking can effectively predict the binding mode and binding energy of protein–ligand complexes and has become a major method of computer-aided drug design (CADD); hence this powerful tool has been widely used in many aspects of the research on marine drugs. This review introduces the basic principles and software of molecular docking and further summarizes the applications of this method in marine drug discovery and design, including early virtual screening in the drug discovery stage, drug target discovery, potential mechanisms of action, and the prediction of drug metabolism. In addition, this review discusses the open problems and prospects of molecular docking, in order to provide a stronger theoretical basis for clinical practice and for new marine drug research and development.
Affiliation(s)
- Guilin Chen
  - Key Laboratory of Plant Germplasm Enhancement & Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China
  - Innovation Academy for Drug Discovery and Development, Chinese Academy of Sciences, Shanghai 201203, China
- Armel Jackson Seukep
  - Key Laboratory of Plant Germplasm Enhancement & Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China
  - Innovation Academy for Drug Discovery and Development, Chinese Academy of Sciences, Shanghai 201203, China
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Buea, P.O. Box 63 Buea, Cameroon
- Mingquan Guo
  - Key Laboratory of Plant Germplasm Enhancement & Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
  - Sino-Africa Joint Research Center, Chinese Academy of Sciences, Wuhan 430074, China
  - Innovation Academy for Drug Discovery and Development, Chinese Academy of Sciences, Shanghai 201203, China
  - Correspondence: Tel.: +86-27-8770-0850
8
Gress A, Kalinina OV. SphereCon-a method for precise estimation of residue relative solvent accessible area from limited structural information. Bioinformatics 2020; 36:3372-3378. [PMID: 32154837 DOI: 10.1093/bioinformatics/btaa159] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/15/2019] [Revised: 02/28/2020] [Accepted: 03/04/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations, their pathogenicity and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved. RESULTS We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates. AVAILABILITY AND IMPLEMENTATION https://github.com/kalininalab/spherecon. CONTACT alexander.gress@helmholtz-hips.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexander Gress
  - Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken 66123, Germany
  - Graduate School of Computer Science, Saarland University, Saarbrücken 66123, Germany
- Olga V Kalinina
  - Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken 66123, Germany
  - Medical Faculty, Saarland University, Homburg 66421, Germany
9
An Ensemble Classifier with Random Projection for Predicting Protein–Protein Interactions Using Sequence and Evolutionary Information. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8010089] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 01/31/2023]
10
Du Y, Wu NC, Jiang L, Zhang T, Gong D, Shu S, Wu TT, Sun R. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis. mBio 2016; 7:e01801-16. [PMID: 27803181 PMCID: PMC5090041 DOI: 10.1128/mbio.01801-16] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/27/2016] [Accepted: 10/07/2016] [Indexed: 11/28/2022] Open
Abstract
Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. IMPORTANCE To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available.
Affiliation(s)
- Yushen Du
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
  - Cancer Institute, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, ZJU-UCLA Joint Center for Medical Education and Research, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Nicholas C Wu
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
  - Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, USA
- Lin Jiang
  - Department of Neurology, University of California Los Angeles, Los Angeles, California, USA
- Tianhao Zhang
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
  - Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, USA
- Danyang Gong
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
- Sara Shu
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
- Ting-Ting Wu
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
- Ren Sun
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, Los Angeles, California, USA
  - Cancer Institute, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, ZJU-UCLA Joint Center for Medical Education and Research, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
  - Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, USA
11
Hu J, Li J, Chen N, Zhang X. Conservation of hot regions in protein–protein interaction in evolution. Methods 2016; 110:73-80. [DOI: 10.1016/j.ymeth.2016.06.020] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/13/2016] [Revised: 06/08/2016] [Accepted: 06/21/2016] [Indexed: 11/28/2022] Open
12
Integrating Perspectives on Animal Venom Diversity: An Introduction to the Symposium. Integr Comp Biol 2016; 56:934-937. [DOI: 10.1093/icb/icw112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/28/2022] Open
13
Isaac AE, Sinha S. Analysis of core-periphery organization in protein contact networks reveals groups of structurally and functionally critical residues. J Biosci 2015; 40:683-99. [PMID: 26564971 DOI: 10.1007/s12038-015-9554-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022]
Abstract
The representation of proteins as networks of interacting amino acids, referred to as protein contact networks (PCN), and their subsequent analyses using graph theoretic tools, can provide novel insights into the key functional roles of specific groups of residues. We have characterized the networks corresponding to the native states of 66 proteins (belonging to different families) in terms of their core-periphery organization. The resulting hierarchical classification of the amino acid constituents of a protein arranges the residues into successive layers - having higher core order - with increasing connection density, ranging from a sparsely linked periphery to a densely intra-connected core (distinct from the earlier concept of protein core defined in terms of the three-dimensional geometry of the native state, which has least solvent accessibility). Our results show that residues in the inner cores are more conserved than those at the periphery. Underlining the functional importance of the network core, we see that the receptor sites for known ligand molecules of most proteins occur in the innermost core. Furthermore, the association of residues with structural pockets and cavities in binding or active sites increases with the core order. From mutation sensitivity analysis, we show that the probability of deleterious or intolerant mutations also increases with the core order. We also show that stabilization centre residues are in the innermost cores, suggesting that the network core is critically important in maintaining the structural stability of the protein. A publicly available Web resource for performing core-periphery analysis of any protein whose native state is known has been made available by us at http://www.imsc.res.in/~sitabhra/proteinKcore/index.html.
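The layering by "core order" described in this abstract resembles the standard k-core decomposition of a graph, in which nodes of lowest remaining degree are peeled away repeatedly. The sketch below implements that standard procedure on a toy contact network; it is an assumption that this matches the authors' definition in detail, and the helper names and the example edges are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from (residue, residue) contact pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def core_numbers(adjacency):
    """Return the k-core index of every node by repeated minimum-degree peeling."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Remove the node with the fewest remaining neighbours.
        node = min(remaining, key=degree.__getitem__)
        # The core index never decreases as peeling moves inward.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Toy network: residues A, B, C are mutually in contact; D touches only A.
contacts = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
print(core_numbers(build_adjacency(contacts)))
```

In this toy case the triangle A, B, C forms the inner core and D sits on the periphery. This simple version rescans for the minimum at every step; a bucket-based implementation would be preferable for large networks.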
Affiliation(s)
- Arnold Emerson Isaac
  - Bioinformatics Division, School of Bio Sciences and Technology, VIT University, Vellore, India
14
Hernández S, Franco L, Calvo A, Ferragut G, Hermoso A, Amela I, Gómez A, Querol E, Cedano J. Bioinformatics and Moonlighting Proteins. Front Bioeng Biotechnol 2015; 3:90. [PMID: 26157797 PMCID: PMC4478894 DOI: 10.3389/fbioe.2015.00090] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 02/03/2015] [Accepted: 06/10/2015] [Indexed: 01/25/2023] Open
Abstract
Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are experimentally revealed by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially because of the large amounts of sequences from genome projects. In the present work, we analyze and describe several approaches that use sequences, structures, interactomics, and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein–protein interaction databases (PPIs), (d) matching the query protein sequence to 3D databases (i.e., algorithms such as PISITE), and (e) mutation correlation analysis between amino acids by algorithms such as MISTIC. Programs designed to identify functional motifs/domains detect mainly the canonical function but usually fail in the detection of the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations – it requires the existence of multialigned family protein sequences – but can suggest how the evolutionary process of second function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses.
Affiliation(s)
- Sergio Hernández
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Luís Franco
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Alejandra Calvo
  - Laboratorio de Inmunología, Universidad de la República Regional Norte-Salto, Salto, Uruguay
- Gabriela Ferragut
  - Laboratorio de Inmunología, Universidad de la República Regional Norte-Salto, Salto, Uruguay
- Antoni Hermoso
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Isaac Amela
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Antonio Gómez
  - Cancer Epigenetics and Biology Program, Institut d'Investigació Biomèdica de Bellvitge, L'Hospitalet de Llobregat, Barcelona, Spain
- Enrique Querol
  - Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Juan Cedano
  - Laboratorio de Inmunología, Universidad de la República Regional Norte-Salto, Salto, Uruguay
15
|
Pradeepkiran JA, Sainath SB, Kumar KK, Bhaskar M. Complete genome-wide screening and subtractive genomic approach revealed new virulence factors, potential drug targets against bio-war pathogen Brucella melitensis 16M. DRUG DESIGN DEVELOPMENT AND THERAPY 2015; 9:1691-706. [PMID: 25834405 PMCID: PMC4371898 DOI: 10.2147/dddt.s76948] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]
Abstract
Brucella melitensis 16M is a Gram-negative coccobacillus that infects both animals and humans. It causes a disease known as brucellosis, which is characterized by acute febrile illness in humans and causes abortions in livestock. To prevent and control brucellosis, identification of putative drug targets is crucial. The present study aimed to identify drug targets in B. melitensis 16M by using a subtractive genomic approach. We used available database repositories (Database of Essential Genes, Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server, and Kyoto Encyclopedia of Genes and Genomes) to identify putative genes that are nonhomologous to human genes and essential for the pathogen B. melitensis 16M. The results revealed that, within the pathogen's 3 Mb genome, 53 putative characterized and 13 uncharacterized hypothetical genes were identified; further, from Basic Local Alignment Search Tool protein analysis, one hypothetical protein showed a close resemblance (50%) to Silicibacter pomeroyi DUF1285 family protein (2RE3). A homology model of the target was then constructed using MODELLER 9.12 and optimized through the variable target function method, with molecular dynamics optimization by simulated annealing. The stereochemical quality of the restrained model was evaluated by the PROCHECK, VERIFY-3D, ERRAT, and WHATIF servers. Furthermore, structure-based virtual screening was carried out against the predicted active site of the respective protein using glycerol structural analogs from the PubChem database. We identified the five best inhibitors, with strong affinities, stable interactions, and reliable drug-like properties. Hence, these leads might be used as effective inhibitors of the modeled protein. The outcome of the present work of virtual screening of putative gene targets might facilitate the design of potential drugs for better treatment of brucellosis.
Affiliation(s)
- Sri Bhashyam Sainath
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, Porto, Portugal; Department of Biotechnology, Vikrama Simhapuri University, Nellore, Andhra Pradesh, India
- Konidala Kranthi Kumar
- Division of Animal Biotechnology, Department of Zoology, Sri Venkateswara University, Tirupati, India
- Matcha Bhaskar
- Division of Animal Biotechnology, Department of Zoology, Sri Venkateswara University, Tirupati, India
|
16
|
Lua RC, Marciano DC, Katsonis P, Adikesavan AK, Wilkins AD, Lichtarge O. Prediction and redesign of protein-protein interactions. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2014; 116:194-202. [PMID: 24878423 DOI: 10.1016/j.pbiomolbio.2014.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/25/2014] [Revised: 05/02/2014] [Accepted: 05/17/2014] [Indexed: 12/14/2022]
Abstract
Understanding the molecular basis of protein function remains a central goal of biology, with the hope of elucidating the role of human genes in health and in disease and of rationally designing therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function: the functions mediated by protein-protein interactions (PPIs). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is one of a group of analytical approaches, based on the successes and failures of evolution, that enable the rational control of PPIs.
Affiliation(s)
- Rhonald C Lua
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- David C Marciano
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Anbu K Adikesavan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
|
17
|
Zhao H, Wang J, Zhou Y, Yang Y. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome. PLoS One 2014; 9:e96694. [PMID: 24792350 PMCID: PMC4008587 DOI: 10.1371/journal.pone.0096694] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2014] [Accepted: 04/10/2014] [Indexed: 12/25/2022] Open
Abstract
As more and more protein sequences are uncovered by increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than the low-resolution two-state prediction of DNA binding that most existing techniques provide. The method first predicts the protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to the human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in apo forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
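As a rough consistency check on the figures quoted in this abstract, an approximate confusion matrix can be rebuilt from the reported class sizes (179 binding, 3797 non-binding domains), sensitivity (65%), and precision (94%), and the MCC recomputed from it. The function name and the rounding of counts to whole domains are assumptions made for illustration; this is a sketch, not the authors' evaluation code.

```python
import math

def mcc_from_rates(n_pos, n_neg, sensitivity, precision):
    """Rebuild an approximate confusion matrix and return (tp, fp, tn, fn, mcc)."""
    tp = round(sensitivity * n_pos)               # sensitivity = tp / (tp + fn)
    fn = n_pos - tp
    fp = round(tp * (1 - precision) / precision)  # precision = tp / (tp + fp)
    tn = n_neg - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return tp, fp, tn, fn, mcc

# Values taken from the abstract; counts are rounded, so the result is approximate.
tp, fp, tn, fn, mcc = mcc_from_rates(179, 3797, 0.65, 0.94)
print(tp, fp, tn, fn, round(mcc, 2))
```

Under these assumptions the recomputed MCC lands close to the reported 0.77, which suggests the three quoted metrics are mutually consistent.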
Affiliation(s)
- Huiying Zhao
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Jihua Wang
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China
- Yaoqi Zhou
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China
- Institute for Glycomics and School of Information and Communication Technique, Griffith University, Southport, Queensland, Australia
- * E-mail: (YZ); (YY)
- Yuedong Yang
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Institute for Glycomics and School of Information and Communication Technique, Griffith University, Southport, Queensland, Australia
- * E-mail: (YZ); (YY)
|
18
|
Chen YH, Chiang YH, Ma HI. Analysis of spatial and temporal protein expression in the cerebral cortex after ischemia-reperfusion injury. J Clin Neurol 2014; 10:84-93. [PMID: 24829593 PMCID: PMC4017024 DOI: 10.3988/jcn.2014.10.2.84] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2013] [Revised: 09/24/2013] [Accepted: 09/26/2013] [Indexed: 01/26/2023] Open
Abstract
Background and Purpose: Hypoxia, or ischemia, is a common cause of neurological deficits in the elderly. This study elucidated the mechanisms underlying ischemia-induced brain injury that results in neurological sequelae. Methods: Cerebral ischemia was induced in male Sprague-Dawley rats by transient ligation of the left carotid artery followed by 60 min of hypoxia. A two-dimensional differential proteome analysis was performed using matrix-assisted laser desorption ionization-time-of-flight mass spectrometry to compare changes in protein expression on the lesioned side of the cortex relative to that on the contralateral side at 0, 6, and 24 h after ischemia. Results: The expressions of the following five proteins were up-regulated in the ipsilateral cortex at 24 h after ischemia-reperfusion injury compared to the contralateral (i.e., control) side: aconitase 2, neurotensin-related peptide, hypothetical protein XP-212759, 60-kDa heat-shock protein, and aldolase A. The expression of one protein, dynamin-1, was up-regulated only at the 6-h time point. The level of 78-kDa glucose-regulated protein precursor on the lesioned side of the cerebral cortex was found to be high initially, but then down-regulated by 24 h after the induction of ischemia-reperfusion injury. The expressions of several metabolic enzymes and translational factors were also perturbed soon after brain ischemia. Conclusions: These findings provide insights into the mechanisms underlying the neurodegenerative events that occur following cerebral ischemia.
Affiliation(s)
- Yuan-Hao Chen
- Department of Neurological Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan, ROC
- Yung-Hsiao Chiang
- Section of Neurosurgery, Department of Surgery, Taipei Medical University Hospital, Taipei Medical University, Taipei, Taiwan, ROC
- Hsin-I Ma
- Department of Neurological Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan, ROC
|
19
|
Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014; 16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
20
|
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022] Open
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Affiliation(s)
- Wataru Nemoto
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Akira Saito
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Hayato Oikawa
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
|
21
|
Dukka BK. Structure-based Methods for Computational Protein Functional Site Prediction. Comput Struct Biotechnol J 2013; 8:e201308005. [PMID: 24688745 PMCID: PMC3962076 DOI: 10.5936/csbj.201308005] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2013] [Revised: 11/07/2013] [Accepted: 11/11/2013] [Indexed: 11/22/2022] Open
Abstract
Due to the advent of high-throughput sequencing techniques and structural genomics projects, the number of gene and protein sequences has been ever increasing. Computational methods to annotate these genes and proteins are therefore even more indispensable. Proteins are important macromolecules, and the study of protein function is an important problem in structural bioinformatics. This paper discusses a number of methods to predict protein functional sites, focusing especially on protein ligand binding site prediction. Initially, a short overview is presented of recent advances in methods for the selection of homologous sequences. Furthermore, a few recent structure-based approaches and sequence-and-structure-based approaches for predicting protein functional sites are discussed in detail.
Affiliation(s)
- B Kc Dukka
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, 27411, USA
|
22
|
Wong GY, Leung FHF, Ling SH. Predicting protein-ligand binding site using support vector machine with protein properties. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:1517-1529. [PMID: 24407309 DOI: 10.1109/tcbb.2013.126] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Identification of protein-ligand binding sites is an important task in structure-based drug design and docking algorithms. In the past two decades, different approaches have been developed to predict binding sites, such as geometric, energetic, and sequence-based methods. When scores are calculated from these methods, the classification algorithm becomes very important and can greatly affect the prediction results. In this paper, the support vector machine (SVM) is used to cluster the pockets that are most likely to bind ligands, using as attributes geometric characteristics, interaction potential, offset from protein, conservation score, and properties surrounding the pockets. Our approach is compared to LIGSITE, LIGSITE(CSC), SURFNET, Fpocket, PocketFinder, Q-SiteFinder, ConCavity, and MetaPocket on the data set LigASite and 198 drug-target protein complexes. The results show that our approach improves the success rate from 60 to 80 percent by the AUC measure and from 61 to 66 percent for top-1 prediction. Our method also provides more comprehensive results than the others.
Affiliation(s)
- S H Ling
- University of Technology Sydney, Sydney
|
23
|
Wilkins AD, Venner E, Marciano DC, Erdin S, Atri B, Lua RC, Lichtarge O. Accounting for epistatic interactions improves the functional analysis of protein structures. Bioinformatics 2013; 29:2714-21. [PMID: 24021383 PMCID: PMC3799481 DOI: 10.1093/bioinformatics/btt489] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Motivation: The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. Methods and Results: We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure, which we term pair-interaction Evolutionary Trace, yields greater functional site overlap and better structure-based proteome-wide functional predictions. Conclusions: Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Contact: lichtarge@bcm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Angela D Wilkins
- Department of Molecular and Human Genetics, CIBR Center for Computational and Integrative Biomedical Research and Program in Structural and Computational Biology & Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030 and Center for Human Genetic Research, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
|
24
|
Murakami Y, Kinoshita K, Kinjo AR, Nakamura H. Exhaustive comparison and classification of ligand-binding surfaces in proteins. Protein Sci 2013; 22:1379-91. [PMID: 23934772 PMCID: PMC3795496 DOI: 10.1002/pro.2329] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2013] [Revised: 07/29/2013] [Accepted: 08/05/2013] [Indexed: 12/03/2022]
Abstract
Many proteins function by interacting with small molecules (ligands). Identification of ligand-binding sites (LBSs) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed in order to understand their relationships and to classify their structural motifs. However, a similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, owing to its computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand-binding protein sequences and functions. Consequently, we classified the patches into ∼2000 well-characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross-fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes.
Affiliation(s)
- Yoichi Murakami
- Graduate School of Information Sciences, Tohoku University, 6-3-09 Aramaki-aza-aoba, Aoba-ku, Sendai, Miyagi, 982-0036, Japan
|
25
|
Zhang Z, Lange OF. Replica exchange improves sampling in low-resolution docking stage of RosettaDock. PLoS One 2013; 8:e72096. [PMID: 24009670 PMCID: PMC3756964 DOI: 10.1371/journal.pone.0072096] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2013] [Accepted: 07/10/2013] [Indexed: 11/18/2022] Open
Abstract
Many protein-protein docking protocols are based on a shotgun approach, in which thousands of independent random-start trajectories minimize the rigid-body degrees of freedom. Another strategy is enumerative sampling, as used in ZDOCK. Here, we introduce an alternative strategy, ReplicaDock, using a small number of long trajectories of temperature replica exchange. We compare replica exchange sampling as the low-resolution stage of RosettaDock with RosettaDock's original shotgun sampling as well as with ZDOCK. A benchmark of 30 complexes starting from structures of the unbound binding partners shows improved performance for ReplicaDock and ZDOCK when compared to shotgun sampling at equal or less computational expense. ReplicaDock and ZDOCK consistently reach lower energies and generate significantly more near-native conformations than shotgun sampling. Accordingly, they both improve typical metrics of prediction quality of complex structures after refinement. Additionally, the refined ReplicaDock ensembles reach significantly lower interface energies, and many previously hidden features of the docking energy landscape become visible when ReplicaDock is applied.
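The temperature replica exchange mentioned in this abstract relies on a standard Metropolis criterion for swapping configurations between two temperatures. The sketch below shows only that generic criterion; the function names, the use of unit Boltzmann constant (energies in arbitrary score units), and the example values are assumptions for illustration and are not taken from ReplicaDock itself.

```python
import math
import random

def swap_probability(energy_i, temp_i, energy_j, temp_j, k_b=1.0):
    """Metropolis acceptance probability for exchanging replicas i and j."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    # P = min(1, exp((beta_i - beta_j) * (E_i - E_j)))
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)

def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)

# The cold replica (T=1) holds the higher energy, so the swap is always accepted.
print(swap_probability(-10.0, 1.0, -20.0, 2.0))
```

Swaps that move a lower-energy configuration to the colder replica are always accepted, while the reverse move is accepted only with exponentially decreasing probability; this is what lets a few long trajectories cross barriers at high temperature while still refining at low temperature.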
Affiliation(s)
- Zhe Zhang
- Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Garching, Germany
- Oliver F. Lange
- Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Garching, Germany
- Institute of Structural Biology, Helmholtz Zentrum München, Neuherberg, Germany
- * E-mail:
|
26
|
Skolnick J, Zhou H, Gao M. Are predicted protein structures of any value for binding site prediction and virtual ligand screening? Curr Opin Struct Biol 2013; 23:191-7. [PMID: 23415854 DOI: 10.1016/j.sbi.2013.01.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2012] [Revised: 01/04/2013] [Accepted: 01/23/2013] [Indexed: 01/03/2023]
Abstract
The recently developed field of ligand homology modeling (LHM) that extends the ideas of protein homology modeling to the prediction of ligand binding sites and for use in virtual ligand screening has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little if any diminution in performance; thereby enabling ≈ 75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA.
|
27
|
Ma X, Guo J, Liu HD, Xie JM, Sun X. Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1766-1775. [PMID: 22868682 DOI: 10.1109/tcbb.2012.106] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
The recognition of DNA-binding residues in proteins is critical to our understanding of the mechanisms of DNA-protein interactions and gene expression, and for guiding drug design. Therefore, a prediction method, DNABR (DNA Binding Residues), is proposed for predicting DNA-binding residues in protein sequences using the random forest (RF) classifier with sequence-based features. Two types of novel sequence features are proposed in this study, which reflect information about the conservation of physicochemical properties of the amino acids and the correlation of amino acids between different sequence positions in terms of physicochemical properties. The first type of feature uses evolutionary information combined with the conservation of physicochemical properties of the amino acids, while the second reflects the dependency effect of amino acids with regard to polarity charge and hydrophobic properties in the protein sequences. These two features and an orthogonal binary vector that reflects the characteristics of the 20 types of amino acids are used to build DNABR, a model to predict DNA-binding residues in proteins. The DNABR model achieves a value of 0.6586 for the Matthews correlation coefficient (MCC) and 93.04 percent overall accuracy (ACC), with 68.47 percent sensitivity (SE) and 98.16 percent specificity (SP). Comparisons with each feature demonstrate that these two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches clearly show that DNABR has excellent prediction performance for detecting binding residues in putative DNA-binding proteins. The DNABR web-server system is freely available at http://www.cbi.seu.edu.cn/DNABR/.
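The "orthogonal binary vector" in this abstract is what is commonly called one-hot encoding of the 20 amino acids. A minimal sketch of such an encoding follows; the sliding window around the target residue, its width, the zero-padding at sequence ends, and all names are assumptions added for illustration, since the abstract does not describe how DNABR assembles its input vector.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def one_hot(residue):
    """20-dimensional orthogonal binary vector; all zeros for unknown or padding."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]

def window_features(sequence, center, half_width=2):
    """Concatenate one-hot vectors for a window centered on position `center`."""
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        # Positions beyond either end of the sequence are zero-padded.
        residue = sequence[pos] if 0 <= pos < len(sequence) else "-"
        features.extend(one_hot(residue))
    return features

# Hypothetical five-residue sequence; the window at position 0 overhangs the start.
vec = window_features("MKRAV", center=0, half_width=2)
print(len(vec), sum(vec))
```

A vector of this kind carries no physicochemical information by itself, which is why the abstract pairs it with conservation and correlation features before passing everything to the random forest.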
Affiliation(s)
- Xin Ma
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University and Nanjing Audit University, Nanjing, P.R. China.
|
28
|
Bhardwaj N, Langlois R, Zhao G, Lu H. Structure Based Prediction of Binding Residues on DNA-binding Proteins. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2012; 2005:2611-4. [PMID: 17282773 DOI: 10.1109/iembs.2005.1617004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Annotation of the functional sites on the surface of a protein has been the subject of many studies. In this regard, the search for attributes and features characterizing these sites is of prime consequence. Here, we present an implementation of a kernel-based machine learning protocol for identifying the residues on a DNA-binding protein that form the interface with the DNA. Sequence and structural features including solvent accessibility, local composition, net charge and electrostatic potentials are examined. These features are then fed into Support Vector Machines (SVM) to predict the DNA-binding residues on the surface of the protein. In order to compare with published work, we predict binding residues by training on other binding and non-binding residues in the same protein, for which we achieved an accuracy of 79%. The sensitivity and specificity are 59% and 89%. We also consider a more realistic approach, predicting the binding residues of proteins entirely withheld from the training set, achieving values of 66%, 43% and 81%, respectively. Performances reported here are better than other published results. Moreover, since our protocol does not lean on sequence or structural homology, it can be used to annotate unclassified proteins and, more generally, to identify novel binding sites with no similarity to the known cases.
Affiliation(s)
- Nitin Bhardwaj
- Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
|
29
|
Arnold Emerson I, Gothandam KM. Residue centrality in alpha helical polytopic transmembrane protein structures. J Theor Biol 2012; 309:78-87. [PMID: 22721996 DOI: 10.1016/j.jtbi.2012.06.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2011] [Revised: 04/16/2012] [Accepted: 06/04/2012] [Indexed: 10/28/2022]
Abstract
Transmembrane proteins serve as receptors, transporters or enzymes. They mediate a broad range of fundamental cellular activities including signal transduction, cell trafficking and photosynthesis. In this study, we analyzed the significance of central residues in polytopic transmembrane proteins. Each protein is represented as an undirected graph in which residues are the nodes and inter-residue interactions are the edges. Residue centrality was calculated by removing each node and its corresponding edges from the protein contact network. Results revealed that 80% of the predicted central residues had normalized conservation values below the mean, since they were slowly evolving conserved sites. We also found that 56% of amino acids were interacting with ligand molecules and metal ions. Predicted central residues in the polytopic transmembrane proteins were found to account for 84% of binding and active site amino acids. From mutation sensitivity analysis, it was observed that 89% of central residues had deleterious mutations whose probabilities were greater than their mean value. Interestingly, we find that the z-score values of each amino acid positively correlate with the conservation scores and also with the degrees of each node. Results show that 87% of central residues are hub residues.
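The node-removal idea in this abstract can be sketched on a toy contact network. The abstract does not name the quantity that is recomputed after each removal, so the choice here (change in average shortest-path length, turned into a z-score) is an assumption, as are all function names and the seven-node example graph; it is not the authors' implementation.

```python
from collections import deque
from statistics import mean, pstdev

def avg_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def without(graph, removed):
    """Copy of the graph with one node and all of its edges deleted."""
    return {n: nbs - {removed} for n, nbs in graph.items() if n != removed}

def centrality_z_scores(graph):
    """z-score of the path-length change caused by deleting each node."""
    base = avg_path_length(graph)
    delta = {n: abs(avg_path_length(without(graph, n)) - base) for n in graph}
    mu, sigma = mean(delta.values()), pstdev(delta.values())
    return {n: (d - mu) / sigma if sigma else 0.0 for n, d in delta.items()}

# Toy contact network: six "rim" residues in a ring, all touching one "hub".
rim = [f"r{i}" for i in range(6)]
contacts = {"hub": set(rim)}
for i, r in enumerate(rim):
    contacts[r] = {"hub", rim[i - 1], rim[(i + 1) % 6]}

z = centrality_z_scores(contacts)
print(max(z, key=z.get))
```

In this toy network the hub is also the highest-degree node, which mirrors the abstract's observation that central residues tend to be hub residues; on a real protein the edges would come from an inter-residue distance or interaction cutoff.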
Affiliation(s)
- I Arnold Emerson
- School of Bio Sciences and Technology, VIT University, Vellore-632014, Tamil Nadu, India
|
30
|
Structural analysis of hypothetical proteins from Helicobacter pylori: an approach to estimate functions of unknown or hypothetical proteins. Int J Mol Sci 2012; 13:7109-7137. [PMID: 22837682 PMCID: PMC3397514 DOI: 10.3390/ijms13067109] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2012] [Revised: 05/29/2012] [Accepted: 06/01/2012] [Indexed: 12/12/2022] Open
Abstract
Helicobacter pylori (H. pylori) has a unique ability to survive in extreme acidic environments and to colonize the gastric mucosa. It can cause diverse gastric diseases such as peptic ulcers, chronic gastritis, mucosa-associated lymphoid tissue (MALT) lymphoma, gastric cancer, etc. Based on genomic research of H. pylori, over 1600 genes have been functionally identified so far. However, H. pylori possesses some genes that are uncharacterized since: (i) the gene sequences are quite new; (ii) the functions of the genes have not been characterized in any other bacterial systems; and (iii) sometimes, the protein that is classified into a known protein based on the sequence homology shows some functional ambiguity, which raises questions about the function of the protein produced in H. pylori. Thus, there are still a lot of genes to be biologically or biochemically characterized to understand the whole picture of gene functions in the bacteria. In this regard, knowledge on the 3D structure of a protein, especially an unknown or hypothetical protein, is frequently useful to elucidate the structure-function relationship of the uncharacterized gene product. That is, a structural comparison with known proteins provides valuable information to help predict the cellular functions of hypothetical proteins. Here, we show the 3D structures of some hypothetical proteins determined by NMR spectroscopy and X-ray crystallography as a part of the structural genomics of H. pylori. In addition, we show some successful approaches of elucidating the function of unknown proteins based on their structural information.
|
31
|
Nemoto W, Toh H. Functional region prediction with a set of appropriate homologous sequences--an index for sequence selection by integrating structure and sequence information with spatial statistics. BMC STRUCTURAL BIOLOGY 2012; 12:11. [PMID: 22643026 PMCID: PMC3533907 DOI: 10.1186/1472-6807-12-11] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/03/2011] [Accepted: 04/19/2012] [Indexed: 11/17/2022]
Abstract
Background: The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results: We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions: Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems.
Affiliation(s)
- Wataru Nemoto
- Computational Biology Research Center (CBRC), Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.
|
32
|
Wilkins AD, Bachman BJ, Erdin S, Lichtarge O. The use of evolutionary patterns in protein annotation. Curr Opin Struct Biol 2012; 22:316-25. [PMID: 22633559 DOI: 10.1016/j.sbi.2012.05.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2012] [Accepted: 05/01/2012] [Indexed: 01/13/2023]
Abstract
With genomic data skyrocketing, their biological interpretation remains a serious challenge. Diverse computational methods address this problem by pointing to the existence of recurrent patterns among sequence, structure, and function. These patterns emerge naturally from evolutionary variation, natural selection, and divergence--the defining features of biological systems--and they identify molecular events and shapes that underlie specificity of function and allosteric communication. Here we review these methods, and the patterns they identify in case studies and in proteome-wide applications, to infer and rationally redesign function.
Affiliation(s)
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
33
|
Barrantes-Reynolds R, Wallace SS, Bond JP. Using shifts in amino acid frequency and substitution rate to identify latent structural characters in base-excision repair enzymes. PLoS One 2011; 6:e25246. [PMID: 21998646 PMCID: PMC3188539 DOI: 10.1371/journal.pone.0025246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2010] [Accepted: 08/30/2011] [Indexed: 12/30/2022] Open
Abstract
Protein evolution includes the birth and death of structural motifs. For example, a zinc finger or a salt bridge may be present in some, but not all, members of a protein family. We propose that such transitions are manifest in sequence phylogenies as concerted shifts in substitution rates of amino acids that are neighbors in a representative structure. First, we identified rate shifts in a quartet from the Fpg/Nei family of base excision repair enzymes using a method developed by Xun Gu and coworkers. We found the shifts to be spatially correlated, more precisely, associated with a flexible loop involved in bacterial Fpg substrate specificity. Consistent with our result, sequences and structures provide convincing evidence that this loop plays a very different role in other family members. Second, then, we developed a method for identifying latent protein structural characters (LSC) given a set of homologous sequences based on Gu's method and proximity in a high-resolution structure. Third, we identified LSC and assigned states of LSC to clades within the Fpg/Nei family of base excision repair enzymes. We describe seven LSC; an accompanying Proteopedia page (http://proteopedia.org/wiki/index.php/Fpg_Nei_Protein_Family) describes these in greater detail and facilitates 3D viewing. The LSC we found provided a surprisingly complete picture of the interaction of the protein with the DNA capturing familiar examples, such as a Zn finger, as well as more subtle interactions. Their preponderance is consistent with an important role as phylogenetic characters. Phylogenetic inference based on LSC provided convincing evidence of independent losses of Zn fingers. Structural motifs may serve as important phylogenetic characters and modeling transitions involving structural motifs may provide a much deeper understanding of protein evolution.
Affiliation(s)
- Ramiro Barrantes-Reynolds
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
- Susan S. Wallace
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
- Jeffrey P. Bond
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
|
34
|
Wass MN, David A, Sternberg MJE. Challenges for the prediction of macromolecular interactions. Curr Opin Struct Biol 2011; 21:382-90. [DOI: 10.1016/j.sbi.2011.03.013] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2010] [Revised: 03/04/2011] [Accepted: 03/24/2011] [Indexed: 12/14/2022]
|
35
|
Kc DB, Livesay DR. Topology improves phylogenetic motif functional site predictions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:226-233. [PMID: 21071810 DOI: 10.1109/tcbb.2009.60] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Prediction of protein functional sites from sequence-derived data remains an open bioinformatics problem. We have developed a phylogenetic motif (PM) functional site prediction approach that identifies functional sites from alignment fragments that parallel the evolutionary patterns of the family. In our approach, PMs are identified by comparing tree topologies of each alignment fragment to that of the complete phylogeny. Herein, we bypass the phylogenetic reconstruction step and identify PMs directly from distance matrix comparisons. In order to optimize the new algorithm, we consider three different distance matrices and 13 different matrix similarity scores. We assess the performance of the various approaches on a structurally nonredundant data set that includes three types of functional site definitions. Without exception, the predictive power of the original approach outperforms the distance matrix variants. While the distance matrix methods fail to improve upon the original approach, our results are important because they clearly demonstrate that the improved predictive power is based on the topological comparisons, meaning that phylogenetic trees are a straightforward, yet powerful way to improve functional site prediction accuracy. While complementary studies have shown that topology improves predictions of protein-protein interactions, this report represents the first demonstration that trees improve functional site predictions as well.
Affiliation(s)
- Dukka B Kc
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA.
|
36
|
Prymula K, Jadczyk T, Roterman I. Catalytic residues in hydrolases: analysis of methods designed for ligand-binding site prediction. J Comput Aided Mol Des 2010; 25:117-33. [PMID: 21104192 PMCID: PMC3032897 DOI: 10.1007/s10822-010-9402-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2010] [Accepted: 11/08/2010] [Indexed: 11/26/2022]
Abstract
A comparison of eight tools applicable to ligand-binding site prediction is presented. The methods examined cover three types of approaches: the geometrical (CASTp, PASS, Pocket-Finder), the physicochemical (Q-SiteFinder, FOD) and the knowledge-based (ConSurf, SuMo, WebFEATURE). The accuracy of predictions was measured in reference to the catalytic residues documented in the Catalytic Site Atlas. The test was performed on a set comprising selected chains of hydrolases. The results were analysed with regard to size, polarity, secondary structure and solvent-accessible area of the predicted sites, as well as parameters commonly used in machine learning (F-measure, MCC). The relative accuracies of predictions are presented in the ROC space, allowing determination of the optimal methods by means of the ROC convex hull. Additionally, a minimum expected cost analysis was performed. Both advantages and disadvantages of the eight methods are presented. A characterization of protein chains with respect to the level of difficulty of active site prediction is introduced. The main reasons for failures are discussed. Overall, SuMo offers the best performance, followed by FOD, while Pocket-Finder is the best method among the geometrical approaches.
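Selecting optimal methods by means of the ROC convex hull can be sketched as follows. This is an illustrative Python example only: the (false positive rate, true positive rate) points are invented, the method names are placeholders rather than the eight tools compared in the paper, and `roc_convex_hull` is a hypothetical helper.

```python
def cross(o, a, b):
    """z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of (FPR, TPR) points, from (0, 0) to (1, 1)."""
    # The trivial classifiers (0, 0) and (1, 1) always belong to ROC space.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the previous point while it lies on or below the new segment.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical operating points for three placeholder methods.
methods = {
    "method_a": (0.1, 0.6),
    "method_b": (0.3, 0.5),
    "method_c": (0.4, 0.9),
}
hull = roc_convex_hull(methods.values())
on_hull = sorted(name for name, pt in methods.items() if pt in hull)
print(hull)
print(on_hull)
```

A method whose point lies below the hull (here `method_b`) is never optimal for any combination of class ratio and misclassification cost, which is why only hull members are candidates.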
Affiliation(s)
- Katarzyna Prymula
- Faculty of Chemistry, Jagiellonian University, 3 Ingardena Street, 30-060 Krakow, Poland
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, 7E Kopernika Street, 31-034 Krakow, Poland
- Tomasz Jadczyk
- Department of Electronics, AGH University of Science and Technology, 30 Mickiewicza Avenue, 30-059 Krakow, Poland
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, 16 Lazarza Street, 31-530 Krakow, Poland
|
37
|
Sonavane S, Chakrabarti P. Prediction of active site cleft using support vector machines. J Chem Inf Model 2010; 50:2266-73. [PMID: 21080689 DOI: 10.1021/ci1002922] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Computational tools are available today for the detection and delineation of the clefts and cavities in protein 3D structure and ranking them on the basis of probable binding site clefts. There is a need to improve the ranking of clefts and accuracy of predicting catalytic site clefts. Our results show that the distance of the clefts from protein centroid and sequence entropy of the lining residues, when used in conjunction with the volume, are valuable descriptors for predicting the catalytic site. We have applied the SVM approach for recognizing and ranking the active site clefts and tested its performance using different combinations of attributes. In both the ligand-bound and the unbound forms of structures, our method correctly predicts the active site clefts in 73% of cases at rank one. If we consider the results at rank 3 (i.e., the correct solution is among one of the top three solutions), the correctly predicted cases are 94% and 90% for the bound and the unbound forms of structures, respectively. Our approach improves the ranking of binding site clefts in comparison with CASTp and is comparable to other existing methods like Fpocket. Although the data set for training the SVM approach is rather small in size, the results are encouraging for the method to be used as complementary to other existing tools.
Affiliation(s)
- Shrihari Sonavane
- Department of Biochemistry and Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700 054, India
|
38
|
Volkamer A, Griewel A, Grombacher T, Rarey M. Analyzing the Topology of Active Sites: On the Prediction of Pockets and Subpockets. J Chem Inf Model 2010; 50:2041-52. [DOI: 10.1021/ci100241y] [Citation(s) in RCA: 172] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Andrea Volkamer
- Research Group for Computational Molecular Design, Bundesstr. 43, 20146 Hamburg, Germany, and Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany
- Axel Griewel
- Research Group for Computational Molecular Design, Bundesstr. 43, 20146 Hamburg, Germany, and Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany
- Thomas Grombacher
- Research Group for Computational Molecular Design, Bundesstr. 43, 20146 Hamburg, Germany, and Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany
- Matthias Rarey
- Research Group for Computational Molecular Design, Bundesstr. 43, 20146 Hamburg, Germany, and Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany
|
39
|
Lee T, Min H, Kim SJ, Yoon S. Application of maximin correlation analysis to classifying protein environments for function prediction. Biochem Biophys Res Commun 2010; 400:219-24. [DOI: 10.1016/j.bbrc.2010.08.042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2010] [Accepted: 08/11/2010] [Indexed: 10/19/2022]
|
40
|
Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. Functional neighbors: inferring relationships between nonhomologous protein families using family-specific packing motifs. IEEE Trans Inf Technol Biomed 2010; 14:1137-43. [PMID: 20570776 DOI: 10.1109/titb.2010.2053550] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We describe a new approach for inferring the functional relationships between nonhomologous protein families by looking at statistical enrichment of alternative function predictions in classification hierarchies such as Gene Ontology (GO) and Structural Classification of Proteins (SCOP). Protein structures are represented by robust graph representations, and the fast frequent subgraph mining algorithm is applied to protein families to generate sets of family-specific packing motifs, i.e., amino acid residue-packing patterns shared by most family members but infrequent in other proteins. The function of a protein is inferred by identifying in it motifs characteristic of a known family. We employ these family-specific motifs to elucidate functional relationships between families in the GO and SCOP hierarchies. Specifically, we postulate that two families are functionally related if one family is statistically enriched by motifs characteristic of another family, i.e., if the number of proteins in a family containing a motif from another family is greater than expected by chance. This function-inference method can help annotate proteins of unknown function, establish functional neighbors of existing families, and help specify alternate functions for known proteins.
Affiliation(s)
- Deepak Bandyopadhyay
- Department of Computational and Structural Chemistry, GlaxoSmithKline, Collegeville, PA UP12-210, USA.
|
41
|
Bell RE, Ben-Tal N. In silico identification of functional protein interfaces. Comp Funct Genomics 2003; 4:420-3. [PMID: 18629079 PMCID: PMC2447364 DOI: 10.1002/cfg.309] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2003] [Revised: 06/03/2003] [Accepted: 06/03/2003] [Indexed: 12/02/2022] Open
Abstract
Proteins perform many of their biological roles through protein–protein, protein–DNA or protein–ligand interfaces. The identification of the amino acids comprising these interfaces often enhances our understanding of the biological function of the proteins. Many methods for the detection of functional interfaces have been developed, and large-scale analyses have provided assessments of their accuracy. Among them are those that consider the size of the protein interface, its amino acid composition and its physicochemical and geometrical properties. Other methods to this effect use statistical potential functions of pairwise interactions, and evolutionary information. The rationale of the evolutionary approach is that functional and structural constraints impose selective pressure; hence, biologically important interfaces often evolve at a slower pace than do other external regions of the protein. Recently, an algorithm, Rate4Site, and a web-server, ConSurf (http://consurf.tau.ac.il/), for the identification of functional interfaces based on the evolutionary relations among homologous proteins as reflected in phylogenetic trees, were developed in our laboratory. The explicit use of the tree topology and branch lengths makes the method remarkably accurate and sensitive. Here we demonstrate its potency in the identification of the functional interfaces of a hypothetical protein, the structure of which was determined as part of the international structural genomics effort. Finally, we propose to combine complementary procedures, in order to enhance the overall performance of methods for the identification of functional interfaces in proteins.
Affiliation(s)
- Rachel E Bell
- Department of Biochemistry, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel
|
42
|
Wass MN, Kelley LA, Sternberg MJE. 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res 2010; 38:W469-73. [PMID: 20513649 PMCID: PMC2896164 DOI: 10.1093/nar/gkq406] [Citation(s) in RCA: 474] [Impact Index Per Article: 31.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022] Open
Abstract
3DLigandSite is a web server for the prediction of ligand-binding sites. It is based upon successful manual methods used in the eighth round of the Critical Assessment of techniques for protein Structure Prediction (CASP8). 3DLigandSite utilizes protein-structure prediction to provide structural models for proteins that have not been solved. Ligands bound to structures similar to the query are superimposed onto the model and used to predict the binding site. In benchmarking against the CASP8 targets, 3DLigandSite obtains a Matthews correlation coefficient (MCC) of 0.64, and coverage and accuracy of 71 and 60%, respectively, results similar to our manual performance in CASP8. In further benchmarking using a large set of protein structures, 3DLigandSite obtains an MCC of 0.68. The web server enables users to submit either a query sequence or structure. Predictions are visually displayed via an interactive Jmol applet. 3DLigandSite is available for use at http://www.sbg.bio.ic.ac.uk/3dligandsite.
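For readers unfamiliar with the Matthews correlation coefficient quoted above, the standard formula can be computed from residue-level confusion counts as in the short Python sketch below. The counts are invented for illustration and are not the CASP8 benchmark data; the function name `mcc` and the convention of returning 0.0 for an undefined denominator are assumptions of this sketch.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when a row or column total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: residues predicted as binding vs. observed as binding.
example = mcc(tp=6, fp=2, tn=10, fn=2)
print(round(example, 3))
```

The coefficient ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it stays informative when binding residues are a small minority of the protein.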
Affiliation(s)
- Mark N Wass
- Structural Bioinformatics Group, Centre for Bioinformatics, Imperial College London, London, SW7 2AZ, UK
|
43
|
Guharoy M, Chakrabarti P. Conserved residue clusters at protein-protein interfaces and their use in binding site identification. BMC Bioinformatics 2010; 11:286. [PMID: 20507585 PMCID: PMC2894039 DOI: 10.1186/1471-2105-11-286] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2010] [Accepted: 05/27/2010] [Indexed: 12/30/2022] Open
Abstract
Background: Biological evolution conserves protein residues that are important for structure and function. Both protein stability and function often require a certain degree of structural co-operativity between spatially neighboring residues and it has previously been shown that conserved residues occur clustered together in protein tertiary structures, enzyme active sites and protein-DNA interfaces. Residues comprising protein interfaces are often more conserved compared to those occurring elsewhere on the protein surface. We investigate the extent to which conserved residues within protein-protein interfaces are clustered together in three dimensions. Results: Out of 121 and 392 interfaces in homodimers and heterocomplexes, 96.7 and 86.7%, respectively, have the conserved positions clustered within the overall interface region. The significance of this clustering was established in comparison to what is seen for the subsets of the same size of randomly selected residues from the interface. Conserved residues occurring in larger interfaces could often be sub-divided into two or more distinct sub-clusters. These structural cluster(s) comprising conserved residues indicate functionally important regions within the protein-protein interface that can be targeted for further structural and energetic analysis by experimental scanning mutagenesis. Almost 60% of experimental hot spot residues (with ΔΔG > 2 kcal/mol) were localized to these conserved residue clusters. An analysis of the residue types that are enriched within these conserved subsets compared to the overall interface showed that hydrophobic and aromatic residues are favored, but charged residues (both positive and negative) are less common. The potential use of this method for discriminating binding sites (interfaces) versus random surface patches was explored by comparing the clustering of conserved residues within each of these regions; in about 50% of cases the true interface is ranked among the top 10% of all surface patches. Conclusions: Protein-protein interaction sites are much larger than small molecule binding sites, but conserved residues are still not randomly distributed over the whole interface and are distinctly clustered. The clustered nature of evolutionarily conserved residues within interfaces as compared to those within other surface patches not involved in binding has important implications for the identification of protein-protein binding sites and would have applications in docking studies.
Affiliation(s)
- Mainak Guharoy
- Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata, India
|
44
|
Kundrotas PJ, Vakser IA. Accuracy of protein-protein binding sites in high-throughput template-based modeling. PLoS Comput Biol 2010; 6:e1000727. [PMID: 20369011 PMCID: PMC2848539 DOI: 10.1371/journal.pcbi.1000727] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2009] [Accepted: 03/01/2010] [Indexed: 11/18/2022] Open
Abstract
The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as low as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criterion had RMSD<5 Å, the accuracy suitable for low-resolution template free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å<RMSD<10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, ∼50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes.
Protein-protein interactions play a central role in life processes at the molecular level. The structural information on these interactions is essential for our understanding of these processes and our ability to design drugs to cure diseases. Limitations of experimental techniques to determine the structure of protein-protein complexes leave the vast majority of these complexes to be determined by computational modeling. The modeling is also important for revealing the mechanisms of the complex formation. The 3D modeling of protein complexes (protein docking) relies on the structure of the individual proteins for the prediction of their assembly. Thus the structural accuracy of the individual proteins, which often are models themselves, is critical for the docking. For the docking purposes, the accuracy of the binding sites is obviously essential, whereas the accuracy of the non-binding regions is less critical. In our study, we systematically analyze the accuracy of the binding sites in protein models produced by high-throughput techniques suitable for large-scale (e.g., genome-wide) studies. The results indicate that this accuracy is adequate for the low- to medium-resolution docking of a significant part of known protein-protein complexes.
Affiliation(s)
- Petras J. Kundrotas
- Center for Bioinformatics and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
- Ilya A. Vakser
- Center for Bioinformatics and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
|
45
|
Xu Y, Tillier ERM. Regional covariation and its application for predicting protein contact patches. Proteins 2010; 78:548-58. [PMID: 19768681 DOI: 10.1002/prot.22576] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from multiple sequence alignments (MSAs) of proteins. As nearby residues may also play a role in a given functional interaction, we were interested in seeing whether covarying sites were clustered, and whether this could be used to enhance the predictive power of CMA. A large-scale search for coevolving regions within protein domains revealed that if two sites in an MSA covary, then neighboring sites in the alignment also typically covary, resulting in clusters of covarying residues. The program PatchD (http://www.uhnres.utoronto.ca/labs/tillier/) was developed to measure the covariation between disconnected sequence clusters to reveal patch covariation. Patches that exhibit strong covariation identify multiple residues that are generally nearby in the protein structure, suggesting that the detection of covarying patches can be used in conjunction with traditional CMA approaches to reveal functional interaction partners.
Affiliation(s)
- Yongbai Xu
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
|
46
|
Tripathi A, Kellogg GE. A novel and efficient tool for locating and characterizing protein cavities and binding sites. Proteins 2010; 78:825-42. [PMID: 19847777 DOI: 10.1002/prot.22608] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/06/2022]
Abstract
Systematic investigation of a protein and its binding site characteristics is crucial for designing small molecules that modulate protein functions. However, fundamental uncertainties in binding site interactions and insufficient knowledge of the properties of even well-defined binding pockets can make it difficult to design optimal drugs. Herein, we report the development and implementation of a cavity detection algorithm built with HINT toolkit functions that we are naming Vectorial Identification of Cavity Extents (VICE). This very efficient algorithm is based on geometric criteria applied to simple integer grid maps. In testing, we carried out a systematic investigation on a very diverse data set of proteins and protein-protein/protein-polynucleotide complexes for locating and characterizing the indentations, cavities, pockets, grooves, channels, and surface regions. Additionally, we evaluated a curated data set of unbound proteins for which ligand-bound protein structures are also known; here the VICE algorithm located the actual ligand in the largest cavity in 83% of the cases and in one of the three largest in 90% of the cases. An interactive front-end provides a quick and simple procedure for locating, displaying and manipulating cavities in these structures. Information describing the cavity, including its volume and surface area metrics, and lists of atoms, residues, and/or chains lining the binding pocket, can be easily obtained and analyzed. For example, the relative cross-sectional surface area (to total surface area) of cavity openings in well-enclosed cavities is 0.06 ± 0.04 and in surface clefts or crevices is 0.25 ± 0.09. Proteins 2010. © 2009 Wiley-Liss, Inc.
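A geometric criterion on an integer grid map can be illustrated with a small two-dimensional sketch. It is not the VICE algorithm, whose details the abstract does not give. It only assumes a generic buriedness test: cast rays from each empty grid cell in eight directions and call the cell part of a cavity when most rays hit an occupied cell. The grid, the `min_fraction` cutoff, and the function names are hypothetical.

```python
# Eight ray directions on a 2D grid (axis-aligned and diagonal).
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1),
              (-1, -1), (-1, 1), (1, -1), (1, 1)]


def blocked_fraction(grid, row, col):
    """Fraction of rays from (row, col) that hit an occupied cell (value 1)."""
    blocked = 0
    for dr, dc in DIRECTIONS:
        r, c = row + dr, col + dc
        while 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            if grid[r][c] == 1:
                blocked += 1
                break
            r, c = r + dr, c + dc
    return blocked / len(DIRECTIONS)


def cavity_cells(grid, min_fraction=0.75):
    """Empty cells whose blocked-ray fraction reaches the assumed cutoff."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == 0 and blocked_fraction(grid, r, c) >= min_fraction
    ]


# Toy map (invented): 1 = protein, 0 = empty; a pocket opens downward.
toy_grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
buried = cavity_cells(toy_grid)
```

With the assumed cutoff of 0.75, only the innermost pocket cell qualifies, and the cell nearer the opening is treated as exposed. A real implementation would work in three dimensions with many more ray directions.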
Affiliation(s)
- Ashutosh Tripathi
- Department of Medicinal Chemistry and Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, Virginia 23298-0540, USA
|
47
|
Sankararaman S, Sha F, Kirsch JF, Jordan MI, Sjölander K. Active site prediction using evolutionary and structural information. Bioinformatics 2010; 26:617-24. [PMID: 20080507 PMCID: PMC2828116 DOI: 10.1093/bioinformatics/btq008] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Indexed: 11/17/2022]
Abstract
Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites. Results: In cross-validation experiments on two benchmark datasets from the Catalytic Site Atlas and CATRES resources containing a total of 437 manually curated enzymes spanning 487 SCOP families, Discern increases catalytic site recall between 12% and 20% over methods that combine information from both sequence and structure, and by ≥50% over methods that make use of sequence conservation signal only. Controlled experiments show that Discern's improvement in catalytic residue prediction is derived from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting. Contact: kimmen@berkeley.edu Supplementary information: Supplementary data are available at Bioinformatics online.
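The precision and recall definitions given in the abstract can be made concrete with a short sketch. The residue numbers below are invented for illustration and do not come from the Discern benchmark; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(predicted, actual):
    """Precision and recall for a set of predicted catalytic residues.

    precision = fraction of predicted residues that are truly catalytic
    recall    = fraction of truly catalytic residues that were predicted
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical residue positions for a single enzyme.
predicted_sites = {12, 45, 57, 102, 195}
true_sites = {57, 102, 195, 214}
precision, recall = precision_recall(predicted_sites, true_sites)
```

Here three of five predictions are correct (precision 0.6) and three of four catalytic residues are found (recall 0.75). Because catalytic residues are a small minority of all residues, precision tends to be low even when recall is moderate, as in the 18.5% and 57% figures quoted above.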
|
48
|
Panjkovich A, Aloy P. Predicting protein–protein interaction specificity through the integration of three-dimensional structural information and the evolutionary record of protein domains. Mol Biosyst 2010; 6:741. [DOI: 10.1039/b918395g] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 08/30/2023]
|
49
|
D’Abramo M, Meyer T, Bernadó P, Pons C, Recio JF, Orozco M. On the Use of low-resolution Data to Improve Structure Prediction of Proteins and Protein Complexes. J Chem Theory Comput 2009; 5:3129-37. [DOI: 10.1021/ct900305m] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 01/03/2023]
Affiliation(s)
- Marco D’Abramo, Tim Meyer, Pau Bernadó, Carles Pons, Juan Fernández Recio, Modesto Orozco
- Molecular Modeling and Bioinformatics Unit, IRB-BSC Joint Research Program in Computational Biology, Institute for Research in Biomedicine Josep Samitier 1-5, Barcelona 08028, Spain and Barcelona Supercomputing Center, Jordi Girona 29, Barcelona 08034, Spain, Structural and Computational Biology Program, Institute for Research in Biomedicine Josep Samitier 1-5, Barcelona 08028, Spain, Life Sciences Department, Barcelona Supercomputing Center, Jordi Girona 29, Barcelona 08034, Spain, Departament de
|
50
|
Alterovitz R, Arvey A, Sankararaman S, Dallett C, Freund Y, Sjölander K. ResBoost: characterizing and predicting catalytic residues in enzymes. BMC Bioinformatics 2009; 10:197. [PMID: 19558703 PMCID: PMC2713229 DOI: 10.1186/1471-2105-10-197] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/15/2008] [Accepted: 06/27/2009] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed. RESULTS: We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA). CONCLUSION: ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.
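The trade-off between sensitivity and false positive rate reported here can be illustrated by moving a decision threshold over per-residue scores. This sketch does not implement ResBoost, AdaBoost, or Alternating Decision Trees. It assumes only that some classifier emits a numeric score per residue, and the scores, labels, and the name `sensitivity_and_fpr` are invented for illustration.

```python
def sensitivity_and_fpr(scores, labels, threshold):
    """Sensitivity and false positive rate when calling score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical classifier scores; True marks a catalytic residue.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, True, False, False]

strict = sensitivity_and_fpr(scores, labels, 0.75)   # fewer calls
lenient = sensitivity_and_fpr(scores, labels, 0.35)  # more calls
```

On this toy data the strict threshold gives sensitivity 0.5 with no false positives, while the lenient threshold raises sensitivity to 0.75 at a false positive rate of 0.5. The paired operating points in the abstract (85% at 9.8% and 73% at 5.7%) reflect the same kind of trade-off.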
Affiliation(s)
- Ron Alterovitz
- Department of Computer Science, University of North Carolina at Chapel Hill, USA
- Aaron Arvey
- Department of Computer Science and Engineering, University of California, San Diego, USA
- Sriram Sankararaman
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
- Carolina Dallett
- Department of Bioengineering, University of California, Berkeley, USA
- Yoav Freund
- Department of Computer Science and Engineering, University of California, San Diego, USA
- Kimmen Sjölander
- Department of Bioengineering, University of California, Berkeley, USA
|