1
Bakare OO, Keyster M, Pretorius A. Identification of biomarkers for the accurate and sensitive diagnosis of three bacterial pneumonia pathogens using in silico approaches. BMC Mol Cell Biol 2020; 21:82. [PMID: 33218302 PMCID: PMC7678116 DOI: 10.1186/s12860-020-00328-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/27/2020] [Accepted: 11/09/2020] [Indexed: 11/10/2022] Open
Abstract
Background Pneumonia ranks as one of the main infectious causes of mortality among children under 5 years of age, killing 2500 a day; recent research has additionally demonstrated that mortality is higher in the elderly. The few biomarkers identified so far for its diagnosis lack specificity, as they fail to differentiate pneumonia from other related diseases, for example pulmonary tuberculosis and human immunodeficiency virus (HIV) infection. There is a global consensus on the need for new biomarkers, expressed in response to pneumonia infection, that allow precise identification and overcome these constraints. Antimicrobial peptides (AMPs) have been demonstrated to be promising therapeutic agents against numerous illnesses. This research work sought to identify AMPs as biomarkers for three bacterial pneumonia pathogens, Streptococcus pneumoniae, Klebsiella pneumoniae, and Acinetobacter baumannii, using in silico technology. Hidden Markov Models (HMMER) were used to identify putative anti-bacterial pneumonia AMPs against the identified receptor proteins of Streptococcus pneumoniae, Klebsiella pneumoniae, and Acinetobacter baumannii. The physicochemical parameters of these putative AMPs were computed and their 3-D structures were predicted using I-TASSER. These AMPs were subsequently subjected to docking interaction analysis against the identified bacterial pneumonia pathogen proteins using PATCHDOCK. Results The in silico results showed 18 antibacterial AMPs, which were ranked based on their E values, with significant physicochemical parameters in conformity with known experimentally validated AMPs. The AMPs also bound the pneumonia receptors of their respective pathogens sensitively at the extracellular regions.
Conclusions The propensity of these AMPs to bind pneumonia pathogen proteins justifies their use as potential candidate biomarkers for the detection of these bacterial pathogens in point-of-care (POC) pneumonia diagnostics. The high sensitivity, accuracy, and specificity of the AMPs likewise justify the utilization of HMMER in the design and discovery of AMPs for disease diagnostics and therapeutics.
Affiliation(s)
- Olalekan Olanrewaju Bakare
- Bioinformatics Research Group, Biotechnology Department, University of the Western Cape, Cape Town, 7535, South Africa; Environmental Biotechnology Laboratory, Biotechnology Department, University of the Western Cape, Cape Town, 7535, South Africa.
- Marshall Keyster
- Environmental Biotechnology Laboratory, Biotechnology Department, University of the Western Cape, Cape Town, 7535, South Africa
- Ashley Pretorius
- Bioinformatics Research Group, Biotechnology Department, University of the Western Cape, Cape Town, 7535, South Africa
2
Brylinski M, Skolnick J. FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins 2010; 79:735-51. [PMID: 21287609 DOI: 10.1002/prot.22913] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 08/04/2010] [Revised: 09/27/2010] [Accepted: 10/07/2010] [Indexed: 12/13/2022]
Abstract
The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure-based approaches showing considerable promise. In this article, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal-binding sites in modeled protein structures. Comprehensive benchmarks using different quality protein structures show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal-binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal-binding sites are detected with the best predicted binding site at rank 1 and within the top two ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium, and magnesium ions, the binding metal can be predicted with high, typically 70% to 90%, accuracy. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal that quantifies the metal-binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
3
Docking of calcium ions in proteins with flexible side chains and deformable backbones. European Biophysics Journal: EBJ 2009; 39:825-38. [PMID: 19937325 DOI: 10.1007/s00249-009-0561-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/03/2009] [Revised: 10/20/2009] [Accepted: 10/23/2009] [Indexed: 10/20/2022]
Abstract
A method of docking Ca²⁺ ions in proteins with flexible side chains and deformable backbones is proposed. The energy was calculated with the AMBER force field, implicit solvent, and solvent exposure-dependent and distance-dependent dielectric function. Starting structures were generated with Ca²⁺ coordinates and side-chain torsions sampled in 1000 Å³ cubes centered at the experimental Ca²⁺ positions. The energy was Monte Carlo-minimized. The method was tested on fourteen Ca²⁺-binding sites. For twelve Ca²⁺-binding sites the root mean square (RMS) deviation of the apparent global minimum from the experimental structure was below 1.3 and 1.7 Å for Ca²⁺ ions and side-chain heavy atoms, respectively. Energies of multiple local minima correlate with the RMS deviations from the X-ray structures. Two Ca²⁺-binding sites at the surface of proteinase K were not predicted, because of underestimation of Ca²⁺ hydration energy by the implicit-solvent method.
4
Skolnick J, Brylinski M. FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform 2009; 10:378-91. [PMID: 19324930 DOI: 10.1093/bib/bbp017] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Indexed: 11/13/2022] Open
Abstract
A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence and structure-based approaches to protein function inference and ligand screening that can provide functional insights for a significant fraction of the approximately 50% of ORFs of unassigned function in an average proteome. We then describe FINDSITE, a recently developed algorithm for ligand binding site prediction, ligand screening and molecular function prediction, which is based on binding site conservation across evolutionary distant proteins identified by threading. Importantly, FINDSITE gives comparable results when high-resolution experimental structures as well as predicted protein models are used.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology 250 14th St NW, Atlanta, GA 30318, USA.
5
Halperin I, Glazer DS, Wu S, Altman RB. The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications. BMC Genomics 2008; 9 Suppl 2:S2. [PMID: 18831785 PMCID: PMC2559884 DOI: 10.1186/1471-2164-9-s2-s2] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 12/03/2022] Open
Abstract
Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognition of functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework, and describe the recent developments in its use. These include automating training sets selection to increase functional coverage, coupling FEATURE to structural diversity generating methods such as molecular dynamics simulations and loop modeling methods to improve performance, and using FEATURE in large-scale modeling and structure determination efforts.
Affiliation(s)
- Inbal Halperin
- Department of Genetics, 318 Campus Drive, Clark Center S240, Stanford, CA 94305, USA.
6
Abstract
Metals play a variety of roles in biological processes, and hence their presence in a protein structure can yield vital functional information. Because the residues that coordinate a metal often undergo conformational changes upon binding, detection of binding sites based on simple geometric criteria in proteins without bound metal is difficult. However, aspects of the physicochemical environment around a metal binding site are often conserved even when this structural rearrangement occurs. We have developed a Bayesian classifier using known zinc binding sites as positive training examples and nonmetal binding regions that nonetheless contain residues frequently observed in zinc sites as negative training examples. In order to allow variation in the exact positions of atoms, we average a variety of biochemical and biophysical properties in six concentric spherical shells around the site of interest. At a specificity of 99.8%, this method achieves 75.5% sensitivity in unbound proteins at a positive predictive value of 73.6%. We also test its accuracy on predicted protein structures obtained by homology modeling using templates with 30%-50% sequence identity to the target sequences. At a specificity of 99.8%, we correctly identify at least one zinc binding site in 65.5% of modeled proteins. Thus, in many cases, our model is accurate enough to identify metal binding sites in proteins of unknown structure for which no high sequence identity homologs of known structure exist. Both the source code and a Web interface are available to the public at http://feature.stanford.edu/metals.
Affiliation(s)
- Jessica C Ebert
- Department of Genetics, Stanford University, Stanford, California 94305, USA
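The shell-averaging step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `shell_profile`, the shell width of 1.25 Å, and the single numeric property per atom are assumptions made for illustration; only the use of six concentric spherical shells comes from the abstract.

```python
from math import dist


def shell_profile(site, atoms, n_shells=6, shell_width=1.25):
    """Average one per-atom property in concentric shells around `site`.

    site:        (x, y, z) of the candidate binding site, in angstroms.
    atoms:       list of ((x, y, z), value) pairs; `value` is any numeric
                 property (hypothetical here, e.g. a partial charge).
    shell_width: assumed shell thickness in angstroms (not from the source).
    Returns a list of n_shells means, with None for shells holding no atoms.
    """
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for coords, value in atoms:
        # Shell index grows with distance from the site centre.
        shell = int(dist(site, coords) // shell_width)
        if shell < n_shells:  # atoms beyond the outermost shell are ignored
            sums[shell] += value
            counts[shell] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Because each shell stores an average rather than exact atom positions, small side-chain rearrangements change the profile only slightly, which is the motivation the abstract gives for this representation. A classifier would then be trained on such profiles; that step is omitted here.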
7
Rossi KA, Weigelt CA, Nayeem A, Krystek SR. Loopholes and missing links in protein modeling. Protein Sci 2007; 16:1999-2012. [PMID: 17660258 PMCID: PMC2206982 DOI: 10.1110/ps.072887807] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 03/21/2007] [Revised: 06/08/2007] [Accepted: 06/09/2007] [Indexed: 10/23/2022]
Abstract
This paper provides an unbiased comparison of four commercially available programs for loop sampling, Prime, Modeler, ICM, and Sybyl, each of which uses a different modeling protocol. The study assesses the quality of results and examines the relative strengths and weaknesses of each method. The set of loops to be modeled varied in length from 4-12 amino acids. The approaches used for loop modeling can be classified into two methodologies: ab initio loop generation (Modeler and Prime) and database searches (Sybyl and ICM). Comparison of the modeled loops to the native structures was used to determine the accuracy of each method. All of the protocols returned similar results for short loop lengths (four to six residues), but as loop length increased, the quality of the results varied among the programs. Prime generated loops with RMSDs <2.5 Å for loops up to 10 residues, while the other three methods met the 2.5 Å criterion at seven-residue loops. Additionally, the ability of the software to utilize disulfide bonds and X-ray crystal packing influenced the quality of the results. In the final analysis, the top-ranking loop from each program was rarely the loop with the lowest RMSD with respect to the native template, revealing a weakness in all programs to correctly rank the modeled loops.
Affiliation(s)
- Karen A Rossi
- Computer-Assisted Drug Design, Pharmaceutical Research Institute, Bristol-Myers Squibb Company, Princeton, New Jersey 08543, USA.
8
Deng H, Chen G, Yang W, Yang JJ. Predicting calcium-binding sites in proteins - a graph theory and geometry approach. Proteins 2006; 64:34-42. [PMID: 16617426 DOI: 10.1002/prot.20973] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 11/05/2022]
Abstract
Identifying calcium-binding sites in proteins is one of the first steps towards predicting and understanding the role of calcium in biological systems for protein structure and function studies. Due to the complexity and irregularity of calcium-binding sites, a fast and accurate method for predicting and identifying calcium-binding protein is needed. Here we report our development of a new fast algorithm (GG) to detect calcium-binding sites. The GG algorithm uses a graph theory algorithm to find oxygen clusters of the protein and a geometric algorithm to identify the center of these clusters. A cluster of four or more oxygen atoms has a high potential for calcium binding. High performance with about 90% site sensitivity and 80% site selectivity has been obtained for three datasets containing a total of 123 proteins. The results suggest that a sphere of a certain size with four or more oxygen atoms on the surface and without other atoms inside is necessary and sufficient for quickly identifying the majority of the calcium-binding sites with high accuracy. Our finding opens a new avenue to visualize and analyze calcium-binding sites in proteins facilitating the prediction of functions from structural genomic information.
Affiliation(s)
- Hai Deng
- Department of Computer Science, Georgia State University, Atlanta, Georgia 30302, USA
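The two stages named in the abstract above (a graph step that finds oxygen clusters, then a geometric step that locates their center) can be sketched roughly as follows. This is a simplified illustration, not the published GG algorithm: the function name `candidate_sites`, the 6.0 Å edge cutoff, the use of 4-cliques as "clusters", and the centroid as the cluster center are all assumptions, and the check that no other atoms lie inside the sphere is left out.

```python
from itertools import combinations
from math import dist


def candidate_sites(oxygens, cutoff=6.0):
    """Yield (center, indices) for every set of four mutually close oxygens.

    oxygens: list of (x, y, z) oxygen coordinates in angstroms.
    cutoff:  assumed maximum O-O distance for two atoms to share an edge.
    """
    n = len(oxygens)
    # Graph step: connect every pair of oxygens closer than the cutoff.
    adjacent = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(oxygens[i], oxygens[j]) <= cutoff
    }
    # Cluster step: keep groups of four oxygens that are all pairwise connected.
    for quad in combinations(range(n), 4):
        if all(pair in adjacent for pair in combinations(quad, 2)):
            # Geometry step (simplified): centroid as the candidate ion position.
            center = tuple(sum(oxygens[i][k] for i in quad) / 4 for k in range(3))
            yield center, quad
```

Enumerating all quadruples is only practical for small inputs; it is used here to keep the idea visible rather than to be efficient.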
9
Ivanisenko VA, Pintus SS, Grigorovich DA, Kolchanov NA. PDBSite: a database of the 3D structure of protein functional sites. Nucleic Acids Res 2005; 33:D183-7. [PMID: 15608173 PMCID: PMC540059 DOI: 10.1093/nar/gki105] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022] Open
Abstract
The PDBSite database provides comprehensive structural and functional information on various protein sites (post-translational modification, catalytic active, organic and inorganic ligand binding, protein-protein, protein-DNA and protein-RNA interactions) in the Protein Data Bank (PDB). The PDBSite is available online at http://wwwmgs.bionet.nsc.ru/mgs/gnw/pdbsite/. It consists of functional sites extracted from PDB using the SITE records and of an additional set containing the protein interaction sites inferred from the contact residues in heterocomplexes. The PDBSite was set up by automated processing of the PDB. The PDBSite database can be queried through the functional description and the structural characteristics of the site and its environment. The PDBSite is integrated with the PDBSiteScan tool allowing structural comparisons of a protein against the functional sites. The PDBSite enables the recognition of functional sites in protein tertiary structures, providing annotation of function through structure. The PDBSite is updated after each new PDB release.
Affiliation(s)
- Vladimir A Ivanisenko
- Institute of Cytology and Genetics SBRAS, Lavrentyev Avenue 10, Novosibirsk 630090, Russia.
10
Sodhi JS, Bryson K, McGuffin LJ, Ward JJ, Wernisch L, Jones DT. Predicting metal-binding site residues in low-resolution structural models. J Mol Biol 2004; 342:307-20. [PMID: 15313626 DOI: 10.1016/j.jmb.2004.07.019] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.0] [Received: 04/02/2004] [Revised: 07/06/2004] [Accepted: 07/08/2004] [Indexed: 11/26/2022]
Abstract
The accurate prediction of the biochemical function of a protein is becoming increasingly important, given the unprecedented growth of both structural and sequence databanks. Consequently, computational methods are required to analyse such data in an automated manner to ensure genomes are annotated accurately. Protein structure prediction methods, for example, are capable of generating approximate structural models on a genome-wide scale. However, the detection of functionally important regions in such crude models, as well as structural genomics targets, remains an extremely important problem. The method described in the current study, MetSite, represents a fully automatic approach for the detection of metal-binding residue clusters applicable to protein models of moderate quality. The method involves using sequence profile information in combination with approximate structural data. Several neural network classifiers are shown to be able to distinguish metal sites from non-sites with a mean accuracy of 94.5%. The method was demonstrated to identify metal-binding sites correctly in LiveBench targets where no obvious metal-binding sequence motifs were detectable using InterPro. Accurate detection of metal sites was shown to be feasible for low-resolution predicted structures generated using mGenTHREADER where no side-chain information was available. High-scoring predictions were observed for a recently solved hypothetical protein from Haemophilus influenzae, indicating a putative metal-binding site.
Affiliation(s)
- Jaspreet Singh Sodhi
- Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, WC1E 6BT, UK
11
Banatao DR, Altman RB, Klein TE. Microenvironment analysis and identification of magnesium binding sites in RNA. Nucleic Acids Res 2003; 31:4450-60. [PMID: 12888505 PMCID: PMC169872 DOI: 10.1093/nar/gkg471] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022] Open
Abstract
Interactions with magnesium (Mg2+) ions are essential for RNA folding and function. The locations and function of bound Mg2+ ions are difficult to characterize both experimentally and computationally. In particular, the P456 domain of the Tetrahymena thermophila group I intron, and a 58 nt 23s rRNA from Escherichia coli have been important systems for studying the role of Mg2+ binding in RNA, but characteristics of all the binding sites remain unclear. We therefore investigated the Mg2+ binding capabilities of these RNA systems using a computational approach to identify and further characterize their Mg2+ binding sites. The approach is based on the FEATURE algorithm, reported previously for microenvironment analysis of protein functional sites. We have determined novel physicochemical descriptions of site-bound and diffusely bound Mg2+ ions in RNA that are useful for prediction. Electrostatic calculations using the Non-Linear Poisson Boltzmann (NLPB) equation provided further evidence for the locations of site-bound ions. We confirmed the locations of experimentally determined sites and further differentiated between classes of ion binding. We also identified potentially important, high scoring sites in the group I intron that are not currently annotated as Mg2+ binding sites. We note their potential function and believe they deserve experimental follow-up.
Affiliation(s)
- D Rey Banatao
- Department of Genetics and Stanford Medical Informatics, 251 Campus Drive, Stanford University, CA 94305, USA
12
Liang MP, Banatao DR, Klein TE, Brutlag DL, Altman RB. WebFEATURE: An interactive web tool for identifying and visualizing functional sites on macromolecular structures. Nucleic Acids Res 2003; 31:3324-7. [PMID: 12824318 PMCID: PMC168960 DOI: 10.1093/nar/gkg553] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.2] [Indexed: 11/12/2022] Open
Abstract
WebFEATURE (http://feature.stanford.edu/webfeature/) is a web-accessible structural analysis tool that allows users to scan query structures for functional sites in both proteins and nucleic acids. WebFEATURE is the public interface to the scanning algorithm of the FEATURE package, a supervised learning algorithm for creating and identifying 3D, physicochemical motifs in molecular structures. Given an input structure or Protein Data Bank identifier (PDB ID), and a statistical model of a functional site, WebFEATURE will return rank-scored 'hits' in 3D space that identify regions in the structure where similar distributions of physicochemical properties occur relative to the site model. Users can visualize and interactively manipulate scored hits and the query structure in web browsers that support the Chime plug-in. Alternatively, results can be downloaded and visualized through other freely available molecular modeling tools, like RasMol, PyMOL and Chimera. A major application of WebFEATURE is in rapid annotation of function to structures in the context of structural genomics.
Affiliation(s)
- Mike P Liang
- Department of Genetics and Stanford Medical Informatics, 251 Campus Drive, Stanford University, Stanford, CA 94305, USA
13
Samudrala R, Levitt M. A comprehensive analysis of 40 blind protein structure predictions. BMC Structural Biology 2002; 2:3. [PMID: 12150712 PMCID: PMC122083 DOI: 10.1186/1472-6807-2-3] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Received: 04/09/2002] [Accepted: 08/01/2002] [Indexed: 11/21/2022]
Abstract
BACKGROUND We thoroughly analyse the results of 40 blind predictions for which an experimental answer was made available at the fourth meeting on the critical assessment of protein structure methods (CASP4). Using our comparative modelling and fold recognition methodologies, we made 29 predictions for targets that had sequence identities ranging from 50% to 10% to the nearest related protein with known structure. Using our ab initio methodologies, we made eleven predictions for targets that had no detectable sequence relationships. RESULTS For 23 of these proteins, we produced models ranging from 1.0 to 6.0 Å root mean square deviation (RMSD) for the Cα atoms between the model and the corresponding experimental structure for all or large parts of the protein, with model accuracies scaling fairly linearly with respect to sequence identity (i.e., the higher the sequence identity, the better the prediction). We produced nine models with accuracies ranging from 4.0 to 6.0 Å Cα RMSD for 60-100 residue proteins (or large fragments of a protein), with a prediction accuracy of 4.0 Å Cα RMSD for residues 1-80 for T110/rbfa. CONCLUSIONS The areas of protein structure prediction that work well, and areas that need improvement, are discernable by examining how our methods have performed over the past four CASP experiments. These results have implications for modelling the structure of all tractable proteins encoded by the genome of an organism.
Affiliation(s)
- Ram Samudrala
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
- Michael Levitt
- Department of Structural Biology, Stanford University, School of Medicine, Stanford, CA 94305, USA
14
Yang W, Lee HW, Hellinga H, Yang JJ. Structural analysis, identification, and design of calcium-binding sites in proteins. Proteins 2002; 47:344-56. [PMID: 11948788 DOI: 10.1002/prot.10093] [Citation(s) in RCA: 98] [Impact Index Per Article: 4.5] [Indexed: 11/11/2022]
Abstract
Assigning proteins with functions based on the 3-D structure requires high-speed techniques to make a systematic survey of protein structures. Calcium regulates many biological systems by binding numerous proteins in different biological environments. Despite the great diversity in the composition of ligand residues and bond angles and lengths of calcium-binding sites, our structural analysis of 11 calcium-binding sites in different classes of proteins has shown that common local structural parameters can be used to identify and design calcium-binding proteins. Natural calcium-binding sites in both EF-hand proteins and non-EF-hand proteins can be described with the smallest deviation from the geometry of an ideal pentagonal bipyramid. Further, two different magnesium-binding sites in parvalbumin and calbindin(D9K) can also be identified using an octahedral geometry. Using the established method, we have designed de novo calcium-binding sites into the scaffold of non-calcium-binding proteins CD2 and Rop. Our results suggest that it is possible to identify calcium- and magnesium-binding sites in proteins and design de novo metal-binding sites.
Affiliation(s)
- Wei Yang
- Department of Biology Drug Design, Georgia State University, Atlanta, Georgia, USA
15
Zheng W, Doniach S. Protein structure prediction constrained by solution X-ray scattering data and structural homology identification. J Mol Biol 2002; 316:173-87. [PMID: 11829511 DOI: 10.1006/jmbi.2001.5324] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 11/22/2022]
Abstract
Here we perform a systematic exploration of the use of distance constraints derived from small angle X-ray scattering (SAXS) measurements to filter candidate protein structures for the purpose of protein structure prediction. This is an intrinsically more complex task than that of applying distance constraints derived from NMR data where the identity of the pair of amino acid residues subject to a given distance constraint is known. SAXS, on the other hand, yields a histogram of pair distances (pair distribution function), but the identities of the pairs contributing to a given bin of the histogram are not known. Our study is based on an extension of the Levitt-Hinds coarse grained approach to ab initio protein structure prediction to generate a candidate set of Cα backbones. In spite of the lack of specific residue information inherent in the SAXS data, our study shows that the implementation of a SAXS filter is capable of effectively purifying the set of native structure candidates and thus provides a substantial improvement in the reliability of protein structure prediction. We test the quality of our predicted Cα backbones by doing structural homology searches against the Dali domain library, and find that the results are very encouraging. In spite of the lack of local structural details and limited modeling accuracy at the Cα backbone level, we find that useful information about fold classification can be extracted from this procedure. This approach thus provides a way to use a SAXS data based structure prediction algorithm to generate potential structural homologies in cases where lack of sequence homology prevents identification of candidate folds for a given protein. Thus our approach has the potential to help in determination of the biological function of a protein based on structural homology instead of sequence homology.
Affiliation(s)
- Wenjun Zheng
- Departments of Physics, Stanford University, CA 94305, USA
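The idea of filtering candidate backbones by a histogram of pair distances, as described in the abstract above, can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: the names `pair_distance_histogram` and `histogram_mismatch`, the 1 Å bin width, and the sum-of-squares score are hypothetical choices, and a real SAXS comparison would work from the measured scattering profile.

```python
from itertools import combinations
from math import dist


def pair_distance_histogram(coords, bin_width=1.0, n_bins=50):
    """Count pair distances per bin for a list of (x, y, z) Cα positions.

    Residue identities are deliberately not recorded, mirroring the fact
    that SAXS does not reveal which pair contributes to which bin.
    """
    hist = [0] * n_bins
    for a, b in combinations(coords, 2):
        k = int(dist(a, b) // bin_width)
        if k < n_bins:  # distances beyond the last bin are dropped
            hist[k] += 1
    return hist


def histogram_mismatch(h1, h2):
    """Sum of squared differences between two normalized histograms."""
    t1, t2 = sum(h1) or 1, sum(h2) or 1  # avoid dividing by zero
    return sum((a / t1 - b / t2) ** 2 for a, b in zip(h1, h2))
```

Under these assumptions, a filter would compute `histogram_mismatch` between each candidate's histogram and a reference histogram, then keep the candidates with the lowest scores.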
16
de la Cruz X, Sillitoe I, Orengo C. Use of structure comparison methods for the refinement of protein structure predictions. I. Identifying the structural family of a protein from low-resolution models. Proteins 2002; 46:72-84. [PMID: 11746704 DOI: 10.1002/prot.10002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Predicting the three-dimensional structure of proteins is still one of the most challenging problems in molecular biology. Despite its difficulty, several investigators have started to produce consistently low-resolution predictions for small proteins. However, in most of these cases, the prediction accuracy is still too low to make them useful. In the present article, we address the problem of obtaining better-quality predictions, starting from low-resolution models. To this end, we have devised a new procedure that uses these models, together with structure comparison methods, to identify the structural family of the target protein. This would allow, in a second step not described in the present work, to refine the predictions using conserved features of the identified family. In our approach, the structure database is investigated using predictions, at different accuracy levels, for a given protein. As query structures, we used both low-resolution versions of the native structures, as well as different sets of low accuracy predictions. In general, we found that for predictions with a resolution of ≥5-7 Å, structure comparison methods were able to identify the fold of a protein in the top positions.
Affiliation(s)
- Xavier de la Cruz
- Departmento de Bioquímica y Biología Molecular Facultad de Químicas; Universidad de Barcelona, Barcelona, Spain.
17
Lindauer K, Loerting T, Liedl KR, Kroemer RT. Prediction of the structure of human Janus kinase 2 (JAK2) comprising the two carboxy-terminal domains reveals a mechanism for autoregulation. Protein Engineering 2001; 14:27-37. [PMID: 11287676 DOI: 10.1093/protein/14.1.27] [Citation(s) in RCA: 127] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022]
Abstract
The structure of human Janus kinase 2 (JAK2) comprising the two C-terminal domains (JH1 and JH2) was predicted by application of homology modelling techniques. JH1 and JH2 represent the tyrosine kinase and tyrosine kinase-like domains, respectively, and are crucial for function and regulation of the protein. A comparison between the structures of the two domains is made and structural differences are highlighted. Prediction of the relative orientation of JH1 and JH2 was aided by a newly developed method for the detection of correlated amino acid mutations. Analysis of the interactions between the two domains led to a model for the regulatory effect of JH2 on JH1. The predictions are consistent with available experimental data on JAK2 or related proteins and provide an explanation for inhibition of JH1 tyrosine kinase activity by the adjacent JH2 domain.
Affiliation(s)
- K Lindauer
- Department of Chemistry, Queen Mary and Westfield College, University of London, Mile End Road, London E1 4NS, UK
|
18
|
Feig M, Rotkiewicz P, Kolinski A, Skolnick J, Brooks CL. Accurate reconstruction of all-atom protein representations from side-chain-based low-resolution models. Proteins 2000; 41:86-97. [PMID: 10944396 DOI: 10.1002/1097-0134(20001001)41:1<86::aid-prot110>3.0.co;2-y] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.3] [Indexed: 11/06/2022]
Abstract
A procedure for the reconstruction of all-atom protein structures from side-chain center-based low-resolution models is introduced and applied to a set of test proteins with high-resolution X-ray structures. The accuracy of the rebuilt all-atom models is measured by root mean square deviations to the corresponding X-ray structures and percentages of correct χ1 and χ2 side-chain dihedrals. The benefit of including Cα positions in the low-resolution model is examined, and the effect of lattice-based models on the reconstruction accuracy is discussed. Programs and scripts implementing the reconstruction procedure are made available through the NIH research resource for Multiscale Modeling Tools in Structural Biology (http://mmtsb.scripps.edu).
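The two accuracy measures named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the assumption that both structures are already superimposed with atoms listed in the same order, and the 30-degree tolerance for a "correct" dihedral are all illustrative choices that the abstract does not specify.

```python
import math

def rmsd(model, reference):
    """RMSD between two matched lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and that atoms
    appear in the same order in both lists (an assumption of this sketch).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))

def fraction_correct_dihedrals(model_chi, reference_chi, tolerance=30.0):
    """Fraction of dihedral angles (degrees) within `tolerance` of the reference.

    The default tolerance is hypothetical, chosen only for illustration.
    """
    if len(model_chi) != len(reference_chi) or not model_chi:
        raise ValueError("angle lists must be non-empty and of equal length")
    correct = 0
    for a, b in zip(model_chi, reference_chi):
        # Wrap the difference into [-180, 180) so 170 and -170 differ by 20.
        diff = abs((a - b + 180.0) % 360.0 - 180.0)
        if diff <= tolerance:
            correct += 1
    return correct / len(model_chi)
```

In practice the model would first be superimposed on the X-ray structure; that step is left out here to keep the sketch short.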
Affiliation(s)
- M Feig
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
19
|
Abstract
Comparative protein structure prediction is limited mostly by the errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all nonhydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for nonbonded atomic contacts that depend on the two atom types, their distance through space, and separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the main-chain N, Cα, C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively.
The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.
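As a rough illustration of the selection step described in this abstract, the Python sketch below picks the lowest energy conformation from a set of independent optimizations and reports the structural spread among the few lowest energy candidates. It rests on several assumptions: the `(energy, coordinates)` input format, the function names, the cutoff `k`, and the use of mean pairwise RMSD as the variability measure are all hypothetical, and the abstract does not say how variability is converted into an accuracy estimate.

```python
import math
from itertools import combinations

def rmsd(a, b):
    """RMSD between two matched, pre-superimposed lists of (x, y, z) tuples."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
                for p, q in zip(a, b))
    return math.sqrt(total / len(a))

def lowest_energy_with_spread(candidates, k=3):
    """Return (best_candidate, mean pairwise RMSD among the k lowest in energy).

    `candidates` is a list of (energy, coordinates) pairs, one per
    independent optimization (hypothetical input format).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    ranked = sorted(candidates, key=lambda c: c[0])
    best = ranked[0]
    pairs = list(combinations(ranked[:k], 2))
    if not pairs:
        return best, 0.0
    spread = sum(rmsd(p[1], q[1]) for p, q in pairs) / len(pairs)
    return best, spread
```

A small spread suggests the low energy solutions agree with each other; how that maps to expected error would have to be calibrated against loops of known structure.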
Affiliation(s)
- A Fiser
- Laboratory of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, New York 10021, USA.
|
20
|
Samudrala R, Levitt M. Decoys 'R' Us: a database of incorrect conformations to improve protein structure prediction. Protein Sci 2000; 9:1399-401. [PMID: 10933507 PMCID: PMC2144680 DOI: 10.1110/ps.9.7.1399] [Citation(s) in RCA: 186] [Impact Index Per Article: 7.8] [Indexed: 10/21/2022]
Abstract
The development of an energy or scoring function for protein structure prediction is greatly enhanced by testing the function on a set of computer-generated conformations (decoys) to determine whether it can readily distinguish native-like conformations from nonnative ones. We have created "Decoys 'R' Us," a database containing many such sets of conformations, to provide a resource that allows scoring functions to be improved.
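The kind of test this abstract describes, checking whether a scoring function separates a native-like conformation from decoys, might be sketched in Python as follows. The function name, the convention that lower scores are better, and the toy scoring function in the usage example are assumptions made for illustration; none of them come from the database itself.

```python
def native_rank(score, native, decoys):
    """Rank of the native conformation among decoys under `score`.

    Assumes lower scores are better (a convention chosen for this sketch).
    A rank of 1 means no decoy scores strictly better than the native.
    """
    native_score = score(native)
    return 1 + sum(1 for decoy in decoys if score(decoy) < native_score)

def toy_score(conformation):
    """Hypothetical scoring function: fewer steric clashes is better."""
    return conformation["clashes"]

# Usage with toy data: each conformation is reduced to a single feature.
native = {"clashes": 0}
decoys = [{"clashes": 3}, {"clashes": 1}, {"clashes": 7}]
rank = native_rank(toy_score, native, decoys)
```

A scoring function that often ranks the native at or near 1 across many decoy sets is discriminating well; one that frequently ranks decoys ahead of the native needs improvement.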
Affiliation(s)
- R Samudrala
- Department of Structural Biology, Stanford University School of Medicine, California 94305, USA.
|
21
|
Xia Y, Huang ES, Levitt M, Samudrala R. Ab initio construction of protein tertiary structures using a hierarchical approach. J Mol Biol 2000; 300:171-85. [PMID: 10864507 DOI: 10.1006/jmbi.2000.3835] [Citation(s) in RCA: 141] [Impact Index Per Article: 5.9] [Indexed: 11/22/2022]
Abstract
We present a hierarchical method to predict protein tertiary structure models from sequence. We start with complete enumeration of conformations using a simple tetrahedral lattice model. We then build conformations with increasing detail, and at each step select a subset of conformations using empirical energy functions with increasing complexity. After enumeration on lattice, we select a subset of low energy conformations using a statistical residue-residue contact energy function, and generate all-atom models using predicted secondary structure. A combined knowledge-based atomic level energy function is then used to select subsets of the all-atom models. The final predictions are generated using a consensus distance geometry procedure. We test the feasibility of the procedure on a set of 12 small proteins covering a wide range of protein topologies. A rigorous double-blind test of our method was made under the auspices of the CASP3 experiment, where we did ab initio structure predictions for 12 proteins using this approach. The performance of our methodology at CASP3 is reasonably good and completely consistent with our initial tests.
Affiliation(s)
- Y Xia
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, 94305, USA
|
22
|
Abstract
The sequencing of the human genome and numerous pathogen genomes has resulted in an explosion of potential drug targets. These targets represent both an unprecedented opportunity and a technological challenge for the pharmaceutical industry. A new strategy is required to initiate small-molecule drug discovery with sets of incompletely characterized, disease-associated proteins. One such strategy is the early application of combinatorial chemistry and other technologies to the discovery of bioactive small-molecule ligands that act on candidate drug targets. Therapeutically active ligands serve to concurrently validate a target and provide lead structures for downstream drug development, thereby accelerating the drug discovery process.
Affiliation(s)
- GR Lenz
- NeoGenesis, 840 Memorial Drive, Cambridge, MA 02139, USA
|