1
Methods for the Refinement of Protein Structure 3D Models. Int J Mol Sci 2019; 20:ijms20092301. [PMID: 31075942 PMCID: PMC6539982 DOI: 10.3390/ijms20092301] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/02/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/25/2022] Open
Abstract
The refinement of predicted 3D protein models is crucial in bringing them closer to experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: sampling and scoring. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. In the scoring stage, energy functions and Model Quality Assessment Programs (MQAPs) are used to discriminate near-native conformations from non-native conformations. Nevertheless, the differences among the 3D models generated in refinement pipelines are often very small, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.
2
Fukushima M. Constructing failure in big biology: The socio-technical anatomy of Japan's Protein 3000 Project. SOCIAL STUDIES OF SCIENCE 2016; 46:7-33. [PMID: 26983170 DOI: 10.1177/0306312715612146] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
This study focuses on the 5-year Protein 3000 Project launched in 2002, the largest biological project in Japan. The project aimed to overcome Japan's alleged failure to contribute fully to the Human Genome Project, by determining 3000 protein structures, 30 percent of the global target. Despite its achievement of this goal, the project was fiercely criticized in various sectors of society and was often branded an awkward failure. This article tries to solve the mystery of why such failure discourse was prevalent. Three explanatory factors are offered: first, because some goals were excluded during project development, there was a dynamic of failed expectations; second, structural genomics, while promoting collaboration with the international community, became an 'anti-boundary object', only the absence of which bound heterogeneous domestic actors; third, there developed an urgent sense of international competition in order to obtain patents on such structural information.
3
Skolnick J, Brylinski M. FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform 2009; 10:378-91. [PMID: 19324930 DOI: 10.1093/bib/bbp017] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022] Open
Abstract
A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence and structure-based approaches to protein function inference and ligand screening that can provide functional insights for a significant fraction of the approximately 50% of ORFs of unassigned function in an average proteome. We then describe FINDSITE, a recently developed algorithm for ligand binding site prediction, ligand screening and molecular function prediction, which is based on binding site conservation across evolutionarily distant proteins identified by threading. Importantly, FINDSITE gives comparable results whether high-resolution experimental structures or predicted protein models are used.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology 250 14th St NW, Atlanta, GA 30318, USA.
4
Nair R, Liu J, Soong TT, Acton TB, Everett JK, Kouranov A, Fiser A, Godzik A, Jaroszewski L, Orengo C, Montelione GT, Rost B. Structural genomics is the largest contributor of novel structural leverage. J Struct Funct Genomics 2009; 10:181-91. [PMID: 19194785 PMCID: PMC2705706 DOI: 10.1007/s10969-008-9055-6] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 11/15/2008] [Accepted: 12/08/2008] [Indexed: 11/28/2022]
Abstract
The Protein Structure Initiative (PSI) at the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large families without structural coverage, as well as very large families with inadequate structural coverage. Here, we report a few simple metrics that demonstrate how successfully these efforts optimize structural coverage: while the PSI-2 (2005-now) contributed more than 8% of all structures deposited into the PDB, it contributed over 20% of all novel structures (i.e. structures for protein sequences with no structural representative in the PDB on the date of deposition). The structural coverage of the protein universe represented by today’s UniProt (v12.8) has increased linearly from 1992 to 2008; structural genomics has contributed significantly to the maintenance of this growth rate. Success in increasing novel leverage (defined in Liu et al. in Nat Biotechnol 25:849–851, 2007) has resulted from systematic targeting of large families. PSI’s per-structure contribution to novel leverage was over 4-fold higher than that for non-PSI structural biology efforts during the past 8 years. If the success of the PSI continues, it may take only another ~15 years to cover most sequences in the current UniProt database.
Affiliation(s)
- Rajesh Nair
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
5
Gileadi O, Burgess-Brown NA, Colebrook SM, Berridge G, Savitsky P, Smee CEA, Loppnau P, Johansson C, Salah E, Pantic NH. High throughput production of recombinant human proteins for crystallography. Methods Mol Biol 2008; 426:221-246. [PMID: 18542867 DOI: 10.1007/978-1-60327-058-8_14] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.9] [Indexed: 05/26/2023]
Abstract
This chapter presents in detail the process used in high throughput bacterial production of recombinant human proteins for crystal structure determination. The core principles are: (1) Generating at least 10 truncated constructs from each target gene. (2) Ligation-independent cloning (LIC) into a bacterial expression vector. All proteins are expressed with an N-terminal, TEV protease cleavable fusion peptide. (3) Small-scale test expression to identify constructs producing soluble protein. (4) Liter-scale production in shaker flasks. (5) Purification by Ni-affinity chromatography and gel filtration. (6) Protein characterization and preparation for crystallography. The chapter also briefly presents alternative procedures, to be applied based on specific knowledge of protein families or when the core protocol is unsatisfactory. This scheme has been applied to more than 550 human proteins (>10,000 constructs) and has resulted in the deposition of 112 unique structures. The methods presented do not depend on specialized equipment or robotics; hence, they provide an effective approach for handling individual proteins in a regular research lab.
Affiliation(s)
- Opher Gileadi
- The Structural Genomics Consortium, Botnar Research Centre, University of Oxford, Oxford, UK
6
Stumpff-Kane AW, Maksimiak K, Lee MS, Feig M. Sampling of near-native protein conformations during protein structure refinement using a coarse-grained model, normal modes, and molecular dynamics simulations. Proteins 2007; 70:1345-56. [PMID: 17876825 DOI: 10.1002/prot.21674] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 11/05/2022]
Abstract
Protein structure refinement from comparative models with the goal of predicting structures at near-experimental accuracy remains an unsolved problem. Structure refinement might be achieved with an iterative protocol where the most native-like structure from a set of decoys generated from an initial model in one cycle is used as the starting structure for the next cycle. Conformational sampling based on the coarse-grained SICHO model, atomic level of detail molecular dynamics simulations, and normal-mode analysis is compared in the context of such a protocol. All of the sampling methods can achieve significant refinement close to experimental structures, although the distribution of structures and the ability to reach native-like structures differ greatly. Implications for the practical application of such sampling methods and the requirements for scoring functions in an iterative refinement protocol are analyzed in the context of theoretical predictions for the distribution of protein-like conformations with a random sampling protocol.
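The iterative protocol described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `iterative_refinement`, `toy_sample`, and `toy_score` are hypothetical, a "structure" is reduced to a short list of coordinates, and the toy score is the RMSD to a known native state, which a real refinement protocol would not have access to. The sampler stands in for any of the methods compared in the paper (SICHO, MD, or normal modes) and does not reproduce them.

```python
import random

def iterative_refinement(start, sample, score, cycles=5, decoys_per_cycle=50, seed=0):
    """Each cycle: generate decoys from the current model, keep the best-scoring one."""
    rng = random.Random(seed)
    current = start
    history = [score(current)]
    for _ in range(cycles):
        decoys = [sample(current, rng) for _ in range(decoys_per_cycle)]
        # Keep the current model as a candidate so the score can never get worse.
        decoys.append(current)
        current = min(decoys, key=score)
        history.append(score(current))
    return current, history

# Toy stand-ins (hypothetical, not from the paper).
NATIVE = [0.0, 1.0, 2.0, 3.0]

def toy_sample(model, rng):
    """Random perturbation of every coordinate; a placeholder for a real sampler."""
    return [x + rng.gauss(0.0, 0.2) for x in model]

def toy_score(model):
    """RMSD to the native state; an oracle score used only for this illustration."""
    return (sum((a - b) ** 2 for a, b in zip(model, NATIVE)) / len(NATIVE)) ** 0.5

initial = [1.0, 0.0, 3.0, 2.0]
refined, history = iterative_refinement(initial, toy_sample, toy_score)
```

In this sketch the selection step is perfect because the score is the true distance to native. The abstract's point about scoring-function requirements corresponds to replacing `toy_score` with an imperfect score, in which case the chosen decoy need not be the most native-like one.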
Affiliation(s)
- Andrew W Stumpff-Kane
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824-1319, USA
7
Bernadó P, Mylonas E, Petoukhov MV, Blackledge M, Svergun DI. Structural characterization of flexible proteins using small-angle X-ray scattering. J Am Chem Soc 2007; 129:5656-64. [PMID: 17411046 DOI: 10.1021/ja069124n] [Citation(s) in RCA: 914] [Impact Index Per Article: 53.8] [Indexed: 11/28/2022]
Abstract
Structural analysis of flexible macromolecular systems such as intrinsically disordered or multidomain proteins with flexible linkers is a difficult task as high-resolution techniques are barely applicable. A new approach, ensemble optimization method (EOM), is proposed to quantitatively characterize flexible proteins in solution using small-angle X-ray scattering (SAXS). The flexibility is taken into account by allowing for the coexistence of different conformations of the protein contributing to the experimental scattering pattern. These conformers are selected using a genetic algorithm from a pool containing a large number of randomly generated models covering the protein configurational space. Quantitative criteria are developed to analyze the EOM selected models and to determine the optimum number of conformers in the ensemble. Simultaneous fitting of multiple scattering patterns from deletion mutants, if available, provides yet more detailed local information about the structure. The efficiency of EOM is demonstrated in model and practical examples on completely or partially unfolded proteins and on multidomain proteins interconnected by linkers. In the latter case, EOM is able to distinguish between rigid and flexible proteins and to directly assess the interdomain contacts.
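The selection step described above (a genetic algorithm picking conformers from a large random pool so that their averaged scattering matches experiment) can be sketched as follows. This is a toy under stated assumptions, not the EOM implementation: `chi2`, `select_ensemble`, and `guinier` are hypothetical names, the pool curves come from the Guinier approximation I(q) = exp(-(q·Rg)²/3) rather than from atomic models, the error is a constant `sigma`, and the crossover, mutation, and elitism scheme is a generic choice.

```python
import math
import random

def chi2(ensemble, curves, experimental, sigma=1.0):
    """Mean squared deviation between the ensemble-averaged curve and the data."""
    n_pts = len(experimental)
    avg = [sum(curves[i][k] for i in ensemble) / len(ensemble) for k in range(n_pts)]
    return sum(((a - e) / sigma) ** 2 for a, e in zip(avg, experimental)) / n_pts

def select_ensemble(curves, experimental, size=4, pop=30, generations=40, seed=0):
    """Generic elitist genetic algorithm over index lists into the conformer pool."""
    rng = random.Random(seed)
    n = len(curves)

    def fitness(ens):
        return chi2(ens, curves, experimental)

    population = [[rng.randrange(n) for _ in range(size)] for _ in range(pop)]
    best_per_gen = []
    for _ in range(generations):
        population.sort(key=fitness)
        best_per_gen.append(fitness(population[0]))
        survivors = population[: pop // 2]  # the better half is carried over unchanged
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, size)
            child = a[:cut] + b[cut:]                      # one-point crossover
            child[rng.randrange(size)] = rng.randrange(n)  # point mutation
            children.append(child)
        population = survivors + children
    population.sort(key=fitness)
    best_per_gen.append(fitness(population[0]))
    return population[0], best_per_gen

# Toy data (hypothetical): Guinier-like curves for conformers of varying Rg.
Q = [0.01 * k for k in range(1, 21)]

def guinier(rg):
    return [math.exp(-((q * rg) ** 2) / 3.0) for q in Q]

pool_rng = random.Random(1)
pool = [guinier(pool_rng.uniform(10.0, 40.0)) for _ in range(200)]
experimental = [(a + b) / 2 for a, b in zip(guinier(15.0), guinier(35.0))]
best, trace = select_ensemble(pool, experimental)
```

Because the better half of each generation survives unchanged, the best fit recorded in `trace` cannot get worse from one generation to the next. The criteria the abstract mentions for choosing the optimum number of conformers are not modeled here; `size` is simply fixed.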
Affiliation(s)
- Pau Bernadó
- European Molecular Biology Laboratory, Hamburg Outstation, 22603 Hamburg, Germany.
8
Mueller M, Martens L, Apweiler R. Annotating the human proteome: Beyond establishing a parts list. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2007; 1774:175-91. [PMID: 17223395 DOI: 10.1016/j.bbapap.2006.11.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 07/03/2006] [Revised: 11/16/2006] [Accepted: 11/21/2006] [Indexed: 12/31/2022]
Abstract
The completion of the human genome has shifted the attention from deciphering the sequence to the identification and characterisation of the functional components, including genes. Improved gene prediction algorithms, together with the existing transcript and protein information, have enabled the identification of most exons in a genome. Availability of the 'parts list' has fostered the development of experimental approaches to systematically interrogate gene function on the genome, transcriptome and proteome level. Studying gene function at the protein level is vital to the understanding of how cells perform their functions as variations in protein isoforms and protein quantity which may underlie a change in phenotype can often not be deduced from sequence or transcript level genomics experiments alone. Recent advancements in proteomics have afforded technologies capable of measuring protein expression, post-translational modifications of these proteins, their subcellular localisation and assembly into complexes and pathways. Although an enormous amount of data already exists on the function of many human proteins, much of it is scattered over multiple resources. Public domain databases are therefore required to manage and collate this information and present it to the user community in both a human and machine readable manner. Of special importance here is the integration of heterogeneous data to facilitate the creation of resources that go beyond a mere parts list.
Affiliation(s)
- Michael Mueller
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
9
Stumpff-Kane AW, Feig M. A correlation-based method for the enhancement of scoring functions on funnel-shaped energy landscapes. Proteins 2006; 63:155-64. [PMID: 16397892 DOI: 10.1002/prot.20853] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
A correlation-based approach is introduced for enhancing the ability of structure-scoring methods to identify and distinguish native-like conformations. The proposed method relies on a funnel-shaped scoring function that decreases steadily toward the native state. It takes advantage of the idea that the structure from a given ensemble that is closest to the native basin leads to the highest correlation coefficient between a given score and distance to that structure as an approximation of the native state for the entire ensemble. The method is applied successfully to a number of different test cases that demonstrate substantial improvements in the correlation of the score with the distance from the true native state but also result in the selection of more native-like structures compared to the original score.
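The core idea of this abstract, that the ensemble member closest to the native basin gives the highest correlation between score and distance to that member, can be shown with a small sketch. The names `pearson`, `distance`, and `best_reference` are hypothetical, structures are reduced to one-dimensional coordinates, and the toy scores form a perfect funnel around a native state at 0; the published method's actual distance measure and scoring functions are not reproduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def distance(a, b):
    """Root-mean-square coordinate difference between two toy structures."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

def best_reference(structures, scores):
    """Return the index whose distance profile correlates best with the scores."""
    best_i, best_r = None, -2.0
    for i, ref in enumerate(structures):
        dists = [distance(s, ref) for s in structures]
        r = pearson(scores, dists)
        if r > best_r:
            best_i, best_r = i, r
    return best_i, best_r

# Toy ensemble (hypothetical): native state at 0, score equal to distance from it.
structures = [[0.1], [1.0], [2.0], [3.0], [4.0]]
scores = [0.1, 1.0, 2.0, 3.0, 4.0]
ref_index, ref_corr = best_reference(structures, scores)
```

In this toy ensemble the structure at 0.1, the one nearest the native state, is selected as the reference. With a noisy score the correlations would be lower for every candidate, but the ranking of candidates by correlation is what the approach relies on.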
Affiliation(s)
- Andrew W Stumpff-Kane
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824-1319, USA
10
Luan T, Jaravine V, Yee A, Arrowsmith CH, Orekhov VY. Optimization of resolution and sensitivity of 4D NOESY using multi-dimensional decomposition. JOURNAL OF BIOMOLECULAR NMR 2005; 33:1-14. [PMID: 16222553 DOI: 10.1007/s10858-005-1363-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Received: 03/25/2005] [Accepted: 07/14/2005] [Indexed: 05/04/2023]
Abstract
Highly resolved multi-dimensional NOE data are essential for rapid and accurate determination of spatial protein structures such as in structural genomics projects. Four-dimensional spectra contain almost none of the spectral overlap inherently present in lower-dimensionality spectra and are highly amenable to application of automated routines for spectral resonance location and assignment. However, a high-resolution 4D data set using conventional uniform sampling usually requires unacceptably long measurement time. Recently we have reported that the use of non-uniform sampling and multi-dimensional decomposition (MDD) can remedy this problem. Here we validate the accuracy and robustness of the method, and demonstrate its usefulness for fully protonated protein samples. The method was applied to the 11 kDa protein PA1123 from a structural genomics pipeline. A systematic evaluation of spectral reconstructions obtained using 15-100% subsets of the complete reference 4D 1H-13C-13C-1H NOESY spectrum has been performed. With an experimental time saving of up to six times, the resolution and the sensitivity per unit time are shown to be similar to those of the fully recorded spectrum. For the 30% data subset we demonstrate that the intensities in the reconstructed and reference 4D spectra correspond with a correlation coefficient of 0.997 over the full range of spectral amplitudes. Intensities of the strong, middle and weak cross-peaks correlate with coefficients of 0.9997, 0.9965, and 0.83, respectively. The method does not produce false peaks. The loss of 2% of weak peaks in the 30% reconstruction is in line with the theoretically expected noise increase for the shorter measurement time. Together with good accuracy in the relative line-widths, these results translate to reliable distance constraints derived from sparsely sampled, high-resolution 4D NOESY data.
Affiliation(s)
- T Luan
- The Swedish NMR centre at Göteborg University, Medicinaregatan 5C, P.O. Box 465, 40530, Göteborg, Sweden
11
Depristo MA, de Bakker PIW, Johnson RJK, Blundell TL. Crystallographic Refinement by Knowledge-Based Exploration of Complex Energy Landscapes. Structure 2005; 13:1311-9. [PMID: 16154088 DOI: 10.1016/j.str.2005.06.008] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Received: 03/17/2005] [Revised: 06/03/2005] [Accepted: 06/08/2005] [Indexed: 11/24/2022]
Abstract
Although X-ray crystallography remains the most versatile method to determine the three-dimensional atomic structure of proteins and much progress has been made in model building and refinement techniques, it remains a challenge to elucidate accurately the structure of proteins in medium-resolution crystals. This is largely due to the difficulty of exploring an immense conformational space to identify the set of conformers that collectively best fits the experimental diffraction pattern. We show here that combining knowledge-based conformational sampling in RAPPER with molecular dynamics/simulated annealing (MD/SA) vastly improves the quality and power of refinement compared to MD/SA alone. The utility of this approach is highlighted by the automated determination of a lysozyme mutant from a molecular replacement solution that is in congruence with a model prepared independently by crystallographers. Finally, we discuss the implications of this work on structure determination in particular and conformational sampling and energy minimization in general.
Affiliation(s)
- Mark A Depristo
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, United Kingdom.
12
Petoukhov MV, Svergun DI. Global rigid body modeling of macromolecular complexes against small-angle scattering data. Biophys J 2005; 89:1237-50. [PMID: 15923225 PMCID: PMC1366608 DOI: 10.1529/biophysj.105.064154] [Citation(s) in RCA: 737] [Impact Index Per Article: 38.8] [Indexed: 11/18/2022] Open
Abstract
New methods to automatically build models of macromolecular complexes from high-resolution structures or homology models of their subunits or domains against x-ray or neutron small-angle scattering data are presented. Depending on the complexity of the object, different approaches are employed for the global search of the optimum configuration of subunits fitting the experimental data. An exhaustive grid search is used for hetero- and homodimeric particles and for symmetric oligomers formed by identical subunits. For assemblies or multidomain proteins containing more than one subunit/domain per asymmetric unit, heuristic algorithms based on simulated annealing are used. Fast computational algorithms based on spherical harmonics representation of scattering amplitudes are employed. The methods allow one to construct interconnected models without steric clashes, to account for the particle symmetry and to incorporate information from other methods on distances between specific residues or nucleotides. For multidomain proteins, addition of missing linkers between the domains is possible. Simultaneous fitting of multiple scattering patterns from subcomplexes or deletion mutants is incorporated. The efficiency of the methods is illustrated by their application to complexes of different types in several simulated and practical examples. Limitations and possible ambiguity of rigid body modeling are discussed and simplified docking criteria are provided to rank multiple models. The methods described are implemented in publicly available computer programs running on major hardware platforms.
13
Cabantous S, Terwilliger TC, Waldo GS. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat Biotechnol 2004; 23:102-7. [PMID: 15580262 DOI: 10.1038/nbt1044] [Citation(s) in RCA: 640] [Impact Index Per Article: 32.0] [Received: 08/06/2004] [Accepted: 10/22/2004] [Indexed: 11/09/2022]
Abstract
Existing protein tagging and detection methods are powerful but have drawbacks. Split protein tags can perturb protein solubility or may not work in living cells. Green fluorescent protein (GFP) fusions can misfold or exhibit altered processing. Fluorogenic biarsenical FlAsH or ReAsH substrates overcome many of these limitations but require a polycysteine tag motif, a reducing environment and cell transfection or permeabilization. An ideal protein tag would be genetically encoded, would work both in vivo and in vitro, would provide a sensitive analytical signal and would not require external chemical reagents or substrates. One way to accomplish this might be with a split GFP, but the GFP fragments reported thus far are large and fold poorly, require chemical ligation or fused interacting partners to force their association, or require coexpression or co-refolding to produce detectable folded and fluorescent GFP. We have engineered soluble, self-associating fragments of GFP that can be used to tag and detect either soluble or insoluble proteins in living cells or cell lysates. The split GFP system is simple and does not change fusion protein solubility.
Affiliation(s)
- Stéphanie Cabantous
- Bioscience Division, MS-M888, Los Alamos National Laboratory, PO Box 1663, Los Alamos, New Mexico 87545, USA
14
Schmid MB. Seeing is believing: the impact of structural genomics on antimicrobial drug discovery. Nat Rev Microbiol 2004; 2:739-46. [PMID: 15372084 DOI: 10.1038/nrmicro978] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
Over the past decade, the availability of complete microbial genome sequences has led to changes in the strategies that are used to search for novel anti-infectives. However, despite the identification of many new potential drug targets, novel antimicrobial agents have been slow to emerge from these efforts. In part, this reflects the long discovery and development times that are needed to bring new drugs to market and the bottlenecks at the stages of identifying good lead compounds and optimizing these leads into drug candidates. Structural genomics will hopefully provide opportunities to overcome these bottlenecks and populate the antimicrobial pipeline.
Affiliation(s)
- Molly B Schmid
- MBS Associates, 38 Avenue Road, Suite 601, Toronto, Ontario M5R 2G2, Canada.