1
Oeffner RD, Croll TI, Millán C, Poon BK, Schlicksup CJ, Read RJ, Terwilliger TC. Putting AlphaFold models to work with phenix.process_predicted_model and ISOLDE. Acta Crystallogr D Struct Biol 2022; 78:1303-1314. [PMID: 36322415 PMCID: PMC9629492 DOI: 10.1107/s2059798322010026] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 07/22/2022] [Accepted: 10/13/2022] [Indexed: 11/23/2022]
Abstract
AlphaFold has recently become an important tool in providing models for experimental structure determination by X-ray crystallography and cryo-EM. Large parts of the predicted models typically approach the accuracy of experimentally determined structures, although there are frequently local errors and errors in the relative orientations of domains. Importantly, residues in the model of a protein predicted by AlphaFold are tagged with a predicted local distance difference test score, informing users about which regions of the structure are predicted with less confidence. AlphaFold also produces a predicted aligned error matrix indicating its confidence in the relative positions of each pair of residues in the predicted model. The phenix.process_predicted_model tool downweights or removes low-confidence residues and can break a model into confidently predicted domains in preparation for molecular replacement or cryo-EM docking. These confidence metrics are further used in ISOLDE to weight torsion and atom-atom distance restraints, allowing the complete AlphaFold model to be interactively rearranged to match the docked fragments and reducing the need for the rebuilding of connecting regions.
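The downweighting step described in this abstract can be illustrated with a minimal sketch that converts per-residue pLDDT into an estimated coordinate error, then into a B factor, and drops residues below a confidence cutoff. The empirical error formula, the 0.7 cutoff and all function names are assumptions made for illustration; none of them is given in the abstract, and this is not the phenix.process_predicted_model implementation.

```python
import math

def plddt_to_rmsd(plddt: float) -> float:
    """Estimated coordinate error (Å) from pLDDT on a 0-1 scale.

    Assumed empirical relation: 1.5 * exp(4 * (0.7 - pLDDT)).
    """
    return 1.5 * math.exp(4.0 * (0.7 - plddt))

def rmsd_to_b_factor(rmsd: float) -> float:
    """Isotropic B factor for a 3D r.m.s. displacement: B = (8*pi^2/3) * rmsd^2."""
    return (8.0 * math.pi ** 2 / 3.0) * rmsd ** 2

def process_residues(plddts, minimum_plddt=0.7):
    """Return (residue_index, B factor) for residues at or above the cutoff.

    `plddts` are on the 0-100 scale found in AlphaFold model files;
    `minimum_plddt` is a hypothetical cutoff on the 0-1 scale.
    """
    kept = []
    for index, value in enumerate(plddts):
        fraction = value / 100.0
        if fraction < minimum_plddt:
            continue  # low-confidence residue is removed
        kept.append((index, rmsd_to_b_factor(plddt_to_rmsd(fraction))))
    return kept
```

In this sketch a confident residue receives a small B factor and therefore contributes more strongly to high-resolution terms, while a residue under the cutoff is trimmed entirely.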
Affiliation(s)
- Robert D. Oeffner
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge Biomedical Campus, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
- Tristan I. Croll
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge Biomedical Campus, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
- Claudia Millán
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge Biomedical Campus, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom
- Billy K. Poon
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory (LBNL), Building 33R0349, Berkeley, CA 94720-8235, USA
- Christopher J. Schlicksup
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory (LBNL), Building 33R0349, Berkeley, CA 94720-8235, USA
- Randy J. Read
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge Biomedical Campus, The Keith Peters Building, Hills Road, Cambridge CB2 0XY, United Kingdom (corresponding author)
- Tom C. Terwilliger
  - New Mexico Consortium, Los Alamos National Laboratory, 100 Entrada Drive, Los Alamos, NM 87544, USA (corresponding author)

2
Naumann TA, Sollenberger KG, Hao G. Production of selenomethionine labeled polyglycine hydrolases in Pichia pastoris. Protein Expr Purif 2022; 194:106076. [PMID: 35240278 DOI: 10.1016/j.pep.2022.106076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Revised: 02/24/2022] [Accepted: 02/25/2022] [Indexed: 01/05/2023]
Abstract
Producing recombinant proteins with incorporated selenomethionine (SeMet) facilitates solving X-ray crystallographic structures of novel proteins. Production of SeMet labeled proteins in the yeast Pichia pastoris (syn. Komagataella phaffii) is difficult because SeMet is mildly toxic, reducing protein expression levels. To counteract this yield loss for a novel protease, Epicoccum sorghi chitinase modifying protein (Es-cmp), a novel disease promoting protease secreted by these plant pathogenic fungi, we isolated a yeast strain that secreted more protein. By comparing the expression level of 48 strains we isolated one that produced significantly more protein. This strain was found to be gene dosed, having four copies of the expression cassette. After optimization the strain produced Es-cmp in defined media with SeMet at levels nearly equal to that of the original strain in complex media. Also, we produced SeMet labeled protein for a homologous protease from the fungus Fusarium vanettenii, Fvan-cmp, by directly selecting a gene dosed strain on agar plates with increased zeocin. Linearization of plasmid with PmeI before electroporation led to high numbers of 1 mg/mL zeocin resistant clones with significantly increased expression compared to those selected on 0.1 mg/mL. The gene dosed strains expressing Es-cmp and Fvan-cmp allowed production of 8.5 and 16.8 mg of SeMet labeled protein from 500 mL shake flask cultures. The results demonstrate that selection of P. pastoris expression strains by plating after transformation on agar with 1 mg/mL zeocin rather than the standard 0.1 mg/mL directly selects gene dosed strains that can facilitate production of selenomethionine labeled proteins.
Affiliation(s)
- Todd A Naumann
  - Mycotoxin Prevention and Applied Microbiology Research Unit, National Center for Agriculture Utilization Research, Peoria, IL, 61604, USA
- Kurt G Sollenberger
  - Mycotoxin Prevention and Applied Microbiology Research Unit, National Center for Agriculture Utilization Research, Peoria, IL, 61604, USA
- Guixia Hao
  - Mycotoxin Prevention and Applied Microbiology Research Unit, National Center for Agriculture Utilization Research, Peoria, IL, 61604, USA

3
McCoy AJ, Sammito MD, Read RJ. Implications of AlphaFold2 for crystallographic phasing by molecular replacement. Acta Crystallogr D Struct Biol 2022; 78:1-13. [PMID: 34981757 PMCID: PMC8725160 DOI: 10.1107/s2059798321012122] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 05/18/2021] [Accepted: 11/13/2021] [Indexed: 12/11/2022]
Abstract
The AlphaFold2 results in the 14th edition of Critical Assessment of Structure Prediction (CASP14) showed that accurate (low root-mean-square deviation) in silico models of protein structure domains are on the horizon, whether or not the protein is related to known structures through high-coverage sequence similarity. As highly accurate models become available, generated by harnessing the power of correlated mutations and deep learning, one of the aspects of structural biology to be impacted will be methods of phasing in crystallography. Here, the data from CASP14 are used to explore the prospects for changes in phasing methods, and in particular to explore the prospects for molecular-replacement phasing using in silico models.
Affiliation(s)
- Airlie J. McCoy
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Massimo D. Sammito
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Randy J. Read
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom

4
Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, Millán C, Park H, Adams C, Glassman CR, DeGiovanni A, Pereira JH, Rodrigues AV, van Dijk AA, Ebrecht AC, Opperman DJ, Sagmeister T, Buhlheller C, Pavkov-Keller T, Rathinaswamy MK, Dalwadi U, Yip CK, Burke JE, Garcia KC, Grishin NV, Adams PD, Read RJ, Baker D. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021; 373:871-876. [PMID: 34282049 PMCID: PMC7612213 DOI: 10.1126/science.abj8754] [Citation(s) in RCA: 2109] [Impact Index Per Article: 703.0] [Received: 06/07/2021] [Accepted: 07/07/2021] [Indexed: 01/17/2023]
Abstract
DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
Affiliation(s)
- Minkyung Baek
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Frank DiMaio
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Ivan Anishchenko
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Justas Dauparas
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Sergey Ovchinnikov
  - Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA 02138, USA
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
- Gyu Rie Lee
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Jue Wang
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Qian Cong
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Lisa N Kinch
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- R Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Claudia Millán
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Hahnbeom Park
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Carson Adams
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Caleb R Glassman
  - Program in Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Andy DeGiovanni
  - Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jose H Pereira
  - Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Andria V Rodrigues
  - Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Alberdina A van Dijk
  - Department of Biochemistry, Focus Area Human Metabolomics, North-West University, 2531 Potchefstroom, South Africa
- Ana C Ebrecht
  - Department of Biochemistry, Focus Area Human Metabolomics, North-West University, 2531 Potchefstroom, South Africa
- Diederik J Opperman
  - Department of Biotechnology, University of the Free State, 205 Nelson Mandela Drive, Bloemfontein 9300, South Africa
- Theo Sagmeister
  - Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
- Christoph Buhlheller
  - Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
  - Medical University of Graz, Graz, Austria
- Tea Pavkov-Keller
  - Institute of Molecular Biosciences, University of Graz, Humboldtstrasse 50, 8010 Graz, Austria
  - BioTechMed-Graz, Graz, Austria
- Manoj K Rathinaswamy
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- Udit Dalwadi
  - Life Sciences Institute, Department of Biochemistry and Molecular Biology, The University of British Columbia, Vancouver, BC, Canada
- Calvin K Yip
  - Life Sciences Institute, Department of Biochemistry and Molecular Biology, The University of British Columbia, Vancouver, BC, Canada
- John E Burke
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- K Christopher Garcia
  - Program in Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Nick V Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Paul D Adams
  - Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA 94720, USA
- Randy J Read
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA

5
Millán C, Keegan RM, Pereira J, Sammito MD, Simpkin AJ, McCoy AJ, Lupas AN, Hartmann MD, Rigden DJ, Read RJ. Assessing the utility of CASP14 models for molecular replacement. Proteins 2021; 89:1752-1769. [PMID: 34387010 PMCID: PMC8881082 DOI: 10.1002/prot.26214] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 06/18/2021] [Revised: 07/20/2021] [Accepted: 07/27/2021] [Indexed: 11/21/2022]
Abstract
The assessment of CASP models for utility in molecular replacement is a measure of their use in a valuable real‐world application. In CASP7, the metric for molecular replacement assessment involved full likelihood‐based molecular replacement searches; however, this restricted the assessable targets to crystal structures with only one copy of the target in the asymmetric unit, and to those where the search found the correct pose. In CASP10, full molecular replacement searches were replaced by likelihood‐based rigid‐body refinement of models superimposed on the target using the LGA algorithm, with the metric being the refined log‐likelihood‐gain (LLG) score. This enabled multi‐copy targets and very poor models to be evaluated, but a significant further issue remained: the requirement of diffraction data for assessment. We introduce here the relative‐expected‐LLG (reLLG), which is independent of diffraction data. This reLLG is also independent of any crystal form, and can be calculated regardless of the source of the target, be it X‐ray, NMR or cryo‐EM. We calibrate the reLLG against the LLG for targets in CASP14, showing that it is a robust measure of both model and group ranking. Like the LLG, the reLLG shows that accurate coordinate error estimates add substantial value to predicted models. We find that refinement by CASP groups can often convert an inadequate initial model into a successful MR search model. Consistent with findings from others, we show that the AlphaFold2 models are sufficiently good, and reliably so, to surpass other current model generation strategies for attempting molecular replacement phasing.
Affiliation(s)
- Claudia Millán
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Ronan M Keegan
  - Scientific Computing Dept., Science and Technologies Facilities Council, UK Research and Innovation, Didcot, Oxfordshire, United Kingdom
- Joana Pereira
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, Tübingen, Germany
- Massimo D Sammito
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Adam J Simpkin
  - Institute of Systems, Molecular and Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7BE, United Kingdom
- Airlie J McCoy
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom
- Andrei N Lupas
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, Tübingen, Germany
- Marcus D Hartmann
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, Tübingen, Germany
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7BE, United Kingdom
- Randy J Read
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, United Kingdom

6
McCoy AJ, Stockwell DH, Sammito MD, Oeffner RD, Hatti KS, Croll TI, Read RJ. Phasertng: directed acyclic graphs for crystallographic phasing. Acta Crystallogr D Struct Biol 2021; 77:1-10. [PMID: 33404520 PMCID: PMC7787104 DOI: 10.1107/s2059798320014746] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/16/2020] [Accepted: 11/06/2020] [Indexed: 12/01/2022]
Abstract
Crystallographic phasing strategies increasingly require the exploration and ranking of many hypotheses about the number, types and positions of atoms, molecules and/or molecular fragments in the unit cell, each with only a small chance of being correct. Accelerating this move has been improvements in phasing methods, which are now able to extract phase information from the placement of very small fragments of structure, from weak experimental phasing signal or from combinations of molecular replacement and experimental phasing information. Describing phasing in terms of a directed acyclic graph allows graph-management software to track and manage the path to structure solution. The crystallographic software supporting the graph data structure must be strictly modular so that nodes in the graph are efficiently generated by the encapsulated functionality. To this end, the development of new software, Phasertng, which uses directed acyclic graphs natively for input/output, has been initiated. In Phasertng, the codebase of Phaser has been rebuilt, with an emphasis on modularity, on scripting, on speed and on continuing algorithm development. As a first application of phasertng, its advantages are demonstrated in the context of phasertng.xtricorder, a tool to analyse and triage merged data in preparation for molecular replacement or experimental phasing. The description of the phasing strategy with directed acyclic graphs is a generalization that extends beyond the functionality of Phasertng, as it can incorporate results from bioinformatics and other crystallographic tools, and will facilitate multifaceted search strategies, dynamic ranking of alternative search pathways and the exploitation of machine learning to further improve phasing strategies.
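The idea of tracking phasing hypotheses in a directed acyclic graph can be sketched as follows. This is a toy data structure under stated assumptions: the class name, field names, node labels and scores are all hypothetical and are not taken from Phasertng.

```python
from dataclasses import dataclass, field

@dataclass
class PhasingNode:
    """One hypothesis in the graph (hypothetical structure for illustration)."""
    label: str
    score: float = 0.0
    parents: list = field(default_factory=list)  # a node may combine several parents

def ancestors(node):
    """Labels of every node that `node` was derived from, each listed once."""
    seen = []
    stack = list(node.parents)
    while stack:
        current = stack.pop()
        if current.label not in seen:
            seen.append(current.label)
            stack.extend(current.parents)
    return seen

def best_node(nodes):
    """Rank alternative hypotheses by score and return the highest-scoring one."""
    return max(nodes, key=lambda n: n.score)

# A molecular-replacement branch and an experimental-phasing branch share
# the same data node and are later combined, so this is a graph, not a tree.
data = PhasingNode("data")
rotation = PhasingNode("rotation", 20.0, [data])
translation = PhasingNode("translation", 65.0, [rotation])
sad = PhasingNode("sad", 30.0, [data])
combined = PhasingNode("mr-sad", 120.0, [translation, sad])
```

Here `ancestors(combined)` recovers the full provenance of the combined solution, which is the kind of path tracking that the abstract attributes to graph-management software.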
Affiliation(s)
- Airlie J. McCoy
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Duncan H. Stockwell
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Massimo D. Sammito
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Robert D. Oeffner
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Kaushik S. Hatti
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
  - Drug Discovery Unit, Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, United Kingdom
- Tristan I. Croll
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom
- Randy J. Read
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, United Kingdom

7
Wallner B. Estimating local protein model quality: prospects for molecular replacement. Acta Crystallogr D Struct Biol 2020; 76:285-290. [PMID: 32133992 PMCID: PMC7057213 DOI: 10.1107/s2059798320000972] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/29/2019] [Accepted: 01/24/2020] [Indexed: 11/10/2022]
Abstract
Model quality assessment programs estimate the quality of protein models and can be used to estimate local error in protein models. ProQ3D is the most recent and most accurate version of our software. Here, it is demonstrated that it is possible to use local error estimates to substantially increase the quality of the models for molecular replacement (MR). Adjusting the B factors using ProQ3D improved the log-likelihood gain (LLG) score by over 50% on average, resulting in significantly more successful models in MR compared with not using error estimates. On a data set of 431 homology models to address difficult MR targets, models with error estimates from ProQ3D received an LLG of >50 for almost half of the models 209/431 (48.5%), compared with 175/431 (40.6%) for the previous version, ProQ2, and only 74/431 (17.2%) for models with no error estimates, clearly demonstrating the added value of using error estimates to enable MR for more targets. ProQ3D is available from http://proq3.bioinfo.se/ both as a server and as a standalone download.
Affiliation(s)
- Björn Wallner
  - Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-581 83 Linköping, Sweden

8
Croll TI, Sammito MD, Kryshtafovych A, Read RJ. Evaluation of template-based modeling in CASP13. Proteins 2019; 87:1113-1127. [PMID: 31407380 PMCID: PMC6851432 DOI: 10.1002/prot.25800] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 05/10/2019] [Revised: 07/29/2019] [Accepted: 08/08/2019] [Indexed: 12/12/2022]
Abstract
Performance in the template‐based modeling (TBM) category of CASP13 is assessed here, using a variety of metrics. Performance of the predictor groups that participated is ranked using the primary ranking score that was developed by the assessors for CASP12. This reveals that the best results are obtained by groups that include contact predictions or inter‐residue distance predictions derived from deep multiple sequence alignments. In cases where there is a good homolog in the wwPDB (TBM‐easy category), the best results are obtained by modifying a template. However, for cases with poorer homologs (TBM‐hard), very good results can be obtained without using an explicit template, by deep learning algorithms trained on the wwPDB. Alternative metrics are introduced, to allow testing of aspects of structural models that are not addressed by traditional CASP metrics. These include comparisons to the main‐chain and side‐chain torsion angles of the target, and the utility of models for solving crystal structures by the molecular replacement method. The alternative metrics are poorly correlated with the traditional metrics, and it is proposed that modeling has reached a sufficient level of maturity that the best models should be expected to satisfy this wider range of criteria.
Affiliation(s)
- Tristan I Croll
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
- Massimo D Sammito
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
- Randy J Read
  - Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK

9
Virtanen JJ, Zhang Y. MR-REX: molecular replacement by cooperative conformational search and occupancy optimization on low-accuracy protein models. Acta Crystallogr D Struct Biol 2018; 74:606-620. [PMID: 29968671 PMCID: PMC6038387 DOI: 10.1107/s2059798318005612] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/17/2017] [Accepted: 04/10/2018] [Indexed: 11/10/2022]
Abstract
Molecular replacement (MR) has commonly been employed to derive the phase information in protein crystal X-ray diffraction, but its success rate decreases rapidly when the search model is dissimilar to the target. MR-REX has been developed to perform an MR search by replica-exchange Monte Carlo simulations, which enables cooperative rotation and translation searches and simultaneous clash and occupancy optimization. MR-REX was tested on a set of 1303 protein structures of different accuracies and successfully placed 699 structures at positions that have an r.m.s.d. of below 2 Å to the target position, which is 10% higher than was obtained by Phaser. However, case studies show that many of the models for which Phaser failed and MR-REX succeeded can be solved by Phaser by pruning them and using nondefault parameters. The factors affecting success and the parts of the methodology which lead to success are studied. The results demonstrate a new avenue for molecular replacement which outperforms (and has results that are complementary to) the state-of-the-art MR methods, in particular for distantly homologous proteins.
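For readers unfamiliar with replica-exchange Monte Carlo, the exchange step can be sketched generically as below. This shows only the textbook Metropolis swap criterion between replicas at different temperatures; the function names, the temperature ladder and the energies are hypothetical, and nothing here reproduces the MR-REX scoring function or code.

```python
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis probability of exchanging the replicas held at temp_i and temp_j.

    Uses p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) with beta = 1/T
    (Boltzmann constant absorbed into the temperature units).
    """
    beta_i, beta_j = 1.0 / temp_i, 1.0 / temp_j
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swaps(energies, temps, rng=random):
    """One sweep of swap attempts between neighbouring temperatures.

    `energies[r]` is the current energy of replica r. The returned list maps
    each temperature slot to the replica that occupies it after the sweep.
    """
    order = list(range(len(temps)))
    for k in range(len(temps) - 1):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], energies[b], temps[k], temps[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order
```

A swap that moves a lower-energy configuration to the colder temperature is always accepted, which is what lets good poses found at high temperature migrate down the ladder.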
Affiliation(s)
- Jouko J. Virtanen
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

10
Oeffner RD, Afonine PV, Millán C, Sammito M, Usón I, Read RJ, McCoy AJ. On the application of the expected log-likelihood gain to decision making in molecular replacement. Acta Crystallogr D Struct Biol 2018; 74:245-255. [PMID: 29652252 PMCID: PMC5892874 DOI: 10.1107/s2059798318004357] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 07/29/2017] [Accepted: 03/14/2018] [Indexed: 11/18/2022]
Abstract
Molecular-replacement phasing of macromolecular crystal structures is often fast, but if a molecular-replacement solution is not immediately obtained the crystallographer must judge whether to pursue molecular replacement or to attempt experimental phasing as the quickest path to structure solution. The introduction of the expected log-likelihood gain [eLLG; McCoy et al. (2017), Proc. Natl Acad. Sci. USA, 114, 3637-3641] has given the crystallographer a powerful new tool to aid in making this decision. The eLLG is the log-likelihood gain on intensity [LLGI; Read & McCoy (2016), Acta Cryst. D72, 375-387] expected from a correctly placed model. It is calculated as a sum over the reflections of a function dependent on the fraction of the scattering for which the model accounts, the estimated model coordinate error and the measurement errors in the data. It is shown how the eLLG may be used to answer the question `can I solve my structure by molecular replacement?'. However, this is only the most obvious of the applications of the eLLG. It is also discussed how the eLLG may be used to determine the search order and minimal data requirements for obtaining a molecular-replacement solution using a given model, and for decision making in fragment-based molecular replacement, single-atom molecular replacement and likelihood-guided model pruning.
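The sum over reflections described in this abstract can be illustrated with a simplified sketch. It assumes a commonly quoted approximation in which each reflection contributes sigma_A^4 / 2, with sigma_A depending on the fraction of scattering and the coordinate error; measurement errors and the distinction between centric and acentric reflections are ignored here, so this is an assumption-laden illustration and not the calculation performed by Phaser.

```python
import math

def sigma_a(fraction_scattering, rms_error, d_spacing):
    """Assumed form: sqrt(f_P) * exp(-(2*pi^2/3) * rms^2 / d^2).

    fraction_scattering: fraction of the total scattering the model accounts for
    rms_error: estimated model coordinate error in Å
    d_spacing: resolution of the reflection in Å
    """
    exponent = -(2.0 * math.pi ** 2 / 3.0) * rms_error ** 2 / d_spacing ** 2
    return math.sqrt(fraction_scattering) * math.exp(exponent)

def expected_llg(d_spacings, fraction_scattering, rms_error):
    """Approximate eLLG as the sum over reflections of sigma_A^4 / 2."""
    return sum(
        sigma_a(fraction_scattering, rms_error, d) ** 4 / 2.0
        for d in d_spacings
    )
```

Under these assumptions the expected signal falls as the model covers less of the scattering or carries a larger coordinate error, and high-resolution reflections lose their contribution first.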
Affiliation(s)
- Robert D. Oeffner
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, England
- Pavel V. Afonine
  - Lawrence Berkeley National Laboratory, One Cyclotron Road, BLDG 64R0121, Berkeley, CA 94720, USA
  - Department of Physics and International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai 200444, People’s Republic of China
- Claudia Millán
  - Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB–CSIC), Barcelona Science Park, Helix Building, Baldiri Reixac 15, 08028 Barcelona, Spain
- Massimo Sammito
  - Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB–CSIC), Barcelona Science Park, Helix Building, Baldiri Reixac 15, 08028 Barcelona, Spain
- Isabel Usón
  - Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB–CSIC), Barcelona Science Park, Helix Building, Baldiri Reixac 15, 08028 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, 08003 Barcelona, Spain
- Randy J. Read
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, England
- Airlie J. McCoy
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, England

11
|
Park H, Ovchinnikov S, Kim DE, DiMaio F, Baker D. Protein homology model refinement by large-scale energy optimization. Proc Natl Acad Sci U S A 2018; 115:3054-3059. [PMID: 29507254 PMCID: PMC5866580 DOI: 10.1073/pnas.1719115115] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 12/20/2022] Open
Abstract
Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.
Affiliation(s)
- Hahnbeom Park
  - Department of Biochemistry, University of Washington, Seattle, WA 98105
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
- Sergey Ovchinnikov
  - Department of Biochemistry, University of Washington, Seattle, WA 98105
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98105
- David E Kim
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105
- Frank DiMaio
  - Department of Biochemistry, University of Washington, Seattle, WA 98105
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98105
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105
|
12
|
Abstract
Molecular replacement is a method for solving the crystallographic phase problem using an atomic model for the target structure. State-of-the-art methods have moved the field significantly from when it was first envisaged as a method for solving cases of high homology and completeness between a model and target structure. Improvements brought about by application of maximum likelihood statistics mean that various errors in the model and pathologies in the data can be accounted for, so that cases hitherto thought to be intractable are standardly solvable. As a result, molecular replacement phasing now accounts for the lion's share of structures deposited in the Protein Data Bank. However, there will always be cases at the fringes of solvability. I discuss here the approaches that will help tackle challenging molecular replacement cases.
Affiliation(s)
- Airlie J McCoy
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 0XY, UK.
|
13
|
Wang Y, Virtanen J, Xue Z, Tesmer JJG, Zhang Y. Using iterative fragment assembly and progressive sequence truncation to facilitate phasing and crystal structure determination of distantly related proteins. Acta Crystallogr D Struct Biol 2016; 72:616-28. [PMID: 27139625 PMCID: PMC4931812 DOI: 10.1107/s2059798316003016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/15/2015] [Accepted: 02/19/2016] [Indexed: 04/15/2023] Open
Abstract
Molecular replacement (MR) often requires templates with high homology to solve the phase problem in X-ray crystallography. I-TASSER-MR has been developed to test whether the success rate for structure determination of distant-homology proteins could be improved by a combination of iterative fragmental structure-assembly simulations with progressive sequence truncation designed to trim regions with high variation. The pipeline was tested on two independent protein sets consisting of 61 proteins from CASP8 and 100 high-resolution proteins from the PDB. After excluding homologous templates, I-TASSER generated full-length models with an average TM-score of 0.773, which is 12% higher than the best threading templates. Using these as search models, I-TASSER-MR found correct MR solutions for 95 of 161 targets as judged by having a TFZ of >8 or with the final structure closer to the native than the initial search models. The success rate was 16% higher than when using the best threading templates. I-TASSER-MR was also applied to 14 protein targets from structure genomics centers. Seven of these were successfully solved by I-TASSER-MR. These results confirm that advanced structure assembly and progressive structural editing can significantly improve the success rate of MR for targets with distant homology to proteins of known structure.
Affiliation(s)
- Yan Wang
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People’s Republic of China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jouko Virtanen
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Zhidong Xue
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People’s Republic of China
- John J. G. Tesmer
  - Departments of Pharmacology and Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
14
|
Pike ACW, Garman EF, Krojer T, von Delft F, Carpenter EP. An overview of heavy-atom derivatization of protein crystals. Acta Crystallogr D Struct Biol 2016; 72:303-18. [PMID: 26960118 PMCID: PMC4784662 DOI: 10.1107/s2059798316000401] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 06/02/2015] [Accepted: 01/08/2016] [Indexed: 11/11/2022] Open
Abstract
Heavy-atom derivatization is one of the oldest techniques for obtaining phase information for protein crystals and, although it is no longer the first choice, it remains a useful technique for obtaining phases for unknown structures and for low-resolution data sets. It is also valuable for confirming the chain trace in low-resolution electron-density maps. This overview provides a summary of the technique and is aimed at first-time users of the method. It includes guidelines on when to use it, which heavy atoms are most likely to work, how to prepare heavy-atom solutions, how to derivatize crystals and how to determine whether a crystal is in fact a derivative.
Affiliation(s)
- Ashley C. W. Pike
  - Structural Genomics Consortium, University of Oxford, Roosevelt Drive, Oxford OX11 9HP, England
- Elspeth F. Garman
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, England
- Tobias Krojer
  - Structural Genomics Consortium, University of Oxford, Roosevelt Drive, Oxford OX11 9HP, England
- Frank von Delft
  - Structural Genomics Consortium, University of Oxford, Roosevelt Drive, Oxford OX11 9HP, England
  - Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0QX, England
  - Department of Biochemistry, University of Johannesburg, Auckland Park 2006, South Africa
- Elisabeth P. Carpenter
  - Structural Genomics Consortium, University of Oxford, Roosevelt Drive, Oxford OX11 9HP, England
|
15
|
Uziela K, Wallner B. ProQ2: estimation of model accuracy implemented in Rosetta. Bioinformatics 2016; 32:1411-3. [PMID: 26733453 PMCID: PMC4848402 DOI: 10.1093/bioinformatics/btv767] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 09/24/2015] [Accepted: 12/23/2015] [Indexed: 11/24/2022] Open
Abstract
Motivation: Model quality assessment programs are used to predict the quality of modeled protein structures. They can be divided into two groups depending on the information they are using: ensemble methods using consensus of many alternative models and methods only using a single model to do its prediction. The consensus methods excel in achieving high correlations between prediction and true quality measures. However, they frequently fail to pick out the best possible model, nor can they be used to generate and score new structures. Single-model methods on the other hand do not have these inherent shortcomings and can be used both to sample new structures and to improve existing consensus methods. Results: Here, we present an implementation of the ProQ2 program to estimate both local and global model accuracy as part of the Rosetta modeling suite. The current implementation does not only make it possible to run large batch runs locally, but it also opens up a whole new arena for conformational sampling using machine learned scoring functions and to incorporate model accuracy estimation into various existing modeling schemes. ProQ2 participated in CASP11 and results from CASP11 are used to benchmark the current implementation. Based on results from CASP11 and CAMEO-QE, a continuous benchmark of quality estimation methods, it is clear that ProQ2 is the single-model method that performs best in both local and global model accuracy. Availability and implementation: https://github.com/bjornwallner/ProQ_scripts Contact: bjornw@ifm.liu.se Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Karolis Uziela
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Björn Wallner
  - Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-581 83, Linköping, Sweden and Swedish e-Science Research Center, Linköping, Sweden
|
16
|
Yang J, Wang Y, Zhang Y. ResQ: An Approach to Unified Estimation of B-Factor and Residue-Specific Error in Protein Structure Prediction. J Mol Biol 2015; 428:693-701. [PMID: 26437129 DOI: 10.1016/j.jmb.2015.09.024] [Citation(s) in RCA: 88] [Impact Index Per Article: 9.8] [Received: 06/23/2015] [Revised: 08/23/2015] [Accepted: 09/28/2015] [Indexed: 11/15/2022]
Abstract
Computer-based structure prediction becomes a major tool to provide large-scale structure models for annotating biological function of proteins. Information of residue-level accuracy and thermal mobility (or B-factor), which is critical to decide how biologists utilize the predicted models, is however missed in most structure prediction pipelines. We developed ResQ for unified residue-level model quality and B-factor estimations by combining local structure assembly variations with sequence-based and structure-based profiling. ResQ was tested on 635 non-redundant proteins with structure models generated by I-TASSER, where the average difference between estimated and observed distance errors is 1.4 Å for the confidently modeled proteins. ResQ was further tested on structure decoys from CASP9-11 experiments, where the error of local structure quality prediction is consistently lower than or comparable to other state-of-the-art predictors. Finally, ResQ B-factor profile was used to assist molecular replacement, which resulted in successful solutions on several proteins that could not be solved from constant B-factor settings.
Affiliation(s)
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Tianjin 300071, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yan Wang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA.
|