1
Dicks L, Wales DJ. Exploiting Sequence-Dependent Rotamer Information in Global Optimization of Proteins. J Phys Chem B 2022; 126:8381-8390. [PMID: 36257022 PMCID: PMC9623586 DOI: 10.1021/acs.jpcb.2c04647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Rotamers, namely amino acid side chain conformations common to many different peptides, can be compiled into libraries. These rotamer libraries are used in protein modeling, where the limited conformational space occupied by amino acid side chains is exploited. Here, we construct a sequence-dependent rotamer library from simulations of all possible tripeptides, which provides rotameric states dependent on adjacent amino acids. We observe significant sensitivity of rotamer populations to sequence and find that the library is successful in locating side chain conformations present in crystal structures. The library is designed for applications with basin-hopping global optimization, where we use it to propose moves in conformational space. The addition of rotamer moves significantly increases the efficiency of protein structure prediction within this framework, and we determine parameters to optimize efficiency.
Affiliation(s)
- L. Dicks
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- IBM Research, The Hartree Centre STFC Laboratory, Sci-Tech Daresbury, Warrington WA4 4AD, United Kingdom
- D. J. Wales
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
2
Misiura M, Shroff R, Thyer R, Kolomeisky AB. DLPacker: Deep learning for prediction of amino acid side chain conformations in proteins. Proteins 2022; 90:1278-1290. [PMID: 35122328 DOI: 10.1002/prot.26311] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/04/2021] [Revised: 12/03/2021] [Accepted: 12/07/2021] [Indexed: 12/20/2022]
Abstract
Prediction of side chain conformations of amino acids in proteins (also termed "packing") is an important and challenging part of protein structure prediction with many interesting applications in protein design. A variety of methods for packing have been developed but more accurate ones are still needed. Machine learning (ML) methods have recently become a powerful tool for solving various problems in diverse areas of science, including structural biology. In this study, we evaluate the potential of deep neural networks (DNNs) for prediction of amino acid side chain conformations. We formulate the problem as image-to-image transformation and train a U-net style DNN to solve the problem. We show that our method outperforms other physics-based methods by a significant margin: reconstruction RMSDs for most amino acids are about 20% smaller compared to SCWRL4 and Rosetta Packer with RMSDs for bulky hydrophobic amino acids Phe, Tyr, and Trp being up to 50% smaller.
Affiliation(s)
- Mikita Misiura
- Department of Chemistry, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
- Ross Thyer
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Anatoly B Kolomeisky
- Department of Chemistry, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Department of Physics and Astronomy, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
3
Jin Y, Johannissen LO, Hay S. Predicting new protein conformations from molecular dynamics simulation conformational landscapes and machine learning. Proteins 2021; 89:915-921. [PMID: 33629765 DOI: 10.1002/prot.26068] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/05/2020] [Revised: 01/21/2021] [Accepted: 02/23/2021] [Indexed: 11/06/2022]
Abstract
Molecular dynamics (MD) simulations are a popular method of studying protein structure and function, but are unable to reliably sample all relevant conformational space in reasonable computational timescales. A range of enhanced sampling methods are available that can improve conformational sampling, but these do not offer a complete solution. We present here a proof-of-principle method of combining MD simulation with machine learning to explore protein conformational space. An autoencoder is used to map snapshots from MD simulations onto a user-defined conformational landscape defined by principal components analysis or specific structural features, and we show that we can predict, with useful accuracy, conformations that are not present in the training data. This method offers a new approach to the prediction of new low energy/physically realistic structures of conformationally dynamic proteins and allows an alternative approach to enhanced sampling of MD simulations.
Affiliation(s)
- Yiming Jin
- Manchester Institute of Biotechnology and Department of Chemistry, The University of Manchester, Manchester, UK
- School of Computer Science and Engineering, Central South University, Changsha, China
- Linus O Johannissen
- Manchester Institute of Biotechnology and Department of Chemistry, The University of Manchester, Manchester, UK
- Sam Hay
- Manchester Institute of Biotechnology and Department of Chemistry, The University of Manchester, Manchester, UK
4
Colbes J, Corona RI, Lezcano C, Rodríguez D, Brizuela CA. Protein side-chain packing problem: is there still room for improvement? Brief Bioinform 2018; 18:1033-1043. [PMID: 27567382 DOI: 10.1093/bib/bbw079] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/23/2016] [Indexed: 11/12/2022]
Abstract
The protein side-chain packing problem (PSCPP) is an important subproblem of both protein structure prediction and protein design. During the past two decades, a large number of methods have been proposed to tackle this problem. These methods consist of three main components: a rotamer library, a scoring function and a search strategy. The average overall accuracy level obtained by these methods is approximately 87%. Whether a better accuracy level could be achieved remains to be answered. To address this question, we calculated the maximum accuracy level attainable using a simple rotamer library, independently of the energy function or the search method. Using 2883 different structures from the Protein Data Bank, we compared this accuracy level with the accuracy level of five state-of-the-art methods. These comparisons indicated that, for buried residues in the protein, we are already close to the best possible accuracy results. In addition, for exposed residues, we found that a significant gap exists between the possible improvement and the maximum accuracy level achievable with current methods. After determining that an improvement is possible, the next step is to understand what limitations are preventing us from obtaining such an improvement. Previous works on protein structure prediction and protein design have shown that scoring function inaccuracies may represent the main obstacle to achieving better results for these problems. To show that the same is true for the PSCPP, we evaluated the quality of two scoring functions used by some state-of-the-art algorithms. Our results indicate that neither of these scoring functions can guide the search method correctly, thereby reinforcing the idea that efforts to solve the PSCPP must also focus on developing better scoring functions.
5
Colbes J, Aguila SA, Brizuela CA. Scoring of Side-Chain Packings: An Analysis of Weight Factors and Molecular Dynamics Structures. J Chem Inf Model 2018; 58:443-452. [PMID: 29368924 DOI: 10.1021/acs.jcim.7b00679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The protein side-chain packing problem (PSCPP) is a central task in computational protein design. The problem is usually modeled as a combinatorial optimization problem, which consists of searching for a set of rotamers, from a given rotamer library, that minimizes a scoring function (SF). The SF is a weighted sum of terms, that can be decomposed in physics-based and knowledge-based terms. Although there are many methods to obtain approximate solutions for this problem, all of them have similar performances and there has not been a significant improvement in recent years. Studies on protein structure prediction and protein design revealed the limitations of current SFs to achieve further improvements for these two problems. In the same line, a recent work reported a similar result for the PSCPP. In this work, we ask whether or not this negative result regarding further improvements in performance is due to (i) an incorrect weighting of the SFs terms or (ii) the constrained conformation resulting from the protein crystallization process. To analyze these questions, we (i) model the PSCPP as a bi-objective combinatorial optimization problem, optimizing, at the same time, the two most important terms of two SFs of state-of-the-art algorithms and (ii) performed a preprocessing relaxation of the crystal structure through molecular dynamics to simulate the protein in the solvent and evaluated the performance of these two state-of-the-art SFs under these conditions. Our results indicate that (i) no matter what combination of weight factors we use the current SFs will not lead to better performances and (ii) the evaluated SFs will not be able to improve performance on relaxed structures. Furthermore, the experiments revealed that the SFs and the methods are biased toward crystallized structures.
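The combinatorial formulation described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the energies, the `pair_energy` rule, and the weights `w_self` and `w_pair` are made-up toy values, and exhaustive enumeration stands in for the search strategies used by real packers. It only shows how a weighted-sum scoring function selects one rotamer per residue, and how changing the weights can change the selected packing.

```python
from itertools import product

# Hypothetical toy data: three residues, each with a short list of candidate rotamers.
# self_energy[i][r] is the (invented) single-residue score of residue i in rotamer r.
self_energy = [
    [0.5, 1.2],
    [0.3, 0.1, 0.9],
    [1.0, 0.4],
]


def pair_energy(i, r, j, s):
    """Invented pairwise term: adjacent residues clash when they share a rotamer index."""
    return 1.0 if abs(i - j) == 1 and r == s else 0.0


def score(assignment, w_self=1.0, w_pair=1.0):
    """Weighted sum of single-residue and pairwise terms for one rotamer assignment."""
    total = w_self * sum(self_energy[i][r] for i, r in enumerate(assignment))
    n = len(assignment)
    for i in range(n):
        for j in range(i + 1, n):
            total += w_pair * pair_energy(i, assignment[i], j, assignment[j])
    return total


def best_packing(w_self=1.0, w_pair=1.0):
    """Exhaustively enumerate all rotamer combinations and return the lowest-scoring one."""
    choices = [range(len(energies)) for energies in self_energy]
    return min(product(*choices), key=lambda a: score(a, w_self, w_pair))


if __name__ == "__main__":
    print(best_packing())            # equal weights
    print(best_packing(w_pair=0.0))  # pairwise term switched off
```

Exhaustive search is only feasible for a handful of residues, since the number of combinations grows as the product of the rotamer counts.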
Affiliation(s)
- Jose Colbes
- Computer Science Department, CICESE Research Center, 22860 Ensenada, Mexico
- Sergio A Aguila
- Centro de Nanociencias y Nanotecnologia, Universidad Nacional Autonoma de Mexico, Km. 107 Carretera Tijuana-Ensenada, Ensenada, Baja California, Mexico, C.P. 22860
- Carlos A Brizuela
- Computer Science Department, CICESE Research Center, 22860 Ensenada, Mexico
6
Miao Z, Cao Y. Quantifying side-chain conformational variations in protein structure. Sci Rep 2016; 6:37024. [PMID: 27845406 PMCID: PMC5109468 DOI: 10.1038/srep37024] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 03/29/2016] [Accepted: 10/24/2016] [Indexed: 12/15/2022]
Abstract
Protein side-chain conformation is closely related to their biological functions. The side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism comprehensively exists in protein as various types and has been long overlooked by side-chain prediction. But such conformational variations have not been quantitatively studied and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large scale data sets and found that the side-chain conformational flexibility is closely related to the exposure to solvent, degree of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variabilities in PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider the assessment approaches of side-chain prediction programs.
Affiliation(s)
- Zhichao Miao
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 67000 Strasbourg, France
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences and State Key Laboratory of Biotherapy, Sichuan University, Chengdu, 610014, China
7
Taghizadeh M, Goliaei B, Madadkar-Sobhani A. SDRL: a sequence-dependent protein side-chain rotamer library. Mol Biosyst 2016; 11:2000-7. [PMID: 25953624 DOI: 10.1039/c5mb00057b] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
Since the introduction of the first protein side-chain rotamer library (RL) almost half a century ago, RLs have been components of many programs and algorithms in structural bioinformatics. Based on the dependence of side-chain dihedral angles on the local backbone, three types of RLs have been identified: backbone-independent, secondary-structure-dependent and backbone-dependent. In all previous studies, the effect of sequence specificity on side-chain conformational preferences was neglected. In the effort to develop a new class of RLs, we considered that the side-chain conformation of the central residue in each triplet on a protein backbone depends on the sequence of the triplet; therefore, we developed a sequence-dependent rotamer library (SDRL). To accomplish this, 400 possible triplet sequences for 18 natural amino acids as the central residue, which corresponds to 7200 triplet sequences in total, were considered. Searching the set of 11 546 selected PDB entries for the 7200 triplet sequences resulted in 2 364 541 instances occurring for 18 amino acids. Our results show that Leu and Val experience minimal impact from the adjacent residues in adopting side-chain conformations. Cys, Ile, Trp, His, Asp, Met, Glu, Gln, Arg and Lys, on the other hand, adopt their side-chain conformations mostly based on the adjacent residues on the backbone. The remaining residue types were moderately dependent on the adjacent residues. Using the new library, side-chain repacking algorithms can find preferred conformations of each residue more easily than with other backbone-independent RLs.
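The triplet arithmetic in this abstract (400 neighbour pairs times 18 central residues gives 7200 triplet sequences) can be checked with a short sketch. The abstract does not say which two amino acids are excluded as central residues; the code assumes they are Gly and Ala, which have no side-chain dihedral angles. The function names and the example sequence are hypothetical and are not taken from the SDRL software.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

# Assumption: the 18 central residues are all amino acids except Gly and Ala.
CENTRAL = [aa for aa in AMINO_ACIDS if aa not in "GA"]


def triplets():
    """Yield every left-centre-right triplet: 20 * 18 * 20 sequences."""
    for left, centre, right in product(AMINO_ACIDS, CENTRAL, AMINO_ACIDS):
        yield left + centre + right


def count_triplets(sequence):
    """Count occurrences of each valid triplet in a one-letter protein sequence."""
    wanted = set(triplets())
    counts = {}
    for i in range(len(sequence) - 2):
        window = sequence[i:i + 3]
        if window in wanted:
            counts[window] = counts.get(window, 0) + 1
    return counts


if __name__ == "__main__":
    print(len(set(triplets())))     # number of distinct triplet sequences
    print(count_triplets("MKVLA"))  # hypothetical five-residue sequence
```

A real library would attach side-chain dihedral statistics to each triplet rather than plain counts; the counting step here only mirrors the database search described in the abstract.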
Affiliation(s)
- Mohammad Taghizadeh
- Laboratory of Biophysics and Molecular Biology, Institute of Biochemistry and Biophysics (IBB), Tehran University, P.O. Box 13145-1384, Tehran, Iran
8
Pottel J, Moitessier N. Single-Point Mutation with a Rotamer Library Toolkit: Toward Protein Engineering. J Chem Inf Model 2015; 55:2657-71. [PMID: 26623941 DOI: 10.1021/acs.jcim.5b00525] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/25/2022]
Abstract
Protein engineers have long been hard at work to harness biocatalysts as a natural source of regio-, stereo-, and chemoselectivity in order to carry out chemistry (reactions and/or substrates) not previously achieved with these enzymes. The extreme labor demands and exponential number of mutation combinations have induced computational advances in this domain. The first step in our virtual approach is to predict the correct conformations upon mutation of residues (i.e., rebuilding side chains). For this purpose, we opted for a combination of molecular mechanics and statistical data. In this work, we have developed automated computational tools to extract protein structural information and created conformational libraries for each amino acid dependent on a variable number of parameters (e.g., resolution, flexibility, secondary structure). We have also developed the necessary tool to apply the mutation and optimize the conformation accordingly. For side-chain conformation prediction, we obtained overall average root-mean-square deviations (RMSDs) of 0.91 and 1.01 Å for the 18 flexible natural amino acids within two distinct sets of over 3000 and 1500 side-chain residues, respectively. The commonly used dihedral angle differences were also evaluated and performed worse than the state of the art. These two metrics are also compared. Furthermore, we generated a family-specific library for kinases that produced an average 2% lower RMSD upon side-chain reconstruction and a residue-specific library that yielded a 17% improvement. Ultimately, since our protein engineering outlook involves using our docking software, Fitted/Impacts, we applied our mutation protocol to a benchmarked data set for self- and cross-docking. Our side-chain reconstruction does not hinder our docking software, demonstrating differences in pose prediction accuracy of approximately 2% (RMSD cutoff metric) for a set of over 200 protein/ligand structures. Similarly, when docking to a set of over 100 kinases, side-chain reconstruction (using both general and biased conformation libraries) had minimal detriment to the docking accuracy.
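The two evaluation metrics named in this abstract, side-chain RMSD and dihedral angle difference, can be sketched as follows. This is an illustration only, not the authors' toolkit: the function names are hypothetical, and the RMSD assumes both coordinate sets already share a frame (for example, side chains rebuilt on an unchanged backbone), so no superposition is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes both structures are already in the same reference frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def chi_difference(angle_a, angle_b):
    """Smallest absolute difference between two dihedral angles, in degrees (0 to 180)."""
    diff = abs(angle_a - angle_b) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # made-up atom positions
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(predicted, reference))
    print(chi_difference(-170.0, 170.0))
```

The wrap-around in `chi_difference` matters because dihedral angles are periodic: angles of -170° and 170° are 20° apart, not 340°.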
Affiliation(s)
- Joshua Pottel
- Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, QC, Canada H3A 0B8
- Nicolas Moitessier
- Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, QC, Canada H3A 0B8
9
Hughes TJ, Cardamone S, Popelier PLA. Realistic sampling of amino acid geometries for a multipolar polarizable force field. J Comput Chem 2015; 36:1844-57. [PMID: 26235784 PMCID: PMC4973712 DOI: 10.1002/jcc.24006] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 05/05/2015] [Revised: 06/19/2015] [Accepted: 06/20/2015] [Indexed: 12/19/2022]
Abstract
The Quantum Chemical Topological Force Field (QCTFF) uses the machine learning method kriging to map atomic multipole moments to the coordinates of all atoms in the molecular system. It is important that kriging operates on relevant and realistic training sets of molecular geometries. Therefore, we sampled single amino acid geometries directly from protein crystal structures stored in the Protein Databank (PDB). This sampling enhances the conformational realism (in terms of dihedral angles) of the training geometries. However, these geometries can be fraught with inaccurate bond lengths and valence angles due to artefacts of the refinement process of the X-ray diffraction patterns, combined with experimentally invisible hydrogen atoms. This is why we developed a hybrid PDB/nonstationary normal modes (NM) sampling approach called PDB/NM. This method is superior over standard NM sampling, which captures only geometries optimized from the stationary points of single amino acids in the gas phase. Indeed, PDB/NM combines the sampling of relevant dihedral angles with chemically correct local geometries. Geometries sampled using PDB/NM were used to build kriging models for alanine and lysine, and their prediction accuracy was compared to models built from geometries sampled from three other sampling approaches. Bond length variation, as opposed to variation in dihedral angles, puts pressure on prediction accuracy, potentially lowering it. Hence, the larger coverage of dihedral angles of the PDB/NM method does not deteriorate the predictive accuracy of kriging models, compared to the NM sampling around local energetic minima used so far in the development of QCTFF.
Affiliation(s)
- Timothy J Hughes
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester, M1 7DN, Great Britain
- School of Chemistry, University of Manchester, Oxford Road, Manchester, M13 9PL, Great Britain
- Salvatore Cardamone
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester, M1 7DN, Great Britain
- School of Chemistry, University of Manchester, Oxford Road, Manchester, M13 9PL, Great Britain
- Paul L A Popelier
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester, M1 7DN, Great Britain
- School of Chemistry, University of Manchester, Oxford Road, Manchester, M13 9PL, Great Britain
10
Ahmed MH, Koparde VN, Safo MK, Scarsdale JN, Kellogg GE. 3D interaction homology: The structurally known rotamers of tyrosine derive from a surprisingly limited set of information-rich hydropathic interaction environments described by maps. Proteins 2015; 83:1118-36. [DOI: 10.1002/prot.24813] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/05/2014] [Revised: 02/24/2015] [Accepted: 04/05/2015] [Indexed: 12/20/2022]
Affiliation(s)
- Mostafa H. Ahmed
- Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University; Richmond VA 23298 USA
- Department of Medicinal Chemistry; Virginia Commonwealth University; Richmond VA 23298 USA
- Vishal N. Koparde
- Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University; Richmond VA 23298 USA
- Department of Medicinal Chemistry; Virginia Commonwealth University; Richmond VA 23298 USA
- Martin K. Safo
- Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University; Richmond VA 23298 USA
- Department of Medicinal Chemistry; Virginia Commonwealth University; Richmond VA 23298 USA
- J. Neel Scarsdale
- Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University; Richmond VA 23298 USA
- Center for the Study of Biological Complexity; Virginia Commonwealth University; Richmond VA 23298 USA
- Glen E. Kellogg
- Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University; Richmond VA 23298 USA
- Department of Medicinal Chemistry; Virginia Commonwealth University; Richmond VA 23298 USA