1
Noske J, Kynast JP, Lemm D, Schmidt S, Höcker B. PocketOptimizer 2.0: A modular framework for computer-aided ligand-binding design. Protein Sci 2023; 32:e4516. [PMID: 36403089 PMCID: PMC9793973 DOI: 10.1002/pro.4516] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/31/2022] [Revised: 11/12/2022] [Accepted: 11/14/2022] [Indexed: 11/21/2022]
Abstract
The ability to design customized proteins to perform specific tasks is of great interest. We are particularly interested in the design of sensitive and specific small molecule ligand-binding proteins for biotechnological or biomedical applications. Computational methods can narrow down the immense combinatorial space to find the best solution and thus provide starting points for experimental procedures. However, success rates strongly depend on accurate modeling and energetic evaluation. Not only intra- but also intermolecular interactions have to be considered. To address this problem, we developed PocketOptimizer, a modular computational protein design pipeline that predicts mutations in the binding pockets of proteins to increase affinity for a specific ligand. Its modularity enables users to compare different combinations of force fields, rotamer libraries, and scoring functions. Here, we present a much-improved version, PocketOptimizer 2.0. We implemented a cleaner user interface, an extended architecture with more supported tools, such as force fields and scoring functions, a backbone-dependent rotamer library, as well as different improvements in the underlying algorithms. Version 2.0 was tested against a benchmark of design cases and assessed in comparison to the first version. Our results show how newly implemented features such as the new rotamer library can lead to improved prediction accuracy. Therefore, we believe that PocketOptimizer 2.0, with its many new and improved functionalities, provides a robust and versatile environment for the design of small molecule-binding pockets in proteins. It is widely applicable and extendible due to its modular framework. PocketOptimizer 2.0 can be downloaded at https://github.com/Hoecker-Lab/pocketoptimizer.
Affiliation(s)
- Jakob Noske
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Dominik Lemm
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany. Present address: Department of Physics, University of Vienna, Vienna, Austria
- Steffen Schmidt
- Computational Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
2
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022]
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
3
Talluri S. Algorithms for protein design. Advances in Protein Chemistry and Structural Biology 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
- Department of Biotechnology, GITAM, Visakhapatnam, India
4
Bouchiba Y, Ruffini M, Schiex T, Barbe S. Computational Design of Miniprotein Binders. Methods Mol Biol 2022; 2405:361-382. [PMID: 35298822 DOI: 10.1007/978-1-0716-1855-4_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Miniprotein binders hold a great interest as a class of drugs that bridges the gap between monoclonal antibodies and small molecule drugs. Like monoclonal antibodies, they can be designed to bind to therapeutic targets with high affinity, but they are more stable and easier to produce and to administer. In this chapter, we present a structure-based computational generic approach for miniprotein inhibitor design. Specifically, we describe step-by-step the implementation of the approach for the design of miniprotein binders against the SARS-CoV-2 coronavirus, using available structural data on the SARS-CoV-2 spike receptor binding domain (RBD) in interaction with its native target, the human receptor ACE2. Structural data being increasingly accessible around many protein-protein interaction systems, this method might be applied to the design of miniprotein binders against numerous therapeutic targets. The computational pipeline exploits provable and deterministic artificial intelligence-based protein design methods, with some recent additions in terms of binding energy estimation, multistate design and diverse library generation.
Affiliation(s)
- Younes Bouchiba
- TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
- Manon Ruffini
- TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, Toulouse, France
- Thomas Schiex
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, Toulouse, France
- Sophie Barbe
- TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
5
Defresne M, Barbe S, Schiex T. Protein Design with Deep Learning. Int J Mol Sci 2021; 22:11741. [PMID: 34769173 PMCID: PMC8584038 DOI: 10.3390/ijms222111741] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/30/2021] [Revised: 10/23/2021] [Accepted: 10/26/2021] [Indexed: 12/21/2022]
Abstract
Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.
Affiliation(s)
- Marianne Defresne
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
- Sophie Barbe
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France
- Thomas Schiex
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
6
Maguire JB, Grattarola D, Mulligan VK, Klyshko E, Melo H. XENet: Using a new graph convolution to accelerate the timeline for protein design on quantum computers. PLoS Comput Biol 2021; 17:e1009037. [PMID: 34570773 PMCID: PMC8496835 DOI: 10.1371/journal.pcbi.1009037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2021] [Revised: 10/07/2021] [Accepted: 09/14/2021] [Indexed: 11/30/2022]
Abstract
Graph representations are traditionally used to represent protein structures in sequence design protocols in which the protein backbone conformation is known. This infrequently extends to machine learning projects: existing graph convolution algorithms have shortcomings when representing protein environments. One reason for this is the lack of emphasis on edge attributes during message-passing operations. Another reason is the traditionally shallow nature of graph neural network architectures. Here we introduce an improved message-passing operation that is better equipped to model local kinematics problems such as protein design. Our approach, XENet, pays special attention to both incoming and outgoing edge attributes. We compare XENet against existing graph convolutions in an attempt to decrease rotamer sample counts in Rosetta's rotamer substitution protocol, used for protein side-chain optimization and sequence design. This use case is motivating because it both reduces the size of the search space for classical side-chain optimization algorithms, and allows larger protein design problems to be solved with quantum algorithms on near-term quantum computers with limited qubit counts. XENet outperformed competing models while also displaying a greater tolerance for deeper architectures. We found that XENet was able to decrease rotamer counts by 40% without loss in quality. This decreased the memory consumption for classical pre-computation of rotamer energies in our use case by more than a factor of 3, the qubit consumption for an existing sequence design quantum algorithm by 40%, and the size of the solution space by a factor of 165. Additionally, XENet displayed an ability to handle deeper architectures than competing convolutions.
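The reductions quoted in this abstract can be sanity-checked with simple arithmetic. The sketch below is an illustration only, not the authors' calculation. It assumes (the abstract does not say this) that the 40% reduction applies uniformly at every design position, so that each position keeps a fraction 0.6 of its rotamers. All function names are hypothetical.

```python
import math

KEEP_FRACTION = 0.6  # 40% fewer rotamers per position (figure taken from the abstract)


def solution_space_reduction(n_positions: int, keep: float = KEEP_FRACTION) -> float:
    """Factor by which the number of rotamer combinations shrinks.

    With r rotamers at each of n positions there are r**n combinations,
    so keeping a fraction `keep` per position shrinks the space by (1/keep)**n.
    """
    return (1.0 / keep) ** n_positions


def pairwise_table_reduction(keep: float = KEEP_FRACTION) -> float:
    """Factor by which the number of rotamer pairs between two positions shrinks."""
    return (1.0 / keep) ** 2


def positions_for_reduction(factor: float, keep: float = KEEP_FRACTION) -> float:
    """Number of positions n for which (1/keep)**n equals `factor`."""
    return math.log(factor) / math.log(1.0 / keep)


if __name__ == "__main__":
    print(f"positions implied by a 165-fold reduction: {positions_for_reduction(165):.2f}")
    print(f"solution-space reduction for 10 positions: {solution_space_reduction(10):.1f}")
    print(f"pairwise-table reduction: {pairwise_table_reduction():.2f}")
```

Under this uniform assumption, a 165-fold smaller solution space corresponds to roughly ten design positions. The same assumption gives a pairwise-table reduction of about 2.8, slightly below the reported factor of more than 3, so the uniform model is only a rough approximation of the reported use case.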
Affiliation(s)
- Jack B. Maguire
- Menten AI, Inc., Palo Alto, California, United States of America
- Daniele Grattarola
- Faculty of Informatics, Università della Svizzera italiana, Lugano, Switzerland
- Vikram Khipple Mulligan
- Center for Computational Biology, Flatiron Institute, New York, New York, United States of America
- Eugene Klyshko
- Menten AI, Inc., Palo Alto, California, United States of America
- Department of Physics, University of Toronto, Toronto, Ontario, Canada
- Hans Melo
- Menten AI, Inc., Palo Alto, California, United States of America
7
Guaranteed Diversity and Optimality in Cost Function Network Based Computational Protein Design Methods. Algorithms 2021. [DOI: 10.3390/a14060168] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
Abstract
Proteins are the main active molecules of life. Although natural proteins play many roles, as enzymes or antibodies for example, there is a need to go beyond the repertoire of natural proteins to produce engineered proteins that precisely meet application requirements, in terms of function, stability, activity or other protein capacities. Computational Protein Design aims at designing new proteins from first principles, using full-atom molecular models. However, the size and complexity of proteins require approximations to make them amenable to energetic optimization queries. These approximations make the design process less reliable, and a provable optimal solution may fail. In practice, expensive libraries of solutions are therefore generated and tested. In this paper, we explore the idea of generating libraries of provably diverse low-energy solutions by extending cost function network algorithms with dedicated automaton-based diversity constraints on a large set of realistic full protein redesign problems. We observe that it is possible to generate provably diverse libraries in reasonable time and that the produced libraries do enhance the Native Sequence Recovery, a traditional measure of design methods reliability.
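The idea of a library of diverse low-energy solutions can be illustrated with a toy example. The sketch below is not the cost function network method of the paper: it swaps in brute-force enumeration and a greedy filter, which only works for tiny problems and gives no optimality guarantee for the library as a whole. The energy function, alphabet, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Seq = Tuple[str, ...]


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def diverse_library(
    energy: Callable[[Seq], float],
    alphabet: str,
    length: int,
    size: int,
    min_dist: int,
) -> List[Seq]:
    """Greedily collect low-energy sequences that are mutually diverse.

    All sequences are enumerated and sorted by energy. A sequence is kept
    only if it differs from every sequence already kept in at least
    `min_dist` positions.
    """
    candidates = sorted(product(alphabet, repeat=length), key=energy)
    library: List[Seq] = []
    for seq in candidates:
        if all(hamming(seq, kept) >= min_dist for kept in library):
            library.append(seq)
            if len(library) == size:
                break
    return library


def toy_energy(seq: Seq) -> float:
    """Hypothetical energy: number of mismatches to an all-'A' sequence."""
    return float(sum(residue != "A" for residue in seq))


if __name__ == "__main__":
    for member in diverse_library(toy_energy, "AG", length=4, size=3, min_dist=2):
        print("".join(member), toy_energy(member))
```

The first library member is the global minimum of the toy energy, and each later member is the lowest-energy sequence that still satisfies the distance requirement against everything chosen before it.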
8
Bouchiba Y, Cortés J, Schiex T, Barbe S. Molecular flexibility in computational protein design: an algorithmic perspective. Protein Eng Des Sel 2021; 34:6271252. [PMID: 33959778 DOI: 10.1093/protein/gzab011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/21/2020] [Revised: 03/12/2021] [Accepted: 03/29/2021] [Indexed: 12/19/2022]
Abstract
Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.
Affiliation(s)
- Younes Bouchiba
- Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
- Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Juan Cortés
- Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Thomas Schiex
- Université de Toulouse, ANITI, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
- Sophie Barbe
- Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
9
Karimi M, Zhu S, Cao Y, Shen Y. De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks. J Chem Inf Model 2020; 60:5667-5681. [PMID: 32945673 PMCID: PMC7775287 DOI: 10.1021/acs.jcim.0c00593] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 01/04/2023]
Abstract
Although massive data is quickly accumulating on protein sequence and structure, there is a small and limited number of protein architectural types (or structural folds). This study is addressing the following question: how well could one reveal underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? In response to the question, we have developed novel deep generative models, namely, semisupervised gcWGAN (guided, conditional, Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design qualities, we build our models on conditional Wasserstein GAN (WGAN) that uses Wasserstein distance in the loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the conditional input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into WGAN as a loss to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds compared to a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one not even part of basis folds, gcWGAN designs have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign), through generating design seeds and tailoring design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data. 
Data, source codes, and trained models are available at https://github.com/Shen-Lab/gcWGAN.
Affiliation(s)
- Mostafa Karimi
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
- Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yue Cao
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
10
Lowegard AU, Frenkel MS, Holt GT, Jou JD, Ojewole AA, Donald BR. Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface. PLoS Comput Biol 2020; 16:e1007447. [PMID: 32511232 PMCID: PMC7329130 DOI: 10.1371/journal.pcbi.1007447] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/26/2019] [Revised: 07/01/2020] [Accepted: 05/13/2020] [Indexed: 11/25/2022]
Abstract
The K* algorithm provably approximates partition functions for a set of states (e.g., protein, ligand, and protein-ligand complex) to a user-specified accuracy ε. Often, reaching an ε-approximation for a particular set of partition functions takes a prohibitive amount of time and space. To alleviate some of this cost, we introduce two new algorithms into the osprey suite for protein design: fries, a Fast Removal of Inadequately Energied Sequences, and EWAK*, an Energy Window Approximation to K*. fries pre-processes the sequence space to limit a design to only the most stable, energetically favorable sequence possibilities. EWAK* then takes this pruned sequence space as input and, using a user-specified energy window, calculates K* scores using the lowest energy conformations. We expect fries/EWAK* to be most useful in cases where there are many unstable sequences in the design sequence space and when users are satisfied with enumerating the low-energy ensemble of conformations. In combination, these algorithms provably retain calculational accuracy while limiting the input sequence space and the conformations included in each partition function calculation to only the most energetically favorable, effectively reducing runtime while still enriching for desirable sequences. This combined approach led to significant speed-ups compared to the previous state-of-the-art multi-sequence algorithm, BBK*, while maintaining its efficiency and accuracy, which we show across 40 different protein systems and a total of 2,826 protein design problems. Additionally, as a proof of concept, we used these new algorithms to redesign the protein-protein interface (PPI) of the c-Raf-RBD:KRas complex. The Ras-binding domain of the protein kinase c-Raf (c-Raf-RBD) is the tightest known binder of KRas, a protein implicated in difficult-to-treat cancers. fries/EWAK* accurately retrospectively predicted the effect of 41 different sets of mutations in the PPI of the c-Raf-RBD:KRas complex. 
Notably, these mutations include mutations whose effect had previously been incorrectly predicted using other computational methods. Next, we used fries/EWAK* for prospective design and discovered a novel point mutation that improves binding of c-Raf-RBD to KRas in its active, GTP-bound state (KRasGTP). We combined this new mutation with two previously reported mutations (which were highly-ranked by osprey) to create a new variant of c-Raf-RBD, c-Raf-RBD(RKY). fries/EWAK* in osprey computationally predicted that this new variant binds even more tightly than the previous best-binding variant, c-Raf-RBD(RK). We measured the binding affinity of c-Raf-RBD(RKY) using a bio-layer interferometry (BLI) assay, and found that this new variant exhibits single-digit nanomolar affinity for KRasGTP, confirming the computational predictions made with fries/EWAK*. This new variant binds roughly five times more tightly than the previous best known binder and roughly 36 times more tightly than the design starting point (wild-type c-Raf-RBD). This study steps through the advancement and development of computational protein design by presenting theory, new algorithms, accurate retrospective designs, new prospective designs, and biochemical validation. Computational structure-based protein design is an innovative tool for redesigning proteins to introduce a particular or novel function. One such function is improving the binding of one protein to another, which can increase our understanding of important protein systems. Herein we introduce two novel, provable algorithms, fries and EWAK*, for more efficient computational structure-based protein design as well as their application to the redesign of the c-Raf-RBD:KRas protein-protein interface. These new algorithms speed-up computational structure-based protein design while maintaining accurate calculations, allowing for larger, previously infeasible protein designs. 
Additionally, using fries and EWAK* within the osprey suite, we designed the tightest known binder of KRas, a heavily studied cancer target that interacts with a number of different proteins. This previously undiscovered variant of a KRas-binding domain, c-Raf-RBD, has potential to serve as a tool to further probe the protein-protein interface of KRas with its effectors and its discovery alone emphasizes the potential for more successful applications of computational structure-based protein design.
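The role of partition functions and an energy window in this abstract can be sketched in a few lines. The code below is a minimal illustration, not the osprey implementation and not the provable EWAK* algorithm. It assumes that conformational energies (in kcal/mol) have already been enumerated for each state, and it uses the standard form of the K* score as the ratio of the complex partition function to the product of the unbound partition functions. The function names and example energies are hypothetical.

```python
import math
from typing import Optional, Sequence

# Gas constant in kcal/(mol*K) times an assumed temperature of 298.15 K.
RT = 1.987204e-3 * 298.15


def partition_function(energies: Sequence[float], window: Optional[float] = None) -> float:
    """Boltzmann-weighted sum over conformations.

    If `window` is given, only conformations whose energy lies within
    `window` kcal/mol of the lowest energy contribute (the low-energy ensemble).
    """
    e_min = min(energies)
    kept = [e for e in energies if window is None or e - e_min <= window]
    return sum(math.exp(-e / RT) for e in kept)


def kstar_score(
    complex_energies: Sequence[float],
    protein_energies: Sequence[float],
    ligand_energies: Sequence[float],
    window: Optional[float] = None,
) -> float:
    """Ratio of the bound-state partition function to the unbound ones."""
    z_complex = partition_function(complex_energies, window)
    z_protein = partition_function(protein_energies, window)
    z_ligand = partition_function(ligand_energies, window)
    return z_complex / (z_protein * z_ligand)


if __name__ == "__main__":
    # Hypothetical conformational energies for one sequence.
    complex_e = [-12.0, -11.5, -9.0, -3.0]
    protein_e = [-6.0, -5.8, -2.0]
    ligand_e = [-3.0, -2.9]
    full = kstar_score(complex_e, protein_e, ligand_e)
    windowed = kstar_score(complex_e, protein_e, ligand_e, window=1.0)
    print(f"log10 K* (all conformations): {math.log10(full):.3f}")
    print(f"log10 K* (1 kcal/mol window): {math.log10(windowed):.3f}")
```

Because high-energy conformations carry exponentially small Boltzmann weights, restricting each sum to a window above the minimum changes the score only slightly while reducing the number of conformations that must be evaluated.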
Affiliation(s)
- Anna U. Lowegard
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Marcel S. Frenkel
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
- Graham T. Holt
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Jonathan D. Jou
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Adegoke A. Ojewole
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Bruce R. Donald
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina, United States of America
11
Mulligan VK. The emerging role of computational design in peptide macrocycle drug discovery. Expert Opin Drug Discov 2020; 15:833-852. [PMID: 32345066 DOI: 10.1080/17460441.2020.1751117] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/21/2022]
Abstract
Drug discovery is a laborious process with rising cost per new drug. Peptide macrocycles are promising therapeutics, though conformational flexibility can reduce target affinity and specificity. Recent computational advancements address this problem by enabling rational design of rigidly folded peptide macrocycles. AREAS COVERED This review summarizes currently approved peptide macrocycle therapeutics and discusses advantages of mesoscale drugs over small molecules or protein therapeutics. It describes the history, rationale, and state of the art of computational tools, such as Rosetta, that allow the design of rigidly structured peptide macrocycles. The emerging pipeline for designing peptide macrocycle drugs is described, including current challenges in designing permeable molecules that can emulate the chameleonic behavior of natural macrocycles. Prospects for reducing computational cost and improving accuracy with emerging computational technologies are also discussed. EXPERT OPINION To embrace computational design of peptide macrocycle drugs, we must shift current attitudes regarding the role of computation in drug discovery, and move beyond Lipinski's rules. This technology has the potential to shift failures to earlier in silico stages of the drug discovery process, improving success rates in costly clinical trials. Given the available tools, now is the time for drug developers to incorporate peptide macrocycle design into drug discovery pipelines.
Affiliation(s)
- Vikram K Mulligan
- Systems Biology, Center for Computational Biology, Flatiron Institute, New York, NY, USA
|
12
|
Jou JD, Holt GT, Lowegard AU, Donald BR. Minimization-Aware Recursive K*: A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape. J Comput Biol 2019; 27:550-564. [PMID: 31855059 DOI: 10.1089/cmb.2019.0315] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
Protein design algorithms that model continuous sidechain flexibility and conformational ensembles better approximate the in vitro and in vivo behavior of proteins. The previous state of the art, iMinDEE-A*-K*, computes provable ɛ-approximations to partition functions of protein states (e.g., bound vs. unbound) by computing provable, admissible pairwise-minimized energy lower bounds on protein conformations, and using the A* enumeration algorithm to return a gap-free list of lowest-energy conformations. iMinDEE-A*-K* runs in time sublinear in the number of conformations, but can be trapped in loosely-bounded, low-energy conformational wells containing many conformations with highly similar energies. That is, iMinDEE-A*-K* is unable to exploit the correlation between protein conformation and energy: similar conformations often have similar energy. We introduce two new concepts that exploit this correlation: Minimization-Aware Enumeration and Recursive K*. We combine these two insights into a novel algorithm, Minimization-Aware Recursive K* (MARK*), which tightens bounds not on single conformations, but instead on distinct regions of the conformation space. We compare the performance of iMinDEE-A*-K* versus MARK* by running the Branch and Bound over K* (BBK*) algorithm, which provably returns sequences in order of decreasing K* score, using either iMinDEE-A*-K* or MARK* to approximate partition functions. We show on 200 design problems that MARK* not only enumerates and minimizes vastly fewer conformations than the previous state of the art, but also runs up to 2 orders of magnitude faster. Finally, we show that MARK* not only efficiently approximates the partition function, but also provably approximates the energy landscape. To our knowledge, MARK* is the first algorithm to do so. 
We use MARK* to analyze the change in energy landscape of the bound and unbound states of an HIV-1 capsid protein C-terminal domain in complex with a camelid VHH, and measure the change in conformational entropy induced by binding. Thus, MARK* both accelerates existing designs and offers new capabilities not possible with previous algorithms.
Affiliation(s)
- Jonathan D Jou
- Department of Computer Science, Duke University, Durham, North Carolina
- Graham T Holt
- Department of Computer Science, Duke University, Durham, North Carolina; Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina
- Anna U Lowegard
- Department of Computer Science, Duke University, Durham, North Carolina; Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina
- Bruce R Donald
- Department of Computer Science, Duke University, Durham, North Carolina; Department of Biochemistry, Duke University Medical Center, Durham, North Carolina; Department of Chemistry, Duke University, Durham, North Carolina
|
13
|
Holt GT, Jou JD, Gill NP, Lowegard AU, Martin JW, Madden DR, Donald BR. Computational Analysis of Energy Landscapes Reveals Dynamic Features That Contribute to Binding of Inhibitors to CFTR-Associated Ligand. J Phys Chem B 2019; 123:10441-10455. [PMID: 31697075 DOI: 10.1021/acs.jpcb.9b07278] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
The CFTR-associated ligand PDZ domain (CALP) binds to the cystic fibrosis transmembrane conductance regulator (CFTR) and mediates lysosomal degradation of mature CFTR. Inhibition of this interaction has been explored as a therapeutic avenue for cystic fibrosis. Previously, we reported the ensemble-based computational design of a novel peptide inhibitor of CALP, which resulted in the most binding-efficient inhibitor to date. This inhibitor, kCAL01, was designed using OSPREY and evinced significant biological activity in in vitro cell-based assays. Here, we report a crystal structure of kCAL01 bound to CALP and compare structural features against iCAL36, a previously developed inhibitor of CALP. We compute side-chain energy landscapes for each structure to not only enable approximation of binding thermodynamics but also reveal ensemble features that contribute to the comparatively efficient binding of kCAL01. Finally, we compare the previously reported design ensemble for kCAL01 vs the new crystal structure and show that, despite small differences between the design model and crystal structure, significant biophysical features that enhance inhibitor binding are captured in the design ensemble. This suggests not only that ensemble-based design captured thermodynamically significant features observed in vitro, but also that a design eschewing ensembles would miss the kCAL01 sequence entirely.
Affiliation(s)
- Graham T Holt
- Department of Computer Science, Duke University, Durham, North Carolina 27708, United States; Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27708, United States
- Jonathan D Jou
- Department of Computer Science, Duke University, Durham, North Carolina 27708, United States
- Nicholas P Gill
- Department of Biochemistry & Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755, United States
- Anna U Lowegard
- Department of Computer Science, Duke University, Durham, North Carolina 27708, United States; Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27708, United States
- Jeffrey W Martin
- Department of Computer Science, Duke University, Durham, North Carolina 27708, United States
- Dean R Madden
- Department of Biochemistry & Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755, United States
- Bruce R Donald
- Department of Computer Science, Duke University, Durham, North Carolina 27708, United States; Department of Biochemistry, Duke University, Durham, North Carolina 27710, United States; Department of Chemistry, Duke University, Durham, North Carolina 27710, United States
|
14
|
Hallen MA, Donald BR. Protein Design by Provable Algorithms. Communications of the ACM 2019; 62:76-84. [PMID: 31607753 PMCID: PMC6788629 DOI: 10.1145/3338124] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Protein design algorithms can leverage provable guarantees of accuracy to provide new insights and unique optimized molecules.
Affiliation(s)
- Mark A. Hallen
- Research assistant professor at the Toyota Technological Institute at Chicago, IL, USA
- Bruce R. Donald
- James B. Duke Professor of Computer Science at Duke University, as well as a professor of chemistry and biochemistry in the Duke University Medical Center, Durham, NC, USA
|
15
|
Vucinic J, Simoncini D, Ruffini M, Barbe S, Schiex T. Positive multistate protein design. Bioinformatics 2019; 36:122-130. [DOI: 10.1093/bioinformatics/btz497] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2019] [Revised: 05/20/2019] [Accepted: 06/11/2019] [Indexed: 11/12/2022] Open
Abstract
Motivation
Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. The usual approach considers a single rigid backbone as a target, which ignores backbone flexibility. Multistate design (MSD) instead considers several backbone states simultaneously, which defines challenging computational problems.
Results
We introduce efficient reductions of positive MSD problems to Cost Function Networks with two different fitness definitions and implement them in the Pompd (Positive Multistate Protein design) software. Pompd is able to identify guaranteed optimal sequences of positive multistate full protein redesign problems and exhaustively enumerate suboptimal sequences close to the MSD optimum. Applied to nuclear magnetic resonance and back-rubbed X-ray structures, we observe that the average energy fitness provides the best sequence recovery. Our method outperforms state-of-the-art guaranteed computational design approaches by orders of magnitude and can solve MSD problems with sizes previously unreachable with guaranteed algorithms.
Availability and implementation
https://forgemia.inra.fr/thomas.schiex/pompd as documented Open Source.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jelena Vucinic
- LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
- MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
- David Simoncini
- LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
- IRIT UMR 5505-CNRS, Université de Toulouse, 31042 Cedex 9, France
- Manon Ruffini
- LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
- MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
- Sophie Barbe
- LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
- Thomas Schiex
- MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
|
16
|
Simoncini D, Zhang KYJ, Schiex T, Barbe S. A structural homology approach for computational protein design with flexible backbone. Bioinformatics 2018; 35:2418-2426. [DOI: 10.1093/bioinformatics/bty975] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2018] [Revised: 11/01/2018] [Accepted: 11/28/2018] [Indexed: 01/09/2023] Open
Abstract
Motivation
Structure-based Computational Protein Design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions remain imperfect, however, and injecting relevant information from known structures into the design process should lead to improved designs.
Results
We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When excluding homologous proteins, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%.
Availability and implementation
Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Simoncini
- Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, F Toulouse cedex 04, France
- Institut de recherche en informatique de Toulouse, IRIT, UMR 5505-CNRS, Université de Toulouse, Cedex 9, France
- Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa, Japan
- Thomas Schiex
- Institut de recherche en informatique de Toulouse, UMR 5505-CNRS, Université de Toulouse, Cedex 9, France
- Sophie Barbe
- Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, F Toulouse cedex 04, France
|
17
|
Hallen MA. PLUG (Pruning of Local Unrealistic Geometries) removes restrictions on biophysical modeling for protein design. Proteins 2018; 87:62-73. [PMID: 30378699 DOI: 10.1002/prot.25623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2018] [Revised: 10/10/2018] [Accepted: 10/16/2018] [Indexed: 12/29/2022]
Abstract
Protein design algorithms must search an enormous conformational space to identify favorable conformations. As a result, those that perform this search with guarantees of accuracy generally start with a conformational pruning step, such as dead-end elimination (DEE). However, the mathematical assumptions of DEE-based pruning algorithms have up to now severely restricted the biophysical model that can feasibly be used in protein design. To lift these restrictions, I propose to prune local unrealistic geometries (PLUG) using a linear programming-based method. PLUG's biophysical model consists only of well-known lower bounds on interatomic distances. PLUG is intended as preprocessing for energy-based protein design calculations, whose biophysical model need not support DEE pruning. Based on 96 test cases, PLUG is at least as effective at pruning as DEE for larger protein designs, the type that most require pruning. When combined with the LUTE protein design algorithm, PLUG greatly facilitates designs that account for continuous entropy, large multistate designs with continuous flexibility, and designs with extensive continuous backbone flexibility and advanced nonpairwise energy functions. Many of these designs are tractable only with PLUG, either for empirical reasons (LUTE's machine learning step achieves an accurate fit only after PLUG pruning), or for theoretical reasons (many energy functions are fundamentally incompatible with DEE).
Affiliation(s)
- Mark A Hallen
- Toyota Technological Institute at Chicago, Chicago, Illinois
|
18
|
Charpentier A, Mignon D, Barbe S, Cortes J, Schiex T, Simonson T, Allouche D. Variable Neighborhood Search with Cost Function Networks To Solve Large Computational Protein Design Problems. J Chem Inf Model 2018; 59:127-136. [DOI: 10.1021/acs.jcim.8b00510] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- David Mignon
- Laboratoire de Biochimie (CNRS UMR 7654), École Polytechnique, 91128 Palaiseau, France
- Sophie Barbe
- Laboratoire d’Ingénierie des Systèmes Biologiques et Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, 31077 Toulouse, France
- Juan Cortes
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
- Thomas Schiex
- MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR 7654), École Polytechnique, 91128 Palaiseau, France
- David Allouche
- MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France
|
19
|
Lechner H, Ferruz N, Höcker B. Strategies for designing non-natural enzymes and binders. Curr Opin Chem Biol 2018; 47:67-76. [PMID: 30248579 DOI: 10.1016/j.cbpa.2018.07.022] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2018] [Revised: 07/16/2018] [Accepted: 07/17/2018] [Indexed: 12/20/2022]
Abstract
The design of tailor-made enzymes is a major goal in biochemical research that can result in wide-range applications and will lead to a better understanding of how proteins fold and function. In this review we highlight recent advances in enzyme and small molecule binder design. A focus is placed on novel strategies for the design of scaffolds, developments in computational methods, and recent applications of these techniques on receptors, sensors, and enzymes. Further, the integration of computational and experimental methodologies is discussed. The outlined examples of designed enzymes and binders for various purposes highlight the importance of this topic and underline the need for tailor-made proteins.
Affiliation(s)
- Horst Lechner
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
|
20
|
Dauzhenka T, Kundrotas PJ, Vakser IA. Computational Feasibility of an Exhaustive Search of Side-Chain Conformations in Protein-Protein Docking. J Comput Chem 2018; 39:2012-2021. [PMID: 30226647 DOI: 10.1002/jcc.25381] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2017] [Revised: 03/24/2018] [Accepted: 05/26/2018] [Indexed: 11/07/2022]
Abstract
Protein-protein docking procedures typically perform a global scan of the proteins' relative positions, followed by local refinement of the putative matches. Because of the size of the search space, the global scan is usually implemented as a rigid-body search, using computationally inexpensive intermolecular energy approximations. An adequate refinement has to take into account structural flexibility. Since the refinement performs a conformational search of the interacting proteins, it is extremely computationally challenging, given the enormous number of internal degrees of freedom. Different approaches limit the search space by restricting the search to the side chains, rotameric states, coarse-grained structure representation, principal normal modes, and so on. Still, even with the approximations, the refinement presents an extreme computational challenge due to the very large number of remaining degrees of freedom. Given the complexity of the search space, the advantage of the exhaustive search is obvious. The obstacle to such a search is computational feasibility. However, the growing computational power of modern computers, especially due to the increasing utilization of Graphics Processing Units (GPUs) with large numbers of specialized computing cores, extends the range of applicability of brute-force search methods. This proof-of-concept study demonstrates the computational feasibility of an exhaustive search of side-chain conformations in protein docking. The procedure, implemented on the GPU architecture, was used to generate the optimal conformations in a large representative set of protein-protein complexes. © 2018 Wiley Periodicals, Inc.
Affiliation(s)
- Taras Dauzhenka
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66047
- Petras J Kundrotas
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66047
- Ilya A Vakser
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66047; Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047
|
21
|
Villa F, Panel N, Chen X, Simonson T. Adaptive landscape flattening in amino acid sequence space for the computational design of protein:peptide binding. J Chem Phys 2018; 149:072302. [PMID: 30134674 DOI: 10.1063/1.5022249] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
For the high throughput design of protein:peptide binding, one must explore a vast space of amino acid sequences in search of low binding free energies. This complex problem is usually addressed with either simple heuristic scoring or expensive sequence enumeration schemes. Far more efficient than enumeration is a recent Monte Carlo approach that adaptively flattens the energy landscape in sequence space of the unbound peptide and provides formally exact binding free energy differences. The method allows the binding free energy to be used directly as the design criterion. We propose several improvements that allow still more efficient sampling and can address larger design problems. They include the use of Replica Exchange Monte Carlo and landscape flattening for both the unbound and bound peptides. We used the method to design peptides that bind to the PDZ domain of the Tiam1 signaling protein and could serve as inhibitors of its activity. Four peptide positions were allowed to mutate freely. Almost 75 000 peptide variants were processed in two simulations of 10^9 steps each that used 1 CPU hour on a desktop machine. 96% of the theoretical sequence space was sampled. The relative binding free energies agreed qualitatively with values from experiment. The sampled sequences agreed qualitatively with an experimental library of Tiam1-binding peptides. The main assumption limiting accuracy is the fixed backbone approximation, which could be alleviated in future work by using increased computational resources and multi-backbone designs.
Affiliation(s)
- Francesco Villa
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Nicolas Panel
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Xingyu Chen
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
|
22
|
Abstract
Motivation Multistate protein design addresses real-world challenges, such as multi-specificity design and backbone flexibility, by considering both positive and negative protein states with an ensemble of substates for each. It also presents an enormous challenge to exact algorithms that guarantee the optimal solutions and enable a direct test of mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate protein design. Results We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for multistate protein design. Its generic formulation allows for a wide array of applications such as stability, affinity and specificity designs while addressing concerns such as global flexibility of protein backbones. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a CFN; and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a tree structure of sequences, substates, and conformations. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size to exact methods, it drastically reduces search space and running time to make the problem tractable. Moreover, iCFN generates experimentally-agreeing receptor designs with improved accuracy compared with state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity. Availability and implementation https://shen-lab.github.io/software/iCFN. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mostafa Karimi
- Department of Electrical and Computer Engineering and TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, USA
- Yang Shen
- Department of Electrical and Computer Engineering and TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, USA
|
23
|
Hallen MA, Donald BR. CATS (Coordinates of Atoms by Taylor Series): protein design with backbone flexibility in all locally feasible directions. Bioinformatics 2018; 33:i5-i12. [PMID: 28882005 PMCID: PMC5870559 DOI: 10.1093/bioinformatics/btx277] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Motivation When proteins mutate or bind to ligands, their backbones often move significantly, especially in loop regions. Computational protein design algorithms must model these motions in order to accurately optimize protein stability and binding affinity. However, methods for backbone conformational search in design have been much more limited than for sidechain conformational search. This is especially true for combinatorial protein design algorithms, which aim to search a large sequence space efficiently and thus cannot rely on temporal simulation of each candidate sequence. Results We alleviate this difficulty with a new parameterization of backbone conformational space, which represents all degrees of freedom of a specified segment of protein chain that maintain valid bonding geometry (by maintaining the original bond lengths and angles and ω dihedrals). In order to search this space, we present an efficient algorithm, CATS, for computing atomic coordinates as a function of our new continuous backbone internal coordinates. CATS generalizes the iMinDEE and EPIC protein design algorithms, which model continuous flexibility in sidechain dihedrals, to model continuous, appropriately localized flexibility in the backbone dihedrals ϕ and ψ as well. We show using 81 test cases based on 29 different protein structures that CATS finds sequences and conformations that are significantly lower in energy than methods with less or no backbone flexibility do. In particular, we show that CATS can model the viability of an antibody mutation known experimentally to increase affinity, but that appears sterically infeasible when modeled with less or no backbone flexibility. Availability and implementation Our code is available as free software at https://github.com/donaldlab/OSPREY_refactor. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark A Hallen
- Department of Computer Science, Duke University, Durham, NC, USA; Toyota Technological Institute at Chicago, Chicago, IL, USA
- Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC, USA; Department of Chemistry, Duke University, Durham, NC, USA; Department of Biochemistry, Duke University Medical Center, Durham, NC, USA
|
24
|
Ojewole AA, Jou JD, Fowler VG, Donald BR. BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces. J Comput Biol 2018; 25:726-739. [PMID: 29641249 DOI: 10.1089/cmb.2017.0267] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open
Abstract
Computational protein design (CPD) algorithms that compute binding affinity, Ka, search for sequences with an energetically favorable free energy of binding. Recent work shows that three principles improve the biological accuracy of CPD: ensemble-based design, continuous flexibility of backbone and side-chain conformations, and provable guarantees of accuracy with respect to the input. However, previous methods that use all three design principles are single-sequence (SS) algorithms, which are very costly: linear in the number of sequences and thus exponential in the number of simultaneously mutable residues. To address this computational challenge, we introduce BBK*, a new CPD algorithm whose key innovation is the multisequence (MS) bound: BBK* efficiently computes a single provable upper bound to approximate Ka for a combinatorial number of sequences, and avoids SS computation for all provably suboptimal sequences. Thus, to our knowledge, BBK* is the first provable, ensemble-based CPD algorithm to run in time sublinear in the number of sequences. Computational experiments on 204 protein design problems show that BBK* finds the tightest binding sequences while approximating Ka for up to 10^5-fold fewer sequences than the previous state-of-the-art algorithms, which require exhaustive enumeration of sequences. Furthermore, for 51 protein-ligand design problems, BBK* provably approximates Ka up to 1982-fold faster than the previous state-of-the-art iMinDEE/A*/K* algorithm. Therefore, BBK* not only accelerates protein designs that are possible with previous provable algorithms, but also efficiently performs designs that are too large for previous methods.
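As an illustration of the quantity that K*-style algorithms approximate, the following is a minimal sketch, not the BBK* algorithm itself. It assumes the usual definition of the K* score as a ratio of Boltzmann-weighted partition functions for the bound complex and the two unbound partners, computed here from explicit lists of conformational energies. The function names, the energy unit (kcal/mol), and the value of RT are assumptions made for the example; provable algorithms bound these sums rather than enumerating every conformation.

```python
import math

# Assumed thermal energy RT in kcal/mol near room temperature.
RT = 0.5922


def log_partition_function(energies, rt=RT):
    """Natural log of sum(exp(-E / RT)) over one state's conformations."""
    e_min = min(energies)
    # Factor out the lowest energy so the exponentials cannot overflow.
    shifted = sum(math.exp(-(e - e_min) / rt) for e in energies)
    return -e_min / rt + math.log(shifted)


def log_kstar(complex_energies, protein_energies, ligand_energies, rt=RT):
    """ln K* = ln q_complex - ln q_protein - ln q_ligand (hypothetical helper)."""
    return (
        log_partition_function(complex_energies, rt)
        - log_partition_function(protein_energies, rt)
        - log_partition_function(ligand_energies, rt)
    )
```

With a single conformation per state the score reduces to the plain energy difference divided by RT, which makes a convenient sanity check; with larger ensembles, low-energy conformations near the minimum also contribute, which is the effect ensemble-based design is meant to capture.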
Collapse
Affiliation(s)
- Adegoke A Ojewole
- 1 Department of Computer Science, Duke University , Durham, North Carolina.,2 Computational Biology and Bioinformatics Program, Duke University , Durham, North Carolina
| | - Jonathan D Jou
- 1 Department of Computer Science, Duke University , Durham, North Carolina
| | - Vance G Fowler
- 3 Division of Infectious Diseases, Duke University Medical Center , Durham, North Carolina
| | - Bruce R Donald
- 1 Department of Computer Science, Duke University , Durham, North Carolina.,4 Department of Biochemistry, Duke University Medical Center , Durham North Carolina
| |
|
25
|
Viricel C, de Givry S, Schiex T, Barbe S. Cost function network-based design of protein–protein interactions: predicting changes in binding affinity. Bioinformatics 2018; 34:2581-2589. [DOI: 10.1093/bioinformatics/bty092] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/02/2017] [Accepted: 02/16/2018] [Indexed: 11/14/2022]
Affiliation(s)
- Clément Viricel
- Laboratoire d’Ingénierie des Systèmes Biologiques et des Procédés, Université de Toulouse, CNRS, INRA, INSA, Toulouse, France
- Unité de Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet Tolosan cedex, France
| | - Simon de Givry
- Unité de Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet Tolosan cedex, France
| | - Thomas Schiex
- Unité de Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet Tolosan cedex, France
| | - Sophie Barbe
- Laboratoire d’Ingénierie des Systèmes Biologiques et des Procédés, Université de Toulouse, CNRS, INRA, INSA, Toulouse, France
| |
|
26
|
Traoré S, Allouche D, André I, Schiex T, Barbe S. Deterministic Search Methods for Computational Protein Design. Methods Mol Biol 2017; 1529:107-123. [PMID: 27914047 DOI: 10.1007/978-1-4939-6637-0_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/06/2023]
Abstract
One main challenge in Computational Protein Design (CPD) lies in the exploration of the amino-acid sequence space, while considering, to some extent, side chain flexibility. The exorbitant size of the search space urges for the development of efficient exact deterministic search methods enabling identification of low-energy sequence-conformation models, corresponding either to the global minimum energy conformation (GMEC) or an ensemble of guaranteed near-optimal solutions. In contrast to stochastic local search methods that are not guaranteed to find the GMEC, exact deterministic approaches always identify the GMEC and prove its optimality in finite but exponential worst-case time. After a brief overview on these two classes of methods, we discuss the grounds and merits of four deterministic methods that have been applied to solve CPD problems. These approaches are based either on the Dead-End-Elimination theorem combined with A* algorithm (DEE/A*), on Cost Function Networks algorithms (CFN), on Integer Linear Programming solvers (ILP) or on Markov Random Fields solvers (MRF). The way two of these methods (DEE/A* and CFN) can be used in practice to identify low-energy sequence-conformation models starting from a pairwise decomposed energy matrix is detailed in this review.
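To make the idea of an exact search over a pairwise decomposed energy matrix concrete, here is a minimal sketch of a best-first (A*) search for the GMEC. It is not taken from the chapter and omits Dead-End Elimination entirely. The data layout (`self_e[i][r]` for one-body energies, `pair_e[(i, j)][r][s]` for pairwise energies with `i < j`) and all function names are assumptions chosen for illustration.

```python
import heapq


def partial_energy(assign, self_e, pair_e):
    """Energy of the already-assigned prefix: one-body plus pairwise terms."""
    k = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(k))
    e += sum(pair_e[(i, j)][assign[i]][assign[j]]
             for i in range(k) for j in range(i + 1, k))
    return e


def lower_bound(assign, self_e, pair_e):
    """Admissible bound on the energy contributed by unassigned positions."""
    k, n = len(assign), len(self_e)
    bound = 0.0
    for j in range(k, n):
        bound += min(
            self_e[j][s]
            + sum(pair_e[(i, j)][assign[i]][s] for i in range(k))
            + sum(min(pair_e[(j, m)][s]) for m in range(j + 1, n))
            for s in range(len(self_e[j]))
        )
    return bound


def astar_gmec(self_e, pair_e):
    """Return (energy, rotamer tuple) of the global minimum energy conformation."""
    n = len(self_e)
    heap = [(lower_bound((), self_e, pair_e), ())]
    while heap:
        _, assign = heapq.heappop(heap)
        if len(assign) == n:  # first complete assignment popped is optimal
            return partial_energy(assign, self_e, pair_e), assign
        pos = len(assign)
        for r in range(len(self_e[pos])):
            child = assign + (r,)
            f = (partial_energy(child, self_e, pair_e)
                 + lower_bound(child, self_e, pair_e))
            heapq.heappush(heap, (f, child))
```

Because the bound never overestimates the remaining energy, the first complete assignment removed from the queue is a global minimum. Continuing to pop after that point would yield further conformations in nondecreasing order of energy, which is the property that ensemble-based methods built on A* rely on.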
Affiliation(s)
- Seydou Traoré
- INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France
- Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France
- CNRS, UMR5504, 31400, Toulouse, France
| | - David Allouche
- Unité de Mathématiques et Informatique de Toulouse, UR 875, INRA, 31320, Castanet Tolosan, France
| | - Isabelle André
- INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France
- Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France
- CNRS, UMR5504, 31400, Toulouse, France
| | - Thomas Schiex
- Unité de Mathématiques et Informatique de Toulouse, UR 875, INRA, 31320, Castanet Tolosan, France
| | - Sophie Barbe
- INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France
- Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France
- CNRS, UMR5504, 31400, Toulouse, France
| |
|
27
|
Hallen MA, Jou JD, Donald BR. LUTE (Local Unpruned Tuple Expansion): Accurate Continuously Flexible Protein Design with General Energy Functions and Rigid Rotamer-Like Efficiency. J Comput Biol 2016; 24:536-546. [PMID: 27681371 DOI: 10.1089/cmb.2016.0136] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 01/03/2023]
Abstract
Most protein design algorithms search over discrete conformations and an energy function that is residue-pairwise, that is, a sum of terms that depend on the sequence and conformation of at most two residues. Although modeling of continuous flexibility and of non-residue-pairwise energies significantly increases the accuracy of protein design, previous methods to model these phenomena add a significant asymptotic cost to design calculations. We now remove this cost by modeling continuous flexibility and non-residue-pairwise energies in a form suitable for direct input to highly efficient, discrete combinatorial optimization algorithms such as DEE/A* or branch-width minimization. Our novel algorithm performs a local unpruned tuple expansion (LUTE), which can efficiently represent both continuous flexibility and general, possibly nonpairwise energy functions to an arbitrary level of accuracy using a discrete energy matrix. We show using 47 design calculation test cases that LUTE provides a dramatic speedup in both single-state and multistate continuously flexible designs.
Affiliation(s)
- Mark A Hallen
- 1 Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina
| | - Jonathan D Jou
- 1 Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina
| | - Bruce R Donald
- 1 Department of Computer Science, Levine Science Research Center, Duke University, Durham, North Carolina; 2 Department of Chemistry, Duke University, Durham, North Carolina; 3 Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
| |
|
28
|
Pan Y, Dong Y, Zhou J, Hallen M, Donald BR, Zeng J, Xu W. cOSPREY: A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design. J Comput Biol 2016; 23:737-49. [PMID: 27154509 PMCID: PMC5586165 DOI: 10.1089/cmb.2015.0234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/14/2023]
Abstract
Finding the global minimum energy conformation (GMEC) of a huge combinatorial search space is the key challenge in computational protein design (CPD) problems. Traditional algorithms lack a scalable and efficient distributed design scheme, preventing researchers from taking full advantage of current cloud infrastructures. We design cloud OSPREY (cOSPREY), an extension to a widely used protein design software OSPREY, to allow the original design framework to scale to the commercial cloud infrastructures. We propose several novel designs to integrate both algorithm and system optimizations, such as GMEC-specific pruning, state search partitioning, asynchronous algorithm state sharing, and fault tolerance. We evaluate cOSPREY on three different cloud platforms using different technologies and show that it can solve a number of large-scale protein design problems that have not been possible with previous approaches.
Affiliation(s)
- Yuchao Pan
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Yuxi Dong
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Jingtian Zhou
- Department of Pharmacology and Pharmaceutical Sciences, Tsinghua University, Beijing, China
| | - Mark Hallen
- Department of Computer Science, Duke University, Durham, North Carolina
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
| | - Bruce R. Donald
- Department of Computer Science, Duke University, Durham, North Carolina
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Wei Xu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| |
|
29
|
Gainza P, Nisonoff HM, Donald BR. Algorithms for protein design. Curr Opin Struct Biol 2016; 39:16-26. [PMID: 27086078 PMCID: PMC5065368 DOI: 10.1016/j.sbi.2016.03.006] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 11/17/2015] [Revised: 03/15/2016] [Accepted: 03/22/2016] [Indexed: 02/05/2023]
Abstract
Computational structure-based protein design programs are becoming an increasingly important tool in molecular biology. These programs compute protein sequences that are predicted to fold to a target structure and perform a desired function. The success of a program's predictions largely relies on two components: first, the input biophysical model, and second, the algorithm that computes the best sequence(s) and structure(s) according to the biophysical model. Improving both the model and the algorithm in tandem is essential to improving the success rate of current programs, and here we review recent developments in algorithms for protein design, emphasizing how novel algorithms enable the use of more accurate biophysical models. We conclude with a list of algorithmic challenges in computational protein design that we believe will be especially important for the design of therapeutic proteins and protein assemblies.
Affiliation(s)
- Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC, United States
| | - Hunter M Nisonoff
- Department of Computer Science, Duke University, Durham, NC, United States
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC, United States; Department of Biochemistry, Duke University Medical Center, Durham, NC, United States; Department of Chemistry, Duke University, Durham, NC, United States.
| |
|
30
|
Zhou Y, Wu Y, Zeng J. Computational Protein Design Using AND/OR Branch-and-Bound Search. J Comput Biol 2016; 23:439-51. [DOI: 10.1089/cmb.2015.0212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/04/2023]
Affiliation(s)
- Yichao Zhou
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Yuexin Wu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| |
|
31
|
Mignon D, Simonson T. Comparing three stochastic search algorithms for computational protein design: Monte Carlo, replica exchange Monte Carlo, and a multistart, steepest-descent heuristic. J Comput Chem 2016; 37:1781-93. [PMID: 27197555 DOI: 10.1002/jcc.24393] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/03/2015] [Revised: 02/26/2016] [Accepted: 03/27/2016] [Indexed: 01/11/2023]
Abstract
Computational protein design depends on an energy function and an algorithm to search the sequence/conformation space. We compare three stochastic search algorithms: a heuristic, Monte Carlo (MC), and a Replica Exchange Monte Carlo method (REMC). The heuristic performs a steepest-descent minimization starting from thousands of random starting points. The methods are applied to nine test proteins from three structural families, with a fixed backbone structure, a molecular mechanics energy function, and with 1, 5, 10, 20, 30, or all amino acids allowed to mutate. Results are compared to an exact, "Cost Function Network" method that identifies the global minimum energy conformation (GMEC) in favorable cases. The designed sequences accurately reproduce experimental sequences in the hydrophobic core. The heuristic and REMC agree closely and reproduce the GMEC when it is known, with a few exceptions. Plain MC performs well for most cases, occasionally departing from the GMEC by 3-4 kcal/mol. With REMC, the diversity of the sequences sampled agrees with exact enumeration where the latter is possible: up to 2 kcal/mol above the GMEC. Beyond, room temperature replicas sample sequences up to 10 kcal/mol above the GMEC, providing thermal averages and a solution to the inverse protein folding problem. © 2016 Wiley Periodicals, Inc.
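For readers unfamiliar with the plain Monte Carlo variant compared here, the following is a minimal sketch of Metropolis sampling over discrete rotamer choices with a pairwise energy. It is an illustration only, not the authors' implementation: the data layout (`self_e[i][r]`, `pair_e[(i, j)][r][s]` with `i < j`), the function names, the step count, and the RT value are all assumptions, and neither replica exchange nor the steepest-descent heuristic is shown.

```python
import math
import random


def total_energy(assign, self_e, pair_e):
    """One-body plus pairwise energy of a complete rotamer assignment."""
    n = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(n))
    e += sum(pair_e[(i, j)][assign[i]][assign[j]]
             for i in range(n) for j in range(i + 1, n))
    return e


def metropolis(self_e, pair_e, steps=2000, rt=0.6, seed=0):
    """Return (lowest energy seen, its assignment) after a Metropolis walk."""
    rng = random.Random(seed)
    current = [rng.randrange(len(options)) for options in self_e]
    e_cur = total_energy(current, self_e, pair_e)
    best, e_best = tuple(current), e_cur
    for _ in range(steps):
        # Propose changing the rotamer at one randomly chosen position.
        pos = rng.randrange(len(current))
        trial = current[:]
        trial[pos] = rng.randrange(len(self_e[pos]))
        e_trial = total_energy(trial, self_e, pair_e)
        delta = e_trial - e_cur
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / rt):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = tuple(current), e_cur
    return e_best, best
```

Unlike an exact method, this walk carries no guarantee of reaching the GMEC. The result can only be checked against the global minimum when an exact solver or exhaustive enumeration is feasible, which is the kind of comparison the abstract reports.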
Affiliation(s)
- David Mignon
- Laboratoire De Biochimie (UMR CNRS 7654), Department Of Biology, Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire De Biochimie (UMR CNRS 7654), Department Of Biology, Ecole Polytechnique, Palaiseau, France
| |
|
32
|
Traoré S, Roberts KE, Allouche D, Donald BR, André I, Schiex T, Barbe S. Fast search algorithms for computational protein design. J Comput Chem 2016; 37:1048-58. [PMID: 26833706 PMCID: PMC4828276 DOI: 10.1002/jcc.24290] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 04/14/2015] [Revised: 09/23/2015] [Accepted: 11/27/2015] [Indexed: 12/12/2022]
Abstract
One of the main challenges in computational protein design (CPD) is the huge size of the protein sequence and conformational space that has to be computationally explored. Recently, we showed that state-of-the-art combinatorial optimization technologies based on Cost Function Network (CFN) processing allow speeding up provable rigid backbone protein design methods by several orders of magnitudes. Building up on this, we improved and injected CFN technology into the well-established CPD package Osprey to allow all Osprey CPD algorithms to benefit from associated speedups. Because Osprey fundamentally relies on the ability of A* to produce conformations in increasing order of energy, we defined new A* strategies combining CFN lower bounds, with new side-chain positioning-based branching scheme. Beyond the speedups obtained in the new A*-CFN combination, this novel branching scheme enables a much faster enumeration of suboptimal sequences, far beyond what is reachable without it. Together with the immediate and important speedups provided by CFN technology, these developments directly benefit to all the algorithms that previously relied on the DEE/ A* combination inside Osprey* and make it possible to solve larger CPD problems with provable algorithms.
Affiliation(s)
- Seydou Traoré
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
- INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- CNRS, UMR5504, F-31400 Toulouse, France
| | - Kyle E. Roberts
- Department of Biochemistry, Department of Computer Science, Department of Chemistry, Duke University, Durham, NC, USA
| | - David Allouche
- Unité de Mathématiques et Informatique Appliquées de Toulouse, UR 875, INRA, F-31320 Castanet Tolosan, France
| | - Bruce R. Donald
- Department of Biochemistry, Department of Computer Science, Department of Chemistry, Duke University, Durham, NC, USA
| | - Isabelle André
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
- INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- CNRS, UMR5504, F-31400 Toulouse, France
| | - Thomas Schiex
- Unité de Mathématiques et Informatique Appliquées de Toulouse, UR 875, INRA, F-31320 Castanet Tolosan, France
| | - Sophie Barbe
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
- INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- CNRS, UMR5504, F-31400 Toulouse, France
| |
|
33
|
Gaillard T, Panel N, Simonson T. Protein side chain conformation predictions with an MMGBSA energy function. Proteins 2016; 84:803-19. [PMID: 26948696 DOI: 10.1002/prot.25030] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/02/2015] [Revised: 02/22/2016] [Accepted: 02/27/2016] [Indexed: 12/17/2022]
Abstract
The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803-819. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Thomas Gaillard
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Nicolas Panel
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Thomas Simonson
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| |
|
34
|
Maximova T, Moffatt R, Ma B, Nussinov R, Shehu A. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics. PLoS Comput Biol 2016; 12:e1004619. [PMID: 27124275 PMCID: PMC4849799 DOI: 10.1371/journal.pcbi.1004619] [Citation(s) in RCA: 144] [Impact Index Per Article: 16.0] [Indexed: 01/08/2023]
Abstract
Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
Affiliation(s)
- Tatiana Maximova
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Ryan Moffatt
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Buyong Ma
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
| | - Ruth Nussinov
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Department of Bioengineering, George Mason University, Fairfax, Virginia, United States of America
- School of Systems Biology, George Mason University, Manassas, Virginia, United States of America
| |
|
35
|
Simoncini D, Allouche D, de Givry S, Delmas C, Barbe S, Schiex T. Guaranteed Discrete Energy Optimization on Large Protein Design Problems. J Chem Theory Comput 2015; 11:5980-9. [DOI: 10.1021/acs.jctc.5b00594] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 11/28/2022]
Affiliation(s)
| | - David Allouche
- INRA MIAT, UR 875, Castanet-Tolosan, 31326 Cedex, France
| | - Simon de Givry
- INRA MIAT, UR 875, Castanet-Tolosan, 31326 Cedex, France
| | - Céline Delmas
- INRA MIAT, UR 875, Castanet-Tolosan, 31326 Cedex, France
| | - Sophie Barbe
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
- CNRS, UMR5504, F-31400 Toulouse, France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
| | - Thomas Schiex
- INRA MIAT, UR 875, Castanet-Tolosan, 31326 Cedex, France
| |
|
36
|
Roberts KE, Gainza P, Hallen MA, Donald BR. Fast gap-free enumeration of conformations and sequences for protein design. Proteins 2015; 83:1859-1877. [PMID: 26235965 DOI: 10.1002/prot.24870] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/24/2015] [Revised: 07/14/2015] [Accepted: 07/21/2015] [Indexed: 12/12/2022]
Abstract
Despite significant successes in structure-based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest-energy structures and sequences are found. DEE/A*-based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap-free list of low-energy protein conformations, which is necessary for ensemble-based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*-based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs.
Affiliation(s)
- Kyle E Roberts
- Department of Computer Science, Duke University, Durham, NC
| | - Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC
| | - Mark A Hallen
- Department of Computer Science, Duke University, Durham, NC
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC; Department of Biochemistry, Duke University Medical Center, Durham, NC; Department of Chemistry, Duke University, Durham, NC
| |
|
37
|
Roberts KE, Donald BR. Improved energy bound accuracy enhances the efficiency of continuous protein design. Proteins 2015; 83:1151-64. [PMID: 25846627 DOI: 10.1002/prot.24808] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 03/24/2015] [Accepted: 03/24/2015] [Indexed: 11/07/2022]
Abstract
Flexibility and dynamics are important for protein function and a protein's ability to accommodate amino acid substitutions. However, when computational protein design algorithms search over protein structures, the allowed flexibility is often reduced to a relatively small set of discrete side-chain and backbone conformations. While simplifications in scoring functions and protein flexibility are currently necessary to computationally search the vast protein sequence and conformational space, a rigid representation of a protein causes the search to become brittle and miss low-energy structures. Continuous rotamers more closely represent the allowed movement of a side chain within its torsional well and have been successfully incorporated into the protein design framework to design biomedically relevant protein systems. The use of continuous rotamers in protein design enables algorithms to search a larger conformational space than previously possible, but adds additional complexity to the design search. To design large, complex systems with continuous rotamers, new algorithms are needed to increase the efficiency of the search. We present two methods, PartCR and HOT, that greatly increase the speed and efficiency of protein design with continuous rotamers. These methods specifically target the large errors in energetic terms that are used to bound pairwise energies during the design search. By tightening the energy bounds, additional pruning of the conformation space can be achieved, and the number of conformations that must be enumerated to find the global minimum energy conformation is greatly reduced.
Affiliation(s)
- Kyle E Roberts
- Department of Computer Science, Duke University, Durham, North Carolina
| | - Bruce R Donald
- Department of Computer Science, Duke University, Durham, North Carolina; Department of Biochemistry, Duke University Medical Center, Durham, North Carolina; Department of Chemistry, Duke University, Durham, North Carolina
| |
|
38
|
Approximate Counting with Deterministic Guarantees for Affinity Computation. Advances in Intelligent Systems and Computing 2015. [DOI: 10.1007/978-3-319-18167-7_15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
|
39
|
Allouche D, André I, Barbe S, Davies J, de Givry S, Katsirelos G, O'Sullivan B, Prestwich S, Schiex T, Traoré S. Computational protein design as an optimization problem. Artif Intell 2014. [DOI: 10.1016/j.artint.2014.03.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/23/2022]
|