1
Almubarak HF, Tan W, Hoffmann AD, Sun Y, Wei J, El-Shennawy L, Squires JR, Dashzeveg NK, Simonton B, Jia Y, Iyer R, Xu Y, Nicolaescu V, Elli D, Randall GC, Schipma MJ, Swaminathan S, Ison MG, Liu H, Fang D, Shen Y. Novel antibody language model accelerates IgG screening and design for broad-spectrum antiviral therapy. bioRxiv 2024:2024.03.01.582176. [PMID: 38496411 PMCID: PMC10942297 DOI: 10.1101/2024.03.01.582176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Therapeutic antibodies have become one of the most influential classes of therapeutics in modern medicine for fighting infectious pathogens, cancer, and many other diseases. However, experimental screening for highly efficacious targeting antibodies is labor-intensive and costly, a burden exacerbated by antigen targets that evolve under selective pressure, such as fast-mutating viral variants. As a proof of concept, we developed a machine learning-assisted antibody generation pipeline, AbGen, that greatly accelerates the screening and redesign of immunoglobulin G antibodies (IgGs) against a broad spectrum of SARS-CoV-2 variant strains. AbGen centers on a novel antibody language model (AbLM) that is pretrained on 12 million generic protein domain sequences and fine-tuned on 4,000+ paired VH-VL sequences, with IgG-specific CDR masking and VH-VL cross-attention. AbLM provides a latent space of IgG sequence embeddings for AbGen, in which (a) landscapes of IgG activity in neutralizing the wild-type virus are analyzed through structure prediction for IgGs and for IgG-antigen interactions (the antigen being the receptor-binding domain, RBD, of the viral spike protein); and (b) landscapes of IgG susceptibility to viral variants are predicted through Gaussian process regression, even though responses to variants of concern are available for as few as 14 clinical antibodies. The AbGen pipeline was applied to over 1,300 IgG sequences that we collected from RBD-binding B cells of convalescent patients. With experimental validation, AbGen efficiently prioritized IgG candidates against a broad spectrum of viral variants (wild-type, Delta, and Omicron) that prevented infection of host cells in vitro and of hACE2 transgenic mice in vivo. Compared with existing protein language models that require 10-100 times more model parameters, AbLM improved the precision of predicting IgGs with low variant susceptibility from around 50% to 75%. Furthermore, AbGen enables structure-based computational protein redesign of selected IgG clones: single amino acid substitutions at the RBD-binding interface doubled the IgG blockade efficacy against one of the severe, therapy-resistant strains, Delta (B.1.617). Our work expedites applications of artificial intelligence in antibody screening and redesign by combining data-driven protein language models and Kriging for antibody sequence analysis and activity prediction with physics-driven protein docking and design for antibody-antigen interface analysis and functional optimization.
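The Gaussian process regression (Kriging) step mentioned in this abstract can be illustrated with a minimal sketch. It is not the AbGen implementation: the 2-D "embeddings", the susceptibility values, the RBF kernel, its length scale, and the noise level are all assumptions chosen for illustration, and a zero-mean prior is used.

```python
import math

def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel between two embedding vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gp_posterior(train_x, train_y, query, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at one query point."""
    n = len(train_x)
    k = [[rbf(train_x[i], train_x[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(query, x) for x in train_x]
    alpha = solve(k, train_y)            # K^-1 y
    v = solve(k, k_star)                 # K^-1 k*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(query, query) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var

# Hypothetical embeddings of three labeled antibodies and their susceptibility scores.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
train_y = [0.2, 0.8, 0.5]
near_mean, near_var = gp_posterior(train_x, train_y, [1.0, 0.0])
far_mean, far_var = gp_posterior(train_x, train_y, [10.0, 10.0])
```

Near a labeled point the posterior mean tracks the observed value with low variance. Far from all labeled points the variance returns to the prior, which is how such a model signals low confidence when only a handful of labeled antibodies are available.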
Affiliation(s)
- Hannah Faisal Almubarak
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
  - Driskill Graduate Program, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Wuwei Tan
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Andrew D. Hoffmann
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Juncheng Wei
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Lamiaa El-Shennawy
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Joshua R. Squires
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Nurmaa K. Dashzeveg
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Brooke Simonton
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Yuzhi Jia
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Radhika Iyer
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Yanan Xu
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Vlad Nicolaescu
  - Howard T. Ricketts Laboratory and Department of Microbiology, the University of Chicago, Chicago, IL 60637
- Derek Elli
  - Howard T. Ricketts Laboratory and Department of Microbiology, the University of Chicago, Chicago, IL 60637
- Glenn C. Randall
  - Howard T. Ricketts Laboratory and Department of Microbiology, the University of Chicago, Chicago, IL 60637
- Matthew J. Schipma
  - NUseq Core Facility, Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Suchitra Swaminathan
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
  - Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Huiping Liu
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
  - Division of Hematology and Oncology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Deyu Fang
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA 60611
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
2
Irani S, Tan W, Li Q, Toy W, Jones C, Gadiya M, Marra A, Katzenellenbogen JA, Carlson KE, Katzenellenbogen BS, Karimi M, Segu Rajappachetty R, Del Priore IS, Reis-Filho JS, Shen Y, Chandarlapaty S. Somatic estrogen receptor α mutations that induce dimerization promote receptor activity and breast cancer proliferation. J Clin Invest 2024; 134:e163242. [PMID: 37883178 PMCID: PMC10760953 DOI: 10.1172/jci163242] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/06/2022] [Accepted: 10/23/2023] [Indexed: 10/27/2023]
Abstract
Physiologic activation of estrogen receptor α (ERα) is mediated by estradiol (E2) binding in the ligand-binding pocket of the receptor, repositioning helix 12 (H12) to facilitate binding of coactivator proteins in the unoccupied coactivator binding groove. In breast cancer, activation of ERα is often observed through point mutations that lead to the same H12 repositioning in the absence of E2. Through expanded genetic sequencing of breast cancer patients, we identified a collection of mutations located far from H12 but nonetheless capable of promoting E2-independent transcription and breast cancer cell growth. Using machine learning and computational structure analyses, this set of mutants was inferred to act distinctly from the H12-repositioning mutants and instead was associated with conformational changes across the ERα dimer interface. Through both in vitro and in-cell assays of full-length ERα protein and isolated ligand-binding domain, we found that these mutants promoted ERα dimerization, stability, and nuclear localization. Point mutations that selectively disrupted dimerization abrogated E2-independent transcriptional activity of these dimer-promoting mutants. The results reveal a distinct mechanism for activation of ERα function through enforced receptor dimerization and suggest dimer disruption as a potential therapeutic strategy to treat ER-dependent cancers.
Affiliation(s)
- Seema Irani
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Wuwei Tan
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
- Qing Li
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Weiyi Toy
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Catherine Jones
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Mayur Gadiya
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Antonio Marra
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- John A. Katzenellenbogen
  - Department of Chemistry and Molecular and Integrative Physiology, and the Cancer Center, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Kathryn E. Carlson
  - Department of Chemistry and Molecular and Integrative Physiology, and the Cancer Center, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Benita S. Katzenellenbogen
  - Department of Chemistry and Molecular and Integrative Physiology, and the Cancer Center, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Mostafa Karimi
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
- Ramya Segu Rajappachetty
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Isabella S. Del Priore
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Jorge S. Reis-Filho
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
  - Department of Computer Science and Engineering and
  - Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, Texas, USA
- Sarat Chandarlapaty
  - Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
  - Weill Cornell Medical College, New York, New York, USA
3
Talluri S. Algorithms for protein design. Adv Protein Chem Struct Biol 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
  - Department of Biotechnology, GITAM, Visakhapatnam, India
4
Ayadi Z, Boulila W, Farah IR, Leborgne A, Gançarski P. Resolution methods for constraint satisfaction problem in remote sensing field: A survey of static and dynamic algorithms. Ecol Inform 2022. [DOI: 10.1016/j.ecoinf.2022.101607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
5
Bouchiba Y, Ruffini M, Schiex T, Barbe S. Computational Design of Miniprotein Binders. Methods Mol Biol 2022; 2405:361-382. [PMID: 35298822 DOI: 10.1007/978-1-0716-1855-4_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Miniprotein binders are of great interest as a class of drugs that bridges the gap between monoclonal antibodies and small-molecule drugs. Like monoclonal antibodies, they can be designed to bind therapeutic targets with high affinity, but they are more stable and easier to produce and administer. In this chapter, we present a generic structure-based computational approach for miniprotein inhibitor design. Specifically, we describe step by step the implementation of the approach for the design of miniprotein binders against SARS-CoV-2, using available structural data on the SARS-CoV-2 spike receptor-binding domain (RBD) in interaction with its native target, the human receptor ACE2. As structural data become increasingly accessible for many protein-protein interaction systems, this method might be applied to the design of miniprotein binders against numerous therapeutic targets. The computational pipeline exploits provable and deterministic artificial intelligence-based protein design methods, with some recent additions in terms of binding energy estimation, multistate design, and diverse library generation.
Affiliation(s)
- Younes Bouchiba
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
- Manon Ruffini
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, Toulouse, France
- Thomas Schiex
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, Toulouse, France
- Sophie Barbe
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
6
Opuu V, Mignon D, Simonson T. Knowledge-Based Unfolded State Model for Protein Design. Methods Mol Biol 2022; 2405:403-424. [PMID: 35298824 DOI: 10.1007/978-1-0716-1855-4_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The design of proteins and miniproteins is an important challenge. Designed variants should be stable, meaning the folded/unfolded free energy difference should be large enough. Thus, the unfolded state plays a central role. An extended peptide model is often used, where side chains interact with solvent and nearby backbone, but not each other. The unfolded energy is then a function of sequence composition only and can be empirically parametrized. If the space of sequences is explored with a Monte Carlo procedure, protein variants will be sampled according to a well-defined Boltzmann probability distribution. We can then choose unfolded model parameters to maximize the probability of sampling native-like sequences. This leads to a well-defined maximum likelihood framework. We present an iterative algorithm that follows the likelihood gradient. The method is presented in the context of our Proteus software, as a detailed downloadable tutorial. The unfolded model is combined with a folded model that uses molecular mechanics and a Generalized Born solvent. It was optimized for three PDZ domains and then used to redesign them. The sequences sampled are native-like and similar to a recent PDZ design study that was experimentally validated.
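The maximum-likelihood idea in this abstract can be sketched on a toy problem. The code below is not the Proteus implementation: the three-letter alphabet, the additive stand-in for the folded-state energy, the target composition, and the step size are all assumptions, and the Boltzmann averages are computed by exact enumeration where a real design run would estimate them by Monte Carlo. The unfolded energy is a sum of per-type reference energies, and each reference energy is moved along the likelihood gradient, which is proportional to the difference between the target (native-like) and sampled frequencies of that amino acid type.

```python
import itertools
import math

ALPHABET = "AGL"   # toy three-letter alphabet (assumption)
LENGTH = 4         # toy sequence length (assumption)
KT = 0.6           # kcal/mol, roughly room temperature

def folded_energy(seq):
    """Hypothetical stand-in for a physics-based folded-state energy."""
    toy = {"A": -1.0, "G": 0.5, "L": -2.0}
    return sum(toy[a] for a in seq)

def boltzmann_composition(ref):
    """Mean composition under p(S) ~ exp(-(E_folded(S) - E_unfolded(S)) / kT)."""
    z = 0.0
    mean = {a: 0.0 for a in ALPHABET}
    for seq in itertools.product(ALPHABET, repeat=LENGTH):
        e_unfolded = sum(ref[a] for a in seq)   # depends on composition only
        w = math.exp(-(folded_energy(seq) - e_unfolded) / KT)
        z += w
        for a in ALPHABET:
            mean[a] += w * seq.count(a) / LENGTH
    return {a: m / z for a, m in mean.items()}

def fit_reference_energies(target, steps=500, rate=0.5):
    """Gradient ascent on the likelihood of the target composition."""
    ref = {a: 0.0 for a in ALPHABET}
    for _ in range(steps):
        model = boltzmann_composition(ref)
        for a in ALPHABET:
            # Raising ref[a] stabilizes sequences rich in a relative to the unfolded state.
            ref[a] += rate * (target[a] - model[a])
    return ref

target = {"A": 0.5, "G": 0.3, "L": 0.2}   # hypothetical native-like composition
ref = fit_reference_energies(target)
fitted = boltzmann_composition(ref)
```

After fitting, the sampled composition matches the target composition. Only differences between reference energies are determined, since adding a constant to all of them shifts every sequence of a given length equally.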
Affiliation(s)
- Vaitea Opuu
  - Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- David Mignon
  - Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
7
Nazet J, Lang E, Merkl R. Rosetta:MSF:NN: Boosting performance of multi-state computational protein design with a neural network. PLoS One 2021; 16:e0256691. [PMID: 34437621 PMCID: PMC8389498 DOI: 10.1371/journal.pone.0256691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/23/2020] [Accepted: 08/12/2021] [Indexed: 12/05/2022]
Abstract
Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites like Rosetta propose sequences that introduce the desired properties. Challenging design problems necessitate the representation of a protein by means of a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed wherein each state represents one protein conformation. Computational demands of MSD protocols are high, because for each of the candidate sequences a costly three-dimensional (3D) model has to be created and assessed for all states. Each of these scores contributes one data point to a complex, design-specific energy landscape. As neural networks (NNs) have proved well-suited to learning such solution spaces, we integrated one into the framework Rosetta:MSF in place of the genetic algorithm used so far, with the aim of reducing computational costs. Like its predecessor, Rosetta:MSF:NN administers a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores is used to re-train NNs that possess a design-specific architecture. The enormous speed of the NNs allows an extensive assessment of alternative sequences, which are ranked on the scores predicted by the NN. Costly 3D models are computed only for a small fraction of best-scoring sequences; these and the corresponding 3D-based scores replace half of the candidate sequences during each iteration. The analysis of two sets of candidate sequences generated for a specific design problem by means of a genetic algorithm confirmed that the NN predicted 3D-based scores quite well; the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark consisting of 16 ligand-binding problems showed that this protocol converges ten times faster than the genetic algorithm and finds sequences with comparable scores.
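The iterative loop described in this abstract can be sketched as follows. This is not Rosetta:MSF:NN code and uses no Rosetta API: the scoring function is a toy stand-in for a costly 3D-model-based score, and a simple additive model trained by stochastic gradient descent stands in for the design-specific neural network. All names and parameter values are hypothetical.

```python
import random

ALPHABET = "ACDE"     # toy alphabet (assumption)
TARGET = "ACDEAC"     # hidden optimum of the toy score (assumption)

def expensive_score(seq):
    """Hypothetical stand-in for a costly 3D-model-based score (lower is better)."""
    return float(sum(a != b for a, b in zip(seq, TARGET)))

class AdditiveSurrogate:
    """Per-position additive model; stands in for the neural network."""
    def __init__(self):
        self.bias = 0.0
        self.weights = {}   # (position, residue) -> contribution

    def predict(self, seq):
        return self.bias + sum(self.weights.get(f, 0.0) for f in enumerate(seq))

    def fit(self, scored, epochs=50, rate=0.05):
        for _ in range(epochs):
            for seq, score in scored.items():
                error = self.predict(seq) - score
                self.bias -= rate * error
                for f in enumerate(seq):
                    self.weights[f] = self.weights.get(f, 0.0) - rate * error

def mutate(seq, rng):
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]

def surrogate_search(iterations=10, pop_size=8, proposals=200, seed=0):
    rng = random.Random(seed)
    scored = {}   # every sequence scored with the expensive function so far
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    for seq in population:
        scored[seq] = expensive_score(seq)
    history = [min(scored[s] for s in population)]
    for _ in range(iterations):
        model = AdditiveSurrogate()
        model.fit(scored)                                  # retrain on all scored sequences
        proposed = [mutate(rng.choice(population), rng) for _ in range(proposals)]
        fresh = [s for s in dict.fromkeys(proposed) if s not in scored]
        newcomers = sorted(fresh, key=model.predict)[: pop_size // 2]
        for seq in newcomers:                              # expensive scoring only for the top few
            scored[seq] = expensive_score(seq)
        survivors = sorted(population, key=scored.get)[: pop_size - len(newcomers)]
        population = survivors + newcomers                 # up to half the population is replaced
        history.append(min(scored[s] for s in population))
    return population, history, len(scored)

population, history, n_expensive = surrogate_search()
```

The point of the design is that the cheap model ranks hundreds of proposals per iteration, while the expensive function is called for at most half a population's worth of sequences per iteration.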
Affiliation(s)
- Julian Nazet
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Elmar Lang
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Rainer Merkl
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
8
Bouchiba Y, Cortés J, Schiex T, Barbe S. Molecular flexibility in computational protein design: an algorithmic perspective. Protein Eng Des Sel 2021; 34:6271252. [PMID: 33959778 DOI: 10.1093/protein/gzab011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/21/2020] [Revised: 03/12/2021] [Accepted: 03/29/2021] [Indexed: 12/19/2022]
Abstract
Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.
Affiliation(s)
- Younes Bouchiba
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Juan Cortés
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Thomas Schiex
  - Université de Toulouse, ANITI, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
- Sophie Barbe
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
9
Karimi M, Zhu S, Cao Y, Shen Y. De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks. J Chem Inf Model 2020; 60:5667-5681. [PMID: 32945673 PMCID: PMC7775287 DOI: 10.1021/acs.jcim.0c00593] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 01/04/2023]
Abstract
Although massive data on protein sequence and structure are accumulating quickly, there is only a small number of protein architectural types (or structural folds). This study addresses the following question: how well could one reveal underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? In response to the question, we have developed novel deep generative models, namely, semisupervised gcWGAN (guided, conditional, Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design qualities, we build our models on conditional Wasserstein GAN (WGAN) that uses Wasserstein distance in the loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the conditional input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into WGAN as a loss to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds compared to a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one not even part of basis folds, gcWGAN designs have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign), through generating design seeds and tailoring design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data. Data, source codes, and trained models are available at https://github.com/Shen-Lab/gcWGAN.
Affiliation(s)
- Mostafa Karimi
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
  - TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
- Shaowen Zhu
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yue Cao
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
  - TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
10
Adaptive landscape flattening allows the design of both enzyme:substrate binding and catalytic power. PLoS Comput Biol 2020; 16:e1007600. [PMID: 31917825 PMCID: PMC7041857 DOI: 10.1371/journal.pcbi.1007600] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/20/2019] [Revised: 02/25/2020] [Accepted: 12/11/2019] [Indexed: 01/30/2023]
Abstract
Designed enzymes are of fundamental and technological interest. Experimental directed evolution still has significant limitations, and computational approaches are a complementary route. A designed enzyme should satisfy multiple criteria: stability, substrate binding, transition state binding. Such multi-objective design is computationally challenging. Two recent studies used adaptive importance sampling Monte Carlo to redesign proteins for ligand binding. By first flattening the energy landscape of the apo protein, they obtained positive design for the bound state and negative design for the unbound. We have now extended the method to design an enzyme for specific transition state binding, i.e., for its catalytic power. We considered methionyl-tRNA synthetase (MetRS), which attaches methionine (Met) to its cognate tRNA, establishing codon identity. Previously, MetRS and other synthetases have been redesigned by experimental directed evolution to accept noncanonical amino acids as substrates, leading to genetic code expansion. Here, we have redesigned MetRS computationally to bind several ligands: the Met analog azidonorleucine, methionyl-adenylate (MetAMP), and the activated ligands that form the transition state for MetAMP production. Enzyme mutants known to have azidonorleucine activity were recovered by the design calculations, and 17 mutants predicted to bind MetAMP were characterized experimentally and all found to be active. Mutants predicted to have low activation free energies for MetAMP production were found to be active and the predicted reaction rates agreed well with the experimental values. We suggest the present method should become the paradigm for computational enzyme design.
Designed enzymes are of major interest. Experimental directed evolution still has significant limitations, and computational approaches are another route. Enzymes must be stable, bind substrates, and be powerful catalysts. It is challenging to design for all these properties. A method to design substrate binding was proposed recently. It used an adaptive Monte Carlo method to explore mutations of a few amino acids near the substrate. A bias energy was gradually “learned” such that, in the absence of the ligand, the simulation visited most of the possible protein mutations with comparable probabilities. Remarkably, a simulation of the protein:ligand complex, including the bias, will then preferentially sample tight-binding sequences. We generalized the method to design binding specificity. We tested it for the methionyl-tRNA synthetase enzyme, which has been engineered in order to expand the genetic code. We redesigned the enzyme to obtain variants with low activation free energies for the catalytic step. The variants proposed by the simulations were shown experimentally to be active, and the predicted activation free energies were in reasonable agreement with the experimental values. We expect the new method will become the paradigm for computational enzyme design.
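The flattening argument can be made concrete with a small numerical sketch. It is a deterministic, mean-field analogue of the adaptive scheme, not the authors' Monte Carlo method: the four variants and their energies are invented, and instead of incrementing the bias at each visited state during a stochastic simulation, every state's bias is incremented in proportion to its current Boltzmann probability.

```python
import math

KT = 0.6  # kcal/mol, roughly room temperature

# Hypothetical energies (kcal/mol) of four sequence variants, without and with ligand.
E_APO = {"WT": -3.0, "M1": -1.0, "M2": 0.5, "M3": -2.0}
E_HOLO = {"WT": -7.0, "M1": -6.5, "M2": -2.0, "M3": -4.0}

def boltzmann(energies):
    """Normalized Boltzmann probabilities, shifted by the minimum for numerical safety."""
    lowest = min(energies.values())
    weights = {s: math.exp(-(e - lowest) / KT) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

def flatten(apo, steps=3000, increment=0.1):
    """Grow a bias on frequently populated states until the apo landscape is flat."""
    bias = {s: 0.0 for s in apo}
    for _ in range(steps):
        p = boltzmann({s: apo[s] + bias[s] for s in apo})
        for s in apo:
            bias[s] += increment * p[s]   # expected visits replace sampled visits
    shift = min(bias.values())
    return {s: b - shift for s, b in bias.items()}

bias = flatten(E_APO)
apo_probs = boltzmann({s: E_APO[s] + bias[s] for s in E_APO})
holo_unbiased = boltzmann(E_HOLO)
holo_biased = boltzmann({s: E_HOLO[s] + bias[s] for s in E_HOLO})
```

Once the apo landscape is flat, the bias equals minus the apo energy up to a constant, so the biased holo populations follow the binding free energy difference rather than the holo energy alone. In this toy case the unbiased holo ensemble favors WT, while the biased one favors M1, the variant with the largest gain on binding.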
11
Cao Y, Sun Y, Karimi M, Chen H, Moronfoye O, Shen Y. Predicting pathogenicity of missense variants with weakly supervised regression. Hum Mutat 2019; 40:1579-1592. [PMID: 31144781 PMCID: PMC6744350 DOI: 10.1002/humu.23826] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/11/2019] [Revised: 05/23/2019] [Accepted: 05/27/2019] [Indexed: 12/27/2022]
Abstract
Quickly growing genetic variation data of unknown clinical significance demand computational methods that can reliably predict clinical phenotypes and deeply unravel molecular mechanisms. On the platform enabled by the Critical Assessment of Genome Interpretation (CAGI), we develop a novel "weakly supervised" regression (WSR) model that not only predicts precise clinical significance (probability of pathogenicity) from inexact training annotations (class of pathogenicity) but also infers underlying molecular mechanisms in a variant-specific manner. Compared to multiclass logistic regression, a representative multiclass classifier, our kernelized WSR improves the performance for the ENIGMA Challenge set from 0.72 to 0.97 in binary area under the receiver operating characteristic curve (AUC) and from 0.64 to 0.80 in ordinal multiclass AUC. WSR model interpretation and protein structural interpretation reach consensus in corroborating the most probable molecular mechanisms by which some pathogenic BRCA1 variants confer clinical significance, namely metal-binding disruption for p.C44F and p.C47Y, protein-binding disruption for p.M18T, and structure destabilization for p.S1715N.
Affiliation(s)
- Yue Cao
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Mostafa Karimi
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Haoran Chen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Oluwaseyi Moronfoye
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
12
Savojardo C, Petrosino M, Babbi G, Bovo S, Corbi-Verge C, Casadio R, Fariselli P, Folkman L, Garg A, Karimi M, Katsonis P, Kim PM, Lichtarge O, Martelli PL, Pasquo A, Pal D, Shen Y, Strokach AV, Turina P, Zhou Y, Andreoletti G, Brenner S, Chiaraluce R, Consalvi V, Capriotti E. Evaluating the predictions of the protein stability change upon single amino acid substitutions for the FXN CAGI5 challenge. Hum Mutat 2019; 40:1392-1399. [PMID: 31209948 PMCID: PMC6744327 DOI: 10.1002/humu.23843] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/16/2019] [Revised: 06/02/2019] [Accepted: 06/09/2019] [Indexed: 12/31/2022]
Abstract
Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions of the FXN to Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant ($\Delta\Delta G^{\mathrm{H_2O}}$). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the $\Delta\Delta G^{\mathrm{H_2O}}$ value associated with the variants and to classify them as destabilizing and not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Maria Petrosino
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Samuele Bovo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Carles Corbi-Verge
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, 10126 Torino, Italy
- Lukas Folkman
  - School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222, Australia
- Aditi Garg
  - Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru 560 012, India
- Mostafa Karimi
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Philip M. Kim
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, 1 King’s College Cir, Toronto, ON M5S 1A8, Canada
  - Department of Computer Science, University of Toronto, 214 College St, Toronto, ON M5T 3A1, Canada
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Pharmacology, Baylor College of Medicine, Houston, Texas 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Alessandra Pasquo
  - ENEA CR Frascati, Diagnostics and Metrology Laboratory, FSN-TECFIS-DIM, Frascati, Italy
- Debnath Pal
  - Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru 560 012, India
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA
- Alexey V. Strokach
  - Department of Computer Science, University of Toronto, 214 College St, Toronto, ON M5T 3A1, Canada
- Paola Turina
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Yaoqi Zhou
  - School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222, Australia
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD 4222, Australia
- Gaia Andreoletti
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Steven Brenner
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Roberta Chiaraluce
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
- Valerio Consalvi
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
13
Vucinic J, Simoncini D, Ruffini M, Barbe S, Schiex T. Positive multistate protein design. Bioinformatics 2019; 36:122-130. [DOI: 10.1093/bioinformatics/btz497] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/15/2019] [Revised: 05/20/2019] [Accepted: 06/11/2019] [Indexed: 11/12/2022]
Abstract
Motivation
Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. The usual approach considers a single rigid backbone as a target, which ignores backbone flexibility. Multistate design (MSD) allows instead to consider several backbone states simultaneously, defining challenging computational problems.
Results
We introduce efficient reductions of positive MSD problems to Cost Function Networks with two different fitness definitions and implement them in the Pompd (Positive Multistate Protein design) software. Pompd is able to identify guaranteed optimal sequences of positive multistate full protein redesign problems and exhaustively enumerate suboptimal sequences close to the MSD optimum. Applied to nuclear magnetic resonance and back-rubbed X-ray structures, we observe that the average energy fitness provides the best sequence recovery. Our method outperforms state-of-the-art guaranteed computational design approaches by orders of magnitude and can solve MSD problems with sizes previously unreachable with guaranteed algorithms.
Availability and implementation
https://forgemia.inra.fr/thomas.schiex/pompd as documented Open Source.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jelena Vucinic
  - LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
  - MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
- David Simoncini
  - LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
  - IRIT UMR 5505-CNRS, Université de Toulouse, 31042 Cedex 9, France
- Manon Ruffini
  - LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
  - MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
- Sophie Barbe
  - LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France
- Thomas Schiex
  - MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France
14
Kang JC, Sun W, Khare P, Karimi M, Wang X, Shen Y, Ober RJ, Ward ES. Engineering a HER2-specific antibody-drug conjugate to increase lysosomal delivery and therapeutic efficacy. Nat Biotechnol 2019; 37:523-526. [PMID: 30936563 PMCID: PMC6668989 DOI: 10.1038/s41587-019-0073-7] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 09/06/2018] [Accepted: 02/20/2019] [Indexed: 12/11/2022]
Abstract
We improve the potency of antibody-drug conjugates (ADCs) containing the HER2-specific antibody pertuzumab by reducing their affinity for HER2 by >250-fold at acidic endosomal pH relative to near neutral pH. These engineered pertuzumab variants show increased lysosomal delivery and cytotoxicity towards tumor cells expressing intermediate HER2 levels. In HER2int xenograft tumor models in mice, the variants show higher therapeutic efficacy than the parent ADC and a clinically-approved HER2-specific ADC.
Affiliation(s)
- Jeffrey C Kang
  - Department of Molecular and Cellular Medicine, Texas A&M University Health Science Center, College Station, TX, USA
- Wei Sun
  - Department of Molecular and Cellular Medicine, Texas A&M University Health Science Center, College Station, TX, USA
- Priyanka Khare
  - Department of Molecular and Cellular Medicine, Texas A&M University Health Science Center, College Station, TX, USA
- Mostafa Karimi
  - Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA
- Xiaoli Wang
  - Department of Molecular and Cellular Medicine, Texas A&M University Health Science Center, College Station, TX, USA
- Yang Shen
  - Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA
- Raimund J Ober
  - Department of Molecular and Cellular Medicine, Texas A&M University Health Science Center, College Station, TX, USA
  - Department of Biomedical Engineering, Texas A&M University, College Station, TX, USA
  - Cancer Sciences Unit, Centre for Cancer Immunology, Faculty of Medicine, University of Southampton, Southampton, UK
- E Sally Ward
  - Department of Molecular and Cellular Medicine, Texas A&M University Health Science Center, College Station, TX, USA
  - Cancer Sciences Unit, Centre for Cancer Immunology, Faculty of Medicine, University of Southampton, Southampton, UK
  - Department of Microbial Pathogenesis and Immunology, Texas A&M University Health Science Center, Bryan, TX, USA
15
Carvalho HF, Branco RJF, Leite FAS, Matzapetakis M, Roque ACA, Iranzo O. Hydrolytic zinc metallopeptides using a computational multi-state design approach. Catal Sci Technol 2019. [DOI: 10.1039/c9cy01364d] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
Combination of multi-state design and long-timescale conformational dynamics as a powerful strategy to obtain metalloenzymes.
Affiliation(s)
- Henrique F. Carvalho
  - UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica
- Ricardo J. F. Branco
  - UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica
- Fábio A. S. Leite
  - UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica
- Manolis Matzapetakis
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, 2780-157 Oeiras, Portugal
- A. Cecília A. Roque
  - UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica