1
Ingraham JB, Baranov M, Costello Z, Barber KW, Wang W, Ismail A, Frappier V, Lord DM, Ng-Thow-Hing C, Van Vlack ER, Tie S, Xue V, Cowles SC, Leung A, Rodrigues JV, Morales-Perez CL, Ayoub AM, Green R, Puentes K, Oplinger F, Panwar NV, Obermeyer F, Root AR, Beam AL, Poelwijk FJ, Grigoryan G. Illuminating protein space with a programmable generative model. Nature 2023; 623:1070-1078. [PMID: 37968394; PMCID: PMC10686827; DOI: 10.1038/s41586-023-06728-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 12/20/2022] [Accepted: 10/06/2023] [Indexed: 11/17/2023]
Abstract
Three billion years of evolution has produced a tremendous diversity of protein molecules (ref. 1), but the full potential of proteins is likely to be much greater. Accessing this potential has been challenging for both computation and experiments because the space of possible protein molecules is much larger than the space of those likely to have functions. Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences, and that can be conditioned to steer the generative process towards desired properties and functions. To enable this, we introduce a diffusion process that respects the conformational statistics of polymer ensembles, an efficient neural architecture for molecular systems that enables long-range reasoning with sub-quadratic scaling, layers for efficiently synthesizing three-dimensional structures of proteins from predicted inter-residue geometries and a general low-temperature sampling algorithm for diffusion models. Chroma achieves protein design as Bayesian inference under external constraints, which can involve symmetries, substructure, shape, semantics and even natural-language prompts. The experimental characterization of 310 proteins shows that sampling from Chroma results in proteins that are highly expressed, fold and have favourable biophysical properties. The crystal structures of two designed proteins exhibit atomistic agreement with Chroma samples (a backbone root-mean-square deviation of around 1.0 Å). With this unified approach to protein design, we hope to accelerate the programming of protein matter to benefit human health, materials science and synthetic biology.
Affiliation(s)
- Wujie Wang
  - Generate Biomedicines, Somerville, MA, USA
- Shan Tie
  - Generate Biomedicines, Somerville, MA, USA
- Alan Leung
  - Generate Biomedicines, Somerville, MA, USA
2
Fram B, Truebridge I, Su Y, Riesselman AJ, Ingraham JB, Passera A, Napier E, Thadani NN, Lim S, Roberts K, Kaur G, Stiffler M, Marks DS, Bahl CD, Khan AR, Sander C, Gauthier NP. Simultaneous enhancement of multiple functional properties using evolution-informed protein design. bioRxiv 2023:2023.05.09.539914. [PMID: 37214973; PMCID: PMC10197589; DOI: 10.1101/2023.05.09.539914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Designing optimized proteins is important for a range of practical applications. Protein design is a rapidly developing field that would benefit from approaches that enable many changes in the amino acid primary sequence, rather than a small number of mutations, while maintaining structure and enhancing function. Homologous protein sequences contain extensive information about various protein properties and activities that have emerged over billions of years of evolution. Evolutionary models of sequence co-variation, derived from a set of homologous sequences, have proven effective in a range of applications including structure determination and mutation effect prediction. In this work we apply one of these models (EVcouplings) to computationally design highly divergent variants of the model protein TEM-1 β-lactamase, and characterize these designs experimentally using multiple biochemical and biophysical assays. Nearly all designed variants were functional, including one with 84 mutations from the nearest natural homolog. Surprisingly, all functional designs had large increases in thermostability and most had a broadening of available substrates. These property enhancements occurred while maintaining a nearly identical structure to the wild type enzyme. Collectively, this work demonstrates that evolutionary models of sequence co-variation (1) are able to capture complex epistatic interactions that successfully guide large sequence departures from natural contexts, and (2) can be applied to generate functional diversity useful for many applications in protein design.
Affiliation(s)
- Benjamin Fram
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Ian Truebridge
  - Institute for Protein Innovation, Boston, Massachusetts, Boston, MA, USA
  - Division of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School; Boston, MA, USA
  - current address: AI Proteins; Boston, MA, USA
- Yang Su
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Adam J. Riesselman
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- John B. Ingraham
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Alessandro Passera
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - current address: Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Eve Napier
  - School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
- Nicole N. Thadani
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Samuel Lim
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Kristen Roberts
  - Selux Diagnostics, Inc., 56 Roland Street, Charlestown, MA, USA
- Gurleen Kaur
  - Selux Diagnostics, Inc., 56 Roland Street, Charlestown, MA, USA
- Michael Stiffler
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Debora S. Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Christopher D. Bahl
  - Institute for Protein Innovation, Boston, Massachusetts, Boston, MA, USA
  - Division of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School; Boston, MA, USA
  - current address: AI Proteins; Boston, MA, USA
- Amir R. Khan
  - School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
  - Division of Newborn Medicine, Boston Children’s Hospital, Boston, MA, USA
- Chris Sander
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Nicholas P. Gauthier
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
3
Pattanaik L, Ingraham JB, Grambow CA, Green WH. Generating transition states of isomerization reactions with deep learning. Phys Chem Chem Phys 2020; 22:23618-23626. [PMID: 33112304; DOI: 10.1039/d0cp04670a] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 02/06/2023]
Abstract
Lack of quality data and difficulty generating these data hinder quantitative understanding of reaction kinetics. Specifically, conventional methods to generate transition state structures are deficient in speed, accuracy, or scope. We describe a novel method to generate three-dimensional transition state structures for isomerization reactions using reactant and product geometries. Our approach relies on a graph neural network to predict the transition state distance matrix and a least squares optimization to reconstruct the coordinates based on which entries of the distance matrix the model perceives to be important. We feed the structures generated by our algorithm through a rigorous quantum mechanics workflow to ensure the predicted transition state corresponds to the ground truth reactant and product. In both generating viable geometries and predicting accurate transition states, our method achieves excellent results. We envision workflows like this, which combine neural networks and quantum chemistry calculations, will become the preferred methods for computing chemical reactions.
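The reconstruction step described in this abstract (coordinates recovered from a predicted distance matrix by weighted least squares) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the classical multidimensional scaling initialization, the plain gradient-descent optimizer, and the `steps` and `lr` values are all assumptions made for illustration. The `weights` matrix stands in for the per-entry importance that the abstract attributes to the model.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (n, n) Euclidean distance matrix for (n, 3) coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))


def classical_mds(dist, dim=3):
    """Initial coordinates from a distance matrix (classical MDS; an assumed starting point)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def reconstruct_coordinates(dist, weights, steps=500, lr=0.01):
    """Minimize sum_{i<j} w_ij * (|x_i - x_j| - d_ij)^2 by gradient descent.

    `weights` is assumed symmetric; larger entries make the matching
    distance count more in the fit.
    """
    coords = classical_mds(dist)
    n = dist.shape[0]
    for _ in range(steps):
        diff = coords[:, None, :] - coords[None, :, :]
        # Add the identity so the diagonal never divides by zero.
        current = np.sqrt((diff ** 2).sum(-1)) + np.eye(n)
        scale = weights * (current - dist) / current
        np.fill_diagonal(scale, 0.0)
        grad = 2.0 * (scale[:, :, None] * diff).sum(axis=1)
        coords = coords - lr * grad
    return coords
```

Coordinates recovered this way are defined only up to rotation, translation, and reflection, so a real workflow would need a separate check of chirality before passing structures to quantum chemistry calculations.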
Affiliation(s)
- Lagnajit Pattanaik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- John B Ingraham
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Colin A Grambow
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
4
Hopf TA, Green AG, Schubert B, Mersmann S, Schärfe CPI, Ingraham JB, Toth-Petroczy A, Brock K, Riesselman AJ, Palmedo P, Kang C, Sheridan R, Draizen EJ, Dallago C, Sander C, Marks DS. The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics 2020; 35:1582-1584. [PMID: 30304492; PMCID: PMC6499242; DOI: 10.1093/bioinformatics/bty862] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 05/22/2018] [Revised: 09/06/2018] [Accepted: 10/08/2018] [Indexed: 01/03/2023] Open
Abstract
Summary: Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The combination of an easy to use, flexible command line interface and an underlying modular Python package makes the full power of coevolutionary analyses available to entry-level and advanced users. Availability and implementation: https://github.com/debbiemarkslab/evcouplings.
Affiliation(s)
- Thomas A Hopf
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Anna G Green
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Benjamin Schubert
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Sophia Mersmann
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Charlotta P I Schärfe
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Center for Bioinformatics, University of Tübingen, Tübingen, Germany
  - Applied Bioinformatics, Department of Computer Science, Tübingen, Germany
- John B Ingraham
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Kelly Brock
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Adam J Riesselman
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Perry Palmedo
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Chan Kang
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Robert Sheridan
  - Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Eli J Draizen
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Christian Dallago
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - Department of Informatics, Technische Universität München, Garching, Germany
- Chris Sander
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
5
Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods 2018; 15:816-822. [PMID: 30250057; DOI: 10.1038/s41592-018-0138-4] [Citation(s) in RCA: 239] [Impact Index Per Article: 39.8] [Received: 05/01/2018] [Accepted: 07/29/2018] [Indexed: 01/05/2023]
Abstract
The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence (https://github.com/debbiemarkslab/DeepSequence), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.
Affiliation(s)
- Adam J Riesselman
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- John B Ingraham
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Program in Systems Biology, Harvard University, Cambridge, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
6
Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CP, Springer M, Sander C, Marks DS. Mutation effects predicted from sequence co-variation. Nat Biotechnol 2017; 35:128-135. [PMID: 28092658; PMCID: PMC5383098; DOI: 10.1038/nbt.3769] [Citation(s) in RCA: 355] [Impact Index Per Article: 50.7] [Received: 06/10/2016] [Accepted: 12/09/2016] [Indexed: 01/09/2023]
Abstract
Many high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases. We present EVmutation, an unsupervised statistical method for predicting the effects of mutations that explicitly captures residue dependencies between positions. We validate EVmutation by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations and show that it outperforms methods that do not account for epistasis. EVmutation can be used to assess the quantitative effects of mutations in genes of any organism. We provide pre-computed predictions for ∼7,000 human proteins at http://evmutation.org/.
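To make the idea of capturing dependencies between positions concrete, the following sketch scores a mutation under a pairwise statistical model with site terms and pair terms, and reports the change in score between mutant and wild type. It is a toy illustration under stated assumptions, not the EVmutation code: the function names, the array layout of `fields` and `couplings`, and the 20-letter alphabet ordering are all hypothetical, and real parameters would be inferred from a sequence alignment rather than set by hand.

```python
import numpy as np

# Assumed amino-acid alphabet and ordering (illustrative only).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def statistical_energy(seq, fields, couplings):
    """Sum site terms fields[i, a_i] and pair terms couplings[i, j, a_i, a_j] for i < j."""
    idx = [INDEX[aa] for aa in seq]
    energy = sum(fields[i, a] for i, a in enumerate(idx))
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            energy += couplings[i, j, idx[i], idx[j]]
    return float(energy)


def mutation_effect(wild_type, position, new_residue, fields, couplings):
    """Score difference (mutant minus wild type) for a single substitution.

    `position` is a 0-based index into `wild_type`.
    """
    mutant = wild_type[:position] + new_residue + wild_type[position + 1:]
    return (statistical_energy(mutant, fields, couplings)
            - statistical_energy(wild_type, fields, couplings))
```

Because the pair terms depend on the residues at both positions, the same substitution can score differently in different sequence backgrounds, which is the kind of dependency a site-independent conservation model cannot represent.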
Affiliation(s)
- Thomas A. Hopf
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - Department of Informatics, Technische Universität München, Garching, Germany
- John B. Ingraham
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Charlotta P.I. Schärfe
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, Germany
- Michael Springer
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Chris Sander
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Debora S. Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
7
Weinreb C, Riesselman AJ, Ingraham JB, Gross T, Sander C, Marks DS. 3D RNA and Functional Interactions from Evolutionary Couplings. Cell 2016; 165:963-75. [PMID: 27087444; DOI: 10.1016/j.cell.2016.03.030] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 10/21/2015] [Revised: 01/15/2016] [Accepted: 03/18/2016] [Indexed: 11/18/2022]
Abstract
Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces the research on the structure and functional interactions of these RNA gene sequences. We mine the evolutionary sequence record to derive precise information about the function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, for example at switch points in riboswitches and at a complex nucleation site in HIV. Aided by increasing sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA.
Affiliation(s)
- Caleb Weinreb
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Adam J Riesselman
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
  - Program in Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- John B Ingraham
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Torsten Gross
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
  - Institute of Pathology, Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
- Chris Sander
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA