1
Beck J, Shanmugaratnam S, Höcker B. Diversifying de novo TIM barrels by hallucination. Protein Sci 2024; 33:e5001. [PMID: 38723111 PMCID: PMC11081422 DOI: 10.1002/pro.5001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 03/26/2024] [Accepted: 04/10/2024] [Indexed: 05/13/2024]
Abstract
De novo protein design expands the protein universe by creating new sequences to accomplish tailor-made enzymes in the future. A promising topology to implement diverse enzyme functions is the ubiquitous TIM-barrel fold. Since the initial de novo design of an idealized four-fold symmetric TIM barrel, the family of de novo TIM barrels is expanding rapidly. Despite this and in contrast to natural TIM barrels, these novel proteins lack cavities and structural elements essential for the incorporation of binding sites or enzymatic functions. In this work, we diversified a de novo TIM barrel by extending multiple βα-loops using constrained hallucination. Experimentally tested designs were found to be soluble upon expression in Escherichia coli and well-behaved. Biochemical characterization and crystal structures revealed successful extensions with defined α-helical structures. These diversified de novo TIM barrels provide a framework to explore a broad spectrum of functions based on the potential of natural TIM barrels.
Affiliation(s)
- Julian Beck
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
2
Jiang H, Jude KM, Wu K, Fallas J, Ueda G, Brunette TJ, Hicks DR, Pyles H, Yang A, Carter L, Lamb M, Li X, Levine PM, Stewart L, Garcia KC, Baker D. De novo design of buttressed loops for sculpting protein functions. Nat Chem Biol 2024:10.1038/s41589-024-01632-2. [PMID: 38816644 DOI: 10.1038/s41589-024-01632-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 04/29/2024] [Indexed: 06/01/2024]
Abstract
In natural proteins, structured loops have central roles in molecular recognition, signal transduction and enzyme catalysis. However, because of the intrinsic flexibility and irregularity of loop regions, organizing multiple structured loops at protein functional sites has been very difficult to achieve by de novo protein design. Here we describe a solution to this problem that designs tandem repeat proteins with structured loops (9-14 residues) buttressed by extensive hydrogen bonding interactions. Experimental characterization shows that the designs are monodisperse, highly soluble, folded and thermally stable. Crystal structures are in close agreement with the design models, with the loops structured and buttressed as designed. We demonstrate the functionality afforded by loop buttressing by designing and characterizing binders for extended peptides in which the loops form one side of an extended binding pocket. The ability to design multiple structured loops should contribute generally to efforts to design new protein functions.
Affiliation(s)
- Hanlun Jiang
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Kevin M Jude
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA, USA
- Kejia Wu
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Biological Physics, Structure and Design Graduate Program, University of Washington, Seattle, WA, USA
- Jorge Fallas
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- George Ueda
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- T J Brunette
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Derrick R Hicks
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Harley Pyles
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Aerin Yang
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA, USA
- Lauren Carter
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Mila Lamb
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Xinting Li
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Paul M Levine
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Lance Stewart
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- K Christopher Garcia
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
3
Yu LT, Kreutzberger MAB, Hancu MC, Bui TH, Farsheed AC, Egelman EH, Hartgerink JD. Beyond the Triple Helix: Exploration of the Hierarchical Assembly Space of Collagen-like Peptides. bioRxiv 2024:2024.05.14.594194. [PMID: 38798367 PMCID: PMC11118445 DOI: 10.1101/2024.05.14.594194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
The de novo design of self-assembling peptides has garnered significant attention in scientific research. While alpha-helical assemblies have been extensively studied, exploration of polyproline type II (PPII) helices, such as those found in collagen, remains relatively limited. In this study, we focused on understanding the sequence-structure relationship in hierarchical assemblies of collagen-like peptides, using defense collagen SP-A as a model. By dissecting the sequence derived from SP-A and synthesizing short collagen-like peptides, we successfully constructed a discrete bundle of hollow triple helices. Mutation studies pinpointed amino acid sequences, including hydrophobic and charged residues that are critical for oligomer formation. These insights guided the de novo design of collagen-like peptides, resulting in the formation of diverse quaternary structures, including discrete and heterogenous bundled oligomers, 2D nanosheets, and pH-responsive nanoribbons. Our study represents a significant advancement in the understanding and harnessing of collagen higher-order assemblies beyond the triple helix.
Affiliation(s)
- Le Tracy Yu
  - Department of Chemistry, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Mark A. B. Kreutzberger
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, 22903, USA
- Maria C. Hancu
  - Department of Chemistry, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Thi H. Bui
  - Department of Chemistry, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Adam C. Farsheed
  - Department of Bioengineering, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Edward H. Egelman
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, 22903, USA
- Jeffrey D. Hartgerink
  - Department of Chemistry, Rice University, 6100 Main Street, Houston, TX, 77005, USA
  - Department of Bioengineering, Rice University, 6100 Main Street, Houston, TX, 77005, USA
4
Crauwels C, Heidig SL, Díaz A, Vranken WF. Large-scale structure-informed multiple sequence alignment of proteins with SIMSApiper. Bioinformatics 2024; 40:btae276. [PMID: 38648741 PMCID: PMC11099654 DOI: 10.1093/bioinformatics/btae276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 03/20/2024] [Accepted: 04/18/2024] [Indexed: 04/25/2024] Open
Abstract
SUMMARY: SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed MSAs of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity-based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements. AVAILABILITY AND IMPLEMENTATION: The pipeline is implemented using Nextflow, Python3, and Bash. It is publicly available at github.com/Bio2Byte/simsapiper.
Affiliation(s)
- Charlotte Crauwels
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
  - AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- Sophie-Luise Heidig
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium
  - AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium
  - Evolutionary Biology & Ecology, Université libre de Bruxelles, Brussels, 1050, Belgium
- Adrián Díaz
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
  - AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- Wim F Vranken
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, 1050, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
  - AI Lab, Vrije Universiteit Brussel, Brussels, 1050, Belgium
5
Zheng Z, Goncearenco A, Berezovsky IN. Back in time to the Gly-rich prototype of the phosphate binding elementary function. Curr Res Struct Biol 2024; 7:100142. [PMID: 38655428 PMCID: PMC11035071 DOI: 10.1016/j.crstbi.2024.100142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 03/31/2024] [Accepted: 04/03/2024] [Indexed: 04/26/2024] Open
Abstract
Binding of nucleotides and their derivatives is one of the most ancient elementary functions dating back to the Origin of Life. We review here the works considering one of the key elements in binding of (di)nucleotide-containing ligands - phosphate binding. We start from a brief discussion of major participants, conditions, and events in prebiotic evolution that resulted in the Origin of Life. Tracing back to the basic functions, including metal and phosphate binding, and, potentially, formation of primitive protein-protein interactions, we focus here on the phosphate binding. Critically assessing works on the structural, functional, and evolutionary aspects of phosphate binding, we perform a simple computational experiment reconstructing its most ancient and generic sequence prototype. The profiles of the phosphate binding signatures have been derived in form of position-specific scoring matrices (PSSMs), their peculiarities depending on the type of the ligands have been analyzed, and evolutionary connections between them have been delineated. Then, the apparent prototype that gave rise to all relevant phosphate-binding signatures had also been reconstructed. We show that two major signatures of the phosphate binding that discriminate between the binding of dinucleotide- and nucleotide-containing ligands are GxGxxG and GxxGxG, respectively. It appears that the signature archetypal for dinucleotide-containing ligands is more generic, and it can frequently bind phosphate groups in nucleotide-containing ligands as well. The reconstructed prototype's key signature GxGGxG underlies the role of glycine residues in providing flexibility and interactions necessary for binding the phosphate groups. The prototype also contains other ancient amino acids, valine, and alanine, showing versatility towards evolutionary design and functional diversification.
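The three glycine-rich signatures named in this abstract can be illustrated with a simple pattern scan. The sketch below is not the PSSM-based procedure the authors describe; it is a simplified regular-expression stand-in that treats "x" as any residue. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
import re

# Signatures quoted in the abstract; "x" stands for any residue.
# The dictionary labels are illustrative, not the authors' terminology.
SIGNATURES = {
    "dinucleotide": "GxGxxG",
    "nucleotide": "GxxGxG",
    "prototype": "GxGGxG",
}


def signature_to_regex(signature):
    """Turn a signature such as 'GxGxxG' into a compiled regex.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are all reported by finditer().
    """
    body = "".join("[A-Z]" if ch == "x" else ch for ch in signature)
    return re.compile(f"(?=({body}))")


def find_signatures(sequence):
    """Return {label: [(start_index, matched_segment), ...]} for a sequence."""
    sequence = sequence.upper()
    hits = {}
    for label, signature in SIGNATURES.items():
        pattern = signature_to_regex(signature)
        hits[label] = [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]
    return hits


if __name__ == "__main__":
    # Hypothetical sequences, not taken from the paper.
    for seq in ("MKIAVIGAGSWGTALA", "GAGGAG"):
        print(seq, find_signatures(seq))
```

Because GxGGxG has glycines at every position required by both GxGxxG and GxxGxG, any segment matching the prototype also matches the two derived signatures, which is one way to read the abstract's description of it as a common ancestor.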
Affiliation(s)
- Zejun Zheng
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Igor N. Berezovsky
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
6
Capponi S, Wang S. AI in cellular engineering and reprogramming. Biophys J 2024:S0006-3495(24)00245-5. [PMID: 38576162 DOI: 10.1016/j.bpj.2024.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/19/2024] [Accepted: 04/01/2024] [Indexed: 04/06/2024] Open
Abstract
During the last decade, artificial intelligence (AI) has increasingly been applied in biophysics and related fields, including cellular engineering and reprogramming, offering novel approaches to understand, manipulate, and control cellular function. The potential of AI lies in its ability to analyze complex datasets and generate predictive models. AI algorithms can process large amounts of data from single-cell genomics and multiomic technologies, allowing researchers to gain mechanistic insights into the control of cell identity and function. By integrating and interpreting these complex datasets, AI can help identify key molecular events and regulatory pathways involved in cellular reprogramming. This knowledge can inform the design of precision engineering strategies, such as the development of new transcription factor and signaling molecule cocktails, to manipulate cell identity and drive authentic cell fate across lineage boundaries. Furthermore, when used in combination with computational methods, AI can accelerate and improve the analysis and understanding of the intricate relationships between genes, proteins, and cellular processes. In this review article, we explore the current state of AI applications in biophysics with a specific focus on cellular engineering and reprogramming. Then, we showcase a couple of recent applications where we combined machine learning with experimental and computational techniques. Finally, we briefly discuss the challenges and prospects of AI in cellular engineering and reprogramming, emphasizing the potential of these technologies to revolutionize our ability to engineer cells for a variety of applications, from disease modeling and drug discovery to regenerative medicine and biomanufacturing.
Affiliation(s)
- Sara Capponi
  - IBM Almaden Research Center, San Jose, California; Center for Cellular Construction, San Francisco, California
- Shangying Wang
  - Bay Area Institute of Science, Altos Labs, Redwood City, California
7
Listov D, Goverde CA, Correia BE, Fleishman SJ. Opportunities and challenges in design and optimization of protein function. Nat Rev Mol Cell Biol 2024:10.1038/s41580-024-00718-y. [PMID: 38565617 DOI: 10.1038/s41580-024-00718-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
The field of protein design has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application, but recent strategies that combine structure-based and sequence-based calculations, as well as machine learning tools, have dramatically improved protein engineering and design. In this Review, we discuss how these methods have enabled the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their increased reliability, computational design methods have been applied to improve therapeutics and enzymes for green chemistry and have generated vaccine antigens, antivirals and drug-delivery nano-vehicles. Moreover, the high success of design methods reflects an increased understanding of basic rules that govern the relationships among protein sequence, structure and function. However, de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging but necessary next step if we are to realize our objective of generating new-to-nature activities.
Affiliation(s)
- Dina Listov
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Casper A Goverde
  - Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Bruno E Correia
  - Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sarel Jacob Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
8
Mustieles-Del-Ser P, Ruano-Gallego D, Parro V. Immunoanalytical Detection of Conserved Peptides: Refining the Universe of Biomarker Targets in Planetary Exploration. Anal Chem 2024; 96:4764-4773. [PMID: 38484023 DOI: 10.1021/acs.analchem.3c04165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Ancient peptides are remnants of early biochemistry that continue to play pivotal roles in current proteins. They are simple molecules yet complex enough to exhibit independent functions, being products of an evolved biochemistry at the interface of life and nonlife. Their adsorption to minerals may contribute to their stabilization and preservation over time. To investigate the feasibility of conserved peptide sequences and structures as target biomarkers for the search for life on Mars or other planetary bodies, we conducted a bioinformatics selection of well-conserved ancient peptides and produced polyclonal antibodies for their detection using fluorescence microarray immunoassays. Additionally, we explored how adsorbing peptides to Mars-representative minerals to form organomineral complexes could affect their immunological detection. The results demonstrated that the selected peptides exhibited autonomous folding, with some of them regaining their structure, even after denaturation. Furthermore, their cognate antibodies detected their conformational features regardless of amino acid sequences, thereby broadening the spectrum of target peptide sequences. While certain antibodies displayed unspecific binding to bare minerals, we validated that peptide-mineral complexes can be detected using sandwich immunoassays, as confirmed through desorption and competitive assays. Consequently, we conclude that the diversity of peptide sequences and structures suitable for use as target biomarkers in astrobiology can be constrained to a few well conserved sets, and they can be detected even if they are adsorbed in organomineral complexes.
Affiliation(s)
- Pedro Mustieles-Del-Ser
  - Centro de Astrobiología (CAB) INTA-CSIC, Torrejón de Ardoz 28850, Spain
  - Departments of Physics and Mathematics, and Automatics, Universidad de Alcalá (UAH), Alcalá de Henares 28805, Spain
- Víctor Parro
  - Centro de Astrobiología (CAB) INTA-CSIC, Torrejón de Ardoz 28850, Spain
9
Goverde CA, Pacesa M, Goldbach N, Dornfeld LJ, Balbi PEM, Georgeon S, Rosset S, Kapoor S, Choudhury J, Dauparas J, Schellhaas C, Kozlov S, Baker D, Ovchinnikov S, Vecchio AJ, Correia BE. Computational design of soluble functional analogues of integral membrane proteins. bioRxiv 2024:2023.05.09.540044. [PMID: 38496615 PMCID: PMC10942269 DOI: 10.1101/2023.05.09.540044] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/19/2024]
Abstract
De novo design of complex protein folds using solely computational means remains a significant challenge. Here, we use a robust deep learning pipeline to design complex folds and soluble analogues of integral membrane proteins. Unique membrane topologies, such as those from GPCRs, are not found in the soluble proteome and we demonstrate that their structural features can be recapitulated in solution. Biophysical analyses reveal high thermal stability of the designs and experimental structures show remarkable design accuracy. The soluble analogues were functionalized with native structural motifs, standing as a proof-of-concept for bringing membrane protein functions to the soluble proteome, potentially enabling new approaches in drug discovery. In summary, we designed complex protein topologies and enriched them with functionalities from membrane proteins, with high experimental success rates, leading to a de facto expansion of the functional soluble fold space.
10
Koch J, Romero-Romero S, Höcker B. Stepwise introduction of stabilizing mutations reveals nonlinear additive effects in de novo TIM barrels. Protein Sci 2024; 33:e4926. [PMID: 38380781 PMCID: PMC10880431 DOI: 10.1002/pro.4926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 01/29/2024] [Accepted: 01/30/2024] [Indexed: 02/22/2024]
Abstract
Over the past decades, the TIM-barrel fold has served as a model system for the exploration of how changes in protein sequences affect their structural, stability, and functional characteristics, and moreover, how this information can be leveraged to design proteins from the ground up. After numerous attempts to design de novo proteins with this specific fold, sTIM11 was the first validated de novo design of an idealized four-fold symmetric TIM barrel. Subsequent efforts to enhance the stability of this initial design resulted in the development of DeNovoTIMs, a family of de novo TIM barrels with various stabilizing mutations. In this study, we present an investigation into the biophysical and thermodynamic effects upon introducing a varying number of stabilizing mutations per quarter along the sequence of a four-fold symmetric TIM barrel. We compared the base design DeNovoTIM0 without any stabilizing mutations with variants containing mutations in one, two, three, and all four quarters (designated TIM1q, TIM2q, TIM3q, and DeNovoTIM6, respectively). This analysis revealed a stepwise and nonlinear change in the thermodynamic properties that correlated with the number of mutated quarters, suggesting positive nonadditive effects. To shed light on the significance of the location of stabilized quarters, we engineered two variants of TIM2q which contain the same number of mutations but positioned in different quarter locations. Characterization of these TIM2q variants revealed that the mutations exhibit varying effects on the overall protein stability, contingent upon the specific region in which they are introduced. These findings emphasize that the amount and location of stabilized interfaces among the four quarters play a crucial role in shaping the conformational stability of these four-fold symmetric TIM barrels.
Analysis of de novo proteins, as described in this study, enhances our understanding of how sequence variations can finely modulate stability in both naturally occurring and computationally designed proteins.
Affiliation(s)
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
11
Chu AE, Lu T, Huang PS. Sparks of function by de novo protein design. Nat Biotechnol 2024; 42:203-215. [PMID: 38361073 DOI: 10.1038/s41587-024-02133-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/05/2023] [Accepted: 01/09/2024] [Indexed: 02/17/2024]
Abstract
Information in proteins flows from sequence to structure to function, with each step causally driven by the preceding one. Protein design is founded on inverting this process: specify a desired function, design a structure executing this function, and find a sequence that folds into this structure. This 'central dogma' underlies nearly all de novo protein-design efforts. Our ability to accomplish these tasks depends on our understanding of protein folding and function and our ability to capture this understanding in computational methods. In recent years, deep learning-derived approaches for efficient and accurate structure modeling and enrichment of successful designs have enabled progression beyond the design of protein structures and towards the design of functional proteins. We examine these advances in the broader context of classical de novo protein design and consider implications for future challenges to come, including fundamental capabilities such as sequence and structure co-design and conformational control considering flexibility, and functional objectives such as antibody and enzyme design.
Affiliation(s)
- Alexander E Chu
  - Biophysics Program, Stanford University, Palo Alto, CA, USA
  - Department of Bioengineering, Stanford University, Palo Alto, CA, USA
  - Google DeepMind, London, UK
- Tianyu Lu
  - Department of Bioengineering, Stanford University, Palo Alto, CA, USA
- Po-Ssu Huang
  - Biophysics Program, Stanford University, Palo Alto, CA, USA
  - Department of Bioengineering, Stanford University, Palo Alto, CA, USA
12
Sakuma K, Kobayashi N, Sugiki T, Nagashima T, Fujiwara T, Suzuki K, Kobayashi N, Murata T, Kosugi T, Tatsumi-Koga R, Koga N. Design of complicated all-α protein structures. Nat Struct Mol Biol 2024; 31:275-282. [PMID: 38177681 DOI: 10.1038/s41594-023-01147-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2021] [Accepted: 10/04/2023] [Indexed: 01/06/2024]
Abstract
A wide range of de novo protein structure designs have been achieved, but the complexity of naturally occurring protein structures is still far beyond these designs. Here, to expand the diversity and complexity of de novo designed protein structures, we sought to develop a method for designing 'difficult-to-describe' α-helical protein structures composed of irregularly aligned α-helices like globins. Backbone structure libraries consisting of a myriad of α-helical structures with five or six helices were generated by combining 18 helix-loop-helix motifs and canonical α-helices, and five distinct topologies were selected for de novo design. The designs were found to be monomeric with high thermal stability in solution and fold into the target topologies with atomic accuracy. This study demonstrated that complicated α-helical proteins are created using typical building blocks. The method we developed will enable us to explore the universe of protein structures for designing novel functional proteins.
Affiliation(s)
- Koya Sakuma
  - Department of Structural Molecular Science, School of Physical Sciences, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Japan
- Naohiro Kobayashi
  - RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
  - Institute for Protein Research, Osaka University, Suita, Japan
- Toshio Nagashima
  - RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
- Kano Suzuki
  - Department of Chemistry, Graduate School of Science, Chiba University, Chiba, Japan
- Naoya Kobayashi
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Japan
- Takeshi Murata
  - Department of Chemistry, Graduate School of Science, Chiba University, Chiba, Japan
  - Membrane Protein Research Center, Chiba University, Chiba, Japan
  - Structural Biology Research Center, Institute of Materials Structure Science, High Energy Accelerator Research Organization (KEK), Tsukuba, Japan
- Takahiro Kosugi
  - Department of Structural Molecular Science, School of Physical Sciences, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Japan
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Japan
  - Research Center of Integrative Molecular Systems, Institute for Molecular Science, National Institutes of Natural Sciences, Okazaki, Japan
- Rie Tatsumi-Koga
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Japan
- Nobuyasu Koga
  - Department of Structural Molecular Science, School of Physical Sciences, SOKENDAI (The Graduate University for Advanced Studies), Hayama, Japan
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Japan
  - Research Center of Integrative Molecular Systems, Institute for Molecular Science, National Institutes of Natural Sciences, Okazaki, Japan
  - Institute for Protein Research, Osaka University, Suita, Japan
13
Kortemme T. De novo protein design-From new structures to programmable functions. Cell 2024; 187:526-544. [PMID: 38306980 PMCID: PMC10990048 DOI: 10.1016/j.cell.2023.12.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 12/03/2023] [Accepted: 12/19/2023] [Indexed: 02/04/2024]
Abstract
Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges are unsolved.
Affiliation(s)
- Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA.
| |
|
14
|
Lee EJ, Gladkov N, Miller JE, Yeates TO. Design of Ligand-Operable Protein-Cages That Open Upon Specific Protein Binding. ACS Synth Biol 2024; 13:157-167. [PMID: 38133598 DOI: 10.1021/acssynbio.3c00383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Protein nanocages have diverse applications in medicine and biotechnology, including molecular delivery. However, although numerous studies have demonstrated the ability of protein nanocages to encapsulate various molecular species, limited methods are available for subsequently opening a nanocage for cargo release under specific conditions. A modular platform with a specific protein-target-based mechanism of nanocage opening is notably lacking. To address this important technology gap, we present a new class of designed protein cages, the Ligand-Operable Cage (LOC). LOCs primarily comprise a protein nanocage core and a fused surface binding adaptor. The geometry of the LOC is designed so that binding of a target protein ligand (or multiple copies thereof) to the surface binder is sterically incompatible with retention of the assembled state of the cage. Therefore, the tight binding of a target ligand drives cage disassembly by mass action, subsequently exposing the encapsulated cargo. LOCs are modular; direct substitution of the surface binder sequence can reprogram the nanocage to open in response to any target protein ligand of interest. We demonstrate these design principles using both a natural and a designed protein cage as the core, with different proteins acting as the triggering ligand and with different reporter readouts (fluorescence unquenching and luminescence) for cage disassembly. These developments advance the critical problem of targeted molecular delivery and detection.
Affiliation(s)
- Eric J Lee
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, California 90095, United States
| | - Nika Gladkov
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, California 90095, United States
| | - Justin E Miller
- Molecular Biology Institute, UCLA, Los Angeles, California 90095, United States
| | - Todd O Yeates
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, California 90095, United States
- Molecular Biology Institute, UCLA, Los Angeles, California 90095, United States
- UCLA-DOE Institute for Genomics and Proteomics, UCLA, Los Angeles, California 90095, United States
| |
|
15
|
Aguilera-Puga MDC, Cancelarich NL, Marani MM, de la Fuente-Nunez C, Plisson F. Accelerating the Discovery and Design of Antimicrobial Peptides with Artificial Intelligence. Methods Mol Biol 2024; 2714:329-352. [PMID: 37676607 DOI: 10.1007/978-1-0716-3441-7_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Peptides modulate many processes of human physiology targeting ion channels, protein receptors, or enzymes. They represent valuable starting points for the development of new biologics against communicable and non-communicable disorders. However, turning native peptide ligands into druggable materials requires high selectivity and efficacy, predictable metabolism, and good safety profiles. Machine learning models have gradually emerged as cost-effective and time-saving solutions to predict and generate new proteins with optimal properties. In this chapter, we will discuss the evolution and applications of predictive modeling and generative modeling to discover and design safe and effective antimicrobial peptides. We will also present their current limitations and suggest future research directions, applicable to peptide drug design campaigns.
Affiliation(s)
- Mariana D C Aguilera-Puga
- Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV-IPN), Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Irapuato, Guanajuato, Mexico
- CINVESTAV-IPN, Unidad Irapuato, Departamento de Biotecnología y Bioquímica, Irapuato, Guanajuato, Mexico
| | - Natalia L Cancelarich
- Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Puerto Madryn, Argentina
| | - Mariela M Marani
- Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Puerto Madryn, Argentina
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA.
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA.
| | - Fabien Plisson
- Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV-IPN), Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Irapuato, Guanajuato, Mexico.
- CINVESTAV-IPN, Unidad Irapuato, Departamento de Biotecnología y Bioquímica, Irapuato, Guanajuato, Mexico.
| |
|
16
|
Zhou X, Chen G, Ye J, Wang E, Zhang J, Mao C, Li Z, Hao J, Huang X, Tang J, Heng PA. ProRefiner: an entropy-based refining strategy for inverse protein folding with global graph attention. Nat Commun 2023; 14:7434. [PMID: 37973874 PMCID: PMC10654420 DOI: 10.1038/s41467-023-43166-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 11/02/2023] [Indexed: 11/19/2023] Open
Abstract
Inverse Protein Folding (IPF) is an important task of protein design, which aims to design sequences compatible with a given backbone structure. Despite the prosperous development of algorithms for this task, existing methods tend to rely on noisy predicted residues located in the local neighborhood when generating sequences. To address this limitation, we propose an entropy-based residue selection method to remove noise in the input residue context. Additionally, we introduce ProRefiner, a memory-efficient global graph attention model to fully utilize the denoised context. Our proposed method achieves state-of-the-art performance on multiple sequence design benchmarks in different design settings. Furthermore, we demonstrate the applicability of ProRefiner in redesigning Transposon-associated transposase B, where six out of the 20 variants we propose exhibit improved gene editing activity.
Affiliation(s)
- Xinyi Zhou
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Central Ave, Hong Kong, China
| | | | - Junjie Ye
- Noah's Ark Lab, Huawei, Shenzhen, China
| | - Ercheng Wang
- Zhejiang Lab, Kechuang Avenue, Hangzhou, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Jun Zhang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
| | - Cong Mao
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
| | - Zhanwei Li
- Zhejiang Lab, Kechuang Avenue, Hangzhou, China
| | | | | | - Jin Tang
- Zhejiang Lab, Kechuang Avenue, Hangzhou, China
| | - Pheng Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Central Ave, Hong Kong, China
- Zhejiang Lab, Kechuang Avenue, Hangzhou, China
| |
|
17
|
An L, Hicks DR, Zorine D, Dauparas J, Wicky BIM, Milles LF, Courbet A, Bera AK, Nguyen H, Kang A, Carter L, Baker D. Hallucination of closed repeat proteins containing central pockets. Nat Struct Mol Biol 2023; 30:1755-1760. [PMID: 37770718 PMCID: PMC10643118 DOI: 10.1038/s41594-023-01112-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/16/2022] [Accepted: 08/28/2023] [Indexed: 09/30/2023]
Abstract
In pseudocyclic proteins, such as TIM barrels, β barrels, and some helical transmembrane channels, a single subunit is repeated in a cyclic pattern, giving rise to a central cavity that can serve as a pocket for ligand binding or enzymatic activity. Inspired by these proteins, we devised a deep-learning-based approach to broadly exploring the space of closed repeat proteins starting from only a specification of the repeat number and length. Biophysical data for 38 structurally diverse pseudocyclic designs produced in Escherichia coli are consistent with the design models, and the three crystal structures we were able to obtain are very close to the designed structures. Docking studies suggest the diversity of folds and central pockets provide effective starting points for designing small-molecule binders and enzymes.
Affiliation(s)
- Linna An
- Department of Biochemistry, University of Washington, Seattle, WA, USA.
- Institute for Protein Design, University of Washington, Seattle, WA, USA.
| | - Derrick R Hicks
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Dmitri Zorine
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Justas Dauparas
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Basile I M Wicky
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Lukas F Milles
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Alexis Courbet
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Asim K Bera
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Hannah Nguyen
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Alex Kang
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Lauren Carter
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA.
- Institute for Protein Design, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| |
|
18
|
Capponi S, Daniels KG. Harnessing the power of artificial intelligence to advance cell therapy. Immunol Rev 2023; 320:147-165. [PMID: 37415280 DOI: 10.1111/imr.13236] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/02/2023] [Accepted: 06/17/2023] [Indexed: 07/08/2023]
Abstract
Cell therapies are powerful technologies in which human cells are reprogrammed for therapeutic applications such as killing cancer cells or replacing defective cells. The technologies underlying cell therapies are increasing in effectiveness and complexity, making rational engineering of cell therapies more difficult. Creating the next generation of cell therapies will require improved experimental approaches and predictive models. Artificial intelligence (AI) and machine learning (ML) methods have revolutionized several fields in biology including genome annotation, protein structure prediction, and enzyme design. In this review, we discuss the potential of combining experimental library screens and AI to build predictive models for the development of modular cell therapy technologies. Advances in DNA synthesis and high-throughput screening techniques enable the construction and screening of libraries of modular cell therapy constructs. AI and ML models trained on this screening data can accelerate the development of cell therapies by generating predictive models, design rules, and improved designs.
Affiliation(s)
- Sara Capponi
- Department of Functional Genomics and Cellular Engineering, IBM Almaden Research Center, San Jose, California, USA
- Center for Cellular Construction, San Francisco, California, USA
| | - Kyle G Daniels
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, California, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| |
|
19
|
Wu X, Zhao S, Tian Z, Han C, Jiang X, Wang L. Dynamics of loops surrounding the active site architecture in GH5_2 subfamily TfCel5A for cellulose degradation. Biotechnology for Biofuels and Bioproducts 2023; 16:154. [PMID: 37853500 PMCID: PMC10583438 DOI: 10.1186/s13068-023-02411-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 10/12/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND: Lignocellulose is the most abundant natural biomass resource for the production of biofuels and other chemicals. The efficient degradation of cellulose by cellulases is a critical step for the lignocellulose bioconversion. Understanding the structure-catalysis relationship is vital for rational design of more stable and highly active enzymes. Glycoside hydrolase (GH) family 5 is the largest and most functionally diverse group of cellulases, with a conserved TIM barrel structure. The important roles of the various loop regions of GH5 enzymes in catalysis, however, remain poorly understood. RESULTS: In the present study, we investigated the relationship between the loops surrounding active site architecture and its catalytic efficiency, taking TfCel5A, an enzyme from GH5_2 subfamily of Thermobifida fusca, as an example. Large-scale computational simulations and site-directed mutagenesis experiments revealed that three loops (loop 8, 3, and 7) around active cleft played diverse roles in substrate binding, intermediate formation, and product release, respectively. The highly flexible and charged residue triad of loop 8 was responsible for capturing the ligand into the active cleft. Severe fluctuation of loop 3 led to the distortion of sugar conformation at the −1 subsite. The wobble of loop 7 might facilitate product release, and the enzyme activity of the mutant Y361W in loop 7 was increased by approximately 40%. CONCLUSION: This study unraveled the vital roles of loops in active site architecture and provided new insights into the catalytic mechanism of the GH5_2 cellulases.
Affiliation(s)
- Xiuyun Wu
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
| | - Sha Zhao
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
| | - Zhennan Tian
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
| | - Chao Han
- Shandong Key Laboratory of Agricultural Microbiology, Shandong Agricultural University, Tai'an, 271018, China
| | - Xukai Jiang
- National Glycoengineering Research Center, Shandong University, Qingdao, 266237, China
| | - Lushan Wang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China.
| |
|
20
|
Jiang H, Jude KM, Wu K, Fallas J, Ueda G, Brunette TJ, Hicks D, Pyles H, Yang A, Carter L, Lamb M, Li X, Levine PM, Stewart L, Garcia KC, Baker D. De novo design of buttressed loops for sculpting protein functions. bioRxiv 2023:2023.08.22.554384. [PMID: 37662224 PMCID: PMC10473674 DOI: 10.1101/2023.08.22.554384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
In natural proteins, structured loops play central roles in molecular recognition, signal transduction and enzyme catalysis. However, because of the intrinsic flexibility and irregularity of loop regions, organizing multiple structured loops at protein functional sites has been very difficult to achieve by de novo protein design. Here we describe a solution to this problem that generates structured loops buttressed by extensive hydrogen bonding interactions with two neighboring loops and with secondary structure elements. We use this approach to design tandem repeat proteins with buttressed loops ranging from 9 to 14 residues in length. Experimental characterization shows the designs are folded and monodisperse, highly soluble, and thermally stable. Crystal structures are in close agreement with the computational design models, with the loops structured and buttressed by their neighbors as designed. We demonstrate the functionality afforded by loop buttressing by designing and characterizing binders for extended peptides in which the loops form one side of an extended binding pocket. The ability to design multiple structured loops should contribute quite generally to efforts to design new protein functions.
Affiliation(s)
- Hanlun Jiang
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - Kevin M Jude
- Howard Hughes Medical Institute, Stanford University School of Medicine
| | - Kejia Wu
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
- Biological Physics, Structure and Design Graduate Program, University of Washington
| | - Jorge Fallas
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - George Ueda
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - T J Brunette
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - Derrick Hicks
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - Harley Pyles
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - Aerin Yang
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine
| | - Lauren Carter
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - Mila Lamb
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - Xinting Li
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - Paul M Levine
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - Lance Stewart
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
| | - K Christopher Garcia
- Howard Hughes Medical Institute, Stanford University School of Medicine
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine
- Department of Structural Biology, Stanford University School of Medicine
| | - David Baker
- Department of Biochemistry, University of Washington
- Institute for Protein Design, University of Washington
- Howard Hughes Medical Institute, Stanford University School of Medicine
- Howard Hughes Medical Institute, University of Washington
| |
|
21
|
Minami S, Kobayashi N, Sugiki T, Nagashima T, Fujiwara T, Tatsumi-Koga R, Chikenji G, Koga N. Exploration of novel αβ-protein folds through de novo design. Nat Struct Mol Biol 2023; 30:1132-1140. [PMID: 37400653 PMCID: PMC10442233 DOI: 10.1038/s41594-023-01029-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/29/2021] [Accepted: 05/30/2023] [Indexed: 07/05/2023]
Abstract
A fundamental question in protein evolution is whether nature has exhaustively sampled nearly all possible protein folds throughout evolution, or whether a large fraction of the possible folds remains unexplored. To address this question, we defined a set of rules for β-sheet topology to predict novel αβ-folds and carried out a systematic de novo protein design exploration of the novel αβ-folds predicted by the rules. The designs for all eight of the predicted novel αβ-folds with a four-stranded β-sheet, including a knot-forming one, folded into structures close to the design models. Further, the rules predicted more than 10,000 novel αβ-folds with five- to eight-stranded β-sheets; this number far exceeds the number of αβ-folds observed in nature so far. This result suggests that a vast number of αβ-folds are possible, but have not emerged or have become extinct due to evolutionary bias.
Affiliation(s)
- Shintaro Minami
- Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan
| | - Naohiro Kobayashi
- Institute for Protein Research (IPR), Osaka University, Osaka, Japan
- RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
| | - Toshihiko Sugiki
- Institute for Protein Research (IPR), Osaka University, Osaka, Japan
| | - Toshio Nagashima
- RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
| | | | - Rie Tatsumi-Koga
- Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan
| | - George Chikenji
- Department of Applied Physics, Graduate School of Engineering, Nagoya University, Nagoya, Japan
| | - Nobuyasu Koga
- Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan.
- SOKENDAI, The Graduate University for Advanced Studies, Hayama, Japan.
- Research Center of Integrative Molecular Systems, Institute for Molecular Science (IMS), National Institutes of Natural Sciences (NINS), Okazaki, Japan.
- Laboratory for Protein Design, Institute for Protein Research (IPR), Osaka University, Osaka, Japan.
| |
|
22
|
Corbella M, Pinto GP, Kamerlin SCL. Loop dynamics and the evolution of enzyme activity. Nat Rev Chem 2023; 7:536-547. [PMID: 37225920 DOI: 10.1038/s41570-023-00495-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Accepted: 04/06/2023] [Indexed: 05/26/2023]
Abstract
In the early 2000s, Tawfik presented his 'New View' on enzyme evolution, highlighting the role of conformational plasticity in expanding the functional diversity of limited repertoires of sequences. This view is gaining increasing traction with increasing evidence of the importance of conformational dynamics in both natural and laboratory evolution of enzymes. The past years have seen several elegant examples of harnessing conformational (particularly loop) dynamics to successfully manipulate protein function. This Review revisits flexible loops as critical participants in regulating enzyme activity. We showcase several systems of particular interest: triosephosphate isomerase barrel proteins, protein tyrosine phosphatases and β-lactamases, while briefly discussing other systems in which loop dynamics are important for selectivity and turnover. We then discuss the implications for engineering, presenting examples of successful loop manipulation in either improving catalytic efficiency, or changing selectivity completely. Overall, it is becoming clearer that mimicking nature by manipulating the conformational dynamics of key protein loops is a powerful method of tailoring enzyme activity, without needing to target active-site residues.
Affiliation(s)
- Marina Corbella
- Department of Chemistry, Uppsala University, Uppsala, Sweden
| | - Gaspar P Pinto
- Department of Chemistry, Uppsala University, Uppsala, Sweden
- Cortex Discovery GmbH, Regensburg, Germany
| | - Shina C L Kamerlin
- Department of Chemistry, Uppsala University, Uppsala, Sweden.
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.
| |
|
23
|
Madani A, Krause B, Greene ER, Subramanian S, Mohr BP, Holton JM, Olmos JL, Xiong C, Sun ZZ, Socher R, Fraser JS, Naik N. Large language models generate functional protein sequences across diverse families. Nat Biotechnol 2023; 41:1099-1106. [PMID: 36702895 PMCID: PMC10400306 DOI: 10.1038/s41587-022-01618-2] [Citation(s) in RCA: 153] [Impact Index Per Article: 153.0] [Received: 07/12/2022] [Accepted: 11/17/2022] [Indexed: 01/27/2023]
Abstract
Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase.
Affiliation(s)
- Ali Madani
- Salesforce Research, Palo Alto, CA, USA.
- Profluent Bio, San Francisco, CA, USA.
| | | | - Eric R Greene
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
| | - Subu Subramanian
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
| | | | - James M Holton
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
| | - Jose Luis Olmos
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
| | | | | | | | - James S Fraser
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
| | | |
|
24
|
Yan J, Li S, Zhang Y, Hao A, Zhao Q. ZetaDesign: an end-to-end deep learning method for protein sequence design and side-chain packing. Brief Bioinform 2023; 24:bbad257. [PMID: 37429578 DOI: 10.1093/bib/bbad257] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/28/2023] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/12/2023] Open
Abstract
Computational protein design has been demonstrated to be the most powerful tool in the last few years among protein designing and repacking tasks. In practice, these two tasks are strongly related but often treated separately. Besides, state-of-the-art deep-learning-based methods cannot provide interpretability from an energy perspective, affecting the accuracy of the design. Here we propose a new systematic approach, including both a posterior probability and a joint probability parts, to solve the two essential questions once for all. This approach takes the physicochemical property of amino acids into consideration and uses the joint probability model to ensure the convergence between structure and amino acid type. Our results demonstrated that this method could generate feasible, high-confidence sequences with low-energy side conformations. The designed sequences can fold into target structures with high confidence and maintain relatively stable biochemical properties. The side chain conformation has a significantly lower energy landscape without delegating to a rotamer library or performing the expensive conformational searches. Overall, we propose an end-to-end method that combines the advantages of both deep learning and energy-based methods. The design results of this model demonstrate high efficiency, and precision, as well as a low energy state and good interpretability.
Affiliation(s)
- Junyu Yan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
| | - Shuai Li
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
| | - Ying Zhang
- The Key Laboratory of Cell Proliferation and Regulation Biology, Ministry of Education, College of Life Sciences, Beijing Normal University, Beijing, China
| | - Aimin Hao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
| | - Qinping Zhao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
| |
|
25
|
Tagami S. Why we are made of proteins and nucleic acids: Structural biology views on extraterrestrial life. Biophys Physicobiol 2023; 20:e200026. [PMID: 38496239 PMCID: PMC10941967 DOI: 10.2142/biophysico.bppb-v20.0026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 05/29/2023] [Indexed: 03/19/2024] Open
Abstract
Is it a miracle that life exists on the Earth, or is it a common phenomenon in the universe? If extraterrestrial organisms exist, what are they like? To answer these questions, we must understand what kinds of molecules could evolve into life, or in other words, what properties are generally required to perform biological functions and store genetic information. This review summarizes recent findings on simple ancestral proteins, outlines the basic knowledge in textbooks, and discusses the generally required properties for biological molecules from structural biology viewpoints (e.g., restriction of shapes, and types of intra- and intermolecular interactions), leading to the conclusion that proteins and nucleic acids are at least one of the simplest (and perhaps very common) forms of catalytic and genetic biopolymers in the universe. This review article is an extended version of the Japanese article, On the Origin of Life: Coevolution between RNA and Peptide, published in SEIBUTSU BUTSURI Vol. 61, p. 232-235 (2021).
Affiliation(s)
- Shunsuke Tagami
- RIKEN Center for Biosystems Dynamics Research, Yokohama, Kanagawa 230-0045, Japan
| |
|
26
|
Hanreich S, Bonandi E, Drienovská I. Design of Artificial Enzymes: Insights into Protein Scaffolds. Chembiochem 2023; 24:e202200566. [PMID: 36418221 DOI: 10.1002/cbic.202200566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 11/18/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2022]
Abstract
The design of artificial enzymes has emerged as a promising tool for the generation of potent biocatalysts able to promote new-to-nature reactions with improved catalytic performances, providing a powerful platform for wide-ranging applications and a better understanding of protein functions and structures. The selection of an appropriate protein scaffold plays a key role in the design process. This review aims to give a general overview of the most common protein scaffolds that can be exploited for the generation of artificial enzymes. Several examples are discussed and categorized according to the strategy used for the design of the artificial biocatalyst, namely the functionalization of natural enzymes, the creation of a new catalytic site in a protein scaffold bearing a wide hydrophobic pocket and de novo protein design. The review is concluded by a comparison of these different methods and by our perspective on the topic.
Affiliation(s)
- Stefanie Hanreich
- Department of Chemistry and Pharmaceutical Sciences, Vrije Universiteit Amsterdam, De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands
| | - Elisa Bonandi
- Department of Chemistry and Pharmaceutical Sciences, Vrije Universiteit Amsterdam, De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands
| | - Ivana Drienovská
- Department of Chemistry and Pharmaceutical Sciences, Vrije Universiteit Amsterdam, De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands
| |
|
27
|
Kim DE, Jensen DR, Feldman D, Tischer D, Saleem A, Chow CM, Li X, Carter L, Milles L, Nguyen H, Kang A, Bera AK, Peterson FC, Volkman BF, Ovchinnikov S, Baker D. De novo design of small beta barrel proteins. Proc Natl Acad Sci U S A 2023; 120:e2207974120. [PMID: 36897987 PMCID: PMC10089152 DOI: 10.1073/pnas.2207974120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 01/27/2023] [Indexed: 03/12/2023]
Abstract
Small beta barrel proteins are attractive targets for computational design because of their considerable functional diversity despite their very small size (<70 amino acids). However, there are considerable challenges to designing such structures, and there has been little success thus far. Because of the small size, the hydrophobic core stabilizing the fold is necessarily very small, and the conformational strain of barrel closure can oppose folding; also intermolecular aggregation through free beta strand edges can compete with proper monomer folding. Here, we explore the de novo design of small beta barrel topologies using both Rosetta energy-based methods and deep learning approaches to design four small beta barrel folds: Src homology 3 (SH3) and oligonucleotide/oligosaccharide-binding (OB) topologies found in nature and five and six up-and-down-stranded barrels rarely if ever seen in nature. Both approaches yielded successful designs with high thermal stability and experimentally determined structures with less than 2.4 Å rmsd from the designed models. Using deep learning for backbone generation and Rosetta for sequence design yielded higher design success rates and increased structural diversity than Rosetta alone. The ability to design a large and structurally diverse set of small beta barrel proteins greatly increases the protein shape space available for designing binders to protein targets of interest.
Affiliation(s)
- David E. Kim
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- HHMI, University of Washington, Seattle, WA 98195
| | - Davin R. Jensen
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226
| | - David Feldman
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Doug Tischer
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Ayesha Saleem
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Cameron M. Chow
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Xinting Li
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Lauren Carter
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Lukas Milles
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Hannah Nguyen
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Alex Kang
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Asim K. Bera
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
| | - Francis C. Peterson
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226
| | - Brian F. Volkman
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226
| | - Sergey Ovchinnikov
- Division of Science, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- HHMI, University of Washington, Seattle, WA 98195
| |
|
28
|
Meller A, Ward M, Borowsky J, Kshirsagar M, Lotthammer JM, Oviedo F, Ferres JL, Bowman GR. Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network. Nat Commun 2023; 14:1177. [PMID: 36859488 PMCID: PMC9977097 DOI: 10.1038/s41467-023-36699-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 07/21/2022] [Accepted: 02/09/2023] [Indexed: 03/03/2023]
Abstract
Cryptic pockets expand the scope of drug discovery by enabling targeting of proteins currently considered undruggable because they lack pockets in their ground state structures. However, identifying cryptic pockets is labor-intensive and slow. The ability to accurately and rapidly predict if and where cryptic pockets are likely to form from a structure would greatly accelerate the search for druggable pockets. Here, we present PocketMiner, a graph neural network trained to predict where pockets are likely to open in molecular dynamics simulations. Applying PocketMiner to single structures from a newly curated dataset of 39 experimentally confirmed cryptic pockets demonstrates that it accurately identifies cryptic pockets (ROC-AUC: 0.87) >1,000-fold faster than existing methods. We apply PocketMiner across the human proteome and show that predicted pockets open in simulations, suggesting that over half of proteins thought to lack pockets based on available structures likely contain cryptic pockets, vastly expanding the potentially druggable proteome.
Affiliation(s)
- Artur Meller
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, 660 S. Euclid Ave., Box 8231, St. Louis, MO, 63110, USA
- Medical Scientist Training Program, Washington University in St. Louis, 660 S. Euclid Ave., St. Louis, MO, 63110, USA
| | - Michael Ward
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, 660 S. Euclid Ave., Box 8231, St. Louis, MO, 63110, USA
| | - Jonathan Borowsky
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, 660 S. Euclid Ave., Box 8231, St. Louis, MO, 63110, USA
| | | | - Jeffrey M Lotthammer
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, 660 S. Euclid Ave., Box 8231, St. Louis, MO, 63110, USA
| | - Felipe Oviedo
- AI for Good Research Lab, Microsoft, Redmond, WA, USA
| | | | - Gregory R Bowman
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, 660 S. Euclid Ave., Box 8231, St. Louis, MO, 63110, USA
- Department of Biochemistry and Molecular Biophysics, University of Pennsylvania, 3620 Hamilton Walk, Philadelphia, PA, 19104, USA
| |
|
29
|
De novo protein fold design through sequence-independent fragment assembly simulations. Proc Natl Acad Sci U S A 2023; 120:e2208275120. [PMID: 36656852 PMCID: PMC9942881 DOI: 10.1073/pnas.2208275120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/20/2023]
Abstract
De novo protein design generally consists of two steps, including structure and sequence design. Many protein design studies have focused on sequence design with scaffolds adapted from native structures in the PDB, which renders novel areas of protein structure and function space unexplored. We developed FoldDesign to create novel protein folds from specific secondary structure (SS) assignments through sequence-independent replica-exchange Monte Carlo (REMC) simulations. The method was tested on 354 non-redundant topologies, where FoldDesign consistently created stable structural folds, while recapitulating on average 87.7% of the SS elements. Meanwhile, the FoldDesign scaffolds had well-formed structures with buried residues and solvent-exposed areas closely matching their native counterparts. Despite the high fidelity to the input SS restraints and local structural characteristics of native proteins, a large portion of the designed scaffolds possessed global folds completely different from natural proteins in the PDB, highlighting the ability of FoldDesign to explore novel areas of protein fold space. Detailed data analyses revealed that the major contributions to the successful structure design lay in the optimal energy force field, which contains a balanced set of SS packing terms, and REMC simulations, which were coupled with multiple auxiliary movements to efficiently search the conformational space. Additionally, the ability to recognize and assemble uncommon super-SS geometries, rather than the unique arrangement of common SS motifs, was the key to generating novel folds. These results demonstrate a strong potential to explore both structural and functional spaces through computational design simulations that natural proteins have not reached through evolution.
|
30
|
Kordes S, Beck J, Shanmugaratnam S, Flecks M, Höcker B. Physics-based approach to extend a de novo TIM barrel with rationally designed helix-loop-helix motifs. Protein Eng Des Sel 2023; 36:gzad012. [PMID: 37707513 DOI: 10.1093/protein/gzad012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/04/2023] [Accepted: 09/05/2023] [Indexed: 09/15/2023]
Abstract
Computational protein design promises the ability to build tailor-made proteins de novo. While a range of de novo proteins have been constructed so far, the majority of these designs have idealized topologies that lack larger cavities which are necessary for the incorporation of small molecule binding sites or enzymatic functions. One attractive target for enzyme design is the TIM-barrel fold, due to its ubiquity in nature and capability to host versatile functions. With the successful de novo design of a 4-fold symmetric TIM barrel, sTIM11, an idealized, minimalistic scaffold was created. In this work, we attempted to extend this de novo TIM barrel by incorporating a helix-loop-helix motif into its βα-loops by applying a physics-based modular design approach using Rosetta. Further diversification was performed by exploiting the symmetry of the scaffold to integrate two helix-loop-helix motifs into the scaffold. Analysis with AlphaFold2 and biochemical characterization demonstrate the formation of additional α-helical secondary structure elements supporting the successful extension as intended.
Affiliation(s)
- Sina Kordes
- Department of Biochemistry, University of Bayreuth, Bayreuth 95447, Germany
| | - Julian Beck
- Department of Biochemistry, University of Bayreuth, Bayreuth 95447, Germany
| | | | - Merle Flecks
- Department of Biochemistry, University of Bayreuth, Bayreuth 95447, Germany
| | - Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth 95447, Germany
| |
|
31
|
Mizutani Y, Mizuno M. Time-resolved spectroscopic mapping of vibrational energy flow in proteins: Understanding thermal diffusion at the nanoscale. J Chem Phys 2022; 157:240901. [PMID: 36586981 DOI: 10.1063/5.0116734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
Abstract
Vibrational energy exchange between various degrees of freedom is critical to barrier-crossing processes in proteins. Hemeproteins are well suited for studying vibrational energy exchange in proteins because the heme group is an efficient photothermal converter. The released energy by heme following photoexcitation shows migration in a protein moiety on a picosecond timescale, which is observed using time-resolved ultraviolet resonance Raman spectroscopy. The anti-Stokes ultraviolet resonance Raman intensity of a tryptophan residue is an excellent probe for the vibrational energy in proteins, allowing the mapping of energy flow with the spatial resolution of a single amino acid residue. This Perspective provides an overview of studies on vibrational energy flow in proteins, including future perspectives for both methodologies and applications.
Affiliation(s)
- Yasuhisa Mizutani
- Department of Chemistry, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka, Osaka 560-0043, Japan
| | - Misao Mizuno
- Department of Chemistry, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka, Osaka 560-0043, Japan
| |
|
32
|
Rosignoli S, Paiardini A. Boosting the Full Potential of PyMOL with Structural Biology Plugins. Biomolecules 2022; 12:1764. [PMID: 36551192 PMCID: PMC9775141 DOI: 10.3390/biom12121764] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/08/2022] [Revised: 11/23/2022] [Accepted: 11/24/2022] [Indexed: 11/29/2022]
Abstract
Over the past few decades, the number of available structural bioinformatics pipelines, libraries, plugins, web resources and software has increased exponentially and become accessible to the broad realm of life scientists. This expansion has shaped the field as a tangled network of methods, algorithms and user interfaces. In recent years PyMOL, widely used software for biomolecules visualization and analysis, has started to play a key role in providing an open platform for the successful implementation of expert knowledge into an easy-to-use molecular graphics tool. This review outlines the plugins and features that make PyMOL an eligible environment for supporting structural bioinformatics analyses.
|
33
|
A Short Tale of the Origin of Proteins and Ribosome Evolution. Microorganisms 2022; 10:2115. [DOI: 10.3390/microorganisms10112115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 09/30/2022] [Accepted: 10/19/2022] [Indexed: 11/16/2022]
Abstract
Proteins are the workhorses of the cell and have been key players throughout the evolution of all organisms, from the origin of life to the present era. How might life have originated from the prebiotic chemistry of early Earth? This is one of the most intriguing unsolved questions in biology. Currently, however, it is generally accepted that amino acids, the building blocks of proteins, were abiotically available on primitive Earth, which would have made the formation of early peptides in a similar fashion possible. Peptides are likely to have coevolved with ancestral forms of RNA. The ribosome is the most evident product of this coevolution process, a sophisticated nanomachine that performs the synthesis of proteins codified in genomes. In this general review, we explore the evolution of proteins from their peptide origins to their folding and regulation based on the example of superoxide dismutase (SOD1), a key enzyme in oxygen metabolism on modern Earth.
|
34
|
A generic framework for hierarchical de novo protein design. Proc Natl Acad Sci U S A 2022; 119:e2206111119. [PMID: 36252041 DOI: 10.1073/pnas.2206111119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
De novo protein design enables the exploration of novel sequences and structures absent from the natural protein universe. De novo design also stands as a stringent test for our understanding of the underlying physical principles of protein folding and may lead to the development of proteins with unmatched functional characteristics. The first fundamental challenge of de novo design is to devise "designable" structural templates leading to sequences that will adopt the predicted fold. Here, we built on the TopoBuilder (TB) de novo design method, to automatically assemble structural templates with native-like features starting from string descriptors that capture the overall topology of proteins. Our framework eliminates the dependency of hand-crafted and fold-specific rules through an iterative, data-driven approach that extracts geometrical parameters from structural tertiary motifs. We evaluated the TopoBuilder framework by designing sequences for a set of five protein folds and experimental characterization revealed that several sequences were folded and stable in solution. The TopoBuilder de novo design framework will be broadly useful to guide the generation of artificial proteins with customized geometries, enabling the exploration of the protein universe.
|
35
|
Öten AM, Atak E, Taktak Karaca B, Fırtına S, Kutlu A. Discussing the roles of proline and glycine from the perspective of cold adaptation in lipases and cellulases. Biocatal Biotransfor 2022. [DOI: 10.1080/10242422.2022.2124111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Affiliation(s)
- Ahmet Melih Öten
- Biology Education Center, Faculty of Science and Technology, Uppsala University, Uppsala, Sweden
| | - Evren Atak
- Bioinformatics and System Biology, Bioengineering Department, Gebze Technical University, Kocaeli, Turkey
| | - Banu Taktak Karaca
- Molecular Biology & Genetics Department, Faculty of Natural Science and Engineering, Atlas University, Istanbul, Turkey
| | - Sinem Fırtına
- Bioinformatics & Genetics, Faculty of Natural Science and Engineering, İstinye University, Istanbul, Turkey
| | - Aslı Kutlu
- Bioinformatics & Genetics, Faculty of Natural Science and Engineering, İstinye University, Istanbul, Turkey
| |
|
36
|
Chu AE, Fernandez D, Liu J, Eguchi RR, Huang PS. De Novo Design of a Highly Stable Ovoid TIM Barrel: Unlocking Pocket Shape towards Functional Design. BioDesign Research 2022; 2022:9842315. [PMID: 37850141 PMCID: PMC10521652 DOI: 10.34133/2022/9842315] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/15/2022] [Accepted: 05/26/2022] [Indexed: 10/19/2023]
Abstract
The ability to finely control the structure of protein folds is an important prerequisite to functional protein design. The TIM barrel fold is an important target for these efforts as it is highly enriched for diverse functions in nature. Although a TIM barrel protein has been designed de novo, the ability to finely alter the curvature of the central beta barrel and the overall architecture of the fold remains elusive, limiting its utility for functional design. Here, we report the de novo design of a TIM barrel with ovoid (twofold) symmetry, drawing inspiration from natural beta and TIM barrels with ovoid curvature. We use an autoregressive backbone sampling strategy to implement our hypothesis for elongated barrel curvature, followed by an iterative enrichment sequence design protocol to obtain sequences which yield a high proportion of successfully folding designs. Designed sequences are highly stable and fold to the designed barrel curvature as determined by a 2.1 Å resolution crystal structure. The designs show robustness to drastic mutations, retaining high melting temperatures even when multiple charged residues are buried in the hydrophobic core or when the hydrophobic core is ablated to alanine. As a scaffold with a greater capacity for hosting diverse hydrogen bonding networks and installation of binding pockets or active sites, the ovoid TIM barrel represents a major step towards the de novo design of functional TIM barrels.
Affiliation(s)
- Alexander E Chu
- Biophysics Program, Stanford University, Stanford, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Daniel Fernandez
- Program in Chemistry, Engineering, And Medicine for Human Health (ChEM-H), Stanford University, Stanford, CA, USA
- Stanford ChEM-H, Macromolecular Structure Knowledge Center, Stanford University, Stanford, CA, USA
| | - Jingjia Liu
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Raphael R Eguchi
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Stanford ChEM-H, Macromolecular Structure Knowledge Center, Stanford University, Stanford, CA, USA
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Po-Ssu Huang
- Biophysics Program, Stanford University, Stanford, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Stanford ChEM-H, Macromolecular Structure Knowledge Center, Stanford University, Stanford, CA, USA
- Bio-X Institute, Stanford University, Stanford, CA, USA
| |
|
37
|
Wicky BIM, Milles LF, Courbet A, Ragotte RJ, Dauparas J, Kinfu E, Tipps S, Kibler RD, Baek M, DiMaio F, Li X, Carter L, Kang A, Nguyen H, Bera AK, Baker D. Hallucinating symmetric protein assemblies. Science 2022; 378:56-61. [PMID: 36108048 PMCID: PMC9724707 DOI: 10.1126/science.add1964] [Citation(s) in RCA: 60] [Impact Index Per Article: 30.0] [Indexed: 01/23/2023]
Abstract
Deep learning generative approaches provide an opportunity to broadly explore protein structure space beyond the sequences and structures of natural proteins. Here, we use deep network hallucination to generate a wide range of symmetric protein homo-oligomers given only a specification of the number of protomers and the protomer length. Crystal structures of seven designs are very similar to the computational models (median root mean square deviation: 0.6 angstroms), as are three cryo-electron microscopy structures of giant 10-nanometer rings with up to 1550 residues and C33 symmetry; all differ considerably from previously solved structures. Our results highlight the rich diversity of new protein structures that can be generated using deep learning and pave the way for the design of increasingly complex components for nanomachines and biomaterials.
Affiliation(s)
- B. I. M. Wicky
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - L. F. Milles
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - A. Courbet
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - R. J. Ragotte
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - J. Dauparas
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - E. Kinfu
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - S. Tipps
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - R. D. Kibler
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - M. Baek
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - F. DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - X. Li
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - L. Carter
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - A. Kang
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - H. Nguyen
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - A. K. Bera
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - D. Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
38
|
Peñas-Utrilla D, Marcos E. Identifying well-folded de novo proteins in the new era of accurate structure prediction. Front Mol Biosci 2022; 9:991380. [PMID: 36275629 PMCID: PMC9581288 DOI: 10.3389/fmolb.2022.991380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 09/20/2022] [Indexed: 11/29/2022]
Abstract
Computational de novo protein design tailors proteins for target structures and oligomerisation states with high stability, which allows overcoming many limitations of natural proteins when redesigned for new functions. Despite significant advances in the field over the past decade, it remains challenging to predict sequences that will fold as stable monomers in solution or binders to a particular protein target, thereby requiring substantial experimental resources to identify proteins with the desired properties. To overcome this, here we leveraged the large amount of design data accumulated in the last decade, and the breakthrough in protein structure prediction from last year, to investigate improved ways of selecting promising designs before experimental testing. We collected de novo proteins from previous studies, 518 designed as monomers of different folds and 2112 as binders against the Botulinum neurotoxin, and analysed their structures with AlphaFold2, RoseTTAFold and fragment quality descriptors in combination with other properties related to surface interactions. These features showed high complementarity in rationalizing the experimental results, which allowed us to generate quite accurate machine learning models for predicting well-folded monomers and binders with a small set of descriptors. Cross-validating designs with varied orthogonal computational techniques should guide us in identifying design imperfections, rescuing designs and making more robust design selections before experimental testing.
|
39
|
Allen PW, Cook JA, Colquhoun AN, Sorin EJ, Tapavicza E, Schwans JP. Energetically unfavorable protein angles: Exploration of a conserved dihedral angle in triosephosphate isomerase. Biopolymers 2022; 113:e23525. [DOI: 10.1002/bip.23525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 08/22/2022] [Accepted: 08/25/2022] [Indexed: 11/06/2022]
Affiliation(s)
- Patrick W. Allen
- Department of Chemistry and Biochemistry, California State University Long Beach, Long Beach, California, USA
| | - Jordan A. Cook
- Department of Chemistry and Biochemistry, California State University Long Beach, Long Beach, California, USA
| | - Anh N. Colquhoun
- Department of Chemistry and Biochemistry, California State University Long Beach, Long Beach, California, USA
| | - Eric J. Sorin
- Department of Chemistry and Biochemistry, California State University Long Beach, Long Beach, California, USA
| | - Enrico Tapavicza
- Department of Chemistry and Biochemistry, California State University Long Beach, Long Beach, California, USA
| | - Jason P. Schwans
- Department of Chemistry and Biochemistry, California State University Long Beach, Long Beach, California, USA
| |
|
40
|
ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun 2022; 13:4348. [PMID: 35896542 PMCID: PMC9329459 DOI: 10.1038/s41467-022-32007-7] [Citation(s) in RCA: 93] [Impact Index Per Article: 46.5] [Received: 04/01/2022] [Accepted: 07/13/2022] [Indexed: 11/29/2022]
Abstract
Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Recent progress in Transformer-based architectures has enabled the implementation of language models capable of generating text with human-like capabilities. Here, motivated by this success, we describe ProtGPT2, a language model trained on the protein space that generates de novo protein sequences following the principles of natural ones. The generated proteins display natural amino acid propensities, while disorder predictions indicate that 88% of ProtGPT2-generated proteins are globular, in line with natural sequences. Sensitive sequence searches in protein databases show that ProtGPT2 sequences are distantly related to natural ones, and similarity networks further demonstrate that ProtGPT2 is sampling unexplored regions of protein space. AlphaFold prediction of ProtGPT2-sequences yields well-folded non-idealized structures with embodiments and large loops and reveals topologies not captured in current structure databases. ProtGPT2 generates sequences in a matter of seconds and is freely available. Here the authors apply some of the latest advances in natural language processing, generative Transformers, to train ProtGPT2, a language model that explores unseen regions of the protein space while designing proteins with nature-like properties.
|
41
|
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. Rep Prog Phys 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Affiliation(s)
- Giulia Magi Meconi
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Ivan R Sasselli
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Jose N Onuchic
- Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
- Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
- Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
|
42
|
|
43
|
Eguchi RR, Choe CA, Huang PS. Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput Biol 2022; 18:e1010271. [PMID: 35759518 PMCID: PMC9269947 DOI: 10.1371/journal.pcbi.1010271] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 02/15/2022] [Revised: 07/08/2022] [Accepted: 06/01/2022] [Indexed: 12/26/2022]
Abstract
While deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation—an important task in structure-based problems such as active site and interface design. We present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the 3D coordinates of immunoglobulins. Our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. We show that the Ig-VAE can be used with Rosetta to create a computational model of a SARS-CoV2-RBD binder via latent space sampling. We further demonstrate that the model’s generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as constrained optimization problem in the latent space of a generative model. Many essential biochemical processes are governed by protein-protein interactions (PPIs), and our ability to make binding proteins that modulate PPIs is crucial to the creation of therapeutics and the study of cell-signaling. One critical aspect of PPI design is to capture protein conformational flexibility. Deep generative models are a class of mathematical models that are able to synthesize novel data from a finite set of training examples. Here, we make advances in computational protein design methodology by developing a deep generative model that creates protein backbones adopting the immunoglobulin fold, which is found in natural binding proteins such as antibodies. While generative models have been powerful in tasks such as image generation, using them to create proteins has remained a challenge. We solve this problem with a new model that allows for the direct generation of novel 3D molecules and show that they are of high chemical accuracy. 
Generated structures work well with existing protein design methods such as Rosetta, providing access to a large collection of novel immunoglobulin structures. Finally, we present a new protein design framework, called “generative design,” that shows how deep generative models such as ours can be applied to virtually any protein design problem.
Affiliation(s)
- Raphael R. Eguchi
- Department of Biochemistry, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Christian A. Choe
- Department of Bioengineering, Stanford University, Stanford, California, United States of America
- Po-Ssu Huang
- Department of Bioengineering, Stanford University, Stanford, California, United States of America
|
44
|
The Structural Rule Distinguishing a Superfold: A Case Study of Ferredoxin Fold and the Reverse Ferredoxin Fold. Molecules 2022; 27:3547. [PMID: 35684484 PMCID: PMC9181952 DOI: 10.3390/molecules27113547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/24/2022] [Accepted: 05/28/2022] [Indexed: 01/27/2023]
Abstract
Superfolds are folds commonly observed among evolutionarily unrelated multiple superfamilies of proteins. Since discovering superfolds almost two decades ago, structural rules distinguishing superfolds from the other ordinary folds have been explored but remained elusive. Here, we analyzed a typical superfold, the ferredoxin fold, and the fold which reverses the N to C terminus direction from the ferredoxin fold as a case study to find the rule to distinguish superfolds from the other folds. Though all the known structural characteristics for superfolds apply to both the ferredoxin fold and the reverse ferredoxin fold, the reverse fold has been found only in a single superfamily. The database analyses in the present study revealed the structural preferences of αβ- and βα-units; the preferences separate two α-helices in the ferredoxin fold, preventing their collision and stabilizing the fold. In contrast, in the reverse ferredoxin fold, the preferences bring two helices near each other, inducing structural conflict. The Rosetta folding simulations suggested that the ferredoxin fold is physically much more realizable than the reverse ferredoxin fold. Therefore, we propose that minimal structural conflict or minimal frustration among secondary structures is the rule to distinguish a superfold from ordinary folds. Intriguingly, the database analyses revealed that a most stringent structural rule in proteins, the right-handedness of the βαβ-unit, is broken in a set of structures to prevent the frustration, suggesting the proposed rule of minimum frustration among secondary structural units is comparably strong as the right-handedness rule of the βαβ-unit.
|
45
|
Blaber M. Variable and Conserved Regions of Secondary Structure in the β-Trefoil Fold: Structure Versus Function. Front Mol Biosci 2022; 9:889943. [PMID: 35517858 PMCID: PMC9062101 DOI: 10.3389/fmolb.2022.889943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 04/01/2022] [Indexed: 11/13/2022]
Abstract
β-trefoil proteins exhibit an approximate C3 rotational symmetry. An analysis of the secondary structure for members of this diverse superfamily of proteins indicates that it is comprised of remarkably conserved β-strands and highly-divergent turn regions. A fundamental “minimal” architecture can be identified that is devoid of heterogenous and extended turn regions, and is conserved among all family members. Conversely, the different functional families of β-trefoils can potentially be identified by their unique turn patterns (or turn “signature”). Such analyses provide clues as to the evolution of the β-trefoil family, suggesting a folding/stability role for the β-strands and a functional role for turn regions. This viewpoint can also guide de novo protein design of β-trefoil proteins having novel functionality.
Affiliation(s)
- Michael Blaber
- Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, FL, United States
|
46
|
Tenorio CA, Parker JB, Blaber M. Functionalization of a symmetric protein scaffold: Redundant folding nuclei and alternative oligomeric folding pathways. Protein Sci 2022; 31:e4301. [PMID: 35481645 PMCID: PMC8996475 DOI: 10.1002/pro.4301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/06/2021] [Revised: 03/12/2022] [Accepted: 03/15/2022] [Indexed: 02/02/2023]
Abstract
Successful de novo protein design ideally targets specific folding kinetics, stability thermodynamics, and biochemical functionality, and the simultaneous achievement of all these criteria in a single step design is challenging. Protein design is potentially simplified by separating the problem into two steps: (a) an initial design of a protein "scaffold" having appropriate folding kinetics and stability thermodynamics, followed by (b) appropriate functional mutation, possibly involving insertion of a peptide functional "cassette." This stepwise approach can also separate the orthogonal effects of the "stability/function" and "foldability/function" tradeoffs commonly observed in protein design. If the scaffold is a protein architecture having an exact rotational symmetry, then there is the potential for redundant folding nuclei and multiple equivalent sites of functionalization, thereby enabling broader functional adaptation. We describe such a "scaffold" and functional "cassette" design strategy applied to a β-trefoil threefold symmetric architecture and a heparin ligand functionality. The results support the availability of redundant folding nuclei within this symmetric architecture, and also identify a minimal peptide cassette conferring heparin affinity. The results also identify an energy barrier of destabilization that switches the protein folding pathway from monomeric to trimeric, thereby identifying another potential advantage of symmetric protein architecture in de novo design.
Affiliation(s)
- Connie A. Tenorio
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
- Joseph B. Parker
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
- Michael Blaber
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
|
47
|
Tee WV, Wah Tan Z, Guarnera E, Berezovsky IN. Conservation and diversity in allosteric fingerprints of proteins for evolutionary-inspired engineering and design. J Mol Biol 2022; 434:167577. [PMID: 35395233 DOI: 10.1016/j.jmb.2022.167577] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/05/2022] [Revised: 03/30/2022] [Accepted: 03/30/2022] [Indexed: 11/26/2022]
Abstract
Hand-in-hand work of physics and evolution delivered protein universe with diversity of forms, sizes, and functions. Pervasiveness and advantageous traits of allostery made it an important component of the protein function regulation, calling for thorough investigation of its structural determinants and evolution. Learning directly from nature, we explored here allosteric communication in several major folds and repeat proteins, including α/β and β-barrels, β-propellers, Ig-like fold, ankyrin and α/β leucine-rich repeat proteins, which provide structural platforms for many different enzymatic and signalling functions. We obtained a picture of conserved allosteric communication characteristic in different fold types, modifications of the structure-driven signalling patterns via sequence-determined divergence to specific functions, as well as emergence and potential diversification of allosteric regulation in multi-domain proteins and oligomeric assemblies. Our observations will be instrumental in facilitating the engineering and de novo design of proteins with allosterically regulated functions, including development of therapeutic biologics. In particular, results described here may guide the identification of the optimal structural platforms (e.g. fold type, size, and oligomerization states) and the types of diversifications/perturbations, such as mutations, effector binding, and order-disorder transition. The tunable allosteric linkage across distant regions can be used as a pivotal component in the design/engineering of modular biological systems beyond the traditional scaffolding function.
Affiliation(s)
- Wei-Ven Tee
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671
- Zhen Wah Tan
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671
- Enrico Guarnera
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, Singapore 117597
|
48
|
Ding W, Nakai K, Gong H. Protein design via deep learning. Brief Bioinform 2022; 23:6554124. [PMID: 35348602 PMCID: PMC9116377 DOI: 10.1093/bib/bbac102] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/16/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 12/11/2022]
Abstract
Proteins with desired functions and properties are important in fields like nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is believed as a key point for handling real social challenges. Recent introduction of deep learning into design methods exhibits a transformative influence and is expected to represent a promising and exciting future direction. In this review, we retrospect the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through noticeable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. The future perspectives on design goals, challenges and opportunities are also comprehensively discussed.
Affiliation(s)
- Wenze Ding
- School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China; School of Future Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China; MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Kenta Nakai
- Institute of Medical Science, the University of Tokyo, Tokyo 1088639, Japan
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China; Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
|
49
|
Khersonsky O, Fleishman SJ. What Have We Learned from Design of Function in Large Proteins? BioDesign Research 2022; 2022:9787581. [PMID: 37850148 PMCID: PMC10521758 DOI: 10.34133/2022/9787581] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/07/2022] [Accepted: 02/21/2022] [Indexed: 10/19/2023]
Abstract
The overarching goal of computational protein design is to gain complete control over protein structure and function. The majority of sophisticated binders and enzymes, however, are large and exhibit diverse and complex folds that defy atomistic design calculations. Encouragingly, recent strategies that combine evolutionary constraints from natural homologs with atomistic calculations have significantly improved design accuracy. In these approaches, evolutionary constraints mitigate the risk from misfolding and aggregation, focusing atomistic design calculations on a small but highly enriched sequence subspace. Such methods have dramatically optimized diverse proteins, including vaccine immunogens, enzymes for sustainable chemistry, and proteins with therapeutic potential. The new generation of deep learning-based ab initio structure predictors can be combined with these methods to extend the scope of protein design, in principle, to any natural protein of known sequence. We envision that protein engineering will come to rely on completely computational methods to efficiently discover and optimize biomolecular activities.
Affiliation(s)
- Olga Khersonsky
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J. Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
|
50
|
Tatta ER, Imchen M, Moopantakath J, Kumavath R. Bioprospecting of microbial enzymes: current trends in industry and healthcare. Appl Microbiol Biotechnol 2022; 106:1813-1835. [PMID: 35254498 DOI: 10.1007/s00253-022-11859-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 02/15/2022] [Revised: 02/15/2022] [Accepted: 02/26/2022] [Indexed: 12/13/2022]
Abstract
Microbial enzymes have an indispensable role in producing foods, pharmaceuticals, and other commercial goods. Many novel enzymes have been reported from all domains of life, such as plants, microbes, and animals. Nonetheless, industrially desirable enzymes of microbial origin are limited. This review article discusses the classifications, applications, sources, and challenges of most demanded industrial enzymes such as pectinases, cellulase, lipase, and protease. In addition, the production of novel enzymes through protein engineering technologies such as directed evolution, rational, and de novo design, for the improvement of existing industrial enzymes is also explored. We have also explored the role of metagenomics, nanotechnology, OMICs, and machine learning approaches in the bioprospecting of novel enzymes. Overall, this review covers the basics of biocatalysts in industrial and healthcare applications and provides an overview of existing microbial enzyme optimization tools. KEY POINTS: • Microbial bioactive molecules are vital for therapeutic and industrial applications. • High-throughput OMIC is the most proficient approach for novel enzyme discovery. • Comprehensive databases and efficient machine learning models are the need of the hour to fast forward de novo enzyme design and discovery.
Affiliation(s)
- Eswar Rao Tatta
- Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Tejaswini Hills, Periya (PO.), Kasaragod, Kerala, 671320, India
- Madangchanok Imchen
- Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Tejaswini Hills, Periya (PO.), Kasaragod, Kerala, 671320, India
- Jamseel Moopantakath
- Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Tejaswini Hills, Periya (PO.), Kasaragod, Kerala, 671320, India
- Ranjith Kumavath
- Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Tejaswini Hills, Periya (PO.), Kasaragod, Kerala, 671320, India
|