1
Min X, Liao Y, Chen X, Yang Q, Ying J, Zou J, Yang C, Zhang J, Ge S, Xia N. PB-GPT: An innovative GPT-based model for protein backbone generation. Structure 2024; 32:1820-1833.e5. [PMID: 39173620 DOI: 10.1016/j.str.2024.07.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 06/02/2024] [Accepted: 07/28/2024] [Indexed: 08/24/2024]
Abstract
With advanced computational methods, it is now feasible to modify or design proteins for specific functions, a process with significant implications for disease treatment and other medical applications. Protein structures and functions are intrinsically linked to their backbones, making the design of these backbones a pivotal aspect of protein engineering. In this study, we focus on the task of unconditionally generating protein backbones. By means of codebook quantization and compression dictionaries, we convert protein backbone structures into a distinctive coded language and propose a GPT-based protein backbone generation model, PB-GPT. To validate the generalization performance of the model, we trained and evaluated the model on both public datasets and small protein datasets. The results demonstrate that our model has the capability to unconditionally generate elaborate, highly realistic protein backbones with structural patterns resembling those of natural proteins, thus showcasing the significant potential of large language models in protein structure design.
Affiliation(s)
- Xiaoping Min
- School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Yiyang Liao
- School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Xiao Chen
- School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Qianli Yang
- Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Junjie Ying
- Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Jiajun Zou
- School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Chongzhou Yang
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Jun Zhang
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
- Shengxiang Ge
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China.
- Ningshao Xia
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China.
2
Lipsh-Sokolik R, Fleishman SJ. Addressing epistasis in the design of protein function. Proc Natl Acad Sci U S A 2024; 121:e2314999121. [PMID: 39133844 PMCID: PMC11348311 DOI: 10.1073/pnas.2314999121] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024] Open
Abstract
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
Affiliation(s)
- Rosalie Lipsh-Sokolik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
3
Toledo-Patiño S, Goetz SK, Shanmugaratnam S, Höcker B, Farías-Rico JA. Molecular handcraft of a well-folded protein chimera. FEBS Lett 2024; 598:1375-1386. [PMID: 38508768 DOI: 10.1002/1873-3468.14856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 02/11/2024] [Accepted: 02/12/2024] [Indexed: 03/22/2024]
Abstract
Modular assembly is a compelling pathway to create new proteins, a concept supported by protein engineering and millennia of evolution. Natural evolution provided a repository of building blocks, known as domains, which trace back to even shorter segments that underwent numerous 'copy-paste' processes culminating in the scaffolds we see today. Utilizing the subdomain-database Fuzzle, we constructed a fold-chimera by integrating a flavodoxin-like fragment into a periplasmic binding protein. This chimera is well-folded and a crystal structure reveals stable interfaces between the fragments. These findings demonstrate the adaptability of α/β-proteins and offer a stepping stone for optimization. By emphasizing the practicality of fragment databases, our work pioneers new pathways in protein engineering. Ultimately, the results substantiate the conjecture that periplasmic binding proteins originated from a flavodoxin-like ancestor.
Affiliation(s)
- Saacnicteh Toledo-Patiño
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Okinawa Institute of Science and Technology Graduate University, Japan
- Sooruban Shanmugaratnam
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Department of Biochemistry, University of Bayreuth, Germany
- Birte Höcker
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Department of Biochemistry, University of Bayreuth, Germany
- José Arcadio Farías-Rico
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
4
Zheng T, Zhang C. Engineering strategies and challenges of endolysin as an antibacterial agent against Gram-negative bacteria. Microb Biotechnol 2024; 17:e14465. [PMID: 38593316 PMCID: PMC11003714 DOI: 10.1111/1751-7915.14465] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/22/2024] [Revised: 03/09/2024] [Accepted: 03/21/2024] [Indexed: 04/11/2024] Open
Abstract
Bacteriophage endolysin is a novel antibacterial agent that has attracted much attention in the prevention and control of drug-resistant bacteria due to its unique mechanism of hydrolysing peptidoglycans. Although endolysin exhibits excellent bactericidal effects on Gram-positive bacteria, the presence of the outer membrane of Gram-negative bacteria makes it difficult to lyse them extracellularly, thus limiting their application field. To enhance the extracellular activity of endolysin and facilitate its crossing through the outer membrane of Gram-negative bacteria, researchers have adopted physical, chemical, and molecular methods. This review summarizes the characterization of endolysin targeting Gram-negative bacteria, strategies for endolysin modification, and the challenges and future of engineering endolysin against Gram-negative bacteria in clinical applications, to promote the application of endolysin in the prevention and control of Gram-negative bacteria.
Affiliation(s)
- Tianyu Zheng
- Bathurst Future Agri‐Tech Institute, Qingdao Agricultural University, Qingdao, China
- Can Zhang
- College of Veterinary Medicine, Qingdao Agricultural University, Qingdao, China
5
McGuinness KN, Fehon N, Feehan R, Miller M, Mutter AC, Rybak LA, Nam J, AbuSalim JE, Atkinson JT, Heidari H, Losada N, Kim JD, Koder RL, Lu Y, Silberg JJ, Slusky JSG, Falkowski PG, Nanda V. The energetics and evolution of oxidoreductases in deep time. Proteins 2024; 92:52-59. [PMID: 37596815 DOI: 10.1002/prot.26563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 07/06/2023] [Indexed: 08/20/2023]
Abstract
The core metabolic reactions of life drive electrons through a class of redox protein enzymes, the oxidoreductases. The energetics of electron flow is determined by the redox potentials of organic and inorganic cofactors as tuned by the protein environment. Understanding how protein structure affects oxidation-reduction energetics is crucial for studying metabolism, creating bioelectronic systems, and tracing the history of biological energy utilization on Earth. We constructed ProtReDox (https://protein-redox-potential.web.app), a manually curated database of experimentally determined redox potentials. With over 500 measurements, we can begin to identify how proteins modulate oxidation-reduction energetics across the tree of life. By mapping redox potentials onto networks of oxidoreductase fold evolution, we can infer the evolution of electron transfer energetics over deep time. ProtReDox is designed to include user-contributed submissions with the intention of making it a valuable resource for researchers in this field.
Affiliation(s)
- Kenneth N McGuinness
- Department of Natural Sciences, Caldwell University, Caldwell, New Jersey, USA
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- Nolan Fehon
- Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Ryan Feehan
- Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
- Michelle Miller
- Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Andrew C Mutter
- Department of Physics, The City College of New York, New York, New York, USA
- Laryssa A Rybak
- Department of Physics, The City College of New York, New York, New York, USA
- Justin Nam
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- Jenna E AbuSalim
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- Joshua T Atkinson
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Hirbod Heidari
- Department of Chemistry, University of Texas at Austin, Austin, Texas, USA
- Natalie Losada
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- J Dongun Kim
- Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Ronald L Koder
- Department of Physics, The City College of New York, New York, New York, USA
- Yi Lu
- Department of Chemistry, University of Texas at Austin, Austin, Texas, USA
- Jonathan J Silberg
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Joanna S G Slusky
- Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
- Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, USA
- Paul G Falkowski
- Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Department of Earth and Planetary Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Vikas Nanda
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey, USA
6
Michel F, Romero‐Romero S, Höcker B. Retracing the evolution of a modern periplasmic binding protein. Protein Sci 2023; 32:e4793. [PMID: 37788980 PMCID: PMC10601554 DOI: 10.1002/pro.4793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 09/20/2023] [Accepted: 09/22/2023] [Indexed: 10/05/2023]
Abstract
Investigating the evolution of structural features in modern multidomain proteins helps to understand their immense diversity and functional versatility. The class of periplasmic binding proteins (PBPs) offers an opportunity to interrogate one of the main processes driving diversification: the duplication and fusion of protein sequences to generate new architectures. The symmetry of their two-lobed topology, their mechanism of binding, and the organization of their operon structure led to the hypothesis that PBPs arose through a duplication and fusion event of a single common ancestor. To investigate this claim, we set out to reverse the evolutionary process and recreate the structural equivalent of a single-lobed progenitor using ribose-binding protein (RBP) as our model. We found that this modern PBP can be deconstructed into its lobes, producing two proteins that represent possible progenitor halves. The isolated halves of RBP are well folded and monomeric proteins, albeit with a lower thermostability, and do not retain the original binding function. However, the two entities readily form a heterodimer in vitro and in-cell. The x-ray structure of the heterodimer closely resembles the parental protein. Moreover, the binding function is fully regained upon formation of the heterodimer with a ligand affinity similar to that observed in the modern RBP. This highlights how a duplication event could have given rise to a stable and functional PBP-like fold and provides insights into how more complex functional structures can evolve from simpler molecular components.
Affiliation(s)
- Florian Michel
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
7
Cordes MHJ, Sundman AK, Fox HC, Binford GJ. Protein salvage and repurposing in evolution: Phospholipase D toxins are stabilized by a remodeled scrap of a membrane association domain. Protein Sci 2023; 32:e4701. [PMID: 37313620 PMCID: PMC10303701 DOI: 10.1002/pro.4701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 06/03/2023] [Accepted: 06/07/2023] [Indexed: 06/15/2023]
Abstract
The glycerophosphodiester phosphodiesterase (GDPD)-like SMaseD/PLD domain family, which includes phospholipase D (PLD) toxins in recluse spiders and actinobacteria, evolved anciently in bacteria from the GDPD. The PLD enzymes retained the core (β/α)8 barrel fold of GDPD, while gaining a signature C-terminal expansion motif and losing a small insertion domain. Using sequence alignments and phylogenetic analysis, we infer that the C-terminal motif derives from a segment of an ancient bacterial PLAT domain. Formally, part of a protein containing a PLAT domain repeat underwent fusion to the C terminus of a GDPD barrel, leading to attachment of a segment of a PLAT domain, followed by a second complete PLAT domain. The complete domain was retained only in some basal homologs, but the PLAT segment was conserved and repurposed as the expansion motif. The PLAT segment corresponds to strands β7-β8 of a β-sandwich, while the expansion motif as represented in spider PLD toxins has been remodeled as an α-helix, a β-strand, and an ordered loop. The GDPD-PLAT fusion led to two acquisitions in founding the GDPD-like SMaseD/PLD family: (1) a PLAT domain that presumably supported early lipase activity by mediating membrane association, and (2) an expansion motif that putatively stabilized the catalytic domain, possibly compensating for, or permitting, loss of the insertion domain. Of wider significance, messy domain shuffling events can leave behind scraps of domains that can be salvaged, remodeled, and repurposed.
Affiliation(s)
- Holden C. Fox
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, USA
8
Lipsh-Sokolik R, Khersonsky O, Schröder SP, de Boer C, Hoch SY, Davies GJ, Overkleeft HS, Fleishman SJ. Combinatorial assembly and design of enzymes. Science 2023; 379:195-201. [PMID: 36634164 DOI: 10.1126/science.ade9434] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Indexed: 01/13/2023]
Abstract
The design of structurally diverse enzymes is constrained by long-range interactions that are necessary for accurate folding. We introduce an atomistic and machine learning strategy for the combinatorial assembly and design of enzymes (CADENZ) to design fragments that combine with one another to generate diverse, low-energy structures with stable catalytic constellations. We applied CADENZ to endoxylanases and used activity-based protein profiling to recover thousands of structurally diverse enzymes. Functional designs exhibit high active-site preorganization and more stable and compact packing outside the active site. Implementing these lessons into CADENZ led to a 10-fold improved hit rate and more than 10,000 recovered enzymes. This design-test-learn loop can be applied, in principle, to any modular protein family, yielding huge diversity and general lessons on protein design principles.
Affiliation(s)
- R Lipsh-Sokolik
- Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001 Rehovot, Israel
- O Khersonsky
- Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001 Rehovot, Israel
- S P Schröder
- Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2300 RA Leiden, Netherlands
- C de Boer
- Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2300 RA Leiden, Netherlands
- S-Y Hoch
- Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001 Rehovot, Israel
- G J Davies
- York Structural Biology Laboratory, Department of Chemistry, The University of York, Heslington, York YO10 5DD, UK
- H S Overkleeft
- Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2300 RA Leiden, Netherlands
- S J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001 Rehovot, Israel
9
Abstract
Mechanisms of emergence and divergence of protein folds pose central questions in biological sciences. Incremental mutation and stepwise adaptation explain relationships between topologically similar protein folds. However, the universe of folds is diverse and riotous, suggesting more potent and creative forces are at play. Sequence and structure similarity are observed between distinct folds, indicating that proteins with distinct folds may share common ancestry. We found evidence of common ancestry between three distinct β-barrel folds: Src kinase family homology (SH3), oligonucleotide/oligosaccharide-binding (OB), and cradle loop barrel (CLB). The data suggest a mechanism of fold evolution that interconverts SH3, OB, and CLB. This mechanism, which we call creative destruction, can be generalized to explain many examples of fold evolution including circular permutation. In creative destruction, an open reading frame duplicates or otherwise merges with another to produce a fused polypeptide. A merger forces two ancestral domains into a new sequence and spatial context. The fused polypeptide can explore folding landscapes that are inaccessible to either of the independent ancestral domains. However, the folding landscapes of the fused polypeptide are not fully independent of those of the ancestral domains. Creative destruction is thus partially conservative; a daughter fold inherits some motifs from ancestral folds. After merger and refolding, adaptive processes such as mutation and loss of extraneous segments optimize the new daughter fold. This model has application in disease states characterized by genetic instability. Fused proteins observed in cancer cells are likely to experience remodeled folding landscapes and realize altered folds, conferring new or altered functions.
10
Insertions and deletions mediated functional divergence of Rossmann fold enzymes. Proc Natl Acad Sci U S A 2022; 119:e2207965119. [PMID: 36417431 PMCID: PMC9860332 DOI: 10.1073/pnas.2207965119] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/24/2022] Open
Abstract
Nucleobase-containing coenzymes are hypothesized to be relics of an early RNA-based world that preceded the emergence of proteins. Despite the importance of coenzyme-protein synergisms, their emergence and evolution remain understudied. An excellent target to address this issue is the Rossmann fold, the most catalytically diverse and abundant protein architecture in nature. We investigated two main Rossmann lineages: the nicotinamide adenine dinucleotide phosphate (NAD(P)) and the S-adenosyl methionine (SAM)-binding superfamilies. To identify the evolutionary changes that lead to a coenzyme specificity switch on these superfamilies, we performed structural and sequence-based Hidden Markov model analysis to systematically search for key motifs in their coenzyme-binding pockets. Our analyses revealed that through insertions and deletions (InDels) and a residue substitution, the ancient β1-loop-α1 coenzyme-binding structure of NAD(P) could be reshaped into the SAM-binding β1-loop-α1 structure. To experimentally prove this observation, we removed three amino acids from the NAD(P)-binding pocket and solved the structure of the resulting mutant, revealing the characteristic loop features of the SAM-binding pocket. To confirm the binding to SAM, we performed isothermal titration calorimetry measurements. Molecular dynamics simulations also corroborated the role of InDels in abolishing NAD binding and acquiring SAM binding. Our results uncovered how nature may have utilized insertions and deletions to optimize the different coenzyme-binding pockets and the distinct functionalities observed for Rossmann superfamilies. This work also proposes a general mechanism by which protein templates could have been recycled through the course of evolution to adopt different coenzymes and confer distinct chemistries.
11
Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun 2022; 13:4348. [PMID: 35896542 PMCID: PMC9329459 DOI: 10.1038/s41467-022-32007-7] [Citation(s) in RCA: 190] [Impact Index Per Article: 63.3] [Received: 04/01/2022] [Accepted: 07/13/2022] [Indexed: 11/29/2022] Open
Abstract
Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Recent progress in Transformer-based architectures has enabled the implementation of language models capable of generating text with human-like capabilities. Here, motivated by this success, we describe ProtGPT2, a language model trained on the protein space that generates de novo protein sequences following the principles of natural ones. The generated proteins display natural amino acid propensities, while disorder predictions indicate that 88% of ProtGPT2-generated proteins are globular, in line with natural sequences. Sensitive sequence searches in protein databases show that ProtGPT2 sequences are distantly related to natural ones, and similarity networks further demonstrate that ProtGPT2 is sampling unexplored regions of protein space. AlphaFold prediction of ProtGPT2-sequences yields well-folded non-idealized structures with embodiments and large loops and reveals topologies not captured in current structure databases. ProtGPT2 generates sequences in a matter of seconds and is freely available.
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany.
- Institute of Informatics and Applications, University of Girona, Girona, Spain.
- Steffen Schmidt
- Computational Biochemistry, University of Bayreuth, 95447, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
12
Jayaraman V, Toledo‐Patiño S, Noda‐García L, Laurino P. Mechanisms of protein evolution. Protein Sci 2022; 31:e4362. [PMID: 35762715 PMCID: PMC9214755 DOI: 10.1002/pro.4362] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/28/2022] [Revised: 05/11/2022] [Accepted: 05/14/2022] [Indexed: 11/06/2022]
Abstract
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. Then, we also comment on how phenotypic protein variability, including promiscuity, transcriptional and translational errors, may also accelerate this process, possibly via "plasticity-first" mechanisms. Finally, we highlight open questions in the field of protein evolution, with respect to the emergence of more sophisticated protein systems such as protein complexes, pathways, and the emergence of pre-LUCA enzymes.
Affiliation(s)
- Vijay Jayaraman
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Saacnicteh Toledo‐Patiño
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Lianet Noda‐García
- Department of Plant Pathology and Microbiology, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, Hebrew University of Jerusalem, Rehovot, Israel
- Paola Laurino
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
13
14
León-González JA, Flatet P, Juárez-Ramírez MS, Farías-Rico JA. Folding and Evolution of a Repeat Protein on the Ribosome. Front Mol Biosci 2022; 9:851038. [PMID: 35707224 PMCID: PMC9189291 DOI: 10.3389/fmolb.2022.851038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Accepted: 04/27/2022] [Indexed: 12/04/2022] Open
Abstract
Life on earth is the result of the work of proteins, the cellular nanomachines that fold into elaborated 3D structures to perform their functions. The ribosome synthesizes all the proteins of the biosphere, and many of them begin to fold during translation in a process known as cotranslational folding. In this work we discuss current advances of this field and provide computational and experimental data that highlight the role of ribosome in the evolution of protein structures. First, we used the sequence of the Ankyrin domain from the Drosophila Notch receptor to launch a deep sequence-based search. With this strategy, we found a conserved 33-residue motif shared by different protein folds. Then, to see how the vectorial addition of the motif would generate a full structure we measured the folding on the ribosome of the Ankyrin repeat protein. Not only the on-ribosome folding data is in full agreement with classical in vitro biophysical measurements but also it provides experimental evidence on how folded proteins could have evolved by duplication and fusion of smaller fragments in the RNA world. Overall, we discuss how the ribosomal exit tunnel could be conceptualized as an active site that is under evolutionary pressure to influence protein folding.
Affiliation(s)
- José Alberto León-González
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- Perline Flatet
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- María Soledad Juárez-Ramírez
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- José Arcadio Farías-Rico
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- *Correspondence: José Arcadio Farías-Rico,
15
Feng Q, Hou M, Liu J, Zhao K, Zhang G. Construct a variable-length fragment library for de novo protein structure prediction. Brief Bioinform 2022; 23:6547572. [PMID: 35284936 DOI: 10.1093/bib/bbac086] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/03/2022] [Revised: 02/10/2022] [Accepted: 02/20/2022] [Indexed: 11/12/2022] Open
Abstract
Although remarkable achievements, such as AlphaFold2, have been made in end-to-end structure prediction, fragment libraries remain essential for de novo protein structure prediction, which can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated by HHsuite, and the secondary structure of each protein was calculated by DSSP. For the query sequence, the HMM profile was first constructed. Then, variable-length fragments were retrieved from the master structure database through dynamic variable-length profile-profile comparison. A complete method for chopping the query HMM profile during this process was proposed to obtain fragments with increased diversity. Finally, secondary structure information was used to further screen the retrieved fragments to generate the final fragment library for the specific query sequence. The experimental results obtained with a set of 120 nonredundant proteins show that the global precision and coverage of the fragment library generated by VFlib were 55.04% and 94.95%, respectively, at an RMSD cutoff of 1.5 Å. Compared with the benchmark method NNMake, the global precision of our fragment library increased by 62.89% with equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly. Controlled experimental results demonstrate that the average TM-score of VFlib was 16.00% higher than that of NNMake.
Affiliation(s)
- Qiongqiong Feng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Minghua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
16
Longo LM, Kolodny R, McGlynn SE. Evidence for the emergence of β-trefoils by 'Peptide Budding' from an IgG-like β-sandwich. PLoS Comput Biol 2022; 18:e1009833. [PMID: 35157697 PMCID: PMC8880906 DOI: 10.1371/journal.pcbi.1009833] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2022] [Revised: 02/25/2022] [Accepted: 01/13/2022] [Indexed: 12/02/2022] Open
Abstract
As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both 'β-trefoil bridging themes' (evolutionarily-related sequence segments) and 'β-trefoil-like motifs' (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself: namely, that it is a derived fold formed by 'budding' from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the evolution of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature's sewing table.
Affiliation(s)
- Liam M. Longo
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
  - Blue Marble Space Institute of Science, Seattle, Washington, United States of America
- Rachel Kolodny
  - Department of Computer Science, University of Haifa, Haifa, Israel
- Shawn E. McGlynn
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
  - Blue Marble Space Institute of Science, Seattle, Washington, United States of America
17
Structural dynamics in the evolution of a bilobed protein scaffold. Proc Natl Acad Sci U S A 2021; 118:2026165118. [PMID: 34845009 PMCID: PMC8694067 DOI: 10.1073/pnas.2026165118] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 10/20/2021] [Indexed: 11/18/2022] Open
Abstract
Proteins conduct numerous complex biological functions by use of tailored structural dynamics. The molecular details of how these emerged from ancestral peptides remain mysterious. How does nature utilize the same repertoire of folds to diversify function? To shed light on this, we analyzed bilobed proteins with a common structural core, which is spread throughout the tree of life and is involved in diverse biological functions such as transcription, enzymatic catalysis, membrane transport, and signaling. We show here that the structural dynamics of the structural core differentiate predominantly via terminal additions during a long-period evolution. This diversifies substrate specificity and, ultimately, biological function.
Novel biophysical tools allow the structural dynamics of proteins and the regulation of such dynamics by binding partners to be explored in unprecedented detail. Although this has provided critical insights into protein function, the means by which structural dynamics direct protein evolution remain poorly understood. Here, we investigated how proteins with a bilobed structure, composed of two related domains from the periplasmic-binding protein–like II domain family, have undergone divergent evolution, leading to adaptation of their structural dynamics. We performed a structural analysis on ∼600 bilobed proteins with a common primordial structural core, which we complemented with biophysical studies to explore the structural dynamics of selected examples by single-molecule Förster resonance energy transfer and Hydrogen–Deuterium exchange mass spectrometry. We show that evolutionary modifications of the structural core, largely at its termini, enable distinct structural dynamics, allowing the diversification of these proteins into transcription factors, enzymes, and extracytoplasmic transport-related proteins. Structural embellishments of the core created interdomain interactions that stabilized structural states, reshaping the active site geometry, and ultimately altered substrate specificity. Our findings reveal an as-yet-unrecognized mechanism for the emergence of functional promiscuity during long periods of evolution and are applicable to a large number of domain architectures.
18
Pinto GP, Corbella M, Demkiv AO, Kamerlin SCL. Exploiting enzyme evolution for computational protein design. Trends Biochem Sci 2021; 47:375-389. [PMID: 34544655 DOI: 10.1016/j.tibs.2021.08.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/29/2021] [Revised: 08/18/2021] [Accepted: 08/24/2021] [Indexed: 11/15/2022]
Abstract
Recent years have seen an explosion of interest in understanding the physicochemical parameters that shape enzyme evolution, as well as substantial advances in computational enzyme design. This review discusses three areas where evolutionary information can be used as part of the design process: (i) using ancestral sequence reconstruction (ASR) to generate new starting points for enzyme design efforts; (ii) learning from how nature uses conformational dynamics in enzyme evolution to mimic this process in silico; and (iii) modular design of enzymes from smaller fragments, again mimicking the process by which nature appears to create new protein folds. Using showcase examples, we highlight the importance of incorporating evolutionary information to continue to push forward the boundaries of enzyme design studies.
Affiliation(s)
- Gaspar P Pinto
  - Department of Chemistry - BMC, Uppsala University, BMC Box 576, S-751 23 Uppsala, Sweden
- Marina Corbella
  - Department of Chemistry - BMC, Uppsala University, BMC Box 576, S-751 23 Uppsala, Sweden
- Andrey O Demkiv
  - Department of Chemistry - BMC, Uppsala University, BMC Box 576, S-751 23 Uppsala, Sweden
19
Ferruz N, Michel F, Lobos F, Schmidt S, Höcker B. Fuzzle 2.0: Ligand Binding in Natural Protein Building Blocks. Front Mol Biosci 2021; 8:715972. [PMID: 34485385 PMCID: PMC8416435 DOI: 10.3389/fmolb.2021.715972] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/27/2021] [Accepted: 08/06/2021] [Indexed: 11/13/2022] Open
Abstract
Modern proteins have been shown to share evolutionary relationships via subdomain-sized fragments. The assembly of such fragments through duplication and recombination events led to the complex structures and functions we observe today. We previously implemented a pipeline that identified more than 1,000 of these fragments that are shared by different protein folds and developed a web interface to analyze and search for them. This resource named Fuzzle helps structural and evolutionary biologists to identify and analyze conserved parts of a protein but it also provides protein engineers with building blocks for example to design proteins by fragment combination. Here, we describe a new version of this web resource that was extended to include ligand information. This addition is a significant asset to the database since now protein fragments that bind specific ligands can be identified and analyzed. Often the mode of ligand binding is conserved in proteins thereby supporting a common evolutionary origin. The same can now be explored for subdomain-sized fragments within this database. This ligand binding information can also be used in protein engineering to graft binding pockets into other protein scaffolds or to transfer functional sites via recombination of a specific fragment. Fuzzle 2.0 is freely available at https://fuzzle.uni-bayreuth.de/2.0.
Affiliation(s)
- Noelia Ferruz
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Florian Michel
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Francisco Lobos
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Steffen Schmidt
  - Computational Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
20
Ferruz N, Schmidt S, Höcker B. ProteinTools: a toolkit to analyze protein structures. Nucleic Acids Res 2021; 49:W559-W566. [PMID: 34019657 PMCID: PMC8262690 DOI: 10.1093/nar/gkab375] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 03/03/2021] [Revised: 04/11/2021] [Accepted: 04/26/2021] [Indexed: 01/06/2023] Open
Abstract
The experimental characterization and computational prediction of protein structures has become increasingly rapid and precise. However, the analysis of protein structures often requires researchers to use several software packages or web servers, which complicates matters. To provide long-established structural analyses in a modern, easy-to-use interface, we implemented ProteinTools, a web server toolkit for protein structure analysis. ProteinTools gathers four applications so far, namely the identification of hydrophobic clusters, hydrogen bond networks, salt bridges, and contact maps. In all cases, the input data is a PDB identifier or an uploaded structure, whereas the output is an interactive dynamic web interface. Thanks to the modular nature of ProteinTools, the addition of new applications will become an easy task. Given the current need to have these tools in a single, fast, and interpretable interface, we believe that ProteinTools will become an essential toolkit for the wider protein research community. The web server is available at https://proteintools.uni-bayreuth.de.
Affiliation(s)
- Noelia Ferruz
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
- Steffen Schmidt
  - Computational Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
21
Wu L, Qin L, Nie Y, Xu Y, Zhao YL. Computer-aided understanding and engineering of enzymatic selectivity. Biotechnol Adv 2021; 54:107793. [PMID: 34217814 DOI: 10.1016/j.biotechadv.2021.107793] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 02/03/2021] [Revised: 04/26/2021] [Accepted: 06/28/2021] [Indexed: 12/26/2022]
Abstract
Enzymes offering chemo-, regio-, and stereoselectivity enable the asymmetric synthesis of high-value chiral molecules. Unfortunately, the drawback that naturally occurring enzymes are often inefficient or have undesired selectivity toward non-native substrates hinders the broadening of biocatalytic applications. To match the demands of specific selectivity in asymmetric synthesis, biochemists have implemented various computer-aided strategies in understanding and engineering enzymatic selectivity, diversifying the available repository of artificial enzymes. Here, given that the entire asymmetric catalytic cycle, involving precise interactions within the active pocket and substrate transport in the enzyme channel, could affect the enzymatic efficiency and selectivity, we presented a comprehensive overview of the computer-aided workflow for enzymatic selectivity. This review includes a mechanistic understanding of enzymatic selectivity based on quantum mechanical calculations, rational design of enzymatic selectivity guided by enzyme-substrate interactions, and enzymatic selectivity regulation via enzyme channel engineering. Finally, we discussed the computational paradigm for designing enzyme selectivity in silico to facilitate the advancement of asymmetric biosynthesis.
Affiliation(s)
- Lunjie Wu
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Lei Qin
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yao Nie
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China; Suqian Industrial Technology Research Institute of Jiangnan University, Suqian 223814, China
- Yan Xu
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yi-Lei Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, MOE-LSB & MOE-LSC, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
22
Romero-Romero S, Kordes S, Michel F, Höcker B. Evolution, folding, and design of TIM barrels and related proteins. Curr Opin Struct Biol 2021; 68:94-104. [PMID: 33453500 PMCID: PMC8250049 DOI: 10.1016/j.sbi.2020.12.007] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 10/26/2020] [Revised: 12/13/2020] [Accepted: 12/14/2020] [Indexed: 12/16/2022]
Abstract
Proteins are chief actors in life that perform a myriad of exquisite functions. This diversity has been enabled through the evolution and diversification of protein folds. Analysis of sequences and structures strongly suggests that numerous protein pieces have been reused as building blocks and propagated to many modern folds. This information can be traced to understand how the protein world has diversified. In this review, we discuss the latest advances in the analysis of protein evolutionary units, and we use as a model system one of the most abundant and versatile topologies, the TIM-barrel fold, to highlight the existing common principles that interconnect protein evolution, structure, folding, function, and design.
Affiliation(s)
- Sina Kordes
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
- Florian Michel
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, 95447 Bayreuth, Germany
23
Kolodny R, Nepomnyachiy S, Tawfik DS, Ben-Tal N. Bridging Themes: Short Protein Segments Found in Different Architectures. Mol Biol Evol 2021; 38:2191-2208. [PMID: 33502503 PMCID: PMC8136508 DOI: 10.1093/molbev/msab017] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/20/2022] Open
Abstract
The vast majority of theoretically possible polypeptide chains do not fold, let alone confer function. Hence, protein evolution from preexisting building blocks has clear potential advantages over ab initio emergence from random sequences. In support of this view, sequence similarity between different proteins is generally indicative of common ancestry, and we collectively refer to such homologous sequences as "themes." At the domain level, sequence homology is routinely detected. However, short themes, which are segments or fragments of intact domains, are particularly interesting because they may provide hints about the emergence of domains, as opposed to divergence of preexisting domains, or their mixing-and-matching to form multi-domain proteins. Here we identified 525 representative short themes, comprising 20-80 residues, that are unexpectedly shared between domains considered to have emerged independently. Among these "bridging themes" are ones shared between the most ancient domains, for example, Rossmann, P-loop NTPase, TIM-barrel, flavodoxin, and ferredoxin-like. We elaborate on several particularly interesting cases, where the bridging themes mediate ligand binding. Ligand binding may have contributed to the stability and the plasticity of these building blocks, and to their ability to invade preexisting domains or serve as starting points for completely new domains.
Affiliation(s)
- Rachel Kolodny
  - Department of Computer Science, University of Haifa, Haifa, Israel
- Dan S Tawfik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Nir Ben-Tal
  - George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
24
Heizinger L, Merkl R. Evidence for the preferential reuse of sub-domain motifs in primordial protein folds. Proteins 2021; 89:1167-1179. [PMID: 33957009 DOI: 10.1002/prot.26089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2021] [Revised: 04/15/2021] [Accepted: 04/28/2021] [Indexed: 11/06/2022]
Abstract
A comparison of protein backbones makes clear that not more than approximately 1400 different folds exist, each specifying the three-dimensional topology of a protein domain. Large proteins are composed of specific domain combinations and many domains can accommodate different functions. These findings confirm that the reuse of domains is key for the evolution of multi-domain proteins. If reuse was also the driving force for domain evolution, ancestral fragments of sub-domain size exist that are shared between domains possessing significantly different topologies. For the fully automated detection of putatively ancestral motifs, we developed the algorithm Fragstatt that compares proteins pairwise to identify fragments, that is, instantiations of the same motif. To reach maximal sensitivity, Fragstatt compares sequences by means of cascaded alignments of profile Hidden Markov Models. If the fragment sequences are sufficiently similar, the program determines and scores the structural concordance of the fragments. By analyzing a comprehensive set of proteins from the CATH database, Fragstatt identified 12 532 partially overlapping and structurally similar motifs that clustered to 134 unique motifs. The dissemination of these motifs is limited: We found only two domain topologies that contain two different motifs and generally, these motifs occur in not more than 18% of the CATH topologies. Interestingly, motifs are enriched in topologies that are considered ancestral. Thus, our findings suggest that the reuse of sub-domain sized fragments was relevant in early phases of protein evolution and became less important later on.
Affiliation(s)
- Leonhard Heizinger
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Rainer Merkl
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
25
Ferruz N, Noske J, Höcker B. Protlego: A Python package for the analysis and design of chimeric proteins. Bioinformatics 2021; 37:3182-3189. [PMID: 33901273 PMCID: PMC8504633 DOI: 10.1093/bioinformatics/btab253] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/25/2020] [Revised: 03/05/2021] [Accepted: 04/19/2021] [Indexed: 01/03/2023] Open
Abstract
Motivation: Duplication and recombination of protein fragments have led to the highly diverse protein space that we observe today. By mimicking this natural process, the design of protein chimeras via fragment recombination has proven experimentally successful and has opened a new era for the design of customizable proteins. The in silico building of structural models for these chimeric proteins, however, remains a manual task that requires a considerable degree of expertise and is not amenable for high-throughput studies. Energetic and structural analysis of the designed proteins often require the use of several tools, each with their unique technical difficulties and available in different programming languages or web servers.
Results: We implemented a Python package that enables automated, high-throughput design of chimeras and their structural analysis. First, it fetches evolutionarily conserved fragments from a built-in database (also available at fuzzle.uni-bayreuth.de). These relationships can then be represented via networks or further selected for chimera construction via recombination. Designed chimeras or natural proteins are then scored and minimized with the Charmm and Amber forcefields and their diverse structural features can be analyzed at ease. Here, we showcase Protlego’s pipeline by exploring the relationships between the P-loop and Rossmann superfolds, building and characterizing their offspring chimeras. We believe that Protlego provides a powerful new tool for the protein design community.
Availability and implementation: Protlego runs on the Linux platform and is freely available at https://hoecker-lab.github.io/protlego/ with tutorials and documentation.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Noelia Ferruz
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Jakob Noske
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
  - Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
26
Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021; 68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]
Abstract
Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.
27
Mylemans B, Voet AR, Tame JR. The Taming of the Screw: the natural and artificial development of β-propeller proteins. Curr Opin Struct Biol 2020; 68:48-54. [PMID: 33373773 DOI: 10.1016/j.sbi.2020.11.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/01/2020] [Revised: 11/09/2020] [Accepted: 11/27/2020] [Indexed: 12/17/2022]
Abstract
Many proteins are found to possess repeated structural elements, which hint at ancient evolutionary origins and ongoing evolutionary processes. β-propeller proteins are a large family of such proteins, and a popular focus of structural analysis. This review highlights recent work to understand how they arose, and how they have developed into one of the most successful of all protein folds.
Affiliation(s)
- Bram Mylemans
  - Laboratory for biomolecular modelling and design, KU Leuven, Celestijnenlaan 200G, 3001 Leuven, Belgium
- Arnout Rd Voet
  - Protein Design Laboratory, Graduate School of Medical Life Science, Yokohama City University, Suehiro 1-7-29, Tsurumi, Yokohama 230-0045, Japan
- Jeremy Rh Tame
  - Protein Design Laboratory, Graduate School of Medical Life Science, Yokohama City University, Suehiro 1-7-29, Tsurumi, Yokohama 230-0045, Japan
28
Lipsh-Sokolik R, Listov D, Fleishman SJ. The AbDesign computational pipeline for modular backbone assembly and design of binders and enzymes. Protein Sci 2020; 30:151-159. [PMID: 33040418 PMCID: PMC7737780 DOI: 10.1002/pro.3970] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/01/2020] [Revised: 10/07/2020] [Accepted: 10/09/2020] [Indexed: 12/12/2022]
Abstract
The functional sites of many protein families are dominated by diverse backbone regions that lack secondary structure (loops) but fold stably into their functionally competent state. Nevertheless, the design of structured loop regions from scratch, especially in functional sites, has met with great difficulty. We therefore developed an approach, called AbDesign, to exploit the natural modularity of many protein families and computationally assemble a large number of new backbones by combining naturally occurring modular fragments. This strategy yielded large, atomically accurate, and highly efficient proteins, including antibodies and enzymes exhibiting dozens of mutations from any natural protein. The combinatorial backbone-conformation space that can be accessed by AbDesign even for a modestly sized family of homologs may exceed the diversity in the entire PDB, providing the sub-Ångstrom level of control over the positioning of active-site groups that is necessary for obtaining highly active proteins. This manuscript describes how to implement the pipeline using code that is freely available at https://github.com/Fleishman-Lab/AbDesign_for_enzymes.
Affiliation(s)
- Dina Listov
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Sarel J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel