1. Blanchard PL, Knick BJ, Whelan SA, Hackel BJ. Hyperstable Synthetic Mini-Proteins as Effective Ligand Scaffolds. ACS Synth Biol 2023; 12:3608-3622. [PMID: 38010428 PMCID: PMC10822706 DOI: 10.1021/acssynbio.3c00409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Small, single-domain protein scaffolds are compelling sources of molecular binding ligands with the potential for efficient physiological transport, modularity, and manufacturing. Yet, mini-proteins require a balance between biophysical robustness and diversity to enable new functions. We tested the developability and evolvability of millions of variants of 43 designed libraries of synthetic 40-amino acid βαββ proteins with diversified sheet, loop, or helix paratopes. We discovered a scaffold library that yielded hundreds of binders to seven targets while exhibiting high stability and soluble expression. Binder discovery yielded 6-122 nM affinities without affinity maturation and Tms averaging ≥78 °C. Broader βαββ libraries exhibited varied developability and evolvability. Sheet paratopes were the most consistently developable, and framework 1 was the most evolvable. Paratope evolvability was dependent on target, though several libraries were evolvable across many targets while exhibiting high stability and soluble expression. Select βαββ proteins are strong starting points for engineering performant binders.
Affiliation(s)
- Paul L. Blanchard
  - Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455
- Brandon J. Knick
  - Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455
- Sarah A. Whelan
  - Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455
- Benjamin J. Hackel
  - Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455
2. McConnell A, Batten SL, Hackel BJ. Determinants of Developability and Evolvability of Synthetic Miniproteins as Ligand Scaffolds. J Mol Biol 2023; 435:168339. [PMID: 37923119 PMCID: PMC10872777 DOI: 10.1016/j.jmb.2023.168339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/23/2023] [Accepted: 10/28/2023] [Indexed: 11/07/2023]
Abstract
Binding ligands empower molecular therapeutics and diagnostics. Despite an array of protein scaffolds engineered for binding, the biophysical elements that drive developability and evolvability are not fully understood. In particular, engineering novel function while maintaining biophysical integrity within the context of small, single-domain proteins is challenged by integration of the structural framework and the evolved binding site. Miniproteins present a challenge to our limits of protein engineering capability and provide advantages in physiological targeting, modularity for multi-functional constructs, and unique binding modes. Herein, we evaluate the ability of hyperstable synthetic miniproteins, originally designed for foldedness, to function as binding scaffolds. We synthesized 45 combinatorial libraries, with 10⁹ variants, systematically varied across two topologies, each with five starting frameworks and four or five diverse, structurally distinct paratopes, to elucidate their impact on evolvability and developability. We evaluated evolvability with yeast display binding selections against four targets. High-throughput assays (stability via yeast display and soluble expression via split-GFP in E. coli) measured developability. The comprehensive, robust dataset demonstrates how protein topology, parental framework, and paratope structure and location all impact scaffold performance. A hyperstable framework and localized diversity are not sufficient for an effective scaffold, but several designs of these elements within synthetic miniproteins designed solely for stability result in scaffold libraries with effective evolvability and developability. Engineered variants were well-folded, thermally stable, and bound target with single-digit nanomolar affinity. Thus, hyperstable synthetic miniproteins can serve as precursors to developable, evolvable mini-scaffolds with unique potential for physiological transport, modularity, and binding modes.
Affiliation(s)
- Adam McConnell
  - Department of Biomedical Engineering, University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States
- Sun Li Batten
  - Department of Chemical Engineering and Materials Science, University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States
- Benjamin J Hackel
  - Department of Biomedical Engineering, University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States; Department of Chemical Engineering and Materials Science, University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
3. Mardikoraem M, Wang Z, Pascual N, Woldring D. Generative models for protein sequence modeling: recent advances and future directions. Brief Bioinform 2023; 24:bbad358. [PMID: 37864295 PMCID: PMC10589401 DOI: 10.1093/bib/bbad358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 09/08/2023] [Accepted: 09/12/2023] [Indexed: 10/22/2023]
Abstract
The widespread adoption of high-throughput omics technologies has exponentially increased the amount of protein sequence data involved in many salient disease pathways and their respective therapeutics and diagnostics. Despite the availability of large-scale sequence data, the lack of experimental fitness annotations underpins the need for self-supervised and unsupervised machine learning (ML) methods. These techniques leverage the meaningful features encoded in abundant unlabeled sequences to accomplish complex protein engineering tasks. Proficiency in the rapidly evolving fields of protein engineering and generative AI is required to realize the full potential of ML models as a tool for protein fitness landscape navigation. Here, we support this work by (i) providing an overview of the architecture and mathematical details of the most successful ML models applicable to sequence data (e.g. variational autoencoders, autoregressive models, generative adversarial neural networks, and diffusion models), (ii) guiding how to effectively implement these models on protein sequence data to predict fitness or generate high-fitness sequences and (iii) highlighting several successful studies that implement these techniques in protein engineering (from paratope regions and subcellular localization prediction to high-fitness sequences and protein design rules generation). By providing a comprehensive survey of model details, novel architecture developments, comparisons of model applications, and current challenges, this study intends to provide structured guidance and robust framework for delivering a prospective outlook in the ML-driven protein engineering field.
Affiliation(s)
- Mehrsa Mardikoraem
  - Michigan State University (MSU)’s Department of Chemical Engineering and Materials Science
- Zirui Wang
  - Regeneron Pharmaceuticals, Inc. Having received his B.S. in Chemical Engineering from MSU, he is currently pursuing an M.S. in Computer Science from Syracuse University
- Daniel Woldring
  - MSU’s Department of Chemical Engineering and Materials Science and a member of MSU’s Institute for Quantitative Health Sciences and Engineering
4. Zhang C, Wu X, Song F, Liu S, Yu S, Zhou J. Core-Shell Droplet-Based Microfluidic Screening System for Filamentous Fungi. ACS Sens 2023; 8:3468-3477. [PMID: 37603446 DOI: 10.1021/acssensors.3c01018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2023]
Abstract
Filamentous fungi are competitive hosts for the production of drugs, proteins, and chemicals. However, their utility is limited by screening methods and low throughput. In this work, a universal high-throughput system for optimizing protein production in filamentous fungi was described. Droplet microfluidics was used to encapsulate large mutant strain pools in biocompatible core-shell microdroplets designed to avoid mycelial punctures and thus sustain prolonged culture. The self-assembled split GFP was then used to characterize the secretory capacity of the strains and isolate strains with superior production titers according to the fluorescence signals. The platform was applied to optimize the α-amylase secretion of Aspergillus niger, resulting in the isolation of a strain with 2.02-fold higher secretion capacity. The system allows the analysis of >10⁵ single cells per h and will facilitate ultrahigh-throughput screening experiments of filamentous fungi. This method could help identify improved hosts for the large-scale production of biotechnology-relevant proteins. This is a broadly applicable system that can be equally used in other hosts.
Affiliation(s)
- Changtai Zhang
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Xiaohui Wu
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Fuqiang Song
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Song Liu
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Shiqin Yu
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jingwen Zhou
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
5. Golinski AW, Schmitz ZD, Nielsen GH, Johnson B, Saha D, Appiah S, Hackel BJ, Martiniani S. Predicting and Interpreting Protein Developability Via Transfer of Convolutional Sequence Representation. ACS Synth Biol 2023; 12:2600-2615. [PMID: 37642646 PMCID: PMC10829850 DOI: 10.1021/acssynbio.3c00196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Engineered proteins have emerged as novel diagnostics, therapeutics, and catalysts. Often, poor protein developability (quantified by expression, solubility, and stability) hinders utility. The ability to predict protein developability from amino acid sequence would reduce the experimental burden when selecting candidates. Recent advances in screening technologies enabled a high-throughput (HT) developability dataset for 10⁵ of 10²⁰ possible variants of protein ligand scaffold Gp2. In this work, we evaluate the ability of neural networks to learn a developability representation from a HT dataset and transfer this knowledge to predict recombinant expression beyond observed sequences. The model convolves learned amino acid properties to predict expression levels 44% closer to the experimental variance compared to a non-embedded control. Analysis of learned amino acid embeddings highlights the uniqueness of cysteine, the importance of hydrophobicity and charge, and the unimportance of aromaticity, when aiming to improve the developability of small proteins. We identify clusters of similar sequences with increased recombinant expression through nonlinear dimensionality reduction and we explore the inferred expression landscape via nested sampling. The analysis enables the first direct visualization of the fitness landscape and highlights the existence of evolutionary bottlenecks in sequence space giving rise to competing subpopulations of sequences with different developability. The work advances applied protein engineering efforts by predicting and interpreting protein scaffold expression from a limited dataset. Furthermore, our statistical mechanical treatment of the problem advances foundational efforts to characterize the structure of the protein fitness landscape and the amino acid characteristics that influence protein developability.
Affiliation(s)
- Alexander W. Golinski
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Zachary D. Schmitz
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Gregory H. Nielsen
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Bryce Johnson
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Diya Saha
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Sandhya Appiah
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Benjamin J. Hackel
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
- Stefano Martiniani
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455
  - Center for Soft Matter Research, Department of Physics, New York University, New York, NY 10003
  - Simons Center for Computational Physical Chemistry, Departments of Chemistry, New York University, New York, NY 10003
  - Courant Institute of Mathematical Sciences, New York University, New York, NY 10003
6. McConnell A, Hackel BJ. Protein engineering via sequence-performance mapping. Cell Syst 2023; 14:656-666. [PMID: 37494931 PMCID: PMC10527434 DOI: 10.1016/j.cels.2023.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 05/10/2023] [Accepted: 06/21/2023] [Indexed: 07/28/2023]
Abstract
Discovery and evolution of new and improved proteins has empowered molecular therapeutics, diagnostics, and industrial biotechnology. Discovery and evolution both require efficient screens and effective libraries, although they differ in their challenges because of the absence or presence, respectively, of an initial protein variant with the desired function. A host of high-throughput technologies (experimental and computational) enable efficient screens to identify performant protein variants. In partnership, an informed search of sequence space is needed to overcome the immensity, sparsity, and complexity of the sequence-performance landscape. Early in the historical trajectory of protein engineering, these elements aligned with distinct approaches to identify the most performant sequence: selection from large, randomized combinatorial libraries versus rational computational design. Substantial advances have now emerged from the synergy of these perspectives. Rational design of combinatorial libraries aids the experimental search of sequence space, and high-throughput, high-integrity experimental data inform computational design. At the core of the collaborative interface, efficient protein characterization (rather than mere selection of optimal variants) maps sequence-performance landscapes. Such quantitative maps elucidate the complex relationships between protein sequence and performance (e.g., binding, catalytic efficiency, biological activity, and developability), thereby advancing fundamental protein science and facilitating protein discovery and evolution.
Affiliation(s)
- Adam McConnell
  - Department of Biomedical Engineering, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA
- Benjamin J Hackel
  - Department of Biomedical Engineering, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA; Department of Chemical Engineering and Materials Science, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA.
7. Mardikoraem M, Woldring D. Protein Fitness Prediction Is Impacted by the Interplay of Language Models, Ensemble Learning, and Sampling Methods. Pharmaceutics 2023; 15:1337. [PMID: 37242577 PMCID: PMC10224321 DOI: 10.3390/pharmaceutics15051337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 04/19/2023] [Accepted: 04/21/2023] [Indexed: 05/28/2023]
Abstract
Advances in machine learning (ML) and the availability of protein sequences via high-throughput sequencing techniques have transformed the ability to design novel diagnostic and therapeutic proteins. ML allows protein engineers to capture complex trends hidden within protein sequences that would otherwise be difficult to identify in the context of the immense and rugged protein fitness landscape. Despite this potential, there persists a need for guidance during the training and evaluation of ML methods over sequencing data. Two key challenges for training discriminative models and evaluating their performance include handling severely imbalanced datasets (e.g., few high-fitness proteins among an abundance of non-functional proteins) and selecting appropriate protein sequence representations (numerical encodings). Here, we present a framework for applying ML over assay-labeled datasets to elucidate the capacity of sampling techniques and protein encoding methods to improve binding affinity and thermal stability prediction tasks. For protein sequence representations, we incorporate two widely used methods (One-Hot encoding and physiochemical encoding) and two language-based methods (next-token prediction, UniRep; masked-token prediction, ESM). Elaboration on performance is provided over protein fitness, protein size, and sampling techniques. In addition, an ensemble of protein representation methods is generated to discover the contribution of distinct representations and improve the final prediction score. We then implement multiple criteria decision analysis (MCDA; TOPSIS with entropy weighting), using multiple metrics well-suited for imbalanced data, to ensure statistical rigor in ranking our methods. Within the context of these datasets, the synthetic minority oversampling technique (SMOTE) outperformed undersampling while encoding sequences with One-Hot, UniRep, and ESM representations. 
Moreover, ensemble learning increased the predictive performance of the affinity-based dataset by 4% compared to the best single-encoding candidate (F1-score = 97%), while ESM alone was rigorous enough in stability prediction (F1-score = 92%).
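As an illustration of the One-Hot sequence representation named in this abstract, the following is a minimal sketch, not the authors' implementation. The alphabet ordering, the function name `one_hot_encode`, and the example sequence are assumptions made for illustration only.

```python
# Assumed alphabet: the 20 canonical amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(sequence):
    """Return one 20-element 0/1 row per residue (hypothetical helper)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[index[residue]] = 1  # raises KeyError for non-canonical residues
        encoded.append(row)
    return encoded


# Hypothetical three-residue sequence, used only to show the output shape.
encoding = one_hot_encode("ACD")
```

Each residue becomes a vector with a single 1, so a sequence of length L yields an L × 20 matrix that can be flattened before being passed to a classifier.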
Affiliation(s)
- Mehrsa Mardikoraem
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Daniel Woldring
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
8. Lopez-Morales J, Vanella R, Kovacevic G, Santos MS, Nash MA. Titrating Avidity of Yeast-Displayed Proteins Using a Transcriptional Regulator. ACS Synth Biol 2023; 12:419-431. [PMID: 36728831 PMCID: PMC9942200 DOI: 10.1021/acssynbio.2c00351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
Yeast surface display is a valuable tool for protein engineering and directed evolution; however, significant variability in the copy number (i.e., avidity) of displayed variants on the yeast cell wall complicates screening and selection campaigns. Here, we report an engineered titratable display platform that modulates the avidity of Aga2-fusion proteins on the yeast cell wall dependent on the concentration of the anhydrotetracycline (aTc) inducer. Our design is based on a genomic Aga1 gene copy and an episomal Aga2-fusion construct both under the control of an aTc-dependent transcriptional regulator that enables stoichiometric and titratable expression, secretion, and display of Aga2-fusion proteins. We demonstrate tunable display levels over 2-3 orders of magnitude for various model proteins, including glucose oxidase enzyme variants, mechanostable dockerin-binding domains, and anti-PDL1 affibody domains. By regulating the copy number of displayed proteins, we demonstrate the effects of titratable avidity levels on several specific phenotypic activities, including enzyme activity and cell adhesion to surfaces under shear flow. Finally, we show that titrating down the display level allows yeast-based binding affinity measurements to be performed in a regime that avoids ligand depletion effects while maintaining small sample volumes, avoiding a well-known artifact in yeast-based binding assays. The ability to titrate the multivalency of proteins on the yeast cell wall through simple inducer control will benefit protein engineering and directed evolution methodology relying on yeast display for broad classes of therapeutic and diagnostic proteins of interest.
Affiliation(s)
- Joanan Lopez-Morales
  - Department of Chemistry, University of Basel, Basel 4058, Switzerland
  - Swiss Nanoscience Institute, University of Basel, Basel 4056, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Rosario Vanella
  - Department of Chemistry, University of Basel, Basel 4058, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Gordana Kovacevic
  - Department of Chemistry, University of Basel, Basel 4058, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Mariana Sá Santos
  - Department of Chemistry, University of Basel, Basel 4058, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Michael A. Nash
  - Department of Chemistry, University of Basel, Basel 4058, Switzerland
  - Swiss Nanoscience Institute, University of Basel, Basel 4056, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
9. Tresnak DT, Hackel BJ. Deep Antimicrobial Activity and Stability Analysis Inform Lysin Sequence-Function Mapping. ACS Synth Biol 2023; 12:249-264. [PMID: 36599162 PMCID: PMC10822705 DOI: 10.1021/acssynbio.2c00509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
Antibiotic-resistant infectious disease is a critical challenge to human health. Antimicrobial proteins offer a compelling solution if engineered for potency, selectivity, and physiological stability. Lysins, which lyse cells via degradation of cell wall peptidoglycans, have significant potential to fill this role. Yet, the functional complexity of antimicrobial activity has hindered high-throughput characterization for discovery and design. To dramatically expand knowledge of the sequence-function landscape of lysins, we developed a depletion-based assay for library-scale measurement of lysin inhibitory activity. We coupled this platform with a high-throughput proteolytic stability assay to assess the activity and stability of ∼5 × 10⁴ lysin catalytic domain variants, resulting in the discovery of a variant with increased activity (70 ± 20%) and stability (7.2 ± 0.4 °C increased midpoint of thermal denaturation). Ridge regression of the resulting data set demonstrated that libraries with a higher average Hamming distance better informed pairwise models and that coupling activity and stability assays enabled better prediction of catalytically active lysins. The best models achieved Pearson's correlation coefficients of 0.87 ± 0.01 and 0.61 ± 0.04 for predicting catalytic domain stability and activity, respectively. Our work provides an efficient strategy for constructing protein sequence-function landscapes, drastically increases screening throughput for engineering lysins, and yields promising lysins for further development.
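The abstract refers to the average Hamming distance of a library. A minimal sketch of one common way to compute such a quantity (the mean over all pairs of equal-length sequences) is given below. The abstract does not state how the authors defined or computed the average, so the pairwise definition, the function names, and the toy sequences are assumptions.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def average_pairwise_hamming(sequences):
    """Mean Hamming distance over all unordered pairs (assumed definition)."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy library, not data from the study.
library = ["ACDE", "ACDF", "GCDF"]
avg = average_pairwise_hamming(library)
```

For real libraries of tens of thousands of variants, the all-pairs loop grows quadratically, so sampling a random subset of pairs would be a reasonable approximation.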
Affiliation(s)
- Daniel T. Tresnak
  - Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455
- Benjamin J. Hackel
  - Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455
10. Norrild RK, Johansson KE, O’Shea C, Morth JP, Lindorff-Larsen K, Winther JR. Increasing protein stability by inferring substitution effects from high-throughput experiments. Cell Reports Methods 2022; 2:100333. [PMID: 36452862 PMCID: PMC9701609 DOI: 10.1016/j.crmeth.2022.100333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 06/22/2022] [Accepted: 10/19/2022] [Indexed: 06/17/2023]
Abstract
We apply a computational model, global multi-mutant analysis (GMMA), to inform on effects of most amino acid substitutions from a randomly mutated gene library. Using a high mutation frequency, the method can determine mutations that increase the stability of even very stable proteins for which conventional selection systems have reached their limit. As a demonstration of this, we screened a mutant library of a highly stable and computationally redesigned model protein using an in vivo genetic sensor for folding and assigned a stability effect to 374 of 912 possible single amino acid substitutions. Combining the top 9 substitutions increased the unfolding energy 47 to 69 kJ/mol in a single engineering step. Crystal structures of stabilized variants showed small perturbations in helices 1 and 2, which rendered them closer in structure to the redesign template. This case study illustrates the capability of the method, which is applicable to any screen for protein function.
Affiliation(s)
- Rasmus Krogh Norrild
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Kristoffer Enøe Johansson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Charlotte O’Shea
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Jens Preben Morth
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Jakob Rahr Winther
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
11. Ahmed S, Manjunath K, Chattopadhyay G, Varadarajan R. Identification of stabilizing point mutations through mutagenesis of destabilized protein libraries. J Biol Chem 2022; 298:101785. [PMID: 35247389 PMCID: PMC8971944 DOI: 10.1016/j.jbc.2022.101785] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/08/2021] [Revised: 02/18/2022] [Accepted: 02/26/2022] [Indexed: 01/22/2023]
Abstract
Although there have been recent transformative advances in the area of protein structure prediction, prediction of point mutations that improve protein stability remains challenging. It is possible to construct and screen large mutant libraries for improved activity or ligand binding. However, reliable screens for mutants that improve protein stability do not yet exist, especially for proteins that are well folded and relatively stable. Here, we demonstrate that incorporation of a single, specific, destabilizing mutation termed parent inactivating mutation into each member of a single-site saturation mutagenesis library, followed by screening for suppressors, allows for robust and accurate identification of stabilizing mutations. We carried out fluorescence-activated cell sorting of such a yeast surface display, saturation suppressor library of the bacterial toxin CcdB, followed by deep sequencing of sorted populations. We found that multiple stabilizing mutations could be identified after a single round of sorting. In addition, multiple libraries with different parent inactivating mutations could be pooled and simultaneously screened to further enhance the accuracy of identification of stabilizing mutations. Finally, we show that individual stabilizing mutations could be combined to result in a multi-mutant that demonstrated an increase in thermal melting temperature of about 20 °C, and that displayed enhanced tolerance to high temperature exposure. We conclude that as this method is robust and employs small library sizes, it can be readily extended to other display and screening formats to rapidly isolate stabilized protein mutants.
Affiliation(s)
- Shahbaz Ahmed
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Kavyashree Manjunath
- Centre for Chemical Biology and Therapeutics, Institute of Stem Cell Science and Regenerative Medicine, Bangalore, India
12
McLure RJ, Radford SE, Brockwell DJ. High-throughput directed evolution: a golden era for protein science. Trends in Chemistry 2022. [DOI: 10.1016/j.trechm.2022.02.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
13
Mardikoraem M, Woldring D. Machine Learning-driven Protein Library Design: A Path Toward Smarter Libraries. Methods Mol Biol 2022; 2491:87-104. [PMID: 35482186 DOI: 10.1007/978-1-0716-2285-8_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Proteins are small yet valuable biomolecules that play a versatile role in therapeutics and diagnostics. The intricate sequence-structure-function paradigm in the realm of proteins opens the possibility for directly mapping amino acid sequence to function. However, the rugged nature of the protein fitness landscape and an astronomical number of possible mutations even for small proteins make navigating this system a daunting task. Moreover, the scarcity of functional proteins and the ease with which deleterious mutations are introduced, due to complex epistatic relationships, compound the existing challenges. This highlights the need for auxiliary tools in current techniques such as rational design and directed evolution. To that end, the state-of-the-art machine learning can offer time and cost efficiency in finding high fitness proteins, circumventing unnecessary wet-lab experiments. In the context of improving library design, machine learning provides valuable insights via its unique features such as high adaptation to complex systems, multi-tasking, and parallelism, and the ability to capture hidden trends in input data. Finally, both the advancements in computational resources and the rapidly increasing number of sequences in protein databases will allow more promising and detailed insights delivered from machine learning to protein library design. In this chapter, fundamental concepts and a method for machine learning-driven library design leveraging deep sequencing datasets will be discussed. We elaborate on (1) basic knowledge about machine learning algorithms, (2) the benefit of machine learning in library design, and (3) methodology for implementing machine learning in library design.
Affiliation(s)
- Mehrsa Mardikoraem
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- Daniel Woldring
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
14
DeJong MP, Ritter SC, Fransen KA, Tresnak DT, Golinski AW, Hackel BJ. A Platform for Deep Sequence-Activity Mapping and Engineering Antimicrobial Peptides. ACS Synth Biol 2021; 10:2689-2704. [PMID: 34506711 DOI: 10.1021/acssynbio.1c00314] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
Abstract
Developing potent antimicrobials, and platforms for their study and engineering, is critical as antibiotic resistance grows. A high-throughput method to quantify antimicrobial peptide and protein (AMP) activity across a broad continuum would be powerful to elucidate sequence-activity landscapes and identify potent mutants. Yet the complexity of antimicrobial activity has largely constrained the scope and mechanistic bandwidth of AMP variant analysis. We developed a platform to efficiently perform sequence-activity mapping of AMPs via depletion (SAMP-Dep): a bacterial host culture is transformed with an AMP mutant library, induced to intracellularly express AMPs, grown under selective pressure, and deep sequenced to quantify mutant depletion. The slope of mutant growth rate versus induction level indicates potency. Using SAMP-Dep, we mapped the sequence-activity landscape of 170 000 mutants of oncocin, a proline-rich AMP, for intracellular activity against Escherichia coli. Clonal validation supported the platform's sensitivity and accuracy. The mapped landscape revealed an extended oncocin pharmacophore contrary to earlier structural studies, clarified the C-terminus role in internalization, identified functional epistasis, and guided focused, successful synthetic peptide library design, yielding a mutant with 2-fold enhancement in both intracellular and extracellular activity. The efficiency of SAMP-Dep poises the platform to transform AMP engineering, characterization, and discovery.
Affiliation(s)
- Matthew P. DeJong
- Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, Minneapolis, Minnesota 55455, United States
- Seth C. Ritter
- Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, Minneapolis, Minnesota 55455, United States
- Katharina A. Fransen
- Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, Minneapolis, Minnesota 55455, United States
- Daniel T. Tresnak
- Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, Minneapolis, Minnesota 55455, United States
- Alexander W. Golinski
- Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, Minneapolis, Minnesota 55455, United States
- Benjamin J. Hackel
- Department of Chemical Engineering and Materials Science, University of Minnesota – Twin Cities, Minneapolis, Minnesota 55455, United States