1
Chen J, Chen M, Yu X. Fluorescent probes in autoimmune disease research: current status and future prospects. J Transl Med 2025; 23:411. [PMID: 40205498 PMCID: PMC11984237 DOI: 10.1186/s12967-025-06430-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2025] [Accepted: 03/25/2025] [Indexed: 04/11/2025] Open
Abstract
Autoimmune diseases (AD) present substantial challenges for early diagnosis and precise treatment due to their intricate pathogenesis and varied clinical manifestations. While existing diagnostic methods and treatment strategies have advanced, their sensitivity, specificity, and real-time applicability in clinical settings continue to exhibit significant limitations. In recent years, fluorescent probes have emerged as highly sensitive and specific biological imaging tools, demonstrating substantial potential in AD research. This review examines the response mechanisms and historical evolution of various types of fluorescent probes, systematically summarizing the latest research advancements in their application to autoimmune diseases. It highlights key applications in biomarker detection, dynamic monitoring of immune cell functions, and assessment of drug treatment efficacy. Furthermore, this article analyzes the technical challenges currently encountered in probe development and proposes potential directions for future research. With ongoing advancements in materials science, nanotechnology, and bioengineering, fluorescent probes are anticipated to achieve higher sensitivity and enhanced functional integration, thereby facilitating early detection, dynamic monitoring, and innovative treatment strategies for autoimmune diseases. Overall, fluorescent probes possess substantial scientific significance and application value in both research and clinical settings related to autoimmune diseases, signaling a new era of personalized and precision medicine.
Affiliation(s)
- Junli Chen
  - Wujin Hospital Affiliated With Jiangsu University, Changzhou, Jiangsu, China
  - School of Medicine, Jiangsu University, Zhenjiang, Jiangsu, China
- Mingkai Chen
  - Wujin Hospital Affiliated With Jiangsu University, Changzhou, Jiangsu, China
  - School of Medicine, Jiangsu University, Zhenjiang, Jiangsu, China
- Xiaolong Yu
  - Wujin Hospital Affiliated With Jiangsu University, Changzhou, Jiangsu, China
  - The Wujin Clinical College of Xuzhou Medical University, Changzhou, Jiangsu, China
2
Thomas N, Belanger D, Xu C, Lee H, Hirano K, Iwai K, Polic V, Nyberg KD, Hoff KG, Frenz L, Emrich CA, Kim JW, Chavarha M, Ramanan A, Agresti JJ, Colwell LJ. Engineering highly active nuclease enzymes with machine learning and high-throughput screening. Cell Syst 2025; 16:101236. [PMID: 40081373 DOI: 10.1016/j.cels.2025.101236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 09/17/2024] [Accepted: 02/19/2025] [Indexed: 03/16/2025]
Abstract
Optimizing enzymes to function in novel chemical environments is a central goal of synthetic biology, but optimization is often hindered by a rugged fitness landscape and costly experiments. In this work, we present TeleProt, a machine learning (ML) framework that blends evolutionary and experimental data to design diverse protein libraries, and employ it to improve the catalytic activity of a nuclease enzyme that degrades biofilms that accumulate on chronic wounds. After multiple rounds of high-throughput experiments, TeleProt found a significantly better top-performing enzyme than directed evolution (DE), had a better hit rate at finding diverse, high-activity variants, and was even able to design a high-performance initial library using no prior experimental data. We have released a dataset of 55,000 nuclease variants, one of the most extensive genotype-phenotype enzyme activity landscapes to date, to drive further progress in ML-guided design. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Neil Thomas
  - X, the Moonshot Factory, Mountain View, CA 94043, USA
- Jun W Kim
  - X, the Moonshot Factory, Mountain View, CA 94043, USA
- Abi Ramanan
  - X, the Moonshot Factory, Mountain View, CA 94043, USA
- Lucy J Colwell
  - Google DeepMind, Cambridge, MA 02142, USA; Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
3
Sescil J, Havens SM, Wang W. Principles and Design of Molecular Tools for Sensing and Perturbing Cell Surface Receptor Activity. Chem Rev 2025; 125:2665-2702. [PMID: 39999110 PMCID: PMC11934152 DOI: 10.1021/acs.chemrev.4c00582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2025]
Abstract
Cell-surface receptors are vital for controlling numerous cellular processes with their dysregulation being linked to disease states. Therefore, it is necessary to develop tools to study receptors and the signaling pathways they control. This Review broadly describes molecular approaches that enable 1) the visualization of receptors to determine their localization and distribution; 2) sensing receptor activation with permanent readouts as well as readouts in real time; and 3) perturbing receptor activity and mimicking receptor-controlled processes to learn more about these processes. Together, these tools have provided valuable insight into fundamental receptor biology and helped to characterize therapeutics that target receptors.
Affiliation(s)
- Jennifer Sescil
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109
  - Life Sciences Institute, University of Michigan, Ann Arbor, MI, 48109
- Steven M. Havens
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109
  - Life Sciences Institute, University of Michigan, Ann Arbor, MI, 48109
- Wenjing Wang
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109
  - Life Sciences Institute, University of Michigan, Ann Arbor, MI, 48109
  - Neuroscience Graduate Program, University of Michigan, Ann Arbor, MI, 48109
  - Program in Chemical Biology, University of Michigan, Ann Arbor, MI, 48109
4
Ikebe J, Yoshida K, Ishihara S, Kurumida Y, Kameda T. Computational Design of Burkholderia cepacia Lipase Mutants that Show Enhanced Stereoselectivity in the Production of l-Menthol. J Agric Food Chem 2025; 73:4829-4839. [PMID: 39960458 DOI: 10.1021/acs.jafc.4c09949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2025]
Abstract
l-Menthol, valued for its aroma, cooling properties, and biological activity, is often produced as a mixture with d-menthol, which negatively impacts taste and odor. To improve the purity of industrial l-menthol production, we engineered Burkholderia cepacia lipase (BCL) to improve its stereoselectivity. While wild-type BCL achieves only 98% enantiomeric excess, we used molecular dynamics simulations of BCL bound to menthol acetate, combined with a computational rational design method (MSPER), to identify key mutation sites. We experimentally confirmed five mutations that effectively improve the enantiomeric excess. In particular, Q88A increased enantiomeric excess to 99.4% and productivity from 14.5% to 49.9%. Q88G exhibited the highest productivity at 54.0% with 99.3% optical purity. These BCL mutants offer a greener, more efficient approach to high-purity l-menthol production, contributing to more sustainable chemical processes.
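For readers less familiar with the metric, enantiomeric excess (ee) is the difference between the fractions of the major and minor enantiomers, expressed as a percentage. The sketch below converts the ee values quoted in the abstract into enantiomer fractions. The helper names `ee_percent` and `fractions_from_ee` are illustrative and do not come from the paper.

```python
def ee_percent(major: float, minor: float) -> float:
    """Enantiomeric excess in percent from the amounts of the two enantiomers."""
    total = major + minor
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * abs(major - minor) / total


def fractions_from_ee(ee: float) -> tuple[float, float]:
    """Return (major, minor) fractions of the product for an ee given in percent."""
    if not 0.0 <= ee <= 100.0:
        raise ValueError("ee must lie between 0 and 100")
    major = (100.0 + ee) / 200.0
    return major, 1.0 - major


if __name__ == "__main__":
    # ee values as quoted in the abstract for wild-type BCL and the Q88A mutant.
    for label, ee in [("wild type", 98.0), ("Q88A", 99.4)]:
        major, minor = fractions_from_ee(ee)
        print(f"{label}: ee = {ee}% -> major {major:.2%}, minor {minor:.2%}")
```

Under this definition, raising ee from 98% to 99.4% corresponds to cutting the minor enantiomer from about 1% to about 0.3% of the product.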
Affiliation(s)
- Jinzen Ikebe
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Kazunori Yoshida
  - Amano Enzyme Incorporation, 1-6, Technoplaza, Kakamigahara-shi, Gifu 509-0109, Japan
- Satoru Ishihara
  - Amano Enzyme Incorporation, 1-6, Technoplaza, Kakamigahara-shi, Gifu 509-0109, Japan
- Yoichi Kurumida
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Tomoshi Kameda
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
5
Zhang Q, Chen W, Qin M, Wang Y, Pu Z, Ding K, Liu Y, Zhang Q, Li D, Li X, Zhao Y, Yao J, Huang L, Wu J, Yang L, Chen H, Yu H. Integrating protein language models and automatic biofoundry for enhanced protein evolution. Nat Commun 2025; 16:1553. [PMID: 39934638 PMCID: PMC11814318 DOI: 10.1038/s41467-025-56751-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Accepted: 01/24/2025] [Indexed: 02/13/2025] Open
Abstract
Traditional protein engineering methods, such as directed evolution, while effective, are often slow and labor-intensive. Advances in machine learning and automated biofoundries present new opportunities for optimizing these processes. This study devises a protein language model-enabled automatic evolution platform, a closed-loop system for automated protein engineering within the Design-Build-Test-Learn cycle. The protein language model ESM-2 makes zero-shot predictions of 96 variants to initiate the cycle. The biofoundry constructs and evaluates these variants and feeds the results back to a multi-layer perceptron to train a fitness predictor, which then predicts a second round of 96 variants with improved fitness. With the tRNA synthetase as a model enzyme, four rounds of evolution carried out within 10 days lead to mutants with enzyme activity improved by up to 2.4-fold. Our system significantly enhances the speed and accuracy of protein evolution, driving faster advancements in protein engineering for industrial applications.
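The closed loop described in this abstract can be illustrated with a toy sketch. It rests on several assumptions and is not the authors' platform: random picks stand in for the ESM-2 zero-shot round, a closed-form ridge regression stands in for the multi-layer perceptron, and a simulated `assay` function stands in for the biofoundry. Only the batch size (96) and the number of rounds (4) come from the abstract. Every function name is hypothetical.

```python
import numpy as np

BATCH = 96   # variants per round, as stated in the abstract
ROUNDS = 4   # rounds of evolution, as stated in the abstract


def one_hot(variants, n_states):
    """Encode integer-coded variants (n, L) as flat one-hot features (n, L * n_states)."""
    return np.eye(n_states)[variants].reshape(len(variants), -1)


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; a stand-in for the MLP fitness predictor."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def run_dbtl(pool, assay, n_states, rng):
    """Run Design-Build-Test-Learn over a fixed candidate pool.

    Returns the indices of tested variants and their measurements, in test order.
    """
    X_all = one_hot(pool, n_states)
    # Round 1: random picks stand in for zero-shot language-model ranking.
    tested = list(rng.choice(len(pool), size=BATCH, replace=False))
    measured = list(assay(pool[tested]))
    for _ in range(ROUNDS - 1):
        w = fit_ridge(X_all[tested], np.array(measured))  # Learn
        scores = X_all @ w                                 # Design
        scores[tested] = -np.inf                           # never re-test a variant
        batch = np.argsort(scores)[-BATCH:]
        tested.extend(batch)                               # Build
        measured.extend(assay(pool[batch]))                # Test
    return np.array(tested), np.array(measured)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, n_states = 8, 4                      # toy sequence length and alphabet
    pool = rng.integers(0, n_states, size=(2000, seq_len))
    true_w = rng.normal(size=seq_len * n_states)  # hidden toy fitness landscape

    def assay(variants):
        """Simulated noisy measurement; replaces the real build-and-test step."""
        clean = one_hot(variants, n_states) @ true_w
        return clean + rng.normal(scale=0.1, size=len(variants))

    tested, measured = run_dbtl(pool, assay, n_states, rng)
    print("first-round mean:", measured[:BATCH].mean())
    print("last-round mean: ", measured[-BATCH:].mean())
```

On this additive toy landscape the last batch should score well above the first. Real fitness landscapes are rugged, which is the reason the paper uses a learned nonlinear predictor.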
Affiliation(s)
- Qiang Zhang
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - ZJU-UIUC Institute, International Campus, Zhejiang University, Haining, Zhejiang, 314400, China
- Wanyi Chen
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
- Ming Qin
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - School of Software Technology, Zhejiang University, Hangzhou, 315103, China
- Yuhao Wang
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Polytechnic Institute, Zhejiang University, Hangzhou, 310015, China
- Zhongji Pu
  - Xianghu Laboratory, Hangzhou, 311231, China
- Keyan Ding
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
- Yuyue Liu
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
- Qunfeng Zhang
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
- Dongfang Li
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
- Xinjia Li
  - Xianghu Laboratory, Hangzhou, 311231, China
- Yu Zhao
  - AI Lab, Tencent, Shenzhen, Guangdong, 518000, China
- Jianhua Yao
  - AI Lab, Tencent, Shenzhen, Guangdong, 518000, China
- Lei Huang
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
- Jianping Wu
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
  - Zhejiang Key Laboratory of Intelligent Manufacturing for Functional Chemicals, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311215, China
- Lirong Yang
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
  - Zhejiang Key Laboratory of Intelligent Manufacturing for Functional Chemicals, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311215, China
- Huajun Chen
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, 310027, China
- Haoran Yu
  - Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, 310058, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, 311200, China
6
Gelman S, Johnson B, Freschlin C, Sharma A, D'Costa S, Peters J, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. bioRxiv 2025:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
Affiliation(s)
- Sam Gelman
  - Department of Computer Sciences, University of Wisconsin-Madison
  - Morgridge Institute for Research
- Bryce Johnson
  - Department of Computer Sciences, University of Wisconsin-Madison
  - Morgridge Institute for Research
- Arnav Sharma
  - Department of Computer Sciences, University of Wisconsin-Madison
  - Morgridge Institute for Research
- Sameer D'Costa
  - Department of Biochemistry, University of Wisconsin-Madison
- John Peters
  - Morgridge Institute for Research
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Anthony Gitter
  - Department of Computer Sciences, University of Wisconsin-Madison
  - Morgridge Institute for Research
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Philip A Romero
  - Department of Biochemistry, University of Wisconsin-Madison
  - Department of Biomedical Engineering, Duke University
7
Fujiuchi K, Aoki N, Ohtake T, Iwashita T, Kawasaki H. Transitions in Immunoassay Leading to Next-Generation Lateral Flow Assays and Future Prospects. Biomedicines 2024; 12:2268. [PMID: 39457581 PMCID: PMC11504701 DOI: 10.3390/biomedicines12102268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Revised: 10/03/2024] [Accepted: 10/04/2024] [Indexed: 10/28/2024] Open
Abstract
In the field of clinical testing, the traditional focus has been on the development of large-scale analysis equipment designed to process high volumes of samples with fully automatic and high-sensitivity measurements. However, there has been a growing demand in recent years for the development of analytical reagents tailored to point-of-care testing (POCT), which does not necessitate a specific location or specialized operator. This trend is epitomized by the lateral flow assay (LFA), which became a cornerstone during the 2019 pandemic due to its simplicity, speed of delivering results (within about 10 min, from minimal sample concentrations), and user-friendly design. LFAs, with their paper-based construction, combine cost-effectiveness with ease of disposal, addressing both budgetary and environmental concerns comprehensively. Despite their compact size, LFAs encapsulate a wealth of technological ingenuity, embodying years of research and development. Current research is dedicated to further evolving LFA technology, paving the way for the next generation of diagnostic devices. These advancements aim to redefine accessibility, empower individuals, and enhance responsiveness to public health challenges. The future of LFAs, now unfolding, promises even greater integration into routine health management and emergency responses, underscoring their critical role in the evolution of decentralized and patient-centric healthcare solutions. In this review, the historical development of LFA and several of the latest LFA technologies using catalytic amplification, surface-enhanced Raman scattering, heat detection, electrochemical detection, magnetoresistance, and detection of reflected electrons are introduced to inspire readers for future research and development.
Affiliation(s)
- Koyu Fujiuchi
  - NanoSuit Research Laboratory, Institute of Photonics Medicine, Division of Preeminent Bioimaging Research, Hamamatsu University School of Medicine, Hamamatsu 431-3125, Japan
  - Research and Development Department, TAUNS Laboratories, Inc., Izunokuni-shi 410-2325, Japan
- Noriko Aoki
  - Research and Development Department, TAUNS Laboratories, Inc., Izunokuni-shi 410-2325, Japan
- Tetsurou Ohtake
  - Research and Development Department, TAUNS Laboratories, Inc., Izunokuni-shi 410-2325, Japan
- Toshihide Iwashita
  - Department of Regenerative and Infectious Pathology, Hamamatsu University School of Medicine, Hamamatsu 431-3125, Japan
- Hideya Kawasaki
  - NanoSuit Research Laboratory, Institute of Photonics Medicine, Division of Preeminent Bioimaging Research, Hamamatsu University School of Medicine, Hamamatsu 431-3125, Japan
8
Zhang Z, Li Z, Yang M, Zhao F, Han S. Machine learning-guided multi-site combinatorial mutagenesis enhances the thermostability of pectin lyase. Int J Biol Macromol 2024; 277:134530. [PMID: 39111490 DOI: 10.1016/j.ijbiomac.2024.134530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 07/25/2024] [Accepted: 08/04/2024] [Indexed: 08/13/2024]
Abstract
Enhancing the thermostability of enzymes is crucial for industrial applications. Methods such as directed evolution are often limited by the huge sequence space and combinatorial explosion, making it difficult to obtain optimal mutants. In recent years, machine learning (ML)-guided protein engineering has become an attractive tool because of its ability to comprehensively explore the sequence space of enzymes and discover superior mutants. This study employed ML to perform combinatorial mutation design on the pectin lyase PMGL-Ba from Bacillus licheniformis, aiming to improve its thermostability. First, 18 single-point mutants with enhanced thermostability were identified through semi-rational design. Subsequently, the initial library containing a small number of low-order mutants was utilized to construct an ML model to explore the combinatorial sequence space (theoretically 196,608 mutants) of single-point mutants. The results showed that the ML-predicted second library was successfully enriched with highly thermostable combinatorial mutants. After one iteration of learning, the best-performing combinatorial mutant in the third library, P36, showed a 67-fold and 39-fold increase in half-life at 75 °C and 80 °C, respectively, as well as a 2.1-fold increase in activity. Structural analysis and molecular dynamics simulations provided insights into the improved performance of the engineered enzyme.
Affiliation(s)
- Zhihui Zhang
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Zhixuan Li
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Manli Yang
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Fengguang Zhao
  - School of Light Industry and Engineering, South China University of Technology, Guangzhou 510006, China
- Shuangyan Han
  - Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
9
Hill A, True JM, Jones CH. Transforming drug development with synthetic biology and AI. Trends Biotechnol 2024; 42:1072-1075. [PMID: 38383215 DOI: 10.1016/j.tibtech.2024.01.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 02/23/2024]
Abstract
The COVID-19 pandemic has thrust RNA as a platform for drug development into the spotlight. However, identifying promising drug candidates is challenging. With advances in synthetic biology and artificial intelligence (AI) models, we can overcome this hurdle, transforming drug development and ushering in a new era in the pharmaceutical industry.
Affiliation(s)
- Andrew Hill
  - Pfizer, 66 Hudson Boulevard, New York, NY 10001, USA
- Jane M True
  - Pfizer, 66 Hudson Boulevard, New York, NY 10001, USA
10
Lipsh-Sokolik R, Fleishman SJ. Addressing epistasis in the design of protein function. Proc Natl Acad Sci U S A 2024; 121:e2314999121. [PMID: 39133844 PMCID: PMC11348311 DOI: 10.1073/pnas.2314999121] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024] Open
Abstract
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
Affiliation(s)
- Rosalie Lipsh-Sokolik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
11
Vornholt T, Mutný M, Schmidt GW, Schellhaas C, Tachibana R, Panke S, Ward TR, Krause A, Jeschek M. Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning. ACS Cent Sci 2024; 10:1357-1370. [PMID: 39071060 PMCID: PMC11273458 DOI: 10.1021/acscentsci.4c00258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/22/2024] [Accepted: 05/02/2024] [Indexed: 07/30/2024]
Abstract
Tailored enzymes are crucial for the transition to a sustainable bioeconomy. However, enzyme engineering is laborious and failure-prone due to its reliance on serendipity. The efficiency and success rates of engineering campaigns may be improved by applying machine learning to map the sequence-activity landscape based on small experimental data sets. Yet, it often proves challenging to reliably model large sequence spaces while keeping the experimental effort tractable. To address this challenge, we present an integrated pipeline combining large-scale screening with active machine learning, which we applied to engineer an artificial metalloenzyme (ArM) catalyzing a new-to-nature hydroamination reaction. Combining lab automation and next-generation sequencing, we acquired sequence-activity data for several thousand ArM variants. We then used Gaussian process regression to model the activity landscape and guide further screening rounds. Critical characteristics of our pipeline include the cost-effective generation of information-rich data sets, the integration of an explorative round to improve the model's performance, and the inclusion of experimental noise. Our approach led to an order-of-magnitude boost in the hit rate while making efficient use of experimental resources. Search strategies like this should find broad utility in enzyme engineering and accelerate the development of novel biocatalysts.
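As an illustration of the modeling step named in this abstract, the sketch below implements Gaussian process regression with an explicit observation-noise term and uses it to rank untested candidates. Several choices are assumptions made for the example and are not taken from the paper: the squared-exponential kernel, the fixed hyperparameters, and the upper-confidence-bound selection rule.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_query, noise_var=0.01, length_scale=1.0, variance=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at X_query.

    noise_var models experimental noise on the measured activities.
    """
    K = rbf_kernel(X_train, X_train, length_scale, variance)
    K = K + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_query, length_scale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.maximum(variance - np.sum(v**2, axis=0), 0.0)
    return mean, np.sqrt(var)


def select_batch(mean, std, batch_size, beta=2.0):
    """Indices of the candidates with the highest upper confidence bound."""
    ucb = mean + beta * std
    return np.argsort(ucb)[::-1][:batch_size]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 2-D feature vectors stand in for encoded enzyme variants.
    X_train = rng.uniform(-2, 2, size=(30, 2))
    y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=30)
    X_pool = rng.uniform(-2, 2, size=(500, 2))
    mean, std = gp_posterior(X_train, y_train, X_pool, noise_var=0.01)
    print("next variants to screen:", select_batch(mean, std, batch_size=5))
```

A larger `beta` favors uncertain candidates, which loosely mirrors the explorative round mentioned in the abstract, while a smaller `beta` concentrates screening on predicted hits.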
Affiliation(s)
- Tobias Vornholt
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
  - National Centre of Competence in Research (NCCR) Molecular Systems Engineering, 4056 Basel, Switzerland
- Mojmír Mutný
  - Department of Computer Science, ETH Zurich, Andreasstrasse 5, 8092 Zurich, Switzerland
- Gregor W. Schmidt
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
- Christian Schellhaas
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
- Ryo Tachibana
  - Department of Chemistry, University of Basel, Mattenstrasse 24a, 4058 Basel, Switzerland
- Sven Panke
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
  - National Centre of Competence in Research (NCCR) Molecular Systems Engineering, 4056 Basel, Switzerland
- Thomas R. Ward
  - National Centre of Competence in Research (NCCR) Molecular Systems Engineering, 4056 Basel, Switzerland
  - Department of Chemistry, University of Basel, Mattenstrasse 24a, 4058 Basel, Switzerland
- Andreas Krause
  - Department of Computer Science, ETH Zurich, Andreasstrasse 5, 8092 Zurich, Switzerland
- Markus Jeschek
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
  - Institute of Microbiology, University of Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany
12
Huang C, Zhang L, Tang T, Wang H, Jiang Y, Ren H, Zhang Y, Fang J, Zhang W, Jia X, You S, Qin B. Application of Directed Evolution and Machine Learning to Enhance the Diastereoselectivity of Ketoreductase for Dihydrotetrabenazine Synthesis. JACS Au 2024; 4:2547-2556. [PMID: 39055154 PMCID: PMC11267543 DOI: 10.1021/jacsau.4c00284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 06/13/2024] [Accepted: 06/20/2024] [Indexed: 07/27/2024]
Abstract
Biocatalysis is an effective approach for producing chiral drug intermediates that are often difficult to synthesize using traditional chemical methods. A time-efficient strategy is required to accelerate the directed evolution process to achieve the desired enzyme function. In this research, we evaluated machine learning-assisted directed evolution as a potential approach for enzyme engineering, using a moderately diastereoselective ketoreductase library as a model system. Machine learning-assisted directed evolution and traditional directed evolution methods were compared for reducing (±)-tetrabenazine to dihydrotetrabenazine via kinetic resolution facilitated by BsSDR10, a short-chain dehydrogenase/reductase from Bacillus subtilis. Both methods successfully identified variants with significantly improved diastereoselectivity for each isomer of dihydrotetrabenazine. Furthermore, the preparation of (2S,3S,11bS)-dihydrotetrabenazine has been successfully scaled up, with an isolated yield of 40.7% and a diastereoselectivity of 91.3%.
Affiliation(s)
- Chenming Huang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Li Zhang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Tong Tang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Haijiao Wang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Yingqian Jiang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Hanwen Ren
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Yitian Zhang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Jiali Fang
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Wenhe Zhang
  - School of Life Sciences and Biopharmaceutical Sciences, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Xian Jia
  - School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Song You
  - School of Life Sciences and Biopharmaceutical Sciences, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
- Bin Qin
  - Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenhe, Shenyang 110016, People’s Republic of China
13
|
Shanker VR, Bruun TU, Hie BL, Kim PS. Unsupervised evolution of protein and antibody complexes with a structure-informed language model. Science 2024; 385:46-53. [PMID: 38963838 PMCID: PMC11616794 DOI: 10.1126/science.adk8946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 05/29/2024] [Indexed: 07/06/2024]
Abstract
Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was only trained on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.
Affiliation(s)
- Varun R. Shanker
- Stanford Biophysics Program, Stanford University School of Medicine; Stanford, CA 94305, USA
- Stanford Medical Scientist Training Program, Stanford University School of Medicine; Stanford CA 94305, USA
- Sarafan ChEM-H, Stanford University; Stanford, CA 94305, USA
| | - Theodora U.J. Bruun
- Stanford Medical Scientist Training Program, Stanford University School of Medicine; Stanford CA 94305, USA
- Sarafan ChEM-H, Stanford University; Stanford, CA 94305, USA
- Department of Biochemistry, Stanford University School of Medicine; Stanford, CA 94305, USA
| | - Brian L. Hie
- Sarafan ChEM-H, Stanford University; Stanford, CA 94305, USA
- Department of Biochemistry, Stanford University School of Medicine; Stanford, CA 94305, USA
| | - Peter S. Kim
- Sarafan ChEM-H, Stanford University; Stanford, CA 94305, USA
- Department of Biochemistry, Stanford University School of Medicine; Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| |
|
14
|
Delgadillo-Guevara M, Halte M, Erhardt M, Popp PF. Fluorescent tools for the standardized work in Gram-negative bacteria. J Biol Eng 2024; 18:25. [PMID: 38589953 PMCID: PMC11003136 DOI: 10.1186/s13036-024-00420-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2024] [Accepted: 03/18/2024] [Indexed: 04/10/2024] Open
Abstract
Standardized and thoroughly characterized genetic tools are a prerequisite for studying cellular processes to ensure the reusability and consistency of experimental results. The discovery of fluorescent proteins (FPs) represents a milestone in the development of genetic reporters for monitoring transcription or protein localization in vivo. FPs have revolutionized our understanding of cellular dynamics by enabling the real-time visualization and tracking of biological processes. Despite these advancements, challenges remain in the appropriate use of FPs, specifically regarding their proper application, protein turnover dynamics, and the undesired disruption of cellular functions. Here, we systematically compared a comprehensive set of 15 FPs and assessed their performance in vivo by focusing on key parameters, such as signal over background ratios and protein stability rates, using the Gram-negative model organism Salmonella enterica as a representative host. We evaluated four protein degradation tags in both plasmid- and genome-based systems and our findings highlight the necessity of introducing degradation tags to analyze time-sensitive cellular processes. We demonstrate that the gain of dynamics mediated by the addition of degradation tags impacts the cell-to-cell heterogeneity of plasmid-based but not genome-based reporters. Finally, we probe the applicability of FPs for protein localization studies in living cells using standard and super-resolution fluorescence microscopy. In summary, our study underscores the importance of careful FP selection and paves the way for the development of improved genetic reporters to enhance the reproducibility and reliability of fluorescence-based research in Gram-negative bacteria and beyond.
Affiliation(s)
- Mario Delgadillo-Guevara
- Institute of Biology/Molecular Microbiology, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
- Department of Biochemistry and Pharmacology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC, Australia
| | - Manuel Halte
- Institute of Biology/Molecular Microbiology, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
| | - Marc Erhardt
- Institute of Biology/Molecular Microbiology, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
- Max Planck Unit for the Science of Pathogens, Berlin, 10117, Germany
| | - Philipp F Popp
- Institute of Biology/Molecular Microbiology, Humboldt-Universität zu Berlin, Berlin, 10115, Germany.
| |
|
15
|
Goshisht MK. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega 2024; 9:9921-9945. [PMID: 38463314 PMCID: PMC10918679 DOI: 10.1021/acsomega.3c05913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/19/2024] [Accepted: 01/30/2024] [Indexed: 03/12/2024]
Abstract
Machine learning (ML), particularly deep learning (DL), has made rapid and substantial progress in synthetic biology in recent years. Biotechnological applications of biosystems, including pathways, enzymes, and whole cells, are being explored with increasing frequency. The intricacy and interconnectedness of biosystems make it challenging to design them with the desired properties. ML and DL have a synergy with synthetic biology. Synthetic biology can be employed to produce large data sets for training models (for instance, by utilizing DNA synthesis), and ML/DL models can be employed to inform design (for example, by generating new parts or advising on the best experiments to perform). This potential has recently been brought to light by research at the intersection of engineering biology and ML/DL through achievements like the design of novel biological components, optimal experimental design, automated analysis of microscopy data, protein structure prediction, and biomolecular implementations of ANNs (Artificial Neural Networks). I have divided this review into three sections. In the first section, I describe the predictive potential and basics of ML along with myriad applications in synthetic biology, especially in engineering cells, activity of proteins, and metabolic pathways. In the second section, I describe fundamental DL architectures and their applications in synthetic biology. Finally, I describe the challenges that hinder the progress of ML/DL and synthetic biology, along with their solutions.
Affiliation(s)
- Manoj Kumar Goshisht
- Department of Chemistry, Natural and Applied Sciences, University of Wisconsin-Green Bay, Green Bay, Wisconsin 54311-7001, United States
| |
|
16
|
Wait SJ, Expòsit M, Lin S, Rappleye M, Lee JD, Colby SA, Torp L, Asencio A, Smith A, Regnier M, Moussavi-Harami F, Baker D, Kim CK, Berndt A. Machine learning-guided engineering of genetically encoded fluorescent calcium indicators. Nat Comput Sci 2024; 4:224-236. [PMID: 38532137 PMCID: PMC11878291 DOI: 10.1038/s43588-024-00611-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/06/2023] [Accepted: 02/15/2024] [Indexed: 03/28/2024]
Abstract
Here we used machine learning to engineer genetically encoded fluorescent indicators, protein-based sensors critical for real-time monitoring of biological activity. We used machine learning to predict the outcomes of sensor mutagenesis by analyzing established libraries that link sensor sequences to functions. Using the GCaMP calcium indicator as a scaffold, we developed an ensemble of three regression models trained on experimentally derived GCaMP mutation libraries. The trained ensemble performed an in silico functional screen on 1,423 novel, uncharacterized GCaMP variants. As a result, we identified the ensemble-derived GCaMP (eGCaMP) variants, eGCaMP and eGCaMP+, which achieve both faster kinetics and larger ∆F/F0 responses upon stimulation than previously published fast variants. Furthermore, we identified a combinatorial mutation with extraordinary dynamic range, eGCaMP2+, which outperforms the tested sixth-, seventh- and eighth-generation GCaMPs. These findings demonstrate the value of machine learning as a tool to facilitate the efficient engineering of proteins for desired biophysical characteristics.
Affiliation(s)
- Sarah J Wait
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
| | - Marc Expòsit
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Sophia Lin
- Center for Neuroscience, University of California, Davis, Davis, CA, USA
- Department of Neurology, University of California, Davis, Davis, CA, USA
| | - Michael Rappleye
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Institute of Pharmacology and Toxicology, University of Zürich, Zurich, Switzerland
| | - Justin Daho Lee
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
| | - Samuel A Colby
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA
| | - Lily Torp
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
| | - Anthony Asencio
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
| | - Annette Smith
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
| | - Michael Regnier
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
| | - Farid Moussavi-Harami
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Division of Cardiology, University of Washington, Seattle, WA, USA
| | - David Baker
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Christina K Kim
- Center for Neuroscience, University of California, Davis, Davis, CA, USA
- Department of Neurology, University of California, Davis, Davis, CA, USA
| | - Andre Berndt
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA.
- Institute of Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
- Center for Neurobiology of Addiction, Pain, and Emotion, University of Washington, Seattle, WA, USA.
| |
|
17
|
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Cent Sci 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| |
|
18
|
Fu X, Suo H, Zhang J, Chen D. Machine-learning-guided Directed Evolution for AAV Capsid Engineering. Curr Pharm Des 2024; 30:811-824. [PMID: 38445704 DOI: 10.2174/0113816128286593240226060318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/07/2024] [Accepted: 02/13/2024] [Indexed: 03/07/2024]
Abstract
Targeted gene delivery is crucial to gene therapy. Adeno-associated virus (AAV) has emerged as a primary gene therapy vector due to its broad host range, long-term expression, and low pathogenicity. However, AAV vectors have some limitations, such as immunogenicity and insufficient targeting. Designing or modifying capsids is a potential method of improving the efficacy of gene delivery, but it is hindered by the weak biological basis of AAV, the complexity of the capsids, and the limitations of current screening methods. Artificial intelligence (AI), especially machine learning (ML), has great potential to accelerate and improve the optimization of capsid properties as well as decrease their development time and manufacturing costs. This review introduces the traditional methods of designing AAV capsids and the general steps of building a sequence-function ML model, highlights the applications of ML in the development workflow, and summarizes its advantages and challenges.
Affiliation(s)
- Xianrong Fu
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Hairui Suo
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Jiachen Zhang
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Dongmei Chen
- School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
| |
|
19
|
Shanker VR, Bruun TU, Hie BL, Kim PS. Inverse folding of protein complexes with a structure-informed language model enables unsupervised antibody evolution. bioRxiv 2023:2023.12.19.572475. [PMID: 38187780 PMCID: PMC10769282 DOI: 10.1101/2023.12.19.572475] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/09/2024]
Abstract
Large language models trained on sequence information alone are capable of learning high level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here we show that a general protein language model augmented with protein structure backbone coordinates and trained on the inverse folding problem can guide evolution for diverse proteins without needing to explicitly model individual functional tasks. We demonstrate inverse folding to be an effective unsupervised, structure-based sequence optimization strategy that also generalizes to multimeric complexes by implicitly learning features of binding and amino acid epistasis. Using this approach, we screened ~30 variants of two therapeutic clinical antibodies used to treat SARS-CoV-2 infection and achieved up to 26-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants-of-concern BQ.1.1 and XBB.1.5, respectively. In addition to substantial overall improvements in protein function, we find inverse folding performs with leading experimental success rates among other reported machine learning-guided directed evolution methods, without requiring any task-specific training data.
Affiliation(s)
- Varun R. Shanker
- Stanford Biophysics Program, Stanford University School of Medicine, Stanford, CA 94305, USA
- Stanford Medical Scientist Training Program, Stanford University School of Medicine, Stanford CA 94305, USA
- Sarafan ChEM-H, Stanford University, Stanford, CA 94305, USA
| | - Theodora U.J. Bruun
- Stanford Medical Scientist Training Program, Stanford University School of Medicine, Stanford CA 94305, USA
- Sarafan ChEM-H, Stanford University, Stanford, CA 94305, USA
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Brian L. Hie
- Sarafan ChEM-H, Stanford University, Stanford, CA 94305, USA
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Peter S. Kim
- Sarafan ChEM-H, Stanford University, Stanford, CA 94305, USA
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| |
|
20
|
Curatolo AI, Kimchi O, Goodrich CP, Krueger RK, Brenner MP. A computational toolbox for the assembly yield of complex and heterogeneous structures. Nat Commun 2023; 14:8328. [PMID: 38097568 PMCID: PMC10721878 DOI: 10.1038/s41467-023-43168-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 11/02/2023] [Indexed: 12/17/2023] Open
Abstract
The self-assembly of complex structures from a set of non-identical building blocks is a hallmark of soft matter and biological systems, including protein complexes, colloidal clusters, and DNA-based assemblies. Predicting the dependence of the equilibrium assembly yield on the concentrations and interaction energies of building blocks is highly challenging, owing to the difficulty of computing the entropic contributions to the free energy of the many structures that compete with the ground state configuration. While these calculations yield well known results for spherically symmetric building blocks, they do not hold when the building blocks have internal rotational degrees of freedom. Here we present an approach for solving this problem that works with arbitrary building blocks, including proteins with known structure and complex colloidal building blocks. Our algorithm combines classical statistical mechanics with recently developed computational tools for automatic differentiation. Automatic differentiation allows efficient evaluation of equilibrium averages over configurations that would otherwise be intractable. We demonstrate the validity of our framework by comparison to molecular dynamics simulations of simple examples, and apply it to calculate the yield curves for known protein complexes and for the assembly of colloidal shells.
Affiliation(s)
- Agnese I Curatolo
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA
| | - Ofer Kimchi
- Lewis-Sigler Institute, Princeton University, Princeton, NJ, 08544, USA
| | - Carl P Goodrich
- Institute of Science and Technology Austria, A-3400, Klosterneuburg, Austria
| | - Ryan K Krueger
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA
| | - Michael P Brenner
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA.
- Department of Physics, Harvard University, Cambridge, MA, 02138, USA.
| |
|
21
|
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. Brief Bioinform 2023; 24:bbad289. [PMID: 37580175 PMCID: PMC10516362 DOI: 10.1093/bib/bbad289] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/08/2023] [Revised: 07/14/2023] [Accepted: 07/26/2023] [Indexed: 08/16/2023] Open
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, 48824 MI, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, 48824 MI, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, 48824 MI, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, 48824 MI, USA
| |
|
22
|
Zhang Q, Zheng W, Song Z, Zhang Q, Yang L, Wu J, Lin J, Xu G, Yu H. Machine Learning Enables Prediction of Pyrrolysyl-tRNA Synthetase Substrate Specificity. ACS Synth Biol 2023; 12:2403-2417. [PMID: 37486975 DOI: 10.1021/acssynbio.3c00225] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 07/26/2023]
Abstract
Knowledge about the substrate scope for a given enzyme is informative for elucidating biochemical pathways and also for expanding applications of the enzyme. However, no general methods are available to accurately predict the substrate specificity of an enzyme. Pyrrolysyl-tRNA synthetase (PylRS) is a powerful tool for incorporating various noncanonical amino acids (NCAAs) into proteins, which enabled us to probe, image, rationally engineer, and evolve protein structure and function. However, the incorporation of a new NCAA typically requires the selection of large libraries of PylRS with randomized mutations at active sites, and this process requires multiple rounds of selection for each new substrate. Therefore, a single aminoacyl-tRNA synthetase with broad substrate promiscuity is ideal to facilitate widespread applications of the genetic NCAA incorporation technique. Herein, machine learning models were developed to predict the substrate specificity of PylRS to accept novel NCAAs that could be incorporated into proteins by three PylRS mutants. The models were built from a training set of 285 unique enzyme-substrate pairs of three PylRS mutants including IFRS, BtaRS, and MFRS against 95 NCAAs. The best BaggingTree (BT) model was then used for virtually screening an NCAA library containing 1474 phenylalanine, tyrosine, tryptophan, and alanine analogues, and 156 NCAAs were predicted to be accepted by at least one of the three PylRS mutants. Then, 27 NCAAs including 24 positive and 3 negative substrates were experimentally tested for their activities, and 20 of the 24 positive substrates showed weak or strong activity and were accepted by at least one PylRS mutant, among which 11 NCAAs had never before been reported to be incorporated into proteins. The three negative substrates did not show any activity. Experimental results suggested that the BT model provides a three-class classification accuracy of 0.69 and a binary classification accuracy of 0.86. This study expanded the substrate scope of three PylRS variants and provided a framework for developing machine learning models to predict substrate specificity of other PylRS variants.
Affiliation(s)
- Qunfeng Zhang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Wenlong Zheng
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Zhongdi Song
- Key Laboratory of Pollution Exposure and Health Intervention of Zhejiang Province, Interdisciplinary Research Academy, Zhejiang Shuren University, Hangzhou 310015, China
| | - Qiang Zhang
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Lirong Yang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Jianping Wu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Jianping Lin
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Gang Xu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Haoran Yu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| |
|
23
|
Chen L, Zhang Z, Li Z, Li R, Huo R, Chen L, Wang D, Luo X, Chen K, Liao C, Zheng M. Learning protein fitness landscapes with deep mutational scanning data from multiple sources. Cell Syst 2023; 14:706-721.e5. [PMID: 37591206 DOI: 10.1016/j.cels.2023.07.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/15/2023] [Revised: 05/30/2023] [Accepted: 07/18/2023] [Indexed: 08/19/2023]
Abstract
One of the key points of machine learning-assisted directed evolution (MLDE) is the accurate learning of the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages the existing deep mutational scanning data from diverse proteins to aid in understanding the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three aspects: random and positional extrapolation for single-variant effects, zero-shot fitness predictions for new proteins, and extrapolation for higher-order variant effects from single-variant effects. Moreover, our study identified previously overlooked strong baselines, and their unexpectedly good performance brings our attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Lin Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zehong Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhenghao Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Rui Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Ruifeng Huo
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Lifan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Cangsong Liao
- University of Chinese Academy of Sciences, Beijing 100049, China; Chemical Biology Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China.
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China.
24
Tamaru Y, Nakanishi S, Tanaka K, Umetsu M, Nakazawa H, Sugiyama A, Ito T, Shimokawa N, Takagi M. Recent research advances on non-linear phenomena in various biosystems. J Biosci Bioeng 2023:S1389-1723(23)00107-X. [PMID: 37246137 DOI: 10.1016/j.jbiosc.2023.03.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 03/03/2023] [Accepted: 03/22/2023] [Indexed: 05/30/2023]
Abstract
All biological phenomena can be classified as open, dissipative and non-linear. Moreover, the most typical phenomena are associated with non-linearity, dissipation and openness in biological systems. In this review article, four research topics on non-linear biosystems are described, with examples from various biological systems. First, the membrane dynamics of a lipid bilayer, as a model of the cell membrane, are described. Since the cell membrane separates the inside of the cell from the outside, self-organizing systems that form spatial patterns on membranes often depend on non-linear dynamics. Second, various data banks based on recent genomics analysis supply data on a vast number of functional proteins from many organisms and their variant species. Since the proteins existing in nature occupy only a very small part of the space represented by amino acid sequences, the success of a mutagenesis-based molecular evolution approach crucially depends on preparing a library with high enrichment of functional proteins. Third, photosynthetic organisms depend on ambient light, the regular and irregular changes of which have a significant impact on photosynthetic processes. In cyanobacteria, the light-driven process proceeds through many redox couples that constitute a chain of redox reactions. The fourth topic focuses on a vertebrate model, the zebrafish, which can help to understand, predict and control the chaos of complex biological systems. In particular, during early developmental stages, developmental differentiation occurs dynamically from a fertilized egg to divided and mature cells. These exciting fields of complexity, chaos, and non-linear science have experienced impressive growth in recent decades. Finally, future directions for non-linear biosystems are presented.
Affiliation(s)
- Yutaka Tamaru
- Department of Life Sciences, Graduate School of Bioresources, Mie University, 1577 Kurimamachiya, Tsu, Mie 514-8507, Japan.
- Shuji Nakanishi
- Research Center for Solar Energy Chemistry, Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan
- Kenya Tanaka
- Research Center for Solar Energy Chemistry, Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan
- Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aramakiazaaoba, Aoba, Sendai, Miyagi 980-8579, Japan
- Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aramakiazaaoba, Aoba, Sendai, Miyagi 980-8579, Japan
- Aruto Sugiyama
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aramakiazaaoba, Aoba, Sendai, Miyagi 980-8579, Japan
- Tomoyuki Ito
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aramakiazaaoba, Aoba, Sendai, Miyagi 980-8579, Japan
- Naofumi Shimokawa
- School of Materials Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
- Masahiro Takagi
- School of Materials Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
25
Gu J, Xu Y, Nie Y. Role of distal sites in enzyme engineering. Biotechnol Adv 2023; 63:108094. [PMID: 36621725 DOI: 10.1016/j.biotechadv.2023.108094] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 08/02/2022] [Revised: 11/15/2022] [Accepted: 01/01/2023] [Indexed: 01/06/2023]
Abstract
The limitations associated with natural enzyme catalysis have triggered the rise of the field of protein engineering. Traditional rational design was based on the analysis of protein structural information and catalytic mechanisms to identify key active sites or ligand binding sites to reshape the substrate pocket. The role and significance of functional sites in the active center have been studied extensively. With a deeper understanding of the structure-catalysis relationship map, the entire protein molecule can be filled with residues that play a substantial role in its structure and function. However, the catalytic mechanism underlying distal mutations remains unclear. The aim of this review was to highlight the criticality of the distal site in enzyme engineering based on the following three questions: What effects can distal mutations exert on function, as seen from the mutability landscape? How do distal sites influence enzyme function? How can distal mutations be predicted and designed? This review provides insights into the catalytic mechanism of enzymes from the global interaction network, knowledge from sequence-structure-dynamics-function relationships, and strategies for distal mutation-based protein engineering.
Affiliation(s)
- Jie Gu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yan Xu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yao Nie
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; Suqian Industrial Technology Research Institute of Jiangnan University, Suqian 223814, China.
26
Ogawa Y, Saito Y, Yamaguchi H, Katsuyama Y, Ohnishi Y. Engineering the Substrate Specificity of Toluene Degrading Enzyme XylM Using Biosensor XylS and Machine Learning. ACS Synth Biol 2023; 12:572-582. [PMID: 36734676 DOI: 10.1021/acssynbio.2c00577] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/04/2023]
Abstract
Enzyme engineering using machine learning has been developed in recent years. However, to obtain a large amount of data on enzyme activities for training data, it is necessary to develop a high-throughput and accurate method for evaluating enzyme activities. Here, we examined whether a biosensor-based enzyme engineering method can be applied to machine learning. As a model experiment, we aimed to modify the substrate specificity of XylM, a rate-determining enzyme in a multistep oxidation reaction catalyzed by XylMABC in Pseudomonas putida. XylMABC naturally converts toluene and xylene to benzoic acid and toluic acid, respectively. We aimed to engineer XylM to improve its conversion efficiency to a non-native substrate, 2,6-xylenol. Wild-type XylMABC slightly converted 2,6-xylenol to 3-methylsalicylic acid, which is the ligand of the transcriptional regulator XylS in P. putida. By locating a fluorescent protein gene under the control of the Pm promoter to which XylS binds, a XylS-producing Escherichia coli strain showed higher fluorescence intensity in a 3-methylsalicylic acid concentration-dependent manner. We evaluated the 3-methylsalicylic acid productivity of XylM variants using the fluorescence intensity of the sensor strain as an indicator. The obtained data provided the training data for machine learning for the directed evolution of XylM. Two cycles of machine learning-assisted directed evolution resulted in the acquisition of XylM-D140E-V144K-F243L-N244S with 15 times higher productivity than wild-type XylM. These results demonstrate that an indirect enzyme activity evaluation method using biosensors is sufficiently quantitative and high-throughput to be used as training data for machine learning. The findings expand the versatility of machine learning in enzyme engineering.
Affiliation(s)
- Yuki Ogawa
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan; AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo 169-8555, Japan; Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan
- Hideki Yamaguchi
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan
- Yohei Katsuyama
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan; Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 113-8657, Japan
- Yasuo Ohnishi
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan; Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 113-8657, Japan
27
Huang A, Lu F, Liu F. Discrimination of psychrophilic enzymes using machine learning algorithms with amino acid composition descriptor. Front Microbiol 2023; 14:1130594. [PMID: 36860491 PMCID: PMC9968940 DOI: 10.3389/fmicb.2023.1130594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 01/23/2023] [Indexed: 02/16/2023] Open
Abstract
Introduction: Psychrophilic enzymes are a class of macromolecules with high catalytic activity at low temperatures. Cold-active enzymes, which are eco-friendly and cost-effective, have huge potential for application in the detergent, textile, environmental remediation, pharmaceutical, and food industries. Compared with time-consuming and labor-intensive experiments, computational modeling, especially with machine learning (ML) algorithms, is a high-throughput screening tool for identifying psychrophilic enzymes efficiently. Methods: In this study, the influence of four ML methods (support vector machines, K-nearest neighbor, random forest, and naïve Bayes) and three descriptors, i.e., amino acid composition (AAC), dipeptide combinations (DPC), and AAC + DPC, on model performance was systematically analyzed. Results and discussion: Among the four ML methods, the support vector machine model based on the AAC descriptor achieved the best prediction accuracy, 80.6%, under 5-fold cross-validation. The AAC descriptor outperformed the DPC and AAC + DPC descriptors regardless of the ML method used. In addition, comparison of amino acid frequencies between psychrophilic and non-psychrophilic proteins revealed that higher frequencies of Ala, Gly, Ser, and Thr, and lower frequencies of Glu, Lys, Arg, Ile, Val, and Leu could be related to protein psychrophilicity. Further, ternary models were also developed that could classify psychrophilic, mesophilic, and thermophilic proteins effectively. The predictive accuracy of the ternary classification model using the AAC descriptor with the support vector machine algorithm was 75.8%. These findings would enhance our insight into the cold-adaptation mechanisms of psychrophilic proteins and aid in the design of engineered cold-active enzymes. Moreover, the proposed model could be used as a screening tool to identify novel cold-adapted proteins.
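As an illustration of the two descriptors named in this abstract, the following is a minimal sketch of how amino acid composition and dipeptide composition are commonly computed from a sequence. It is not the authors' code: the function names `aac` and `dpc` are hypothetical, and dropping non-standard residues before counting is a simplifying assumption made here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _standard_residues(sequence: str) -> str:
    """Upper-case the sequence and drop anything that is not a standard residue.

    Assumption: non-standard symbols (X, B, gaps, ...) are simply skipped.
    """
    return "".join(r for r in sequence.upper() if r in AMINO_ACIDS)


def aac(sequence: str) -> dict[str, float]:
    """Amino acid composition: fraction of each of the 20 residues (20 features)."""
    seq = _standard_residues(sequence)
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def dpc(sequence: str) -> dict[str, float]:
    """Dipeptide composition: fraction of each adjacent residue pair (400 features)."""
    seq = _standard_residues(sequence)
    if len(seq) < 2:
        raise ValueError("need at least two standard residues")
    total = len(seq) - 1  # number of overlapping adjacent pairs
    pairs = Counter(seq[i:i + 2] for i in range(total))
    return {a + b: pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)}


if __name__ == "__main__":
    toy = "AAGG"  # toy sequence, not real data
    print(aac(toy)["A"], dpc(toy)["AG"])
```

The resulting 20-dimensional (AAC), 400-dimensional (DPC), or concatenated 420-dimensional vectors are the kind of fixed-length input that a classifier such as a support vector machine can take.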
Affiliation(s)
- Ailan Huang
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, China
- Fuping Lu
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, China; Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, China
- Fufeng Liu
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, China; Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, China; Correspondence: Fufeng Liu
28
Choi G, Kim W, Koo J. Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants. BIOTECHNOL BIOPROC E 2023. [DOI: 10.1007/s12257-022-0330-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2023]
29
Tellechea-Luzardo J, Stiebritz MT, Carbonell P. Transcription factor-based biosensors for screening and dynamic regulation. Front Bioeng Biotechnol 2023; 11:1118702. [PMID: 36814719 PMCID: PMC9939652 DOI: 10.3389/fbioe.2023.1118702] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 12/07/2022] [Accepted: 01/26/2023] [Indexed: 02/09/2023] Open
Abstract
Advances in synthetic biology and genetic engineering are bringing into the spotlight a wide range of bio-based applications that demand better sensing and control of biological behaviours. Transcription factor (TF)-based biosensors are promising tools that can be used to detect several types of chemical compounds and elicit a response according to the desired application. However, the wider use of this type of device is still hindered by several challenges, which can be addressed by increasing the current metabolite-activated transcription factor knowledge base, developing better methods to identify new transcription factors, and improving the overall workflow for the design of novel biosensor circuits. These improvements are particularly important in the bioproduction field, where researchers need better biosensor-based approaches for screening production-strains and precise dynamic regulation strategies. In this work, we summarize what is currently known about transcription factor-based biosensors, discuss recent experimental and computational approaches targeted at their modification and improvement, and suggest possible future research directions based on two applications: bioproduction screening and dynamic regulation of genetic circuits.
Affiliation(s)
- Jonathan Tellechea-Luzardo
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
- Martin T. Stiebritz
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
- Pablo Carbonell
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
- Institute for Integrative Systems Biology I2SysBio, Universitat de València-CSIC, Paterna, Spain
30
Lee J, Campillo B, Hamidian S, Liu Z, Shorey M, St-Pierre F. Automating the High-Throughput Screening of Protein-Based Optical Indicators and Actuators. Biochemistry 2023; 62:169-177. [PMID: 36315460 PMCID: PMC9852035 DOI: 10.1021/acs.biochem.2c00357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023]
Abstract
Over the last 25 years, protein engineers have developed an impressive collection of optical tools to interface with biological systems: indicators to eavesdrop on cellular activity and actuators to poke and prod native processes. To reach the performance level required for their downstream applications, protein-based tools are usually sculpted by iterative rounds of mutagenesis. In each round, libraries of variants are made and evaluated, and the most promising hits are then retrieved, sequenced, and further characterized. Early efforts to engineer protein-based optical tools were largely manual, suffering from low throughput, human error, and tedium. Here, we describe approaches to automating the screening of libraries generated as colonies on agar, multiwell plates, and pooled populations of single-cell variants. We also briefly discuss emerging approaches for screening, including cell-free systems and machine learning.
Affiliation(s)
- Jihwan Lee
- Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Beatriz Campillo
- Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Shaminta Hamidian
- Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Zhuohe Liu
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Matthew Shorey
- Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- François St-Pierre
- Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Systems, Synthetic, and Physical Biology Program, Rice University, Houston, TX 77005, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, USA
31
Paik I, Ngo PHT, Shroff R, Diaz DJ, Maranhao AC, Walker DJ, Bhadra S, Ellington AD. Improved Bst DNA Polymerase Variants Derived via a Machine Learning Approach. Biochemistry 2023; 62:410-418. [PMID: 34762799 PMCID: PMC9514386 DOI: 10.1021/acs.biochem.1c00451] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 02/02/2023]
Abstract
The DNA polymerase I from Geobacillus stearothermophilus (also known as Bst DNAP) is widely used in isothermal amplification reactions, where its strand displacement ability is prized. More robust versions of this enzyme should be enabled for diagnostic applications, especially for carrying out higher temperature reactions that might proceed more quickly. To this end, we appended a short fusion domain from the actin-binding protein villin that improved both stability and purification of the enzyme. In parallel, we have developed a machine learning algorithm that assesses the relative fit of individual amino acids to their chemical microenvironments at any position in a protein and applied this algorithm to predict sequence substitutions in Bst DNAP. The top predicted variants had greatly improved thermotolerance (heating prior to assay), and upon combination, the mutations showed additive thermostability, with denaturation temperatures up to 2.5 °C higher than the parental enzyme. The increased thermostability of the enzyme allowed faster loop-mediated isothermal amplification assays to be carried out at 73 °C, where both Bst DNAP and its improved commercial counterpart Bst 2.0 are inactivated. Overall, this is one of the first examples of the application of machine learning approaches to the thermostabilization of an enzyme.
Affiliation(s)
- Inyup Paik
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
- Phuoc H. T. Ngo
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology and Department of Chemistry, College of Natural Sciences, The University of Texas at Austin, Austin, Texas 78712, United States
- Raghav Shroff
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States; CCDC Army Research Lab-South, Austin, Texas 78712, United States
- Daniel J. Diaz
- Center for Systems and Synthetic Biology and Department of Chemistry, College of Natural Sciences, The University of Texas at Austin, Austin, Texas 78712, United States
- Andre C. Maranhao
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
- David J.F. Walker
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
- Sanchita Bhadra
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
- Andrew D. Ellington
- Department of Molecular Biosciences, College of Natural Sciences, the University of Texas at Austin, Austin, Texas 78712, United States; Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712, United States
32
Ito T, Nguyen TD, Saito Y, Kurumida Y, Nakazawa H, Kawada S, Nishi H, Tsuda K, Kameda T, Umetsu M. Selection of target-binding proteins from the information of weakly enriched phage display libraries by deep sequencing and machine learning. MAbs 2023; 15:2168470. [PMID: 36683172 PMCID: PMC9872955 DOI: 10.1080/19420862.2023.2168470] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/24/2023] Open
Abstract
Despite the advances in surface-display systems for directed evolution, variants with high affinity are not always enriched due to undesirable biases that increase target-unrelated variants during biopanning. Here, our goal was to design a library containing improved variants from the information of the "weakly enriched" library where functional variants were weakly enriched. Deep sequencing for the previous biopanning result, where no functional antibody mimetics were experimentally identified, revealed that weak enrichment was partly due to undesirable biases during phage infection and amplification steps. The clustering analysis of the deep sequencing data from appropriate steps revealed no distinct sequence patterns, but a Bayesian machine learning model trained with the selected deep sequencing data supplied nine clusters with distinct sequence patterns. Phage libraries were designed on the basis of the sequence patterns identified, and four improved variants with target-specific affinity (EC50 = 80-277 nM) were identified by biopanning. The selection and use of deep sequencing data without undesirable bias enabled us to extract the information on prospective variants. In summary, the use of appropriate deep sequencing data and machine learning with the sequence data has the possibility of finding sequence space where functional variants are enriched.
Affiliation(s)
- Tomoyuki Ito
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, Sendai, Japan
- Thuy Duong Nguyen
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan; AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan; Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
- Yoichi Kurumida
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, Sendai, Japan
- Sakiya Kawada
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, Sendai, Japan
- Hafumi Nishi
- Department of Applied Information Sciences, Graduate School of Information Sciences, Tohoku University, Sendai, Japan; Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Faculty of Core Research, Ochanomizu University, Tokyo, Japan
- Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan; Research and Services Division of Materials Data and Integrated Systems, National Institute for Materials Science, Tsukuba, Japan
- Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan; Contact: Tomoshi Kameda, Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, Sendai, Japan; Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
33
Kakuzaki T, Koga H, Takizawa S, Metsugi S, Shiraiwa H, Sampei Z, Yoshida K, Tsunoda H, Teramoto R. Monte Carlo Thompson sampling-guided design for antibody engineering. MAbs 2023; 15:2244214. [PMID: 37605371 PMCID: PMC10446805 DOI: 10.1080/19420862.2023.2244214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 07/27/2023] [Accepted: 07/29/2023] [Indexed: 08/23/2023] Open
Abstract
Antibodies are one of the predominant treatment modalities for various diseases. To improve the characteristics of a lead antibody, such as antigen-binding affinity and stability, we conducted comprehensive substitutions and exhaustively explored their sequence space. However, it is practically unfeasible to evaluate all possible combinations of mutations owing to combinatorial explosion when multiple amino acid residues are incorporated. It was recently reported that a machine-learning guided protein engineering approach such as Thompson sampling (TS) has been used to efficiently explore sequence space in the framework of Bayesian optimization. For TS, over-exploration occurs when the initial data are biasedly distributed in the vicinity of the lead antibody. We handle a large-scale virtual library that includes numerous mutations. When the number of experiments is limited, this over-exploration causes a serious issue. Thus, we conducted Monte Carlo Thompson sampling (MTS) to balance the exploration-exploitation trade-off by defining the posterior distribution via the Monte Carlo method and compared its performance with TS in antibody engineering. Our results demonstrated that MTS largely outperforms TS in discovering desirable candidates at an earlier round when over-exploration occurs on TS. Thus, the MTS method is a powerful technique for efficiently discovering antibodies with desired characteristics when the number of rounds is limited.
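For readers unfamiliar with the exploration-exploitation trade-off mentioned in this abstract, here is a minimal sketch of plain Thompson sampling on a toy Beta-Bernoulli problem. It is not the Monte Carlo Thompson sampling (MTS) method of the paper, and it is not an antibody model: the function name `thompson_sampling`, the "candidates" and their success probabilities are all hypothetical and chosen purely for illustration.

```python
import random


def thompson_sampling(success_probs, rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return how often each candidate was tried.

    success_probs: hidden probability that testing a candidate "succeeds"
                   (hypothetical toy values, unknown to the sampler).
    rounds:        number of sequential experiments.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    wins = [0] * n    # observed successes per candidate
    losses = [0] * n  # observed failures per candidate
    pulls = [0] * n   # how many times each candidate was chosen

    for _ in range(rounds):
        # Draw one plausible success rate per candidate from its Beta posterior
        # (uniform Beta(1, 1) prior), then test the candidate with the best draw.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n)]
        chosen = max(range(n), key=draws.__getitem__)

        # Simulated experiment: succeed with the candidate's hidden probability.
        if rng.random() < success_probs[chosen]:
            wins[chosen] += 1
        else:
            losses[chosen] += 1
        pulls[chosen] += 1

    return pulls


if __name__ == "__main__":
    # Three toy candidates; the third is the best one.
    print(thompson_sampling([0.2, 0.5, 0.8], rounds=2000))
```

Candidates with uncertain posteriors still get sampled occasionally (exploration), while those with consistently good outcomes are chosen more and more often (exploitation).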
Affiliation(s)
- Taro Kakuzaki
- Research Division, Chugai Pharmaceutical Co., Ltd., Yokohama, Japan
- Hikaru Koga
- Research Division, Chugai Pharmaceutical Co., Ltd., Yokohama, Japan
- Shuuki Takizawa
- Research Division, Chugai Pharmaceutical Co., Ltd., Yokohama, Japan
- Shoichi Metsugi
- Research Division, Chugai Pharmaceutical Co., Ltd., Yokohama, Japan
- Zenjiro Sampei
- Research Division, Chugai Pharmaceutical Co., Ltd., Yokohama, Japan
- Kenji Yoshida
- Research Division, Chugai Pharmaceutical Co., Ltd., Yokohama, Japan
- Hiroyuki Tsunoda
- Research Division, Chugai Pharmaceutical Co., Ltd., Yokohama, Japan
- Reiji Teramoto
- Research Division, Chugai Pharmaceutical Co., Ltd., Yokohama, Japan
34
Abstract
This chapter outlines the myriad applications of machine learning (ML) in synthetic biology, specifically in engineering cell and protein activity, and metabolic pathways. Though by no means comprehensive, the chapter highlights several prominent computational tools applied in the field and their potential use cases. The examples detailed reinforce how ML algorithms can enhance synthetic biology research by providing data-driven insights into the behavior of living systems, even without detailed knowledge of their underlying mechanisms. By doing so, ML promises to increase the efficiency of research projects by modeling hypotheses in silico that can then be tested through experiments. While challenges related to training dataset generation and computational costs remain, ongoing improvements in ML tools are paving the way for smarter and more streamlined synthetic biology workflows that can be readily employed to address grand challenges across manufacturing, medicine, engineering, agriculture, and beyond.
Affiliation(s)
- Brendan Fu-Long Sieow
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- NUS Graduate School for Integrative Sciences and Engineering Programme, National University of Singapore, Singapore, Singapore
| | - Ryan De Sotto
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Zhi Ren Darren Seet
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - In Young Hwang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Matthew Wook Chang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore.
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
| |
|
35
|
Sugiki S, Niide T, Toya Y, Shimizu H. Logistic Regression-Guided Identification of Cofactor Specificity-Contributing Residues in Enzyme with Sequence Datasets Partitioned by Catalytic Properties. ACS Synth Biol 2022; 11:3973-3985. [PMID: 36321539 PMCID: PMC9764414 DOI: 10.1021/acssynbio.2c00315] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]
Abstract
Changing the substrate/cofactor specificity of an enzyme requires multiple mutations at spatially adjacent positions around the substrate pocket. However, this is challenging when solely based on crystal structure information because enzymes undergo dynamic conformational changes during the reaction process. Herein, we propose a method for estimating the contribution of each amino acid residue to substrate specificity by deploying a phylogenetic analysis with logistic regression. Since this method ranks the candidate amino acids for mutation, it is easy to interpret and can be used in protein engineering. We demonstrated our concept using redox cofactor conversion of the Escherichia coli malic enzyme, whose crystal structure has not yet been elucidated, as a model. Using logistic regression with amino acid sequences classified by cofactor specificity, we completely switched the cofactor specificity of the NADP+-dependent malic enzyme to NAD+ dependence without the need for a practical screening step. The model showed that surrounding residues made a greater contribution to cofactor specificity than those in the interior of the substrate pocket. These residues might be difficult to identify from crystal structure observations. We show that a highly accurate and inferential machine learning model was obtained using amino acid sequences of structurally homologous and functionally distinct enzymes as input data.
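The ranking idea described above can be sketched as follows: fit a logistic regression on one-hot encoded, aligned sequences labeled by cofactor class, then rank alignment positions by the total magnitude of their coefficients. This is a minimal sketch under stated assumptions, not the authors' pipeline. The four-residue toy sequences, the labels, and all function names are hypothetical, and the phylogenetic analysis step is omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode an aligned sequence as 20 indicator features per position."""
    return [1.0 if residue == aa else 0.0 for residue in seq for aa in AMINO_ACIDS]


def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent for L2-regularized logistic regression."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_positions(w, seq_len):
    """Order positions by the summed absolute weight of their 20 features."""
    scores = [(sum(abs(v) for v in w[p * 20:(p + 1) * 20]), p) for p in range(seq_len)]
    return [p for _, p in sorted(scores, reverse=True)]


# Hypothetical toy alignment: label 1 = NADP+-dependent, 0 = NAD+-dependent.
# Only position 2 (D vs S) separates the two classes perfectly.
seqs = ["AKDG", "AKDS", "GKDG", "ARDG", "AKSG", "GKSS", "ARSG", "AKSS"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights, bias = fit_logistic([one_hot(s) for s in seqs], labels)
ranking = rank_positions(weights, seq_len=4)
```

In this toy alignment the class-determining position is ranked first. A real analysis would start from a curated multiple sequence alignment and would need to handle gaps and correlated positions.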
|
36
|
Engelhart E, Emerson R, Shing L, Lennartz C, Guion D, Kelley M, Lin C, Lopez R, Younger D, Walsh ME. A dataset comprised of binding interactions for 104,972 antibodies against a SARS-CoV-2 peptide. Sci Data 2022; 9:653. [PMID: 36289234 PMCID: PMC9606274 DOI: 10.1038/s41597-022-01779-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/25/2021] [Accepted: 10/14/2022] [Indexed: 11/21/2022] Open
Abstract
The dataset presented here contains quantitative binding scores of scFv-format antibodies against a SARS-CoV-2 target peptide collected via an AlphaSeq assay that can be used in the development and benchmarking of machine learning models. Starting from three seed sequences identified from a phage display campaign using a human naïve library, four sets of 29,900 antibodies were designed in silico by creating all k = 1 mutations and random k = 2 and k = 3 mutations throughout the complementarity-determining regions (CDRs). Of the 119,600 designs, 104,972 were successfully built into the AlphaSeq library, and target binding was subsequently measured, with 71,384 designs resulting in a predicted affinity value for at least one of the triplicate measurements. Data include antibodies with predicted affinity measurements ranging from 37 pM to 22 mM. To our knowledge, this dataset is the largest publicly available dataset that contains antibody sequences, antigen sequence, and quantitative measurements of binding scores, and it provides an opportunity to serve as a benchmark to evaluate antibody-specific representation models for machine learning.
| Measurement(s) | Antibody Binding |
| Technology Type(s) | AlphaSeq |
| Factor Type(s) | Antibody sequence |
| Sample Characteristic - Organism | Homo sapiens |
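The library design described in this abstract (all k = 1 substitutions plus random k = 2 and k = 3 substitutions within selected regions) can be sketched as below. This is only an illustration: the seed sequence, the position list standing in for CDR positions, and the function names are hypothetical, and the published design rules may differ in detail.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(seq, positions):
    """All k = 1 variants: every alternative amino acid at every listed position."""
    variants = []
    for pos in positions:
        for aa in AMINO_ACIDS:
            if aa != seq[pos]:
                variants.append(seq[:pos] + aa + seq[pos + 1:])
    return variants


def random_mutants(seq, positions, k, n, rng):
    """n distinct random variants with exactly k substitutions each.

    Assumes n is well below the number of possible k-fold variants.
    """
    variants = set()
    while len(variants) < n:
        mutated = list(seq)
        for pos in rng.sample(positions, k):
            mutated[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
        variants.add("".join(mutated))
    return sorted(variants)
```

Each mutable position contributes 19 single mutants, so six positions yield 114 variants. The abstract's total follows the same kind of counting: four sets of 29,900 designs give 119,600.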
Affiliation(s)
| | | | - Leslie Shing
- Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA (GRID: grid.504876.8; ISNI: 0000 0001 0684 1626)
| | - Chelsea Lennartz
- Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA (GRID: grid.504876.8; ISNI: 0000 0001 0684 1626)
| | | | | | | | | | | | - Matthew E. Walsh
- Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA (GRID: grid.504876.8; ISNI: 0000 0001 0684 1626); Present Address: Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA (GRID: grid.21107.35; ISNI: 0000 0001 2171 9311)
| |
|
37
|
Qiu Y, Wei GW. CLADE 2.0: Evolution-Driven Cluster Learning-Assisted Directed Evolution. J Chem Inf Model 2022; 62:4629-4641. [PMID: 36154171 DOI: 10.1021/acs.jcim.2c01046] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
Directed evolution, a revolutionary biotechnology in protein engineering, optimizes protein fitness by searching an astronomical mutational space via expensive experiments. Cluster learning-assisted directed evolution (CLADE) efficiently explores the mutational space via a combination of unsupervised hierarchical clustering and supervised learning. However, the initial-stage sampling in CLADE treats all clusters equally despite many clusters containing a large portion of non-functional mutations. Recent statistical and deep learning tools enable evolutionary density modeling to assess protein fitness in an unsupervised manner. In this work, we construct an ensemble of multiple evolutionary scores to guide the initial sampling in CLADE. The resulting evolutionary score-enhanced CLADE, called CLADE 2.0, efficiently selects a training set within a small informative space using the evolution-driven clustering sampling. CLADE 2.0 is validated by using two benchmark libraries, each having 160,000 sequences from four-site mutational combinations. Extensive computational experiments and comparisons with existing cutting-edge methods indicate that CLADE 2.0 is a new state-of-the-art tool for machine learning-assisted directed evolution.
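A loose sketch of score-guided initial sampling is given below: clusters with higher mean evolutionary score are drawn from more often than clusters dominated by low-scoring variants. It is not the CLADE 2.0 algorithm. The softmax-style cluster weighting, the input dictionaries, and the function name are assumptions made purely for illustration.

```python
import math
import random
from collections import defaultdict


def cluster_weighted_sample(cluster_of, evo_score, n_samples, rng):
    """Pick n_samples distinct sequences, favoring clusters with high mean score.

    cluster_of maps sequence -> cluster id; evo_score maps sequence -> score.
    """
    if n_samples > len(cluster_of):
        raise ValueError("n_samples exceeds the number of sequences")
    members = defaultdict(list)
    for seq, cluster in cluster_of.items():
        members[cluster].append(seq)
    clusters = sorted(members)
    # Hypothetical weighting: exponential of each cluster's mean score.
    weights = [
        math.exp(sum(evo_score[s] for s in members[c]) / len(members[c]))
        for c in clusters
    ]
    picked = set()
    while len(picked) < n_samples:
        cluster = rng.choices(clusters, weights=weights, k=1)[0]
        picked.add(rng.choice(members[cluster]))
    return sorted(picked)
```

The library size quoted in the abstract is consistent with four fully randomized sites: 20^4 = 160,000 sequences.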
Affiliation(s)
- Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
38
|
Koebke KJ, Pinter TBJ, Pitts WC, Pecoraro VL. Catalysis and Electron Transfer in De Novo Designed Metalloproteins. Chem Rev 2022; 122:12046-12109. [PMID: 35763791 PMCID: PMC10735231 DOI: 10.1021/acs.chemrev.1c01025] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 11/28/2022]
Abstract
One of the hallmark advances in our understanding of metalloprotein function is showcased in our ability to design new, non-native, catalytically active protein scaffolds. This review highlights progress and milestone achievements in the field of de novo metalloprotein design focused on reports from the past decade with special emphasis on de novo designs couched within common subfields of bioinorganic study: heme binding proteins, monometal- and dimetal-containing catalytic sites, and metal-containing electron transfer sites. Within each subfield, we highlight several of what we have identified as significant and important contributions to either our understanding of that subfield or de novo metalloprotein design as a discipline. These reports are placed in context both historically and scientifically. General suggestions for future directions that we feel will be important to advance our understanding or accelerate discovery are discussed.
Affiliation(s)
- Karl J. Koebke
- Department of Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| | | | - Winston C. Pitts
- Department of Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| | | |
|
39
|
Huang JJ, Hu HX, Lu YJ, Bao YD, Zhou JL, Huang M. Computer-Aided Design of α-L-Rhamnosidase to Increase the Synthesis Efficiency of Icariside I. Front Bioeng Biotechnol 2022; 10:926829. [PMID: 35800333 PMCID: PMC9253678 DOI: 10.3389/fbioe.2022.926829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Accepted: 05/24/2022] [Indexed: 11/13/2022] Open
Abstract
Icariside I, the glycosylation product of icaritin, is a novel effective anti-cancer agent with immunological anti-tumor activity. However, the very limited natural content of icariside I hinders its direct extraction from plants. Therefore, we employed a computer-aided protein design strategy to improve the catalytic efficiency and substrate specificity of the α-L-rhamnosidase from Thermotoga petrophila DSM 13995, to provide a highly efficient preparation method. Several beneficial mutants were obtained by expanding the active cavity. The catalytic efficiencies of all mutants were improved 16-200-fold compared with the wild-type TpeRha. The double-point mutant DH was the best mutant and showed the highest catalytic efficiency (kcat/KM: 193.52 s⁻¹ M⁻¹) against icariin, which was a 209.76-fold increase compared with the wild-type TpeRha. In addition, the single-point mutant H570A showed higher substrate specificity than the wild-type TpeRha in the hydrolysis of different substrates. This study provides enzyme design strategies and principles for the hydrolysis of rhamnosyl natural products.
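The two figures quoted for mutant DH imply a wild-type catalytic efficiency that the abstract does not state. The short calculation below derives it from the reported values only; the function name is hypothetical and the result is a back-calculation, not a measured number from the paper.

```python
def wild_type_efficiency(mutant_efficiency, fold_increase):
    """Back-calculate wild-type kcat/KM from a mutant value and its fold increase."""
    return mutant_efficiency / fold_increase


# Reported: kcat/KM = 193.52 s^-1 M^-1 for mutant DH, a 209.76-fold increase.
implied_wild_type = wild_type_efficiency(193.52, 209.76)  # about 0.92 s^-1 M^-1
```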
Affiliation(s)
- Jia-Jun Huang
- School of Food Science and Engineering, South China University of Technology, Guangzhou, China
- Golden Health Biotechnology Co., Ltd., Foshan, China
| | - Hao-Xuan Hu
- Golden Health Biotechnology Co., Ltd., Foshan, China
| | - Yu-Jing Lu
- Golden Health Biotechnology Co., Ltd., Foshan, China
- School of Chemical Engineering and Light Industry, School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Guangzhou, China
| | - Ya-Dan Bao
- Golden Health Biotechnology Co., Ltd., Foshan, China
| | - Jin-Lin Zhou
- Golden Health Biotechnology Co., Ltd., Foshan, China
| | - Mingtao Huang
- School of Food Science and Engineering, South China University of Technology, Guangzhou, China
| |
|
40
|
Wehler P, Armbruster D, Günter A, Schleicher E, Di Ventura B, Öztürk MA. Experimental Characterization of In Silico Red-Shift-Predicted iLOV L470T/Q489K and iLOV V392K/F410V/A426S Mutants. ACS Omega 2022; 7:19555-19560. [PMID: 35722011 PMCID: PMC9202016 DOI: 10.1021/acsomega.2c01283] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/03/2022] [Accepted: 04/28/2022] [Indexed: 06/15/2023]
Abstract
iLOV is a flavin mononucleotide-binding fluorescent protein used for in vivo cellular imaging similar to the green fluorescent protein. To expand the range of applications of iLOV, spectrally tuned red-shifted variants are desirable to reduce phototoxicity and allow for better tissue penetration. In this report, we experimentally tested two iLOV mutants, iLOV L470T/Q489K and iLOV V392K/F410V/A426S, which were previously computationally proposed (Khrenova, J. Phys. Chem. B 2017, 121 (43), pp 10018-10025) to have red-shifted excitation and emission spectra. While iLOV L470T/Q489K is about 20% brighter compared to the WT in vitro, it exhibits a blue shift in contrast to quantum mechanics/molecular mechanics (QM/MM) predictions. Additional optical characterization of an iLOV V392K mutant revealed that V392 is essential for cofactor binding and, accordingly, variants with the V392K mutation are unable to bind to FMN. iLOV L470T/Q489K and iLOV V392K/F410V/A426S are expressed at low levels and have no detectable fluorescence in living cells, preventing their utilization in imaging applications.
Affiliation(s)
- Pierre Wehler
- Institute of Biology II, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
- Centers for Biological Signalling Studies BIOSS and CIBSS, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
| | - Daniel Armbruster
- Institute of Biology II, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
- Centers for Biological Signalling Studies BIOSS and CIBSS, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
| | - Andreas Günter
- Institute of Physical Chemistry, University of Freiburg, Albertstr. 21, 79104 Freiburg, Germany
| | - Erik Schleicher
- Institute of Physical Chemistry, University of Freiburg, Albertstr. 21, 79104 Freiburg, Germany
| | - Barbara Di Ventura
- Institute of Biology II, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
- Centers for Biological Signalling Studies BIOSS and CIBSS, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
| | - Mehmet Ali Öztürk
- Institute of Biology II, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
- Centers for Biological Signalling Studies BIOSS and CIBSS, University of Freiburg, Schänzlestraße 1, 79104 Freiburg, Germany
| |
|
41
|
Lin CY, Romei MG, Mathews II, Boxer SG. Energetic Basis and Design of Enzyme Function Demonstrated Using GFP, an Excited-State Enzyme. J Am Chem Soc 2022; 144:3968-3978. [PMID: 35200017 PMCID: PMC9014791 DOI: 10.1021/jacs.1c12305] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
Abstract
The past decades have witnessed an explosion of de novo protein designs with a remarkable range of scaffolds. It remains challenging, however, to design catalytic functions that are competitive with naturally occurring counterparts as well as biomimetic or nonbiological catalysts. Although directed evolution often offers efficient solutions, the fitness landscape remains opaque. Green fluorescent protein (GFP), which has revolutionized biological imaging and assays, is one of the most redesigned proteins. While not an enzyme in the conventional sense, GFPs feature competing excited-state decay pathways with the same steric and electrostatic origins as conventional ground-state catalysts, and they exert exquisite control over multiple reaction outcomes through the same principles. Thus, GFP is an "excited-state enzyme". Herein we show that rationally designed mutants and hybrids that contain environmental mutations and substituted chromophores provide the basis for a quantitative model and prediction that describes the influence of sterics and electrostatics on excited-state catalysis of GFPs. As both perturbations can selectively bias photoisomerization pathways, GFPs with fluorescence quantum yields (FQYs) and photoswitching characteristics tailored for specific applications could be predicted and then demonstrated. The underlying energetic landscape, readily accessible via spectroscopy for GFPs, offers an important missing link in the design of protein function that is generalizable to catalyst design.
Affiliation(s)
- Chi-Yun Lin
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| | - Matthew G Romei
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| | - Irimpan I Mathews
- Stanford Synchrotron Radiation Lightsource, 2575 Sand Hill Road, Menlo Park, California 94025, United States
| | - Steven G Boxer
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
| |
|
42
|
Raven SA, Payne B, Bruce M, Filipovska A, Rackham O. In silico evolution of nucleic acid-binding proteins from a nonfunctional scaffold. Nat Chem Biol 2022; 18:403-411. [PMID: 35210620 DOI: 10.1038/s41589-022-00967-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Accepted: 01/04/2022] [Indexed: 11/09/2022]
Abstract
Directed evolution emulates the process of natural selection to produce proteins with improved or altered functions. These approaches have proven to be very powerful but are technically challenging and particularly time and resource intensive. To bypass these limitations, we constructed a system to perform the entire process of directed evolution in silico. We employed iterative computational cycles of mutation and evaluation to predict mutations that confer high-affinity binding activities for DNA and RNA to an initial de novo designed protein with no inherent function. Beneficial mutations revealed modes of nucleic acid recognition not previously observed in natural proteins, highlighting the ability of computational directed evolution to access new molecular functions. Furthermore, the process by which new functions were obtained closely resembles natural evolution and can provide insights into the contributions of mutation rate, population size and selective pressure on functionalization of macromolecules in nature.
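The iterative mutation-and-evaluation cycle described above can be sketched as a simple evolutionary loop. This is an illustrative toy, not the authors' system: the fitness function (a count of lysine and arginine residues as a crude, hypothetical proxy for nucleic acid affinity), the parameter names, and the selection scheme are all assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rate, rng):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def toy_fitness(seq):
    """Hypothetical score: number of positively charged residues (K, R)."""
    return sum(seq.count(aa) for aa in "KR")


def evolve(start, fitness, generations, pop_size, rate, keep, seed=0):
    """Run cycles of mutation and selection; parents compete with offspring."""
    rng = random.Random(seed)
    population = [start]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rate, rng) for _ in range(pop_size)]
        # Sort alphabetically first so ties in fitness are broken deterministically.
        pool = sorted(set(population) | set(offspring))
        population = sorted(pool, key=fitness, reverse=True)[:keep]
    return population[0]
```

Because parents are retained in each selection round, the best fitness in the population never decreases. The `rate`, `pop_size`, and `keep` parameters correspond loosely to the mutation rate, population size, and selective pressure mentioned in the abstract.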
Affiliation(s)
- Samuel A Raven
- Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia; University of Western Australia Centre for Medical Research, Nedlands, Western Australia, Australia
| | - Blake Payne
- Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia; University of Western Australia Centre for Medical Research, Nedlands, Western Australia, Australia
| | - Mitchell Bruce
- Curtin Medical School, Curtin University, Bentley, Western Australia, Australia
| | - Aleksandra Filipovska
- Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia; University of Western Australia Centre for Medical Research, Nedlands, Western Australia, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Western Australia, Australia; Telethon Kids Institute, Northern Entrance, Perth Children's Hospital, Nedlands, Western Australia, Australia
| | - Oliver Rackham
- Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia; Curtin Medical School, Curtin University, Bentley, Western Australia, Australia; Telethon Kids Institute, Northern Entrance, Perth Children's Hospital, Nedlands, Western Australia, Australia; Curtin Health Innovation Research Institute, Curtin University, Bentley, Western Australia, Australia
| |
|
43
|
Büchler J, Malca SH, Patsch D, Voss M, Turner NJ, Bornscheuer UT, Allemann O, Le Chapelain C, Lumbroso A, Loiseleur O, Buller R. Algorithm-aided engineering of aliphatic halogenase WelO5* for the asymmetric late-stage functionalization of soraphens. Nat Commun 2022; 13:371. [PMID: 35042883 PMCID: PMC8766452 DOI: 10.1038/s41467-022-27999-1] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 08/10/2021] [Accepted: 12/17/2021] [Indexed: 02/08/2023] Open
Abstract
Late-stage functionalization of natural products offers an elegant route to create novel entities in a relevant biological target space. In this context, enzymes capable of halogenating sp3 carbons with high stereo- and regiocontrol under benign conditions have attracted particular attention. Enabled by a combination of smart library design and machine learning, we engineer the iron/α-ketoglutarate-dependent halogenase WelO5* for the late-stage functionalization of soraphen A and C, complex macrolides that are chemically difficult to derivatize and are potent anti-fungal agents. While the wild-type enzyme WelO5* does not accept the macrolide substrates, our engineering strategy leads to active halogenase variants and improves upon their apparent kcat and total turnover number by more than 90-fold and 300-fold, respectively. Notably, our machine-learning-guided engineering approach is capable of predicting more active variants and allows us to switch the regio-selectivity of the halogenases, facilitating the targeted analysis of the derivatized macrolides’ structure-function activity in biological assays.
The late-stage functionalization of unactivated carbon–hydrogen bonds is a difficult but important task, which has been met with promising but limited success through synthetic organic chemistry. Here the authors use machine learning to engineer WelO5* halogenase variants, which led to regioselective chlorination of inert C–H bonds on a representative polyketide that is a non-natural substrate for the enzyme.
Affiliation(s)
- Johannes Büchler
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland; School of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, Manchester, M1 7DN, United Kingdom
| | - Sumire Honda Malca
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland
| | - David Patsch
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland; Institute of Biochemistry, Dept. of Biotechnology & Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Strasse 4, 17487, Greifswald, Germany
| | - Moritz Voss
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland
| | - Nicholas J Turner
- School of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, Manchester, M1 7DN, United Kingdom
| | - Uwe T Bornscheuer
- Institute of Biochemistry, Dept. of Biotechnology & Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Strasse 4, 17487, Greifswald, Germany
| | - Oliver Allemann
- Syngenta Crop Protection AG, Schaffhauserstrasse 101, 4332, Stein, Switzerland; Idorsia Pharmaceuticals Ltd, Hegenheimermattweg 91, 4123, Allschwil, Switzerland
| | | | - Alexandre Lumbroso
- Syngenta Crop Protection AG, Schaffhauserstrasse 101, 4332, Stein, Switzerland
| | - Olivier Loiseleur
- Syngenta Crop Protection AG, Schaffhauserstrasse 101, 4332, Stein, Switzerland.
| | - Rebecca Buller
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland.
| |
|
44
|
Chee WKD, Yeoh JW, Dao VL, Poh CL. Thermogenetics: Applications come of age. Biotechnol Adv 2022; 55:107907. [PMID: 35041863 DOI: 10.1016/j.biotechadv.2022.107907] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/12/2021] [Revised: 12/13/2021] [Accepted: 01/09/2022] [Indexed: 12/20/2022]
Abstract
Temperature is a ubiquitous physical cue that is non-invasive, penetrative and easy to apply. In the growing field of thermogenetics, through beneficial repurposing of natural thermosensing mechanisms, synthetic biology is bringing new opportunities to design and build robust temperature-sensitive (TS) sensors, which form a thermogenetic toolbox of well-characterised biological parts. Recent advancements in the available technological platforms have expedited the discovery of novel or de novo thermosensors, which are increasingly deployed in many practical temperature-dependent biomedical, industrial and biosafety applications. In all, the review aims to convey both the exhilarating recent technological developments underlying the advancement of thermosensors and the exciting opportunities the nascent thermogenetic field holds for biomedical and biotechnology applications.
Affiliation(s)
- Wai Kit David Chee
- Department of Biomedical Engineering, Faculty of Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583, Singapore; NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), Life Sciences Institute, National University of Singapore, 28 Medical Drive, Singapore 117456, Singapore
| | - Jing Wui Yeoh
- Department of Biomedical Engineering, Faculty of Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583, Singapore; NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), Life Sciences Institute, National University of Singapore, 28 Medical Drive, Singapore 117456, Singapore
| | - Viet Linh Dao
- Department of Biomedical Engineering, Faculty of Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583, Singapore; NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), Life Sciences Institute, National University of Singapore, 28 Medical Drive, Singapore 117456, Singapore
| | - Chueh Loo Poh
- Department of Biomedical Engineering, Faculty of Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583, Singapore; NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), Life Sciences Institute, National University of Singapore, 28 Medical Drive, Singapore 117456, Singapore.
| |
|
45
|
Mardikoraem M, Woldring D. Machine Learning-driven Protein Library Design: A Path Toward Smarter Libraries. Methods Mol Biol 2022; 2491:87-104. [PMID: 35482186 DOI: 10.1007/978-1-0716-2285-8_5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/14/2023]
Abstract
Proteins are small yet valuable biomolecules that play a versatile role in therapeutics and diagnostics. The intricate sequence-structure-function paradigm in the realm of proteins opens the possibility for directly mapping amino acid sequence to function. However, the rugged nature of the protein fitness landscape and an astronomical number of possible mutations even for small proteins make navigating this system a daunting task. Moreover, the scarcity of functional proteins and the ease with which deleterious mutations are introduced, due to complex epistatic relationships, compound the existing challenges. This highlights the need for auxiliary tools in current techniques such as rational design and directed evolution. To that end, state-of-the-art machine learning can offer time and cost efficiency in finding high-fitness proteins, circumventing unnecessary wet-lab experiments. In the context of improving library design, machine learning provides valuable insights via its unique features such as high adaptation to complex systems, multi-tasking, parallelism, and the ability to capture hidden trends in input data. Finally, both the advancements in computational resources and the rapidly increasing number of sequences in protein databases will allow machine learning to deliver more promising and detailed insights to protein library design. In this chapter, fundamental concepts and a method for machine learning-driven library design leveraging deep sequencing datasets are discussed. We elaborate on (1) basic knowledge about machine learning algorithms, (2) the benefit of machine learning in library design, and (3) methodology for implementing machine learning in library design.
Affiliation(s)
- Mehrsa Mardikoraem
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Daniel Woldring
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, USA.
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA.
| |
|
46
|
Cadet XF, Gelly JC, van Noord A, Cadet F, Acevedo-Rocha CG. Learning Strategies in Protein Directed Evolution. Methods Mol Biol 2022; 2461:225-275. [PMID: 35727454 DOI: 10.1007/978-1-0716-2152-3_15] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Synthetic biology is a fast-evolving research field that combines biology and engineering principles to develop new biological systems for medical, pharmacological, and industrial applications. Synthetic biologists use iterative "design, build, test, and learn" cycles to efficiently engineer genetic systems that are reliable, reproducible, and predictable. Protein engineering by directed evolution can benefit from such a systematic engineering approach for various reasons. Learning can be carried out before starting, throughout, or after finalizing a directed evolution project. Computational tools, bioinformatics, and scanning mutagenesis methods can be excellent starting points, while molecular dynamics simulations and other strategies can guide engineering efforts. Similarly, studying protein intermediates along evolutionary pathways offers fascinating insights into the molecular mechanisms shaped by evolution. The learning step of the cycle is not only crucial for proteins or enzymes that are not suitable for high-throughput screening or selection systems, but it is also valuable for any platform that can generate a large amount of data whose analysis can be aided by machine learning algorithms. The main challenge in protein engineering is to predict the effect of a single mutation on one functional parameter, to say nothing of several mutations on multiple parameters. This is largely due to nonadditive mutational interactions, known as epistatic effects: beneficial mutations present in one genetic background may not be beneficial in another genetic background. In this work, we provide an overview of experimental and computational strategies that can guide the user to learn protein function at different stages in a directed evolution project. We also discuss how epistatic effects can influence the success of directed evolution projects.
Since machine learning is gaining momentum in protein engineering and the field is becoming more interdisciplinary thanks to collaboration between mathematicians, computational scientists, engineers, molecular biologists, and chemists, we provide a general workflow that familiarizes nonexperts with the basic concepts, dataset requirements, learning approaches, model capabilities and performance metrics of this intriguing area. Finally, we also provide some practical recommendations on how machine learning can harness epistatic effects for engineering proteins in an "outside-the-box" way.
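The nonadditivity described in this abstract can be made concrete with a small calculation. The following is a minimal sketch, assuming an additive null model in which the expected fitness of a double mutant is the wild-type fitness plus the two single-mutant gains; the function name `pairwise_epistasis` and all fitness values are hypothetical and are not taken from the chapter.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    Zero means the two mutations combine additively; a nonzero value
    indicates an epistatic (nonadditive) interaction.
    """
    expected_ab = f_wt + (f_a - f_wt) + (f_b - f_wt)
    return f_ab - expected_ab


# Hypothetical measurements: each single mutation improves fitness,
# but the double mutant is no better than the wild type.
f_wt, f_a, f_b, f_ab = 1.0, 1.5, 1.25, 1.0
eps = pairwise_epistasis(f_wt, f_a, f_b, f_ab)
print(f"additive expectation: {f_wt + (f_a - f_wt) + (f_b - f_wt)}, epistasis: {eps}")
```

With these illustrative numbers the additive model expects 1.75 for the double mutant, so the measured value of 1.0 corresponds to negative epistasis: two mutations that each help alone do not help together.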
Affiliation(s)
- Xavier F Cadet
  - PEACCEL, Artificial Intelligence Department, Paris, France
- Jean Christophe Gelly
  - Laboratoire d'Excellence GR-Ex, Paris, France
  - BIGR, DSIMB, UMR_S1134, INSERM, University of Paris & University of Reunion, Paris, France
- Frédéric Cadet
  - Laboratoire d'Excellence GR-Ex, Paris, France
  - BIGR, DSIMB, UMR_S1134, INSERM, University of Paris & University of Reunion, Paris, France
|
47
|
SpeedyGenesXL: an Automated, High-Throughput Platform for the Preparation of Bespoke Ultralarge Variant Libraries for Directed Evolution. Methods Mol Biol 2022; 2461:67-83. [PMID: 35727444 DOI: 10.1007/978-1-0716-2152-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Directed evolution of proteins is a highly effective strategy for tailoring biocatalysts to a particular application, and is capable of improving properties such as kcat, thermostability, and organic solvent tolerance. It is recognized that large and systematic libraries are required to navigate a protein's vast and rugged sequence landscape effectively, yet their preparation is nontrivial and commercial libraries are extremely costly. To address this, we have developed SpeedyGenesXL, an automated, high-throughput platform for the production of wild-type genes, Boolean OR, combinatorial, or combinatorial-OR-type libraries based on the SpeedyGenes methodology. Together this offers a flexible platform for library synthesis, capable of generating many different bespoke, diverse libraries simultaneously.
|
48
|
Abstract
Directed evolution, a strategy for protein engineering, optimizes protein properties (i.e., fitness) by expensive and time-consuming screening or selection of a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), that combines hierarchical unsupervised clustering sampling and supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions via supervised learning models improve final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library in five equal experimental batches, CLADE achieves global maximal fitness hit rates of up to 91.0% and 34.0% for the GB1 and PhoQ datasets, respectively, improved from 18.6% and 7.2% obtained by random-sampling-based MLDE.
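The numbers in this abstract are mutually consistent: four sites with 20 amino acids each give 20^4 = 160,000 variants, and 480 screened sequences in five equal batches is 96 per batch. The sampling idea itself can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it uses a single flat level of clusters rather than a hierarchy, omits the final supervised-learning stage, and the function name `cluster_weighted_sampling`, the toy library, the clustering rule, and the toy assay are all hypothetical. It assumes fitness values are non-negative and gives unscreened clusters an arbitrary starting weight of 1.0.

```python
import random
from itertools import product


def cluster_weighted_sampling(clusters, measure, n_batches, batch_size, seed=0):
    """Screen variants batch by batch, favoring clusters with high mean fitness.

    clusters: dict mapping cluster id -> list of variants
    measure:  callable returning a non-negative fitness for a variant
    Returns a dict mapping each screened variant to its measured fitness.
    """
    rng = random.Random(seed)
    pools = {cid: list(members) for cid, members in clusters.items()}
    scores = {cid: [] for cid in pools}
    screened = {}
    for _ in range(n_batches):
        # Refresh cluster weights once per batch from measurements so far.
        # Clusters with no data yet get weight 1.0 (an arbitrary assumption).
        weights = {
            cid: max(sum(s) / len(s), 1e-9) if s else 1.0
            for cid, s in scores.items()
        }
        for _ in range(batch_size):
            open_ids = [cid for cid in pools if pools[cid]]
            if not open_ids:  # library exhausted
                return screened
            cid = rng.choices(
                open_ids, weights=[weights[c] for c in open_ids], k=1
            )[0]
            variant = pools[cid].pop(rng.randrange(len(pools[cid])))
            fitness = measure(variant)
            screened[variant] = fitness
            scores[cid].append(fitness)
    return screened


# Toy two-site library over four residues: 4**2 = 16 variants.
library = ["".join(p) for p in product("ACDE", repeat=2)]

# Hypothetical clustering: group variants by their first residue.
clusters = {}
for v in library:
    clusters.setdefault(v[0], []).append(v)


def toy_fitness(variant):
    """Hypothetical assay: fitness is the number of 'A' residues."""
    return float(variant.count("A"))


screened = cluster_weighted_sampling(clusters, toy_fitness, n_batches=2, batch_size=4)
print(len(screened), "variants screened")
```

After the first batch, clusters whose screened members scored well are drawn from more often, which is the intuition behind concentrating the experimental budget on promising subspaces before a supervised model is trained on the screened set.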
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Jian Hu
  - Department of Chemistry, Michigan State University, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
- Guo-Wei Wei (corresponding author)
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
|
49
|
Gelman S, Fahlberg SA, Heinzelman P, Romero PA, Gitter A. Neural networks to learn protein sequence-function relationships from deep mutational scanning data. Proc Natl Acad Sci U S A 2021; 118:e2104878118. [PMID: 34815338 PMCID: PMC8640744 DOI: 10.1073/pnas.2104878118] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Accepted: 10/01/2021] [Indexed: 11/18/2022] Open
Abstract
The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
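The phrase "share parameters across sequence positions" can be illustrated with a one-dimensional convolution over a one-hot encoded sequence. This is a minimal sketch of the general idea only, not the architecture used in the paper: the helper names `one_hot` and `conv1d`, the kernel width, and the kernel weights are hypothetical choices made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in sequence]


def conv1d(encoded, kernel):
    """Slide a single kernel along the sequence.

    The same weights are applied at every window position, which is what
    "sharing parameters across sequence positions" means in practice.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        outputs.append(
            sum(
                w * x
                for kernel_row, input_row in zip(kernel, window)
                for w, x in zip(kernel_row, input_row)
            )
        )
    return outputs


# Hypothetical width-2 kernel that responds to "A" followed by "C".
kernel = [[0.0] * 20 for _ in range(2)]
kernel[0][AMINO_ACIDS.index("A")] = 1.0
kernel[1][AMINO_ACIDS.index("C")] = 1.0

activations = conv1d(one_hot("ACAC"), kernel)
print(activations)
```

Because the kernel is reused at each position, the motif is detected wherever it occurs in the sequence with only 2 x 20 weights, whereas a fully connected layer would need separate weights for every position.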
Affiliation(s)
- Sam Gelman
  - Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706
  - Morgridge Institute for Research, Madison, WI 53715
- Sarah A. Fahlberg
  - Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Pete Heinzelman
  - Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Philip A. Romero
  - Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Anthony Gitter
  - Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706
  - Morgridge Institute for Research, Madison, WI 53715
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53792
|
50
|
Saito Y, Oikawa M, Sato T, Nakazawa H, Ito T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration. ACS Catal 2021. [DOI: 10.1021/acscatal.1c03753] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023]
Affiliation(s)
- Yutaka Saito
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
  - AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
  - Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Misaki Oikawa
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Takumi Sato
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Hikaru Nakazawa
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Tomoyuki Ito
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Tomoshi Kameda
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
  - Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Koji Tsuda
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
  - Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
  - Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Mitsuo Umetsu
  - Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
  - Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
|