1. Vongsouthi V, Georgelin R, Matthews DS, Saunders J, Lee BM, Ton J, Damry AM, Frkic RL, Spence MA, Jackson CJ. Ancestral reconstruction of polyethylene terephthalate degrading cutinases reveals a rugged and unexplored sequence-fitness landscape. Sci Adv 2025; 11:eads8318. [PMID: 40367179 PMCID: PMC12077509 DOI: 10.1126/sciadv.ads8318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2024] [Accepted: 04/09/2025] [Indexed: 05/16/2025]
Abstract
Using protein engineering to generate enzymes that degrade polyethylene terephthalate (PET) is a promising route to plastic recycling, yet traditional engineering approaches often fail to explore protein sequence space well enough to find optimal enzymes. Here, we address this limitation with multiplexed ancestral sequence reconstruction (mASR), exploring the evolutionary sequence space of PET-degrading cutinases. Using 20 statistically equivalent phylogenies of the bacterial cutinase family, we generated 48 ancestral sequences that display a wide range of PETase activities, highlighting the value of mASR in uncovering functional variants. Our findings show that PETase activity can evolve through multiple pathways involving mutations remote from the active site. Moreover, analysis of the PETase fitness landscape with local ancestral sequence embedding (LASE) showed that LASE can capture sequence features linked to PETase activity. This work highlights the potential of mASR for exploring sequence space and the utility of LASE for readily mapping protein fitness landscapes.
Affiliation(s)
- Vanessa Vongsouthi
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Samsara Eco, Sydney, NSW 2065, Australia
- Rosemary Georgelin
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Samsara Eco, Sydney, NSW 2065, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Dana S. Matthews
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Samsara Eco, Sydney, NSW 2065, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Jake Saunders
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Brendon M. Lee
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Adam M. Damry
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Rebecca L. Frkic
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Matthew A. Spence
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Samsara Eco, Sydney, NSW 2065, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Colin J. Jackson
- Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- ARC Centre of Excellence for Innovations in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
2. Gao L, Yuan J, Hong K, Ma NL, Liu S, Wu X. Technological advancement spurs Komagataella phaffii as a next-generation platform for sustainable biomanufacturing. Biotechnol Adv 2025; 82:108593. [PMID: 40339766 DOI: 10.1016/j.biotechadv.2025.108593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2025] [Revised: 04/11/2025] [Accepted: 05/01/2025] [Indexed: 05/10/2025]
Abstract
Biomanufacturing stands as a cornerstone of sustainable industrial development, necessitating a shift toward non-food carbon feedstocks to alleviate competition for agricultural resources and advance a circular bioeconomy. Methanol, a renewable one-carbon substrate, has emerged as a pivotal candidate owing to its abundance, cost-effectiveness, and high reduction potential, further bolstered by breakthroughs in CO₂ hydrogenation-based synthesis. Capitalizing on this momentum, the methylotrophic yeast Komagataella phaffii has undergone transformative technological upgrades, evolving from a conventional protein expression workhorse into an intelligent bioproduction chassis. This shift is driven by converging innovations in CRISPR-based genome editing and AI-powered metabolic pathway design in K. phaffii. Integrating CRISPR systems with droplet-microfluidic high-throughput screening has markedly improved strain-engineering efficiency, achieving much higher editing precision than traditional homologous recombination while shortening the "design-build-test-learn" cycle. Concurrently, machine learning-enhanced genome-scale metabolic models facilitate dynamic flux balancing, enabling simultaneous improvements in product titers, carbon yields, and volumetric productivity. Finally, these advances broaden the applications of K. phaffii, including more efficient channeling of metabolic flux toward nutrient products and enhanced synthesis of secreted proteins. As DNA synthesis automation and robotic experimentation platforms mature, next-generation breakthroughs in genome modification, cofactor engineering, and AI-guided autonomous evolution will further establish K. phaffii as a next-generation platform for decarbonizing global manufacturing. This technological trajectory positions methanol-based biomanufacturing as a cornerstone of the low-carbon circular economy.
Affiliation(s)
- Le Gao
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, National Center of Technology Innovation for Synthetic Biology, No. 32, Xiqi Road, Tianjin Airport Economic Park, Tianjin 300308, China
- Jie Yuan
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, National Center of Technology Innovation for Synthetic Biology, No. 32, Xiqi Road, Tianjin Airport Economic Park, Tianjin 300308, China
- Kai Hong
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, National Center of Technology Innovation for Synthetic Biology, No. 32, Xiqi Road, Tianjin Airport Economic Park, Tianjin 300308, China
- Nyuk Ling Ma
- Institute of Tropical Biodiversity and Sustainable Development, University Malaysia Terengganu, Malaysia
- Shuguang Liu
- Beijing Chasing future Biotechnology Co., Ltd, Beijing, China
- Xin Wu
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, National Center of Technology Innovation for Synthetic Biology, No. 32, Xiqi Road, Tianjin Airport Economic Park, Tianjin 300308, China
3. Lu C, Fang R, Tian S, Hu M, Wang J, Ding J. Integrating protein contact networks for the engineering of thermostable lipase A. Int J Biol Macromol 2025; 306:141725. [PMID: 40044005 DOI: 10.1016/j.ijbiomac.2025.141725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2025] [Revised: 03/01/2025] [Accepted: 03/02/2025] [Indexed: 05/03/2025]
Abstract
In industrial biocatalysis, the rapid advancement of enzyme functional evolution calls for new theories and computational methods that reach target functions in fewer iterations. This study identified key residues affecting enzyme stability by constructing the protein contact network (PCN) of Lipase A. Comparing the PCNs of the wild-type (WT) enzyme and the 6B variant revealed that changes in residue interactions and node properties (e.g., degree and betweenness centrality (BC)) positively impacted stability. Using thresholds for degree and BC, 25 candidate sites were screened, and 11 of 18 single-point mutation designs improved thermal stability. Mutations were divided into three groups (M1, M2, M3) based on network communities and contributions and then combined iteratively. M1, containing five mutations distributed across four communities, increased the melting temperature (Tm) by 14.61 °C, close to the predicted 13.97 °C, demonstrating a linear additive effect. In M2, three new mutations produced a non-linear additive effect, with a ΔTm of 17.58 °C (expected ΔTm = 18.93 °C). In contrast, the three new mutations in M3 destabilized the enzyme relative to M2 (observed ΔTm = 15.94 °C vs expected ΔTm = 19.92 °C). Molecular dynamics simulations showed that polar edge nodes enhanced network connectivity, while proline mutations rigidified flexible regions, improving stability. Conversely, M3 mutations disrupted α-helix stability by increasing the dihedral angle fluctuations of residue Y161, possibly leading to a stability-activity trade-off. The PCN provides valuable insights for developing efficient and precise design strategies.
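A minimal pure-Python sketch of the screening idea follows. It assumes a contact network in which two residues are linked when their Cα atoms lie within a distance cutoff (the 8 Å default is a hypothetical choice; the abstract does not state the study's network definition), computes degree and unnormalized betweenness centrality with Brandes' algorithm, and keeps residues that pass both thresholds. The hypothetical `additivity_deviation` helper applies the M1, M2 and M3 values quoted above to show how far each combination departs from the sum of single-mutant effects.

```python
import math
from collections import deque


def contact_network(coords, cutoff=8.0):
    """Link residues i and j when their C-alpha coordinates are within `cutoff` angstroms.

    The 8.0 angstrom default is a hypothetical choice for illustration.
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        dist = {v: -1 for v in adj}  # BFS distance from s (-1 = unvisited)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}  # dependency of s on each vertex
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


def screen_sites(adj, min_degree, min_bc):
    """Return residue indices whose degree and betweenness both pass the thresholds."""
    bc = betweenness(adj)
    return sorted(v for v in adj if len(adj[v]) >= min_degree and bc[v] >= min_bc)


def additivity_deviation(observed, expected):
    """Observed minus expected (sum of single-mutant) delta-Tm, in degrees C."""
    return observed - expected


# (observed, expected) delta-Tm values quoted in the abstract for each group.
groups = {"M1": (14.61, 13.97), "M2": (17.58, 18.93), "M3": (15.94, 19.92)}
for name, (obs, exp) in groups.items():
    print(name, round(additivity_deviation(obs, exp), 2))
```

On a real structure, the coordinates would come from a parsed PDB file and the thresholds would be tuned as in the study; both steps are outside this sketch.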
Affiliation(s)
- Cheng Lu
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 214122 Wuxi, China
- Ruijie Fang
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 214122 Wuxi, China
- Siyuan Tian
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 214122 Wuxi, China
- Mingzhu Hu
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 214122 Wuxi, China
- Jianan Wang
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 214122 Wuxi, China
- Jian Ding
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, 214122 Wuxi, China
4. Gelman S, Johnson B, Freschlin C, Sharma A, D'Costa S, Peters J, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. bioRxiv 2025:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
Affiliation(s)
- Sam Gelman
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Bryce Johnson
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Arnav Sharma
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Sameer D'Costa
- Department of Biochemistry, University of Wisconsin-Madison
- John Peters
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Anthony Gitter
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Philip A Romero
- Department of Biochemistry, University of Wisconsin-Madison
- Department of Biomedical Engineering, Duke University
5. Kuang Z, Yan X, Yuan Y, Wang R, Zhu H, Wang Y, Li J, Ye J, Yue H, Yang X. Advances in stress-tolerance elements for microbial cell factories. Synth Syst Biotechnol 2024; 9:793-808. [PMID: 39072145 PMCID: PMC11277822 DOI: 10.1016/j.synbio.2024.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 06/10/2024] [Accepted: 06/27/2024] [Indexed: 07/30/2024]
Abstract
Microorganisms, particularly extremophiles, have evolved multiple adaptation mechanisms to cope with diverse stresses in unique environments. Their responses to environmental stress not only determine survival under severe conditions but also strongly affect bioproduction performance. The design of robust cell factories should therefore balance cell growth and bioproduction. Mining and redesigning stress-tolerance elements to optimize cell-factory performance under various extreme conditions is thus necessary. Here, we review several classes of stress-tolerance elements, including acid-tolerant, saline-alkali-resistant, thermotolerant, and antioxidant elements, among others, as potential materials for constructing cell factories and developing synthetic biology. Strategies for mining and redesigning stress-tolerance elements are also discussed. Finally, we present several applications of stress-tolerance elements and offer perspectives on potential strategies for screening them.
Affiliation(s)
- Zheyi Kuang
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
- Xiaofang Yan
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yanfei Yuan
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Ruiqi Wang
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
- Haifan Zhu
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
- Youyang Wang
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
- Jianfeng Li
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Jianwen Ye
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Haitao Yue
- School of Intelligence Science and Technology, Xinjiang University, Urumqi, 830017, China
- Laboratory of Synthetic Biology, School of Life Science and Technology, Xinjiang University, Urumqi, 830017, China
- Xiaofeng Yang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
6. Zhang Z, Li Z, Wang Q, Wu H, Yang M, Zhao F, Tan M, Han S. A protein fitness predictive framework based on feature combination and intelligent searching. Protein Sci 2024; 33:e5211. [PMID: 39548358 PMCID: PMC11567853 DOI: 10.1002/pro.5211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 09/14/2024] [Accepted: 10/22/2024] [Indexed: 11/17/2024]
Abstract
Machine learning (ML) builds predictive models of the relationship between protein sequences and their functions, enabling efficient identification of high-fitness sequences without becoming trapped in local optima, as directed evolution can. However, extracting the most pertinent functional features from a limited number of protein sequences is vital for optimizing ML model performance. Here, we propose scut_ProFP (Protein Fitness Predictor), a predictive framework that integrates feature combination and feature selection techniques. Feature combination provides comprehensive sequence information, while feature selection searches for the most beneficial features to enhance model performance, enabling accurate sequence-to-function mapping. Compared with similar frameworks, scut_ProFP demonstrates superior performance and is also competitive with more complex models (ECNet, EVmutation, and UniRep). In addition, scut_ProFP generalizes from low-order to high-order mutants. Finally, we used scut_ProFP to simulate engineering of the fluorescent protein CreiLOV and strongly enriched high-fluorescence mutants starting from only a small number of low-fluorescence mutants. Overall, the method offers an effective approach to data-driven protein engineering. The code and datasets for scut_ProFP are available at https://github.com/Zhang66-star/scut_ProFP.
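The combine-then-select idea can be sketched with NumPy. This is not scut_ProFP's code (that lives in the linked repository); the two encodings (one-hot and Kyte-Doolittle hydropathy), the closed-form ridge regressor, and greedy forward selection scored on a single held-out split are all illustrative assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydropathy, used here as one example physicochemical descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(seq):
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()


def hydropathy(seq):
    return np.array([KYTE_DOOLITTLE[aa] for aa in seq])


def combined_features(seqs):
    """Feature combination: concatenate several per-sequence encodings."""
    return np.array([np.concatenate([one_hot(s), hydropathy(s)]) for s in seqs])


def ridge_predict(X_train, y_train, X_test, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept (via centering)."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    A = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ w + y_mean


def forward_select(X, y, n_train, max_features=5):
    """Greedily add the feature column that most reduces held-out mean squared error."""
    X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]
    selected, best_mse = [], np.inf
    while len(selected) < max_features:
        trials = []
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            pred = ridge_predict(X_tr[:, cols], y_tr, X_te[:, cols])
            trials.append((float(np.mean((pred - y_te) ** 2)), j))
        if not trials:
            break
        mse, j = min(trials)
        if mse >= best_mse:  # stop once no remaining column improves held-out error
            break
        selected.append(j)
        best_mse = mse
    return selected, best_mse
```

With only a few dozen labeled variants, scoring candidate subsets by cross-validation rather than one split would make the selection less sensitive to which variants land in the held-out set.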
Affiliation(s)
- Zhihui Zhang
- Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Zhixuan Li
- Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Qianyue Wang
- School of Software Engineering, South China University of Technology, Guangzhou, China
- Hanlin Wu
- School of Software Engineering, South China University of Technology, Guangzhou, China
- Manli Yang
- Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Fengguang Zhao
- School of Light Industry and Engineering, South China University of Technology, Guangzhou, China
- Mingkui Tan
- School of Software Engineering, South China University of Technology, Guangzhou, China
- Shuangyan Han
- Guangdong Key Laboratory of Fermentation and Enzyme Engineering, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
7. Wu J, Wang Z, Zeng M, He Z, Chen Q, Chen J. Comprehensive Understanding of Laboratory Evolution for Food Enzymes: From Design to Screening Innovations. J Agric Food Chem 2024; 72:24928-24943. [PMID: 39495102 DOI: 10.1021/acs.jafc.4c08453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2024]
Abstract
In food processing, enzymes play a pivotal role in improving product quality and flavor and extending shelf life. However, traditional food enzymes exposed to high processing temperatures often lose activity or are even inactivated, limiting their effectiveness under high-temperature conditions. Engineering the thermostability and activity of enzymes for extreme conditions has therefore become a key way to improve the efficiency and economic benefits of industrial production. Laboratory directed evolution and semirational design have proven to be broadly applicable frameworks for biochemical researchers in the food field, including beginners. In this review, we systematically summarize semirational design and high-throughput screening strategies and introduce intuitive computational simulation software for improving the thermostability and activity of food enzymes. These strategies and techniques provide a comprehensive guide to optimizing food enzymes. In addition, recent application trends for genetically engineered food enzymes are discussed.
Affiliation(s)
- Junhao Wu
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China
- Zhaojun Wang
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China
- Maomao Zeng
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China
- Zhiyong He
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China
- Qiuming Chen
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China
- Jie Chen
- State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China
8. Xie X, Gui L, Qiao B, Wang G, Huang S, Zhao Y, Sun S. Deep learning in template-free de novo biosynthetic pathway design of natural products. Brief Bioinform 2024; 25:bbae495. [PMID: 39373052 PMCID: PMC11456888 DOI: 10.1093/bib/bbae495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/12/2024] [Accepted: 09/20/2024] [Indexed: 10/08/2024]
Abstract
Natural products (NPs) are indispensable in drug development, particularly in combating infections, cancer, and neurodegenerative diseases. However, their limited availability poses significant challenges. Template-free de novo biosynthetic pathway design provides a strategic solution for NP production, with deep learning standing out as a powerful tool in this domain. This review delves into state-of-the-art deep learning algorithms in NP biosynthesis pathway design. It provides an in-depth discussion of databases like Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and UniProt, which are essential for model training, along with chemical databases such as Reaxys, SciFinder, and PubChem for transfer learning to expand models' understanding of the broader chemical space. It evaluates the potential and challenges of sequence-to-sequence and graph-to-graph translation models for accurate single-step prediction. Additionally, it discusses search algorithms for multistep prediction and deep learning algorithms for predicting enzyme function. The review also highlights the pivotal role of deep learning in improving catalytic efficiency through enzyme engineering, which is essential for enhancing NP production. Moreover, it examines the application of large language models in pathway design, enzyme discovery, and enzyme engineering. Finally, it addresses the challenges and prospects associated with template-free approaches, offering insights into potential advancements in NP biosynthesis pathway design.
Affiliation(s)
- Xueying Xie
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
- College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Lin Gui
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Baixue Qiao
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
- College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Shan Huang
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, No. 246 Xuefu Road, Nangang District, Harbin 150081, China
- Yuming Zhao
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Shanwen Sun
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
- College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
9. Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024; 34:630-647. [PMID: 38969803 PMCID: PMC11369238 DOI: 10.1038/s41422-024-00989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024]
Abstract
Mutations in amino acid sequences can alter protein function. Accurate, unsupervised prediction of mutation effects is critical in biotechnology and biomedicine but remains a fundamental challenge. To address this challenge, we present Protein Mutational Effect Predictor (ProMEP), a general, multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. ProMEP embeds a multimodal deep representation learning model that learns both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction with a substantial improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences for the gene-editing enzymes TnpB and TadA and successfully guides the development of high-performance gene-editing tools from their engineered variants. The gene-editing efficiency of a 5-site TnpB mutant reaches up to 74.04% (vs 24.66% for the wild type), and a base editor built on a 15-site TadA mutant (in addition to the A106V/D108N double mutation that confers deoxyadenosine deaminase activity on TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor), with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only predicts mutational effects on proteins accurately but also effectively guides protein engineering. ProMEP therefore enables efficient exploration of the vast protein space and facilitates practical protein design, advancing studies in biomedicine and synthetic biology.
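Zero-shot mutation scoring is often done by comparing a model's probabilities for mutant and wild-type residues. The sketch below assumes a hypothetical model has already produced a per-position probability table (`probs`) and sums log-probability ratios over mutated sites; this is a common heuristic, not necessarily ProMEP's exact scoring rule. Mutations use the same notation as A106V (wild-type residue, 1-based position, mutant residue).

```python
import math


def parse_mutation(mutation):
    """Split notation like 'A106V' into (wild-type residue, 0-based index, mutant residue)."""
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    return wt, pos - 1, mt


def zero_shot_score(wt_seq, mutations, probs):
    """Sum of log(p_mutant / p_wildtype) over the mutated positions.

    `probs[i]` maps each amino acid to the probability a (hypothetical) model
    assigns it at position i of the wild-type sequence.
    """
    score = 0.0
    for m in mutations:
        wt, i, mt = parse_mutation(m)
        if wt_seq[i] != wt:
            raise ValueError(f"{m}: position {i + 1} is {wt_seq[i]}, not {wt}")
        score += math.log(probs[i][mt]) - math.log(probs[i][wt])
    return score


# Toy three-residue example with made-up probabilities.
wt_seq = "MKV"
probs = [{"M": 0.9, "L": 0.1}, {"K": 0.5, "R": 0.5}, {"V": 0.2, "I": 0.8}]
print(zero_shot_score(wt_seq, ["V3I"], probs))  # log(4), about 1.386
print(zero_shot_score(wt_seq, ["M1L", "V3I"], probs))
```

A positive score means the model finds the mutant residues more plausible than the wild-type ones at those sites. Summing across sites treats them as independent, so this simple form ignores epistasis between the mutations.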
Affiliation(s)
- Peng Cheng
- Bioinformatics Center of AMMS, Beijing, China
- Cong Mao
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Jin Tang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Sen Yang
- Bioinformatics Center of AMMS, Beijing, China
- Yu Cheng
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuke Wang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Qiuxi Gu
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wei Han
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Hao Chen
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Sihan Li
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuju Li
- Bioinformatics Center of AMMS, Beijing, China
- Aimin Pan
- Zhejiang Lab, Hangzhou, Zhejiang, China
- Suwen Zhao
- iHuman Institute, ShanghaiTech University, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Xingxu Huang
- Zhejiang Lab, Hangzhou, Zhejiang, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Jun Zhang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wenjie Shu
- Bioinformatics Center of AMMS, Beijing, China
10. Lipsh-Sokolik R, Fleishman SJ. Addressing epistasis in the design of protein function. Proc Natl Acad Sci U S A 2024; 121:e2314999121. [PMID: 39133844 PMCID: PMC11348311 DOI: 10.1073/pnas.2314999121] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
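Epistasis of the kind described above is commonly quantified for a pair of mutations A and B as the double mutant's deviation from additivity, ε = f_AB - f_A - f_B + f_WT, with fitness f on an additive scale such as log activity. The sketch below computes ε and labels the pair; the scale and the category names follow common usage and are assumptions, not definitions taken from this review.

```python
def epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation."""
    return f_ab - f_a - f_b + f_wt


def classify_pair(f_wt, f_a, f_b, f_ab, tol=1e-9):
    """Label a mutation pair as additive, magnitude, sign, or reciprocal sign epistasis."""
    if abs(epistasis(f_wt, f_a, f_b, f_ab)) <= tol:
        return "additive"
    # Does each mutation's effect flip sign depending on whether the other is present?
    a_flips = (f_a - f_wt) * (f_ab - f_b) < 0
    b_flips = (f_b - f_wt) * (f_ab - f_a) < 0
    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    return "magnitude"


# Two active-site mutations that are each deleterious alone but beneficial together
# (hypothetical log-fitness values relative to wild type).
print(classify_pair(0.0, -1.0, -1.0, 1.0))
```

Reciprocal sign epistasis is the case in which each mutation is tolerated only in combination with the other, which is why single-mutant scans tend to miss such pairs.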
Affiliation(s)
- Rosalie Lipsh-Sokolik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
11
Freschlin CR, Fahlberg SA, Heinzelman P, Romero PA. Neural network extrapolation to distant regions of the protein fitness landscape. Nat Commun 2024; 15:6405. [PMID: 39080282 PMCID: PMC11289474 DOI: 10.1038/s41467-024-50712-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 07/13/2024] [Indexed: 08/02/2024]
Abstract
Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. We also find that implementing a simple ensemble of convolutional neural networks enables robust design of high-performing variants in the local landscape. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape and how a simple ensembling approach makes protein engineering more robust.
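The abstract reports that a simple ensemble of convolutional neural networks made design in the local landscape more robust. The sketch below illustrates only the ensembling step: several models score the same candidate sequences and their predictions are pooled. Assumptions: the three "models" are toy additive scorers standing in for trained CNNs, and ranking by the ensemble mean minus a multiple of the ensemble standard deviation is one illustrative conservative rule, not necessarily the aggregation the authors used. All names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" here is any function mapping a sequence to a predicted fitness.
Model = Callable[[str], float]


def make_additive_model(weights: Dict[str, float]) -> Model:
    """Toy stand-in for a trained network: sums a per-residue weight over the sequence."""
    return lambda seq: sum(weights.get(aa, 0.0) for aa in seq)


def ensemble_predict(models: Sequence[Model], seq: str) -> Tuple[float, float]:
    """Return the mean and population standard deviation of the models' predictions."""
    preds = [model(seq) for model in models]
    return mean(preds), pstdev(preds)


def rank_designs(models: Sequence[Model], candidates: Sequence[str],
                 penalty: float = 1.0) -> List[str]:
    """Rank candidates by ensemble mean minus `penalty` times the ensemble spread."""
    scored = []
    for seq in candidates:
        mu, sd = ensemble_predict(models, seq)
        scored.append((mu - penalty * sd, seq))
    scored.sort(reverse=True)  # highest conservative score first
    return [seq for _, seq in scored]


if __name__ == "__main__":
    models = [
        make_additive_model({"A": 1.0, "W": 4.0}),   # optimistic about W
        make_additive_model({"A": 1.0, "W": -1.0}),  # pessimistic about W
        make_additive_model({"A": 1.0, "W": 1.5}),
    ]
    candidates = ["AAAA", "WWWW"]
    print(rank_designs(models, candidates, penalty=0.0))
    print(rank_designs(models, candidates, penalty=1.0))
```

With `penalty=0.0` the ensemble ranks WWWW first because one model is very optimistic about it; with `penalty=1.0` the disagreement among models pushes AAAA to the top. This is a small-scale picture of how pooling models can temper a single model's over-confident extrapolation.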
Affiliation(s)
- Chase R Freschlin
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
- Sarah A Fahlberg
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
- Pete Heinzelman
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
- Philip A Romero
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI, USA.
12
Tang H, Zhu HL, Zhao JQ, Wang LY, Xue YP, Zheng YG. Through virtual saturation mutagenesis and rational design for superior substrate conversion in engineered d-amino acid oxidase. Biotechnol J 2024; 19:e2400287. [PMID: 39014925 DOI: 10.1002/biot.202400287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 06/02/2024] [Accepted: 06/17/2024] [Indexed: 07/18/2024]
Abstract
The d-amino acid oxidase (DAAO) is pivotal in obtaining optically pure l-glufosinate (l-PPT) by converting d-glufosinate (d-PPT) to its deamination product. We screened and designed a Rasamsonia emersonii DAAO (ReDAAO) to make it better suited to oxidizing d-PPT. Using Caver 3.0, we delineated three substrate-binding pockets and identified nearby key residues by alanine scanning. Having pinpointed the key residues influencing activity, we applied virtual saturation mutagenesis (VSM) and experimentally validated mutants that reduced the substrate binding energy. Analysis of the positive mutants revealed a prevalence of elongated side chains at the periphery of the substrate-binding pocket. Although computer-aided approaches can rapidly identify advantageous mutants and guide further design, mutations obtained in the first round may not combine well with other advantageous mutations. Therefore, each round of combination requires careful iteration. By applying VSM-assisted screening repeatedly over four rounds of combining mutations, we ultimately obtained the mutant N53V/F57Q/V94R/V242R, whose enzyme activity was 5097% higher than that of the wild type. This work provides valuable insights into the structural determinants of enzyme activity and introduces a novel rational design procedure.
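Two quantitative details in this abstract can be made concrete. First, a 5097% increase over wild type corresponds to 1 + 5097/100 = 51.97 times the wild-type activity, i.e. roughly a 52-fold improvement. Second, saturation mutagenesis at a set of positions means trying all 19 alternative amino acids at each one, so the four positions in N53V/F57Q/V94R/V242R alone give 4 × 19 = 76 single mutants. The sketch below parses this standard mutation notation and enumerates such a single-site library. It does not reproduce the binding-energy scoring that VSM performs in the paper, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def parse_mutations(label: str) -> List[Tuple[str, int, str]]:
    """Split notation such as 'N53V/F57Q' into (wild-type residue, position, mutant residue)."""
    parsed = []
    for token in label.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"unrecognized mutation token: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def saturation_library(positions: Dict[int, str]) -> List[str]:
    """All single substitutions (19 per site); `positions` maps position -> wild-type residue."""
    return [
        f"{wt}{pos}{aa}"
        for pos, wt in sorted(positions.items())
        for aa in AMINO_ACIDS
        if aa != wt  # skip the silent "substitution" back to wild type
    ]


def percent_increase_to_fold(percent: float) -> float:
    """Convert an 'X% increase' into a multiple of the reference activity."""
    return 1.0 + percent / 100.0


if __name__ == "__main__":
    final = parse_mutations("N53V/F57Q/V94R/V242R")
    sites = {pos: wt for wt, pos, _ in final}
    library = saturation_library(sites)
    print(len(library))
    print(percent_increase_to_fold(5097))
```

Applied to the final mutant, the library has 76 members and the fold change evaluates to about 51.97. Because the full combinatorial space over the same four sites is 20^4 = 160,000 variants, screening singles first and then combining beneficial ones over several rounds is a far smaller search.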
Affiliation(s)
- Heng Tang
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, P. R. China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, P. R. China
- Hong-Li Zhu
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, P. R. China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, P. R. China
- Jin-Qiao Zhao
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, P. R. China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, P. R. China
- Liu-Yu Wang
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, P. R. China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, P. R. China
- Ya-Ping Xue
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, P. R. China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, P. R. China
- Yu-Guo Zheng
- Key Laboratory of Bioorganic Synthesis of Zhejiang Province, College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, P. R. China
- The National and Local Joint Engineering Research Center for Biomanufacturing of Chiral Chemicals, Zhejiang University of Technology, Hangzhou, P. R. China