1. Han Y, Ding X, Tan J, Sun Y, Duan Y, Liu Z, Zheng G, Lu D. Sequence and taxonomic feature evaluation facilitated the discovery of alcohol oxidases. Synth Syst Biotechnol 2025; 10:907-915. [PMID: 40386440 PMCID: PMC12083922 DOI: 10.1016/j.synbio.2025.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2025] [Revised: 04/18/2025] [Accepted: 04/21/2025] [Indexed: 05/20/2025] Open
Abstract
Recent advancements in data technology offer immense opportunities for the discovery and development of new enzymes for the green synthesis of chemicals. Current protein databases predominantly prioritize overall sequence matches. The multi-scale features underpinning catalytic mechanisms and processes, which are scattered across various data sources, have not been sufficiently integrated to be effectively utilized in enzyme mining. In this study, we developed a sequence- and taxonomic-feature evaluation driven workflow to discover enzymes that can be expressed in E. coli and catalyze chemical reactions in vitro, using alcohol oxidase (AOX) for demonstration, which catalyzes the conversion of methanol to formaldehyde. A dataset of 21 reported AOXs was used to construct sequence scoring rules based on features, including sequence length, structural motifs, catalytic-related residues, binding residues, and overall structure. These scoring rules were applied to filter the results from HMM-based searches, yielding 357 candidate sequences of eukaryotic origin, which were categorized into six classes at 85 % sequence similarity. Experimental validation was conducted in two rounds on 31 selected sequences representing all classes. Among these selected sequences, 19 were expressed as soluble proteins in E. coli, and 18 of these soluble proteins exhibited AOX activity, as predicted. Notably, the most active recombinant AOX exhibited an activity of 8.65 ± 0.29 U/mg, approaching the highest activity of native eukaryotic enzymes. Compared to the established UniProt-annotation-based workflow, this feature-evaluation-based approach yielded a higher probability of highly active recombinant AOX (from 8.3 % to 19.4 %), demonstrating the efficiency and potential of this multi-dimensional feature evaluation method in accelerating the discovery of active enzymes.
Affiliation(s)
- Yilei Han
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Xuwei Ding
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Junjian Tan
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Yajuan Sun
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Yunjiang Duan
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Zheng Liu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Gaowei Zheng
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Diannan Lu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
2. Cai P, Liu D, Xing H, Zhang D, Le Y, Wu A, Hu QN. DeepMBEnzy: An AI-Driven Database of Mycotoxin Biotransformation Enzymes. Journal of Agricultural and Food Chemistry 2025; 73:13038-13046. [PMID: 40378051 DOI: 10.1021/acs.jafc.5c02477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2025]
Abstract
Mycotoxins are toxic fungal metabolites that pose significant health risks. Enzyme biotransformation is a promising option for detoxifying mycotoxins and for elucidating their intracellular metabolism. However, few mycotoxin-biotransformation enzymes have been identified thus far. Here, we developed an enzyme promiscuity prediction for mycotoxin biotransformation (EPP-MB) model by fine-tuning a pretrained model using a cold protein data-splitting approach. The EPP-MB model leverages deep learning to predict enzymes capable of mycotoxin biotransformation, achieving a validation accuracy of 79% against a data set of experimentally confirmed mycotoxin-biotransforming enzymes. We applied the model to predict potential biotransformation enzymes for over 4000 mycotoxins and compiled these into the DeepMBEnzy database, which archives the predicted enzymes and related information for each mycotoxin, providing researchers with a user-friendly, publicly accessible interface at https://synbiodesign.com/DeepMBEnzy/. DeepMBEnzy is designed to facilitate the exploration and utilization of enzyme candidates in mycotoxin biotransformation, supporting further advancements in mycotoxin detoxification research and applications.
Affiliation(s)
- Pengli Cai
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Dongliang Liu
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Huadong Xing
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Dachuan Zhang
  - Department of Food Science and Technology, National University of Singapore, Singapore 117542, Singapore
- Yingying Le
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- AiBo Wu
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Qian-Nan Hu
  - Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
3. Liao L, Xie M, Zheng X, Zhou Z, Deng Z, Gao J. Molecular insights fast-tracked: AI in biosynthetic pathway research. Nat Prod Rep 2025; 42:911-936. [PMID: 40130306 DOI: 10.1039/d4np00003j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2025]
Abstract
Covering: 2000 to 2025. This review explores the potential of artificial intelligence (AI) in addressing challenges and accelerating molecular insights in biosynthetic pathway research, which is crucial for developing bioactive natural products with applications in pharmacology, agriculture, and biotechnology. It provides an overview of various AI techniques relevant to this research field, including machine learning (ML), deep learning (DL), natural language processing, network analysis, and data mining. AI-powered applications across three main areas, namely, pathway discovery and mining, pathway design, and pathway optimization, are discussed, and the benefits and challenges of integrating omics data and AI for enhanced pathway research are also elucidated. This review also addresses the current limitations, future directions, and the importance of synergy between AI and experimental approaches in unlocking rapid advancements in biosynthetic pathway research. The review concludes with an evaluation of AI's current capabilities and future outlook, emphasizing the transformative impact of AI on biosynthetic pathway research and the potential for new opportunities in the discovery and optimization of bioactive natural products.
Affiliation(s)
- Lijuan Liao
  - Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, P. R. China
- Mengjun Xie
  - Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Xiaoshan Zheng
  - Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Zhao Zhou
  - Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Zixin Deng
  - State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Jiangtao Gao
  - Key BioAI Synthetica Lab for Natural Product Drug Discovery, College of Bee, Biomedical and Pharmaceutical Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
4. Zhou Y, Liu Y, Sun H, Lu Y. Creating novel metabolic pathways by protein engineering for bioproduction. Trends Biotechnol 2025; 43:1094-1103. [PMID: 39632163 PMCID: PMC12064402 DOI: 10.1016/j.tibtech.2024.10.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 10/21/2024] [Accepted: 10/31/2024] [Indexed: 12/07/2024]
Abstract
A diverse array of natural products has been produced by cell biofactories through metabolic engineering, in which enzymes play essential roles in the complex metabolic network. However, the scope of such biotransformation can be limited by the capacities of natural enzymes. To broaden their scope, many natural enzymes have recently been engineered to activate non-native substrates and/or to employ new-to-nature reaction mechanisms, but most of these systems are only demonstrated for in vitro applications. To bridge the gap between in vitro and in vivo biocatalysis, we highlight recent progress in engineering enzymes with non-native substrates or new-to-nature mechanisms that have been successfully applied in living cells to create novel metabolic pathways.
Affiliation(s)
- Yu Zhou
  - Department of Chemistry, University of Texas at Austin, Austin, TX 78712, USA
- Yiwei Liu
  - Department of Chemistry, University of Texas at Austin, Austin, TX 78712, USA
- Haoran Sun
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Yi Lu
  - Department of Chemistry, University of Texas at Austin, Austin, TX 78712, USA
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
5. Du BX, Yu H, Zhu B, Long Y, Wu M, Shi JY. A novel interpretability framework for enzyme turnover number prediction boosted by pre-trained enzyme embeddings and adaptive gate network. Methods 2025; 237:45-52. [PMID: 40021034 DOI: 10.1016/j.ymeth.2025.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 01/05/2025] [Accepted: 02/25/2025] [Indexed: 03/03/2025] Open
Abstract
Identifying the enzyme turnover number (kcat) is a vital step in synthetic biology and early-stage drug discovery. Recently, deep learning methods have made encouraging progress in predicting kcat, aided by the growth of turnover number data for multi-species enzyme-substrate pairs. However, the performance of existing approaches still heavily depends on the effectiveness of feature extraction for enzymes and substrates, as well as the optimal fusion of these two types of features. Furthermore, it is essential to identify the key molecular substructures that significantly impact kcat prediction. To address these issues, we develop GELKcat, a novel end-to-end dual-representation interpretability framework that uses graph transformers for substrate molecular encoding and CNNs for enzyme word2vec embeddings. We further integrate substrate and enzyme features using an adaptive gate network, which assigns optimal weights to capture the most suitable feature combinations. Comparison with several state-of-the-art methods demonstrates the superiority of GELKcat, and ablation studies further illuminate the roles of its three main components. Furthermore, case studies illustrate the interpretability of GELKcat by identifying the key functional groups in a substrate that are significantly associated with turnover number. We anticipate that this work can bridge current gaps in enzyme-substrate representation and offer guidance for drug discovery and synthetic biology.
Affiliation(s)
- Bing-Xue Du
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
  - Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Haoyang Yu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Bei Zhu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Yahui Long
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
- Min Wu
  - Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
6. Upadhyay V, Li H, He J, Ocampo BE, Cook S, Zhao H, Maranas CD. Combining Chemical Catalysis with Enzymatic Steps for the Synthesis of the Artemisinin Precursor Dihydroartemisinic Acid. ACS Synth Biol 2025; 14:1112-1120. [PMID: 40105756 DOI: 10.1021/acssynbio.4c00707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2025]
Abstract
The supply of artemisinin, the primary antimalarial drug recommended by the World Health Organization (WHO), is limited due to synthesis cost and supply constraints. This study explores novel chemo-enzymatic pathways for the efficient synthesis of dihydroartemisinic acid (DHAA), the penultimate precursor to artemisinin. The key concept here is to leverage the seamless integration of chemical and enzymatic steps for more thoroughly exploring synthesis alternatives. Using novoStoic, a biosynthetic pathway design tool, we identified previously unexplored carbon- and energy-balanced pathways for converting amorpha-4,11-diene (AMPD) to DHAA. For some of the enzymatically catalyzed steps lacking efficient enzymes, chemical catalysis alternatives were proposed and implemented, leading to a hybrid chemo-enzymatic pathway design. The proposed pathway converts AMPD directly to DHAA without going through artemisinic acid (AA), making it a shorter pathway compared with the existing synthesis routes for artemisinin. This effort paves the way for the systematic design of chemo-enzymatic pathways and provides insight into decision strategies between chemical synthesis and enzymatic synthesis steps. It serves as an example of how synthesis pathway design tools can be integrated with human intuition for accelerating retrosynthesis and how AI-based tools can identify and replace human intuitions to automate the decision processes. This can help reduce human-machine interventions and improve the development of future tools for synthesis planning.
Affiliation(s)
- Vikas Upadhyay
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Hongxiang Li
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61820, United States
- Jiachen He
  - Department of Chemistry, Indiana University, 800 East Kirkwood Avenue, Bloomington, Indiana 47405-7102, United States
- Blake Edward Ocampo
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61820, United States
- Silas Cook
  - Department of Chemistry, Indiana University, 800 East Kirkwood Avenue, Bloomington, Indiana 47405-7102, United States
- Huimin Zhao
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61820, United States
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61820, United States
- Costas D Maranas
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
7. Jurich C, Shao Q, Ran X, Yang ZJ. Physics-based modeling in the new era of enzyme engineering. Nature Computational Science 2025; 5:279-291. [PMID: 40275092 DOI: 10.1038/s43588-025-00788-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Accepted: 03/04/2025] [Indexed: 04/26/2025]
Abstract
Enzyme engineering is entering a new era characterized by the integration of computational strategies. While bioinformatics and artificial intelligence methods have been extensively applied to accelerate the screening of function-enhancing mutants, physics-based modeling methods, such as molecular mechanics and quantum mechanics, are essential complements in many objectives. In this Perspective, we highlight how physics-based modeling will help the field of computational enzyme engineering reach its full potential by exploring current developments, unmet challenges and emerging opportunities for tool development.
Affiliation(s)
- Qianzhen Shao
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Xinchun Ran
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Zhongyue J Yang
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
  - The Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, TN, USA
  - Data Science Institute, Vanderbilt University, Nashville, TN, USA
  - Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, TN, USA
8. Chen A, Peng X, Shen T, Zheng L, Wu D, Wang S. Discovery, design, and engineering of enzymes based on molecular retrobiosynthesis. mLife 2025; 4:107-125. [PMID: 40313979 PMCID: PMC12042125 DOI: 10.1002/mlf2.70009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 02/06/2025] [Accepted: 02/13/2025] [Indexed: 05/03/2025]
Abstract
Biosynthesis, a process utilizing biological systems to synthesize chemical compounds, has emerged as a revolutionary solution to 21st-century challenges due to its environmental sustainability, scalability, and high stereoselectivity and regioselectivity. Recent advancements in artificial intelligence (AI) are accelerating biosynthesis by enabling intelligent design, construction, and optimization of enzymatic reactions and biological systems. We first introduce the molecular retrosynthesis route planning in biochemical pathway design, including single-step retrosynthesis algorithms and AI-based chemical retrosynthesis route design tools. We highlight the advantages and challenges of large language models in addressing the sparsity of chemical data. Furthermore, we review enzyme discovery methods based on sequence and structure alignment techniques. Breakthroughs in AI-based structural prediction methods are expected to significantly improve the accuracy of enzyme discovery. We also summarize methods for de novo enzyme generation for nonnatural or orphan reactions, focusing on AI-based enzyme functional annotation and enzyme discovery techniques based on reaction or small molecule similarity. Turning to enzyme engineering, we discuss strategies to improve enzyme thermostability, solubility, and activity, as well as the applications of AI in these fields. The shift from traditional experiment-driven models to data-driven and computationally driven intelligent models is already underway. Finally, we present potential challenges and provide a perspective on future research directions. We envision expanded applications of biocatalysis in drug development, green chemistry, and complex molecule synthesis.
Affiliation(s)
- Ancheng Chen
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Xiangda Peng
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Tao Shen
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Dong Wu
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
9. Basnet BB, Zhou ZY, Wei B, Wang H. Advances in AI-based strategies and tools to facilitate natural product and drug development. Crit Rev Biotechnol 2025:1-32. [PMID: 40159111 DOI: 10.1080/07388551.2025.2478094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Revised: 02/11/2025] [Accepted: 02/16/2025] [Indexed: 04/02/2025]
Abstract
Natural products and their derivatives have been important for treating diseases in humans, animals, and plants. However, discovering new structures from natural sources is still challenging. In recent years, artificial intelligence (AI) has greatly aided the discovery and development of natural products and drugs. AI helps to connect genetic data to chemical structures (or vice versa), repurpose known natural products, predict metabolic pathways, and design and optimize metabolite biosynthesis. More recently, the emergence and improvement of neural networks such as deep learning, together with ensemble automated web-based bioinformatics platforms, have sped up the discovery process. Meanwhile, AI also improves the identification and structure elucidation of unknown compounds from raw data such as mass spectrometry and nuclear magnetic resonance. This article reviews these AI-driven methods and tools, highlighting their practical applications and providing guidance for efficient natural product discovery and drug development.
Affiliation(s)
- Buddha Bahadur Basnet
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
  - Central Department of Biotechnology, Tribhuvan University, Kathmandu, Nepal
- Zhen-Yi Zhou
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Bin Wei
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Hong Wang
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
  - Key Laboratory of Marine Fishery Resources Exploitment, Utilization of Zhejiang Province, Zhejiang University of Technology, Hangzhou, China
10. Cao Y, Zhang T, Zhao X, Li H. HiRXN: Hierarchical Attention-Based Representation Learning for Chemical Reaction. J Chem Inf Model 2025; 65:1990-2002. [PMID: 39901569 DOI: 10.1021/acs.jcim.4c01787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2025]
Abstract
In recent years, natural language processing (NLP) techniques, including large language models (LLMs), have contributed significantly to advancements in organic chemistry research. Chemical reaction representations provide a link between NLP models and chemistry prediction tasks and enable the translation of complex chemical processes into a format that NLP models can understand and learn from. However, previous representation methods fail to adequately consider the hierarchical and structural information inherent in chemical reactions. Here, we propose a tool named HiRXN to learn the comprehensive representation of chemical reactions based on their hierarchical structure. To enhance feature engineering for machine learning (ML) models, HiRXN develops an effective tokenization method called RXNTokenizer to capture atomic microenvironment features at multiple radii. Then, a hierarchical attention network is used to integrate information from the atomic-microenvironment level and the molecule level to accurately understand chemical reactions. The experimental results show that HiRXN is capable of representing chemical reactions and achieves remarkable performance in reaction regression and classification prediction tasks. A web server has been developed to provide a specialized service that accepts Reaction SMILES as input and provides predicted results. The website is accessible at http://bdatju.com.
Affiliation(s)
- Yahui Cao
  - School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Tao Zhang
  - School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Xin Zhao
  - School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Haotong Li
  - School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
11. Long L, Li R, Zhang J. Artificial Intelligence in Retrosynthesis Prediction and its Applications in Medicinal Chemistry. J Med Chem 2025; 68:2333-2355. [PMID: 39883477 DOI: 10.1021/acs.jmedchem.4c02749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2025]
Abstract
Retrosynthesis is a strategy to analyze the synthetic routes for target molecules in medicinal chemistry. However, traditional retrosynthesis predictions performed by chemists and rule-based expert systems struggle to adapt to the vast chemical space of real-world scenarios. Artificial intelligence (AI) has revolutionized retrosynthesis prediction in recent decades, significantly increasing the accuracy and diversity of predictions for target compounds. Single-step AI-driven retrosynthesis models can be generalized into three types based on their dependence on predefined reaction templates (template-based, semitemplate-based, and template-free), each with its own advantages and limitations, and all facing common challenges that limit their medicinal chemistry applications. Moreover, multi-step retrosynthesis methods remain relatively underdeveloped and lack strong links with single-step methods. Herein, we review the recent advancements in AI applications for retrosynthesis prediction by summarizing related techniques and the landscape of current representative retrosynthesis models, propose feasible solutions to tackle existing problems, and outline future directions in this field.
Affiliation(s)
- Lanxin Long
  - Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Rui Li
  - Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Jian Zhang
  - Medicinal Chemistry and Bioinformatics Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - Key Laboratory of Protection, Development, and Utilization of Medicinal Resources in Liupanshan Area, Ministry of Education, Peptides & Protein Drug Research Center, School of Pharmacy, Ningxia Medical University, Yinchuan 750004, China
12. Li H, Liu X, Jiang G, Zhao H. Chemoenzymatic Synthesis Planning Guided by Reaction Type Score. J Chem Inf Model 2024; 64:9240-9248. [PMID: 39648592 DOI: 10.1021/acs.jcim.4c01525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2024]
Abstract
Thanks to the growing interest in computer-aided synthesis planning (CASP), a wide variety of retrosynthesis and retrobiosynthesis tools have been developed in the past decades. However, synthesis planning tools for multistep chemoenzymatic reactions are still rare despite the widespread use of enzymatic reactions in chemical synthesis. Herein, we report a reaction type score (RTscore)-guided chemoenzymatic synthesis planning (RTS-CESP) strategy. Briefly, the RTscore is trained using a text-based convolutional neural network (TextCNN) to distinguish synthesis reactions from decomposition reactions and evaluate synthesis efficiency. Once multiple chemical synthesis routes are generated by a retrosynthesis tool for a target molecule, RTscore is used to rank them and find the step(s) that can be replaced by enzymatic reactions to improve synthesis efficiency. As proof of concept, RTS-CESP was applied to 10 molecules with known chemoenzymatic synthesis routes in the literature and was able to predict all of them with six being the top-ranked routes. Moreover, RTS-CESP was employed for 1000 molecules in the boutique database and was able to predict the chemoenzymatic synthesis routes for 554 molecules, outperforming ASKCOS, a state-of-the-art chemoenzymatic synthesis planning tool. Finally, RTS-CESP was used to design a new chemoenzymatic synthesis route for the FDA-approved drug Alclofenac, which was shorter than the literature-reported route and has been experimentally validated.
Collapse
Affiliation(s)
- Hongxiang Li
- NSF Molecule Maker Lab Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Xuan Liu
- NSF Molecule Maker Lab Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Guangde Jiang
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Huimin Zhao
- NSF Molecule Maker Lab Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| |
|
13
|
Zeng T, Li J, Wu R. Natural product databases for drug discovery: Features and applications. Pharmaceutical Science Advances 2024; 2:100050. [DOI: 10.1016/j.pscia.2024.100050] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/06/2025]
|
14
|
Yang Z, Shi A, Zhang R, Ji Z, Li J, Lyu J, Qian J, Chen T, Wang X, You F, Xie J. When Metal Nanoclusters Meet Smart Synthesis. ACS Nano 2024; 18:27138-27166. [PMID: 39316700 DOI: 10.1021/acsnano.4c09597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
Atomically precise metal nanoclusters (MNCs) represent a fascinating class of ultrasmall nanoparticles with molecule-like properties, bridging conventional metal-ligand complexes and nanocrystals. Despite their potential for various applications, synthesis challenges such as a precise understanding of varied synthetic parameters and property-driven synthesis persist, hindering their full exploitation and wider application. Incorporating smart synthesis methodologies, including a closed-loop framework of automation, data interpretation, and feedback from AI, offers promising solutions to address these challenges. In this perspective, we summarize the closed-loop smart synthesis that has been demonstrated in various nanomaterials and explore the research frontiers of smart synthesis for MNCs. Moreover, we discuss the inherent challenges and opportunities of smart synthesis for MNCs, aiming to provide insights and directions for future advancements in this emerging field of AI for Science. The integration of deep learning algorithms stands to substantially enrich research in smart synthesis by offering enhanced predictive capabilities, optimization strategies, and control mechanisms, thereby extending the potential of MNC synthesis.
Affiliation(s)
- Zhucheng Yang
- Joint School of National University of Singapore and Tianjin University, International Campus of Tianjin University, Fuzhou 350207, P. R. China
- Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore
| | - Anye Shi
- Systems Engineering, College of Engineering, Cornell University, Ithaca, New York 14853, United States
| | - Ruixuan Zhang
- Joint School of National University of Singapore and Tianjin University, International Campus of Tianjin University, Fuzhou 350207, P. R. China
- Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore
| | - Zuowei Ji
- School of Humanities and Social Sciences, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, P. R. China
| | - Jiali Li
- Department of Chemistry, National University of Singapore, Singapore 117543, Singapore
| | - Jingkuan Lyu
- Joint School of National University of Singapore and Tianjin University, International Campus of Tianjin University, Fuzhou 350207, P. R. China
- Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore
| | - Jing Qian
- Joint School of National University of Singapore and Tianjin University, International Campus of Tianjin University, Fuzhou 350207, P. R. China
- Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore
| | - Tiankai Chen
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, P. R. China
| | - Xiaonan Wang
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, P. R. China
| | - Fengqi You
- Systems Engineering, College of Engineering, Cornell University, Ithaca, New York 14853, United States
- Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York 14853, United States
- Cornell University AI for Science Institute (CUAISci), Cornell University, Ithaca, New York 14853, United States
| | - Jianping Xie
- Joint School of National University of Singapore and Tianjin University, International Campus of Tianjin University, Fuzhou 350207, P. R. China
- Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore
| |
|
15
|
Xie X, Gui L, Qiao B, Wang G, Huang S, Zhao Y, Sun S. Deep learning in template-free de novo biosynthetic pathway design of natural products. Brief Bioinform 2024; 25:bbae495. [PMID: 39373052 PMCID: PMC11456888 DOI: 10.1093/bib/bbae495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/12/2024] [Accepted: 09/20/2024] [Indexed: 10/08/2024] Open
Abstract
Natural products (NPs) are indispensable in drug development, particularly in combating infections, cancer, and neurodegenerative diseases. However, their limited availability poses significant challenges. Template-free de novo biosynthetic pathway design provides a strategic solution for NP production, with deep learning standing out as a powerful tool in this domain. This review delves into state-of-the-art deep learning algorithms in NP biosynthesis pathway design. It provides an in-depth discussion of databases like Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and UniProt, which are essential for model training, along with chemical databases such as Reaxys, SciFinder, and PubChem for transfer learning to expand models' understanding of the broader chemical space. It evaluates the potential and challenges of sequence-to-sequence and graph-to-graph translation models for accurate single-step prediction. Additionally, it discusses search algorithms for multistep prediction and deep learning algorithms for predicting enzyme function. The review also highlights the pivotal role of deep learning in improving catalytic efficiency through enzyme engineering, which is essential for enhancing NP production. Moreover, it examines the application of large language models in pathway design, enzyme discovery, and enzyme engineering. Finally, it addresses the challenges and prospects associated with template-free approaches, offering insights into potential advancements in NP biosynthesis pathway design.
Affiliation(s)
- Xueying Xie
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
- College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
| | - Lin Gui
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
| | - Baixue Qiao
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
- College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
| | - Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
| | - Shan Huang
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, No. 246 Xuefu Road, Nangang District, Harbin 150081, China
| | - Yuming Zhao
- College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
| | - Shanwen Sun
- Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
- College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
| |
|
16
|
Foldi J, Connolly JA, Takano E, Breitling R. Synthetic Biology of Natural Products Engineering: Recent Advances Across the Discover-Design-Build-Test-Learn Cycle. ACS Synth Biol 2024; 13:2684-2692. [PMID: 39163395 PMCID: PMC11421215 DOI: 10.1021/acssynbio.4c00391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 08/09/2024] [Accepted: 08/09/2024] [Indexed: 08/22/2024]
Abstract
Advances in genome engineering and associated technologies have reinvigorated natural products research. Here we highlight the latest developments in the field across the discover-design-build-test-learn cycle of bioengineering, from recent progress in computational tools for AI-supported genome mining, enzyme and pathway engineering, and compound identification to novel host systems and new techniques for improving production levels, and place these trends in the context of responsible research and innovation, emphasizing the importance of anticipatory analysis at the early stages of process development.
Affiliation(s)
| | | | - Eriko Takano
- Manchester Institute of Biotechnology, Department of Chemistry, School of Natural Sciences, Faculty of Science and Engineering, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Rainer Breitling
- Manchester Institute of Biotechnology, Department of Chemistry, School of Natural Sciences, Faculty of Science and Engineering, University of Manchester, Manchester M1 7DN, United Kingdom
| |
|
17
|
Hollmann F, Sanchis J, Reetz MT. Learning from Protein Engineering by Deconvolution of Multi-Mutational Variants. Angew Chem Int Ed Engl 2024; 63:e202404880. [PMID: 38884594 DOI: 10.1002/anie.202404880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/05/2024] [Accepted: 06/06/2024] [Indexed: 06/18/2024]
Abstract
This review analyzes a development in biochemistry, enzymology and biotechnology that originally came as a surprise. Following the establishment of directed evolution of stereoselective enzymes in organic chemistry, the concept of partial or complete deconvolution of selective multi-mutational variants was introduced. Early deconvolution experiments of stereoselective variants led to the finding that mutations can interact cooperatively or antagonistically with one another, not just additively. During the past decade, this phenomenon was shown to be general. In some studies, molecular dynamics (MD) and quantum mechanics/molecular mechanics (QM/MM) computations were performed in order to shed light on the origin of non-additivity at all stages of an evolutionary upward climb. Data of complete deconvolution can be used to construct unique multi-dimensional rugged fitness pathway landscapes, which provide mechanistic insights different from traditional fitness landscapes. Along a related line, biochemists have long tested the result of introducing two point mutations in an enzyme for mechanistic reasons, followed by a comparison of the respective double mutant in so-called double mutant cycles, which originally showed only additive effects, but more recently also uncovered cooperative and antagonistic non-additive effects. We conclude with suggestions for future work, and call for a unified overall picture of non-additivity and epistasis.
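The distinction between additive, cooperative, and antagonistic mutational effects drawn in this abstract can be made concrete with a small double-mutant-cycle calculation. The sketch below is illustrative only. It assumes that each effect is measured relative to the wild type on a scale where contributions would simply add if the mutations were independent (for example, a free-energy difference) and where larger values mean a larger improvement. The function name and the tolerance `tol` are hypothetical.

```python
from typing import Tuple


def interaction(effect_a: float, effect_b: float, effect_ab: float,
                tol: float = 1e-9) -> Tuple[str, float]:
    """Classify a double-mutant cycle.

    effect_a, effect_b: effects of the single mutants relative to wild type.
    effect_ab: effect of the double mutant relative to wild type.
    Returns a label and the deviation from additivity.
    """
    expected = effect_a + effect_b   # what pure additivity would predict
    deviation = effect_ab - expected
    if abs(deviation) <= tol:
        return "additive", deviation
    # Sign convention (assumed): larger effect means larger improvement.
    label = "cooperative" if deviation > 0 else "antagonistic"
    return label, deviation
```

With single-mutant effects of 1.0 and 0.5, a double-mutant effect of 2.5 exceeds the additive expectation of 1.5 and is labelled cooperative, whereas a value of 1.0 falls short of it and is labelled antagonistic.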
Affiliation(s)
- Frank Hollmann
- Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629HZ, Delft, Netherlands
| | - Joaquin Sanchis
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, 3052, Australia
| | - Manfred T Reetz
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45481, Mülheim, Germany
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
| |
|
18
|
Gong X, Zhang J, Gan Q, Teng Y, Hou J, Lyu Y, Liu Z, Wu Z, Dai R, Zou Y, Wang X, Zhu D, Zhu H, Liu T, Yan Y. Advancing microbial production through artificial intelligence-aided biology. Biotechnol Adv 2024; 74:108399. [PMID: 38925317 DOI: 10.1016/j.biotechadv.2024.108399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/20/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
Microbial cell factories (MCFs) have been leveraged to construct sustainable platforms for value-added compound production. To optimize metabolism and reach optimal productivity, synthetic biology has developed various genetic devices to engineer microbial systems by gene editing, high-throughput protein engineering, and dynamic regulation. However, current synthetic biology methodologies still rely heavily on manual design, laborious testing, and exhaustive analysis. The emerging interdisciplinary field of artificial intelligence (AI) and biology has become pivotal in addressing the remaining challenges. AI-aided microbial production harnesses the power of processing, learning, and predicting vast amounts of biological data within seconds, providing outputs with high probability. With well-trained AI models, the conventional Design-Build-Test (DBT) cycle has been transformed into a multidimensional Design-Build-Test-Learn-Predict (DBTLP) workflow, leading to significantly improved operational efficiency and reduced labor consumption. Here, we comprehensively review the main components and recent advances in AI-aided microbial production, focusing on genome annotation, AI-aided protein engineering, artificial functional protein design, and AI-enabled pathway prediction. Finally, we discuss the challenges of integrating novel AI techniques into biology and propose the potential of large language models (LLMs) in advancing microbial production.
Affiliation(s)
- Xinyu Gong
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Jianli Zhang
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Qi Gan
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Yuxi Teng
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Jixin Hou
- School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
| | - Yanjun Lyu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
| | - Zhengliang Liu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
| | - Zihao Wu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
| | - Runpeng Dai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yusong Zou
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
| | - Xianqiao Wang
- School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
| | - Dajiang Zhu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Tianming Liu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
| | - Yajun Yan
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA.
| |
|
19
|
Lu H, Xiao L, Liao W, Yan X, Nielsen J. Cell factory design with advanced metabolic modelling empowered by artificial intelligence. Metab Eng 2024; 85:61-72. [PMID: 39038602 DOI: 10.1016/j.ymben.2024.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/06/2024] [Accepted: 07/06/2024] [Indexed: 07/24/2024]
Abstract
Advances in synthetic biology and artificial intelligence (AI) have provided new opportunities for modern biotechnology. High-performance cell factories, the backbone of industrial biotechnology, are ultimately responsible for determining whether a bio-based product succeeds or fails in the fierce competition with petroleum-based products. To date, one of the greatest challenges in synthetic biology is the creation of high-performance cell factories in a consistent and efficient manner. As so-called white-box models, numerous metabolic network models have been developed and used in computational strain design. Moreover, great progress has been made in AI-powered strain engineering in recent years. Both approaches have advantages and disadvantages. Therefore, the deep integration of AI with metabolic models is crucial for the construction of superior cell factories with higher titres, yields and production rates. The detailed applications of the latest advanced metabolic models and AI in computational strain design are summarized in this review. Additionally, approaches for the deep integration of AI and metabolic models are discussed. It is anticipated that advanced mechanistic metabolic models powered by AI will pave the way for the efficient construction of powerful industrial chassis strains in the coming years.
Affiliation(s)
- Hongzhong Lu
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China.
| | - Luchi Xiao
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
| | - Wenbin Liao
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China; Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
| | - Xuefeng Yan
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
| | - Jens Nielsen
- BioInnovation Institute, Ole Måløes Vej, DK2200, Copenhagen N, Denmark; Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden.
| |
|
20
|
Kim T, Lee S, Kwak Y, Choi MS, Park J, Hwang SJ, Kim SG. READRetro: natural product biosynthesis predicting with retrieval-augmented dual-view retrosynthesis. The New Phytologist 2024; 243:2512-2527. [PMID: 39081009 DOI: 10.1111/nph.20012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 07/08/2024] [Indexed: 08/23/2024]
Abstract
Plants, as sessile organisms, produce various secondary metabolites to interact with the environment. These chemicals have fascinated the plant science community because of their ecological significance and notable biological activity. However, predicting the complete biosynthetic pathways from target molecules to metabolic building blocks remains a challenge. Here, we propose retrieval-augmented dual-view retrosynthesis (READRetro) as a practical bio-retrosynthesis tool to predict the biosynthetic pathways of plant natural products. Conventional bio-retrosynthesis models have been limited in their ability to predict biosynthetic pathways for natural products. READRetro was optimized for the prediction of complex metabolic pathways by incorporating cutting-edge deep learning architectures, an ensemble approach, and two retrievers. Evaluation of single- and multi-step retrosynthesis showed that each component of READRetro significantly improved its ability to predict biosynthetic pathways. READRetro was also able to propose the known pathways of secondary metabolites such as monoterpene indole alkaloids and the unknown pathway of menisdaurilide, demonstrating its applicability to real-world bio-retrosynthesis of plant natural products. For researchers interested in the biosynthesis and production of secondary metabolites, a user-friendly website (https://readretro.net) and the open-source code of READRetro have been made available.
Affiliation(s)
- Taein Kim
- Department of Biological Sciences, KAIST, Daejeon, 34141, Korea
| | - Seul Lee
- Kim Jaechul Graduate School of AI, KAIST, Daejeon, 34141, Korea
| | - Yejin Kwak
- Department of BioMedical Convergence Engineering, Pusan National University, Yangsan, 50612, Korea
| | - Min-Soo Choi
- Department of Biological Sciences, KAIST, Daejeon, 34141, Korea
| | - Jeongbin Park
- Department of BioMedical Convergence Engineering, Pusan National University, Yangsan, 50612, Korea
| | - Sung Ju Hwang
- Kim Jaechul Graduate School of AI, KAIST, Daejeon, 34141, Korea
- School of Computing, KAIST, Daejeon, 34141, Korea
| | - Sang-Gyu Kim
- Department of Biological Sciences, KAIST, Daejeon, 34141, Korea
| |
|
21
|
Gricourt G, Meyer P, Duigou T, Faulon JL. Artificial Intelligence Methods and Models for Retro-Biosynthesis: A Scoping Review. ACS Synth Biol 2024; 13:2276-2294. [PMID: 39047143 PMCID: PMC11334239 DOI: 10.1021/acssynbio.4c00091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 06/14/2024] [Accepted: 06/14/2024] [Indexed: 07/27/2024]
Abstract
Retrosynthesis aims to efficiently plan the synthesis of desirable chemicals by strategically breaking down molecules into readily available building block compounds. Having a long history in chemistry, retro-biosynthesis has also been used in the fields of biocatalysis and synthetic biology. Artificial intelligence (AI) is driving us toward new frontiers in synthesis planning and the exploration of chemical spaces, arriving at an opportune moment for promoting bioproduction that would better align with green chemistry, enhancing environmental practices. In this review, we summarize the recent advancements in the application of AI methods and models for retrosynthetic and retro-biosynthetic pathway design. These techniques can be based either on reaction templates or generative models and require scoring functions and planning strategies to navigate through the retrosynthetic graph of possibilities. We finally discuss limitations and promising research directions in this field.
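To make the combination of reaction templates, a scoring function, and a planning strategy mentioned in this abstract more tangible, here is a minimal sketch of a search over a toy retrosynthetic graph. It does not reproduce any tool covered by the review. The `TEMPLATES` table, the molecule names, and the step costs are invented for illustration, and uniform-cost (best-first) search is used here as one simple planning strategy among many.

```python
import heapq
from itertools import count

# Hypothetical toy "templates": product -> list of (precursors, step cost).
TEMPLATES = {
    "target": [(("A", "B"), 1.0), (("C",), 3.0)],
    "A": [(("bb1",), 1.0)],
    "B": [(("bb2", "bb3"), 2.0)],
    "C": [(("bb1", "bb2"), 0.5)],
}
BUILDING_BLOCKS = {"bb1", "bb2", "bb3"}  # assumed purchasable compounds


def plan(target, templates, building_blocks, max_expansions=1000):
    """Uniform-cost search from the target back to building blocks.

    Returns (total cost, list of (product, precursors) steps), or None
    if no route is found within the expansion budget.
    """
    tie = count()  # tie-breaker so the heap never compares the payload
    frontier = [(0.0, next(tie), (target,), [])]
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, open_mols, steps = heapq.heappop(frontier)
        # Molecules that are building blocks need no further disconnection.
        open_mols = tuple(m for m in open_mols if m not in building_blocks)
        if not open_mols:
            return cost, steps
        expansions += 1
        mol, rest = open_mols[0], open_mols[1:]
        for precursors, step_cost in templates.get(mol, []):
            heapq.heappush(
                frontier,
                (cost + step_cost, next(tie), rest + precursors,
                 steps + [(mol, precursors)]),
            )
    return None


best = plan("target", TEMPLATES, BUILDING_BLOCKS)
```

In this toy graph the route through `C` (total cost 3.5) beats the route through `A` and `B` (total cost 4.0), even though its first step is more expensive, which is why a planner needs to score whole routes and not just single steps.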
Affiliation(s)
- Guillaume Gricourt
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
| | - Philippe Meyer
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
| | - Thomas Duigou
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
| | - Jean-Loup Faulon
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
- The University of Manchester, Manchester Institute of Biotechnology, Manchester M1 7DN, U.K.
| |
|
22
|
Song Y, Prather KLJ. Strategies in engineering sustainable biochemical synthesis through microbial systems. Curr Opin Chem Biol 2024; 81:102493. [PMID: 38971129 DOI: 10.1016/j.cbpa.2024.102493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 05/24/2024] [Accepted: 06/05/2024] [Indexed: 07/08/2024]
Abstract
Growing environmental concerns and the urgency to address climate change have increased demand for the development of sustainable alternatives to fossil-derived fuels and chemicals. Microbial systems, possessing inherent biosynthetic capabilities, present a promising approach for achieving this goal. This review discusses the coupling of systems and synthetic biology to enable the elucidation and manipulation of microbial phenotypes for the production of chemicals that can substitute for petroleum-derived counterparts and contribute to advancing green biotechnology. The integration of artificial intelligence with metabolic engineering to facilitate precise and data-driven design of biosynthetic pathways is also discussed, along with the identification of current limitations and proposition of strategies for optimizing biosystems, thereby propelling the field of chemical biology towards sustainable chemical production.
Affiliation(s)
- Yoseb Song
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Kristala L J Prather
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
| |
|
23
|
Zeng T, Jin Z, Zheng S, Yu T, Wu R. Developing BioNavi for Hybrid Retrosynthesis Planning. JACS Au 2024; 4:2492-2502. [PMID: 39055138 PMCID: PMC11267531 DOI: 10.1021/jacsau.4c00228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/18/2024] [Accepted: 06/20/2024] [Indexed: 07/27/2024]
Abstract
Illuminating synthetic pathways is essential for producing valuable chemicals, such as bioactive molecules. Chemical and biological syntheses are crucial, and their integration often leads to more efficient and sustainable pathways. Despite the rapid development of retrosynthesis models, few of them consider both chemical and biological syntheses, hindering the pathway design for high-value chemicals. Here, we propose BioNavi, which incorporates multitask learning and reaction templates into a deep learning-driven model to design hybrid synthesis pathways in a more interpretable manner. BioNavi outperforms existing approaches on different data sets, achieving a 75% hit rate in replicating reported biosynthetic pathways and displaying superior ability in designing hybrid synthesis pathways. Additional case studies further illustrate the potential application of BioNavi in de novo pathway design. The enhanced web server (http://biopathnavi.qmclab.com/bionavi/) simplifies input operations and implements step-by-step exploration according to user experience. We show that BioNavi is a handy navigator for designing synthetic pathways for various chemicals.
Affiliation(s)
- Tao Zeng
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P. R. China
| | - Zhehao Jin
- Center for Synthetic Biochemistry, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen 518055, P. R. China
| | - Shuangjia Zheng
- Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
| | - Tao Yu
- Center for Synthetic Biochemistry, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen 518055, P. R. China
| | - Ruibo Wu
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P. R. China
| |
|
24
|
Shi Z, Wang D, Li Y, Deng R, Lin J, Liu C, Li H, Wang R, Zhao M, Mao Z, Yuan Q, Liao X, Ma H. REME: an integrated platform for reaction enzyme mining and evaluation. Nucleic Acids Res 2024; 52:W299-W305. [PMID: 38769057 PMCID: PMC11223788 DOI: 10.1093/nar/gkae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 04/16/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024] Open
Abstract
A key challenge in pathway design is finding proper enzymes that can be engineered to catalyze a non-natural reaction. Although existing tools can identify potential enzymes based on similar reactions, these tools encounter several issues. Firstly, the calculated similar reactions may not even have the same reaction type. Secondly, the associated enzymes are often numerous and identifying the most promising candidate enzymes is difficult due to the lack of data for evaluation. Thirdly, existing web tools do not provide interactive functions that enable users to fine-tune results based on their expertise. Here, we present REME (https://reme.biodesign.ac.cn/), the first integrated web platform for reaction enzyme mining and evaluation. Combining atom-to-atom mapping, atom type change identification, and reaction similarity calculation enables quick ranking and visualization of reactions similar to an objective non-natural reaction. Additional functionality enables users to filter similar reactions by their specified functional groups and candidate enzymes can be further filtered (e.g. by organisms) or expanded by Enzyme Commission number (EC) or sequence homology. Afterward, enzyme attributes (such as kcat, Km, optimal temperature and pH) can be assessed with deep learning-based methods, facilitating the swift identification of potential enzymes that can catalyze the non-natural reaction.
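The idea of ranking known reactions by similarity to a query reaction and then filtering the hits can be sketched as follows. This is not REME's algorithm: the abstract does not specify the similarity measure, so Tanimoto similarity over sets of atom-type-change labels is an assumption, and the labels, reaction identifiers, and the `min_sim` threshold are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def tanimoto(a: Iterable[str], b: Iterable[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty feature sets are identical
    return len(a & b) / len(a | b)


def rank_similar(query: Iterable[str],
                 known: Dict[str, Iterable[str]],
                 min_sim: float = 0.0) -> List[Tuple[float, str]]:
    """Rank known reactions by similarity to the query, most similar first."""
    query = set(query)
    scored = [(tanimoto(query, feats), rid) for rid, feats in known.items()]
    kept = [(s, rid) for s, rid in scored if s >= min_sim]
    return sorted(kept, key=lambda item: (-item[0], item[1]))


# Hypothetical atom-type-change labels for a query and three known reactions.
query = {"C-OH>C=O", "O2>H2O2"}
known = {
    "r1": {"C-OH>C=O", "O2>H2O2"},
    "r2": {"C-OH>C=O"},
    "r3": {"N-H>N-C"},
}
hits = rank_similar(query, known, min_sim=0.1)
```

Here `hits` keeps `r1` and `r2` and drops `r3`, which shares no atom-type change with the query. A real workflow would then look up the enzymes attached to the surviving reactions.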
Affiliation(s)
- Zhenkun Shi
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| | - Dehang Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
| | - Yang Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- University of Chinese Academy of Sciences, Beijing 101408, PR China
| | - Rui Deng
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Jiawei Lin
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Cui Liu
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Haoran Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Ruoyu Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Muqiang Zhao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Zhitao Mao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Qianqian Yuan
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Xiaoping Liao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Haihe Laboratory of Synthetic Biology, Tianjin 300308, PR China
- Hongwu Ma
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
25
Han Y, Zhang H, Zeng Z, Liu Z, Lu D, Liu Z. Descriptor-augmented machine learning for enzyme-chemical interaction predictions. Synth Syst Biotechnol 2024; 9:259-268. [PMID: 38450325 PMCID: PMC10915406 DOI: 10.1016/j.synbio.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/08/2024] [Open Access]
Abstract
Descriptors play a pivotal role in enzyme design for the greener synthesis of biochemicals, as they can characterize enzymes and chemicals from physicochemical and evolutionary perspectives. This study examined the effects of various descriptors on the performance of a Random Forest model used for enzyme-chemical relationship prediction. We curated activity data for seven specific enzyme families from the literature and developed a pipeline for evaluating machine learning model performance using 10-fold cross-validation. The influence of protein and chemical descriptors was assessed in three scenarios: predicting the activity of unknown relations between known enzymes and known chemicals (new relationship evaluation), predicting the activity of novel enzymes on known chemicals (new enzyme evaluation), and predicting the activity of new chemicals on known enzymes (new chemical evaluation). The results showed that protein descriptors significantly enhanced the classification performance of the model in the new enzyme evaluation in three out of the seven datasets with the greatest number of enzymes, whereas chemical descriptors appeared to have no effect. A variety of sequence-based and structure-based protein descriptors were constructed, among which the esm-2 descriptor achieved the best results. Using enzyme families as labels showed that descriptors could cluster proteins well, which could explain the contributions of descriptors to the machine learning model. Conversely, in the new chemical evaluation, chemical descriptors yielded significant improvements in four out of the seven datasets, while protein descriptors appeared to have no effect. We attempted to evaluate the generalization ability of the model by correlating the statistics of the datasets with the performance of the models. The results showed that datasets with higher sequence similarity were more likely to get better results in the new enzyme evaluation, and datasets with more enzymes were more likely to benefit from the protein descriptor strategy. This work provides guidance for the development of machine learning models for specific enzyme families.
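The three evaluation scenarios in this abstract differ mainly in what is held out of training. The sketch below shows one way such splits could be built, assuming activity records are `(enzyme, chemical, label)` tuples. The function name, the deterministic round-robin fold assignment, and the toy data are hypothetical and not taken from the paper, which reports a Random Forest evaluated with 10-fold cross-validation.

```python
def k_fold_splits(records, k, scenario):
    """Yield (train, test) record lists for one evaluation scenario.

    records  : list of (enzyme, chemical, label) tuples
    scenario : "relationship" holds out individual enzyme-chemical pairs,
               "enzyme" holds out whole enzymes,
               "chemical" holds out whole chemicals.
    """
    # Column used to group records; None means each record is its own group.
    column = {"relationship": None, "enzyme": 0, "chemical": 1}[scenario]

    def group(index, record):
        return index if column is None else record[column]

    groups = sorted({group(i, r) for i, r in enumerate(records)})
    # Round-robin assignment of groups to folds (no shuffling, for simplicity).
    fold_of = {g: n % k for n, g in enumerate(groups)}

    for fold in range(k):
        train = [r for i, r in enumerate(records) if fold_of[group(i, r)] != fold]
        test = [r for i, r in enumerate(records) if fold_of[group(i, r)] == fold]
        yield train, test


# Toy data: 4 enzymes x 3 chemicals with made-up binary activity labels.
records = [(f"E{i}", f"C{j}", (i + j) % 2) for i in range(4) for j in range(3)]

for train, test in k_fold_splits(records, k=2, scenario="enzyme"):
    held_out = sorted({enzyme for enzyme, _, _ in test})
    print(f"held-out enzymes: {held_out}, train size: {len(train)}")
```

Grouping by enzyme or chemical ensures that a held-out enzyme or chemical never appears in the training fold, which is what distinguishes the "new enzyme" and "new chemical" settings from the "new relationship" setting.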
Affiliation(s)
- Yilei Han
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Haoye Zhang
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Zheni Zeng
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Zhiyuan Liu
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
- Diannan Lu
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Zheng Liu
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
26
Orsi E, Schada von Borzyskowski L, Noack S, Nikel PI, Lindner SN. Automated in vivo enzyme engineering accelerates biocatalyst optimization. Nat Commun 2024; 15:3447. [PMID: 38658554 PMCID: PMC11043082 DOI: 10.1038/s41467-024-46574-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 12/21/2023] [Accepted: 03/04/2024] [Indexed: 04/26/2024] [Open Access]
Abstract
Achieving cost-competitive bio-based processes requires development of stable and selective biocatalysts. Their realization through in vitro enzyme characterization and engineering is mostly low throughput and labor-intensive. Therefore, strategies for increasing throughput while diminishing manual labor are gaining momentum, such as in vivo screening and evolution campaigns. Computational tools like machine learning further support enzyme engineering efforts by widening the explorable design space. Here, we propose an integrated solution to enzyme engineering challenges whereby ML-guided, automated workflows (including library generation, implementation of hypermutation systems, adapted laboratory evolution, and in vivo growth-coupled selection) could be realized to accelerate pipelines towards superior biocatalysts.
Affiliation(s)
- Enrico Orsi
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
- Stephan Noack
- Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich, 52425, Jülich, Germany
- Pablo I Nikel
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
- Steffen N Lindner
- Max Planck Institute of Molecular Plant Physiology, 14476, Potsdam-Golm, Germany.
- Department of Biochemistry, Charité Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität, 10117, Berlin, Germany.
27
Chang J, Fan X, Tian B. DeepP450: Predicting Human P450 Activities of Small Molecules by Integrating Pretrained Protein Language Model and Molecular Representation. J Chem Inf Model 2024; 64:3149-3160. [PMID: 38587937 DOI: 10.1021/acs.jcim.4c00115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
Cytochrome P450 enzymes (CYPs) play a crucial role in Phase I drug metabolism in the human body, and CYP activity toward compounds can significantly affect druggability, making early prediction of CYP activity and substrate identification essential for therapeutic development. Here, we established a deep learning model for assessing potential CYP substrates, DeepP450, by fine-tuning protein and molecule pretrained models through feature integration with cross-attention and self-attention layers. This model exhibited high prediction accuracy (0.92) on the test set, with area under the receiver operating characteristic curve (AUROC) values ranging from 0.89 to 0.98 in substrate/nonsubstrate predictions across the nine major human CYPs, surpassing current benchmarks for CYP activity prediction. Notably, DeepP450 uses only one model to predict substrates/nonsubstrates for any of the nine CYPs and exhibits certain generalizability on novel compounds and different categories of human CYPs, which could greatly facilitate early stage drug design by avoiding CYP-reactive compounds.
Affiliation(s)
- Jiamin Chang
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Xiaoyu Fan
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Boxue Tian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
28
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Central Science 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
29
van Sluijs B, Zhou T, Helwig B, Baltussen MG, Nelissen FHT, Heus HA, Huck WTS. Iterative design of training data to control intricate enzymatic reaction networks. Nat Commun 2024; 15:1602. [PMID: 38383500 PMCID: PMC10881569 DOI: 10.1038/s41467-024-45886-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/24/2023] [Accepted: 02/06/2024] [Indexed: 02/23/2024] [Open Access]
Abstract
Kinetic modeling of in vitro enzymatic reaction networks is vital to understand and control the complex behaviors emerging from the nonlinear interactions inside. However, modeling is severely hampered by the lack of training data. Here, we introduce a methodology that combines an active learning-like approach and flow chemistry to efficiently create optimized datasets for a highly interconnected enzymatic reaction network with multiple sub-pathways. The optimal experimental design (OED) algorithm designs a sequence of out-of-equilibrium perturbations to maximize the information about the reaction kinetics, yielding a descriptive model that allows control of the output of the network towards any cost function. We experimentally validate the model by forcing the network to produce different product ratios while maintaining a minimum level of overall conversion efficiency. Our workflow scales with the complexity of the system and enables the optimization of previously unobtainable network outputs.
Affiliation(s)
- Bob van Sluijs
- Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Tao Zhou
- Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands.
- Britta Helwig
- Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Mathieu G Baltussen
- Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Frank H T Nelissen
- Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Hans A Heus
- Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Wilhelm T S Huck
- Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands.
30
Zhang D, Wang Z, Oberschelp C, Bradford E, Hellweg S. Enhanced Deep-Learning Model for Carbon Footprints of Chemicals. ACS Sustainable Chemistry & Engineering 2024; 12:2700-2708. [PMID: 38389904 PMCID: PMC10880087 DOI: 10.1021/acssuschemeng.3c07038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 01/17/2024] [Accepted: 01/17/2024] [Indexed: 02/24/2024]
Abstract
Millions of chemicals have been designed; however, their product carbon footprints (PCFs) are largely unknown, leaving questions about their sustainability. This general lack of PCF data is because the data needed for comprehensive environmental analyses are typically not available in the early molecular design stages. Several predictive tools have been developed to estimate the PCF of chemicals, which are applicable to only a narrow range of common chemicals and have limited predictive ability. Here, we propose FineChem 2, which is based on a novel transformer framework and first-hand industry data, for accurately predicting the PCF of chemicals. Compared to previous tools, FineChem 2 demonstrates significantly better predictive power, and its applicability domains are improved by ∼75% on a diverse set of chemicals on the global market, including the high-production-volume chemicals identified by regulators, daily chemicals, and chemical additives in food and plastics. In addition, through better interpretability from the attention mechanism, FineChem 2 may successfully identify PCF-intensive substructures and critical raw materials of chemicals, providing insights into the design of more sustainable molecules and processes. Therefore, we highlight FineChem 2 for estimating the PCF of chemicals, contributing to advancements in the sustainable transition of the global chemical industry.
Affiliation(s)
- Dachuan Zhang
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
- Zhanyun Wang
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
- Technology and Society Laboratory, Empa-Swiss Federal Laboratories for Materials Science and Technology, St. Gallen CH-9014, Switzerland
- Christopher Oberschelp
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
- Eric Bradford
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
- Stefanie Hellweg
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
31
Qin Z, Zhou Y, Li Z, Höhne M, Bornscheuer UT, Wu S. Production of Biobased Ethylbenzene by Cascade Biocatalysis with an Engineered Photodecarboxylase. Angew Chem Int Ed Engl 2024; 63:e202314566. [PMID: 37947487 DOI: 10.1002/anie.202314566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/06/2023] [Accepted: 11/10/2023] [Indexed: 11/12/2023]
Abstract
Production of commodity chemicals, such as benzene, toluene, ethylbenzene, and xylenes (BTEX), from renewable resources is key for a sustainable society. Biocatalysis enables one-pot multistep transformation of bioresources under mild conditions, yet it is often limited to biochemicals. Herein, we developed a non-natural three-enzyme cascade for one-pot conversion of biobased l-phenylalanine into ethylbenzene. The key rate-limiting photodecarboxylase was subjected to structure-guided semirational engineering, and a triple mutant CvFAP(Y466T/P460A/G462I) was obtained with a 6.3-fold higher productivity. With this improved photodecarboxylase, an optimized two-cell sequential process was developed to convert l-phenylalanine into ethylbenzene with 82 % conversion. The cascade reaction was integrated with fermentation to achieve the one-pot bioproduction of ethylbenzene from biobased glycerol, demonstrating the potential of cascade biocatalysis plus enzyme engineering for the production of biobased commodity chemicals.
Affiliation(s)
- Zhaoyang Qin
- National Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Street, Wuhan, 430070, P. R. China
- Yi Zhou
- National Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Street, Wuhan, 430070, P. R. China
- Zhi Li
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, Singapore, 117585, Singapore
- Matthias Höhne
- Institute of Chemistry, Technische Universität Berlin, Müller-Breslau-Str. 10, 10623, Berlin, Germany
- Uwe T Bornscheuer
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix Hausdorff-Str. 4, 17489, Greifswald, Germany
- Shuke Wu
- National Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Street, Wuhan, 430070, P. R. China
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix Hausdorff-Str. 4, 17489, Greifswald, Germany
32
Su Y, Mangus AM, Cordell WT, Pfleger BF. Overcoming barriers to medium-chain fatty alcohol production. Curr Opin Biotechnol 2024; 85:103063. [PMID: 38219523 PMCID: PMC10922944 DOI: 10.1016/j.copbio.2023.103063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 12/19/2023] [Accepted: 12/20/2023] [Indexed: 01/16/2024]
Abstract
Medium-chain fatty alcohols (mcFaOHs) are aliphatic primary alcohols containing six to twelve carbons that are widely used in materials, pharmaceuticals, and cosmetics. Microbial biosynthesis has been touted as a route to less-abundant chain-length molecules and as a sustainable alternative to current petrochemical processes. Several metabolic engineering strategies for producing mcFaOHs have been demonstrated in the literature, yet processes continue to suffer from poor selectivity and mcFaOH toxicity, leading to reduced titers, rates, and yields of the desired compounds. This opinion examines the current state of microbial mcFaOH biosynthesis, summarizing engineering efforts to tailor selectivity and improve product tolerance by implementing engineering strategies that circumvent or overcome mcFaOH toxicity.
Affiliation(s)
- Yun Su
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Anna M Mangus
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- William T Cordell
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Brian F Pfleger
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA.
33
Kries H, Trottmann F, Hertweck C. Novel Biocatalysts from Specialized Metabolism. Angew Chem Int Ed Engl 2024; 63:e202309284. [PMID: 37737720 DOI: 10.1002/anie.202309284] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/30/2023] [Revised: 09/21/2023] [Accepted: 09/22/2023] [Indexed: 09/23/2023]
Abstract
Enzymes are increasingly recognized as valuable (bio)catalysts that complement existing synthetic methods. However, the range of biotransformations used in the laboratory is limited. Here we give an overview on the biosynthesis-inspired discovery of novel biocatalysts that address various synthetic challenges. Prominent examples from this dynamic field highlight remarkable enzymes for protecting-group-free amide formation and modification, control of pericyclic reactions, stereoselective hetero- and polycyclizations, atroposelective aryl couplings, site-selective C-H activations, introduction of ring strain, and N-N bond formation. We also explore unusual functions of cytochrome P450 monooxygenases, radical SAM-dependent enzymes, flavoproteins, and enzymes recruited from primary metabolism, which offer opportunities for synthetic biology, enzyme engineering, directed evolution, and catalyst design.
Affiliation(s)
- Hajo Kries
- Junior Research Group Biosynthetic Design of Natural Products, Leibniz Institute for Natural Product Research and Infection Biology (HKI), Beutenbergstr. 11a, 07745, Jena, Germany
- Department of Chemistry, University of Bayreuth, Universitätsstr. 30, 95440, Bayreuth, Germany
- Felix Trottmann
- Department of Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology (HKI), Beutenbergstr. 11a, 07745, Jena, Germany
- Christian Hertweck
- Department of Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology (HKI), Beutenbergstr. 11a, 07745, Jena, Germany
- Faculty of Biological Sciences, Friedrich Schiller University Jena, 07743, Jena, Germany
34
Xing H, Cai P, Liu D, Han M, Liu J, Le Y, Zhang D, Hu QN. High-throughput prediction of enzyme promiscuity based on substrate-product pairs. Brief Bioinform 2024; 25:bbae089. [PMID: 38487850 PMCID: PMC10940840 DOI: 10.1093/bib/bbae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/20/2024] [Accepted: 02/03/2024] [Indexed: 03/18/2024] [Open Access]
Abstract
The screening of enzymes for catalyzing specific substrate-product pairs is often constrained in the realms of metabolic engineering and synthetic biology. Existing tools based on substrate and reaction similarity predominantly rely on prior knowledge, demonstrating limited extrapolative capabilities and an inability to incorporate custom candidate-enzyme libraries. Addressing these limitations, we have developed the Substrate-product Pair-based Enzyme Promiscuity Prediction (SPEPP) model. This innovative approach utilizes transfer learning and transformer architecture to predict enzyme promiscuity, thereby elucidating the intricate interplay between enzymes and substrate-product pairs. SPEPP exhibited robust predictive ability, eliminating the need for prior knowledge of reactions and allowing users to define their own candidate-enzyme libraries. It can be seamlessly integrated into various applications, including metabolic engineering, de novo pathway design, and hazardous material degradation. To better assist metabolic engineers in designing and refining biochemical pathways, particularly those without programming skills, we also designed EnzyPick, an easy-to-use web server for enzyme screening based on SPEPP. EnzyPick is accessible at http://www.biosynther.com/enzypick/.
Affiliation(s)
- Huadong Xing
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Pengli Cai
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Dongliang Liu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Mengying Han
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan 430072, China
- Yingying Le
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Dachuan Zhang
- Institute of Environmental Engineering, ETH Zurich, Laura-Hezner-Weg 7, 8093 Zurich, Switzerland
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
35
Yuan B, Yang D, Qu G, Turner NJ, Sun Z. Biocatalytic reductive aminations with NAD(P)H-dependent enzymes: enzyme discovery, engineering and synthetic applications. Chem Soc Rev 2024; 53:227-262. [PMID: 38059509 DOI: 10.1039/d3cs00391d] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 12/08/2023]
Abstract
Chiral amines are pivotal building blocks for the pharmaceutical industry. Asymmetric reductive amination is one of the most efficient and atom economic methodologies for the synthesis of optically active amines. Among the various strategies available, NAD(P)H-dependent amine dehydrogenases (AmDHs) and imine reductases (IREDs) are robust enzymes that are available from various sources and capable of utilizing a broad range of substrates with high activities and stereoselectivities. AmDHs and IREDs operate via similar mechanisms, both involving a carbinolamine intermediate followed by hydride transfer from the co-factor. In addition, both groups catalyze the formation of primary and secondary amines utilizing both organic and inorganic amine donors. In this review, we discuss advances in developing AmDHs and IREDs as biocatalysts and focus on evolutionary history, substrate scope and applications of the enzymes to provide an outlook on emerging industrial biotechnologies of chiral amine production.
Affiliation(s)
- Bo Yuan
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West 7th Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Dameng Yang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
- Ge Qu
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West 7th Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Nicholas J Turner
- Department of Chemistry, Manchester Institute of Biotechnology, University of Manchester, Manchester M1 7DN, UK.
- Zhoutong Sun
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West 7th Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
36
Boob AG, Chen J, Zhao H. Enabling pathway design by multiplex experimentation and machine learning. Metab Eng 2024; 81:70-87. [PMID: 38040110 DOI: 10.1016/j.ymben.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 11/01/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023]
Abstract
The remarkable metabolic diversity observed in nature has provided a foundation for sustainable production of a wide array of valuable molecules. However, transferring the biosynthetic pathway to the desired host often runs into inherent failures that arise from intermediate accumulation and reduced flux resulting from competing pathways within the host cell. Moreover, the conventional trial and error methods utilized in pathway optimization struggle to fully grasp the intricacies of installed pathways, leading to time-consuming and labor-intensive experiments, ultimately resulting in suboptimal yields. Considering these obstacles, there is a pressing need to explore the enzyme expression landscape and identify the optimal pathway configuration for enhanced production of molecules. This review delves into recent advancements in pathway engineering, with a focus on multiplex experimentation and machine learning techniques. These approaches play a pivotal role in overcoming the limitations of traditional methods, enabling exploration of a broader design space and increasing the likelihood of discovering optimal pathway configurations for enhanced production of molecules. We discuss several tools and strategies for pathway design, construction, and optimization for sustainable and cost-effective microbial production of molecules ranging from bulk to fine chemicals. We also highlight major successes in academia and industry through compelling case studies.
Affiliation(s)
- Aashutosh Girish Boob
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Junyu Chen
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States.
37
|
Xie WJ, Warshel A. Harnessing generative AI to decode enzyme catalysis and evolution for enhanced engineering. Natl Sci Rev 2023; 10:nwad331. [PMID: 38299119 PMCID: PMC10829072 DOI: 10.1093/nsr/nwad331] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2023] [Revised: 09/27/2023] [Accepted: 10/13/2023] [Indexed: 02/02/2024] Open
Abstract
Enzymes, as paramount protein catalysts, occupy a central role in fostering remarkable progress across numerous fields. However, the intricacy of sequence-function relationships continues to obscure our grasp of enzyme behaviors and curtails our capabilities in rational enzyme engineering. Generative artificial intelligence (AI), known for its proficiency in handling intricate data distributions, holds the potential to offer novel perspectives in enzyme research. Generative models could discern elusive patterns within the vast sequence space and uncover new functional enzyme sequences. This review highlights the recent advancements in employing generative AI for enzyme sequence analysis. We delve into the impact of generative AI in predicting mutation effects on enzyme fitness, catalytic activity and stability, rationalizing the laboratory evolution of de novo enzymes, and decoding protein sequence semantics and their application in enzyme engineering. Notably, the prediction of catalytic activity and stability of enzymes using natural protein sequences serves as a vital link, indicating how enzyme catalysis shapes enzyme evolution. Overall, we foresee that the integration of generative AI into enzyme studies will remarkably enhance our knowledge of enzymes and expedite the creation of superior biocatalysts.
Affiliation(s)
- Wen Jun Xie
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, Genetics Institute, University of Florida, Gainesville, FL 32610, USA
- Arieh Warshel
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
38
|
Buller R, Lutz S, Kazlauskas RJ, Snajdrova R, Moore JC, Bornscheuer UT. From nature to industry: Harnessing enzymes for biocatalysis. Science 2023; 382:eadh8615. [PMID: 37995253 DOI: 10.1126/science.adh8615] [Citation(s) in RCA: 143] [Impact Index Per Article: 71.5] [Received: 07/14/2023] [Accepted: 10/17/2023] [Indexed: 11/25/2023]
Abstract
Biocatalysis harnesses enzymes to make valuable products. This green technology is used in countless applications from bench scale to industrial production and allows practitioners to access complex organic molecules, often with fewer synthetic steps and reduced waste. The last decade has seen an explosion in the development of experimental and computational tools to tailor enzymatic properties, equipping enzyme engineers with the ability to create biocatalysts that perform reactions not present in nature. By using (chemo)-enzymatic synthesis routes or orchestrating intricate enzyme cascades, scientists can synthesize elaborate targets ranging from DNA and complex pharmaceuticals to starch made in vitro from CO2-derived methanol. In addition, new chemistries have emerged through the combination of biocatalysis with transition metal catalysis, photocatalysis, and electrocatalysis. This review highlights recent key developments, identifies current limitations, and provides a future prospect for this rapidly developing technology.
Affiliation(s)
- R Buller
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, 8820 Wädenswil, Switzerland
- S Lutz
- Codexis Incorporated, Redwood City, CA 94063, USA
- R J Kazlauskas
- Department of Biochemistry, Molecular Biology and Biophysics, Biotechnology Institute, University of Minnesota, Saint Paul, MN 55108, USA
- R Snajdrova
- Novartis Institutes for BioMedical Research, Global Discovery Chemistry, 4056 Basel, Switzerland
- J C Moore
- MRL, Merck & Co., Rahway, NJ 07065, USA
- U T Bornscheuer
- Institute of Biochemistry, Dept. of Biotechnology and Enzyme Catalysis, Greifswald University, Greifswald, Germany
39
|
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
40
|
Merzbacher C, Oyarzún DA. Applications of artificial intelligence and machine learning in dynamic pathway engineering. Biochem Soc Trans 2023; 51:1871-1879. [PMID: 37656433 PMCID: PMC10657174 DOI: 10.1042/bst20221542] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/24/2023] [Revised: 08/07/2023] [Accepted: 08/21/2023] [Indexed: 09/02/2023]
Abstract
Dynamic pathway engineering aims to build metabolic production systems embedded with intracellular control mechanisms for improved performance. These control systems enable host cells to self-regulate the temporal activity of a production pathway in response to perturbations, using a combination of biosensors and feedback circuits for controlling expression of heterologous enzymes. Pathway design, however, requires assembling together multiple biological parts into suitable circuit architectures, as well as careful calibration of the function of each component. This results in a large design space that is costly to navigate through experimentation alone. Methods from artificial intelligence (AI) and machine learning are gaining increasing attention as tools to accelerate the design cycle, owing to their ability to identify hidden patterns in data and rapidly screen through large collections of designs. In this review, we discuss recent developments in the application of machine learning methods to the design of dynamic pathways and their components. We cover recent successes and offer perspectives for future developments in the field. The integration of AI into metabolic engineering pipelines offers great opportunities to streamline design and discover control systems for improved production of high-value chemicals.
Affiliation(s)
- Diego A. Oyarzún
- School of Informatics, University of Edinburgh, Edinburgh, U.K.
- The Alan Turing Institute, London, U.K.
- School of Biological Sciences, University of Edinburgh, Edinburgh, U.K.
41
|
Xie WJ, Warshel A. Harnessing Generative AI to Decode Enzyme Catalysis and Evolution for Enhanced Engineering. bioRxiv 2023:2023.10.10.561808. [PMID: 37873334 PMCID: PMC10592750 DOI: 10.1101/2023.10.10.561808] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/25/2023]
Abstract
Enzymes, as paramount protein catalysts, occupy a central role in fostering remarkable progress across numerous fields. However, the intricacy of sequence-function relationships continues to obscure our grasp of enzyme behaviors and curtails our capabilities in rational enzyme engineering. Generative artificial intelligence (AI), known for its proficiency in handling intricate data distributions, holds the potential to offer novel perspectives in enzyme research. By applying generative models, we could discern elusive patterns within the vast sequence space and uncover new functional enzyme sequences. This review highlights the recent advancements in employing generative AI for enzyme sequence analysis. We delve into the impact of generative AI in predicting mutation effects on enzyme fitness, activity, and stability, rationalizing the laboratory evolution of de novo enzymes, decoding protein sequence semantics, and its applications in enzyme engineering. Notably, the prediction of enzyme activity and stability using natural enzyme sequences serves as a vital link, indicating how enzyme catalysis shapes enzyme evolution. Overall, we foresee that the integration of generative AI into enzyme studies will remarkably enhance our knowledge of enzymes and expedite the creation of superior biocatalysts.
Affiliation(s)
- Wen Jun Xie
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development (CNPD3), Genetics Institute, University of Florida, Gainesville, FL, USA
- Arieh Warshel
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA
42
|
Michailidou F. The Scent of Change: Sustainable Fragrances Through Industrial Biotechnology. Chembiochem 2023; 24:e202300309. [PMID: 37668275 DOI: 10.1002/cbic.202300309] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/19/2023] [Revised: 05/29/2023] [Indexed: 09/06/2023]
Abstract
Current environmental and safety considerations urge innovation to address the need for sustainable high-value chemicals that are embraced by consumers. This review discusses the concept of sustainable fragrances, as high-value, everyday and everywhere chemicals. Current and emerging technologies represent an opportunity to produce fragrances in an environmentally and socially responsible way. Biotechnology, including fermentation, biocatalysis, and genetic engineering, has the potential to reduce the environmental footprint of fragrance production while maintaining quality and consistency. Computational and in silico methods, including machine learning (ML), are also likely to augment the capabilities of sustainable fragrance production. Continued innovation and collaboration will be crucial to the future of sustainable fragrances, with a focus on developing novel sustainable ingredients, as well as ethical sourcing practices.
Affiliation(s)
- Freideriki Michailidou
- Department of Health Sciences and Technology, ETH Zurich, Schmelzbergstrasse 9, 8092, Zürich, Switzerland
43
|
Yuan Y, Shi C, Zhao H. Machine Learning-Enabled Genome Mining and Bioactivity Prediction of Natural Products. ACS Synth Biol 2023; 12:2650-2662. [PMID: 37607352 PMCID: PMC10615616 DOI: 10.1021/acssynbio.3c00234] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 08/24/2023]
Abstract
Natural products (NPs) produced by microorganisms and plants are a major source of drugs, herbicides, and fungicides. Thanks to recent advances in DNA sequencing, bioinformatics, and genome mining tools, a vast amount of data on NP biosynthesis has been generated over the years, which has been increasingly exploited to develop machine learning (ML) tools for NP discovery. In this review, we discuss the latest advances in developing and applying ML tools for exploring the potential NPs that can be encoded by genomic language and predicting the types of bioactivities of NPs. We also examine the technical challenges associated with the development and application of ML tools for NP research.
Affiliation(s)
- Yujie Yuan
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Chengyou Shi
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Huimin Zhao
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Departments of Chemistry, Biochemistry, and Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
44
|
Yang J, Ducharme J, Johnston KE, Li FZ, Yue Y, Arnold FH. DeCOIL: Optimization of Degenerate Codon Libraries for Machine Learning-Assisted Protein Engineering. ACS Synth Biol 2023; 12:2444-2454. [PMID: 37524064 DOI: 10.1021/acssynbio.3c00301] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 08/02/2023]
Abstract
With advances in machine learning (ML)-assisted protein engineering, models based on data, biophysics, and natural evolution are being used to propose informed libraries of protein variants to explore. Synthesizing these libraries for experimental screens is a major bottleneck, as the cost of obtaining large numbers of exact gene sequences is often prohibitive. Degenerate codon (DC) libraries are a cost-effective alternative for generating combinatorial mutagenesis libraries where mutations are targeted to a handful of amino acid sites. However, existing computational methods to optimize DC libraries to include desired protein variants are not well suited to design libraries for ML-assisted protein engineering. To address these drawbacks, we present DEgenerate Codon Optimization for Informed Libraries (DeCOIL), a generalized method that directly optimizes DC libraries to be useful for protein engineering: to sample protein variants that are likely to have both high fitness and high diversity in the sequence search space. Using computational simulations and wet-lab experiments, we demonstrate that DeCOIL is effective across two specific case studies, with the potential to be applied to many other use cases. DeCOIL offers several advantages over existing methods, as it is direct, easy to use, generalizable, and scalable. With accompanying software (https://github.com/jsunn-y/DeCOIL), DeCOIL can be readily implemented to generate desired informed libraries.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Julie Ducharme
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Kadina E Johnston
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Yisong Yue
- Division of Engineering and Applied Sciences, California Institute of Technology, Pasadena, California 91125, United States
- Frances H Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
45
|
Zhou L, Wang Y, Peng L, Li Z, Luo X. Identifying potential drug-target interactions based on ensemble deep learning. Front Aging Neurosci 2023; 15:1176400. [PMID: 37396659 PMCID: PMC10309650 DOI: 10.3389/fnagi.2023.1176400] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2023] [Accepted: 05/10/2023] [Indexed: 07/04/2023] Open
Abstract
Introduction: Drug-target interaction (DTI) prediction is an important step in drug research and development. Experimental methods are time-consuming and laborious. Methods: In this study, we developed a novel DTI prediction method called EnGDD by combining initial feature acquisition, dimensional reduction, and DTI classification based on Gradient boosting neural network, Deep neural network, and Deep Forest. Results: EnGDD was compared with seven state-of-the-art DTI prediction methods (BLM-NII, NRLMF, WNNGIP, NEDTP, DTi2Vec, RoFDT, and MolTrans) on the nuclear receptor, GPCR, ion channel, and enzyme datasets under cross-validation on drugs, targets, and drug-target pairs, respectively. EnGDD achieved the best recall, accuracy, F1-score, AUC, and AUPR under the majority of conditions, demonstrating its strong DTI identification performance. EnGDD predicted that D00182 and hsa2099, D07871 and hsa1813, DB00599 and hsa2562, and D00002 and hsa10935 have higher interaction probabilities among unknown drug-target pairs and may be potential DTIs on the four datasets, respectively. In particular, D00002 (Nadide) was identified to interact with hsa10935 (mitochondrial peroxiredoxin 3), whose up-regulation might be used to treat neurodegenerative diseases. Finally, after its DTI identification performance had been confirmed, EnGDD was used to find possible drug targets for Parkinson's disease and Alzheimer's disease. The results show that D01277, D04641, and D08969 may be applied to the treatment of Parkinson's disease by targeting hsa1813 (dopamine receptor D2), and that D02173, D02558, and D03822 may offer clues for treating patients with Alzheimer's disease by targeting hsa5743 (prostaglandin-endoperoxide synthase 2). These predictions need further biomedical validation. Discussion: We anticipate that our proposed EnGDD model can help discover potential therapeutic clues for various diseases, including neurodegenerative diseases.
Affiliation(s)
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yuzhuang Wang
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Xueming Luo
- School of Computer Science, Hunan University of Technology, Zhuzhou, China