1
Chen A, Peng X, Shen T, Zheng L, Wu D, Wang S. Discovery, design, and engineering of enzymes based on molecular retrobiosynthesis. MLIFE 2025; 4:107-125. [PMID: 40313979 PMCID: PMC12042125 DOI: 10.1002/mlf2.70009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 02/06/2025] [Accepted: 02/13/2025] [Indexed: 05/03/2025]
Abstract
Biosynthesis, a process utilizing biological systems to synthesize chemical compounds, has emerged as a revolutionary solution to 21st-century challenges due to its environmental sustainability, scalability, and high stereoselectivity and regioselectivity. Recent advancements in artificial intelligence (AI) are accelerating biosynthesis by enabling intelligent design, construction, and optimization of enzymatic reactions and biological systems. We first introduce the molecular retrosynthesis route planning in biochemical pathway design, including single-step retrosynthesis algorithms and AI-based chemical retrosynthesis route design tools. We highlight the advantages and challenges of large language models in addressing the sparsity of chemical data. Furthermore, we review enzyme discovery methods based on sequence and structure alignment techniques. Breakthroughs in AI-based structural prediction methods are expected to significantly improve the accuracy of enzyme discovery. We also summarize methods for de novo enzyme generation for nonnatural or orphan reactions, focusing on AI-based enzyme functional annotation and enzyme discovery techniques based on reaction or small molecule similarity. Turning to enzyme engineering, we discuss strategies to improve enzyme thermostability, solubility, and activity, as well as the applications of AI in these fields. The shift from traditional experiment-driven models to data-driven and computationally driven intelligent models is already underway. Finally, we present potential challenges and provide a perspective on future research directions. We envision expanded applications of biocatalysis in drug development, green chemistry, and complex molecule synthesis.
Affiliation(s)
- Ancheng Chen
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Xiangda Peng
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Tao Shen
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Dong Wu
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, China
2
Liu M, Ni X, Ramanujam J, Brylinski M. EC2Vec: A Machine Learning Method to Embed Enzyme Commission (EC) Numbers into Vector Representations. J Chem Inf Model 2025; 65:2173-2179. [PMID: 39981640 PMCID: PMC11898066 DOI: 10.1021/acs.jcim.4c02161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Revised: 02/10/2025] [Accepted: 02/10/2025] [Indexed: 02/22/2025]
Abstract
Enzyme commission (EC) numbers play a vital role in classifying enzymes and understanding their functions in enzyme-related research. Although accurate and informative encoding of EC numbers is essential for enhancing the effectiveness of machine learning applications, simple EC encoding approaches suffer from limitations such as false numerical order and high sparsity. To address these issues, we developed EC2Vec, a multimodal autoencoder that preserves the categorical nature of EC numbers and leverages their hierarchical relationships, resulting in more meaningful and informative representations. EC2Vec encodes each digit of the EC number as a categorical token and then processes these embeddings through a 1D convolutional layer to capture their relationships. Comprehensive benchmarking against a large collection of EC numbers indicates that EC2Vec outperforms simple encoding methods. The t-SNE visualization of EC2Vec embeddings revealed distinct clusters corresponding to different enzyme classes, demonstrating that the hierarchical structure of the EC numbers is effectively captured. In downstream machine learning applications, EC2Vec embeddings outperformed other EC encoding methods in the reaction-EC pair classification task, underscoring its robustness and utility for enzyme-related research and bioinformatics applications.
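To make the encoding problem concrete, here is a minimal sketch of the categorical tokenization step the abstract describes. It is not the EC2Vec implementation: the helper names (`split_ec`, `build_vocabs`, `tokenize`, `shared_prefix_depth`) and the example EC numbers are assumptions chosen for illustration, and the learned embedding and 1D convolution stages are left out. The sketch only shows why each field is best treated as a category with its own per-level vocabulary rather than as a number with a meaningful order.

```python
from typing import Dict, List


def split_ec(ec: str) -> List[str]:
    """Split an EC number such as '3.4.21.4' into its four hierarchical fields."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return parts


def build_vocabs(ecs: List[str]) -> List[Dict[str, int]]:
    """Build one vocabulary per EC level (hypothetical helper).

    IDs are assigned in order of first appearance, so they carry no
    numerical meaning: field '7' is not 'greater than' field '4'.
    """
    vocabs: List[Dict[str, int]] = [{} for _ in range(4)]
    for ec in ecs:
        for level, field in enumerate(split_ec(ec)):
            vocabs[level].setdefault(field, len(vocabs[level]))
    return vocabs


def tokenize(ec: str, vocabs: List[Dict[str, int]]) -> List[int]:
    """Map each field of an EC number to its categorical token ID."""
    return [vocabs[level][field] for level, field in enumerate(split_ec(ec))]


def shared_prefix_depth(a: str, b: str) -> int:
    """Count how many leading levels two EC numbers share (0 to 4)."""
    depth = 0
    for x, y in zip(split_ec(a), split_ec(b)):
        if x != y:
            break
        depth += 1
    return depth
```

In a model, the four token IDs would each index an embedding table before any further layers. The shared-prefix depth is one simple way to express the hierarchy that a learned representation is expected to reflect: `1.1.1.1` and `1.1.1.2` share three levels, while `1.1.1.1` and `2.7.1.1` share none even though their last two fields match.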
Affiliation(s)
- Mengmeng Liu
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, United States
- Xialong Ni
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, United States
- J. Ramanujam
  - Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, United States
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, Louisiana 70803, United States
- Michal Brylinski
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, United States
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, Louisiana 70803, United States
3
Srivastava G, Brylinski M. A Data-Driven Approach to Enhance the Prediction of Bacteria-Metabolite Interactions in the Human Gut Microbiome Using Enzyme Encodings and Metabolite Structural Embeddings. Nutrients 2025; 17:469. [PMID: 39940326 PMCID: PMC11820091 DOI: 10.3390/nu17030469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2024] [Revised: 01/22/2025] [Accepted: 01/24/2025] [Indexed: 02/14/2025]
Abstract
Background: The human gut microbiome is critical for host health by facilitating essential metabolic processes. Our study presents a data-driven analysis across 312 bacterial species and 154 unique metabolites to enhance the understanding of underlying metabolic processes in gut bacteria. The focus of the study was to create a strategy to generate a theoretical (negative) set for binary classification models to predict the consumption and production of metabolites in the human gut microbiome. Results: Our models achieved median balanced accuracies of 0.74 for consumption predictions and 0.95 for production predictions, highlighting the effectiveness of this approach in generating reliable negative sets. Additionally, we applied a kernel principal component analysis for dimensionality reduction. The consumption model with a polynomial kernel, and the production model with a radial basis function with 32 reduced features, showed median accuracies of 0.58 and 0.67, respectively. This demonstrates that biological information can still be captured, albeit with some loss, even after reducing the number of features. Furthermore, our models were validated on six previously unseen cases, achieving five correct predictions for consumption and four for production, demonstrating alignment with known biological outcomes. Conclusions: These findings highlight the potential of integrating data-driven approaches with machine learning techniques to enhance our understanding of gut microbiome metabolism. This work provides a foundation for creating bacteria-metabolite datasets to enhance machine learning-based predictive tools, with potential applications in developing therapeutic methods targeting gut microbes.
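The abstract reports balanced accuracy, not plain accuracy. For readers unfamiliar with the metric, the sketch below uses the standard binary definition (the mean of sensitivity and specificity). The function name and the toy labels are assumptions for illustration; the paper's own evaluation code is not shown in this listing.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2
```

The metric matters when the positive and negative sets differ in size. With 2 positives and 8 negatives, a classifier that always predicts the negative class scores 0.8 on plain accuracy but only 0.5 on balanced accuracy, which is the chance level.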
Affiliation(s)
- Gopal Srivastava
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Michal Brylinski
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
  - Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA
4
Kissman EN, Sosa MB, Millar DC, Koleski EJ, Thevasundaram K, Chang MCY. Expanding chemistry through in vitro and in vivo biocatalysis. Nature 2024; 631:37-48. [PMID: 38961155 DOI: 10.1038/s41586-024-07506-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 05/01/2024] [Indexed: 07/05/2024]
Abstract
Living systems contain a vast network of metabolic reactions, providing a wealth of enzymes and cells as potential biocatalysts for chemical processes. The properties of protein and cell biocatalysts (high selectivity, the ability to control reaction sequence and operation in environmentally benign conditions) offer approaches to produce molecules at high efficiency while lowering the cost and environmental impact of industrial chemistry. Furthermore, biocatalysis offers the opportunity to generate chemical structures and functions that may be inaccessible to chemical synthesis. Here we consider developments in enzymes, biosynthetic pathways and cellular engineering that enable their use in catalysis for new chemistry and beyond.
Affiliation(s)
- Elijah N Kissman
  - Department of Chemistry, University of California Berkeley, Berkeley, CA, USA
- Max B Sosa
  - Department of Chemistry, University of California Berkeley, Berkeley, CA, USA
- Douglas C Millar
  - Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, CA, USA
- Edward J Koleski
  - Department of Chemistry, University of California Berkeley, Berkeley, CA, USA
- Michelle C Y Chang
  - Department of Chemistry, University of California Berkeley, Berkeley, CA, USA
  - Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, CA, USA
  - Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
  - Department of Chemistry, Princeton University, Princeton, NJ, USA
5
Prešern U, Goličnik M. Enzyme Databases in the Era of Omics and Artificial Intelligence. Int J Mol Sci 2023; 24:16918. [PMID: 38069254 PMCID: PMC10707154 DOI: 10.3390/ijms242316918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/18/2023]
Abstract
Enzyme research is important for the development of various scientific fields such as medicine and biotechnology. Enzyme databases facilitate this research by providing a wide range of information relevant to research planning and data analysis. Over the years, various databases that cover different aspects of enzyme biology (e.g., kinetic parameters, enzyme occurrence, and reaction mechanisms) have been developed. Most of the databases are curated manually, which improves reliability of the information; however, such curation cannot keep pace with the exponential growth in published data. Lack of data standardization is another obstacle for data extraction and analysis. Improving machine readability of databases is especially important in the light of recent advances in deep learning algorithms that require big training datasets. This review provides information regarding the current state of enzyme databases, especially in relation to the ever-increasing amount of generated research data and recent advancements in artificial intelligence algorithms. Furthermore, it describes several enzyme databases, providing the reader with necessary information for their use.
Affiliation(s)
- Marko Goličnik
  - Institute of Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, 1000 Ljubljana, Slovenia
7
Zhang D, Tian Y, Tian Y, Xing H, Liu S, Zhang H, Ding S, Cai P, Sun D, Zhang T, Hong Y, Dai H, Tu W, Chen J, Wu A, Hu QN. A data-driven integrative platform for computational prediction of toxin biotransformation with a case study. Journal of Hazardous Materials 2021; 408:124810. [PMID: 33360695 DOI: 10.1016/j.jhazmat.2020.124810] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/10/2020] [Revised: 11/24/2020] [Accepted: 12/06/2020] [Indexed: 06/12/2023]
Abstract
Recently, biogenic toxins have received increasing attention owing to their high contamination levels in feed and food as well as in the environment. However, there is a lack of an integrative platform for seamless linking of data-driven computational methods with 'wet' experimental validations. To this end, we constructed a novel platform that integrates the technical aspects of toxin biotransformation methods. First, a biogenic toxin database termed ToxinDB (http://www.rxnfinder.org/toxindb/), containing multifaceted data on more than 4836 toxins, was built. Next, more than 8000 biotransformation reaction rules were derived from over 300,000 biochemical reactions extracted from ~580,000 literature reports curated by more than 100 people over the past decade. Based on these reaction rules, a toxin biotransformation prediction model was constructed. Finally, the global chemical space of biogenic toxins was constructed, comprising ~550,000 toxins and putative toxin metabolites, of which 94.7% of the metabolites have not been previously reported. Additionally, we performed a case study to investigate citrinin metabolism in Trichoderma, and a novel metabolite was identified with the assistance of the biotransformation prediction tool of ToxinDB. This unique integrative platform will assist exploration of the 'dark matter' of a toxin's metabolome and promote the discovery of detoxification enzymes.
Affiliation(s)
- Dachuan Zhang
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Ye Tian
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Yu Tian
  - School of Biology and Pharmaceutical Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China
  - Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, PR China
- Huadong Xing
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Sheng Liu
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Haoyang Zhang
  - College of Food Engineering and Nutritional Science, Shaanxi Normal University, Xi'an 710119, PR China
- Shaozhen Ding
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Pengli Cai
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
  - Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Dandan Sun
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Tong Zhang
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Yanhong Hong
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Hongkun Dai
  - Shandong Runda Testing Technology Co. Limited, Weifang 261000, PR China
- Weizhong Tu
  - Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, PR China
- Junni Chen
  - Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, PR China
- Aibo Wu
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
- Qian-Nan Hu
  - CAS Key Laboratory of Computational Biology, CAS Key Laboratory of Nutrition, Metabolism and Food Safety, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China