1
Cao L, Teo D, Wang Y, Ye Q, Liu C, Ding C, Li X, Chang M, Han Y, Li Z, Sun X, Huang Q, Zhang CY, Foo JL, Wong A, Yu A. Advancements in Microbial Cell Engineering for Benzylisoquinoline Alkaloid Production. ACS Synth Biol 2024; 13:3842-3856. [PMID: 39579377 DOI: 10.1021/acssynbio.4c00599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2024]
Abstract
Benzylisoquinoline alkaloids (BIAs) are a class of natural compounds found in plants of the Ranunculaceae family, known for their diverse pharmacological activities. However, the extraction yields of BIAs from plants are limited, and the cost of chemical synthesis is prohibitively high. Recent advancements in systems metabolic engineering and genomics have made it feasible to use microbes as bioreactors for BIAs production. This review explores recent progress in enhancing the production and yields of BIAs in two microbial systems: Escherichia coli and Saccharomyces cerevisiae. It covers various BIAs, including (S)-reticuline, morphinane, protoberberine, and aporphine alkaloids. The review provides strategies and technologies for BIAs synthesis, analyzes current challenges in BIAs research, and offers recommendations for future research directions.
Affiliation(s)
- Liyan Cao
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Desmond Teo
- Food Chemical and Biotechnology Cluster, Singapore Institute of Technology, Singapore 828608, Singapore
- Yuyang Wang
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Qingqing Ye
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Chang Liu
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Chen Ding
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Xiangyu Li
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Mingxin Chang
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Yuqing Han
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Zhuo Li
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Xu Sun
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Qingeng Huang
- Qingyuan One Alive Institute of Biological Research Co., Ltd, Qingyuan 500112, PR China
- Cui-Ying Zhang
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
- Jee Loon Foo
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore 117456, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
- National Centre for Engineering Biology (NCEB), Singapore 119077, Singapore
- Adison Wong
- Food Chemical and Biotechnology Cluster, Singapore Institute of Technology, Singapore 828608, Singapore
- Aiqun Yu
- State Key Laboratory of Food Nutrition and Safety, Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, No.29 the 13th Street TEDA, Tianjin 300457, PR China
2
Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024; 53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Global environmental issues and sustainable development call for new technologies for fine chemical synthesis and waste valorization. Biocatalysis has attracted great attention as the alternative to the traditional organic synthesis. However, it is challenging to navigate the vast sequence space to identify those proteins with admirable biocatalytic functions. The recent development of deep-learning based structure prediction methods such as AlphaFold2 reinforced by different computational simulations or multiscale calculations has largely expanded the 3D structure databases and enabled structure-based design. While structure-based approaches shed light on site-specific enzyme engineering, they are not suitable for large-scale screening of potential biocatalysts. Effective utilization of big data using machine learning techniques opens up a new era for accelerated predictions. Here, we review the approaches and applications of structure-based and machine-learning guided enzyme design. We also provide our view on the challenges and perspectives on effectively employing enzyme design approaches integrating traditional molecular simulations and machine learning, and the importance of database construction and algorithm development in attaining predictive ML models to explore the sequence fitness landscape for the design of admirable biocatalysts.
Affiliation(s)
- Jiahui Zhou
- School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
- Meilan Huang
- School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
3
Kim GB, Kim JY, Lee JA, Norsigian CJ, Palsson BO, Lee SY. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nat Commun 2023; 14:7370. [PMID: 37963869 PMCID: PMC10645960 DOI: 10.1038/s41467-023-43216-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 07/28/2023] [Accepted: 11/03/2023] [Indexed: 11/16/2023]
Abstract
Functional annotation of open reading frames in microbial genomes remains substantially incomplete. Enzymes constitute the most prevalent functional gene class in microbial genomes and can be described by their specific catalytic functions using the Enzyme Commission (EC) number. Consequently, the ability to predict EC numbers could substantially reduce the number of un-annotated genes. Here we present a deep learning model, DeepECtransformer, which utilizes transformer layers as a neural network architecture to predict EC numbers. Using the extensively studied Escherichia coli K-12 MG1655 genome, DeepECtransformer predicted EC numbers for 464 un-annotated genes. We experimentally validated the enzymatic activities predicted for three proteins (YgfF, YciO, and YjdM). Further examination of the neural network's reasoning process revealed that the trained neural network relies on functional motifs of enzymes to predict EC numbers. Thus, DeepECtransformer is a method that facilitates the functional annotation of uncharacterized genes.
Affiliation(s)
- Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
- Ji Yeon Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
- Jong An Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
- Charles J Norsigian
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Bernhard O Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Novo Nordisk Foundation Center for Biosustainability, 2800, Kongens Lyngby, Denmark
- Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea.
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea.
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea.
4
Zhong W, Li H, Wang Y. Design and Construction of Artificial Biological Systems for One-Carbon Utilization. BioDesign Research 2023; 5:0021. [PMID: 37915992 PMCID: PMC10616972 DOI: 10.34133/bdr.0021] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/13/2023] [Accepted: 10/05/2023] [Indexed: 11/03/2023]
Abstract
The third-generation (3G) biorefinery aims to use microbial cell factories or enzymatic systems to synthesize value-added chemicals from one-carbon (C1) sources, such as CO2, formate, and methanol, fueled by renewable energies like light and electricity. This promising technology represents an important step toward sustainable development, which can help address some of the most pressing environmental challenges faced by modern society. However, to establish processes competitive with the petroleum industry, it is crucial to determine the most viable pathways for C1 utilization and productivity and yield of the target products. In this review, we discuss the progresses that have been made in constructing artificial biological systems for 3G biorefineries in the last 10 years. Specifically, we highlight the representative works on the engineering of artificial autotrophic microorganisms, tandem enzymatic systems, and chemo-bio hybrid systems for C1 utilization. We also prospect the revolutionary impact of these developments on biotechnology. By harnessing the power of 3G biorefinery, scientists are establishing a new frontier that could potentially revolutionize our approach to industrial production and pave the way for a more sustainable future.
Affiliation(s)
- Wei Zhong
- Westlake Center of Synthetic Biology and Integrated Bioengineering, School of Engineering, Westlake University, Hangzhou 310000, PR China
- Hailong Li
- Westlake Center of Synthetic Biology and Integrated Bioengineering, School of Engineering, Westlake University, Hangzhou 310000, PR China
- School of Materials Science and Engineering, Zhejiang University, Zhejiang Province, Hangzhou 310000, PR China
- Yajie Wang
- Westlake Center of Synthetic Biology and Integrated Bioengineering, School of Engineering, Westlake University, Hangzhou 310000, PR China
5
Ge F, Chen G, Qian M, Xu C, Liu J, Cao J, Li X, Hu D, Xu Y, Xin Y, Wang D, Zhou J, Shi H, Tan Z. Artificial Intelligence Aided Lipase Production and Engineering for Enzymatic Performance Improvement. J Agric Food Chem 2023; 71:14911-14930. [PMID: 37800676 DOI: 10.1021/acs.jafc.3c05029] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 10/07/2023]
Abstract
With the development of artificial intelligence (AI), tailoring methods for enzyme engineering have been widely expanded. Additional protocols based on optimized network models have been used to predict and optimize lipase production as well as properties, namely, catalytic activity, stability, and substrate specificity. Here, different network models and algorithms for the prediction and reforming of lipase, focusing on its modification methods and cases based on AI, are reviewed in terms of both their advantages and disadvantages. Different neural networks coupled with various algorithms are usually applied to predict the maximum yield of lipase by optimizing the external cultivations for lipase production, while one part is used to predict the molecule variations affecting the properties of lipase. However, few studies have directly utilized AI to engineer lipase by affecting the structure of the enzyme, and a set of research gaps needs to be explored. Additionally, future perspectives of AI application in enzymes, including lipase engineering, are deduced to help the redesign of enzymes and the reform of new functional biocatalysts. This review provides a new horizon for developing effective and innovative AI tools for lipase production and engineering and facilitating lipase applications in the food industry and biomass conversion.
Affiliation(s)
- Feiyin Ge
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Gang Chen
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Minjing Qian
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Cheng Xu
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Jiao Liu
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Jiaqi Cao
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Xinchao Li
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Die Hu
- School of Pharmacy & School of Biological and Food Engineering, Changzhou University, Changzhou 213164, People's Republic of China
- Yangsen Xu
- Dongtai Hanfangyuan Biotechnology Co. Ltd., Yancheng 224241, People's Republic of China
- Ya Xin
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Dianlong Wang
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Jia Zhou
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Hao Shi
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
- Zhongbiao Tan
- School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huai'an 223003, People's Republic of China
6
Platero-Rochart D, Krivobokova T, Gastegger M, Reibnegger G, Sánchez-Murcia PA. Prediction of Enzyme Catalysis by Computing Reaction Energy Barriers via Steered QM/MM Molecular Dynamics Simulations and Machine Learning. J Chem Inf Model 2023; 63:4623-4632. [PMID: 37479222 PMCID: PMC10430765 DOI: 10.1021/acs.jcim.3c00772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Indexed: 07/23/2023]
Abstract
The prediction of enzyme activity is one of the main challenges in catalysis. With computer-aided methods, it is possible to simulate the reaction mechanism at the atomic level. However, these methods are usually too expensive to apply at the large scale that protein engineering campaigns require. To alleviate this situation, machine learning methods can help in the generation of predictive-decision models. Herein, we test different regression algorithms for the prediction of the reaction energy barrier of the rate-limiting step of the hydrolysis of mono-(2-hydroxyethyl)terephthalic acid by the MHETase of Ideonella sakaiensis. As a training data set, we use steered quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulation snapshots and their corresponding pulling work values. We have explored three algorithms together with three chemical representations. As an outcome, our trained models are able to predict pulling works along the steered QM/MM MD simulations with a mean absolute error below 3 kcal mol⁻¹ and a score value above 0.90. More challenging is the prediction of the energy maximum with a single geometry. Whereas the use of the initial snapshot of the QM/MM MD trajectory as input geometry yields a very poor prediction of the reaction energy barrier, the use of an intermediate snapshot of the same trajectory brings the score value above 0.40 with a low mean absolute error (ca. 3 kcal mol⁻¹). Altogether, this work addresses some initial challenges on the way to the final goal of an efficient workflow for the semiautomatic prediction of enzyme-catalyzed energy barriers and catalytic efficiencies.
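As a hedged illustration of the two figures of merit quoted in this abstract, the sketch below computes a mean absolute error and a coefficient of determination for a set of predicted values. Reading the abstract's "score value" as R² is an assumption, and the function names are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination (assumed meaning of the 'score value')."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot
```

With energies expressed in kcal mol⁻¹, the first function returns an error in the same unit, while the second is dimensionless.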
Affiliation(s)
- Daniel Platero-Rochart
- Laboratory of Computer-Aided Molecular Design, Division of Medicinal Chemistry, Otto-Loewi Research Center, Medical University of Graz, Neue Stiftingtalstraße 6/III, A-8010 Graz, Austria
- Tatyana Krivobokova
- Department of Statistics and Operations Research, University of Vienna, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria
- Michael Gastegger
- Institute of Software Engineering and Theoretical Computer Science, Machine Learning Group, Technische Universität, 10587 Berlin, Germany
- Gilbert Reibnegger
- Laboratory of Computer-Aided Molecular Design, Division of Medicinal Chemistry, Otto-Loewi Research Center, Medical University of Graz, Neue Stiftingtalstraße 6/III, A-8010 Graz, Austria
- Pedro A. Sánchez-Murcia
- Laboratory of Computer-Aided Molecular Design, Division of Medicinal Chemistry, Otto-Loewi Research Center, Medical University of Graz, Neue Stiftingtalstraße 6/III, A-8010 Graz, Austria
7
Rappoport D, Jinich A. Enzyme Substrate Prediction from Three-Dimensional Feature Representations Using Space-Filling Curves. J Chem Inf Model 2023; 63:1637-1648. [PMID: 36802628 DOI: 10.1021/acs.jcim.3c00005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Compact and interpretable structural feature representations are required for accurately predicting properties and function of proteins. In this work, we construct and evaluate three-dimensional feature representations of protein structures based on space-filling curves (SFCs). We focus on the problem of enzyme substrate prediction, using two ubiquitous enzyme families as case studies: the short-chain dehydrogenase/reductases (SDRs) and the S-adenosylmethionine-dependent methyltransferases (SAM-MTases). Space-filling curves such as the Hilbert curve and the Morton curve generate a reversible mapping from discretized three-dimensional to one-dimensional representations and thus help to encode three-dimensional molecular structures in a system-independent way and with only a few adjustable parameters. Using three-dimensional structures of SDRs and SAM-MTases generated using AlphaFold2, we assess the performance of the SFC-based feature representations in predictions on a new benchmark database of enzyme classification tasks including their cofactor and substrate selectivity. Gradient-boosted tree classifiers yield binary prediction accuracy of 0.77-0.91 and area under curve (AUC) characteristics of 0.83-0.92 for the classification tasks. We investigate the effects of amino acid encoding, spatial orientation, and (the few) parameters of SFC-based encodings on the accuracy of the predictions. Our results suggest that geometry-based approaches such as SFCs are promising for generating protein structural representations and are complementary to the existing protein feature representations such as evolutionary scale modeling (ESM) sequence embeddings.
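The reversible 3D-to-1D mapping mentioned in this abstract can be sketched with a Morton (Z-order) curve, which interleaves the bits of the three grid indices. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the grid origin, the cell size, and the 10-bit default are all hypothetical, and coordinates are assumed to fall on or above the origin.

```python
from typing import Tuple


def discretize(point: Tuple[float, float, float],
               origin: Tuple[float, float, float],
               cell_size: float) -> Tuple[int, int, int]:
    """Map a 3D coordinate to integer grid indices (assumes point >= origin)."""
    ix, iy, iz = (int((p - o) // cell_size) for p, o in zip(point, origin))
    return ix, iy, iz


def morton_encode(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the lowest `bits` bits of x, y, z into one Morton index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode(code: int, bits: int = 10) -> Tuple[int, int, int]:
    """Invert morton_encode, recovering the three grid indices."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z
```

Sorting grid cells by their Morton index yields a one-dimensional ordering in which cells close along the curve are also close in space, and decoding recovers the original indices exactly as long as each index fits in `bits` bits.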
Affiliation(s)
- Dmitrij Rappoport
- Department of Chemistry, University of California, Irvine, 1102 Natural Sciences 2, Irvine, California 92697, United States
- Adrian Jinich
- Weill Cornell Medicine, 1300 York Avenue, Box 65, New York, New York 10065, United States
8
Lim PK, Julca I, Mutwil M. Redesigning plant specialized metabolism with supervised machine learning using publicly available reactome data. Comput Struct Biotechnol J 2023; 21:1639-1650. [PMID: 36874159 PMCID: PMC9976193 DOI: 10.1016/j.csbj.2023.01.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/27/2022] [Revised: 01/12/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023]
Abstract
The immense structural diversity of products and intermediates of plant specialized metabolism (specialized metabolites) makes them rich sources of therapeutic medicine, nutrients, and other useful materials. With the rapid accumulation of reactome data that can be accessible on biological and chemical databases, along with recent advances in machine learning, this review sets out to outline how supervised machine learning can be used to design new compounds and pathways by exploiting the wealth of said data. We will first examine the various sources from which reactome data can be obtained, followed by explaining the different machine learning encoding methods for reactome data. We then discuss current supervised machine learning developments that can be employed in various aspects to help redesign plant specialized metabolism.
Affiliation(s)
- Peng Ken Lim
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Irene Julca
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
9
Predicting Genetic Disorder and Types of Disorder Using Chain Classifier Approach. Genes (Basel) 2022; 14:genes14010071. [PMID: 36672812 PMCID: PMC9858679 DOI: 10.3390/genes14010071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/05/2022] [Revised: 12/16/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022]
Abstract
Genetic disorders are the result of mutations in the deoxyribonucleic acid (DNA) sequence, which can be acquired or inherited from parents. Such mutations may lead to fatal diseases such as Alzheimer's, cancer, hemochromatosis, etc. Recently, the use of artificial intelligence-based methods has shown superb success in the prediction and prognosis of different diseases. The potential of such methods can be utilized to predict genetic disorders at an early stage using the genome data for timely treatment. This study focuses on the multi-label multi-class problem and makes two major contributions to genetic disorder prediction. First, a novel feature engineering approach is proposed where the class probabilities from an extra tree (ET) and random forest (RF) are joined to make a feature set for model training. Second, the study utilizes the classifier chain approach, where multiple classifiers are joined in a chain and the predictions from all the preceding classifiers are used by the succeeding classifiers to make the final prediction. Because of the multi-label multi-class data, macro accuracy, Hamming loss, and α-evaluation score are used to evaluate the performance. Results suggest that extreme gradient boosting (XGB) produces the best scores with a 92% α-evaluation score and an 84% macro accuracy score. The performance of XGB is much better than state-of-the-art approaches, in terms of both performance and computational complexity.
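The classifier chain idea described in this abstract, together with the Hamming loss used to score it, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's code: each classifier is modeled as a plain callable, and the names `chain_predict` and `hamming_loss` are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained model: maps a feature vector to a label.
Classifier = Callable[[Sequence[float]], int]


def chain_predict(classifiers: Sequence[Classifier],
                  features: Sequence[float]) -> List[int]:
    """Run classifiers in order, feeding each one all earlier predictions."""
    augmented = list(features)
    predictions: List[int] = []
    for clf in classifiers:
        label = clf(augmented)
        predictions.append(label)
        # Later links in the chain see this prediction as an extra feature.
        augmented.append(float(label))
    return predictions


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label positions that were predicted wrongly."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    if total == 0:
        raise ValueError("no labels to compare")
    return wrong / total
```

In this sketch the second classifier can condition on the first one's output, which is what lets a chain capture dependencies between labels that independent per-label models would miss.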
10
Watanabe N, Yamamoto M, Murata M, Vavricka CJ, Ogino C, Kondo A, Araki M. Comprehensive Machine Learning Prediction of Extensive Enzymatic Reactions. J Phys Chem B 2022; 126:6762-6770. [PMID: 36053051 DOI: 10.1021/acs.jpcb.2c03287] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/04/2023]
Abstract
New enzyme functions exist within the increasing number of unannotated protein sequences. Novel enzyme discovery is necessary to expand the pathways that can be accessed by metabolic engineering for the biosynthesis of functional compounds. Accordingly, various machine learning models have been developed to predict enzymatic reactions. However, the ability to predict unknown reactions that are not included in the training data has not been clarified. In order to cover uncertain and unknown reactions, a wider range of reaction types must be demonstrated by the models. Here, we establish 16 expanded enzymatic reaction prediction models developed using various machine learning algorithms, including deep neural network. Improvements in prediction performances over that of our previous study indicate that the updated methods are more effective for the prediction of enzymatic reactions. Overall, the deep neural network model trained with combined substrate-enzyme-product information exhibits the highest prediction accuracy with Macro F1 scores up to 0.966 and with robust prediction of unknown enzymatic reactions that are not included in the training data. This model can predict more extensive enzymatic reactions in comparison to previously reported models. This study will facilitate the discovery of new enzymes for the production of useful substances.
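For readers unfamiliar with the Macro F1 score reported in this abstract, the sketch below shows one common way to compute it: an F1 score per class, averaged with equal weight per class. The function name and the convention of scoring an empty class as 0 are assumptions for illustration, not details from the paper.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores over all observed classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed to be 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class contributes equally, a macro-averaged score penalizes poor performance on rare reaction types more than a plain accuracy would.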
Affiliation(s)
- Naoki Watanabe
- Department of Chemical Science and Engineering Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan
- Masaki Yamamoto
- Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan
- Masahiro Murata
- Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan
- Christopher J Vavricka
- Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Chiaki Ogino
- Department of Chemical Science and Engineering Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan
- Akihiko Kondo
- Department of Chemical Science and Engineering Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan
- Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Michihiro Araki
- Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan
- Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- National Institutes of Biomedical Innovation, Health and Nutrition, National Institute of Health and Nutrition, 1-23-1 Toyama, Shinjuku-ku, Tokyo 162-8638, Japan
- National Cerebral and Cardiovascular Center, 6-1 Kishibe-Shinmachi, Suita, Osaka 564-8565, Japan
11
Machine learning discovery of missing links that mediate alternative branches to plant alkaloids. Nat Commun 2022; 13:1405. [PMID: 35296652 PMCID: PMC8927377 DOI: 10.1038/s41467-022-28883-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/30/2021] [Accepted: 02/16/2022] [Indexed: 01/12/2023]
Abstract
Engineering the microbial production of secondary metabolites is limited by the known reactions of correctly annotated enzymes. Therefore, the machine learning discovery of specialized enzymes offers great potential to expand the range of biosynthesis pathways. Benzylisoquinoline alkaloid production is a model example of metabolic engineering with potential to revolutionize the paradigm of sustainable biomanufacturing. Existing bacterial studies utilize a norlaudanosoline pathway, whereas plants contain a more stable norcoclaurine pathway, which is exploited in yeast. However, committed aromatic precursors are still produced using microbial enzymes that remain elusive in plants, and additional downstream missing links remain hidden within highly duplicated plant gene families. In the current study, machine learning is applied to predict and select plant missing link enzymes from homologous candidate sequences. Metabolomics-based characterization of the selected sequences reveals potential aromatic acetaldehyde synthases and phenylpyruvate decarboxylases in reconstructed plant gene-only benzylisoquinoline alkaloid pathways from tyrosine. Synergistic application of the aryl acetaldehyde producing enzymes results in enhanced benzylisoquinoline alkaloid production through hybrid norcoclaurine and norlaudanosoline pathways. Producing plant secondary metabolites by microbes is limited by the known enzymatic reactions. Here, the authors apply machine learning to predict missing link enzymes of benzylisoquinoline alkaloid (BIA) biosynthesis in Papaver somniferum, and validate the specialized activities through heterologous production.
|
12
|
Heid E, Goldman S, Sankaranarayanan K, Coley CW, Flamm C, Green WH. EHreact: Extended Hasse Diagrams for the Extraction and Scoring of Enzymatic Reaction Templates. J Chem Inf Model 2021; 61:4949-4961. [PMID: 34587449 PMCID: PMC8549070 DOI: 10.1021/acs.jcim.1c00921] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/30/2021] [Indexed: 11/29/2022]
Abstract
Data-driven computer-aided synthesis planning utilizing organic or biocatalyzed reactions from large databases has gained increasing interest in the last decade, sparking the development of numerous tools to extract, apply, and score general reaction templates. The generation of reaction rules for enzymatic reactions is especially challenging since substrate promiscuity varies between enzymes, causing the optimal levels of rule specificity and optimal number of included atoms to differ between enzymes. This complicates an automated extraction from databases and has promoted the creation of manually curated reaction rule sets. Here, we present EHreact, a purely data-driven open-source software tool, to extract and score reaction rules from sets of reactions known to be catalyzed by an enzyme at appropriate levels of specificity without expert knowledge. EHreact extracts and groups reaction rules into tree-like structures, Hasse diagrams, based on common substructures in the imaginary transition structures. Each diagram can be utilized to output a single or a set of reaction rules, as well as calculate the probability of a new substrate to be processed by the given enzyme by inferring information about the reactive site of the enzyme from the known reactions and their grouping in the template tree. EHreact heuristically predicts the activity of a given enzyme on a new substrate, outperforming current approaches in accuracy and functionality.
Affiliation(s)
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Samuel Goldman
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Karthik Sankaranarayanan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Christoph Flamm
- Department of Theoretical Chemistry, University of Vienna, 1090 Vienna, Austria
| | - William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
13
|
Westermayr J, Marquetand P. Machine Learning for Electronically Excited States of Molecules. Chem Rev 2021; 121:9873-9926. [PMID: 33211478 PMCID: PMC8391943 DOI: 10.1021/acs.chemrev.0c00749] [Citation(s) in RCA: 198] [Impact Index Per Article: 49.5] [Received: 07/17/2020] [Indexed: 12/11/2022]
Abstract
Electronically excited states of molecules are at the heart of photochemistry, photophysics, as well as photobiology and also play a role in material science. Their theoretical description requires highly accurate quantum chemical calculations, which are computationally expensive. In this review, we focus on not only how machine learning is employed to speed up such excited-state simulations but also how this branch of artificial intelligence can be used to advance this exciting research field in all its aspects. Discussed applications of machine learning for excited states include excited-state dynamics simulations, static calculations of absorption spectra, as well as many others. In order to put these studies into context, we discuss the promises and pitfalls of the involved machine learning techniques. Since the latter are mostly based on quantum chemistry calculations, we also provide a short introduction into excited-state electronic structure methods and approaches for nonadiabatic dynamics simulations and describe tricks and problems when using them in machine learning for excited states of molecules.
Affiliation(s)
- Julia Westermayr
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
| | - Philipp Marquetand
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Vienna Research Platform on Accelerating Photoreaction Discovery, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Data Science @ Uni Vienna, University of Vienna, Währinger Strasse 29, 1090 Vienna, Austria
| |
|
15
|
Dutta K, Shityakov S, Khalifa I. New Trends in Bioremediation Technologies Toward Environment-Friendly Society: A Mini-Review. Front Bioeng Biotechnol 2021; 9:666858. [PMID: 34409018 PMCID: PMC8365754 DOI: 10.3389/fbioe.2021.666858] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/11/2021] [Accepted: 05/26/2021] [Indexed: 01/29/2023] Open
Abstract
Today's environmental balance has been compromised by the unreasonable and sometimes dangerous actions committed by humans to maintain their dominance over the Earth's natural resources. As a result, oceans are contaminated by the different types of plastic trash, crude oil coming from mismanagement of transporting ships spilling it in the water, and air pollution due to increasing production of greenhouse gases, such as CO2 and CH4 etc., into the atmosphere. The lands, agricultural fields, and groundwater are also contaminated by the infamous chemicals viz., polycyclic aromatic hydrocarbons, pyrethroids pesticides, bisphenol-A, and dioxanes. Therefore, bioremediation might function as a convenient alternative to restore a clean environment. However, at present, the majority of bioremediation reports are limited to the natural capabilities of microbial enzymes. Synthetic biology with uncompromised supervision of ethical standards could help to outsmart nature's engineering, such as the CETCH cycle for improved CO2 fixation. Additionally, a blend of synthetic biology with machine learning algorithms could expand the possibilities of bioengineering. This review summarized current state-of-the-art knowledge of the data-assisted enzyme redesigning to actively promote new research on important enzymes to ameliorate the environment.
Affiliation(s)
- Kunal Dutta
- Department of Human Physiology, Vidyasagar University, Medinipur, India
| | - Sergey Shityakov
- Department of Chemoinformatics, Infochemistry Scientific Center, Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), Saint-Petersburg, Russia
| | - Ibrahim Khalifa
- Food Technology Department, Faculty of Agriculture, Benha University, Moshtohor, Egypt
| |
|
16
|
Sun D, Cheng X, Tian Y, Ding S, Zhang D, Cai P, Hu QN. EnzyMine: a comprehensive database for enzyme function annotation with enzymatic reaction chemical feature. Database (Oxford) 2020; 2023:baaa065. [PMID: 33002112 PMCID: PMC10755256 DOI: 10.1093/database/baaa065] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/27/2020] [Revised: 07/19/2020] [Accepted: 07/24/2020] [Indexed: 11/14/2022]
Abstract
Addition of chemical structural information in enzymatic reactions has proven to be significant for accurate enzyme function prediction. However, such chemical data lack systematic feature mining and hardly exist in enzyme-related databases. Therefore, global mining of enzymatic reactions will offer a unique landscape for researchers to understand the basic functional mechanisms of natural bioprocesses and facilitate enzyme function annotation. Here, we established a new knowledge base called EnzyMine, through which we propose to elucidate enzymatic reaction features and then link them with sequence and structural annotations. EnzyMine represents an advanced database that extends enzyme knowledge by incorporating reaction chemical feature strategies, strengthening the connectivity between enzyme and metabolic reactions. Therefore, it has the potential to reveal many new metabolic pathways involved with given enzymes, as well as expand enzyme function annotation. Database URL: http://www.rxnfinder.org/enzymine/.
Affiliation(s)
- Dandan Sun
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
| | - Xingxiang Cheng
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
| | - Yu Tian
- School of Biology and Pharmaceutical Engineering, Wuhan Polytechnic University, Wuhan, Hubei 430023, China
| | - Shaozhen Ding
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
| | - Dachuan Zhang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
| | - Pengli Cai
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
| | - Qian-nan Hu
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
| |
|