1
Han Y, Ding X, Tan J, Sun Y, Duan Y, Liu Z, Zheng G, Lu D. Sequence and taxonomic feature evaluation facilitated the discovery of alcohol oxidases. Synth Syst Biotechnol 2025; 10:907-915. [PMID: 40386440 PMCID: PMC12083922 DOI: 10.1016/j.synbio.2025.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2025] [Revised: 04/18/2025] [Accepted: 04/21/2025] [Indexed: 05/20/2025] Open
Abstract
Recent advancements in data technology offer immense opportunities for the discovery and development of new enzymes for the green synthesis of chemicals. Current protein databases predominantly prioritize overall sequence matches. The multi-scale features underpinning catalytic mechanisms and processes, which are scattered across various data sources, have not been sufficiently integrated to be effectively utilized in enzyme mining. In this study, we developed a sequence- and taxonomic-feature evaluation driven workflow to discover enzymes that can be expressed in E. coli and catalyze chemical reactions in vitro, using alcohol oxidase (AOX) for demonstration, which catalyzes the conversion of methanol to formaldehyde. A dataset of 21 reported AOXs was used to construct sequence scoring rules based on features, including sequence length, structural motifs, catalytic-related residues, binding residues, and overall structure. These scoring rules were applied to filter the results from HMM-based searches, yielding 357 candidate sequences of eukaryotic origin, which were categorized into six classes at 85 % sequence similarity. Experimental validation was conducted in two rounds on 31 selected sequences representing all classes. Among these selected sequences, 19 were expressed as soluble proteins in E. coli, and 18 of these soluble proteins exhibited AOX activity, as predicted. Notably, the most active recombinant AOX exhibited an activity of 8.65 ± 0.29 U/mg, approaching the highest activity of native eukaryotic enzymes. Compared to the established UniProt-annotation-based workflow, this feature-evaluation-based approach yielded a higher probability of highly active recombinant AOX (from 8.3 % to 19.4 %), demonstrating the efficiency and potential of this multi-dimensional feature evaluation method in accelerating the discovery of active enzymes.
Affiliation(s)
- Yilei Han
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Xuwei Ding
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Junjian Tan
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Yajuan Sun
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Yunjiang Duan
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Zheng Liu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Gaowei Zheng
  - State Key Laboratory of Bioreactor Engineering, Shanghai Collaborative Innovation Center for Biomanufacturing, East China University of Science and Technology, Shanghai, 200237, China
- Diannan Lu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
2
Kroll A, Rousset Y. Recent advances and future trends for protein-small molecule interaction predictions with protein language models. Curr Opin Struct Biol 2025; 93:103070. [PMID: 40414181 DOI: 10.1016/j.sbi.2025.103070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Revised: 04/23/2025] [Accepted: 05/04/2025] [Indexed: 05/27/2025]
Abstract
In recent years, the application of natural language models to protein amino acid sequences, referred to as protein language models (PLMs), has demonstrated a significant potential for uncovering hidden patterns related to protein structure, function, and stability. The critical functions of proteins in biological processes often arise through interactions with small molecules; central examples are enzymes, receptors, and transporters. Understanding these interactions is particularly important for drug design, for bioengineering, and for understanding cellular metabolism. In this review, we present state-of-the-art PLMs and explore how they can be integrated with small molecule information to predict protein-small molecule interactions. We present several such prediction tasks and discuss current limitations and potential areas for improvement.
Affiliation(s)
- Alexander Kroll
  - Heinrich-Heine-University, Universitätsstraße 1, Düsseldorf, 40225, NRW, Germany
- Yvan Rousset
  - Heinrich-Heine-University, Universitätsstraße 1, Düsseldorf, 40225, NRW, Germany
3
Sun X, Wang YG, Shen Y. A multimodal deep learning framework for enzyme turnover prediction with missing modality. Comput Biol Med 2025; 193:110348. [PMID: 40409036 DOI: 10.1016/j.compbiomed.2025.110348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 04/25/2025] [Accepted: 05/04/2025] [Indexed: 05/25/2025]
Abstract
Accurate prediction of the turnover number (kcat), which quantifies the maximum rate of substrate conversion at an enzyme's active site, is essential for assessing catalytic efficiency and understanding biochemical reaction mechanisms. Traditional wet-lab measurements of kcat are time-consuming and resource-intensive, making deep learning (DL) methods an appealing alternative. However, existing DL models often overlook the impact of reaction products on kcat due to feedback inhibition, resulting in suboptimal performance. The multimodal nature of this kcat prediction task, involving enzymes, substrates, and products as inputs, presents additional challenges when certain modalities are unavailable during inference due to incomplete data or experimental constraints, which makes existing DL models inapplicable. To address these limitations, we introduce MMKcat, a novel framework employing a prior-knowledge-guided missing modality training mechanism, which treats substrates and enzyme sequences as essential inputs while considering other modalities as maskable terms. Moreover, an innovative auxiliary regularizer is incorporated to encourage the learning of informative features from various modal combinations, enabling robust predictions even with incomplete multimodal inputs. We demonstrate the superior performance of MMKcat compared to state-of-the-art methods, including DLKcat, TurNuP, UniKP, EITLEM-Kinetic, DLTKcat and GELKcat, using BRENDA and SABIO-RK. Our results show significant improvements under both complete and missing modality scenarios in RMSE, R², and SRCC metrics, with average improvements of 6.41%, 22.18%, and 8.15%, respectively. Code is available at https://github.com/ProEcho1/MMKcat.
Affiliation(s)
- Xin Sun
  - Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yu Guang Wang
  - Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Yiqing Shen
  - Johns Hopkins University, Baltimore, 21218, MD, USA
4
Zou Y, Zheng P, Chen P, Yu X, Wu D. Multidimensional computational strategies enhance the thermostability of alpha-galactosidase. Int J Biol Macromol 2025; 314:144316. [PMID: 40388995 DOI: 10.1016/j.ijbiomac.2025.144316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2025] [Revised: 05/12/2025] [Accepted: 05/15/2025] [Indexed: 05/21/2025]
Abstract
α-Galactosidase has significant industrial value in food processing, animal nutrition, and medicine. Microbial α-galactosidases dominate industrial use because of their high productivity, yet their inherent thermal instability necessitates systematic protein engineering. In this study, we established a dual-strategy protein engineering framework to enhance the thermostability of Aspergillus tubingensis α-galactosidase (AtWU_04653). Strategy I employed integrative computational design tools (ABACUS2/PROSS/DBD2) for mutational library construction, which yielded the dominant mutant A169P, with a 78.52 % longer thermal half-life at 55 °C (pH 4.0) and a 52.04 % increase in catalytic efficiency (kcat/Km). Strategy II implemented a physics-based computational methodology combining GROMACS molecular dynamics simulations with Rosetta unfolding free energy calculations and SPIRED machine learning predictions, deriving three stabilized variants (E429I, N380L, T64P) with 57.33 %, 67.17 %, and 41.34 % longer half-lives, respectively. Notably, E429I and T64P also showed 85.25 % and 65.90 % increases in kcat/Km, respectively. Both strategies substantially reduced the experimental screening workload while enabling simultaneous optimization of thermostability and activity. This study uses sequence conservation analysis, unfolding free energy calculation, molecular dynamics simulation, and innovative protein prediction models to establish multidimensional computational strategies for designing mutants, providing new technical references for the computational design and functional optimization of enzymes.
Affiliation(s)
- Youfeng Zou
  - Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Pu Zheng
  - Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Pengcheng Chen
  - Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Xiaowei Yu
  - Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
- Dan Wu
  - Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214122, China
5
Wang X, Chen Q, Huang Z, Lin Y, Zhou J, Ma F. Discovering novel transglutaminases from Streptomyces species for efficient protein cross-linking in foods. Int J Biol Macromol 2025; 313:144283. [PMID: 40381772 DOI: 10.1016/j.ijbiomac.2025.144283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2025] [Revised: 04/22/2025] [Accepted: 05/14/2025] [Indexed: 05/20/2025]
Abstract
Transglutaminase (TG) is a key enzyme in food processing because it catalyzes protein crosslinking. This study identified novel hyperactive TGs from Streptomyces species to enhance crosslinking efficiency. We integrated phylogenetic analysis for initial screening, followed by virtual docking for further selection. All eight variants tested exhibited specific activity against CBZ-Gln-Gly, with the TG derived from Streptomyces wuyuanensis (SwTG) displaying a specific activity of 78.3 U/mg, which is 2.3-fold higher than that of the commonly used Streptomyces mobaraensis derived TG (SmTG). The top three variants were selected for crosslinking experiments, which showed that SwTG, Streptomyces sp. TN58 derived TG (StTG), and Streptomyces roseoverticillatus derived TG (SrTG) outperformed SmTG in casein crosslinking, while only StTG showed clearly higher activity than SmTG in crosslinking minced meat. These findings suggest that the specific activity of TGs does not always correlate with their ability to induce protein crosslinks. Further investigation using molecular dynamics simulations revealed that larger enzyme volumes with lower flexibility could hinder substrate binding, resulting in weak crosslinking activity for food proteins. Additionally, surface charge was found to be another key factor that disrupts substrate binding. This study presents alternative TGs for generating food crosslinks and offers valuable insights into the mechanisms behind TG-induced crosslinking.
Affiliation(s)
- Xinglong Wang
  - Medical Enzyme Engineering Center, CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, 88 Keling Road, Suzhou 215004, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Qiming Chen
  - Medical Enzyme Engineering Center, CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, 88 Keling Road, Suzhou 215004, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Zhongshi Huang
  - Medical Enzyme Engineering Center, CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, 88 Keling Road, Suzhou 215004, China
- Yanna Lin
  - Shandong Lab of Advanced Biomaterials and Medical Devices in Weihai, 288 Shanhai Road, Weihai, Shandong 264210, China
- Jingwen Zhou
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Fuqiang Ma
  - Medical Enzyme Engineering Center, CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, 88 Keling Road, Suzhou 215004, China; Shandong Lab of Advanced Biomaterials and Medical Devices in Weihai, 288 Shanhai Road, Weihai, Shandong 264210, China
6
Du BX, Yu H, Zhu B, Long Y, Wu M, Shi JY. A novel interpretability framework for enzyme turnover number prediction boosted by pre-trained enzyme embeddings and adaptive gate network. Methods 2025; 237:45-52. [PMID: 40021034 DOI: 10.1016/j.ymeth.2025.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 01/05/2025] [Accepted: 02/25/2025] [Indexed: 03/03/2025] Open
Abstract
Identifying the enzyme turnover number (kcat) is a vital step in synthetic biology and early-stage drug discovery. Recently, deep learning methods have made encouraging progress in predicting kcat, aided by the growth of turnover number data for multi-species enzyme-substrate pairs. However, the performance of existing approaches still heavily depends on the effectiveness of feature extraction for enzymes and substrates, as well as the optimal fusion of these two types of features. Furthermore, it is essential to identify the key molecular substructures that significantly impact kcat prediction. To address these issues, we develop GELKcat, a novel end-to-end dual-representation interpretability framework that harnesses graph transformers for substrate molecular encoding and CNNs for enzyme word2vec embeddings. We further integrate substrate and enzyme features using the adaptive gate network, which assigns optimal weights to capture the most suitable feature combinations. Comparison with several state-of-the-art methods demonstrates the superiority of GELKcat, and ablation studies further illuminate the roles of its three main components. Furthermore, case studies illustrate the interpretability of GELKcat by identifying the key functional groups in a substrate that are significantly associated with turnover number. We anticipate that this work can bridge current gaps in enzyme-substrate representation and offer guidance for drug discovery and synthetic biology.
Affiliation(s)
- Bing-Xue Du
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China; Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Haoyang Yu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Bei Zhu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Yahui Long
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
- Min Wu
  - Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
7
Zhai J, Qi X, Cai L, Liu Y, Tang H, Xie L, Wang J. NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling. Brief Bioinform 2025; 26:bbaf212. [PMID: 40370097 PMCID: PMC12078937 DOI: 10.1093/bib/bbaf212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2025] [Revised: 04/14/2025] [Accepted: 04/21/2025] [Indexed: 05/16/2025] Open
Abstract
The catalytic constant (Kcat) describes the efficiency of enzyme catalysis. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts saturating substrate into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we proposed a novel protocol to construct Kcat prediction models, introducing an intermediate step to separately develop substrate and protein processors. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance the prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R² value of 0.54 versus 0.50) in comparison with Li et al.'s model using the same dataset. Additionally, our modeling protocol enables fine-tuning of the general-purpose Kcat model for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved a best R² value of 0.64 for the focused model. The performance and expandability of the model support its broad application in enzyme engineering and drug research & development.
Affiliation(s)
- Jingchen Zhai
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Xiguang Qi
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Lianjin Cai
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Yue Liu
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Haocheng Tang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York, 695 Park Ave, New York, NY 10065, United States
  - Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, 413 E 69th St, New York, NY 10021, United States
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
8
Blonde C, Caddeo A, Nasser W, Reverchon S, Peyraud R, Haichar FEZ. New insights in metabolism modelling to decipher plant-microbe interactions. New Phytol 2025; 246:1485-1493. [PMID: 40119556 PMCID: PMC12018784 DOI: 10.1111/nph.70063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Accepted: 02/18/2025] [Indexed: 03/24/2025]
Abstract
Plant disease outbreaks, exacerbated by climate change, threaten food security and environmental sustainability world-wide. Plants interact with a wide range of microorganisms. The quest for resilient agriculture requires a deep insight into the molecular and ecological interplays between plants and their associated microbial communities. Omics methods, by profiling entire molecular sets, have shed light on these complex interactions. Nonetheless, deciphering the relationships among thousands of molecular components remains a formidable challenge, and studies that integrate these components into cohesive biological networks involving plants and associated microbes are still limited. Systems biology has the potential to predict the effects of biotic and abiotic perturbations on these networks. It is therefore a promising framework for addressing the full complexity of plant-microbiome interactions.
Affiliation(s)
- Clara Blonde
  - INSA Lyon, CNRS, Université Claude Bernard Lyon 1, UMR5240 Microbiologie, Adaptation, Pathogénie, Université Lyon, 10 rue Raphaël Dubois, 69622 Villeurbanne, France
- Amélie Caddeo
  - Institut Agro, INRAE, IRHS, SFR QUASAV, Univ Angers, F‐49000 Angers, France
  - iMEAN, 135 Avenue de Rangueil, 31077 Toulouse, France
- William Nasser
  - INSA Lyon, CNRS, Université Claude Bernard Lyon 1, UMR5240 Microbiologie, Adaptation, Pathogénie, Université Lyon, 10 rue Raphaël Dubois, 69622 Villeurbanne, France
- Sylvie Reverchon
  - INSA Lyon, CNRS, Université Claude Bernard Lyon 1, UMR5240 Microbiologie, Adaptation, Pathogénie, Université Lyon, 10 rue Raphaël Dubois, 69622 Villeurbanne, France
- Feth el Zahar Haichar
  - INSA Lyon, CNRS, Université Claude Bernard Lyon 1, UMR5240 Microbiologie, Adaptation, Pathogénie, Université Lyon, 10 rue Raphaël Dubois, 69622 Villeurbanne, France
9
Kroll A, Rousset Y, Spitzlei T, Lercher MJ. DeepMolecules: a web server for predicting enzyme and transporter-small molecule interactions. Nucleic Acids Res 2025:gkaf343. [PMID: 40297998 DOI: 10.1093/nar/gkaf343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2025] [Revised: 04/04/2025] [Accepted: 04/16/2025] [Indexed: 04/30/2025] Open
Abstract
DeepMolecules is an easily accessible web server for predicting protein-small molecule interactions. It integrates four state-of-the-art models: ESP and SPOT for identifying substrates of enzymes and transporters, respectively, TurNuP for predicting enzyme turnover numbers kcat, and a model for predicting Michaelis constants KM. These models use deep learning-generated numerical representations of the proteins and small molecules as input features for gradient-boosted decision tree models, achieving high predictive performance. The web interface accepts protein amino acid sequences and small molecules in SMILES, InChI, or KEGG ID formats, supporting single submissions and batch submissions via Excel files. Beyond its predictive capabilities, DeepMolecules provides a structured interface to experimental data on known interactions and kinetic parameters, offering a comprehensive view of protein-small molecule relationships. Freely accessible at https://www.DeepMolecules.org, the web server supports applications in metabolic engineering, drug discovery, and biocatalyst optimization by identifying potential substrates and quantifying their catalytic properties.
Affiliation(s)
- Alexander Kroll
  - Heinrich-Heine-University, Institute for Computer Science and Department of Biology, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Yvan Rousset
  - Heinrich-Heine-University, Institute for Computer Science and Department of Biology, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Thomas Spitzlei
  - Heinrich-Heine-University, Institute for Computer Science and Department of Biology, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Martin J Lercher
  - Heinrich-Heine-University, Institute for Computer Science and Department of Biology, Universitätsstraße 1, 40225 Düsseldorf, Germany
10
Toumpe I, Choudhury S, Hatzimanikatis V, Miskovic L. The Dawn of High-Throughput and Genome-Scale Kinetic Modeling: Recent Advances and Future Directions. ACS Synth Biol 2025. [PMID: 40262025 DOI: 10.1021/acssynbio.4c00868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2025]
Abstract
Researchers have invested much effort into developing kinetic models due to their ability to capture dynamic behaviors, transient states, and regulatory mechanisms of metabolism, providing a detailed and realistic representation of cellular processes. Historically, the requirements for detailed parametrization and significant computational resources created barriers to their development and adoption for high-throughput studies. However, recent advancements, including the integration of machine learning with mechanistic metabolic models, the development of novel kinetic parameter databases, and the use of tailor-made parametrization strategies, are reshaping the field of kinetic modeling. In this Review, we discuss these developments and offer future directions, highlighting the potential of these advances to drive progress in systems and synthetic biology, metabolic engineering, and medical research at an unprecedented scale and pace.
Affiliation(s)
- Ilias Toumpe
  - Laboratory of Computational Systems Biology (LCSB), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
- Subham Choudhury
  - Laboratory of Computational Systems Biology (LCSB), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
- Vassily Hatzimanikatis
  - Laboratory of Computational Systems Biology (LCSB), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
- Ljubisa Miskovic
  - Laboratory of Computational Systems Biology (LCSB), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
11
Cai Y, Zhang W, Dou Z, Wang C, Yu W, Wang L. PreTKcat: A pre-trained representation learning and machine learning framework for predicting enzyme turnover number. Comput Biol Chem 2025; 115:108327. [PMID: 39765190 DOI: 10.1016/j.compbiolchem.2024.108327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 12/06/2024] [Accepted: 12/24/2024] [Indexed: 02/26/2025]
Abstract
The enzyme turnover number (kcat) is crucial for understanding enzyme kinetics and optimizing biotechnological processes. However, experimentally measured kcat values are limited due to the high cost and labor intensity of wet-lab measurements, necessitating robust computational methods. To address this issue, we propose PreTKcat, a framework that integrates pre-trained representation learning and machine learning to predict kcat values. PreTKcat utilizes the ProtT5 protein language model to encode enzyme sequences and the MolGNet molecular representation learning model to encode substrate molecular graphs. These representations are combined and passed to an ExtraTrees model that predicts kcat values. PreTKcat also accounts for the impact of temperature on kcat prediction and can be used to predict enzyme-substrate affinity, i.e., Km values. Comparative assessments with various state-of-the-art models highlight the superior performance of PreTKcat. PreTKcat serves as an effective tool for investigating enzyme kinetics, offering new perspectives for enzyme engineering and its industrial uses.
Affiliation(s)
- Yunxiang Cai
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
- Wenjuan Zhang
  - College of General Education, Tianjin Foreign Studies University, No. 117, Machang Road, Hexi District, Tianjin, 300204, China
- Zhuangzhuang Dou
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
- Chao Wang
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
- Wenping Yu
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
- Lin Wang
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
12
Sweetlove LJ, Ratcliffe RG, Fernie AR. Non-canonical plant metabolism. Nat Plants 2025; 11:696-708. [PMID: 40164785 DOI: 10.1038/s41477-025-01965-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Accepted: 03/01/2025] [Indexed: 04/02/2025]
Abstract
Metabolism is essential for plant growth and has become a major target for crop improvement by enhancing nutrient use efficiency. Metabolic engineering is also the basis for producing high-value plant products such as pharmaceuticals, biofuels and industrial biochemicals. An inherent problem for such engineering endeavours is the tendency to view metabolism as a series of distinct metabolic pathways (glycolysis, the tricarboxylic acid cycle, the Calvin-Benson cycle and so on). While these canonical pathways may represent a dominant or frequently occurring flux mode, systematic analyses of metabolism via computational modelling have emphasized the inherent flexibility of the metabolic network to carry flux distributions that are distinct from the canonical pathways. Recent experimental estimates of metabolic network fluxes using ¹³C-labelling approaches have revealed numerous instances in which non-canonical pathways occur under different conditions and in different tissues. In this Review, we bring these non-canonical pathways to the fore, summarizing the evidence for their occurrence and the context in which they operate. We also emphasize the importance of non-canonical pathways for metabolic engineering. We argue that the introduction of a high-flux pathway to a desired metabolic product will, by necessity, require non-canonical supporting fluxes in central metabolism to provide the necessary carbon skeletons, energy and reducing power. We illustrate this using the overproduction of isoprenoids and fatty acids as case studies.
Affiliation(s)
- Alisdair R Fernie
  - Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
13
Orlando M, Marchetti A, Bombardi L, Lotti M, Fusco S, Mangiagalli M. Polysaccharide degradation in an Antarctic bacterium: Discovery of glycoside hydrolases from remote regions of the sequence space. Int J Biol Macromol 2025; 299:140113. [PMID: 39842586 DOI: 10.1016/j.ijbiomac.2025.140113] [Received: 10/22/2024] [Revised: 01/13/2025] [Accepted: 01/18/2025] [Indexed: 01/24/2025]
Abstract
Glycoside hydrolases (GHs) are enzymes involved in the degradation of oligosaccharides and polysaccharides. The sequence space of GHs is rapidly expanding due to the increasing number of available sequences. This expansion paves the way for the discovery of novel enzymes with peculiar structural and functional properties. This work is focused on two GHs, Ps_GH5 and Ps_GH50, from the genome of the Antarctic bacterium Pseudomonas sp. ef1. These enzymes lie in an unexplored region of the sequence space of their respective GH families, which precludes reliable sequence-based function prediction. For this reason, a computational pipeline was developed that combines deep learning "dynamic docking" on AlphaFold 3D models with physics-based molecular dynamics simulations to infer their substrate specificity. From in silico screening of a repertoire of potential oligosaccharides, only xylooligosaccharides for Ps_GH5 and galactooligosaccharides for Ps_GH50 emerged as catalytically competent substrates. Biochemical characterization agrees with the computational simulations, indicating that Ps_GH5 is an endo-β-xylanase and Ps_GH50 is active mainly on small galactooligosaccharides. In conclusion, this study identifies two novel GH subfamilies placed in remote regions of the sequence space and highlights the efficacy of substrate specificity prediction by computational approaches in the discovery of new enzymes.
Affiliation(s)
- Marco Orlando
  - Department of Biotechnology and Biosciences, University of Milano Bicocca, Piazza della Scienza 2, Milano 20126, Italy
- Alessandro Marchetti
  - Department of Biotechnology and Biosciences, University of Milano Bicocca, Piazza della Scienza 2, Milano 20126, Italy
- Luca Bombardi
  - Biochemistry and Industrial Biotechnology (BIB) Laboratory, Department of Biotechnology, University of Verona, Verona, Italy
- Marina Lotti
  - Department of Biotechnology and Biosciences, University of Milano Bicocca, Piazza della Scienza 2, Milano 20126, Italy
- Salvatore Fusco
  - Biochemistry and Industrial Biotechnology (BIB) Laboratory, Department of Biotechnology, University of Verona, Verona, Italy
- Marco Mangiagalli
  - Department of Biotechnology and Biosciences, University of Milano Bicocca, Piazza della Scienza 2, Milano 20126, Italy
14
Noor MS, Ferdous S, Salehi R, Gates H, Dey S, Raghunath VS, Zargar MR, Chowdhury R. Next-generation metabolic models informed by biomolecular simulations. Curr Opin Biotechnol 2025; 92:103259. [PMID: 39827498 DOI: 10.1016/j.copbio.2025.103259] [Received: 11/18/2024] [Accepted: 01/01/2025] [Indexed: 01/22/2025]
Abstract
Metabolic modeling is essential for understanding the mechanistic bases of cellular metabolism in various organisms, from microbes to humans, and for the design of fitter microbial strains. Metabolic networks focus on the overall fluxes through biochemical reactions that implicitly rely on several biochemical processes, such as active or diffusive uptake (or export) of nutrients (or metabolites), enzymatic turnover of metabolites, and metal-cofactor enzyme interactions. Despite independent progress in biomolecular simulations, they have yet to be integrated to inform metabolic models. We explore the evolution of computational metabolic modeling approaches, starting with flux balance analysis, dynamic, kinetic delineations of metabolic shifts in single organisms within cells and across tissues, and mutually informing, community-level modeling frameworks, and provide a narrative to tie in biomolecular simulations and machine learning predictions to usher in a new phase of structure-guided synthetic biology applications. These additions and prospective novel ones are likely to open hitherto untapped paradigms for optimizing and understanding metabolic pathways toward improving bioproduction of protein and small molecule products with downstream applications in health, environment, energy, and sustainability.
Affiliation(s)
- Mohammed S Noor
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA; Nanovaccine Institute, Iowa State University, Ames, IA, USA
- Sakib Ferdous
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA; Nanovaccine Institute, Iowa State University, Ames, IA, USA
- Rahil Salehi
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA; Nanovaccine Institute, Iowa State University, Ames, IA, USA
- Hannah Gates
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA; Nanovaccine Institute, Iowa State University, Ames, IA, USA
- Supantha Dey
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA; Nanovaccine Institute, Iowa State University, Ames, IA, USA
- Vaishnavey S Raghunath
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA; Nanovaccine Institute, Iowa State University, Ames, IA, USA
- Mohammad R Zargar
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA; Nanovaccine Institute, Iowa State University, Ames, IA, USA
- Ratul Chowdhury
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, IA, USA; Nanovaccine Institute, Iowa State University, Ames, IA, USA
15
Vasudhevan P, Ruoyu Z, Ma H, Singh S, Varshney D, Pu S. Biocatalytic enzymes in food packaging, biomedical, and biotechnological applications: A comprehensive review. Int J Biol Macromol 2025; 300:140069. [PMID: 39832587 DOI: 10.1016/j.ijbiomac.2025.140069] [Received: 11/19/2024] [Revised: 12/27/2024] [Accepted: 01/17/2025] [Indexed: 01/22/2025]
Abstract
The increasing environmental concerns and health risks associated with synthetic chemicals have driven the demand for sustainable and eco-friendly solutions. Biocatalysis, employing enzymes or whole cells as biocatalysts, has emerged as a powerful alternative. This review provides a comprehensive analysis of the applications of biocatalytic enzymes in food packaging, biomedical sciences, and biotechnology. We highlight the potential of enzymes like laccase, glucose oxidase, lysozyme, protease, lipase, cellulase, and asparaginase to replace traditional chemical methods, driving innovation and sustainability. The global enzyme market is also analyzed, including current trends, emerging demands, and the impact of the COVID-19 pandemic. This review aims to bridge knowledge gaps, emphasize recent technological breakthroughs, and showcase the potential of biocatalytic enzymes to address critical industrial challenges while supporting environmental sustainability and economic growth.
Affiliation(s)
- Palanisamy Vasudhevan
  - State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu 610059, China
- Zhang Ruoyu
  - State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu 610059, China
- Hui Ma
  - State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu 610059, China
- Subhav Singh
  - Chitkara Centre for Research and Development, Chitkara University, Himachal Pradesh 174103, India; Division of Research and Development, Lovely Professional University, Phagwara, Punjab, India
- Deekshant Varshney
  - Centre of Research Impact and Outcome, Chitkara University, Rajpura 140417, Punjab, India; Division of Research & Innovation, Uttaranchal University, Dehradun, India
- Shengyan Pu
  - State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu 610059, China
16
Wang Y. Unlocking plant metabolic resilience: how enzyme-constrained metabolic models illuminate thermal responses. New Phytol 2025. [PMID: 40125595 DOI: 10.1111/nph.70100] [Indexed: 03/25/2025]
Affiliation(s)
- Yu Wang
  - School of Life Sciences, Nanjing University, 163 Xianlin Road, Nanjing, Jiangsu, 210023, China
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 W. Gregory Drive, Urbana, IL, 61801, USA
17
de Moura Ferreira MA, de Almeida ELM, da Silveira WB, Nikoloski Z. Protein-constrained models pinpoints the role of underground metabolism in robustness of metabolic phenotypes. iScience 2025; 28:112126. [PMID: 40160425 PMCID: PMC11951047 DOI: 10.1016/j.isci.2025.112126] [Received: 11/08/2024] [Revised: 01/26/2025] [Accepted: 02/25/2025] [Indexed: 04/02/2025] [Open access]
Abstract
Integrating enzyme parameters into constraint-based models has significantly improved the prediction of physiological and molecular traits. To further improve these models, we integrated promiscuous enzyme activities that jointly comprise the so-called underground metabolism by developing the CORAL toolbox, which increases the resolution of modeled enzyme resource allocation. Applying CORAL to a protein-constrained model of Escherichia coli revealed that underground metabolism resulted in larger flexibility of metabolic fluxes and enzyme usage. Simulating metabolic defects where the main activity of a promiscuous enzyme was blocked but promiscuous activities remained functional showed a small enzyme redistribution to the side activities. Further, blocking pairs of main activities showed that non-promiscuous enzymes exhibited a larger impact on growth than promiscuous enzymes. These simulations showed that promiscuous enzymes can compensate for these defects, in line with experimental evidence. Together, our results indicated that promiscuous enzyme activities are vital to maintain robust metabolic function and growth.
Affiliation(s)
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
  - Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany
18
Wang Z, Xie D, Wu D, Luo X, Wang S, Li Y, Yang Y, Li W, Zheng L. Robust enzyme discovery and engineering with deep learning using CataPro. Nat Commun 2025; 16:2736. [PMID: 40108140 PMCID: PMC11923063 DOI: 10.1038/s41467-025-58038-4] [Received: 09/01/2024] [Accepted: 03/11/2025] [Indexed: 03/22/2025] [Open access]
Abstract
Accurate prediction of enzyme kinetic parameters is crucial for enzyme exploration and modification. Existing models suffer from either low accuracy or poor generalization ability due to overfitting. In this work, we first developed unbiased datasets to evaluate the actual performance of these methods and proposed a deep learning model, CataPro, based on pre-trained models and molecular fingerprints to predict turnover number (kcat), Michaelis constant (Km), and catalytic efficiency (kcat/Km). Compared with previous baseline models, CataPro demonstrates clearly enhanced accuracy and generalization ability on the unbiased datasets. In a representative enzyme mining project, by combining CataPro with traditional methods, we identified an enzyme (SsCSO) with 19.53 times increased activity compared to the initial enzyme (CSO2) and then successfully engineered it to improve its activity by 3.34 times. This reveals the high potential of CataPro as an effective tool for future enzyme discovery and modification.
Affiliation(s)
- Zechen Wang
  - School of Physics, Shandong University, Jinan, 250100, Shandong, China
- Dongqi Xie
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201210, Shanghai, China
- Dong Wu
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201210, Shanghai, China
- Xiaozhou Luo
  - Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
  - Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Sheng Wang
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201210, Shanghai, China
- Yangyang Li
  - School of Physics, Shandong University, Jinan, 250100, Shandong, China
- Yanmei Yang
  - College of Chemistry, Chemical Engineering and Materials Science, Key Laboratory of Molecular and Nano Probes, Ministry of Education, Shandong Normal University, Jinan, 250014, Shandong, China
- Weifeng Li
  - School of Physics, Shandong University, Jinan, 250100, Shandong, China
- Liangzhen Zheng
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201210, Shanghai, China
  - Shenzhen Zelixir Biotech Co. Ltd, Shenzhen, 518107, Guangdong, China
19
Qiu W, Yang P, Ye J, Zhou J, Liu S. Unveiling Highly Active and Stable L-Glutaminase through Ancestral Sequence Reconstruction and Turnover Number Prediction. J Agric Food Chem 2025; 73:5353-5362. [PMID: 39994028 DOI: 10.1021/acs.jafc.4c11502] [Indexed: 02/26/2025]
Abstract
In our study, we employed ancestral sequence reconstruction and DLKcat analysis to engineer L-glutaminases with enhanced activity and thermal stability, using Bacillus subtilis 168 L-glutaminase (YbgJ) as the template. We identified two ancestral L-glutaminases, Anc165 and Anc194, with specific activities 730.6- and 203.5-fold higher than YbgJ, respectively. Anc165 retained 96% activity at 65 °C and 69% at 70 °C for 30 min, while Anc194 maintained over 40% activity at 70 °C, contrasting with YbgJ, which was inactivated above 55 °C. In a 15% NaCl solution, Anc165 and Anc194 retained 100% and 18% activity, respectively, compared to YbgJ's complete loss. Molecular dynamics simulations indicate that the enhanced thermal stability of Anc165 is due to its increased structural rigidity. The enhanced activity of Anc165 is due to a more stable enzyme-substrate complex with L-glutamine. In a simulated soy sauce fermentation system, Anc165 produced about 10% more glutamate than YbgJ. With high thermal stability and activity, Anc165 could be a potential candidate for industrial applications.
Affiliation(s)
- Wenxuan Qiu
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - JiaXing Institute of Future Food, Jiaxing 314000, Zhejiang, China
- Penghui Yang
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
- Jiacai Ye
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
- Jingwen Zhou
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - JiaXing Institute of Future Food, Jiaxing 314000, Zhejiang, China
- Song Liu
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, Jiangsu, China
  - JiaXing Institute of Future Food, Jiaxing 314000, Zhejiang, China
20
Wang Y, Cheng L, Zhang Y, Cao Y, Alghazzawi D. DEKP: a deep learning model for enzyme kinetic parameter prediction based on pretrained models and graph neural networks. Brief Bioinform 2025; 26:bbaf187. [PMID: 40273427 PMCID: PMC12021017 DOI: 10.1093/bib/bbaf187] [Received: 01/22/2025] [Revised: 03/10/2025] [Accepted: 03/28/2025] [Indexed: 04/26/2025] [Open access]
Abstract
The prediction of enzyme kinetic parameters is crucial for screening enzymes with high catalytic efficiency and desired characteristics to catalyze natural or non-natural reactions. Data-driven machine learning models have been explored to reduce experimental cost and speed up the enzyme design process. However, the prediction performance is still subject to significant limitations due to the variance in sequence similarity between training and testing datasets. In this work, we introduce DEKP, an integrated deep learning approach for enzyme kinetic parameter prediction. It leverages pretrained models of protein sequences and incorporates enhanced graph neural networks that provide a comprehensive representation of protein structural features. This novel approach can effectively alleviate the performance degradation caused by sequence similarity variation. Moreover, it provides sensitive detection of changes in catalytic efficiency due to enzyme mutations. Experiments validate that DEKP outperforms existing models in predicting enzyme kinetic parameters. This work is expected to significantly improve the performance of the enzyme screening process and provide a robust tool for enzyme-directed evolution research.
Affiliation(s)
- Yizhen Wang
  - School of Computer Science, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
- Li Cheng
  - School of Computer Science, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
  - Key Laboratory of Intelligent Sensing System and Security, Hubei University, Ministry of Education, No. 368 Youyi Road, 430062 Wuhan, China
  - Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
- Yanyun Zhang
  - School of Computer Science, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
  - Key Laboratory of Intelligent Sensing System and Security, Hubei University, Ministry of Education, No. 368 Youyi Road, 430062 Wuhan, China
  - Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
- Yujia Cao
  - School of Computer Science, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
- Daniyal Alghazzawi
  - Faculty of Computing and Information Technology (FCIT), 3599 King Abdulaziz University (KAU), Unit 3600, Jeddah 22254-7653, Saudi Arabia
21
Boorla VS, Maranas CD. CatPred: a comprehensive framework for deep learning in vitro enzyme kinetic parameters. Nat Commun 2025; 16:2072. [PMID: 40021618 PMCID: PMC11871309 DOI: 10.1038/s41467-025-57215-9] [Received: 03/25/2024] [Accepted: 02/14/2025] [Indexed: 03/03/2025] [Open access]
Abstract
Estimation of enzymatic activities still heavily relies on experimental assays, which can be cost- and time-intensive. We present CatPred, a deep learning framework for predicting in vitro enzyme kinetic parameters, including turnover numbers (kcat), Michaelis constants (Km), and inhibition constants (Ki). CatPred addresses key challenges such as the lack of standardized datasets, performance evaluation on enzyme sequences that are dissimilar to those used during training, and model uncertainty quantification. We explore diverse learning architectures and feature representations, including pretrained protein language models and three-dimensional structural features, to enable robust predictions. CatPred provides accurate predictions with query-specific uncertainty estimates, with lower predicted variances correlating with higher accuracy. Pretrained protein language model features particularly enhance performance on out-of-distribution samples. CatPred also introduces benchmark datasets with extensive coverage (~23k, 41k, and 12k data points for kcat, Km, and Ki, respectively). Our framework performs competitively with existing methods while offering reliable uncertainty quantification.
Grants
- This material is based upon work supported by the Center for Bioenergy Innovation (CBI), U.S. Department of Energy, Office of Science, Biological and Environmental Research Program under Award Number ERKP886. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the U.S. Department of Energy. This work was also supported by the U.S. National Science Foundation funded Molecule Maker Lab Institute (MMLI), award number 2019897 supported by National AI Research Institutes Program of the Directorate for Computer and Information Science and Engineering (CISE), in collaboration with the Division of Chemistry (CHE) and the Division of Chemical, Bioengineering, and Environmental Transport Systems (CBET) awarded to CDM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Affiliation(s)
- Veda Sheersh Boorla
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
  - The Center for Bioenergy Innovation, Oak Ridge, TN, 37830, USA
- Costas D Maranas
  - Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
  - The Center for Bioenergy Innovation, Oak Ridge, TN, 37830, USA
22
Anna Sajeevan K, Osinuga A, B A, Ferdous S, Shahreen N, Noor MS, Koneru S, Santos-Correa LM, Salehi R, Chowdhury NB, Calderon-Lopez B, Mali A, Saha R, Chowdhury R. Robust Prediction of Enzyme Variant Kinetics with RealKcat. bioRxiv 2025:2025.02.10.637555. [PMID: 39990461 PMCID: PMC11844551 DOI: 10.1101/2025.02.10.637555] [Indexed: 02/25/2025]
Abstract
Accurate prediction of kinetic parameters is crucial for understanding known enzymes and tailoring novel enzymes for biocatalysis. Current models fail to capture mutation effects on catalytically essential residues, limiting their utility in enzyme design. We grid-searched through ten model architectures (25,671 hyperparameter combinations) to identify a gradient-based additive framework called RealKcat, trained on 27,176 experimental entries curated manually (KinHub-27k) by screening 2,158 articles. Clustering catalytic turnover (kcat) and substrate affinity (KM) by rational orders of magnitude, RealKcat achieves >85% test accuracy, demonstrating the highest sensitivity to mutation-induced variability thus far, and is the first model of its kind to demonstrate complete loss of activity upon deletion of the catalytic apparatus. Finally, state-of-the-art kcat validation accuracy (96%) on an industrial alkaline phosphatase (PafA) mutant dataset confirms RealKcat's generalizability in learning per-residue catalytic relevance.
Affiliation(s)
- Karuna Anna Sajeevan
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
  - The Center for Biorenewable Chemicals, Iowa State University, Ames, Iowa, USA
- Abraham Osinuga
  - Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Arunraj B
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Sakib Ferdous
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Nabia Shahreen
  - Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Mohammed Sakib Noor
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Shashank Koneru
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Rahil Salehi
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Niaz Bahar Chowdhury
  - Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Brisa Calderon-Lopez
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Ankur Mali
  - Department of Computer Science and Engineering, University of South Florida, Tampa, Florida, USA
- Rajib Saha
  - Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Ratul Chowdhury
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
  - The Center for Biorenewable Chemicals, Iowa State University, Ames, Iowa, USA
23
Huang Z, Zhou J, Wang J, Xu S, Cheng C, Ma J, Gao Z. Complementary Distant and Active Site Mutations Simultaneously Enhance Catalytic Activity and Thermostability of α-Galactosidase. J Agric Food Chem 2025; 73:3635-3644. [PMID: 39899880 DOI: 10.1021/acs.jafc.4c12426] [Indexed: 02/05/2025]
Abstract
The industrial applications of enzymes are limited due to the activity-stability trade-off, which implies that the improvement of thermostability often accompanies decreased activity. This study presents a dual-strategy approach to simultaneously improve the catalytic efficiency and thermostability of α-galactosidase galV from Anoxybacillus vitaminiphilus WMF1. Our integrated method combines computational analysis with enzyme property prediction to selectively target and modify the catalytic region and residues that are distant from the active site. We identified and experimentally validated mutations that improve activity without compromising stability and further increased thermostability through additional distant-site mutations. The resulting mutant enzyme variant N549Q/T550N/Y634F demonstrated a 6.2-fold increase in catalytic efficiency and a 3.2-fold improvement in the half-life at 65 °C. Molecular dynamics (MD) simulations supported the structural basis for the observed enhancements. This approach offers a refined strategy for engineering α-galactosidases with improved industrial applicability, overcoming the traditional trade-offs between enzyme activity and stability. Hydrolytic activity toward raffinose family oligosaccharides (RFOs) was validated using soymilk as a model substrate, demonstrating significant practical potential.
Affiliation(s)
- Zhuangzhuang Huang
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan Road, Nanjing, Jiangsu 211816, China
- Junru Zhou
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan Road, Nanjing, Jiangsu 211816, China
- Jialing Wang
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan Road, Nanjing, Jiangsu 211816, China
- Sheng Xu
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan Road, Nanjing, Jiangsu 211816, China
- Cheng Cheng
  - School of Pharmaceutical Sciences, Nanjing Tech University, 30 Puzhunan Road, Nanjing, Jiangsu 211816, China
- Jiangfeng Ma
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan Road, Nanjing, Jiangsu 211816, China
- Zhen Gao
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 Puzhunan Road, Nanjing, Jiangsu 211816, China
24
Wang X, Wu H, Wang T, Chen Y, Jia B, Fang H, Yin X, Zhao Y, Yu R. NIRFluor: A Deep Learning Platform for Rapid Screening of Small Molecule Near-Infrared Fluorophores with Desired Optical Properties. Anal Chem 2025; 97:1992-2002. [PMID: 39818744 DOI: 10.1021/acs.analchem.4c01953] [Indexed: 01/19/2025]
Abstract
Small molecule near-infrared (NIR) fluorophores play a critical role in disease diagnosis and early detection of various markers in living organisms. To accelerate their development and design, a deep learning platform, NIRFluor, was established to rapidly screen small molecule NIR fluorophores with the desired optical properties. The core component of NIRFluor is a state-of-the-art deep learning model trained on 5179 experimental data entries. First, novel hybrid fingerprints including Morgan fingerprints, physicochemical properties, and solvent properties were proposed. Then, a powerful deep learning model, the multitask fingerprint-enhanced graph convolutional network (MT-FinGCN), was designed, which combines fingerprint information and molecular graph structure information to achieve accurate prediction of six properties (absorption wavelength, emission wavelength, Stokes shift, extinction coefficient, photoluminescence quantum yield, and lifetime) of different small molecule NIR fluorophores in different solvents. Furthermore, the "black box" of the GCN model was opened through interpretability studies. Finally, the well-trained models were placed on the web platform NIRFluor for free use (https://nirfluor.aicbsc.com).
Affiliation(s)
- Xiaozhi Wang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Hailong Wu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Tong Wang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Yao Chen
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
- Hunan Key Lab of Biomedical Materials and Devices, College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou 412008, China
| | - Baoshuo Jia
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Huan Fang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Xiaoyue Yin
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Yanping Zhao
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Ruqin Yu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| |
|
25
|
Kratochvíl M, Wilken SE, Ebenhöh O, Schneider R, Satagopam VP. COBREXA 2: tidy and scalable construction of complex metabolic models. Bioinformatics 2025; 41:btaf056. [PMID: 39921902 PMCID: PMC11842047 DOI: 10.1093/bioinformatics/btaf056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 01/24/2025] [Accepted: 02/03/2025] [Indexed: 02/10/2025] Open
Abstract
SUMMARY: Constraint-based metabolic models offer a scalable framework to investigate biological systems using optimality principles. Construction and simulation of detailed models that utilize multiple kinds of constraint systems pose a significant coding overhead, complicating implementation of new types of analyses. We present an improved version of the constraint-based metabolic modeling package COBREXA, which utilizes a hierarchical model construction framework that decouples the implemented analysis algorithms into independent, yet re-combinable, building blocks. By removing the need to re-implement modeling components, assembly of complex metabolic models is simplified, which we demonstrate on use-cases of resource-balanced models, and enzyme-constrained flux balance models of interacting bacterial communities. Notably, these models show improved predictive capabilities in both monoculture and community settings. In perspective, the re-usable model-building components in COBREXA 2 provide a sustainable way to handle increasingly complex models in constraint-based modeling. AVAILABILITY AND IMPLEMENTATION: COBREXA 2 is available from https://github.com/COBREXA/COBREXA.jl, and from Julia package repositories. COBREXA 2 works on all major operating systems and computer architectures. Documentation is available at https://cobrexa.github.io/COBREXA.jl/.
Affiliation(s)
- Miroslav Kratochvíl
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
| | - St Elmo Wilken
- Institute of Quantitative and Theoretical Biology, Heinrich Heine University, Düsseldorf, North Rhine-Westphalia 40225, Germany
- Cluster of Excellence on Plant Sciences, Heinrich Heine University, Düsseldorf, North Rhine-Westphalia 40225, Germany
| | - Oliver Ebenhöh
- Institute of Quantitative and Theoretical Biology, Heinrich Heine University, Düsseldorf, North Rhine-Westphalia 40225, Germany
- Cluster of Excellence on Plant Sciences, Heinrich Heine University, Düsseldorf, North Rhine-Westphalia 40225, Germany
| | - Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
| | - Venkata P Satagopam
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette L-4362, Luxembourg
| |
|
26
|
Dosajh A, Agrawal P, Chatterjee P, Priyakumar UD. Modern machine learning methods for protein property prediction. Curr Opin Struct Biol 2025; 90:102990. [PMID: 39881454 DOI: 10.1016/j.sbi.2025.102990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Revised: 12/06/2024] [Accepted: 01/04/2025] [Indexed: 01/31/2025]
Abstract
Recent progress and development of artificial intelligence and machine learning (AI/ML) techniques have enabled addressing complex biomolecular problems. AI/ML models learn the underlying distribution of data they are trained on and when exposed to new inputs, they make predictions based on patterns and relationships previously observed in the training set. Further, generative artificial intelligence (GenAI) can be used to accurately generate protein structure or sequence from specific selected properties. This review specifically focuses on the applications of AI/ML in predicting important functional properties of proteins, and the potential prospects of reverse-engineering in depicting the sequence and structure, from available protein-property information.
Affiliation(s)
- Arjun Dosajh
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
| | - Prakul Agrawal
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
| | - Prathit Chatterjee
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India.
| |
|
27
|
Wendering P, Andreou GM, Laitinen RAE, Nikoloski Z. Metabolic modeling identifies determinants of thermal growth responses in Arabidopsis thaliana. The New Phytologist 2025. [PMID: 39856022 DOI: 10.1111/nph.20420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Accepted: 01/09/2025] [Indexed: 01/27/2025]
Abstract
Temperature is a critical environmental factor affecting nearly all plant processes, including growth, development, and yield. Yet, despite decades of research, we lack the ability to predict plant performance at different temperatures, limiting the development of climate-resilient crops. Further, there is a pressing need to bridge the gap between the prediction of physiological and molecular traits to improve our understanding and manipulation of plant temperature responses. Here, we developed the first enzyme-constrained model of Arabidopsis thaliana's metabolism, facilitating predictions of growth-related phenotypes at different temperatures. We showed that the model can be employed for in silico identification of genes that affect plant growth at suboptimal growth temperature. Using mutant lines, we validated the genes predicted to affect plant growth, demonstrating the potential of metabolic modeling in accurately predicting plant thermal responses. The temperature-dependent enzyme-constrained metabolic model provides a template that can be used for developing sophisticated strategies to engineer climate-resilient crops.
Affiliation(s)
- Philipp Wendering
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Potsdam, 14476, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam, 14476, Germany
| | - Gregory M Andreou
- Organismal and Evolutionary Research Programme, Faculty of Biological and Environmental Sciences, Viikki Plant Science Centre, University of Helsinki, Viikinkaari 1, Helsinki, 00790, Finland
| | - Roosa A E Laitinen
- Organismal and Evolutionary Research Programme, Faculty of Biological and Environmental Sciences, Viikki Plant Science Centre, University of Helsinki, Viikinkaari 1, Helsinki, 00790, Finland
| | - Zoran Nikoloski
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Potsdam, 14476, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam, 14476, Germany
| |
|
28
|
Zeng Z, Guo J, Jin J, Luo X. CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions. J Cheminform 2025; 17:2. [PMID: 39773344 PMCID: PMC11707929 DOI: 10.1186/s13321-024-00944-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Accepted: 12/19/2024] [Indexed: 01/11/2025] Open
Abstract
Predicting EC numbers for chemical reactions enables efficient enzymatic annotations for computer-aided synthesis planning. However, conventional machine learning approaches encounter challenges due to data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction's EC), a novel framework leveraging contrastive learning, pre-trained language model-based reaction embeddings, and data augmentation to address these limitations. CLAIRE achieved notable performance improvements, demonstrating weighted average F1 scores of 0.861 and 0.911 on the testing set (n = 18,816) and an independent dataset (n = 1040) derived from yeast's metabolic model, respectively. Remarkably, CLAIRE significantly outperformed the state-of-the-art model by 3.65-fold and 1.18-fold, respectively. Its high accuracy positions CLAIRE as a promising tool for retrosynthesis planning, drug fate prediction, and synthetic biology applications. CLAIRE is freely available on GitHub (https://github.com/zishuozeng/CLAIRE). Scientific contribution: This work employed contrastive learning for predicting the EC numbers of enzymatic reactions, overcoming the challenges of data scarcity and imbalance. The new model achieves state-of-the-art performance and may facilitate computer-aided synthesis planning.
Affiliation(s)
- Zishuo Zeng
- Synceres Biosciences Co. Ltd., Shenzhen, 518100, China.
| | - Jin Guo
- Synceres Biosciences Co. Ltd., Shenzhen, 518100, China
| | - Jiao Jin
- Synceres Biosciences Co. Ltd., Shenzhen, 518100, China
| | - Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Key Laboratory of Quantitative Synthetic Biology, Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
| |
|
29
|
Taheri-Garavand A, Beiranvandi M, Ahmadi A, Nikoloudakis N. Smart estimation of protective antioxidant enzymes' activity in savory (Satureja rechingeri L.) under drought stress and soil amendments. BMC Plant Biology 2025; 25:19. [PMID: 39757153 DOI: 10.1186/s12870-024-06044-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Accepted: 12/31/2024] [Indexed: 01/07/2025]
Abstract
Savory (Satureja rechingeri L.) is one of Iran's most important medicinal plants, having low irrigation needs, and thus is considered one of the most valuable plants for cultivation in arid and semi-arid regions, especially under drought conditions. The current research was carried out to develop a genetic algorithm-based artificial neural network (ANN) model capable of simulating the levels of antioxidants in savory when using soil amendments [biochar (BC) and superabsorbent (SA)] under drought. Data under different watering schemes and different levels of soil amendments showed that both BC and SA have mitigating effects on drought stress by optimizing enzymatic and non-enzymatic antioxidant traits (POX, CAT, and APX enzymes). Specifically, using biochar and superabsorbent led to improved homeostasis under water deficit as reflected by lower MDA levels. An ANN model with a 3-10-6 topology was found to be the best model to predict polyphenols (PHE), proline (PRO), peroxidase (POX), catalase (CAT), and ascorbate peroxidase (APX) levels, as well as the oxidative stress indicator malondialdehyde (MDA). The model's efficiency was established using the R-value as the statistical parameter, and simulated GA-ANN data were highly correlated with experimental findings. Across enzymatic antioxidants, APX had the best model fit, having an R-value of 0.9733. By contrast, POX had a lower predictive correlation (R = 0.8737), indicating a lower capacity of the ANN system in forecasting this parameter. Among the remaining traits, MDA (R = 0.9690) showed a better model fit than PHE (R = 0.9604) and PRO (R = 0.9245). The current study shows the potential of the ANN model in predicting the content of enzymatic and non-enzymatic antioxidants in savory plants under drought stress as a non-invasive, low-cost experimental alternative.
Affiliation(s)
- Amin Taheri-Garavand
- Mechanical Engineering of Biosystems Department, Lorestan University, Khorramabad, Iran.
| | - Mojgan Beiranvandi
- Department of Agro-Ecology, Faculty of Agriculture, Lorestan University, Khorramabad, Iran
| | - Abdolreza Ahmadi
- Department of Plant Protection, Faculty of Agriculture, Lorestan University, Khorramabad, Iran
| | - Nikolaos Nikoloudakis
- Department of Agricultural Science, Biotechnology and Food Science, Cyprus University of Technology, Limassol, 3036, Cyprus
| |
|
30
|
Siharath C, Biondi O, Peres S. Modelling energy metabolism dysregulations in neuromuscular diseases: A case study of calpainopathy. Heliyon 2024; 10:e40918. [PMID: 39759341 PMCID: PMC11698924 DOI: 10.1016/j.heliyon.2024.e40918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 11/05/2024] [Accepted: 12/03/2024] [Indexed: 01/07/2025] Open
Abstract
Biological modelling helps in understanding complex processes, like energy metabolism, by predicting pathway compensations and equilibrium under given conditions. When deciphering metabolic adaptations, traditional experiments face challenges due to numerous enzymatic activities, requiring modelling to anticipate pathway behaviours and orientate research. This paper aims to implement a constraint-based modelling method of muscular energy metabolism, adaptable to individual situations, energy demands, and complex disease-specific metabolic alterations like muscular dystrophy calpainopathy. Our calpainopathy-like model not only confirms the ATP production defect under increasing energy demands, but also suggests compensatory mechanisms through anaerobic glycolysis. However, excessive glycolysis indicates a need to enhance mitochondrial respiration, preventing excess lactate production common in several diseases. Our model suggests that moderate-intensity physiotherapy, known to improve aerobic performance and anaerobic buffering, combined with increased carbohydrate and amino acid sources, could be a potent therapeutic approach for calpainopathy.
Affiliation(s)
- Camille Siharath
- Laboratoire de Biométrie et de Biologie Évolutive, UMR CNRS 5558 Université Claude Bernard Lyon 1, 69622, Villeurbanne cedex, France
- ERABLE, INRIA Lyon Centre, 69622, Villeurbanne cedex, France
| | - Olivier Biondi
- Laboratoire de Biologie de l'Exercice pour la Performance et la Santé (LBEPS), UMR, Université d'Evry, IRBA, Université de Paris Saclay, 91025, Evry-Courcouronnes, France
| | - Sabine Peres
- Laboratoire de Biométrie et de Biologie Évolutive, UMR CNRS 5558 Université Claude Bernard Lyon 1, 69622, Villeurbanne cedex, France
- ERABLE, INRIA Lyon Centre, 69622, Villeurbanne cedex, France
| |
|
31
|
Harding-Larsen D, Funk J, Madsen NG, Gharabli H, Acevedo-Rocha CG, Mazurenko S, Welner DH. Protein representations: Encoding biological information for machine learning in biocatalysis. Biotechnol Adv 2024; 77:108459. [PMID: 39366493 DOI: 10.1016/j.biotechadv.2024.108459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 09/19/2024] [Accepted: 09/29/2024] [Indexed: 10/06/2024]
Abstract
Enzymes offer a more environmentally friendly and low-impact solution to conventional chemistry, but they often require additional engineering for their application in industrial settings, an endeavour that is challenging and laborious. To address this issue, the power of machine learning can be harnessed to produce predictive models that enable the in silico study and engineering of improved enzymatic properties. Such machine learning models, however, require the conversion of the complex biological information to a numerical input, also called protein representations. These inputs demand special attention to ensure the training of accurate and precise models, and, in this review, we therefore examine the critical step of encoding protein information to numeric representations for use in machine learning. We selected the most important approaches for encoding the three distinct biological protein representations - primary sequence, 3D structure, and dynamics - to explore their requirements for employment and inductive biases. Combined representations of proteins and substrates are also introduced as emergent tools in biocatalysis. We propose the division of fixed representations, a collection of rule-based encoding strategies, and learned representations extracted from the latent spaces of large neural networks. To select the most suitable protein representation, we propose two main factors to consider. The first one is the model setup, which is influenced by the size of the training dataset and the choice of architecture. The second factor is the model objectives such as consideration about the assayed property, the difference between wild-type models and mutant predictors, and requirements for explainability. This review is aimed at serving as a source of information and guidance for properly representing enzymes in future machine learning models for biocatalysis.
Affiliation(s)
- David Harding-Larsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
| | - Jonathan Funk
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
| | - Niklas Gesmar Madsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
| | - Hani Gharabli
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
| | - Carlos G Acevedo-Rocha
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Ditte Hededam Welner
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark.
| |
|
32
|
Yan X, Bao W, Wu Y, Zhang C, Mao Z, Yuan Q, Hu Z, He P, Peng Q, Hu M, Geng B, Ma H, Chen S, Fei Q, He Q, Yang S. Paradigm of engineering recalcitrant non-model microorganism with dominant metabolic pathway as a biorefinery chassis. Nat Commun 2024; 15:10441. [PMID: 39616174 PMCID: PMC11608335 DOI: 10.1038/s41467-024-54897-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/10/2024] [Accepted: 11/22/2024] [Indexed: 05/17/2025] Open
Abstract
The development and implementation of microbial chassis cells have profound impacts on the circular economy. The non-model bacterium Zymomonas mobilis is an excellent chassis owing to its extraordinary industrial characteristics. Here, the genome-scale metabolic model iZM516 is improved and updated by integrating enzyme constraints to simulate the dynamics of flux distribution and guide pathway design. We show that the innate dominant ethanol pathway of Z. mobilis restricts the titer and rate of these biochemicals. A dominant-metabolism compromised intermediate-chassis (DMCI) strategy is then developed by introducing a low-toxicity but cofactor-imbalanced 2,3-butanediol pathway, and a recombinant D-lactate producer is constructed to produce more than 140.92 g/L and 104.6 g/L D-lactate (yield > 0.97 g/g) from glucose and corncob residue hydrolysate, respectively. Additionally, techno-economic analysis (TEA) and life cycle assessment (LCA) demonstrate the commercialization feasibility and greenhouse gas reduction capability of lignocellulosic D-lactate. This work thus establishes a paradigm for engineering recalcitrant microorganisms as biorefinery chassis.
Affiliation(s)
- Xiongying Yan
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Weiwei Bao
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Yalun Wu
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Chenyue Zhang
- Xi'an Key Laboratory of C1 Compound Bioconversion Technology, School of Chemical Engineering and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Zhitao Mao
- Biodesign Center, Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
| | - Qianqian Yuan
- Biodesign Center, Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
| | - Zhousheng Hu
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Penghui He
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Qiqun Peng
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Mimi Hu
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Binan Geng
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Hongwu Ma
- Biodesign Center, Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
| | - Shouwen Chen
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China
| | - Qiang Fei
- Xi'an Key Laboratory of C1 Compound Bioconversion Technology, School of Chemical Engineering and Technology, Xi'an Jiaotong University, Xi'an, China.
| | - Qiaoning He
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China.
| | - Shihui Yang
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, China.
| |
|
33
|
Nana Teukam YG, Zipoli F, Laino T, Criscuolo E, Grisoni F, Manica M. Integrating genetic algorithms and language models for enhanced enzyme design. Brief Bioinform 2024; 26:bbae675. [PMID: 39780486 PMCID: PMC11711099 DOI: 10.1093/bib/bbae675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 06/24/2024] [Accepted: 12/13/2024] [Indexed: 01/11/2025] Open
Abstract
Enzymes are molecular machines optimized by nature to allow otherwise impossible chemical processes to occur. Their design is a challenging task due to the complexity of the protein space and the intricate relationships between sequence, structure, and function. Recently, large language models (LLMs) have emerged as powerful tools for modeling and analyzing biological sequences, but their application to protein design is limited by the high cardinality of the protein space. This study introduces a framework that combines LLMs with genetic algorithms (GAs) to optimize enzymes. LLMs are trained on a large dataset of protein sequences to learn relationships between amino acid residues linked to structure and function. This knowledge is then leveraged by GAs to efficiently search for sequences with improved catalytic performance. We focused on two optimization tasks: improving the feasibility of biochemical reactions and increasing their turnover rate. Systematic evaluations on 105 biocatalytic reactions demonstrated that the LLM-GA framework generated mutants outperforming the wild-type enzymes in terms of feasibility in 90% of the instances. Further in-depth evaluation of seven reactions reveals the power of this methodology to make "the best of both worlds" and create mutants with structural features and flexibility comparable with the wild types. Our approach advances the state-of-the-art computational design of biocatalysts, ultimately opening opportunities for more sustainable chemical processes.
Affiliation(s)
- Yves Gaetan Nana Teukam
- IBM Research Europe, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
- Institute for Complex Molecular Systems and Department of Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, the Netherlands
| | - Federico Zipoli
- IBM Research Europe, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
| | - Teodoro Laino
- IBM Research Europe, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
| | - Emanuele Criscuolo
- Institute for Complex Molecular Systems and Department of Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, the Netherlands
| | - Francesca Grisoni
- Institute for Complex Molecular Systems and Department of Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, the Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, the Netherlands
| | - Matteo Manica
- IBM Research Europe, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
| |
|
34
|
Razaghi-Moghadam Z, Soleymani Babadi F, Nikoloski Z. Harnessing the optimization of enzyme catalytic rates in engineering of metabolic phenotypes. PLoS Comput Biol 2024; 20:e1012576. [PMID: 39495797 PMCID: PMC11563432 DOI: 10.1371/journal.pcbi.1012576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 11/14/2024] [Accepted: 10/21/2024] [Indexed: 11/06/2024] Open
Abstract
The increasing availability of enzyme turnover number measurements from experiments and of turnover number predictions from deep learning models prompts the use of these enzyme parameters in precise metabolic engineering. Yet, there is no computational approach that allows the prediction of metabolic engineering strategies that rely on the modification of turnover numbers. It is also unclear if modifications of turnover numbers without alterations in the host's transcriptional regulatory machinery suffice to increase the production of chemicals of interest. Here, we present a constraint-based modeling approach, termed Overcoming Kinetic rate Obstacles (OKO), that uses enzyme-constrained metabolic models to predict in silico strategies to increase the production of a given chemical, while ensuring specified cell growth. We demonstrate that the application of OKO to enzyme-constrained metabolic models of Escherichia coli and Saccharomyces cerevisiae results in strategies that can at least double the production of over 40 compounds with little penalty to growth. Interestingly, we show that the overproduction of compounds of interest does not entail only an increase in the values of turnover numbers. Lastly, we demonstrate that a refinement of OKO, allowing also for manipulation of enzyme abundance, facilitates the usage of the available compendia and deep learning models of turnover numbers in the design of precise metabolic engineering strategies. Our results expand the usage of genome-scale metabolic models toward the identification of targets for protein engineering, allowing their direct usage in the generation of innovative metabolic engineering designs for various biotechnological applications.
Affiliation(s)
- Zahra Razaghi-Moghadam
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
| | - Fayaz Soleymani Babadi
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
| | - Zoran Nikoloski
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
| |
|
35
|
Gollub MG, Backes T, Kaltenbach HM, Stelling J. ENKIE: a package for predicting enzyme kinetic parameter values and their uncertainties. Bioinformatics 2024; 40:btae652. [PMID: 39495107 PMCID: PMC11588206 DOI: 10.1093/bioinformatics/btae652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 10/23/2024] [Accepted: 10/30/2024] [Indexed: 11/05/2024] Open
Abstract
MOTIVATION: Relating metabolite and enzyme abundances to metabolic fluxes requires reaction kinetics, core elements of dynamic and enzyme cost models. However, kinetic parameters have been measured only for a fraction of all known enzymes, and the reliability of the available values is unknown. RESULTS: The ENzyme KInetics Estimator (ENKIE) uses Bayesian Multilevel Models to predict value and uncertainty of KM and kcat parameters. Our models use five categorical predictors and achieve prediction performances comparable to deep learning approaches that use sequence and structure information. They provide calibrated uncertainty predictions and interpretable insights into the main sources of uncertainty. We expect our tool to simplify the construction of priors for Bayesian kinetic models of metabolism. AVAILABILITY AND IMPLEMENTATION: Code and Python package are available at https://gitlab.com/csb.ethz/enkie and https://pypi.org/project/enkie/.
Affiliation(s)
- Mattia G Gollub
- Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, 4056 Basel, Switzerland
- Thierry Backes
- Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, 4056 Basel, Switzerland
- Hans-Michael Kaltenbach
- Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, 4056 Basel, Switzerland
- Jörg Stelling
- Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Zurich, 4056 Basel, Switzerland
36
Muir DF, Asper GPR, Notin P, Posner JA, Marks DS, Keiser MJ, Pinney MM. Evolutionary-Scale Enzymology Enables Biochemical Constant Prediction Across a Multi-Peaked Catalytic Landscape. bioRxiv 2024:2024.10.23.619915. [PMID: 39484523 PMCID: PMC11526920 DOI: 10.1101/2024.10.23.619915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Quantitatively mapping enzyme sequence-catalysis landscapes remains a critical challenge in understanding enzyme function, evolution, and design. Here, we expand an emerging microfluidic platform to measure catalytic constants (kcat and KM) for hundreds of diverse naturally occurring sequences and mutants of the model enzyme Adenylate Kinase (ADK). This enables us to dissect the sequence-catalysis landscape's topology, navigability, and mechanistic underpinnings, revealing distinct catalytic peaks organized by structural motifs. These results challenge long-standing hypotheses in enzyme adaptation, demonstrating that thermophilic enzymes are not slower than their mesophilic counterparts. Combining the rich representations of protein sequences provided by deep-learning models with our custom high-throughput kinetic data yields semi-supervised models that significantly outperform existing models at predicting catalytic parameters of naturally occurring ADK sequences. Our work demonstrates a promising strategy for dissecting sequence-catalysis landscapes across enzymatic evolution and building family-specific models capable of accurately predicting catalytic constants, opening new avenues for enzyme engineering and functional prediction.
Affiliation(s)
- Duncan F Muir
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA, USA
- Program in Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Garrison P R Asper
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA, USA
- Pascal Notin
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Computer Science, University of Oxford, Oxford, UK
- Jacob A Posner
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA, USA
- Department of Biology, San Francisco State University, San Francisco, CA, USA
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Michael J Keiser
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Margaux M Pinney
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA, USA
- Valhalla Fellow, University of California San Francisco, San Francisco, CA, USA
37
Alazmi M. Enzyme catalytic efficiency prediction: employing convolutional neural networks and XGBoost. Front Artif Intell 2024; 7:1446063. [PMID: 39498388 PMCID: PMC11532030 DOI: 10.3389/frai.2024.1446063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Accepted: 10/07/2024] [Indexed: 11/07/2024] Open
Abstract
Introduction: In the intricate realm of enzymology, the precise quantification of enzyme efficiency, epitomized by the turnover number (kcat), is a paramount yet elusive objective. Existing methodologies, though sophisticated, often grapple with the inherent stochasticity and multifaceted nature of enzymatic reactions. Thus, there arises a necessity to explore avant-garde computational paradigms. Methods: In this context, we introduce "enzyme catalytic efficiency prediction (ECEP)," leveraging advanced deep learning techniques to enhance the previous implementation, TurNuP, for predicting the enzyme catalase kcat. Our approach significantly outperforms prior methodologies, incorporating new features derived from enzyme sequences and chemical reaction dynamics. Through ECEP, we unravel the intricate enzyme-substrate interactions, capturing the nuanced interplay of molecular determinants. Results: Preliminary assessments, compared against established models like TurNuP and DLKcat, underscore the superior predictive capabilities of ECEP, marking a pivotal shift in in silico enzymatic turnover number estimation. This study enriches the computational toolkit available to enzymologists and lays the groundwork for future explorations in the burgeoning field of bioinformatics. This paper suggested a multi-feature ensemble deep learning-based approach to predict enzyme kinetic parameters using an ensemble convolution neural network and XGBoost by calculating the weighted average of each feature-based model's output to outperform traditional machine learning methods. The proposed "ECEP" model significantly outperformed existing methodologies, reducing the mean squared error (MSE) by 0.35 (from 0.81 to 0.46) and raising the R-squared score from 0.44 to 0.54, thereby demonstrating its superior accuracy and effectiveness in enzyme catalytic efficiency prediction.
Discussion: This improvement underscores the model's potential to enhance the field of bioinformatics, setting a new benchmark for performance.
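Editorial illustration (not from the paper): the abstract describes combining per-model outputs by a weighted average and reporting MSE and R-squared. The sketch below shows that general idea on made-up numbers. The function names, the weights, and the toy predictions are all assumptions chosen for illustration and do not reproduce ECEP's architecture or data.

```python
from statistics import fmean


def weighted_average(predictions, weights):
    """Combine per-model predictions (one list per model) sample by sample."""
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


def mse(y_true, y_pred):
    """Mean squared error."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical log-scale kcat targets and two hypothetical model outputs.
y_true = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 2.5, 3.5]   # stands in for a CNN-style model
model_b = [0.5, 1.5, 3.5, 4.5]   # stands in for a gradient-boosting model

ensemble = weighted_average([model_a, model_b], weights=[0.75, 0.25])
print(mse(y_true, model_a), mse(y_true, ensemble), r_squared(y_true, ensemble))
```

In this toy case the two models err in opposite directions, so the weighted average has a lower MSE than either model alone. Real ensembles only gain when member errors are at least partly uncorrelated.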
Affiliation(s)
- Meshari Alazmi
- College of Computer Science and Engineering, University of Ha’il, Ha’il, Saudi Arabia
38
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] Open
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Affiliation(s)
- Lung-Yi Chen
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan
39
Zare F, Fleming RMT. Integration of proteomic data with genome-scale metabolic models: A methodological overview. Protein Sci 2024; 33:e5150. [PMID: 39275997 PMCID: PMC11400636 DOI: 10.1002/pro.5150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 06/29/2024] [Accepted: 08/06/2024] [Indexed: 09/16/2024]
Abstract
The integration of proteomics data with constraint-based reconstruction and analysis (COBRA) models plays a pivotal role in understanding the relationship between genotype and phenotype and bridges the gap between genome-level phenomena and functional adaptations. Integrating a generic genome-scale model with information on proteins enables generation of a context-specific metabolic model which improves the accuracy of model prediction. This review explores methodologies for incorporating proteomics data into genome-scale models. Available methods are grouped into four distinct categories based on their approach to integrate proteomics data and their depth of modeling. Within each category section various methods are introduced in chronological order of publication demonstrating the progress of this field. Furthermore, challenges and potential solutions to further progress are outlined, including the limited availability of appropriate in vitro data, experimental enzyme turnover rates, and the trade-off between model accuracy, computational tractability, and data scarcity. In conclusion, methods employing simpler approaches demand fewer kinetic and omics data, consequently leading to a less complex mathematical problem and reduced computational expenses. On the other hand, approaches that delve deeper into cellular mechanisms and aim to create detailed mathematical models necessitate more extensive kinetic and omics data, resulting in a more complex and computationally demanding problem. However, in some cases, this increased cost can be justified by the potential for more precise predictions.
Affiliation(s)
- Farid Zare
- School of Medicine, University of Galway, Galway, Ireland
40
Kroll A, Lercher MJ. DLKcat cannot predict meaningful kcat values for mutants and unfamiliar enzymes. Biol Methods Protoc 2024; 9:bpae061. [PMID: 39346751 PMCID: PMC11427335 DOI: 10.1093/biomethods/bpae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Revised: 08/09/2024] [Accepted: 08/22/2024] [Indexed: 10/01/2024] Open
Abstract
The recently published DLKcat model, a deep learning approach for predicting enzyme turnover numbers (kcat), claims to enable high-throughput kcat predictions for metabolic enzymes from any organism and to capture kcat changes for mutated enzymes. Here, we critically evaluate these claims. We show that for enzymes with <60% sequence identity to the training data DLKcat predictions become worse than simply assuming a constant average kcat value for all reactions. Furthermore, DLKcat's ability to predict mutation effects is much weaker than implied, capturing none of the experimentally observed variation across mutants not included in the training data. These findings highlight significant limitations in DLKcat's generalizability and its practical utility for predicting kcat values for novel enzyme families or mutants, which are crucial applications in fields such as metabolic modeling.
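Editorial illustration (not from the paper): the comparison against "a constant average kcat value" can be read as a baseline test, in which a model is only useful on a test set if its error is lower than that of predicting the training-set mean for every enzyme. The sketch below shows one way to run such a check. The function names and all numbers are hypothetical, and working on log10-transformed kcat values is an assumption made here for illustration.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def compare_to_constant_baseline(train_values, test_values, model_predictions):
    """Return (model_mse, baseline_mse, model_is_better).

    The baseline predicts the mean of the training values for every test point.
    """
    constant = fmean(train_values)
    baseline_mse = mse(test_values, [constant] * len(test_values))
    model_mse = mse(test_values, model_predictions)
    return model_mse, baseline_mse, model_mse < baseline_mse


# Hypothetical log10(kcat) values; not taken from any dataset.
train_log_kcat = [0.0, 1.0, 2.0]
test_log_kcat = [0.5, 1.5]
model_log_kcat = [2.5, -0.5]     # a model that generalizes poorly

model_err, baseline_err, better = compare_to_constant_baseline(
    train_log_kcat, test_log_kcat, model_log_kcat
)
print(model_err, baseline_err, better)
```

A model that loses to this baseline has a negative coefficient of determination on the test set when the baseline mean equals the test mean, which is a common way such failures are reported.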
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
- Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
41
Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024; 53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Global environmental issues and sustainable development call for new technologies for fine chemical synthesis and waste valorization. Biocatalysis has attracted great attention as the alternative to the traditional organic synthesis. However, it is challenging to navigate the vast sequence space to identify those proteins with admirable biocatalytic functions. The recent development of deep-learning based structure prediction methods such as AlphaFold2 reinforced by different computational simulations or multiscale calculations has largely expanded the 3D structure databases and enabled structure-based design. While structure-based approaches shed light on site-specific enzyme engineering, they are not suitable for large-scale screening of potential biocatalysts. Effective utilization of big data using machine learning techniques opens up a new era for accelerated predictions. Here, we review the approaches and applications of structure-based and machine-learning guided enzyme design. We also provide our view on the challenges and perspectives on effectively employing enzyme design approaches integrating traditional molecular simulations and machine learning, and the importance of database construction and algorithm development in attaining predictive ML models to explore the sequence fitness landscape for the design of admirable biocatalysts.
Affiliation(s)
- Jiahui Zhou
- School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
- Meilan Huang
- School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
42
Cui Q, Gao Y, Wen Q, Wang T, Ren X, Cheng L, Bai M, Cheng C. Tunable Structured 2D Nanobiocatalysts: Synthesis, Catalytic Properties and New Horizons in Biomedical Applications. Small 2024; 20:e2311584. [PMID: 38566551 DOI: 10.1002/smll.202311584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 03/18/2024] [Indexed: 04/04/2024]
Abstract
2D materials have made essential contributions to boosting biocatalytic efficiency in diverse biomedical applications, owing to their intrinsic enzyme-mimetic activity and massive specific surface area for loading metal catalytic centers. Because of the difficulty of high-quality synthesis, the varied structures, and the hard choice of efficient surface loading sites with catalytic properties, the artificial construction of 2D nanobiocatalysts still faces great challenges. This review provides a timely and comprehensive summary of the latest progress and future trends in the design and biotherapeutic applications of 2D nanobiocatalysts, which is essential for their development. First, an overview of the synthesis-structure fundamentals and structure-property relationships of 2D nanobiocatalysts, both metal-free and metal-based, is provided. After that, the effective design of the active sites of nanobiocatalysts is discussed. Then, the progress of their applied research in recent years, including biomedical analysis, biomedical therapeutics, pharmacokinetics, and toxicology, is systematically highlighted. Finally, future research directions for 2D nanobiocatalysts are outlined. Overall, this review is expected to provide cutting-edge and multidisciplinary guidance for accelerating future developments and biomedical applications of 2D nanobiocatalysts.
Affiliation(s)
- Qiqi Cui
- College of Polymer Science and Engineering, State Key Laboratory of Polymer Materials Engineering, Sichuan University, Chengdu, 610065, China
- Yang Gao
- College of Polymer Science and Engineering, State Key Laboratory of Polymer Materials Engineering, Sichuan University, Chengdu, 610065, China
- Department of Endodontics, State Key Laboratory of Oral Diseases & National Clinical Research, Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, China
- Qinlong Wen
- College of Polymer Science and Engineering, State Key Laboratory of Polymer Materials Engineering, Sichuan University, Chengdu, 610065, China
- Ting Wang
- College of Polymer Science and Engineering, State Key Laboratory of Polymer Materials Engineering, Sichuan University, Chengdu, 610065, China
- Xiancheng Ren
- College of Polymer Science and Engineering, State Key Laboratory of Polymer Materials Engineering, Sichuan University, Chengdu, 610065, China
- Liang Cheng
- Department of Materials Science and Engineering, Center for Oral Diseases, The Macau University of Science and Technology, Taipa, Macau, China
- Mingru Bai
- Department of Endodontics, State Key Laboratory of Oral Diseases & National Clinical Research, Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, China
- Chong Cheng
- College of Polymer Science and Engineering, State Key Laboratory of Polymer Materials Engineering, Sichuan University, Chengdu, 610065, China
- Department of Endodontics, State Key Laboratory of Oral Diseases & National Clinical Research, Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, 610041, China
43
Wang T, Xiang G, He S, Su L, Wang Y, Yan X, Lu H. DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features of protein 3D-structures. Brief Bioinform 2024; 25:bbae409. [PMID: 39162313 PMCID: PMC11880767 DOI: 10.1093/bib/bbae409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 07/13/2024] [Accepted: 08/04/2024] [Indexed: 08/21/2024] Open
Abstract
Turnover numbers (kcat), which indicate an enzyme's catalytic efficiency, have a wide range of applications in fields including protein engineering and synthetic biology. Experimentally measuring the enzymes' kcat is always time-consuming. Recently, the prediction of kcat using deep learning models has mitigated this problem. However, the accuracy and robustness of kcat prediction still need to be improved significantly, particularly when dealing with enzymes with low sequence similarity compared to those within the training dataset. Herein, we present DeepEnzyme, a cutting-edge deep learning model that combines the most recent Transformer and Graph Convolutional Network (GCN) to capture the information of both the sequence and 3D-structure of a protein. To improve the prediction accuracy, DeepEnzyme was trained by leveraging the integrated features from both sequences and 3D-structures. Consequently, DeepEnzyme exhibits remarkable robustness when processing enzymes with low sequence similarity compared to those in the training dataset by utilizing additional features from high-quality protein 3D-structures. DeepEnzyme also makes it possible to evaluate how point mutations affect the catalytic activity of the enzyme, which helps identify residue sites that are crucial for the catalytic function. In summary, DeepEnzyme represents a pioneering effort in predicting enzymes' kcat values with improved accuracy and robustness compared to previous algorithms. This advancement will significantly contribute to our comprehension of enzyme function and its evolutionary patterns across species.
Affiliation(s)
- Tong Wang
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
- College of Science, Chongqing University of Technology, 69 Hongguang Avenue, Banan District, Chongqing 400054, China
- Guangming Xiang
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
- Siwei He
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
- Liyun Su
- College of Science, Chongqing University of Technology, 69 Hongguang Avenue, Banan District, Chongqing 400054, China
- Yuguang Wang
- Institute of Natural Sciences, School of Mathematical Sciences, Zhangjiang Institute of Advanced Study, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
- Shanghai Artificial Intelligence Laboratory, 701 Yunjin Road, Xuhui District, Shanghai 200237, China
- Xuefeng Yan
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, 130 Meilong Road, Xuhui District, Shanghai 200237, China
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Xuhui District, Shanghai 200237, China
- Hongzhong Lu
- State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
44
Wang J, Yang Z, Chen C, Yao G, Wan X, Bao S, Ding J, Wang L, Jiang H. MPEK: a multitask deep learning framework based on pretrained language models for enzymatic reaction kinetic parameters prediction. Brief Bioinform 2024; 25:bbae387. [PMID: 39129365 PMCID: PMC11317537 DOI: 10.1093/bib/bbae387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 06/24/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024] Open
Abstract
Enzymatic reaction kinetics are central in analyzing enzymatic reaction mechanisms and target-enzyme optimization, and thus in biomanufacturing and other industries. The enzyme turnover number (kcat) and Michaelis constant (Km), key kinetic parameters for measuring enzyme catalytic efficiency, are crucial for analyzing enzymatic reaction mechanisms and the directed evolution of target enzymes. Experimental determination of kcat and Km is costly in terms of time, labor, and money. To consider the intrinsic connection between kcat and Km and further improve the prediction performance, we propose a universal pretrained multitask deep learning model, MPEK, to predict these parameters simultaneously while considering pH, temperature, and organismal information. Through testing on the same kcat and Km test datasets, MPEK demonstrated superior prediction performance over the previous models. Specifically, MPEK achieved a Pearson coefficient of 0.808 for predicting kcat, improving ca. 14.6% and 7.6% compared to the DLKcat and UniKP models, and it achieved a Pearson coefficient of 0.777 for predicting Km, improving ca. 34.9% and 53.3% compared to the Kroll_model and UniKP models. More importantly, MPEK was able to reveal enzyme promiscuity and was sensitive to slight changes in the mutant enzyme sequence. In addition, three case studies showed that MPEK has the potential to assist enzyme mining and directed evolution. To facilitate in silico evaluation of enzyme catalytic efficiency, we have established a web server implementing this model, which can be accessed at http://mathtc.nscc-tj.cn/mpek.
Affiliation(s)
- Jingjing Wang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Zhijiang Yang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Chang Chen
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Ge Yao
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Xiukun Wan
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Shaoheng Bao
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
45
Shi Z, Wang D, Li Y, Deng R, Lin J, Liu C, Li H, Wang R, Zhao M, Mao Z, Yuan Q, Liao X, Ma H. REME: an integrated platform for reaction enzyme mining and evaluation. Nucleic Acids Res 2024; 52:W299-W305. [PMID: 38769057 PMCID: PMC11223788 DOI: 10.1093/nar/gkae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 04/16/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024] Open
Abstract
A key challenge in pathway design is finding proper enzymes that can be engineered to catalyze a non-natural reaction. Although existing tools can identify potential enzymes based on similar reactions, these tools encounter several issues. Firstly, the calculated similar reactions may not even have the same reaction type. Secondly, the associated enzymes are often numerous and identifying the most promising candidate enzymes is difficult due to the lack of data for evaluation. Thirdly, existing web tools do not provide interactive functions that enable users to fine-tune results based on their expertise. Here, we present REME (https://reme.biodesign.ac.cn/), the first integrated web platform for reaction enzyme mining and evaluation. Combining atom-to-atom mapping, atom type change identification, and reaction similarity calculation enables quick ranking and visualization of reactions similar to an objective non-natural reaction. Additional functionality enables users to filter similar reactions by their specified functional groups and candidate enzymes can be further filtered (e.g. by organisms) or expanded by Enzyme Commission number (EC) or sequence homology. Afterward, enzyme attributes (such as kcat, Km, optimal temperature and pH) can be assessed with deep learning-based methods, facilitating the swift identification of potential enzymes that can catalyze the non-natural reaction.
Affiliation(s)
- Zhenkun Shi
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Dehang Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Yang Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- University of Chinese Academy of Sciences, Beijing 101408, PR China
- Rui Deng
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Jiawei Lin
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
- Cui Liu
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Haoran Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Ruoyu Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Muqiang Zhao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Zhitao Mao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Qianqian Yuan
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Xiaoping Liao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Haihe Laboratory of Synthetic Biology, Tianjin 300308, PR China
- Hongwu Ma
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| |
|
46
|
Norton-Baker B, Denton MCR, Murphy NP, Fram B, Lim S, Erickson E, Gauthier NP, Beckham GT. Enabling high-throughput enzyme discovery and engineering with a low-cost, robot-assisted pipeline. Sci Rep 2024; 14:14449. [PMID: 38914665 PMCID: PMC11196671 DOI: 10.1038/s41598-024-64938-0] [Received: 04/08/2024] [Accepted: 06/14/2024] [Indexed: 06/26/2024]
Abstract
As genomic databases expand and artificial intelligence tools advance, there is a growing demand for efficient characterization of large numbers of proteins. To this end, here we describe a generalizable pipeline for high-throughput protein purification using small-scale expression in E. coli and an affordable liquid-handling robot. This low-cost platform enables the purification of 96 proteins in parallel with minimal waste and is scalable for processing hundreds of proteins weekly per user. We demonstrate the performance of this method with the expression and purification of the leading poly(ethylene terephthalate) hydrolases reported in the literature. Replicate experiments demonstrated reproducibility and enzyme purity and yields (up to 400 µg) sufficient for comprehensive analyses of both thermostability and activity, generating a standardized benchmark dataset for comparing these plastic-degrading enzymes. The cost-effectiveness and ease of implementation of this platform render it broadly applicable to diverse protein characterization challenges in the biological sciences.
Grants
- DE-SC0022024 U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (BER), Genomic Science Program
- DE-AC36-08GO28308 Advanced Materials and Manufacturing Technologies Office (AMMTO)
- U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Bioenergy Technologies Office (BETO)
- Bio-Optimized Technologies to keep Thermoplastics out of Landfills and the Environment (BOTTLE) Consortium
- Dana-Farber Cancer Institute
Affiliation(s)
- Brenna Norton-Baker
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
- Agile BioFoundry, Emeryville, CA, USA
| | - Mackenzie C R Denton
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
| | - Natasha P Murphy
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
| | - Benjamin Fram
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Samuel Lim
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Erika Erickson
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA
- BOTTLE Consortium, Golden, CO, USA
| | - Nicholas P Gauthier
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
| | - Gregg T Beckham
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO, USA.
- BOTTLE Consortium, Golden, CO, USA.
- Agile BioFoundry, Emeryville, CA, USA.
| |
|
47
|
Ndochinwa OG, Wang QY, Amadi OC, Nwagu TN, Nnamchi CI, Okeke ES, Moneke AN. Current status and emerging frontiers in enzyme engineering: An industrial perspective. Heliyon 2024; 10:e32673. [PMID: 38912509 PMCID: PMC11193041 DOI: 10.1016/j.heliyon.2024.e32673] [Received: 12/08/2023] [Revised: 06/05/2024] [Accepted: 06/06/2024] [Indexed: 06/25/2024]
Abstract
Protein engineering mechanisms can be an efficient approach to enhance the biochemical properties of various biocatalysts. Immobilization of biocatalysts and the introduction of new-to-nature chemical reactivities are also possible through the same mechanism. Discovering new protocols that yield catalytically active proteins that are novel in being stable, active, and stereoselective could be identified as an essential area of concurrent bioorganic chemistry (the synergistic relationship between organic chemistry and biochemistry in the context of enzyme engineering). However, with our current level of knowledge about protein folding and its correlation with protein conformation and activities, it is almost impossible to design proteins with specific biological and physical properties. Hence, contemporary protein engineering typically involves reprogramming existing enzymes by mutagenesis to generate new phenotypes with desired properties. These processes ensure that limitations of naturally occurring enzymes are not encountered. For example, researchers have engineered cellulases and hemicellulases to withstand harsh conditions encountered during biomass pretreatment, such as high temperatures and acidic environments. By enhancing the activity and robustness of these enzymes, biofuel production becomes more economically viable and environmentally sustainable. Recent trends in enzyme engineering have enabled the development of tailored biocatalysts for pharmaceutical applications. For instance, researchers have engineered enzymes such as cytochrome P450s and amine oxidases to catalyze challenging reactions involved in drug synthesis. In addition to conventional methods, there has been an increasing application of machine learning techniques to identify patterns in data. These patterns are then used to predict protein structures, enhance enzyme solubility, stability, and function, forecast substrate specificity, and assist in rational protein design. In this review, we discuss recent trends in enzyme engineering to optimize the biochemical properties of various biocatalysts. Using examples of enzyme engineering relevant to biotechnology, we expatiate on the significance of enzyme engineering and how these methods could be applied to optimize the biochemical properties of a naturally occurring enzyme.
Affiliation(s)
- Obinna Giles Ndochinwa
- Department of Microbiology, Faculty of Biological Science, University of Nigeria, Nsukka, Nigeria
| | - Qing-Yan Wang
- State Key Laboratory of Biomass Enzyme Technology, National Engineering Research Center for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning, Guangxi, China
| | - Oyetugo Chioma Amadi
- Department of Microbiology, Faculty of Biological Science, University of Nigeria, Nsukka, Nigeria
| | - Tochukwu Nwamaka Nwagu
- Department of Microbiology, Faculty of Biological Science, University of Nigeria, Nsukka, Nigeria
| | | | - Emmanuel Sunday Okeke
- Department of Biochemistry, Faculty of Biological Sciences & Natural Science Unit, School of General Studies, University of Nigeria, Nsukka, Enugu State, 410001, Nigeria
- Institute of Environmental Health and Ecological Security, School of the Environment and Safety, Jiangsu University, 301 Xuefu Rd., 212013, Zhenjiang, Jiangsu, China
| | - Anene Nwabu Moneke
- Department of Microbiology, Faculty of Biological Science, University of Nigeria, Nsukka, Nigeria
| |
|
48
|
Zhang F, Naeem M, Yu B, Liu F, Ju J. Improving the enzymatic activity and stability of N-carbamoyl hydrolase using deep learning approach. Microb Cell Fact 2024; 23:164. [PMID: 38834993 PMCID: PMC11151596 DOI: 10.1186/s12934-024-02439-5] [Received: 02/14/2024] [Accepted: 05/24/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND Optically active D-amino acids are widely used as intermediates in the synthesis of antibiotics, insecticides, and peptide hormones. Currently, the two-enzyme cascade reaction is the most efficient way to produce D-amino acids using enzymes DHdt and DCase, but DCase is susceptible to heat inactivation. Here, to enhance the enzymatic activity and thermal stability of DCase, a rational design software "Feitian" was developed based on kcat prediction using the deep learning approach. RESULTS According to empirical design and prediction of "Feitian" software, six single-point mutants with high kcat value were selected and successfully constructed by site-directed mutagenesis. Out of six, three mutants (Q4C, T212S, and A302C) showed higher enzymatic activity than the wild-type. Furthermore, the combined triple-point mutant DCase-M3 (Q4C/T212S/A302C) exhibited a 4.25-fold increase in activity (29.77 ± 4.52 U) and a 2.25-fold increase in thermal stability as compared to the wild-type, respectively. Through the whole-cell reaction, the high titer of D-HPG (2.57 ± 0.43 mM) was produced by the mutant Q4C/T212S/A302C, which was about 2.04-fold of the wild-type. Molecular dynamics simulation results showed that DCase-M3 significantly enhances the rigidity of the catalytic site and thus increases the activity of DCase-M3. CONCLUSIONS In this study, an efficient rational design software "Feitian" was successfully developed with a prediction accuracy of about 50% in enzymatic activity. A triple-point mutant DCase-M3 (Q4C/T212S/A302C) with enhanced enzymatic activity and thermostability was successfully obtained, which could be applied to the development of a fully enzymatic process for the industrial production of D-HPG.
Affiliation(s)
- Fa Zhang
- College of Life Science, Hebei Normal University, Shijiazhuang, 050024, China
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China
| | - Muhammad Naeem
- College of Life Science, Hebei Normal University, Shijiazhuang, 050024, China
| | - Bo Yu
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China
| | - Feixia Liu
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Jiansong Ju
- College of Life Science, Hebei Normal University, Shijiazhuang, 050024, China.
- Hebei Collaborative Innovation Center for Eco-Environment, Shijiazhuang, 050024, China.
| |
|
49
|
Wang Y, Mao Z, Dong J, Zhang P, Gao Q, Liu D, Tian C, Ma H. Construction of an enzyme-constrained metabolic network model for Myceliophthora thermophila using machine learning-based kcat data. Microb Cell Fact 2024; 23:138. [PMID: 38750569 PMCID: PMC11558977 DOI: 10.1186/s12934-024-02415-z] [Received: 02/04/2024] [Accepted: 05/04/2024] [Indexed: 11/14/2024]
Abstract
BACKGROUND Genome-scale metabolic models (GEMs) serve as effective tools for understanding cellular phenotypes and predicting engineering targets in the development of industrial strain. Enzyme-constrained genome-scale metabolic models (ecGEMs) have emerged as a valuable advancement, providing more accurate predictions and unveiling new engineering targets compared to models lacking enzyme constraints. In 2022, a stoichiometric GEM, iDL1450, was reconstructed for the industrially significant fungus Myceliophthora thermophila. To enhance the GEM's performance, an ecGEM was developed for M. thermophila in this study. RESULTS Initially, the model iDL1450 underwent refinement and updates, resulting in a new version named iYW1475. These updates included adjustments to biomass components, correction of gene-protein-reaction (GPR) rules, and a consensus on metabolites. Subsequently, the first ecGEM for M. thermophila was constructed using machine learning-based kcat data predicted by TurNuP within the ECMpy framework. During the construction, three versions of ecGEMs were developed based on three distinct kcat collection methods, namely AutoPACMEN, DLKcat and TurNuP. After comparison, the ecGEM constructed using TurNuP-predicted kcat values performed better in several aspects and was selected as the definitive version of ecGEM for M. thermophila (ecMTM). Comparing ecMTM to iYW1475, the solution space was reduced and the growth simulation results more closely resembled realistic cellular phenotypes. Metabolic adjustment simulated by ecMTM revealed a trade-off between biomass yield and enzyme usage efficiency at varying glucose uptake rates. Notably, hierarchical utilization of five carbon sources derived from plant biomass hydrolysis was accurately captured and explained by ecMTM. 
Furthermore, based on enzyme cost considerations, ecMTM successfully predicted reported targets for metabolic engineering modification and introduced some new potential targets for chemicals produced in M. thermophila. CONCLUSIONS In this study, the incorporation of enzyme constraints into iYW1475 not only improved prediction accuracy but also broadened the model's applicability. This research demonstrates the effectiveness of integrating machine learning-based kcat data in the construction of ecGEMs, especially in situations where there are limited measured enzyme kinetic parameters for a specific organism.
Affiliation(s)
- Yutao Wang
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, Tianjin, 300457, China
- Haihe Laboratory of Synthetic Biology, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
| | - Zhitao Mao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
| | - Jiacheng Dong
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, Tianjin, 300457, China
- Haihe Laboratory of Synthetic Biology, Tianjin, 300308, China
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
| | - Peiji Zhang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
| | - Qiang Gao
- Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, Tianjin, 300457, China
| | - Defei Liu
- Haihe Laboratory of Synthetic Biology, Tianjin, 300308, China.
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China.
| | - Chaoguang Tian
- Haihe Laboratory of Synthetic Biology, Tianjin, 300308, China.
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China.
| | - Hongwu Ma
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China.
| |
|
50
|
Kroll A, Ranjan S, Lercher MJ. A multimodal Transformer Network for protein-small molecule interactions enhances predictions of kinase inhibition and enzyme-substrate relationships. PLoS Comput Biol 2024; 20:e1012100. [PMID: 38768223 PMCID: PMC11142704 DOI: 10.1371/journal.pcbi.1012100] [Received: 01/08/2024] [Revised: 05/31/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024]
Abstract
The activities of most enzymes and drugs depend on interactions between proteins and small molecules. Accurate prediction of these interactions could greatly accelerate pharmaceutical and biotechnological research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two molecule types during the computation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform recently published state-of-the-art models for predicting protein-small molecule interactions across three diverse tasks: predicting kinase inhibitions; inferring potential substrates for enzymes; and predicting Michaelis constants KM. The Python code provided can be used to easily implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
| | - Sahasra Ranjan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
| | - Martin J. Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
| |
|