1
Sun X, Wang YG, Shen Y. A multimodal deep learning framework for enzyme turnover prediction with missing modality. Comput Biol Med 2025; 193:110348. [PMID: 40409036 DOI: 10.1016/j.compbiomed.2025.110348] [Received: 10/31/2024] [Revised: 04/25/2025] [Accepted: 05/04/2025] [Indexed: 05/25/2025]
Abstract
Accurate prediction of the turnover number (kcat), which quantifies the maximum rate of substrate conversion at an enzyme's active site, is essential for assessing catalytic efficiency and understanding biochemical reaction mechanisms. Traditional wet-lab measurements of kcat are time-consuming and resource-intensive, making deep learning (DL) methods an appealing alternative. However, existing DL models often overlook the impact of reaction products on kcat due to feedback inhibition, resulting in suboptimal performance. The multimodal nature of this kcat prediction task, involving enzymes, substrates, and products as inputs, presents additional challenges when certain modalities are unavailable during inference due to incomplete data or experimental constraints, which makes existing DL models inapplicable. To address these limitations, we introduce MMKcat, a novel framework employing a prior-knowledge-guided missing modality training mechanism, which treats substrates and enzyme sequences as essential inputs while considering other modalities as maskable terms. Moreover, an innovative auxiliary regularizer is incorporated to encourage the learning of informative features from various modal combinations, enabling robust predictions even with incomplete multimodal inputs. We demonstrate the superior performance of MMKcat compared to state-of-the-art methods, including DLKcat, TurNuP, UniKP, EITLEM-Kinetic, DLTKcat and GELKcat, using the BRENDA and SABIO-RK databases. Our results show significant improvements under both complete and missing modality scenarios in RMSE, R², and SRCC metrics, with average improvements of 6.41%, 22.18%, and 8.15%, respectively. Code is available at https://github.com/ProEcho1/MMKcat.
Affiliation(s)
- Xin Sun
  - Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yu Guang Wang
  - Shanghai Jiao Tong University, Shanghai, 200240, Shanghai, China
- Yiqing Shen
  - Johns Hopkins University, Baltimore, 21218, MD, USA
2
Du BX, Yu H, Zhu B, Long Y, Wu M, Shi JY. A novel interpretability framework for enzyme turnover number prediction boosted by pre-trained enzyme embeddings and adaptive gate network. Methods 2025; 237:45-52. [PMID: 40021034 DOI: 10.1016/j.ymeth.2025.02.010] [Received: 11/05/2024] [Revised: 01/05/2025] [Accepted: 02/25/2025] [Indexed: 03/03/2025] Open
Abstract
Identifying the enzyme turnover number (kcat) is a vital step in synthetic biology and early-stage drug discovery. Recently, deep learning methods have made encouraging progress in predicting kcat, supported by the growth of turnover number data for multi-species enzyme-substrate pairs. However, the performance of existing approaches still depends heavily on the effectiveness of feature extraction for enzymes and substrates, as well as on the optimal fusion of these two types of features. Furthermore, it is essential to identify the key molecular substructures that significantly impact kcat prediction. To address these issues, we develop GELKcat, a novel end-to-end dual-representation interpretability framework that harnesses graph transformers for substrate molecular encoding and CNNs for enzyme word2vec embeddings. We further integrate substrate and enzyme features using an adaptive gate network, which assigns optimal weights to capture the most suitable feature combinations. Comparison with several state-of-the-art methods demonstrates the superiority of GELKcat, and ablation studies further illuminate the roles of its three main components. Furthermore, case studies illustrate the interpretability of GELKcat by identifying the key functional groups in a substrate that are significantly associated with turnover number. It is anticipated that this work can bridge current gaps in enzyme-substrate representation and offer guidance for drug discovery and synthetic biology.
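The abstract does not give the form of the adaptive gate network. A minimal sketch of one common gated-fusion formulation is shown below, assuming both feature vectors have the same length and that the gate is a sigmoid over a linear map of their concatenation. The function name, the weight layout, and the formula itself are assumptions for illustration and should not be read as GELKcat's actual architecture.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(enzyme_vec, substrate_vec, weights, bias):
    """Fuse two equal-length feature vectors with a learned per-dimension gate.

    Assumed formulation (not taken from the paper):
        g_i     = sigmoid(weights[i] . [enzyme; substrate] + bias[i])
        fused_i = g_i * enzyme_i + (1 - g_i) * substrate_i
    weights[i] is a list of length 2 * d, where d is the feature dimension.
    """
    concat = list(enzyme_vec) + list(substrate_vec)
    fused = []
    for i, (e, s) in enumerate(zip(enzyme_vec, substrate_vec)):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        fused.append(g * e + (1.0 - g) * s)
    return fused


if __name__ == "__main__":
    # Hypothetical 2-dimensional features; zero weights give a gate of 0.5,
    # so the fused vector is the plain average of the two inputs.
    enzyme = [1.0, 3.0]
    substrate = [3.0, 1.0]
    zero_weights = [[0.0] * 4, [0.0] * 4]
    print(gated_fusion(enzyme, substrate, zero_weights, [0.0, 0.0]))
```

In a trained model the weights and bias would be learned, so the gate can lean toward the enzyme or the substrate representation separately for each feature dimension.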
Affiliation(s)
- Bing-Xue Du
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China; Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Haoyang Yu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Bei Zhu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Yahui Long
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
- Min Wu
  - Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
3
Zhai J, Qi X, Cai L, Liu Y, Tang H, Xie L, Wang J. NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling. Brief Bioinform 2025; 26:bbaf212. [PMID: 40370097 PMCID: PMC12078937 DOI: 10.1093/bib/bbaf212] [Received: 02/11/2025] [Revised: 04/14/2025] [Accepted: 04/21/2025] [Indexed: 05/16/2025] Open
Abstract
The catalytic constant (Kcat) describes the efficiency with which an enzyme catalyzes a reaction. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts saturating substrate into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we propose a novel protocol for constructing Kcat prediction models that introduces an intermediate step to develop substrate and protein processors separately. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R² value of 0.54 versus 0.50) in comparison with Li et al.'s model using the same dataset. Additionally, our modeling protocol enables fine-tuning of the general-purpose Kcat model for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved a best R² value of 0.64 for the focused model. The high-quality performance and expandability of the model guarantee its broad applications in enzyme engineering and drug research & development.
Affiliation(s)
- Jingchen Zhai
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Xiguang Qi
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Lianjin Cai
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Yue Liu
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Haocheng Tang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York, 695 Park Ave, New York, NY 10065, United States
  - Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, 413 E 69th St, New York, NY 10021, United States
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
4
Kong Y, Chen H, Huang X, Chang L, Yang B, Chen W. Precise metabolic modeling in post-omics era: accomplishments and perspectives. Crit Rev Biotechnol 2025; 45:683-701. [PMID: 39198033 DOI: 10.1080/07388551.2024.2390089] [Received: 03/31/2023] [Revised: 07/18/2024] [Accepted: 07/23/2024] [Indexed: 09/01/2024]
Abstract
Microbes have been extensively utilized for their sustainable and scalable properties in synthesizing desired bio-products. However, insufficient knowledge about intracellular metabolism has impeded further microbial applications. Genome-scale metabolic models (GEMs) play a pivotal role in facilitating a global understanding of cellular metabolic mechanisms. These models enable rational modification by exploring metabolic pathways and predicting potential targets in microorganisms, allowing precise cell regulation without experimental costs. Nonetheless, a simplified GEM considers only genome information and network stoichiometry while neglecting other important bio-information, such as enzyme functions, thermodynamic properties, and kinetic parameters. Consequently, uncertainties persist, particularly when predicting microbial behaviors in complex and fluctuating systems. The advent of the omics era, with its massive quantification of genes, proteins, and metabolites under various conditions, has led to the flourishing of multi-constrained models and updated algorithms with improved predictive power and broadened scope. Meanwhile, machine learning (ML) has demonstrated exceptional analytical and predictive capacities when applied to training sets of biological big data. Incorporating the discriminant strength of ML into GEMs improves the efficiency of mechanistic modeling and its predictive accuracy. This paper provides an overview of research innovations in GEMs, including multi-constrained modeling, analytical approaches, and the latest applications of ML, which may contribute comprehensive knowledge toward genetic refinement, strain development, and yield enhancement for a broad range of biomolecules.
Affiliation(s)
- Yawen Kong
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, P. R. China
  - School of Food Science and Technology, Jiangnan University, Wuxi, P. R. China
- Haiqin Chen
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, P. R. China
  - School of Food Science and Technology, Jiangnan University, Wuxi, P. R. China
- Xinlei Huang
  - The Key Laboratory of Industrial Biotechnology, School of Biotechnology, Jiangnan University, Wuxi, P. R. China
- Lulu Chang
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, P. R. China
  - School of Food Science and Technology, Jiangnan University, Wuxi, P. R. China
- Bo Yang
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, P. R. China
  - School of Food Science and Technology, Jiangnan University, Wuxi, P. R. China
- Wei Chen
  - State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, P. R. China
  - School of Food Science and Technology, Jiangnan University, Wuxi, P. R. China
  - National Engineering Research Center for Functional Food, Jiangnan University, Wuxi, P. R. China
5
Kroll A, Rousset Y, Spitzlei T, Lercher MJ. DeepMolecules: a web server for predicting enzyme and transporter-small molecule interactions. Nucleic Acids Res 2025:gkaf343. [PMID: 40297998 DOI: 10.1093/nar/gkaf343] [Received: 02/24/2025] [Revised: 04/04/2025] [Accepted: 04/16/2025] [Indexed: 04/30/2025] Open
Abstract
DeepMolecules is an easily accessible web server for predicting protein-small molecule interactions. It integrates four state-of-the-art models: ESP and SPOT for identifying substrates of enzymes and transporters, respectively, TurNuP for predicting enzyme turnover numbers kcat, and a model for predicting Michaelis constants KM. These models use deep learning-generated numerical representations of the proteins and small molecules as input features for gradient-boosted decision tree models, achieving high predictive performance. The web interface accepts protein amino acid sequences and small molecules in SMILES, InChI, or KEGG ID formats, supporting single submissions and batch submissions via Excel files. Beyond its predictive capabilities, DeepMolecules provides a structured interface to experimental data on known interactions and kinetic parameters, offering a comprehensive view of protein-small molecule relationships. Freely accessible at https://www.DeepMolecules.org, the web server supports applications in metabolic engineering, drug discovery, and biocatalyst optimization by identifying potential substrates and quantifying their catalytic properties.
Affiliation(s)
- Alexander Kroll
  - Heinrich-Heine-University, Institute for Computer Science and Department of Biology, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Yvan Rousset
  - Heinrich-Heine-University, Institute for Computer Science and Department of Biology, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Thomas Spitzlei
  - Heinrich-Heine-University, Institute for Computer Science and Department of Biology, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Martin J Lercher
  - Heinrich-Heine-University, Institute for Computer Science and Department of Biology, Universitätsstraße 1, 40225 Düsseldorf, Germany
6
Cai Y, Zhang W, Dou Z, Wang C, Yu W, Wang L. PreTKcat: A pre-trained representation learning and machine learning framework for predicting enzyme turnover number. Comput Biol Chem 2025; 115:108327. [PMID: 39765190 DOI: 10.1016/j.compbiolchem.2024.108327] [Received: 10/11/2024] [Revised: 12/06/2024] [Accepted: 12/24/2024] [Indexed: 02/26/2025]
Abstract
The enzyme turnover number (kcat) is crucial for understanding enzyme kinetics and optimizing biotechnological processes. However, experimentally measured kcat values are limited due to the high cost and labor intensity of wet-lab measurements, necessitating robust computational methods. To address this issue, we propose PreTKcat, a framework that integrates pre-trained representation learning and machine learning to predict kcat values. PreTKcat utilizes the ProtT5 protein language model to encode enzyme sequences and the MolGNet molecular representation learning model to encode substrate molecular graphs. These representations are combined and passed to an ExtraTrees model, which predicts the kcat values. PreTKcat also accounts for the impact of temperature on kcat prediction and can be used to predict enzyme-substrate affinity, i.e., Km values. Comparative assessments with various state-of-the-art models highlight the superior performance of PreTKcat. PreTKcat serves as an effective tool for investigating enzyme kinetics, offering new perspectives for enzyme engineering and its industrial uses.
Affiliation(s)
- Yunxiang Cai
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
- Wenjuan Zhang
  - College of General Education, Tianjin Foreign Studies University, No. 117, Machang Road, Hexi District, Tianjin, 300204, China
- Zhuangzhuang Dou
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
- Chao Wang
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
- Wenping Yu
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
- Lin Wang
  - College of Artificial Intelligence, Tianjin University of Science and Technology, No. 9, 13th Street, Tianjin Economic-Technological Development Area, Tianjin, 300457, China
7
Wang Y, Han S, Wang Y, Liang Q, Luo W. Artificial Intelligence Technology Assists Enzyme Prediction and Rational Design. J Agric Food Chem 2025; 73:7065-7073. [PMID: 40066931 DOI: 10.1021/acs.jafc.4c13201] [Indexed: 03/27/2025]
Abstract
Since the structure of enzymes determines their function, elucidating the structure of enzymes lays a solid foundation for deciphering their catalytic mechanism and enabling rational design. The development of artificial intelligence (AI) has sparked a technological revolution, infusing new vitality into theoretical studies of enzymology and the advancement of enzyme engineering techniques. This Review outlines the development process and main methods of AI applied in the structural elucidation and functional prediction of enzymes. Furthermore, it emphasizes AI-based rational design of enzymes and provides a detailed exposition of representative AI algorithms and case studies. With the support of AI technology, the comprehension of enzyme structure and function and their relationship will become deeper and more efficient, thereby promoting the widespread application of enzyme engineering in various fields.
Affiliation(s)
- Yuhang Wang
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214126, China
- Shuangxin Han
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214126, China
- Yi Wang
  - Department of Biological and Agricultural Engineering, University of California, Davis, 1 Shields Ave, Davis, California 95616, United States
- Quanfeng Liang
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, P. R. China
- Wei Luo
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214126, China
8
Wang Z, Xie D, Wu D, Luo X, Wang S, Li Y, Yang Y, Li W, Zheng L. Robust enzyme discovery and engineering with deep learning using CataPro. Nat Commun 2025; 16:2736. [PMID: 40108140 PMCID: PMC11923063 DOI: 10.1038/s41467-025-58038-4] [Received: 09/01/2024] [Accepted: 03/11/2025] [Indexed: 03/22/2025] Open
Abstract
Accurate prediction of enzyme kinetic parameters is crucial for enzyme exploration and modification. Existing models suffer from either low accuracy or poor generalization ability due to overfitting. In this work, we first developed unbiased datasets to evaluate the actual performance of these methods and then proposed a deep learning model, CataPro, based on pre-trained models and molecular fingerprints, to predict turnover number (kcat), Michaelis constant (Km), and catalytic efficiency (kcat/Km). Compared with previous baseline models, CataPro demonstrates clearly enhanced accuracy and generalization ability on the unbiased datasets. In a representative enzyme mining project, by combining CataPro with traditional methods, we identified an enzyme (SsCSO) with activity increased 19.53-fold compared to the initial enzyme (CSO2) and then successfully engineered it to improve its activity 3.34-fold. This reveals the high potential of CataPro as an effective tool for future enzyme discovery and modification.
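The three quantities predicted here are related through the standard Michaelis-Menten rate law, v = kcat · [E] · [S] / (Km + [S]), with catalytic efficiency defined as kcat/Km. The sketch below illustrates that textbook relationship only; it is not CataPro code, and the function names and numeric values are hypothetical.

```python
def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km (e.g. in 1/(s*mM) if kcat is in 1/s and Km in mM)."""
    return kcat / km


def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Initial reaction rate v = kcat * [E] * [S] / (Km + [S]).

    Concentrations must share the unit used for Km.
    """
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


if __name__ == "__main__":
    # Hypothetical parameters: kcat = 100 1/s, Km = 0.5 mM, [E] = 0.001 mM.
    kcat, km, enzyme = 100.0, 0.5, 0.001
    print(catalytic_efficiency(kcat, km))
    # At [S] = Km the rate is half of Vmax = kcat * [E].
    print(michaelis_menten_rate(kcat, km, enzyme, substrate_conc=km))
```

At substrate concentrations far below Km the rate is approximately (kcat/Km) · [E] · [S], which is why kcat/Km is the usual figure of merit for comparing enzymes on a dilute substrate.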
Affiliation(s)
- Zechen Wang
  - School of Physics, Shandong University, Jinan, 250100, Shandong, China
- Dongqi Xie
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201210, Shanghai, China
- Dong Wu
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201210, Shanghai, China
- Xiaozhou Luo
  - Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
  - Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Sheng Wang
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201210, Shanghai, China
- Yangyang Li
  - School of Physics, Shandong University, Jinan, 250100, Shandong, China
- Yanmei Yang
  - College of Chemistry, Chemical Engineering and Materials Science, Key Laboratory of Molecular and Nano Probes, Ministry of Education, Shandong Normal University, Jinan, 250014, Shandong, China
- Weifeng Li
  - School of Physics, Shandong University, Jinan, 250100, Shandong, China
- Liangzhen Zheng
  - Shanghai Zelixir Biotech Co. Ltd, Shanghai, 201210, Shanghai, China
  - Shenzhen Zelixir Biotech Co. Ltd, Shenzhen, 518107, Guangdong, China
9
Morrissey J, Barberi G, Strain B, Facco P, Kontoravdi C. NEXT-FBA: A hybrid stoichiometric/data-driven approach to improve intracellular flux predictions. Metab Eng 2025; 91:130-144. [PMID: 40118205 DOI: 10.1016/j.ymben.2025.03.010] [Received: 11/13/2024] [Revised: 03/16/2025] [Accepted: 03/18/2025] [Indexed: 03/23/2025]
Abstract
Genome-scale metabolic models (GEMs) have been widely utilized to understand cellular metabolism. The application of GEMs has been advanced by computational methods that enable the prediction and analysis of intracellular metabolic states. However, the accuracy and biological relevance of these predictions often suffer from the many degrees of freedom and the scarcity of data available to constrain the models adequately. Here, we introduce Neural-net EXtracellular Trained Flux Balance Analysis (NEXT-FBA), a novel computational methodology that addresses these limitations by utilizing exometabolomic data to derive biologically relevant constraints for intracellular fluxes in GEMs. We achieve this by training artificial neural networks (ANNs) with exometabolomic data from Chinese hamster ovary (CHO) cells and correlating it with 13C-labeled intracellular fluxomic data. By capturing the underlying relationships between exometabolomics and cell metabolism, NEXT-FBA predicts upper and lower bounds for intracellular reaction fluxes to constrain GEMs. We demonstrate the efficacy of NEXT-FBA across several validation experiments, where it outperforms existing methods in predicting intracellular flux distributions that align closely with experimental observations. Furthermore, a case study demonstrates how NEXT-FBA can guide bioprocess optimization by identifying key metabolic shifts and refining flux predictions to yield actionable process and metabolic engineering targets. Overall, NEXT-FBA aims to improve the accuracy and biological relevance of intracellular flux predictions in metabolic modelling, with minimal input data requirements for pre-trained models.
Affiliation(s)
- James Morrissey
  - Department of Chemical Engineering, Imperial College London, London, United Kingdom
- Gianmarco Barberi
  - CAPE-Lab (Computer-Aided Process Engineering Laboratory), Department of Industrial Engineering, University of Padova, Padova, Italy
- Benjamin Strain
  - Department of Chemical Engineering, Imperial College London, London, United Kingdom
- Pierantonio Facco
  - CAPE-Lab (Computer-Aided Process Engineering Laboratory), Department of Industrial Engineering, University of Padova, Padova, Italy
- Cleo Kontoravdi
  - Department of Chemical Engineering, Imperial College London, London, United Kingdom
10
Wang Y, Cheng L, Zhang Y, Cao Y, Alghazzawi D. DEKP: a deep learning model for enzyme kinetic parameter prediction based on pretrained models and graph neural networks. Brief Bioinform 2025; 26:bbaf187. [PMID: 40273427 PMCID: PMC12021017 DOI: 10.1093/bib/bbaf187] [Received: 01/22/2025] [Revised: 03/10/2025] [Accepted: 03/28/2025] [Indexed: 04/26/2025] Open
Abstract
The prediction of enzyme kinetic parameters is crucial for screening enzymes with high catalytic efficiency and desired characteristics to catalyze natural or non-natural reactions. Data-driven machine learning models have been explored to reduce experimental cost and speed up the enzyme design process. However, prediction performance is still subject to significant limitations due to the variance in sequence similarity between training and testing datasets. In this work, we introduce DEKP, an integrated deep learning approach for enzyme kinetic parameter prediction. It leverages pretrained models of protein sequences and incorporates enhanced graph neural networks that provide a comprehensive representation of protein structural features. This novel approach can effectively alleviate the performance degradation caused by sequence similarity variation. Moreover, it provides sensitive detection of changes in catalytic efficiency due to enzyme mutations. Experiments validate that DEKP outperforms existing models in predicting enzyme kinetic parameters. This work is expected to significantly improve the performance of the enzyme screening process and provide a robust tool for enzyme-directed evolution research.
Affiliation(s)
- Yizhen Wang
  - School of Computer Science, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
- Li Cheng
  - School of Computer Science, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
  - Key Laboratory of Intelligent Sensing System and Security, Hubei University, Ministry of Education, No. 368 Youyi Road, 430062 Wuhan, China
  - Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
- Yanyun Zhang
  - School of Computer Science, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
  - Key Laboratory of Intelligent Sensing System and Security, Hubei University, Ministry of Education, No. 368 Youyi Road, 430062 Wuhan, China
  - Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
- Yujia Cao
  - School of Computer Science, Hubei University, No. 368 Youyi Road, 430062 Wuhan, China
- Daniyal Alghazzawi
  - Faculty of Computing and Information Technology (FCIT), 3599 King Abdulaziz University (KAU), Unit 3600, Jeddah 22254-7653, Saudi Arabia
11
Anna Sajeevan K, Osinuga A, B A, Ferdous S, Shahreen N, Noor MS, Koneru S, Santos-Correa LM, Salehi R, Chowdhury NB, Calderon-Lopez B, Mali A, Saha R, Chowdhury R. Robust Prediction of Enzyme Variant Kinetics with RealKcat. bioRxiv 2025:2025.02.10.637555. [PMID: 39990461 PMCID: PMC11844551 DOI: 10.1101/2025.02.10.637555] [Indexed: 02/25/2025]
Abstract
Accurate prediction of kinetic parameters is crucial for understanding known enzymes and tailoring novel ones for biocatalysis. Current models fail to capture mutation effects on catalytically essential residues, limiting their utility in enzyme design. We grid-searched through ten model architectures (25,671 hyperparameter combinations) to identify a gradient-based additive framework called RealKcat, trained on 27,176 experimental entries curated manually (KinHub-27k) by screening 2,158 articles. Clustering catalytic turnover (kcat) and substrate affinity (KM) by rational orders of magnitude, RealKcat achieves >85% test accuracy, demonstrating the highest sensitivity to mutation-induced variability thus far, and is the first model of its kind to demonstrate complete loss of activity upon deletion of the catalytic apparatus. Finally, state-of-the-art kcat validation accuracy (96%) on an industrial alkaline phosphatase (PafA) mutant dataset confirms RealKcat's generalizability in learning per-residue catalytic relevance.
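Grouping kcat or KM values by order of magnitude turns regression into classification, which is why this abstract reports accuracy rather than RMSE. The sketch below shows one simple way to do such binning, assuming bins are defined by floor(log10(value)) and that non-positive values get a separate "no activity" label. Both choices, and all names, are illustrative assumptions; the abstract does not specify RealKcat's actual bin edges.

```python
import math


def magnitude_bin(value):
    """Return the order-of-magnitude bin floor(log10(value)).

    Non-positive values (e.g. a fully inactive variant) have no logarithm,
    so they are mapped to None here as an assumed 'no activity' class.
    """
    if value <= 0:
        return None
    return math.floor(math.log10(value))


def bin_accuracy(true_values, predicted_values):
    """Fraction of predictions that fall in the same bin as the measurement."""
    hits = sum(
        magnitude_bin(t) == magnitude_bin(p)
        for t, p in zip(true_values, predicted_values)
    )
    return hits / len(true_values)


if __name__ == "__main__":
    # Hypothetical kcat values in 1/s, for illustration only.
    measured = [250.0, 0.03, 5.0, 0.0]
    predicted = [410.0, 0.2, 7.5, 0.0]
    print([magnitude_bin(v) for v in measured])
    print(bin_accuracy(measured, predicted))
```

With this scheme a prediction of 410 for a measured 250 counts as correct, while 0.2 for a measured 0.03 does not, even though both are within a factor of ten of the target.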
Affiliation(s)
- Karuna Anna Sajeevan
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
  - The Center for Biorenewable Chemicals, Iowa State University, Ames, Iowa, USA
- Abraham Osinuga
  - Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Arunraj B
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Sakib Ferdous
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Nabia Shahreen
  - Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Mohammed Sakib Noor
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Shashank Koneru
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Rahil Salehi
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Niaz Bahar Chowdhury
  - Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Brisa Calderon-Lopez
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
- Ankur Mali
  - Department of Computer Science and Engineering, University of South Florida, Tampa, Florida, USA
- Rajib Saha
  - Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Ratul Chowdhury
  - Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa, USA
  - The Center for Biorenewable Chemicals, Iowa State University, Ames, Iowa, USA
12
Chowdhury S, Fong SS, Uetz P. The protein interactome of Escherichia coli carbohydrate metabolism. PLoS One 2025; 20:e0315240. [PMID: 39903745 PMCID: PMC11793828 DOI: 10.1371/journal.pone.0315240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 11/21/2024] [Indexed: 02/06/2025] Open
Abstract
We investigate how protein-protein interactions (PPIs) can regulate carbohydrate metabolism in Escherichia coli. We specifically investigated the stoichiometry of 378 PPIs involving carbohydrate metabolic enzymes. In 48 interactions, the interactors were much more abundant than the enzyme and are thus likely to affect enzyme activity and carbohydrate metabolism. Many of these PPIs are conserved across thousands of bacteria including pathogens and microbial species. E. coli adapts to different cellular environments by adjusting the quantities of the interacting proteins (25 PPIs) in a way that the protein-enzyme interaction (PEI) is a likely mechanism to regulate its metabolism in specific environments. We predict 3 PPIs (RpsB-AdhE, DcyD-NanE and MinE-Yccx) previously not known to regulate metabolism.
Affiliation(s)
- Shomeek Chowdhury
  - Center for Integrative Life Sciences Education, Virginia Commonwealth University, Richmond, VA, United States of America
- Stephen S. Fong
  - Center for Integrative Life Sciences Education, Virginia Commonwealth University, Richmond, VA, United States of America
- Peter Uetz
  - Center for Biological Data Science, School of Life Sciences, Virginia Commonwealth University, Richmond, VA, United States of America

13
Oulia F, Charton P, Lo-Thong-Viramoutou O, Acevedo-Rocha CG, Liu W, Huynh D, Damour C, Wang J, Cadet F. Metabolic Fluxes Using Deep Learning Based on Enzyme Variations: Application to Glycolysis in Entamoeba histolytica. Int J Mol Sci 2024; 25:13390. [PMID: 39769154 PMCID: PMC11676880 DOI: 10.3390/ijms252413390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Revised: 12/09/2024] [Accepted: 12/10/2024] [Indexed: 01/11/2025] Open
Abstract
Metabolic pathway modeling, essential for understanding organism metabolism, is pivotal in predicting genetic mutation effects, drug design, and biofuel development. Enhancing these modeling techniques is crucial for achieving greater prediction accuracy and reliability. However, the limited experimental data or the complexity of the pathway makes it challenging for researchers to predict phenotypes. Deep learning (DL) is known to perform better than other Machine Learning (ML) approaches if the right conditions are met (i.e., a large database and good choice of parameters). Here, we use a knowledge-based model to massively generate synthetic data and extend a small initial dataset of experimental values. The main objective is to assess if DL can perform at least as well as other ML approaches in flux prediction, using 68,950 instances. Two processing methods are used to generate DL models: cross-validation and repeated holdout evaluation. DL models predict the metabolic fluxes with high precision and slightly outperform the best-known ML approach (the Cubist model) with a lower RMSE (≤0.01) in both cases. They also outperform the PLS model (RMSE ≥ 30). This study is the first to use DL to predict the overall flux of a metabolic pathway only from variations of enzyme concentrations.
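The two evaluation protocols named in the abstract, cross-validation and repeated holdout, differ only in how sample indices are split into training and test sets. The sketch below illustrates that difference under stated assumptions: the function names, the interleaved (unshuffled) fold assignment, the test fraction, and the seed are hypothetical choices, not details taken from the study.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_splits(n: int, k: int) -> Iterator[Split]:
    """Yield k splits in which every sample is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Interleaved assignment: sample j goes to fold j % k (no shuffling).
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def repeated_holdout_splits(
    n: int, repeats: int, test_fraction: float, seed: int = 0
) -> Iterator[Split]:
    """Yield `repeats` independent random train/test splits.

    Unlike k-fold, a sample may appear in several test sets or in none.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n * test_fraction))
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


for train, test in k_fold_splits(6, 3):
    print("k-fold test set:", test)
```

In either protocol a model would be fitted on the training indices and its RMSE computed on the test indices; averaging over splits gives the reported score.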
Affiliation(s)
- Freddy Oulia
  - BIGR, UMR_S1134 Inserm, University of Paris City, 75006 Paris, France
  - Laboratory of Excellence GR-Ex, 75006 Paris, France
  - DSIMB, UMR_S1134 BIGR, Inserm, Faculty of Sciences and Technology, University of Reunion, 97744 Saint-Denis, France
- Philippe Charton
  - BIGR, UMR_S1134 Inserm, University of Paris City, 75006 Paris, France
  - Laboratory of Excellence GR-Ex, 75006 Paris, France
  - DSIMB, UMR_S1134 BIGR, Inserm, Faculty of Sciences and Technology, University of Reunion, 97744 Saint-Denis, France
- Ophélie Lo-Thong-Viramoutou
  - BIGR, UMR_S1134 Inserm, University of Paris City, 75006 Paris, France
  - Laboratory of Excellence GR-Ex, 75006 Paris, France
  - DSIMB, UMR_S1134 BIGR, Inserm, Faculty of Sciences and Technology, University of Reunion, 97744 Saint-Denis, France
- Carlos G. Acevedo-Rocha
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Wei Liu
  - Department of Computer Science and Software Engineering, School of Physics, Mathematics and Computing, The University of Western Australia, Perth 6009, Australia
- Du Huynh
  - Department of Computer Science and Software Engineering, School of Physics, Mathematics and Computing, The University of Western Australia, Perth 6009, Australia
- Cédric Damour
  - EnergyLab, EA 4079, Faculty of Sciences and Technology, University of Reunion, 97490 Saint-Denis, France
- Jingbo Wang
  - Department of Physics, School of Physics, Mathematics and Computing, The University of Western Australia, Perth 6009, Australia
- Frederic Cadet
  - BIGR, UMR_S1134 Inserm, University of Paris City, 75006 Paris, France
  - Laboratory of Excellence GR-Ex, 75006 Paris, France
  - DSIMB, UMR_S1134 BIGR, Inserm, Faculty of Sciences and Technology, University of Reunion, 97744 Saint-Denis, France
  - Artificial Intelligence Department, PEACCEL, 75013 Paris, France

14
Harding-Larsen D, Funk J, Madsen NG, Gharabli H, Acevedo-Rocha CG, Mazurenko S, Welner DH. Protein representations: Encoding biological information for machine learning in biocatalysis. Biotechnol Adv 2024; 77:108459. [PMID: 39366493 DOI: 10.1016/j.biotechadv.2024.108459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 09/19/2024] [Accepted: 09/29/2024] [Indexed: 10/06/2024]
Abstract
Enzymes offer a more environmentally friendly and low-impact solution to conventional chemistry, but they often require additional engineering for their application in industrial settings, an endeavour that is challenging and laborious. To address this issue, the power of machine learning can be harnessed to produce predictive models that enable the in silico study and engineering of improved enzymatic properties. Such machine learning models, however, require the conversion of the complex biological information to a numerical input, also called protein representations. These inputs demand special attention to ensure the training of accurate and precise models, and, in this review, we therefore examine the critical step of encoding protein information to numeric representations for use in machine learning. We selected the most important approaches for encoding the three distinct biological protein representations - primary sequence, 3D structure, and dynamics - to explore their requirements for employment and inductive biases. Combined representations of proteins and substrates are also introduced as emergent tools in biocatalysis. We propose the division of fixed representations, a collection of rule-based encoding strategies, and learned representations extracted from the latent spaces of large neural networks. To select the most suitable protein representation, we propose two main factors to consider. The first one is the model setup, which is influenced by the size of the training dataset and the choice of architecture. The second factor is the model objectives such as consideration about the assayed property, the difference between wild-type models and mutant predictors, and requirements for explainability. This review is aimed at serving as a source of information and guidance for properly representing enzymes in future machine learning models for biocatalysis.
Affiliation(s)
- David Harding-Larsen
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Jonathan Funk
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Niklas Gesmar Madsen
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Hani Gharabli
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Carlos G Acevedo-Rocha
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Ditte Hededam Welner
  - The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark

15
Zare F, Fleming RMT. Integration of proteomic data with genome-scale metabolic models: A methodological overview. Protein Sci 2024; 33:e5150. [PMID: 39275997 PMCID: PMC11400636 DOI: 10.1002/pro.5150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 06/29/2024] [Accepted: 08/06/2024] [Indexed: 09/16/2024]
Abstract
The integration of proteomics data with constraint-based reconstruction and analysis (COBRA) models plays a pivotal role in understanding the relationship between genotype and phenotype and bridges the gap between genome-level phenomena and functional adaptations. Integrating a generic genome-scale model with information on proteins enables generation of a context-specific metabolic model which improves the accuracy of model prediction. This review explores methodologies for incorporating proteomics data into genome-scale models. Available methods are grouped into four distinct categories based on their approach to integrate proteomics data and their depth of modeling. Within each category section various methods are introduced in chronological order of publication demonstrating the progress of this field. Furthermore, challenges and potential solutions to further progress are outlined, including the limited availability of appropriate in vitro data, experimental enzyme turnover rates, and the trade-off between model accuracy, computational tractability, and data scarcity. In conclusion, methods employing simpler approaches demand fewer kinetic and omics data, consequently leading to a less complex mathematical problem and reduced computational expenses. On the other hand, approaches that delve deeper into cellular mechanisms and aim to create detailed mathematical models necessitate more extensive kinetic and omics data, resulting in a more complex and computationally demanding problem. However, in some cases, this increased cost can be justified by the potential for more precise predictions.
Affiliation(s)
- Farid Zare
  - School of Medicine, University of Galway, Galway, Ireland

16
Park SY, Choi DH, Song J, Lakshmanan M, Richelle A, Yoon S, Kontoravdi C, Lewis NE, Lee DY. Driving towards digital biomanufacturing by CHO genome-scale models. Trends Biotechnol 2024; 42:1192-1203. [PMID: 38548556 DOI: 10.1016/j.tibtech.2024.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 05/20/2024]
Abstract
Genome-scale metabolic models (GEMs) of Chinese hamster ovary (CHO) cells are valuable for gaining mechanistic understanding of mammalian cell metabolism and cultures. We provide a comprehensive overview of past and present developments of CHO-GEMs and in silico methods within the flux balance analysis (FBA) framework, focusing on their practical utility in rational cell line development and bioprocess improvements. There are many opportunities for further augmenting the model coverage and establishing integrative models that account for different cellular processes and data for future applications. With supportive collaborative efforts by the research community, we envisage that CHO-GEMs will be crucial for the increasingly digitized and dynamically controlled bioprocessing pipelines, especially because they can be successfully deployed in conjunction with artificial intelligence (AI) and systems engineering algorithms.
Affiliation(s)
- Seo-Young Park
  - School of Chemical Engineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Dong-Hyuk Choi
  - School of Chemical Engineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Jinsung Song
  - School of Chemical Engineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea
- Meiyappan Lakshmanan
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, and Centre for Integrative Biology and Systems Medicine (IBSE), Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India
- Anne Richelle
  - Sartorius Corporate Research, Avenue Ariane 5, 1200 Brussels, Belgium
- Seongkyu Yoon
  - Department of Chemical Engineering, University of Massachusetts Lowell, Lowell, MA 01850, USA
- Cleo Kontoravdi
  - Department of Chemical Engineering and Chemical Technology, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Nathan E Lewis
  - Departments of Pediatrics and Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Dong-Yup Lee
  - School of Chemical Engineering, Sungkyunkwan University, Suwon, Gyeonggi-do 16419, Republic of Korea

17
Lu H, Xiao L, Liao W, Yan X, Nielsen J. Cell factory design with advanced metabolic modelling empowered by artificial intelligence. Metab Eng 2024; 85:61-72. [PMID: 39038602 DOI: 10.1016/j.ymben.2024.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/06/2024] [Accepted: 07/06/2024] [Indexed: 07/24/2024]
Abstract
Advances in synthetic biology and artificial intelligence (AI) have provided new opportunities for modern biotechnology. High-performance cell factories, the backbone of industrial biotechnology, are ultimately responsible for determining whether a bio-based product succeeds or fails in the fierce competition with petroleum-based products. To date, one of the greatest challenges in synthetic biology is the creation of high-performance cell factories in a consistent and efficient manner. As so-called white-box models, numerous metabolic network models have been developed and used in computational strain design. Moreover, great progress has been made in AI-powered strain engineering in recent years. Both approaches have advantages and disadvantages. Therefore, the deep integration of AI with metabolic models is crucial for the construction of superior cell factories with higher titres, yields and production rates. The detailed applications of the latest advanced metabolic models and AI in computational strain design are summarized in this review. Additionally, approaches for the deep integration of AI and metabolic models are discussed. It is anticipated that advanced mechanistic metabolic models powered by AI will pave the way for the efficient construction of powerful industrial chassis strains in the coming years.
Affiliation(s)
- Hongzhong Lu
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
- Luchi Xiao
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
- Wenbin Liao
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China; Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
- Xuefeng Yan
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237, PR China
- Jens Nielsen
  - BioInnovation Institute, Ole Måløes Vej, DK2200, Copenhagen N, Denmark; Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden

18
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. 
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Suman Mondal
  - P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Amit Ghosh
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India

19
Choudhury S, Narayanan B, Moret M, Hatzimanikatis V, Miskovic L. Generative machine learning produces kinetic models that accurately characterize intracellular metabolic states. Nat Catal 2024; 7:1086-1098. [PMID: 39463726 PMCID: PMC11499278 DOI: 10.1038/s41929-024-01220-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 08/06/2024] [Indexed: 10/29/2024]
Abstract
Generating large omics datasets has become routine for gaining insights into cellular processes, yet deciphering these datasets to determine metabolic states remains challenging. Kinetic models can help integrate omics data by explicitly linking metabolite concentrations, metabolic fluxes and enzyme levels. Nevertheless, determining the kinetic parameters that underlie cellular physiology poses notable obstacles to the widespread use of these mathematical representations of metabolism. Here we present RENAISSANCE, a generative machine learning framework for efficiently parameterizing large-scale kinetic models with dynamic properties matching experimental observations. Through seamless integration of diverse omics data and other relevant information, including extracellular medium composition, physicochemical data and expertise of domain specialists, RENAISSANCE accurately characterizes intracellular metabolic states in Escherichia coli. It also estimates missing kinetic parameters and reconciles them with sparse experimental data, substantially reducing parameter uncertainty and improving accuracy. This framework will be valuable for researchers studying metabolic variations involving changes in metabolite and enzyme levels and enzyme activity in health and biotechnology.
Affiliation(s)
- Subham Choudhury
  - Laboratory of Computational Systems Biology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Bharath Narayanan
  - Laboratory of Computational Systems Biology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Present Address: Department of Oncology, University of Cambridge, Cambridge, UK
- Michael Moret
  - Laboratory of Computational Systems Biology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Present Address: Department of Genetics, Harvard Medical School, Boston, MA USA
- Vassily Hatzimanikatis
  - Laboratory of Computational Systems Biology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Ljubisa Miskovic
  - Laboratory of Computational Systems Biology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

20
Wang T, Xiang G, He S, Su L, Wang Y, Yan X, Lu H. DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features of protein 3D-structures. Brief Bioinform 2024; 25:bbae409. [PMID: 39162313 PMCID: PMC11880767 DOI: 10.1093/bib/bbae409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 07/13/2024] [Accepted: 08/04/2024] [Indexed: 08/21/2024] Open
Abstract
Turnover numbers (kcat), which indicate an enzyme's catalytic efficiency, have a wide range of applications in fields including protein engineering and synthetic biology. Experimentally measuring an enzyme's kcat is time-consuming. Recently, the prediction of kcat using deep learning models has mitigated this problem. However, the accuracy and robustness of kcat prediction still need to be improved significantly, particularly when dealing with enzymes with low sequence similarity compared to those within the training dataset. Herein, we present DeepEnzyme, a cutting-edge deep learning model that combines the most recent Transformer and Graph Convolutional Network (GCN) to capture the information of both the sequence and 3D-structure of a protein. To improve the prediction accuracy, DeepEnzyme was trained by leveraging the integrated features from both sequences and 3D-structures. Consequently, DeepEnzyme exhibits remarkable robustness when processing enzymes with low sequence similarity compared to those in the training dataset by utilizing additional features from high-quality protein 3D-structures. DeepEnzyme also makes it possible to evaluate how point mutations affect the catalytic activity of the enzyme, which helps identify residue sites that are crucial for the catalytic function. In summary, DeepEnzyme represents a pioneering effort in predicting enzymes' kcat values with improved accuracy and robustness compared to previous algorithms. This advancement will significantly contribute to our comprehension of enzyme function and its evolutionary patterns across species.
Affiliation(s)
- Tong Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
  - College of Science, Chongqing University of Technology, 69 Hongguang Avenue, Banan District, Chongqing 400054, China
- Guangming Xiang
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
- Siwei He
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
- Liyun Su
  - College of Science, Chongqing University of Technology, 69 Hongguang Avenue, Banan District, Chongqing 400054, China
- Yuguang Wang
  - Institute of Natural Sciences, School of Mathematical Sciences, Zhangjiang Institute of Advanced Study, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China
  - Shanghai Artificial Intelligence Laboratory, 701 Yunjin Road, Xuhui District, Shanghai 200237, China
- Xuefeng Yan
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, 130 Meilong Road, Xuhui District, Shanghai 200237, China
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Xuhui District, Shanghai 200237, China
- Hongzhong Lu
  - State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan RD. Minhang District, Shanghai 200240, China

21
Wang J, Yang Z, Chen C, Yao G, Wan X, Bao S, Ding J, Wang L, Jiang H. MPEK: a multitask deep learning framework based on pretrained language models for enzymatic reaction kinetic parameters prediction. Brief Bioinform 2024; 25:bbae387. [PMID: 39129365 PMCID: PMC11317537 DOI: 10.1093/bib/bbae387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 06/24/2024] [Accepted: 07/23/2024] [Indexed: 08/13/2024] Open
Abstract
Enzymatic reaction kinetics are central to analyzing enzymatic reaction mechanisms and optimizing target enzymes, and thus to biomanufacturing and other industries. The enzyme turnover number (kcat) and Michaelis constant (Km), key kinetic parameters for measuring enzyme catalytic efficiency, are crucial for analyzing enzymatic reaction mechanisms and for the directed evolution of target enzymes. Experimental determination of kcat and Km is costly in time, labor, and money. To account for the intrinsic connection between kcat and Km and further improve prediction performance, we propose a universal pretrained multitask deep learning model, MPEK, to predict these parameters simultaneously while considering pH, temperature, and organismal information. When tested on the same kcat and Km test datasets, MPEK demonstrated superior prediction performance over previous models. Specifically, MPEK achieved a Pearson coefficient of 0.808 for predicting kcat, an improvement of ca. 14.6% and 7.6% over the DLKcat and UniKP models, and a Pearson coefficient of 0.777 for predicting Km, an improvement of ca. 34.9% and 53.3% over the Kroll_model and UniKP models. More importantly, MPEK was able to reveal enzyme promiscuity and was sensitive to slight changes in the mutant enzyme sequence. In addition, three case studies showed that MPEK has the potential to assist enzyme mining and directed evolution. To facilitate in silico evaluation of enzyme catalytic efficiency, we have established a web server implementing this model, which can be accessed at http://mathtc.nscc-tj.cn/mpek.
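The abstract reports model quality as a Pearson coefficient and compares models by relative improvement. A minimal sketch of both calculations follows; it is an illustration only, with hypothetical function names, and it assumes that "improving ca. X%" means a relative change with respect to the baseline score, which the abstract does not spell out.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between measured and predicted values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Percentage change of new_score relative to baseline_score (assumed definition)."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Toy values, not data from the paper.
measured = [0.5, 1.2, 2.0, 3.1]
predicted = [0.7, 1.0, 2.4, 2.9]
print(round(pearson(measured, predicted), 3))
```

Kinetic parameters span many orders of magnitude, so such correlations are commonly computed on log-transformed values; whether that applies here would need to be checked against the paper itself.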
Collapse
Affiliation(s)
- Jingjing Wang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Zhijiang Yang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Chang Chen
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Ge Yao
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Xiukun Wan
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Shaoheng Bao
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
| | - Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
- Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, No. 37 South Central Street, Yangfang Town, Changping District, Beijing 102205, China
22
|
Zielinski DC, Matos MR, de Bree JE, Glass K, Sonnenschein N, Palsson BO. Bottom-up parameterization of enzyme rate constants: Reconciling inconsistent data. Metab Eng Commun 2024; 18:e00234. [PMID: 38711578 PMCID: PMC11070925 DOI: 10.1016/j.mec.2024.e00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 05/08/2024] Open
Abstract
Kinetic models of metabolism are promising platforms for studying complex metabolic systems and designing production strains. Given the availability of enzyme kinetic data from historical experiments and machine learning estimation tools, a straightforward modeling approach is to assemble kinetic data enzyme by enzyme until a desired scale is reached. However, this type of 'bottom up' parameterization of kinetic models has been difficult due to a number of issues including gaps in kinetic parameters, the complexity of enzyme mechanisms, inconsistencies between parameters obtained from different sources, and in vitro-in vivo differences. Here, we present a computational workflow for the robust estimation of kinetic parameters for detailed mass action enzyme models while taking into account parameter uncertainty. The resulting software package, termed MASSef (the Mass Action Stoichiometry Simulation Enzyme Fitting package), can handle standard 'macroscopic' kinetic parameters, including Km, kcat, Ki, Keq, and nh, as well as diverse reaction mechanisms defined in terms of mass action reactions and 'microscopic' rate constants. We provide three enzyme case studies demonstrating that this approach can identify and reconcile inconsistent data either within in vitro experiments or between in vitro and in vivo enzyme function. We further demonstrate how parameterized enzyme modules can be used to assemble pathway-scale kinetic models consistent with in vivo behavior. This work builds on the legacy of knowledge on kinetic behavior of enzymes by enabling robust parameterization of enzyme kinetic models at scale utilizing the abundance of historical literature data and machine learning parameter estimates.
Affiliation(s)
- Daniel C. Zielinski
- Department of Bioengineering, University of California, San Diego, CA, 92093, USA
- Marta R.A. Matos
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- James E. de Bree
- Department of Bioengineering, University of California, San Diego, CA, 92093, USA
- Kevin Glass
- Department of Bioengineering, University of California, San Diego, CA, 92093, USA
- Nikolaus Sonnenschein
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Bernhard O. Palsson
- Department of Bioengineering, University of California, San Diego, CA, 92093, USA
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Department of Pediatrics, University of California, San Diego, CA, 92093, USA
23
|
Tarzi C, Zampieri G, Sullivan N, Angione C. Emerging methods for genome-scale metabolic modeling of microbial communities. Trends Endocrinol Metab 2024; 35:533-548. [PMID: 38575441 DOI: 10.1016/j.tem.2024.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/28/2024] [Accepted: 02/29/2024] [Indexed: 04/06/2024]
Abstract
Genome-scale metabolic models (GEMs) are consolidating as platforms for studying mixed microbial populations, by combining biological data and knowledge with mathematical rigor. However, deploying these models to answer research questions can be challenging due to the increasing number of available computational tools, the lack of universal standards, and their inherent limitations. Here, we present a comprehensive overview of foundational concepts for building and evaluating genome-scale models of microbial communities. We then compare tools in terms of requirements, capabilities, and applications. Next, we highlight the current pitfalls and open challenges to consider when adopting existing tools and developing new ones. Our compendium can be relevant for the expanding community of modelers, both at the entry and experienced levels.
Affiliation(s)
- Chaimaa Tarzi
- School of Computing, Engineering and Digital Technologies, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK
- Guido Zampieri
- Department of Biology, University of Padova, Padova, 35122, Veneto, Italy
- Neil Sullivan
- Complement Genomics Ltd, Station Rd, Lanchester, Durham, DH7 0EX, County Durham, UK
- Claudio Angione
- School of Computing, Engineering and Digital Technologies, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK; Centre for Digital Innovation, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK; National Horizons Centre, Teesside University, 38 John Dixon Ln, Darlington, DL1 1HG, North Yorkshire, UK.
24
|
Turanli B, Gulfidan G, Aydogan OO, Kula C, Selvaraj G, Arga KY. Genome-scale metabolic models in translational medicine: the current status and potential of machine learning in improving the effectiveness of the models. Mol Omics 2024; 20:234-247. [PMID: 38444371 DOI: 10.1039/d3mo00152k] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/07/2024]
Abstract
The genome-scale metabolic model (GEM) has emerged as one of the leading modeling approaches for systems-level metabolic studies and has been widely explored for a broad range of organisms and applications. Owing to the development of genome sequencing technologies and available biochemical data, it is possible to reconstruct GEMs for model and non-model microorganisms as well as for multicellular organisms such as humans and animal models. GEMs will evolve in parallel with the availability of biological data, new mathematical modeling techniques and the development of automated GEM reconstruction tools. The use of high-quality, context-specific GEMs, a subset of the original GEM in which inactive reactions are removed while maintaining metabolic functions in the extracted model, for model organisms along with machine learning (ML) techniques could increase their applications and effectiveness in translational research in the near future. Here, we briefly review the current state of GEMs, discuss the potential contributions of ML approaches for more efficient and frequent application of these models in translational research, and explore the extension of GEMs to integrative cellular models.
Affiliation(s)
- Beste Turanli
- Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey.
- Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
- Gizem Gulfidan
- Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey.
- Ozge Onluturk Aydogan
- Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey.
- Ceyda Kula
- Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey.
- Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
- Gurudeeban Selvaraj
- Concordia University, Centre for Research in Molecular Modeling & Department of Chemistry and Biochemistry, Quebec, Canada
- Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha Dental College and Hospital, Department of Biomaterials, Bioinformatics Unit, Chennai, India
- Kazim Yalcin Arga
- Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey.
- Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
- Marmara University, Genetic and Metabolic Diseases Research and Investigation Center, Istanbul, Turkey
25
|
Chew YH, Spill F. Discretised Flux Balance Analysis for Reaction-Diffusion Simulation of Single-Cell Metabolism. Bull Math Biol 2024; 86:39. [PMID: 38448618 PMCID: PMC11390822 DOI: 10.1007/s11538-024-01264-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Metabolites have to diffuse within the sub-cellular compartments they occupy to the specific locations where enzymes are, so that reactions can occur. Conventional flux balance analysis (FBA), a method based on linear programming that is commonly used to model metabolism, implicitly assumes that no enzymatic reaction is diffusion-limited, though that may not always be the case. In this work, we have developed a spatial method that implements FBA on a grid-based system, to enable the exploration of diffusion effects on metabolism. Specifically, the method discretises a living cell into a two-dimensional grid, represents the metabolic reactions in each grid element as well as the diffusion of metabolites to and from neighbouring elements, and simulates the system as a single linear programming problem. We varied the number of rows and columns in the grid to simulate different cell shapes, and the method was able to capture diffusion effects for the different shapes. We then used the method to simulate heterogeneous enzyme distribution, which suggested a theoretical effect on variability at the population level. We propose the use of this method, and its future extensions, to explore how the spatiotemporal organisation of sub-cellular compartments and the molecules within them could affect cell behaviour.
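The grid formulation described in this abstract can be illustrated with a toy linear program. The sketch below is not the authors' implementation: it assumes a single nutrient that enters at the top-left grid element, one consumption reaction per element, and a hard cap on the diffusive flux between neighbouring elements. The function name `grid_fba` and the parameters `vmax`, `uptake_max`, and `diff_max` are hypothetical, and SciPy's `linprog` stands in for whatever solver the paper uses.

```python
import numpy as np
from scipy.optimize import linprog


def grid_fba(rows, cols, vmax, uptake_max=10.0, diff_max=2.0):
    """Toy grid FBA (illustrative assumption, not the published method).

    A nutrient enters at element (0, 0), diffuses between 4-neighbour
    elements, and is consumed in each element at a rate capped by vmax.
    Returns the per-element consumption fluxes and their total.
    """
    n = rows * cols
    # Each neighbouring pair of elements shares one signed diffusion flux.
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    m = len(edges)

    # Variable order: [uptake, v_0 .. v_{n-1}, d_0 .. d_{m-1}]
    S = np.zeros((n, 1 + n + m))      # one steady-state balance per element
    S[0, 0] = 1.0                     # uptake feeds element 0
    for i in range(n):
        S[i, 1 + i] = -1.0            # local consumption removes nutrient
    for k, (a, b) in enumerate(edges):
        S[a, 1 + n + k] = -1.0        # positive flux leaves element a
        S[b, 1 + n + k] = 1.0         # and enters element b

    cost = np.zeros(1 + n + m)
    cost[1:1 + n] = -1.0              # linprog minimises, so negate the objective

    caps = np.broadcast_to(np.asarray(vmax, dtype=float), (rows, cols)).ravel()
    bounds = ([(0.0, uptake_max)]
              + [(0.0, float(cap)) for cap in caps]
              + [(-diff_max, diff_max)] * m)

    res = linprog(cost, A_eq=S, b_eq=np.zeros(n), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[1:1 + n].reshape(rows, cols), -res.fun


if __name__ == "__main__":
    for shape in [(1, 4), (2, 2)]:
        _, total = grid_fba(*shape, vmax=5.0)
        print(shape, round(total, 6))
```

In this toy setup the entry element can pass on at most `diff_max` per neighbour, so an elongated 1 x 4 grid (one neighbour) is capped lower than a compact 2 x 2 grid (two neighbours). Passing an array for `vmax` gives a crude stand-in for heterogeneous enzyme distribution.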
Affiliation(s)
- Yin Hoon Chew
- School of Mathematics, University of Birmingham, Edgbaston, Birmingham, B15 2TT, England, UK.
- Fabian Spill
- School of Mathematics, University of Birmingham, Edgbaston, Birmingham, B15 2TT, England, UK
26
|
Goshisht MK. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega 2024; 9:9921-9945. [PMID: 38463314 PMCID: PMC10918679 DOI: 10.1021/acsomega.3c05913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/19/2024] [Accepted: 01/30/2024] [Indexed: 03/12/2024]
Abstract
Machine learning (ML), particularly deep learning (DL), has made rapid and substantial progress in synthetic biology in recent years. Biotechnological applications of biosystems, including pathways, enzymes, and whole cells, are being explored with increasing frequency. The intricacy and interconnectedness of biosystems make it challenging to design them with the desired properties. ML and DL have a synergy with synthetic biology. Synthetic biology can be employed to produce large data sets for training models (for instance, by utilizing DNA synthesis), and ML/DL models can be employed to inform design (for example, by generating new parts or advising on the best experiments to perform). This potential has recently been brought to light by research at the intersection of engineering biology and ML/DL through achievements like the design of novel biological components, best experimental design, automated analysis of microscopy data, protein structure prediction, and biomolecular implementations of ANNs (Artificial Neural Networks). I have divided this review into three sections. In the first section, I describe the predictive potential and basics of ML along with its many applications in synthetic biology, especially in engineering cells, the activity of proteins, and metabolic pathways. In the second section, I describe fundamental DL architectures and their applications in synthetic biology. Finally, I describe the challenges hindering progress in ML/DL and synthetic biology, along with their solutions.
Affiliation(s)
- Manoj Kumar Goshisht
- Department of Chemistry, Natural and Applied Sciences, University of Wisconsin-Green Bay, Green Bay, Wisconsin 54311-7001, United States
27
|
Ferreira MADM, Silveira WBD, Nikoloski Z. Protein constraints in genome-scale metabolic models: Data integration, parameter estimation, and prediction of metabolic phenotypes. Biotechnol Bioeng 2024; 121:915-930. [PMID: 38178617 DOI: 10.1002/bit.28650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 10/24/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024]
Abstract
Genome-scale metabolic models provide a valuable resource to study metabolism and cell physiology. These models are employed with approaches from the constraint-based modeling framework to predict metabolic and physiological phenotypes. The prediction performance of genome-scale metabolic models can be improved by including protein constraints. The resulting protein-constrained models consider data on turnover numbers (kcat) and facilitate the integration of protein abundances. In this systematic review, we present and discuss the current state of the art regarding the estimation of kinetic parameters used in protein-constrained models. We also highlight how data-driven and constraint-based approaches can aid the estimation of turnover numbers and their usage in improving predictions of cellular phenotypes. Finally, we identify standing challenges in protein-constrained metabolic models and provide a perspective regarding future approaches to improve the predictive performance.
Affiliation(s)
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
- Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
28
|
Baghdassarian HM, Lewis NE. Resource allocation in mammalian systems. Biotechnol Adv 2024; 71:108305. [PMID: 38215956 PMCID: PMC11182366 DOI: 10.1016/j.biotechadv.2023.108305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 12/17/2023] [Accepted: 12/18/2023] [Indexed: 01/14/2024]
Abstract
Cells execute biological functions to support phenotypes such as growth, migration, and secretion. Complementarily, each function of a cell has resource costs that constrain phenotype. Resource allocation by a cell allows it to manage these costs and optimize its phenotype. In fact, the management of resource constraints (e.g., nutrient availability, bioenergetic capacity, and macromolecular machinery production) shapes activity and ultimately impacts phenotype. In mammalian systems, quantification of resource allocation provides important insights into higher-order multicellular functions; it shapes intercellular interactions and relays environmental cues for tissues to coordinate individual cells to overcome resource constraints and achieve population-level behavior. Furthermore, these constraints, objectives, and phenotypes are context-dependent, with cells adapting their behavior according to their microenvironment, resulting in distinct steady states. This review will highlight the biological insights gained from probing resource allocation in mammalian cells and tissues.
Affiliation(s)
- Hratch M Baghdassarian
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA; Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Nathan E Lewis
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA.
29
|
Kugler A, Stensjö K. Machine learning predicts system-wide metabolic flux control in cyanobacteria. Metab Eng 2024; 82:171-182. [PMID: 38395194 DOI: 10.1016/j.ymben.2024.02.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 02/14/2024] [Accepted: 02/20/2024] [Indexed: 02/25/2024]
Abstract
Metabolic fluxes and their control mechanisms are fundamental in cellular metabolism, offering insights for the study of biological systems and biotechnological applications. However, quantitative and predictive understanding of controlling biochemical reactions in microbial cell factories, especially at the system level, is limited. In this work, we present ARCTICA, a computational framework that integrates constraint-based modelling with machine learning tools to address this challenge. Using the model cyanobacterium Synechocystis sp. PCC 6803 as chassis, we demonstrate that ARCTICA effectively simulates global-scale metabolic flux control. Key findings are that (i) the photosynthetic bioproduction is mainly governed by enzymes within the Calvin-Benson-Bassham (CBB) cycle, rather than by those involved in the biosynthesis of the end-product, (ii) the catalytic capacity of the CBB cycle limits the photosynthetic activity and downstream pathways and (iii) ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is a major, but not the most limiting, step within the CBB cycle. Predicted metabolic reactions qualitatively align with prior experimental observations, validating our modelling approach. ARCTICA serves as a valuable pipeline for understanding cellular physiology and predicting rate-limiting steps in genome-scale metabolic networks, and thus provides guidance for bioengineering of cyanobacteria.
Affiliation(s)
- Amit Kugler
- Microbial Chemistry, Department of Chemistry-Ångström Laboratory, Uppsala University, Box 523, SE-751 20, Uppsala, Sweden
- Karin Stensjö
- Microbial Chemistry, Department of Chemistry-Ångström Laboratory, Uppsala University, Box 523, SE-751 20, Uppsala, Sweden.
30
|
Zhang Y, Qin G, Aguilar B, Rappaport N, Yurkovich JT, Pflieger L, Huang S, Hood L, Shmulevich I. A framework towards digital twins for type 2 diabetes. Front Digit Health 2024; 6:1336050. [PMID: 38343907 PMCID: PMC10853398 DOI: 10.3389/fdgth.2024.1336050] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/09/2023] [Accepted: 01/15/2024] [Indexed: 11/16/2024] Open
Abstract
Introduction: A digital twin is a virtual representation of a patient's disease, facilitating real-time monitoring, analysis, and simulation. This enables the prediction of disease progression, optimization of care delivery, and improvement of outcomes. Methods: Here, we introduce a digital twin framework for type 2 diabetes (T2D) that integrates machine learning with multiomic data, knowledge graphs, and mechanistic models. By analyzing a substantial multiomic and clinical dataset, we constructed predictive machine learning models to forecast disease progression. Furthermore, knowledge graphs were employed to elucidate and contextualize multiomic-disease relationships. Results and discussion: Our findings not only reaffirm known targetable disease components but also spotlight novel ones, unveiled through this integrated approach. The versatile components presented in this study can be incorporated into a digital twin system, enhancing our grasp of diseases and propelling the advancement of precision medicine.
Affiliation(s)
- Yue Zhang
- Institute for Systems Biology, Seattle, WA, United States
- Guangrong Qin
- Institute for Systems Biology, Seattle, WA, United States
- Boris Aguilar
- Institute for Systems Biology, Seattle, WA, United States
- Noa Rappaport
- Institute for Systems Biology, Seattle, WA, United States
- Center for Phenomic Health, Buck Institute for Research on Aging, Novato, CA, United States
- James T. Yurkovich
- Center for Phenomic Health, Buck Institute for Research on Aging, Novato, CA, United States
- Phenome Health, Seattle, WA, United States
- Lance Pflieger
- Center for Phenomic Health, Buck Institute for Research on Aging, Novato, CA, United States
- Phenome Health, Seattle, WA, United States
- Sui Huang
- Institute for Systems Biology, Seattle, WA, United States
- Leroy Hood
- Institute for Systems Biology, Seattle, WA, United States
- Center for Phenomic Health, Buck Institute for Research on Aging, Novato, CA, United States
- Phenome Health, Seattle, WA, United States
31
|
Du JH, Patil P, Roeder K, Kuchibhotla AK. Extrapolated cross-validation for randomized ensembles. J Comput Graph Stat 2024; 33:1061-1072. [PMID: 39439808 PMCID: PMC11492369 DOI: 10.1080/10618600.2023.2288194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 11/15/2023] [Indexed: 10/25/2024]
Abstract
Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields δ-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests under a computational constraint on the maximum ensemble size. Compared to sample-split and K-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting. Meanwhile, its computational cost is considerably lower owing to the use of the risk extrapolation technique.
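The abstract does not spell out the risk decomposition it relies on. As a hedged illustration only, the standard identity for squared error is shown below, under the assumption that the ensemble members are exchangeable given the training data. The symbols $R_M$ (risk of an $M$-member average) and $R_\infty$ (limiting risk) are notation introduced here, not taken from the paper.

```latex
% Averaging M conditionally exchangeable predictors shrinks only the variance term:
R_M = R_\infty + \frac{R_1 - R_\infty}{M}
% Setting M = 2 gives the limiting risk from two small-ensemble risks:
R_\infty = 2R_2 - R_1
% Substituting back extrapolates to any ensemble size M:
R_M = -\left(1 - \frac{2}{M}\right) R_1 + 2\left(1 - \frac{1}{M}\right) R_2
```

Under this assumption, estimates of $R_1$ and $R_2$ (for instance from out-of-bag errors) are enough to extrapolate the risk to larger $M$ without fitting the larger ensembles.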
Affiliation(s)
- Jin-Hong Du
- Department of Statistics and Data Science, Carnegie Mellon University
- Machine Learning Department, Carnegie Mellon University
- Pratik Patil
- Department of Statistics, University of California, Berkeley
- Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University
32
|
Yu H, Deng H, He J, Keasling JD, Luo X. UniKP: a unified framework for the prediction of enzyme kinetic parameters. Nat Commun 2023; 14:8211. [PMID: 38081905 PMCID: PMC10713628 DOI: 10.1038/s41467-023-44113-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/10/2023] [Accepted: 11/30/2023] [Indexed: 12/18/2023] Open
Abstract
Prediction of enzyme kinetic parameters is essential for designing and optimizing enzymes for various biotechnological and industrial applications, but the limited performance of current prediction tools on diverse tasks hinders their practical applications. Here, we introduce UniKP, a unified framework based on pretrained language models for the prediction of enzyme kinetic parameters, including enzyme turnover number (kcat), Michaelis constant (Km), and catalytic efficiency (kcat / Km), from protein sequences and substrate structures. A two-layer framework derived from UniKP (EF-UniKP) has also been proposed to allow robust kcat prediction that takes environmental factors, including pH and temperature, into account. In addition, four representative re-weighting methods are systematically explored and shown to reduce the prediction error in high-value prediction tasks. We have demonstrated the application of UniKP and EF-UniKP in several enzyme discovery and directed evolution tasks, leading to the identification of new enzymes and enzyme mutants with higher activity. UniKP is a valuable tool for deciphering the mechanisms of enzyme kinetics and enables novel insights into enzyme engineering and its industrial applications.
Affiliation(s)
- Han Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Huaxiang Deng
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Jiahui He
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Jay D Keasling
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Joint BioEnergy Institute, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Chemical and Biomolecular Engineering & Department of Bioengineering, University of California, Berkeley, CA, 94720, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kgs, Lyngby, Denmark
- Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
33
|
Yang X, Mao Z, Huang J, Wang R, Dong H, Zhang Y, Ma H. Improving pathway prediction accuracy of constraints-based metabolic network models by treating enzymes as microcompartments. Synth Syst Biotechnol 2023; 8:597-605. [PMID: 37743907 PMCID: PMC10514394 DOI: 10.1016/j.synbio.2023.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 08/12/2023] [Accepted: 09/06/2023] [Indexed: 09/26/2023] Open
Abstract
Metabolic network models have become increasingly precise and accurate as the most widespread and practical digital representations of living cells. Their prediction functions have been significantly expanded in recent years by integrating cellular resources and abiotic constraints. However, if unreasonable modeling methods are adopted due to a lack of consideration of biological knowledge, conflicts between stoichiometric and other constraints, such as thermodynamic feasibility and enzyme resource availability, can lead to distorted predictions. In this work, we investigated a prediction anomaly of EcoETM, a constraints-based metabolic network model, and introduced the idea of enzyme compartmentalization into the analysis process. Through rational combination of reactions, we avoid the false prediction of pathway feasibility caused by the unrealistic assumption of free intermediate metabolites. This allowed us to correct the pathway structures of L-serine and L-tryptophan. A specific analysis explains the application method of the EcoETM-like model and demonstrates its potential and value in correcting the prediction results in pathway structure by resolving the conflict between different constraints and incorporating the evolved roles of enzymes as reaction compartments. Notably, this work also reveals the trade-off between product yield and thermodynamic feasibility. Our work is of great value for the structural improvement of constraints-based models.
Affiliation(s)
- Xue Yang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
- Zhitao Mao
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
- Jianfeng Huang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
- Ruoyu Wang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
- Huaming Dong
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
- School of Environmental Ecology and Biological Engineering, Wuhan Institute of Technology, Wuhan, 430205, China
- Yanfei Zhang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
| | - Hongwu Ma
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- National Technology Innovation Center of Synthetic Biology, Tianjin, 300308, China
| |
Collapse
|
34
|
Pettersen JP, Castillo S, Jouhten P, Almaas E. Genome-scale metabolic models reveal determinants of phenotypic differences in non-Saccharomyces yeasts. BMC Bioinformatics 2023; 24:438. [PMID: 37990145 PMCID: PMC10664357 DOI: 10.1186/s12859-023-05506-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 09/29/2023] [Indexed: 11/23/2023] Open
Abstract
BACKGROUND The use of alternative non-Saccharomyces yeasts in wine and beer brewing has gained more attention in recent years. This is due both to the desire to obtain a wider variety of flavours in the product and to reduce the final alcohol content. Given the metabolic differences between the yeast species, we wanted to account for some of the differences by using in silico models. RESULTS We created and studied genome-scale metabolic models of five different non-Saccharomyces species using an automated process. These were: Metschnikowia pulcherrima, Lachancea thermotolerans, Hanseniaspora osmophila, Torulaspora delbrueckii and Kluyveromyces lactis. Using the models, we predicted that M. pulcherrima, when compared to the other species, conducts more respiration and thus produces fewer fermentation products, a finding which agrees with experimental data. Complex I of the electron transport chain was found to be present in M. pulcherrima, but absent in the others. The predicted importance of Complex I was diminished when we incorporated constraints on the amount of enzymatic protein, as this shifts the metabolism towards fermentation. CONCLUSIONS Our results suggest that Complex I in the electron transport chain is a key differentiator between Metschnikowia pulcherrima and the other yeasts considered. Yet, more annotations and experimental data have the potential to improve model quality in order to increase fidelity and confidence in these results. Further experiments should be conducted to confirm the in vivo effect of Complex I in M. pulcherrima and its respiratory metabolism.
Affiliation(s)
- Jakob P Pettersen
- Department of Biotechnology and Food Science, NTNU-Norwegian University of Science and Technology, Trondheim, Norway.
- Paula Jouhten
- Department of Bioproducts and Biosystems, Aalto University, Espoo, Finland
- Eivind Almaas
- Department of Biotechnology and Food Science, NTNU-Norwegian University of Science and Technology, Trondheim, Norway.
- Department of Public Health and General Practice, K.G. Jebsen Center for Genetic Epidemiology, NTNU-Norwegian University of Science and Technology, Trondheim, Norway.

35
Kim GB, Kim JY, Lee JA, Norsigian CJ, Palsson BO, Lee SY. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nat Commun 2023; 14:7370. [PMID: 37963869 PMCID: PMC10645960 DOI: 10.1038/s41467-023-43216-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 07/28/2023] [Accepted: 11/03/2023] [Indexed: 11/16/2023] Open
Abstract
Functional annotation of open reading frames in microbial genomes remains substantially incomplete. Enzymes constitute the most prevalent functional gene class in microbial genomes and can be described by their specific catalytic functions using the Enzyme Commission (EC) number. Consequently, the ability to predict EC numbers could substantially reduce the number of un-annotated genes. Here we present a deep learning model, DeepECtransformer, which utilizes transformer layers as a neural network architecture to predict EC numbers. Using the extensively studied Escherichia coli K-12 MG1655 genome, DeepECtransformer predicted EC numbers for 464 un-annotated genes. We experimentally validated the enzymatic activities predicted for three proteins (YgfF, YciO, and YjdM). Further examination of the neural network's reasoning process revealed that the trained neural network relies on functional motifs of enzymes to predict EC numbers. Thus, DeepECtransformer is a method that facilitates the functional annotation of uncharacterized genes.
Affiliation(s)
- Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
- Ji Yeon Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
- Jong An Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
- Charles J Norsigian
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Bernhard O Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Novo Nordisk Foundation Center for Biosustainability, 2800, Kongens Lyngby, Denmark
- Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea.
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea.
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea.

36
Yang ZJ, Shao Q, Jiang Y, Jurich C, Ran X, Juarez RJ, Yan B, Stull SL, Gollu A, Ding N. Mutexa: A Computational Ecosystem for Intelligent Protein Engineering. J Chem Theory Comput 2023; 19:7459-7477. [PMID: 37828731 PMCID: PMC10653112 DOI: 10.1021/acs.jctc.3c00602] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/06/2023] [Indexed: 10/14/2023]
Abstract
Protein engineering holds immense promise in shaping the future of biomedicine and biotechnology. This Review focuses on our ongoing development of Mutexa, a computational ecosystem designed to enable "intelligent protein engineering". In this vision, researchers will seamlessly acquire sequences of protein variants with desired functions as biocatalysts, therapeutic peptides, and diagnostic proteins through a finely-tuned computational machine, akin to Amazon Alexa's role as a versatile virtual assistant. The technical foundation of Mutexa has been established through the development of a database that combines and relates enzyme structures and their respective functions (e.g., IntEnzyDB), workflow software packages that enable high-throughput protein modeling (e.g., EnzyHTP and LassoHTP), and scoring functions that map the sequence-structure-function relationship of proteins (e.g., EnzyKR and DeepLasso). We will showcase the applications of these tools in benchmarking the convergence conditions of enzyme functional descriptors across mutants, investigating protein electrostatics and cavity distributions in SAM-dependent methyltransferases, and understanding the role of nonelectrostatic dynamic effects in enzyme catalysis. Finally, we will conclude by addressing the future steps and fundamental challenges in our endeavor to develop new Mutexa applications that assist the identification of beneficial mutants in protein engineering.
Affiliation(s)
- Zhongyue J. Yang
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
- Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, United States
- Data Science Institute, Vanderbilt University, Nashville, Tennessee 37235, United States
- Qianzhen Shao
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Christopher Jurich
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
- Xinchun Ran
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Reecan J. Juarez
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Chemical and Physical Biology Program, Vanderbilt University, Nashville, Tennessee 37235, United States
- Bailu Yan
- Department of Biostatistics, Vanderbilt University, Nashville, Tennessee 37205, United States
- Sebastian L. Stull
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Anvita Gollu
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Ning Ding
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States

37
Ran X, Jiang Y, Shao Q, Yang ZJ. EnzyKR: a chirality-aware deep learning model for predicting the outcomes of the hydrolase-catalyzed kinetic resolution. Chem Sci 2023; 14:12073-12082. [PMID: 37969577 PMCID: PMC10631226 DOI: 10.1039/d3sc02752j] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/30/2023] [Accepted: 10/16/2023] [Indexed: 11/17/2023] Open
Abstract
Hydrolase-catalyzed kinetic resolution is a well-established biocatalytic process. However, the computational tools that predict favorable enzyme scaffolds for separating a racemic substrate mixture are underdeveloped. To address this challenge, we trained a deep learning framework, EnzyKR, to automate the selection of hydrolases for stereoselective biocatalysis. EnzyKR adopts a classifier-regressor architecture that first identifies the reactive binding conformer of a substrate-hydrolase complex, and then predicts its activation free energy. A structure-based encoding strategy was used to depict the chiral interactions between hydrolases and enantiomers. Unlike existing models trained on protein sequences and substrate SMILES strings, EnzyKR was trained using 204 substrate-hydrolase complexes, which were constructed by docking. EnzyKR was tested using a held-out dataset of 20 complexes on the task of predicting activation free energy. EnzyKR achieved a Pearson correlation coefficient (R) of 0.72, a Spearman rank correlation coefficient (Spearman R) of 0.72, and a mean absolute error (MAE) of 1.54 kcal mol⁻¹ in this task. Furthermore, EnzyKR was tested on the task of predicting enantiomeric excess ratios for 28 hydrolytic kinetic resolution reactions catalyzed by fluoroacetate dehalogenase RPA1163, halohydrin HheC, A. mediolanus epoxide hydrolase, and P. fluorescens esterase. The performance of EnzyKR was compared against that of a recently developed kinetic predictor, DLKcat. EnzyKR correctly predicts the favored enantiomer and outperforms DLKcat in 18 out of 28 reactions, that is, 64% of the test cases. These results demonstrate EnzyKR to be a new approach for prediction of enantiomeric outcomes in hydrolase-catalyzed kinetic resolution reactions.
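For readers unfamiliar with the three evaluation metrics quoted in this abstract (Pearson R, Spearman R, and MAE), the following minimal Python sketch shows how they are commonly computed from paired predicted and measured values. It is not taken from the EnzyKR code; the function names are hypothetical and any numbers passed in are the caller's own.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def mean_absolute_error(predicted, measured):
    """Mean of the absolute differences, in the units of the inputs."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)
```

Spearman R depends only on the ordering of the values, so it can equal 1 even when the relationship is monotonic but non-linear, whereas MAE retains the physical unit (here, kcal mol⁻¹).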
Affiliation(s)
- Xinchun Ran
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA +1-343-9849
- Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA +1-343-9849
- Qianzhen Shao
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA +1-343-9849
- Zhongyue J Yang
- Department of Chemistry, Vanderbilt University Nashville Tennessee 37235 USA +1-343-9849
- Center for Structural Biology, Vanderbilt University Nashville Tennessee 37235 USA
- Vanderbilt Institute of Chemical Biology, Vanderbilt University Nashville Tennessee 37235 USA
- Data Science Institute, Vanderbilt University Nashville Tennessee 37235 USA
- Department of Chemical and Biomolecular Engineering, Vanderbilt University Nashville Tennessee 37235 USA

38
Moura Ferreira MAD, Wendering P, Arend M, Batista da Silveira W, Nikoloski Z. Accurate prediction of in vivo protein abundances by coupling constraint-based modelling and machine learning. Metab Eng 2023; 80:184-192. [PMID: 37802292 DOI: 10.1016/j.ymben.2023.09.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/28/2023] [Revised: 09/10/2023] [Accepted: 09/25/2023] [Indexed: 10/08/2023]
Abstract
Quantification of how different environmental cues affect protein allocation can provide important insights for understanding cell physiology. While absolute quantification of proteins can be obtained by resource-intensive mass-spectrometry-based technologies, prediction of protein abundances offers another way to obtain insights into protein allocation. Here we present CAMEL, a framework that couples constraint-based modelling with machine learning to predict protein abundance for any environmental condition. This is achieved by building machine learning models that leverage static features, derived from protein sequences, and condition-dependent features predicted from protein-constrained metabolic models. Our findings demonstrate that CAMEL results in excellent prediction of protein allocation in E. coli (average Pearson correlation of at least 0.9), and moderate performance in S. cerevisiae (average Pearson correlation of at least 0.5). Therefore, CAMEL outperformed contending approaches without using molecular read-outs from unseen conditions and provides a valuable tool for using protein allocation in biotechnological applications.
Affiliation(s)
- Philipp Wendering
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
- Marius Arend
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany.

39
Lamoureux CR, Decker KT, Sastry AV, Rychel K, Gao Y, McConn J, Zielinski D, Palsson BO. A multi-scale expression and regulation knowledge base for Escherichia coli. Nucleic Acids Res 2023; 51:10176-10193. [PMID: 37713610 PMCID: PMC10602906 DOI: 10.1093/nar/gkad750] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 06/22/2023] [Revised: 08/02/2023] [Accepted: 09/05/2023] [Indexed: 09/17/2023] Open
Abstract
Transcriptomic data is accumulating rapidly; thus, scalable methods for extracting knowledge from this data are critical. Here, we assembled a top-down expression and regulation knowledge base for Escherichia coli. The expression component is a 1035-sample, high-quality RNA-seq compendium consisting of data generated in our lab using a single experimental protocol. The compendium contains diverse growth conditions, including: 9 media; 39 supplements, including antibiotics; 42 heterologous proteins; and 76 gene knockouts. Using this resource, we elucidated global expression patterns. We used machine learning to extract 201 modules that account for 86% of known regulatory interactions, creating the regulatory component. With these modules, we identified two novel regulons and quantified systems-level regulatory responses. We also integrated 1675 curated, publicly-available transcriptomes into the resource. We demonstrated workflows for analyzing new data against this knowledge base via deconstruction of regulation during aerobic transition. This resource illuminates the E. coli transcriptome at scale and provides a blueprint for top-down transcriptomic analysis of non-model organisms.
Affiliation(s)
- Cameron R Lamoureux
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Katherine T Decker
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Anand V Sastry
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Kevin Rychel
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Ye Gao
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- John Luke McConn
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Daniel C Zielinski
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Bernhard O Palsson
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kgs. Lyngby, Denmark

40
Hu XP, Schroeder S, Lercher MJ. Proteome efficiency of metabolic pathways in Escherichia coli increases along the nutrient flow. mSystems 2023; 8:e0076023. [PMID: 37795991 PMCID: PMC10654084 DOI: 10.1128/msystems.00760-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 08/24/2023] [Indexed: 10/06/2023] Open
Abstract
IMPORTANCE Protein translation is the most expensive cellular process in fast-growing bacteria, and efficient proteome usage should thus be under strong natural selection. However, recent studies show that a considerable part of the proteome is unneeded for instantaneous cell growth in Escherichia coli. We still lack a systematic understanding of how this excess proteome is distributed across different pathways as a function of the growth conditions. We estimated the minimal required proteome across growth conditions in E. coli and compared the predictions with experimental data. We found that the proteome allocated to the most expensive internal pathways, including translation and the synthesis of amino acids and cofactors, is near the minimally required levels. In contrast, transporters and central carbon metabolism show much higher proteome levels than the predicted minimal abundance. Our analyses show that the proteome fraction unneeded for instantaneous cell growth decreases along the nutrient flow in E. coli.
Affiliation(s)
- Xiao-Pan Hu
- Institute for Computer Science, Heinrich Heine University, Düsseldorf, Germany
- Department of Biology, Heinrich Heine University, Düsseldorf, Germany
- Stefan Schroeder
- Institute for Computer Science, Heinrich Heine University, Düsseldorf, Germany
- Department of Biology, Heinrich Heine University, Düsseldorf, Germany
- Martin J. Lercher
- Institute for Computer Science, Heinrich Heine University, Düsseldorf, Germany
- Department of Biology, Heinrich Heine University, Düsseldorf, Germany

41
Ferreira MADM, da Silveira WB, Nikoloski Z. PARROT: Prediction of enzyme abundances using protein-constrained metabolic models. PLoS Comput Biol 2023; 19:e1011549. [PMID: 37856550 PMCID: PMC10617714 DOI: 10.1371/journal.pcbi.1011549] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/23/2023] [Revised: 10/31/2023] [Accepted: 09/29/2023] [Indexed: 10/21/2023] Open
Abstract
Protein allocation determines the activity of cellular pathways and affects growth across all organisms. Therefore, different experimental and machine learning approaches have been developed to quantify and predict protein abundances and how they are allocated to different cellular functions, respectively. Yet, despite advances in protein quantification, it remains challenging to predict condition-specific allocation of enzymes in metabolic networks. Here, using protein-constrained metabolic models, we propose a family of constraint-based approaches, termed PARROT, to predict how much of each enzyme is used based on the principle of minimizing the difference between a reference and an alternative growth condition. To this end, PARROT variants model the minimization of enzyme reallocation using four different (combinations of) distance functions. We demonstrate that the PARROT variant that minimizes the Manhattan distance between the enzyme allocation of a reference and an alternative condition outperforms existing approaches based on the parsimonious distribution of fluxes or enzymes for both Escherichia coli and Saccharomyces cerevisiae. Further, we show that the combined minimization of flux and enzyme allocation adjustment leads to inconsistent predictions. Together, our findings indicate that minimization of protein allocation rather than flux redistribution is a governing principle determining steady-state pathway activity for microorganisms grown in alternative growth conditions.
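To make the Manhattan-distance objective mentioned in this abstract concrete, the following Python sketch evaluates the distance between a reference enzyme allocation and candidate allocations, then picks the closest candidate. It illustrates the objective only: PARROT minimizes this quantity subject to the constraints of a protein-constrained metabolic model, which is not reproduced here. The function names, enzyme identifiers, and values are hypothetical.

```python
def manhattan_distance(reference, alternative):
    """Sum of absolute per-enzyme differences between two allocations.

    Both arguments map an enzyme identifier to its abundance; an enzyme
    missing from one allocation is treated as having abundance 0.
    """
    enzymes = set(reference) | set(alternative)
    return sum(
        abs(reference.get(e, 0.0) - alternative.get(e, 0.0)) for e in enzymes
    )


def closest_allocation(reference, candidates):
    """Return the candidate allocation with the smallest Manhattan distance."""
    return min(candidates, key=lambda c: manhattan_distance(reference, c))


# Hypothetical allocations (arbitrary units), for illustration only.
reference = {"enzA": 1.0, "enzB": 2.0}
candidates = [
    {"enzA": 0.5, "enzC": 1.0},   # distance 0.5 + 2.0 + 1.0 = 3.5
    {"enzA": 1.0, "enzB": 1.5},   # distance 0.0 + 0.5 = 0.5
]
best = closest_allocation(reference, candidates)
```

Because the Manhattan distance sums absolute differences, it penalizes many small reallocations and one large reallocation of the same total size equally, unlike the Euclidean distance.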
Affiliation(s)
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
- Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany

42
Bruggeman FJ, Teusink B, Steuer R. Trade-offs between the instantaneous growth rate and long-term fitness: Consequences for microbial physiology and predictive computational models. Bioessays 2023; 45:e2300015. [PMID: 37559168 DOI: 10.1002/bies.202300015] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/20/2023] [Revised: 07/14/2023] [Accepted: 07/19/2023] [Indexed: 08/11/2023]
Abstract
Microbial systems biology has made enormous advances in relating microbial physiology to the underlying biochemistry and molecular biology. By meticulously studying model microorganisms, in particular Escherichia coli and Saccharomyces cerevisiae, increasingly comprehensive computational models predict metabolic fluxes, protein expression, and growth. The modeling rationale is that cells are constrained by a limited pool of resources that they allocate optimally to maximize fitness. As a consequence, the expression of particular proteins is at the expense of others, causing trade-offs between cellular objectives such as instantaneous growth, stress tolerance, and capacity to adapt to new environments. While current computational models are remarkably predictive for E. coli and S. cerevisiae when grown in laboratory environments, this may not hold for other growth conditions and other microorganisms. In this contribution, we therefore discuss the relationship between the instantaneous growth rate, limited resources, and long-term fitness. We discuss uses and limitations of current computational models, in particular for rapidly changing and adverse environments, and propose to classify microbial growth strategies based on Grime's CSR framework.
Affiliation(s)
- Frank J Bruggeman
- Systems Biology Lab/AIMMS, VU University, Amsterdam, The Netherlands
- Bas Teusink
- Systems Biology Lab/AIMMS, VU University, Amsterdam, The Netherlands
- Ralf Steuer
- Institute for Theoretical Biology (ITB), Institute for Biology, Humboldt-University of Berlin, Berlin, Germany

43
Fan X, Cao L, Yan X. Sensitivity analysis and adaptive mutation strategy differential evolution algorithm for optimizing enzymes' turnover numbers in metabolic models. Biotechnol Bioeng 2023. [PMID: 37448239 DOI: 10.1002/bit.28493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 04/04/2023] [Accepted: 06/29/2023] [Indexed: 07/15/2023]
Abstract
Genome-scale metabolic network models (GSMMs) based on enzyme constraints greatly improve on general metabolic models. The turnover number ($k_{\mathrm{cat}}$) of enzymes is used as a parameter to limit the reaction when extending a GSMM. Therefore, the turnover number plays a crucial role in the prediction accuracy of cell metabolism. In this work, we proposed an enzyme-constrained GSMM parameter optimization method. First, sensitivity analysis of the parameters was carried out to select the parameters with the greatest influence on predicting the specific growth rate. Then, a differential evolution (DE) algorithm with an adaptive mutation strategy was adopted to optimize the parameters. This algorithm can dynamically select among five different mutation strategies. Finally, the specific growth rate prediction, flux variability, and phase plane of the optimized model were analyzed to further evaluate the model. The enzyme-constrained GSMM of Saccharomyces cerevisiae, ecYeast8.3.4, was optimized. Results of the sensitivity analysis showed that the optimization variables can be divided into three groups based on sensitivity: most sensitive (149 $k_{\mathrm{cat}}$), highly sensitive (1759 $k_{\mathrm{cat}}$), and nonsensitive (2502 $k_{\mathrm{cat}}$) groups. Six optimization strategies were developed based on the results of the sensitivity analysis. The results showed that the DE with adaptive mutation strategy can indeed improve the model by optimizing highly sensitive parameters. Retaining all parameters and optimizing the highly sensitive parameters is the recommended optimization strategy.
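The idea of differential evolution with an adaptively chosen mutation strategy can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' algorithm: the abstract does not list its five strategies or its selection rule, so the sketch assumes three common DE mutation strategies and a simple rule that picks strategies in proportion to how often they have produced an improvement. The objective (`sphere`) is a stand-in for the model-based error the paper optimizes, and all names are hypothetical.

```python
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def _others(pop, i, count, rng):
    """Pick `count` distinct population members other than member i."""
    return rng.sample([p for j, p in enumerate(pop) if j != i], count)


def rand_1(pop, best, i, F, rng):
    a, b, c = _others(pop, i, 3, rng)
    return [a[k] + F * (b[k] - c[k]) for k in range(len(a))]


def best_1(pop, best, i, F, rng):
    a, b = _others(pop, i, 2, rng)
    return [best[k] + F * (a[k] - b[k]) for k in range(len(best))]


def current_to_best_1(pop, best, i, F, rng):
    a, b = _others(pop, i, 2, rng)
    x = pop[i]
    return [x[k] + F * (best[k] - x[k]) + F * (a[k] - b[k]) for k in range(len(x))]


STRATEGIES = [rand_1, best_1, current_to_best_1]


def adaptive_de(objective, dim, bounds, pop_size=20, generations=100,
                F=0.5, CR=0.9, seed=0):
    """Minimize `objective` over a box; returns (best_vector, best_value)."""
    rng = random.Random(seed)
    low, high = bounds
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [objective(p) for p in pop]
    successes = [1] * len(STRATEGIES)  # pseudo-counts keep every strategy selectable
    for _ in range(generations):
        best = pop[min(range(pop_size), key=fitness.__getitem__)]
        for i in range(pop_size):
            # Assumed adaptive rule: sample a strategy weighted by past successes.
            s = rng.choices(range(len(STRATEGIES)), weights=successes)[0]
            mutant = STRATEGIES[s](pop, best, i, F, rng)
            # Binomial crossover; index j_rand always comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            trial = [min(max(v, low), high) for v in trial]  # clip to the box
            f_trial = objective(trial)
            if f_trial <= fitness[i]:  # greedy one-to-one replacement
                if f_trial < fitness[i]:
                    successes[s] += 1
                pop[i], fitness[i] = trial, f_trial
    b = min(range(pop_size), key=fitness.__getitem__)
    return pop[b], fitness[b]


best_vector, best_value = adaptive_de(sphere, dim=3, bounds=(-5.0, 5.0), seed=1)
```

In a setting like the one the abstract describes, the vector being optimized would hold the selected turnover numbers and the objective would score the model's growth-rate predictions; both are outside the scope of this sketch.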
Affiliation(s)
- Xingcun Fan
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Lingfeng Cao
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Xuefeng Yan
- Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China

44
Kroll A, Rousset Y, Hu XP, Liebrand NA, Lercher MJ. Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning. Nat Commun 2023; 14:4139. [PMID: 37438349 DOI: 10.1038/s41467-023-39840-4] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 12/12/2022] [Accepted: 06/27/2023] [Indexed: 07/14/2023] Open
Abstract
The turnover number kcat, a measure of enzyme efficiency, is central to understanding cellular physiology and resource allocation. As experimental kcat estimates are unavailable for the vast majority of enzymatic reactions, the development of accurate computational prediction methods is highly desirable. However, existing machine learning models are limited to a single, well-studied organism, or they provide inaccurate predictions except for enzymes that are highly similar to proteins in the training set. Here, we present TurNuP, a general and organism-independent model that successfully predicts turnover numbers for natural reactions of wild-type enzymes. We constructed model inputs by representing complete chemical reactions through differential reaction fingerprints and by representing enzymes through a modified and re-trained Transformer Network model for protein sequences. TurNuP outperforms previous models and generalizes well even to enzymes that are not similar to proteins in the training set. Parameterizing metabolic models with TurNuP-predicted kcat values leads to improved proteome allocation predictions. To provide a powerful and convenient tool for the study of molecular biochemistry and physiology, we implemented a TurNuP web server.
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
- Yvan Rousset
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
- Xiao-Pan Hu
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
- Nina A Liebrand
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
- Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany.

45
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023] Open
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
46
Pang TY, Lercher MJ. Optimal density of bacterial cells. PLoS Comput Biol 2023; 19:e1011177. [PMID: 37307285 DOI: 10.1371/journal.pcbi.1011177] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/01/2022] [Accepted: 05/11/2023] [Indexed: 06/14/2023] [Open Access]
Abstract
A substantial fraction of the bacterial cytosol is occupied by catalysts and their substrates. While a higher volume density of catalysts and substrates might boost biochemical fluxes, the resulting molecular crowding can slow down diffusion, perturb the reactions' Gibbs free energies, and reduce the catalytic efficiency of proteins. Due to these tradeoffs, dry mass density likely possesses an optimum that facilitates maximal cellular growth and that is interdependent on the cytosolic molecule size distribution. Here, we analyze the balanced growth of a model cell, accounting systematically for crowding effects on reaction kinetics. Its optimal cytosolic volume occupancy depends on the nutrient-dependent resource allocation into large ribosomal vs. small metabolic macromolecules, reflecting a tradeoff between the saturation of metabolic enzymes, favoring larger occupancies with higher encounter rates, and the inhibition of the ribosomes, favoring lower occupancies with unhindered diffusion of tRNAs. Our predictions across growth rates are quantitatively consistent with the experimentally observed reduction in volume occupancy on rich media compared to minimal media in E. coli. Strong deviations from optimal cytosolic occupancy only lead to minute reductions in growth rate, which are nevertheless evolutionarily relevant due to large bacterial population sizes. In sum, cytosolic density variation in bacterial cells appears to be consistent with an optimality principle of cellular efficiency.
Affiliation(s)
- Tin Yau Pang
  - Institute for Computer Science & Department of Biology, Heinrich Heine University, Düsseldorf, Germany
  - Division of Cardiology, Pulmonology and Vascular Medicine, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Martin J Lercher
  - Institute for Computer Science & Department of Biology, Heinrich Heine University, Düsseldorf, Germany
47
Dourado H, Liebermeister W, Ebenhöh O, Lercher MJ. Mathematical properties of optimal fluxes in cellular reaction networks at balanced growth. PLoS Comput Biol 2023; 19:e1011156. [PMID: 37279246 DOI: 10.1371/journal.pcbi.1011156] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/11/2022] [Accepted: 05/04/2023] [Indexed: 06/08/2023] [Open Access]
Abstract
The physiology of biological cells evolved under physical and chemical constraints, such as mass conservation across the network of biochemical reactions, nonlinear reaction kinetics, and limits on cell density. For unicellular organisms, the fitness that governs this evolution is mainly determined by the balanced cellular growth rate. We previously introduced growth balance analysis (GBA) as a general framework to model and analyze such nonlinear systems, revealing important analytical properties of optimal balanced growth states. It has been shown that at optimality, only a minimal subset of reactions can have nonzero flux. However, no general principles have been established to determine if a specific reaction is active at optimality. Here, we extend the GBA framework to study the optimality of each biochemical reaction, and we identify the mathematical conditions determining whether a reaction is active or not at optimal growth in a given environment. We reformulate the mathematical problem in terms of a minimal number of dimensionless variables and use the Karush-Kuhn-Tucker (KKT) conditions to identify fundamental principles of optimal resource allocation in GBA models of any size and complexity. Our approach helps to identify from first principles the economic values of biochemical reactions, expressed as marginal changes in cellular growth rate; these economic values can be related to the costs and benefits of proteome allocation into the reactions' catalysts. Our formulation also generalizes the concepts of Metabolic Control Analysis to models of growing cells. We show how the extended GBA framework unifies and extends previous approaches of cellular modeling and analysis, putting forward a program to analyze cellular growth through the stationarity conditions of a Lagrangian function. GBA thereby provides a general theoretical toolbox for the study of fundamental mathematical properties of balanced cellular growth.
Affiliation(s)
- Hugo Dourado
  - Institute for Computer Science and Department of Biology, Heinrich-Heine Universität, Düsseldorf, Germany
- Oliver Ebenhöh
  - Quantitative and Theoretical Biology, Heinrich-Heine Universität, Düsseldorf, Germany
- Martin J Lercher
  - Institute for Computer Science and Department of Biology, Heinrich-Heine Universität, Düsseldorf, Germany
48
Vasina M, Kovar D, Damborsky J, Ding Y, Yang T, deMello A, Mazurenko S, Stavrakis S, Prokop Z. In-depth analysis of biocatalysts by microfluidics: An emerging source of data for machine learning. Biotechnol Adv 2023; 66:108171. [PMID: 37150331 DOI: 10.1016/j.biotechadv.2023.108171] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/24/2023] [Revised: 05/04/2023] [Accepted: 05/04/2023] [Indexed: 05/09/2023]
Abstract
Nowadays, the vastly increasing demand for novel biotechnological products is supported by the continuous development of biocatalytic applications which provide sustainable green alternatives to chemical processes. The success of a biocatalytic application is critically dependent on how quickly we can identify and characterize enzyme variants fitting the conditions of industrial processes. While miniaturization and parallelization have dramatically increased the throughput of next-generation sequencing systems, the subsequent characterization of the obtained candidates is still a limiting process in identifying the desired biocatalysts. Only a few commercial microfluidic systems for enzyme analysis are currently available, and the transformation of numerous published prototypes into commercial platforms is still to be streamlined. This review presents the state-of-the-art, recent trends, and perspectives in applying microfluidic tools in the functional and structural analysis of biocatalysts. We discuss the advantages and disadvantages of available technologies, their reproducibility and robustness, and readiness for routine laboratory use. We also highlight the unexplored potential of microfluidics to leverage the power of machine learning for biocatalyst development.
Affiliation(s)
- Michal Vasina
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- David Kovar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Yun Ding
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Tianjin Yang
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland; Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland
- Andrew deMello
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Stavros Stavrakis
  - Institute for Chemical and Bioengineering, ETH Zürich, 8093 Zürich, Switzerland
- Zbynek Prokop
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
49
Cheng Y, Bi X, Xu Y, Liu Y, Li J, Du G, Lv X, Liu L. Machine learning for metabolic pathway optimization: A review. Comput Struct Biotechnol J 2023; 21:2381-2393. [PMID: 38213889 PMCID: PMC10781721 DOI: 10.1016/j.csbj.2023.03.045] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/15/2022] [Revised: 03/24/2023] [Accepted: 03/25/2023] [Indexed: 03/29/2023] [Open Access]
Abstract
Optimizing the metabolic pathways of microbial cell factories is essential for establishing viable biotechnological production processes. However, due to the limited understanding of the complex setup of cellular machinery, building efficient microbial cell factories remains tedious and time-consuming. Machine learning (ML), a powerful tool capable of identifying patterns within large datasets, has been used to analyze biological datasets generated using various high-throughput technologies to build data-driven models for complex bioprocesses. In addition, ML can also be integrated with Design-Build-Test-Learn to accelerate development. This review focuses on recent ML applications in genome-scale metabolic model construction, multistep pathway optimization, rate-limiting enzyme engineering, and gene regulatory element designing. In addition, we have discussed some limitations of these methods as well as potential solutions.
Affiliation(s)
- Yang Cheng
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Xinyu Bi
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yameng Xu
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yanfeng Liu
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Jianghua Li
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Guocheng Du
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Xueqin Lv
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Long Liu
  - Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China
50
Yuan L, Lu H, Li F, Nielsen J, Kerkhoven EJ. HGTphyloDetect: facilitating the identification and phylogenetic analysis of horizontal gene transfer. Brief Bioinform 2023; 24:7031155. [PMID: 36752380 PMCID: PMC10025432 DOI: 10.1093/bib/bbad035] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/28/2022] [Revised: 12/28/2022] [Accepted: 01/17/2023] [Indexed: 02/09/2023] [Open Access]
Abstract
BACKGROUND: Horizontal gene transfer (HGT) is an important driver in genome evolution, gain-of-function, and metabolic adaptation to environmental niches. Genome-wide identification of putative HGT events has become increasingly practical, given the rapid growth of genomic data. However, existing HGT analysis toolboxes are not widely used, limited by their inability to perform phylogenetic reconstruction to explore potential donors, and the detection of HGT from both evolutionarily distant and closely related species.
RESULTS: In this study, we have developed HGTphyloDetect, which is a versatile computational toolbox that combines high-throughput analysis with phylogenetic inference, to facilitate comprehensive investigation of HGT events. Two case studies with Saccharomyces cerevisiae and Candida versatilis demonstrate the ability of HGTphyloDetect to identify horizontally acquired genes with high accuracy. In addition, HGTphyloDetect enables phylogenetic analysis to illustrate a likely path of gene transmission among the evolutionarily distant or closely related species.
CONCLUSIONS: The HGTphyloDetect computational toolbox is designed for ease of use and can accurately find HGT events with a very low false discovery rate in a high-throughput manner. The HGTphyloDetect toolbox and its related user tutorial are freely available at https://github.com/SysBioChalmers/HGTphyloDetect.
Affiliation(s)
- Le Yuan
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden
- Hongzhong Lu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240 Shanghai, China
- Feiran Li
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden
  - BioInnovation Institute, Ole Måløes Vej 3 DK-2200 Copenhagen, Denmark
- Eduard J Kerkhoven
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden