1
|
Konečný L, Peterková K. Unveiling the peptidases of parasites from the office chair - The endothelin-converting enzyme case study. Advances in Parasitology 2024; 126:1-52. [PMID: 39448189 DOI: 10.1016/bs.apar.2024.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2024]
Abstract
The emergence of high-throughput methodologies such as next-generation sequencing and proteomics has necessitated significant advancements in biological databases and bioinformatic tools, thereby reshaping the landscape of research into parasitic peptidases. In this review we outline the development of these resources alongside the -omics technologies and their transformative impact on the field. Apart from an extensive summary of general and specific databases and tools, we provide a general pipeline on how to use these resources effectively to identify candidate peptidases from these large datasets and how to gain as much information about them as possible without leaving the office chair. This pipeline is then applied in an illustrative case study on the endothelin-converting enzyme 1 homologue from Schistosoma mansoni, which attempts to highlight the contemporary capabilities of bioinformatics. The case study demonstrates how such an approach can help to hypothesize enzyme functions and interactions through computational analysis alone, and emphasizes how such virtual investigations can guide and optimize subsequent wet lab experiments, thereby potentially saving precious time and resources. Finally, by showing what can be achieved without traditional wet laboratory methods, this review provides a compelling narrative on the use of bioinformatics to bridge the gap between big data and practical research applications, highlighting the key role of these technologies in furthering our understanding of parasitic diseases.
Affiliation(s)
- Lukáš Konečný
- Department of Parasitology, Faculty of Science, Charles University, Prague, Czechia; Department of Ecology, Centre of Infectious Animal Diseases, Faculty of Environmental Sciences, Czech University of Life Sciences, Prague, Czechia.
| | - Kristýna Peterková
- Department of Parasitology, Faculty of Science, Charles University, Prague, Czechia
| |
|
2
|
Kroll A, Ranjan S, Lercher MJ. A multimodal Transformer Network for protein-small molecule interactions enhances predictions of kinase inhibition and enzyme-substrate relationships. PLoS Comput Biol 2024; 20:e1012100. [PMID: 38768223 PMCID: PMC11142704 DOI: 10.1371/journal.pcbi.1012100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 05/31/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
The activities of most enzymes and drugs depend on interactions between proteins and small molecules. Accurate prediction of these interactions could greatly accelerate pharmaceutical and biotechnological research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two molecule types during the computation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform recently published state-of-the-art models for predicting protein-small molecule interactions across three diverse tasks: predicting kinase inhibitions; inferring potential substrates for enzymes; and predicting Michaelis constants KM. The Python code provided can be used to easily implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
| | - Sahasra Ranjan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
| | - Martin J. Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
| |
|
3
|
Schulz-Mirbach H, Dronsella B, He H, Erb TJ. Creating new-to-nature carbon fixation: A guide. Metab Eng 2024; 82:12-28. [PMID: 38160747 DOI: 10.1016/j.ymben.2023.12.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/23/2023] [Accepted: 12/27/2023] [Indexed: 01/03/2024]
Abstract
Synthetic biology aims at designing new biological functions from first principles. These new designs make it possible to expand the natural solution space and overcome the limitations of naturally evolved systems. One example is synthetic CO2-fixation pathways that promise to provide more efficient ways for the capture and conversion of CO2 than natural pathways, such as the Calvin-Benson-Bassham (CBB) cycle of photosynthesis. In this review, we provide a practical guideline for the design and realization of such new-to-nature CO2-fixation pathways. We introduce the concept of "synthetic CO2-fixation" and give a general overview of the enzymology and topology of synthetic pathways, before we derive general principles for their design from their eight naturally evolved analogs. We provide a comprehensive summary of synthetic carbon-assimilation pathways and derive a step-by-step, practical guide from the theoretical design to their practical implementation, before ending with an outlook on new developments in the field.
Affiliation(s)
- Helena Schulz-Mirbach
- Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str. 10, 35043, Marburg, Germany
| | - Beau Dronsella
- Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str. 10, 35043, Marburg, Germany; Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
| | - Hai He
- Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str. 10, 35043, Marburg, Germany
| | - Tobias J Erb
- Max Planck Institute for Terrestrial Microbiology, Karl-von-Frisch-Str. 10, 35043, Marburg, Germany; Center for Synthetic Microbiology (SYNMIKRO), Karl-von-Frisch-Str. 16, D-35043, Marburg, Germany.
| |
|
4
|
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Central Science 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency-or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| |
|
5
|
Boob AG, Chen J, Zhao H. Enabling pathway design by multiplex experimentation and machine learning. Metab Eng 2024; 81:70-87. [PMID: 38040110 DOI: 10.1016/j.ymben.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 11/01/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023]
Abstract
The remarkable metabolic diversity observed in nature has provided a foundation for sustainable production of a wide array of valuable molecules. However, transferring the biosynthetic pathway to the desired host often runs into inherent failures that arise from intermediate accumulation and reduced flux resulting from competing pathways within the host cell. Moreover, the conventional trial and error methods utilized in pathway optimization struggle to fully grasp the intricacies of installed pathways, leading to time-consuming and labor-intensive experiments, ultimately resulting in suboptimal yields. Considering these obstacles, there is a pressing need to explore the enzyme expression landscape and identify the optimal pathway configuration for enhanced production of molecules. This review delves into recent advancements in pathway engineering, with a focus on multiplex experimentation and machine learning techniques. These approaches play a pivotal role in overcoming the limitations of traditional methods, enabling exploration of a broader design space and increasing the likelihood of discovering optimal pathway configurations for enhanced production of molecules. We discuss several tools and strategies for pathway design, construction, and optimization for sustainable and cost-effective microbial production of molecules ranging from bulk to fine chemicals. We also highlight major successes in academia and industry through compelling case studies.
Affiliation(s)
- Aashutosh Girish Boob
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Junyu Chen
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, United States; DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States.
| |
|
6
|
Alletto P, Garcia AM, Marchesan S. Short Peptides for Hydrolase Supramolecular Mimicry and Their Potential Applications. Gels 2023; 9:678. [PMID: 37754360 PMCID: PMC10529927 DOI: 10.3390/gels9090678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 08/19/2023] [Accepted: 08/21/2023] [Indexed: 09/28/2023] Open
Abstract
Hydrolases are enzymes that have found numerous applications in various industrial sectors spanning from pharmaceuticals to foodstuff and beverages, consumer products such as detergents and personal care, textiles, and even biodiesel production and environmental bioremediation. Self-assembling and gelling short peptides have been designed for their mimicry so that their supramolecular organization leads to the creation of hydrophobic pockets for catalysis to occur. Catalytic gels of this kind can also find numerous industrial applications to address important global challenges of our time. This concise review focuses on the last 5 years of progress in this fast-paced, popular field of research with an eye towards the future.
Affiliation(s)
- Paola Alletto
- Chemical and Pharmaceutical Sciences Department, University of Trieste, 34127 Trieste, Italy
- Instituto Regional de Investigación Científica Aplicada (IRICA), Universidad de Castilla-La Mancha, 13071 Ciudad Real, Spain
- Facultad de Ciencias y Tecnologías Químicas, Universidad de Castilla-La Mancha, 13071 Ciudad Real, Spain
| | - Ana Maria Garcia
- Instituto Regional de Investigación Científica Aplicada (IRICA), Universidad de Castilla-La Mancha, 13071 Ciudad Real, Spain
- Facultad de Ciencias y Tecnologías Químicas, Universidad de Castilla-La Mancha, 13071 Ciudad Real, Spain
| | - Silvia Marchesan
- Chemical and Pharmaceutical Sciences Department, University of Trieste, 34127 Trieste, Italy
| |
|
7
|
Zhang Q, Zheng W, Song Z, Zhang Q, Yang L, Wu J, Lin J, Xu G, Yu H. Machine Learning Enables Prediction of Pyrrolysyl-tRNA Synthetase Substrate Specificity. ACS Synth Biol 2023; 12:2403-2417. [PMID: 37486975 DOI: 10.1021/acssynbio.3c00225] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 07/26/2023]
Abstract
Knowledge about the substrate scope for a given enzyme is informative for elucidating biochemical pathways and also for expanding applications of the enzyme. However, no general methods are available to accurately predict the substrate specificity of an enzyme. Pyrrolysyl-tRNA synthetase (PylRS) is a powerful tool for incorporating various noncanonical amino acids (NCAAs) into proteins, which enabled us to probe, image, rationally engineer, and evolve protein structure and function. However, the incorporation of a new NCAA typically requires the selection of large libraries of PylRS with randomized mutations at active sites, and this process requires multiple rounds of selection for each new substrate. Therefore, a single aminoacyl-tRNA synthetase with broad substrate promiscuity is ideal to facilitate widespread applications of the genetic NCAA incorporation technique. Herein, machine learning models were developed to predict the substrate specificity of PylRS to accept novel NCAAs that could be incorporated into proteins by three PylRS mutants. The models were built from a training set of 285 unique enzyme-substrate pairs of three PylRS mutants including IFRS, BtaRS, and MFRS against 95 NCAAs. The best BaggingTree (BT) model was then used for virtually screening an NCAA library containing 1474 phenylalanine, tyrosine, tryptophan, and alanine analogues, and 156 NCAAs were predicted to be accepted by at least one of the three PylRS mutants. Then, 27 NCAAs including 24 positive and 3 negative substrates were experimentally tested for their activities, and 20 of the 24 positive substrates showed weak or strong activity and were accepted by at least one PylRS mutant, among which 11 NCAAs were never reported to be incorporated into proteins before. Three negative substrates did not show any activity. Experimental results suggested that the BT model provides a three-class classification accuracy of 0.69 and a binary classification accuracy of 0.86. This study expanded the substrate scope of three PylRS variants and provided a framework for developing machine learning models to predict substrate specificity of other PylRS variants.
Affiliation(s)
- Qunfeng Zhang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Wenlong Zheng
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Zhongdi Song
- Key Laboratory of Pollution Exposure and Health Intervention of Zhejiang Province, Interdisciplinary Research Academy, Zhejiang Shuren University, Hangzhou 310015, China
| | - Qiang Zhang
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Lirong Yang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Jianping Wu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| | - Jianping Lin
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Gang Xu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Haoran Yu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou 311200, Zhejiang, China
| |
|
8
|
Kalia A, Krishnan D, Hassoun S. CSI: Contrastive data Stratification for Interaction prediction and its application to compound-protein interaction prediction. Bioinformatics 2023; 39:btad456. [PMID: 37490457 PMCID: PMC10423023 DOI: 10.1093/bioinformatics/btad456] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/15/2022] [Revised: 05/10/2023] [Accepted: 07/24/2023] [Indexed: 07/27/2023] Open
Abstract
MOTIVATION Accurately predicting the likelihood of interaction between two objects (compound-protein sequence, user-item, author-paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multi-views of the interacting objects. The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects. RESULTS We present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound-protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug-protein interaction prediction), metabolic engineering, and synthetic biology (compound-enzyme interaction prediction) applications. Comparing CSI with a baseline model that does not utilize data stratification and contrastive learning, we show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug-target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets. AVAILABILITY AND IMPLEMENTATION Code and dataset available at https://github.com/HassounLab/CSI.
Affiliation(s)
- Apurva Kalia
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | | | - Soha Hassoun
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, United States
| |
|
9
|
Brooks SM, Marsan C, Reed KB, Yuan SF, Nguyen DD, Trivedi A, Altin-Yavuzarslan G, Ballinger N, Nelson A, Alper HS. A tripartite microbial co-culture system for de novo biosynthesis of diverse plant phenylpropanoids. Nat Commun 2023; 14:4448. [PMID: 37488111 PMCID: PMC10366228 DOI: 10.1038/s41467-023-40242-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/16/2023] [Accepted: 07/19/2023] [Indexed: 07/26/2023] Open
Abstract
Plant-derived phenylpropanoids, in particular phenylpropenes, have diverse industrial applications ranging from flavors and fragrances to polymers and pharmaceuticals. Heterologous biosynthesis of these products has the potential to address the low, seasonally dependent yields that hinder widespread manufacturing. However, previous efforts have been hindered by the inherent pathway promiscuity and the microbial toxicity of key pathway intermediates. In this study, we establish the propensity of a tripartite microbial co-culture to overcome these limitations and demonstrate, to our knowledge, the first reported de novo phenylpropene production from simple sugar starting materials. After initially designing the system to accumulate eugenol, the platform modularity and downstream enzyme promiscuity were leveraged to quickly create avenues for hydroxychavicol and chavicol production. The consortium was found to be compatible with Engineered Living Material production platforms that allow for reusable, cold-chain-independent distributed manufacturing. This work lays the foundation for further deployment of modular microbial approaches to produce plant secondary metabolites.
Affiliation(s)
- Sierra M Brooks
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA
| | - Celeste Marsan
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA
| | - Kevin B Reed
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA
| | - Shuo-Fu Yuan
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
| | - Dustin-Dat Nguyen
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
| | - Adit Trivedi
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA
| | - Gokce Altin-Yavuzarslan
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, 98195, USA
| | - Nathan Ballinger
- Department of Chemistry, University of Washington, Box 351700, Seattle, WA, USA
| | - Alshakim Nelson
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, 98195, USA
- Department of Chemistry, University of Washington, Box 351700, Seattle, WA, USA
| | - Hal S Alper
- McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX, USA.
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA.
| |
|
10
|
Ebbels TMD, van der Hooft JJJ, Chatelaine H, Broeckling C, Zamboni N, Hassoun S, Mathé EA. Recent advances in mass spectrometry-based computational metabolomics. Curr Opin Chem Biol 2023; 74:102288. [PMID: 36966702 PMCID: PMC11075003 DOI: 10.1016/j.cbpa.2023.102288] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 10/24/2022] [Revised: 02/16/2023] [Accepted: 02/21/2023] [Indexed: 04/03/2023]
Abstract
The computational metabolomics field brings together computer scientists, bioinformaticians, chemists, clinicians, and biologists to maximize the impact of metabolomics across a wide array of scientific and medical disciplines. The field continues to expand as modern instrumentation produces datasets with increasing complexity, resolution, and sensitivity. These datasets must be processed, annotated, modeled, and interpreted to enable biological insight. Techniques for visualization, integration (within or between omics), and interpretation of metabolomics data have evolved along with innovation in the databases and knowledge resources required to aid understanding. In this review, we highlight recent advances in the field and reflect on opportunities and innovations in response to the most pressing challenges. This review was compiled from discussions from the 2022 Dagstuhl seminar entitled "Computational Metabolomics: From Spectra to Knowledge".
Affiliation(s)
- Timothy M D Ebbels
- Section of Bioinformatics, Department of Metabolism, Digestion & Reproduction, Imperial College London, Burlington Danes Building, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK.
| | - Justin J J van der Hooft
- Bioinformatics Group, Wageningen University & Research, Wageningen 6708 PB, the Netherlands; Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
| | - Haley Chatelaine
- Informatics Core, Division of Preclinical Innovation, National Center for Advancing Translational Sciences, Rockville, MD, USA
| | - Corey Broeckling
- Bioanalysis and Omics Center, Analytical Resources Core, Colorado State University, Fort Collins, CO, USA
| | - Nicola Zamboni
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Soha Hassoun
- Department of Computer Science, Tufts University, Medford, MA, USA; Department of Chemical and Biological Engineering, Tufts University, Medford, MA, USA
| | - Ewy A Mathé
- Informatics Core, Division of Preclinical Innovation, National Center for Advancing Translational Sciences, Rockville, MD, USA.
| |
|
11
|
Kroll A, Ranjan S, Engqvist MKM, Lercher MJ. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat Commun 2023; 14:2787. [PMID: 37188731 DOI: 10.1038/s41467-023-38347-2] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 06/24/2022] [Accepted: 04/21/2023] [Indexed: 05/17/2023] Open
Abstract
For most proteins annotated as enzymes, it is unknown which primary and/or secondary reactions they catalyze. Experimental characterizations of potential substrates are time-consuming and costly. Machine learning predictions could provide an efficient alternative, but are hampered by a lack of information regarding enzyme non-substrates, as available training data comprises mainly positive examples. Here, we present ESP, a general machine-learning model for the prediction of enzyme-substrate pairs with an accuracy of over 91% on independent and diverse test data. ESP can be applied successfully across widely different enzymes and a broad range of metabolites included in the training data, outperforming models designed for individual, well-studied enzyme families. ESP represents enzymes through a modified transformer model, and is trained on data augmented with randomly sampled small molecules assigned as non-substrates. By facilitating easy in silico testing of potential substrates, the ESP web server may support both basic and applied science.
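The data-augmentation idea mentioned in this abstract (adding randomly sampled small molecules as presumed non-substrates) might be sketched as follows. This is only an illustration under stated assumptions: the function name `augment_with_negatives`, the uniform sampling, and the ratio of three negatives per positive are hypothetical choices and are not taken from the ESP paper or its code.

```python
import random


def augment_with_negatives(positive_pairs, molecule_pool, n_neg_per_pos=3, seed=0):
    """Return (enzyme, molecule, label) triples: 1 = known substrate, 0 = sampled non-substrate.

    Hypothetical sketch: negatives are drawn uniformly at random from
    molecule_pool, skipping any molecule already paired with that enzyme.
    """
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    taken = set(positive_pairs)        # pairs that must not be reused as negatives
    molecules = sorted(set(molecule_pool))
    labelled = [(enzyme, mol, 1) for enzyme, mol in positive_pairs]

    for enzyme, _ in positive_pairs:
        candidates = [m for m in molecules if (enzyme, m) not in taken]
        k = min(n_neg_per_pos, len(candidates))
        for mol in rng.sample(candidates, k):
            labelled.append((enzyme, mol, 0))
            taken.add((enzyme, mol))   # avoid emitting the same negative twice
    return labelled


if __name__ == "__main__":
    positives = [("E1", "A"), ("E2", "B")]
    pool = ["A", "B", "C", "D"]
    for row in augment_with_negatives(positives, pool, n_neg_per_pos=2):
        print(row)
```

Sampled negatives are assumptions, not measurements: a randomly drawn molecule may in fact be an unreported substrate, so labels produced this way are noisy by construction.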
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany
| | - Sahasra Ranjan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
| | - Martin K M Engqvist
- Department of Biology and Bioengineering, Chalmers University of Technology, SE-412 96, Gothenburg, Sweden
- EnginZyme AB, Tomtebodevägen 6, 17165, Stockholm, Sweden
| | - Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, D-40225, Düsseldorf, Germany.
| |
|
12
|
Rappoport D, Jinich A. Enzyme Substrate Prediction from Three-Dimensional Feature Representations Using Space-Filling Curves. J Chem Inf Model 2023; 63:1637-1648. [PMID: 36802628 DOI: 10.1021/acs.jcim.3c00005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Compact and interpretable structural feature representations are required for accurately predicting properties and function of proteins. In this work, we construct and evaluate three-dimensional feature representations of protein structures based on space-filling curves (SFCs). We focus on the problem of enzyme substrate prediction, using two ubiquitous enzyme families as case studies: the short-chain dehydrogenase/reductases (SDRs) and the S-adenosylmethionine-dependent methyltransferases (SAM-MTases). Space-filling curves such as the Hilbert curve and the Morton curve generate a reversible mapping from discretized three-dimensional to one-dimensional representations and thus help to encode three-dimensional molecular structures in a system-independent way and with only a few adjustable parameters. Using three-dimensional structures of SDRs and SAM-MTases generated using AlphaFold2, we assess the performance of the SFC-based feature representations in predictions on a new benchmark database of enzyme classification tasks including their cofactor and substrate selectivity. Gradient-boosted tree classifiers yield binary prediction accuracy of 0.77-0.91 and area under curve (AUC) characteristics of 0.83-0.92 for the classification tasks. We investigate the effects of amino acid encoding, spatial orientation, and (the few) parameters of SFC-based encodings on the accuracy of the predictions. Our results suggest that geometry-based approaches such as SFCs are promising for generating protein structural representations and are complementary to the existing protein feature representations such as evolutionary scale modeling (ESM) sequence embeddings.
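The reversible 3D-to-1D mapping that this abstract attributes to space-filling curves can be illustrated with the Morton (Z-order) curve, which interleaves the bits of the three grid coordinates. The sketch below is a generic textbook construction, not the authors' implementation; the function names, the 10-bit grid resolution, and the `order_along_curve` helper are illustrative assumptions.

```python
def _spread_bits(value: int, bits: int) -> int:
    """Place bit i of value at position 3*i, leaving two zero bits in between."""
    out = 0
    for i in range(bits):
        out |= ((value >> i) & 1) << (3 * i)
    return out


def _compact_bits(code: int, bits: int) -> int:
    """Inverse of _spread_bits: collect every third bit back into one integer."""
    out = 0
    for i in range(bits):
        out |= ((code >> (3 * i)) & 1) << i
    return out


def morton_encode(x: int, y: int, z: int, bits: int = 10) -> int:
    """Map integer grid coordinates (x, y, z) to a single Morton index."""
    for c in (x, y, z):
        if not 0 <= c < (1 << bits):
            raise ValueError(f"coordinate {c} does not fit in {bits} bits")
    return (_spread_bits(x, bits)
            | (_spread_bits(y, bits) << 1)
            | (_spread_bits(z, bits) << 2))


def morton_decode(code: int, bits: int = 10) -> tuple:
    """Recover (x, y, z) from a Morton index, showing the mapping is reversible."""
    return (_compact_bits(code, bits),
            _compact_bits(code >> 1, bits),
            _compact_bits(code >> 2, bits))


def order_along_curve(voxels: dict, bits: int = 10) -> list:
    """Flatten a {(x, y, z): feature} voxel map into a 1D list in curve order."""
    return [voxels[p] for p in sorted(voxels, key=lambda p: morton_encode(*p, bits=bits))]


if __name__ == "__main__":
    idx = morton_encode(5, 9, 3)
    print(idx, morton_decode(idx))
```

A Hilbert curve would serve the same purpose with better locality preservation, at the cost of a more involved encoding.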
Affiliation(s)
- Dmitrij Rappoport: Department of Chemistry, University of California, Irvine, 1102 Natural Sciences 2, Irvine, California 92697, United States
- Adrian Jinich: Weill Cornell Medicine, 1300 York Avenue, Box 65, New York, New York 10065, United States
|
13
|
Yu T, Boob AG, Volk MJ, Liu X, Cui H, Zhao H. Machine learning-enabled retrobiosynthesis of molecules. Nat Catal 2023. [DOI: 10.1038/s41929-022-00909-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
|
14
|
Walther D. Specifics of Metabolite-Protein Interactions and Their Computational Analysis and Prediction. Methods Mol Biol 2023; 2554:179-197. [PMID: 36178627 DOI: 10.1007/978-1-0716-2624-5_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
Computational approaches to the characterization and prediction of compound-protein interactions have a long research history and are well established, driven primarily by the needs of drug development. While, in principle, many of the computational methods developed in the context of drug development can also be applied directly to the investigation of metabolite-protein interactions, the interactions of metabolites with proteins (enzymes) are characterized by a number of particularities that result from their natural evolutionary origin and their biological and biochemical roles, as well as from a different problem setting when investigating them. In this review, these special aspects are highlighted, and recent research on them and the computational approaches developed to address them are presented, along with available resources. They concern, among others, binding promiscuity, allostery, the role of posttranslational modifications, molecular steering and crowding effects, and metabolic conversion rate predictions. Recent breakthroughs in the field of protein structure prediction and newly developed machine learning techniques are discussed as a tremendous opportunity for developing a more detailed molecular understanding of metabolism.
Affiliation(s)
- Dirk Walther: Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
|
15
|
Tian Y, Zhang D, Cai P, Lin H, Ying H, Hu QN, Wu A. Elimination of Fusarium mycotoxin deoxynivalenol (DON) via microbial and enzymatic strategies: Current status and future perspectives. Trends Food Sci Technol 2022. [DOI: 10.1016/j.tifs.2022.04.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
|
16
|
Li X, Liu LP, Hassoun S. Boost-RS: boosted embeddings for recommender systems and its application to enzyme-substrate interaction prediction. Bioinformatics 2022; 38:2832-2838. [PMID: 35561204 PMCID: PMC9113267 DOI: 10.1093/bioinformatics/btac201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 02/06/2022] [Accepted: 04/07/2022] [Indexed: 11/17/2022] [Open Access]
Abstract
MOTIVATION: Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates continues to be largely unexplored and underdocumented. Providing computational tools for the exploration of the enzyme-substrate interaction space can expedite experimentation and benefit applications such as constructing synthesis pathways for novel biomolecules, identifying products of metabolism on ingested compounds, and elucidating xenobiotic metabolism. Recommender systems (RS), which are currently unexplored for the enzyme-substrate interaction prediction problem, can be utilized to provide enzyme recommendations for substrates, and vice versa. The performance of Collaborative-Filtering (CF) RSs, however, hinges on the quality of embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g. hierarchical, pairwise or groupings), remains a challenge.
RESULTS: We propose an innovative general RS framework, termed Boost-RS, that enhances RS performance by 'boosting' embedding vectors through auxiliary data. Specifically, Boost-RS is trained and dynamically tuned on multiple relevant auxiliary learning tasks. Boost-RS utilizes contrastive learning tasks to exploit relational data. To show the efficacy of Boost-RS for the enzyme-substrate interaction prediction problem, we apply the Boost-RS framework to several baseline CF models. We show that each of our auxiliary tasks boosts learning of the embedding vectors, and that contrastive learning using Boost-RS outperforms attribute concatenation and multi-label learning. We also show that Boost-RS outperforms similarity-based models. Ablation studies and visualization of learned representations highlight the importance of using contrastive learning on some of the auxiliary data in boosting the embedding vectors.
AVAILABILITY AND IMPLEMENTATION: A Python implementation for Boost-RS is provided at https://github.com/HassounLab/Boost-RS. The enzyme-substrate interaction data is available from the KEGG database (https://www.genome.jp/kegg/).
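The baseline idea the abstract builds on, collaborative filtering with one embedding vector per enzyme and per substrate, can be sketched as a plain logistic matrix factorization. This is not Boost-RS: it has no auxiliary tasks and no contrastive learning, and the toy interaction data, function names, and hyperparameters below are all assumptions made for illustration.

```python
import math
import random


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def init_embeddings(n_rows: int, dim: int, rng: random.Random):
    """Small random embedding vectors, one row per enzyme or substrate."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]


def log_loss(pairs, enzymes, substrates) -> float:
    """Mean binary cross-entropy over (enzyme, substrate, label) triples."""
    total = 0.0
    for i, j, y in pairs:
        p = sigmoid(dot(enzymes[i], substrates[j]))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pairs)


def train(pairs, enzymes, substrates, lr: float = 0.1, epochs: int = 200) -> None:
    """In-place stochastic gradient descent on the embedding vectors."""
    for _ in range(epochs):
        for i, j, y in pairs:
            err = sigmoid(dot(enzymes[i], substrates[j])) - y
            for k in range(len(enzymes[i])):
                e_k, s_k = enzymes[i][k], substrates[j][k]
                enzymes[i][k] -= lr * err * s_k
                substrates[j][k] -= lr * err * e_k


# Hypothetical toy data: enzyme i acts only on substrate i.
rng = random.Random(0)
pairs = [(i, j, 1 if i == j else 0) for i in range(3) for j in range(3)]
enzymes = init_embeddings(3, 4, rng)
substrates = init_embeddings(3, 4, rng)

loss_before = log_loss(pairs, enzymes, substrates)
train(pairs, enzymes, substrates)
loss_after = log_loss(pairs, enzymes, substrates)
```

After training, `sigmoid(dot(enzymes[i], substrates[j]))` serves as a recommendation score for the pair. According to the abstract, Boost-RS improves on baselines of this kind by also training the embeddings on auxiliary and contrastive tasks.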
Affiliation(s)
- Xinmeng Li: Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Li-Ping Liu: to whom correspondence should be addressed
|
17
|
Ma EJ, Siirola E, Moore C, Kummer A, Stoeckli M, Faller M, Bouquet C, Eggimann F, Ligibel M, Huynh D, Cutler G, Siegrist L, Lewis RA, Acker AC, Freund E, Koch E, Vogel M, Schlingensiepen H, Oakeley EJ, Snajdrova R. Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity. ACS Catal 2021. [DOI: 10.1021/acscatal.1c02786] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022]
Affiliation(s)
- Eric J. Ma: NIBR Informatics, Novartis Institutes for BioMedical Research (NIBR), 181 Massachusetts Ave, Cambridge, Massachusetts 02139, United States
- Elina Siirola: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Charles Moore: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Arkadij Kummer: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Markus Stoeckli: Analytical Sciences and Imaging, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Michael Faller: Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Caroline Bouquet: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Fabian Eggimann: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Mathieu Ligibel: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Dan Huynh: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Geoffrey Cutler: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Luca Siegrist: NIBR Biologics Center, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Richard A. Lewis: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Anne-Christine Acker: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Ernst Freund: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Elke Koch: Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Markus Vogel: NIBR Biologics Center, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Holger Schlingensiepen: Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Edward J. Oakeley: Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Radka Snajdrova: Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
|