1. Yu Y, Rué Casamajo A, Finnigan W, Schnepel C, Barker R, Morrill C, Heath RS, De Maria L, Turner NJ, Scrutton NS. Structure-Based Design of Small Imine Reductase Panels for Target Substrates. ACS Catal 2023; 13:12310-12321. [PMID: 37736118 PMCID: PMC10510103 DOI: 10.1021/acscatal.3c02278] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/19/2023] [Revised: 08/20/2023] [Indexed: 09/23/2023]
Abstract
Biocatalysis is important in the discovery, development, and manufacture of pharmaceuticals. However, the identification of enzymes for target transformations of interest requires major screening efforts. Here, we report a structure-based computational workflow to prioritize protein sequences by a score based on predicted activities on substrates, thereby reducing a resource-intensive laboratory-based biocatalyst screening. We selected imine reductases (IREDs) as a class of biocatalysts to illustrate the application of the computational workflow termed IREDFisher. Validation by using published data showed that IREDFisher can retrieve the best enzymes and increase the hit rate by identifying the top 20 ranked sequences. The power of IREDFisher is confirmed by computationally screening 1400 sequences for chosen reductive amination reactions with different levels of complexity. Highly active IREDs were identified by only testing 20 samples in vitro. Our speed test shows that it only takes 90 min to rank 85 sequences from user input and 30 min for the established IREDFisher database containing 591 IRED sequences. IREDFisher is available as a user-friendly web interface (https://enzymeevolver.com/IREDFisher). IREDFisher enables the rapid discovery of IREDs for applications in synthesis and directed evolution studies, with minimal time and resource expenditure. Future use of the workflow with other enzyme families could be implemented following the modification of the workflow scoring function.
Affiliation(s)
- Yuqi Yu
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
  - Augmented Biologics Discovery & Design, Department of Biologics Engineering, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB21 6GH, U.K.
- Arnau Rué Casamajo
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- William Finnigan
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Christian Schnepel
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Rhys Barker
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Charlotte Morrill
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Rachel S. Heath
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Leonardo De Maria
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (RI), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 43150, Sweden
- Nicholas J. Turner
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Nigel S. Scrutton
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.

2. Rappoport D, Jinich A. Enzyme Substrate Prediction from Three-Dimensional Feature Representations Using Space-Filling Curves. J Chem Inf Model 2023; 63:1637-1648. [PMID: 36802628 DOI: 10.1021/acs.jcim.3c00005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Compact and interpretable structural feature representations are required for accurately predicting properties and function of proteins. In this work, we construct and evaluate three-dimensional feature representations of protein structures based on space-filling curves (SFCs). We focus on the problem of enzyme substrate prediction, using two ubiquitous enzyme families as case studies: the short-chain dehydrogenase/reductases (SDRs) and the S-adenosylmethionine-dependent methyltransferases (SAM-MTases). Space-filling curves such as the Hilbert curve and the Morton curve generate a reversible mapping from discretized three-dimensional to one-dimensional representations and thus help to encode three-dimensional molecular structures in a system-independent way and with only a few adjustable parameters. Using three-dimensional structures of SDRs and SAM-MTases generated using AlphaFold2, we assess the performance of the SFC-based feature representations in predictions on a new benchmark database of enzyme classification tasks including their cofactor and substrate selectivity. Gradient-boosted tree classifiers yield binary prediction accuracy of 0.77-0.91 and area under curve (AUC) characteristics of 0.83-0.92 for the classification tasks. We investigate the effects of amino acid encoding, spatial orientation, and (the few) parameters of SFC-based encodings on the accuracy of the predictions. Our results suggest that geometry-based approaches such as SFCs are promising for generating protein structural representations and are complementary to the existing protein feature representations such as evolutionary scale modeling (ESM) sequence embeddings.
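The reversible three-dimensional to one-dimensional mapping mentioned in the abstract can be illustrated with a Morton (Z-order) curve. The following is a minimal sketch, not the authors' implementation: the function names `morton_encode` and `morton_decode` and the `bits` parameter are hypothetical, and the grid coordinates are assumed to be non-negative integers obtained by discretizing the structure beforehand.

```python
def morton_encode(x, y, z, bits=10):
    """Interleave the bits of three non-negative grid indices into one integer.

    Bit i of x goes to position 3*i, bit i of y to 3*i + 1, bit i of z to 3*i + 2.
    `bits` (an assumed parameter) is the number of bits kept per coordinate.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode(code, bits=10):
    """Invert morton_encode: recover (x, y, z) from the interleaved integer."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


# Hypothetical voxel indices, ordered along the curve to give a 1D sequence.
voxels = [(1, 1, 1), (0, 0, 1), (2, 0, 0), (1, 0, 0)]
ordered = sorted(voxels, key=lambda v: morton_encode(*v))
```

Sorting voxels by their Morton code places cells that are close in three dimensions near each other in the one-dimensional sequence in many cases, although the Morton curve preserves locality less well than the Hilbert curve.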
Affiliation(s)
- Dmitrij Rappoport
  - Department of Chemistry, University of California, Irvine, 1102 Natural Sciences 2, Irvine, California 92697, United States
- Adrian Jinich
  - Weill Cornell Medicine, 1300 York Avenue, Box 65, New York, New York 10065, United States

3. Ferdous S, Shihab IF, Reuel NF. Effects of Sequence Features on Machine-Learned Enzyme Classification Fidelity. Biochem Eng J 2022; 187:108612. [PMID: 37215687 PMCID: PMC10194028 DOI: 10.1016/j.bej.2022.108612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Assigning enzyme commission (EC) numbers using sequence information alone has been the subject of recent classification algorithms where statistics, homology and machine-learning based methods are used. This work benchmarks performance of a few of these algorithms as a function of sequence features such as chain length and amino acid composition (AAC). This enables determination of optimal classification windows for de novo sequence generation and enzyme design. In this work we developed a parallelization workflow which efficiently processes >500,000 annotated sequences through each candidate algorithm and a visualization workflow to observe the performance of the classifier over changing enzyme length, main EC class and AAC. We applied these workflows to the entire SwissProt database to date (n = 565245) using two, locally installable classifiers, ECpred and DeepEC, and collecting results from two other webserver-based tools, Deepre and BENZ-ws. It is observed that all the classifiers exhibit peak performance in the range of 300 to 500 amino acids in length. In terms of main EC class, classifiers were most accurate at predicting translocases (EC-6) and were least accurate in determining hydrolases (EC-3) and oxidoreductases (EC-1). We also identified AAC ranges that are most common in the annotated enzymes and found that all classifiers work best in this common range. Among the four classifiers, ECpred showed the best consistency in changing feature space. These workflows can be used to benchmark new algorithms as they are developed and find optimum design spaces for the generation of new, synthetic enzymes.
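The two sequence features named in the abstract, chain length and amino acid composition (AAC), can be computed directly from a sequence string. The sketch below is illustrative only and is not taken from the paper's workflow: the function name `sequence_features` is hypothetical, and AAC is assumed to mean the fraction of each of the 20 standard amino acids in the chain.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(seq):
    """Return (chain length, AAC dict) for a protein sequence.

    AAC is computed as count / length for each standard residue;
    non-standard letters count toward the length but get no entry.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    length = len(seq)
    aac = {aa: counts.get(aa, 0) / length for aa in AMINO_ACIDS}
    return length, aac


# Toy sequence, for illustration only.
length, aac = sequence_features("AACD")
```

Features of this kind could then be binned to examine how a classifier's accuracy varies with length or composition, as the abstract describes.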
Affiliation(s)
- Sakib Ferdous
  - Department of Chemical and Biological Engineering, Iowa State University
- Nigel F. Reuel
  - Department of Chemical and Biological Engineering, Iowa State University

4. Vasina M, Velecký J, Planas-Iglesias J, Marques SM, Skarupova J, Damborsky J, Bednar D, Mazurenko S, Prokop Z. Tools for computational design and high-throughput screening of therapeutic enzymes. Adv Drug Deliv Rev 2022; 183:114143. [PMID: 35167900 DOI: 10.1016/j.addr.2022.114143] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/26/2021] [Revised: 02/04/2022] [Accepted: 02/09/2022] [Indexed: 12/16/2022]
Abstract
Therapeutic enzymes are valuable biopharmaceuticals in various biomedical applications. They have been successfully applied for fibrinolysis, cancer treatment, enzyme replacement therapies, and the treatment of rare diseases. Still, there is a permanent demand to find new or better therapeutic enzymes, which would be sufficiently soluble, stable, and active to meet specific medical needs. Here, we highlight the benefits of coupling computational approaches with high-throughput experimental technologies, which significantly accelerate the identification and engineering of catalytic therapeutic agents. New enzymes can be identified in genomic and metagenomic databases, which grow thanks to next-generation sequencing technologies exponentially. Computational design and machine learning methods are being developed to improve catalytically potent enzymes and predict their properties to guide the selection of target enzymes. High-throughput experimental pipelines, increasingly relying on microfluidics, ensure functional screening and biochemical characterization of target enzymes to reach efficient therapeutic enzymes.
Affiliation(s)
- Michal Vasina
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Jan Velecký
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
- Joan Planas-Iglesias
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Sergio M Marques
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Jana Skarupova
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic; Enantis, INBIT, Kamenice 34, Brno, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Zbynek Prokop
  - Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic

5. Liu D, Han M, Tian Y, Gong L, Jia C, Cai P, Tu W, Chen J, Hu QN. Cell2Chem: mining explored and unexplored biosynthetic chemical spaces. Bioinformatics 2021; 36:5269-5270. [PMID: 32697815 DOI: 10.1093/bioinformatics/btaa660] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/20/2020] [Revised: 06/14/2020] [Accepted: 07/16/2020] [Indexed: 11/12/2022]
Abstract
Summary: Living cell strains have important applications in synthesizing their native compounds and potential for use in studies exploring the universal chemical space. Here, we present a web server named Cell2Chem, which accelerates the search for explored compounds in organisms, facilitating investigations of biosynthesis in unexplored chemical spaces. Cell2Chem uses co-occurrence networks and natural language processing to provide a systematic method for linking living organisms to biosynthesized compounds and the processes that produce these compounds. The Cell2Chem platform comprises 40 370 species and 125 212 compounds. Using reaction pathway and enzyme function in silico prediction methods, Cell2Chem reveals possible biosynthetic pathways of compounds and catalytic functions of proteins to expand unexplored biosynthetic chemical spaces. Cell2Chem can help improve biosynthesis research and enhance the efficiency of synthetic biology. Availability and implementation: Cell2Chem is available at: http://www.rxnfinder.org/cell2chem/.
Affiliation(s)
- Dongliang Liu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China
- Mengying Han
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China
- Yu Tian
  - Tianjin Institute of Industrial Biotechnology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Tianjin 300308, P. R. China
- Linlin Gong
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China
- Cancan Jia
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China
- Pengli Cai
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China
  - Tianjin Institute of Industrial Biotechnology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Tianjin 300308, P. R. China
- Weizhong Tu
  - Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, P. R. China
- Junni Chen
  - Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, P. R. China
- Qian-Nan Hu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China

6. Zhang D, Zhang T, Liu S, Sun D, Ding S, Cheng X, Cai P, Ren A, Han M, Liu D, Jia C, Gong L, Zhang R, Xing H, Tu W, Chen J, Hu QN. SARS2020: an integrated platform for identification of novel coronavirus by a consensus sequence-function model. Bioinformatics 2021; 37:1182-1183. [PMID: 32871007 PMCID: PMC7558763 DOI: 10.1093/bioinformatics/btaa767] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2020] [Revised: 08/21/2020] [Accepted: 08/25/2020] [Indexed: 12/02/2022]
Abstract
Motivation: The 2019 novel coronavirus outbreak has significantly affected global health and society. Thus, predicting biological function from pathogen sequence is crucial and urgently needed. However, little work has been performed to identify viruses by the enzymes that they encode, and which are key to pathogen propagation. Results: We built a comprehensive scientific resource, SARS2020, that integrates coronavirus-related research, genomic sequences, and results of anti-viral drug trials. In addition, we built a consensus sequence-catalytic function model from which we identified the novel coronavirus as encoding the same proteinase as the Severe Acute Respiratory Syndrome virus. This data-driven sequence-based strategy will enable rapid identification of agents responsible for future epidemics. Availability: SARS2020 is available at http://design.rxnfinder.org/sars2020/.
Affiliation(s)
- Dachuan Zhang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Tong Zhang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Sheng Liu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Dandan Sun
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Shaozhen Ding
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Xingxiang Cheng
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Pengli Cai
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
  - Tianjin Institute of Industrial Biotechnology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Tianjin 300308, P. R. China
- Ailin Ren
  - Tianjin Institute of Industrial Biotechnology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Tianjin 300308, P. R. China
- Mengying Han
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Dongliang Liu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Cancan Jia
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Linlin Gong
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Rui Zhang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Huadong Xing
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Weizhong Tu
  - Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, P. R. China
- Junni Chen
  - Wuhan LifeSynther Science and Technology Co. Limited, Wuhan 430070, P. R. China
- Qian-Nan Hu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China

7. Marques SM, Planas-Iglesias J, Damborsky J. Web-based tools for computational enzyme design. Curr Opin Struct Biol 2021; 69:19-34. [PMID: 33667757 DOI: 10.1016/j.sbi.2021.01.010] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 12/01/2020] [Revised: 01/14/2021] [Accepted: 01/27/2021] [Indexed: 12/30/2022]
Abstract
Enzymes are in high demand for very diverse biotechnological applications. However, natural biocatalysts often need to be engineered for fine-tuning their properties towards the end applications, such as the activity, selectivity, stability to temperature or co-solvents, and solubility. Computational methods are increasingly used in this task, providing predictions that narrow down the space of possible mutations significantly and can enormously reduce the experimental burden. Many computational tools are available as web-based platforms, making them accessible to non-expert users. These platforms are typically user-friendly, contain walk-throughs, and do not require deep expertise and installations. Here we describe some of the most recent outstanding web-tools for enzyme engineering and formulate future perspectives in this field.
Affiliation(s)
- Sérgio M Marques
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Joan Planas-Iglesias
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic

8. Foroozandeh Shahraki M, Ariaeenejad S, Fallah Atanaki F, Zolfaghari B, Koshiba T, Kavousi K, Salekdeh GH. MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence. Front Microbiol 2020; 11:567863. [PMID: 33193158 PMCID: PMC7645119 DOI: 10.3389/fmicb.2020.567863] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/30/2020] [Accepted: 09/30/2020] [Indexed: 01/03/2023]
Abstract
As the availability of high-throughput metagenomic data is increasing, agile and accurate tools are required to analyze and exploit this valuable and plentiful resource. Cellulose-degrading enzymes have various applications, and finding appropriate cellulases for different purposes is becoming increasingly challenging. An in silico screening method for high-throughput data can be of great assistance when combined with the characterization of thermal and pH dependence. By this means, various metagenomic sources with high cellulolytic potentials can be explored. Using a sequence similarity-based annotation and an ensemble of supervised learning algorithms, this study aims to identify and characterize cellulolytic enzymes from a given high-throughput metagenomic data based on optimum temperature and pH. The prediction performance of MCIC (metagenome cellulase identification and characterization) was evaluated through multiple iterations of sixfold cross-validation tests. This tool was also implemented for a comparative analysis of four metagenomic sources to estimate their cellulolytic profile and capabilities. For experimental validation of MCIC’s screening and prediction abilities, two identified enzymes from cattle rumen were subjected to cloning, expression, and characterization. To the best of our knowledge, this is the first time that a sequence-similarity based method is used alongside an ensemble machine learning model to identify and characterize cellulase enzymes from extensive metagenomic data. This study highlights the strength of machine learning techniques to predict enzymatic properties solely based on their sequence. MCIC is freely available as a python package and standalone toolkit for Windows and Linux-based operating systems with several functions to facilitate the screening and thermal and pH dependence prediction of cellulases.
Affiliation(s)
- Mehdi Foroozandeh Shahraki
  - Laboratory of Complex Biological Systems and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Shohreh Ariaeenejad
  - Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research Education and Extension Organization, Karaj, Iran
- Fereshteh Fallah Atanaki
  - Laboratory of Complex Biological Systems and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Behrouz Zolfaghari
  - Computer Science and Engineering Department, Indian Institute of Technology Guwahati, Guwahati, India
- Takeshi Koshiba
  - Department of Mathematics, Faculty of Education and Integrated Arts and Sciences, Waseda University, Tokyo, Japan
- Kaveh Kavousi
  - Laboratory of Complex Biological Systems and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Ghasem Hosseini Salekdeh
  - Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran, Agricultural Research Education and Extension Organization, Karaj, Iran
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia

9. Sun D, Cheng X, Tian Y, Ding S, Zhang D, Cai P, Hu QN. EnzyMine: a comprehensive database for enzyme function annotation with enzymatic reaction chemical feature. Database (Oxford) 2020; 2023:baaa065. [PMID: 33002112 PMCID: PMC10755256 DOI: 10.1093/database/baaa065] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/27/2020] [Revised: 07/19/2020] [Accepted: 07/24/2020] [Indexed: 11/14/2022]
Abstract
Addition of chemical structural information in enzymatic reactions has proven to be significant for accurate enzyme function prediction. However, such chemical data lack systematic feature mining and hardly exist in enzyme-related databases. Therefore, global mining of enzymatic reactions will offer a unique landscape for researchers to understand the basic functional mechanisms of natural bioprocesses and facilitate enzyme function annotation. Here, we established a new knowledge base called EnzyMine, through which we propose to elucidate enzymatic reaction features and then link them with sequence and structural annotations. EnzyMine represents an advanced database that extends enzyme knowledge by incorporating reaction chemical feature strategies, strengthening the connectivity between enzyme and metabolic reactions. Therefore, it has the potential to reveal many new metabolic pathways involved with given enzymes, as well as expand enzyme function annotation. Database URL: http://www.rxnfinder.org/enzymine/.
Affiliation(s)
- Dandan Sun
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Xingxiang Cheng
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Yu Tian
  - School of Biology and Pharmaceutical Engineering, Wuhan Polytechnic University, Wuhan, Hubei 430023, China
- Shaozhen Ding
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Dachuan Zhang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
- Pengli Cai
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China
  - Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China
- Qian-nan Hu
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200333, P. R. China