1
Yarıcı M, Cantürk F, Dursun S, Aydın HN, Karabekmez ME. RSEA: A Web Server for Pathway Enrichment Analysis of Metabolic Reaction Sets. Biotechnol Bioeng 2025. [PMID: 40345143 DOI: 10.1002/bit.29020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 02/10/2025] [Accepted: 04/28/2025] [Indexed: 05/11/2025]
Abstract
Changes in biological pathways provide essential clues about metabolism. Genome-scale metabolic models (GEMs) are network-based templates that computationally describe all stoichiometric associations and gene-protein-reaction (GPR) relations among an organism's metabolic genes and metabolites. Using reaction stoichiometry as input, GEMs mathematically simulate the metabolic reaction fluxes occurring in an organism and predict changes in the metabolic system under a given condition. Multiple tools and approaches in the literature use GEMs to capture fluxes that are sensitive to a given condition. However, functional enrichment analysis of the resulting reaction lists from a systems biology perspective is not straightforward. Here, we introduce RSEA (Reaction Set Enrichment Analysis), a web server tool that annotates given reaction sets with significantly related metabolic pathways. RSEA converts a reaction list derived from GEMs into proper reaction identifiers and statistically analyzes its enrichment in metabolic pathways. RSEA is designed to provide researchers with a practical and user-friendly platform to explore and interpret sets of reactions in biological pathways, and it is freely available online (https://rseatool.com/).
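The abstract does not name the statistical test RSEA applies. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test, sketched below under that assumption. The reaction universe (all reactions in the GEM), the pathway memberships, and the function names `hypergeom_sf` and `enrich` are hypothetical placeholders, not RSEA's interface.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N in universe, K in pathway, n drawn)."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrich(query, pathways, universe):
    """Over-representation test of a reaction set in each pathway.

    query: reaction IDs of interest (e.g. condition-sensitive fluxes)
    pathways: dict mapping a pathway name to its set of reaction IDs
    universe: all reaction IDs in the model, used as the background
    Returns {pathway: (overlap, p_value)}.
    """
    query = set(query) & universe
    N, n = len(universe), len(query)
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        k = len(query & members)
        results[name] = (k, hypergeom_sf(k, N, len(members), n))
    return results


universe = {f"R{i}" for i in range(1, 11)}
pathways = {"A": {"R1", "R2", "R3"}, "B": {f"R{i}" for i in range(4, 11)}}
print(enrich({"R1", "R2", "R3"}, pathways, universe))
```

When many pathways are tested at once, the resulting p-values would normally also be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure).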
Affiliation(s)
- Merve Yarıcı
- Department of Bioengineering, Istanbul Medeniyet University, Istanbul, Turkey
- Furkan Cantürk
- Department of Artificial Intelligence, Özyeğin University, Istanbul, Turkey
- Hatice Nur Aydın
- Department of Bioengineering, Istanbul Medeniyet University, Istanbul, Turkey
2
Viana R, Couceiro D, Newton W, Coutinho L, Dias O, Coelho C, Teixeira MC. Reconstruction and exploitation of a dedicated Genome-Scale Metabolic Model of the human pathogen C. neoformans. bioRxiv: The Preprint Server for Biology 2025:2025.04.02.646762. [PMID: 40291681 PMCID: PMC12026501 DOI: 10.1101/2025.04.02.646762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
C. neoformans is notorious for causing severe pulmonary and central nervous system infections, particularly in immunocompromised patients. High mortality rates, associated with its tropism for and adaptation to the brain microenvironment and with its drug resistance profile, make this pathogen a public health threat and a World Health Organization (WHO) priority. In this study, we reconstructed the genome-scale metabolic model (GSMM) iRV890 for C. neoformans var. grubii, providing a promising platform for a comprehensive understanding of the unique metabolic features of C. neoformans, shedding light on its complex tropism for the brain microenvironment, and potentially informing the discovery of new drug targets. The iRV890 model is openly available in SBML format and was validated using experimental data on nitrogen and carbon assimilation, as well as specific growth and glucose consumption rates. Based on comparison with GSMMs available for other pathogenic yeasts, unique metabolic features were predicted for C. neoformans, including key pathways that shape the dynamics between C. neoformans and the human host and underlie its adaptation to the brain environment. Finally, essential genes predicted by the validated model are explored as potential novel antifungal drug targets.
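The abstract does not detail how essential genes were predicted. In constraint-based models, a gene deletion is usually propagated through the GPR rules to find the reactions it disables, after which growth is re-simulated (for example with FBA). The sketch below covers only the GPR step: a hypothetical `required` set of reactions stands in for the growth simulation, and the rules and gene IDs are illustrative, not taken from iRV890. It assumes lowercase `and`/`or` operators and gene IDs that are valid Python identifiers.

```python
import ast


def gpr_active(rule, deleted):
    """Evaluate a GPR rule such as '(g1 and g2) or g3' after deleting genes.

    An empty rule means the reaction has no gene association and is never
    blocked by a deletion.
    """
    if not rule.strip():
        return True

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BoolOp):
            values = [ev(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id not in deleted
        raise ValueError(f"unsupported GPR syntax: {ast.dump(node)}")

    return ev(ast.parse(rule, mode="eval"))


def genes_in(gprs):
    """Collect every gene ID mentioned in a {reaction: rule} mapping."""
    genes = set()
    for rule in gprs.values():
        if rule.strip():
            tree = ast.parse(rule, mode="eval")
            genes |= {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return genes


def candidate_essential_genes(gprs, required):
    """Genes whose single deletion blocks at least one required reaction."""
    essential = []
    for gene in sorted(genes_in(gprs)):
        blocked = {r for r, rule in gprs.items() if not gpr_active(rule, {gene})}
        if blocked & required:
            essential.append(gene)
    return essential


gprs = {"R1": "g1 and g2", "R2": "g3 or g4", "R3": "g5"}
print(candidate_essential_genes(gprs, required={"R1", "R2"}))  # → ['g1', 'g2']
```

Isozymes joined by `or` (g3, g4) buffer each other, so neither single deletion is flagged, whereas every subunit of the `and` complex (g1, g2) is.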
3
Capela J, Zimmermann-Kogadeeva M, Dijk ADJV, de Ridder D, Dias O, Rocha M. Comparative Assessment of Protein Large Language Models for Enzyme Commission Number Prediction. BMC Bioinformatics 2025; 26:68. [PMID: 40016653 PMCID: PMC11866580 DOI: 10.1186/s12859-025-06081-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Accepted: 02/11/2025] [Indexed: 03/01/2025] [Open Access]
Abstract
BACKGROUND Protein large language models (LLMs) have been used to extract representations of enzyme sequences to predict their function, which is encoded by Enzyme Commission (EC) numbers. However, a comprehensive comparison of different LLMs for this task is still lacking, leaving questions about their relative performance. Moreover, protein sequence alignments (e.g. BLASTp or DIAMOND) are often combined with machine learning models to assign EC numbers from homologous enzymes, thus compensating for the shortcomings of these models' predictions. In this context, LLMs and sequence alignment methods have not been extensively compared as individual predictors, leaving open questions about the performance and limitations of LLMs relative to alignment methods. In this study, we set out to assess the ability of the ESM2, ESM1b, and ProtBERT language models to predict EC numbers, comparing them against BLASTp, against each other, and against models that rely on one-hot encodings of amino acid sequences. RESULTS Our findings reveal that combining these LLMs with fully connected neural networks surpasses the performance of deep learning (DL) models that rely on one-hot encodings. Moreover, although BLASTp provided marginally better results overall, DL models provide results that complement BLASTp's, revealing that LLMs better predict certain EC numbers while BLASTp excels at predicting others. ESM2 stood out as the best model among the LLMs tested, providing more accurate predictions on difficult annotation tasks and for enzymes without homologs. CONCLUSIONS Crucially, this study demonstrates that LLMs still need to be improved before they can replace BLASTp as the gold-standard tool in mainstream enzyme annotation routines. On the other hand, LLMs can provide good predictions for more difficult-to-annotate enzymes, particularly when the identity between the query sequence and the reference database falls below 25%. Our results reinforce the claim that BLASTp and LLMs complement each other and can be more effective when used together.
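The baseline models in this comparison encode each residue as a one-hot vector. A minimal sketch of such an encoding, assuming a 20-letter standard alphabet, a fixed maximum length, and all-zero rows for padding and for non-standard residues (the abstract does not specify these details, and `one_hot` is a hypothetical helper):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_len):
    """Encode a protein sequence as a max_len x 20 binary matrix.

    Sequences longer than max_len are truncated and shorter ones are padded
    with all-zero rows; non-standard residues (e.g. X, U) also become
    all-zero rows. These choices are illustrative assumptions.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence.upper()[:max_len]):
        column = INDEX.get(aa)
        if column is not None:
            matrix[pos][column] = 1
    return matrix


encoded = one_hot("MKX", max_len=5)
print(len(encoded), [sum(row) for row in encoded])  # → 5 [1, 1, 0, 0, 0]
```

A protein LLM such as ESM2 would instead supply a learned, dense embedding for the sequence; a fully connected network can then be trained on either representation.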
Affiliation(s)
- João Capela
- Centre of Biological Engineering, University of Minho, Braga, 4710-057, Portugal
- Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Biosystems Data Analysis, University of Amsterdam, Amsterdam, The Netherlands
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Oscar Dias
- Centre of Biological Engineering, University of Minho, Braga, 4710-057, Portugal
- LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
- Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, 4710-057, Portugal
- LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
4
Heinken A, Hulshof TO, Nap B, Martinelli F, Basile A, O'Brolchain A, O'Sullivan NF, Gallagher C, Magee E, McDonagh F, Lalor I, Bergin M, Evans P, Daly R, Farrell R, Delaney RM, Hill S, McAuliffe SR, Kilgannon T, Fleming RMT, Thinnes CC, Thiele I. A genome-scale metabolic reconstruction resource of 247,092 diverse human microbes spanning multiple continents, age groups, and body sites. Cell Syst 2025; 16:101196. [PMID: 39947184 DOI: 10.1016/j.cels.2025.101196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 10/04/2024] [Accepted: 01/15/2025] [Indexed: 02/19/2025]
Abstract
Genome-scale modeling of microbiome metabolism enables the simulation of diet-host-microbiome-disease interactions. However, current genome-scale reconstruction resources are limited in scope by computational challenges. We developed an optimized and highly parallelized reconstruction and analysis pipeline to build a resource of 247,092 microbial genome-scale metabolic reconstructions, named APOLLO. APOLLO spans 19 phyla, contains >60% uncharacterized strains, and includes strains from 34 countries, all age groups, and multiple body sites. Using machine learning, we predicted with high accuracy the taxonomic assignment of strains based on their computed metabolic features. We then built 14,451 metagenomic sample-specific microbiome community models to systematically interrogate their community-level metabolic capabilities. We show that sample-specific metabolic pathways accurately stratify microbiomes by body site, age, and disease state. APOLLO is freely available, enables the systematic interrogation of the metabolic capabilities of largely still uncultured and unclassified species, and provides unprecedented opportunities for systems-level modeling of personalized host-microbiome co-metabolism.
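The abstract does not name the machine learning method used to predict taxonomy from metabolic features. As a hedged illustration of the underlying idea only, each reconstruction can be reduced to its set of reactions, and a query strain can be assigned the taxon of its most similar reference by Jaccard similarity. This is a 1-nearest-neighbour stand-in, not APOLLO's classifier, and the taxon labels and reaction IDs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two reaction sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_taxon(query_reactions, reference):
    """Assign the label of the most similar reference reconstruction.

    reference: list of (taxon_label, reaction_set) pairs.
    Returns (label, similarity); ties keep the first reference seen.
    """
    best_label, best_score = None, -1.0
    for label, reactions in reference:
        score = jaccard(query_reactions, reactions)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


reference = [
    ("Bacteroidota", {"R1", "R2", "R3"}),
    ("Bacillota", {"R4", "R5", "R6"}),
]
print(predict_taxon({"R1", "R2", "R7"}, reference))  # → ('Bacteroidota', 0.5)
```

At the scale of APOLLO, a trained classifier on reaction-presence feature vectors would be the more realistic choice; the set-similarity view above only shows why metabolic content can carry taxonomic signal.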
Affiliation(s)
- Almut Heinken
- School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland; Inserm UMRS 1256 NGERE, University of Lorraine, Nancy, France
- Timothy Otto Hulshof
- School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland
- Bram Nap
- School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland
- Filippo Martinelli
- School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland
- Arianna Basile
- School of Medicine, University of Galway, Galway, Ireland; Department of Biology, University of Padova, Padova, Italy
- Ian Lalor
- University of Galway, Galway, Ireland
- Cyrille C Thinnes
- School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland
- Ines Thiele
- School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland; Division of Microbiology, University of Galway, Galway, Ireland; APC Microbiome Ireland, Cork, Ireland
5
De Bernardini N, Zampieri G, Campanaro S, Zimmermann J, Waschina S, Treu L. pan-Draft: automated reconstruction of species-representative metabolic models from multiple genomes. Genome Biol 2024; 25:280. [PMID: 39456096 PMCID: PMC11515315 DOI: 10.1186/s13059-024-03425-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Accepted: 10/15/2024] [Indexed: 10/28/2024] [Open Access]
Abstract
The accurate reconstruction of genome-scale metabolic models (GEMs) for unculturable species is challenging because of the incomplete and fragmented genetic information typical of metagenome-assembled genomes (MAGs). While existing tools leverage sequence homology from single genomes, this study introduces pan-Draft, a pan-reactome-based approach that exploits recurrent genetic evidence to determine the solid core structure of species-level GEMs. By comparing MAGs clustered at the species level, pan-Draft addresses issues arising from the incompleteness and contamination of individual genomes, providing high-quality draft models and a catalog of accessory reactions that supports the gap-filling step. This approach will improve our understanding of the metabolic functions of uncultured species.
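The core/accessory idea can be sketched as a frequency count over the MAGs of one species cluster: reactions recurring in enough genomes enter the core draft, while the rest form an accessory catalog (with frequencies) for gap-filling. pan-Draft's exact scoring is not given in the abstract, so `min_freq` and its default are hypothetical, and the reaction IDs are illustrative.

```python
from collections import Counter


def split_pan_reactome(mag_reactions, min_freq=0.5):
    """Split the pan-reactome of one species-level MAG cluster.

    mag_reactions: list of reaction-ID sets, one per MAG in the cluster.
    min_freq: fraction of MAGs in which a reaction must occur to enter the
        core draft (hypothetical default, not pan-Draft's setting).
    Returns (core reaction set, {accessory reaction: frequency}).
    """
    n_mags = len(mag_reactions)
    counts = Counter(r for reactions in mag_reactions for r in reactions)
    freq = {r: c / n_mags for r, c in counts.items()}
    core = {r for r, f in freq.items() if f >= min_freq}
    accessory = {r: f for r, f in freq.items() if f < min_freq}
    return core, accessory


mags = [{"R1", "R2", "R3"}, {"R1", "R2"}, {"R1", "R4"}, {"R1", "R2", "R5"}]
core, accessory = split_pan_reactome(mags)
print(sorted(core), sorted(accessory))  # → ['R1', 'R2'] ['R3', 'R4', 'R5']
```

Because a reaction missing from one incomplete MAG can still reach the threshold through the others, recurrence across genomes compensates for fragmented assemblies; the accessory frequencies could then be used to rank gap-filling candidates.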
Affiliation(s)
- Nicola De Bernardini
- Department of Biology, University of Padova, Via U. Bassi 58/B, Padua, 35121, Italy
- Guido Zampieri
- Department of Biology, University of Padova, Via U. Bassi 58/B, Padua, 35121, Italy
- Stefano Campanaro
- Department of Biology, University of Padova, Via U. Bassi 58/B, Padua, 35121, Italy
- Johannes Zimmermann
- Evolutionary Ecology and Genetics, Zoological Institute, Kiel University, Kiel, 24118, Germany
- Antibiotic Resistance Group, Max Planck Institute for Evolutionary Biology, Ploen, 24306, Germany
- Silvio Waschina
- Department of Human Nutrition and Food Science, Kiel University, Heinrich-Hecht-Platz 10, Kiel, 24118, Germany
- Laura Treu
- Department of Biology, University of Padova, Via U. Bassi 58/B, Padua, 35121, Italy
6
Kroll A, Niebuhr N, Butler G, Lercher MJ. SPOT: A machine learning model that predicts specific substrates for transport proteins. PLoS Biol 2024; 22:e3002807. [PMID: 39325691 PMCID: PMC11426516 DOI: 10.1371/journal.pbio.3002807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Accepted: 08/13/2024] [Indexed: 09/28/2024] [Open Access]
Abstract
Transport proteins play a crucial role in cellular metabolism and are central to many aspects of molecular biology and medicine. Determining the function of transport proteins experimentally is challenging, as they become unstable when isolated from cell membranes. Machine learning-based predictions could provide an efficient alternative. However, existing methods are limited to predicting a small number of specific substrates or broad transporter classes. These limitations stem partly from the use of small data sets for model training and from a choice of input features that lack sufficient information about the prediction problem. Here, we present SPOT, the first general machine learning model that can successfully predict specific substrates for arbitrary transport proteins, achieving an accuracy above 92% on independent and diverse test data covering widely different transporters and a broad range of metabolites. SPOT uses Transformer Networks to represent transporters and substrates numerically. To overcome the problem of missing negative data for training, it augments a large data set of known transporter-substrate pairs with carefully sampled random molecules as non-substrates. SPOT not only predicts specific transporter-substrate pairs, but also outperforms previously published models designed to predict broad substrate classes for individual transport proteins. We provide a web server and a Python function that allow users to explore the substrate scope of arbitrary transporters.
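The abstract says SPOT augments known transporter-substrate pairs with "carefully sampled" random molecules as non-substrates, without spelling out the sampling criteria. The sketch below shows only the basic augmentation step: it draws uniformly from a hypothetical candidate pool while excluding each transporter's known substrates. SPOT's actual selection rules are not reproduced, and the IDs and the `per_positive` ratio are illustrative.

```python
import random


def add_sampled_negatives(positives, molecule_pool, per_positive=3, seed=0):
    """Return (transporter, molecule, label) triples with sampled negatives.

    positives: iterable of known (transporter_id, substrate_id) pairs (label 1).
    molecule_pool: candidate molecule IDs from which non-substrates are drawn.
    per_positive: negatives drawn per positive pair (hypothetical ratio).
    """
    positives = list(positives)
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    known = {}
    for transporter, substrate in positives:
        known.setdefault(transporter, set()).add(substrate)

    data = []
    pool = sorted(molecule_pool)  # deterministic order before sampling
    for transporter, substrate in positives:
        data.append((transporter, substrate, 1))
        candidates = [m for m in pool if m not in known[transporter]]
        for molecule in rng.sample(candidates, min(per_positive, len(candidates))):
            data.append((transporter, molecule, 0))
    return data


positives = [("T1", "glucose"), ("T1", "fructose"), ("T2", "sodium")]
pool = ["glucose", "fructose", "sodium", "citrate", "lactate"]
data = add_sampled_negatives(positives, pool)
print(sum(label for *_, label in data), len(data))  # → 3 12
```

A sampled "non-substrate" may in fact be an unrecorded substrate, so this kind of augmentation introduces some label noise; biasing the sampling toward plausible but unlisted molecules is one way to make the negatives more informative.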
Affiliation(s)
- Alexander Kroll
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
- Nico Niebuhr
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
- Gregory Butler
- Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada
- Martin J Lercher
- Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
7
Tarzi C, Zampieri G, Sullivan N, Angione C. Emerging methods for genome-scale metabolic modeling of microbial communities. Trends Endocrinol Metab 2024; 35:533-548. [PMID: 38575441 DOI: 10.1016/j.tem.2024.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/28/2024] [Accepted: 02/29/2024] [Indexed: 04/06/2024]
Abstract
Genome-scale metabolic models (GEMs) are becoming established as platforms for studying mixed microbial populations, combining biological data and knowledge with mathematical rigor. However, deploying these models to answer research questions can be challenging due to the increasing number of available computational tools, the lack of universal standards, and their inherent limitations. Here, we present a comprehensive overview of foundational concepts for building and evaluating genome-scale models of microbial communities. We then compare tools in terms of requirements, capabilities, and applications. Next, we highlight the current pitfalls and open challenges to consider when adopting existing tools and developing new ones. Our compendium can be relevant for the expanding community of modelers, at both entry and experienced levels.
Affiliation(s)
- Chaimaa Tarzi
- School of Computing, Engineering and Digital Technologies, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK
- Guido Zampieri
- Department of Biology, University of Padova, Padova, 35122, Veneto, Italy
- Neil Sullivan
- Complement Genomics Ltd, Station Rd, Lanchester, Durham, DH7 0EX, County Durham, UK
- Claudio Angione
- School of Computing, Engineering and Digital Technologies, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK; Centre for Digital Innovation, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK; National Horizons Centre, Teesside University, 38 John Dixon Ln, Darlington, DL1 1HG, North Yorkshire, UK
8
Kulyashov MA, Kolmykov SK, Khlebodarova TM, Akberdin IR. State-of the-Art Constraint-Based Modeling of Microbial Metabolism: From Basics to Context-Specific Models with a Focus on Methanotrophs. Microorganisms 2023; 11:2987. [PMID: 38138131 PMCID: PMC10745598 DOI: 10.3390/microorganisms11122987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 12/09/2023] [Accepted: 12/13/2023] [Indexed: 12/24/2023] [Open Access]
Abstract
Methanotrophy is the ability of an organism to capture and utilize the greenhouse gas methane as a source of energy-rich carbon. Over the years, significant progress has been made in understanding the mechanisms of methane utilization, mostly in bacterial systems, including the key metabolic pathways, their regulation, and the impact of various factors (iron, copper, calcium, lanthanum, and tungsten) on cell growth and methane bioconversion. The implementation of -omics approaches has provided a vast amount of heterogeneous data that require the adaptation or development of computational tools for a system-wide interrogative analysis of methanotrophy. Genome-scale mathematical modeling of methanotroph metabolism has been envisioned as one of the most productive strategies for integrating multi-scale data to better understand methane metabolism and enable its biotechnological implementation. Herein, we provide an overview of various computational strategies implemented for methanotrophic systems. We highlight the functional capabilities as well as the limitations of the most popular web resources for the reconstruction, modification, and optimization of genome-scale metabolic models for methane-utilizing bacteria.
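Constraint-based modeling restricts the possible flux distributions v of a network with stoichiometric matrix S to those satisfying the steady-state condition S·v = 0 together with per-reaction flux bounds. The dependency-free sketch below checks these two constraints on a hypothetical three-reaction toy network (uptake, conversion, export); it does not optimize anything. An FBA tool such as COBRApy would then pick, among the feasible vectors, one that maximizes an objective using a linear-programming solver.

```python
def is_feasible(S, v, bounds, tol=1e-9):
    """Check the steady-state and flux-bound constraints for flux vector v.

    S: stoichiometric matrix as a list of rows (metabolites x reactions).
    v: one flux value per reaction.
    bounds: one (lower, upper) pair per reaction.
    """
    steady_state = all(
        abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol for row in S
    )
    within_bounds = all(
        lo - tol <= v_j <= hi + tol for v_j, (lo, hi) in zip(v, bounds)
    )
    return steady_state and within_bounds


# Hypothetical toy network: R1 takes up A, R2 converts A -> B, R3 exports B.
S = [
    [1, -1, 0],  # metabolite A
    [0, 1, -1],  # metabolite B
]
bounds = [(0, 10), (0, 1000), (0, 1000)]  # uptake capped at 10 (arbitrary units)

print(is_feasible(S, [10, 10, 10], bounds))  # → True: balanced and within bounds
print(is_feasible(S, [10, 5, 5], bounds))    # → False: A accumulates
print(is_feasible(S, [12, 12, 12], bounds))  # → False: uptake exceeds its bound
```

In a methanotroph model, the capped uptake reaction would typically correspond to methane (and oxygen) uptake, which is why measured uptake rates are such an important input for these models.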
Affiliation(s)
- Mikhail A. Kulyashov
- Department of Computational Biology, Scientific Center for Information Technologies and Artificial Intelligence, Sirius University of Science and Technology, 354340 Sochi, Russia
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
- Semyon K. Kolmykov
- Department of Computational Biology, Scientific Center for Information Technologies and Artificial Intelligence, Sirius University of Science and Technology, 354340 Sochi, Russia
- Tamara M. Khlebodarova
- Department of Computational Biology, Scientific Center for Information Technologies and Artificial Intelligence, Sirius University of Science and Technology, 354340 Sochi, Russia
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Kurchatov Genomics Center, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Ilya R. Akberdin
- Department of Computational Biology, Scientific Center for Information Technologies and Artificial Intelligence, Sirius University of Science and Technology, 354340 Sochi, Russia
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
9
Cunha E, Silva M, Chaves I, Demirci H, Lagoa DR, Lima D, Rocha M, Rocha I, Dias O. The first multi-tissue genome-scale metabolic model of a woody plant highlights suberin biosynthesis pathways in Quercus suber. PLoS Comput Biol 2023; 19:e1011499. [PMID: 37729340 PMCID: PMC10545120 DOI: 10.1371/journal.pcbi.1011499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 10/02/2023] [Accepted: 09/07/2023] [Indexed: 09/22/2023] [Open Access]
Abstract
Over the last decade, genome-scale metabolic models have been increasingly used to study plant metabolic behaviour at the tissue and multi-tissue level under different environmental conditions. Quercus suber, also known as the cork oak tree, forms one of the most important forest communities of the Mediterranean/Iberian region. In this work, we present the genome-scale metabolic model of Q. suber (iEC7871). The metabolic model comprises 7871 genes, 6231 reactions, and 6481 metabolites across eight compartments. Transcriptomics data were integrated into the model to obtain tissue-specific models for the leaf, inner bark, and phellogen, each with a specific biomass composition. The tissue-specific models were merged into a diel multi-tissue metabolic model to predict interactions among the three tissues during the light and dark phases. The metabolic models were also used to analyse the pathways associated with the synthesis of suberin monomers, namely acyl-lipid, phenylpropanoid, isoprenoid, and flavonoid production. The models developed in this work provide a systematic overview of the metabolism of Q. suber, including its secondary metabolism pathways and cork formation.
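The abstract does not say which algorithm integrated the transcriptomics data. A widely used convention when mapping transcript levels onto reactions is to take the minimum over subunits joined by AND (enzyme complexes) and the maximum over isozymes joined by OR. The sketch below applies that rule and then keeps reactions above a hypothetical expression threshold, a deliberately simplified stand-in for dedicated context-specific reconstruction methods. Gene IDs, expression values, and the threshold are illustrative.

```python
import ast


def reaction_score(rule, expression):
    """Expression score of a reaction from its GPR rule.

    AND -> min (all subunits needed); OR -> max (any isozyme suffices).
    Genes missing from `expression` are ignored; returns None when the rule
    is empty or none of its genes were measured.
    """
    if not rule.strip():
        return None

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BoolOp):
            values = [x for x in (ev(v) for v in node.values) if x is not None]
            if not values:
                return None
            return min(values) if isinstance(node.op, ast.And) else max(values)
        if isinstance(node, ast.Name):
            return expression.get(node.id)
        raise ValueError(f"unsupported GPR syntax: {ast.dump(node)}")

    return ev(ast.parse(rule, mode="eval"))


def tissue_reactions(gprs, expression, threshold):
    """Keep reactions scoring >= threshold, plus those lacking evidence."""
    kept = set()
    for reaction, rule in gprs.items():
        score = reaction_score(rule, expression)
        if score is None or score >= threshold:
            kept.add(reaction)
    return kept


gprs = {"R1": "g1 and g2", "R2": "g3 or g4", "R3": "g5", "R4": ""}
leaf = {"g1": 5.0, "g2": 1.0, "g3": 0.5, "g4": 8.0, "g5": 0.2}
print(sorted(tissue_reactions(gprs, leaf, threshold=1.0)))  # → ['R1', 'R2', 'R4']
```

Keeping reactions without gene evidence (such as R4, which might be spontaneous or a transport step) is a conservative choice that avoids breaking connectivity; stricter methods instead test whether each removal still lets the tissue model produce its biomass.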
Affiliation(s)
- Emanuel Cunha
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
- Miguel Silva
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
- Inês Chaves
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Avenida da República, Quinta do Marquês, Oeiras, Portugal
- iBET, Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal
- Huseyin Demirci
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
- SnT/University of Luxembourg, Luxembourg
- Diogo Lima
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
- Miguel Rocha
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
- LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
- Isabel Rocha
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Avenida da República, Quinta do Marquês, Oeiras, Portugal
- Oscar Dias
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
- LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
10
Cunha E, Lagoa D, Faria JP, Liu F, Henry CS, Dias O. TranSyT, an innovative framework for identifying transport systems. Bioinformatics 2023; 39:btad466. [PMID: 37589572 PMCID: PMC10444967 DOI: 10.1093/bioinformatics/btad466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 06/15/2023] [Accepted: 08/10/2023] [Indexed: 08/18/2023] [Open Access]
Abstract
MOTIVATION The importance and rate of development of genome-scale metabolic models have been growing for the last few years, increasing the demand for software solutions that automate several steps of this process. However, since TRIAGE's release, software development for the automatic integration of transport reactions into models has stalled. RESULTS Here, we present the Transport Systems Tracker (TranSyT). Unlike other transport systems annotation software, TranSyT does not rely on manual curation to expand its internal database, which is derived from highly curated records retrieved from the Transporter Classification Database and complemented with information from other data sources. TranSyT compiles information regarding transporter families and proteins and derives the corresponding reactions for its internal database, making them available for rapid annotation of complete genomes. All transport reactions have gene-protein-reaction (GPR) associations and can be exported with identifiers from four different metabolite databases. TranSyT is currently available as a plugin for merlin v4.0 and as an app for KBase. AVAILABILITY AND IMPLEMENTATION TranSyT web service: https://transyt.bio.di.uminho.pt/; GitHub for the tool: https://github.com/BioSystemsUM/transyt; GitHub with examples and instructions to run TranSyT: https://github.com/ecunha1996/transyt_paper.
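TranSyT derives transport reactions from transporter family information, but the abstract does not describe its internal representation. As a purely illustrative sketch, the three classic secondary-transport mechanisms can be turned into reaction equations with compartment-suffixed metabolite IDs. The ID format (`_e` for extracellular, `_c` for cytosol), the reversible `<=>` arrow, and the `transport_reaction` function itself are hypothetical, not TranSyT's output format.

```python
def transport_reaction(substrate, mechanism, co_substrate=None, outside="e", inside="c"):
    """Build a reversible transport reaction equation (illustrative format).

    mechanism: "uniport" (substrate alone), "symport" (co-substrate moves in
    the same direction) or "antiport" (co-substrate moves the opposite way).
    """
    if mechanism == "uniport":
        left = [f"{substrate}_{outside}"]
        right = [f"{substrate}_{inside}"]
    elif mechanism in ("symport", "antiport"):
        if co_substrate is None:
            raise ValueError(f"{mechanism} requires a co-substrate")
        left = [f"{substrate}_{outside}"]
        right = [f"{substrate}_{inside}"]
        if mechanism == "symport":
            left.append(f"{co_substrate}_{outside}")
            right.append(f"{co_substrate}_{inside}")
        else:
            left.append(f"{co_substrate}_{inside}")
            right.append(f"{co_substrate}_{outside}")
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return " + ".join(left) + " <=> " + " + ".join(right)


print(transport_reaction("glc", "uniport"))        # → glc_e <=> glc_c
print(transport_reaction("glc", "symport", "h"))   # → glc_e + h_e <=> glc_c + h_c
print(transport_reaction("na1", "antiport", "h"))  # → na1_e + h_c <=> na1_c + h_e
```

Exporting such reactions with identifiers from several metabolite databases, as TranSyT does, would amount to mapping `substrate` and `co_substrate` through a cross-reference table before formatting.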
Affiliation(s)
- Emanuel Cunha
- Centre of Biological Engineering, University of Minho, Braga 4704-553, Portugal
- Davide Lagoa
- Centre of Biological Engineering, University of Minho, Braga 4704-553, Portugal
- Computing, Environment, and Life Sciences Division, Argonne National Laboratory, Lemont, IL 60439, United States
- José P Faria
- Computing, Environment, and Life Sciences Division, Argonne National Laboratory, Lemont, IL 60439, United States
- Filipe Liu
- Computing, Environment, and Life Sciences Division, Argonne National Laboratory, Lemont, IL 60439, United States
- Christopher S Henry
- Computing, Environment, and Life Sciences Division, Argonne National Laboratory, Lemont, IL 60439, United States
- Oscar Dias
- Centre of Biological Engineering, University of Minho, Braga 4704-553, Portugal
- LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
11
Bartmanski BJ, Rocha M, Zimmermann-Kogadeeva M. Recent advances in data- and knowledge-driven approaches to explore primary microbial metabolism. Curr Opin Chem Biol 2023; 75:102324. [PMID: 37207402 PMCID: PMC10410306 DOI: 10.1016/j.cbpa.2023.102324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 04/15/2023] [Accepted: 04/18/2023] [Indexed: 05/21/2023]
Abstract
With the rapid progress in metabolomics and sequencing technologies, more data on the metabolome of single microbes and their communities are becoming available, revealing the potential of microorganisms to metabolize a broad range of chemical compounds. The analysis of microbial metabolomics datasets remains challenging, since it inherits the technical challenges of metabolomics analysis, such as compound identification and annotation, while also posing challenges in data interpretation, such as distinguishing metabolite sources in mixed samples. This review outlines recent advances in computational methods to analyze primary microbial metabolism: knowledge-based approaches that take advantage of metabolic and molecular networks, and data-driven approaches that employ machine/deep learning algorithms in combination with large-scale datasets. These methods aim to improve metabolite identification and to disentangle reciprocal interactions between microbes and metabolites. We also discuss the prospects of combining these approaches and the further developments required to advance the investigation of primary metabolism in mixed microbial samples.
Affiliation(s)
- Miguel Rocha
- Centre of Biological Engineering, University of Minho, Campus of Gualtar, Braga, Portugal
12
Belcour A, Got J, Aite M, Delage L, Collén J, Frioux C, Leblanc C, Dittami SM, Blanquart S, Markov GV, Siegel A. Inferring and comparing metabolism across heterogeneous sets of annotated genomes using AuCoMe. Genome Res 2023; 33:972-987. [PMID: 37468308 PMCID: PMC10629481 DOI: 10.1101/gr.277056.122] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/22/2022] [Accepted: 05/23/2023] [Indexed: 07/21/2023]
Abstract
Comparative analysis of genome-scale metabolic networks (GSMNs) may yield important information on the biology, evolution, and adaptation of species. However, it is impeded by the high heterogeneity of the quality and completeness of structural and functional genome annotations, which may bias the results of such comparisons. To address this issue, we developed AuCoMe, a pipeline to automatically reconstruct homogeneous GSMNs from a heterogeneous set of annotated genomes without discarding available manual annotations. We tested AuCoMe with three data sets, one bacterial, one fungal, and one algal, and showed that it successfully reduces technical biases while capturing the metabolic specificities of each organism. Our results also point out shared and divergent metabolic traits among evolutionarily distant algae, underlining the potential of AuCoMe to accelerate the broad exploration of metabolic evolution across the tree of life.
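AuCoMe's full pipeline is more involved than the abstract can convey. The sketch below isolates one idea it describes, homogenizing networks without discarding existing annotations: each organism keeps its own gene-reaction annotations and additionally receives reactions carried by orthologous genes annotated in other organisms. The data structures, organism and gene names, and `propagate_annotations` are hypothetical, and ortholog pairs are assumed to have been computed beforehand.

```python
def propagate_annotations(annotations, orthologs):
    """Build homogenized reaction sets from heterogeneous annotations.

    annotations: {organism: {gene: set of reaction IDs}}; existing (possibly
        manual) annotations are always kept.
    orthologs: {(organism, gene): set of (other_organism, other_gene)}.
    Returns {organism: set of reaction IDs}.
    """
    networks = {org: set() for org in annotations}
    for org, genes in annotations.items():
        for reactions in genes.values():
            networks[org] |= reactions
    for (org, _gene), partners in orthologs.items():
        for other_org, other_gene in partners:
            inherited = annotations.get(other_org, {}).get(other_gene, set())
            networks.setdefault(org, set()).update(inherited)
    return networks


annotations = {
    "algaA": {"a1": {"R1"}, "a2": set()},  # a2 lacks a functional annotation
    "algaB": {"b1": {"R1"}, "b2": {"R2"}},
}
orthologs = {("algaA", "a2"): {("algaB", "b2")}}
networks = propagate_annotations(annotations, orthologs)
print(sorted(networks["algaA"]), sorted(networks["algaB"]))  # → ['R1', 'R2'] ['R1', 'R2']
```

Without propagation, algaA would appear to lack R2 purely because of uneven annotation quality; filling such gaps from orthologs is what reduces technical bias when the networks are compared.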
Affiliation(s)
- Arnaud Belcour
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
- Jeanne Got
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
- Méziane Aite
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
- Ludovic Delage
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Jonas Collén
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Catherine Leblanc
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Simon M Dittami
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Gabriel V Markov
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Anne Siegel
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
13
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023] [Open Access]
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
14
Fakih I, Got J, Robles-Rodriguez CE, Siegel A, Forano E, Muñoz-Tamayo R. Dynamic genome-based metabolic modeling of the predominant cellulolytic rumen bacterium Fibrobacter succinogenes S85. mSystems 2023; 8:e0102722. [PMID: 37289026 PMCID: PMC10308913 DOI: 10.1128/msystems.01027-22] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/24/2022] [Accepted: 03/14/2023] [Indexed: 06/09/2023]
Abstract
Fibrobacter succinogenes is a cellulolytic bacterium that plays an essential role in the degradation of plant fibers in the rumen ecosystem. It converts cellulose polymers into intracellular glycogen and the fermentation metabolites succinate, acetate, and formate. We developed dynamic models of F. succinogenes S85 metabolism on glucose, cellobiose, and cellulose on the basis of a network reconstruction done with the automatic reconstruction of metabolic model workspace. The reconstruction was based on genome annotation, five template-based orthology methods, gap filling, and manual curation. The metabolic network of F. succinogenes S85 comprises 1,565 reactions with 77% linked to 1,317 genes, 1,586 unique metabolites, and 931 pathways. The network was reduced using the NetRed algorithm and analyzed for the computation of elementary flux modes. A yield analysis was further performed to select a minimal set of macroscopic reactions for each substrate. The accuracy of the models was acceptable in simulating F. succinogenes carbohydrate metabolism with an average coefficient of variation of the root mean squared error of 19%. The resulting models are useful resources for investigating the metabolic capabilities of F. succinogenes S85, including the dynamics of metabolite production. Such an approach is a key step toward the integration of omics microbial information into predictive models of rumen metabolism.
IMPORTANCE F. succinogenes S85 is a cellulose-degrading and succinate-producing bacterium. Such functions are central to the rumen ecosystem and are of special interest for several industrial applications. This work illustrates how information from the genome of F. succinogenes can be translated into predictive dynamic models of rumen fermentation processes. We expect this approach can be applied to other rumen microbes to produce a model of the rumen microbiome that can be used for studying microbial manipulation strategies aimed at enhancing feed utilization and mitigating enteric emissions.
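The abstract above reports model accuracy as an average coefficient of variation of the root mean squared error, CV(RMSE), of 19%. A minimal Python sketch of that metric follows, assuming the common definition CV(RMSE) = RMSE / mean(observed) × 100 and a plain arithmetic mean across observed variables (for example, one series per metabolite); the paper may normalise or average differently, and the function names are hypothetical.

```python
import math


def cv_rmse(observed, simulated):
    """CV(RMSE) in percent for one observed/simulated time series.

    Assumed definition: RMSE divided by the mean of the observed values, times 100.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    # Root mean squared error between measurements and model predictions
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)
    mean_obs = sum(observed) / n
    return 100.0 * rmse / mean_obs


def average_cv_rmse(series):
    """Arithmetic mean of CV(RMSE) over several (observed, simulated) pairs.

    Averaging scheme is an assumption, not taken from the paper.
    """
    values = [cv_rmse(obs, sim) for obs, sim in series]
    return sum(values) / len(values)
```

For example, `cv_rmse([10, 20, 30], [12, 18, 30])` gives about 8.16, since the RMSE is sqrt(8/3) and the observed mean is 20.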
Affiliation(s)
- Ibrahim Fakih
- Université Clermont Auvergne, INRAE, UMR454 Microbiologie Environnement Digestif et Santé, 63000 Clermont-Ferrand, France
- Université Paris-Saclay, INRAE, AgroParisTech, UMR Modélisation Systémique Appliquée aux Ruminants, 91120 Palaiseau, France
- Jeanne Got
- Université Rennes, Inria, CNRS, IRISA, Dyliss team, 35042 Rennes, France
- Anne Siegel
- Université Rennes, Inria, CNRS, IRISA, Dyliss team, 35042 Rennes, France
- Evelyne Forano
- Université Clermont Auvergne, INRAE, UMR454 Microbiologie Environnement Digestif et Santé, 63000 Clermont-Ferrand, France
- Rafael Muñoz-Tamayo
- Université Paris-Saclay, INRAE, AgroParisTech, UMR Modélisation Systémique Appliquée aux Ruminants, 91120 Palaiseau, France
15
Systems biology's role in leveraging microalgal biomass potential: Current status and future perspectives. Algal Res 2022. [DOI: 10.1016/j.algal.2022.102963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022]
16
Tamasco G, Kumar M, Zengler K, Silva-Rocha R, da Silva RR. ChiMera: an easy to use pipeline for bacterial genome based metabolic network reconstruction, evaluation and visualization. BMC Bioinformatics 2022; 23:512. [PMID: 36451100 PMCID: PMC9710178 DOI: 10.1186/s12859-022-05056-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 11/14/2022] [Indexed: 12/02/2022]
Abstract
BACKGROUND Genome-scale metabolic reconstruction tools have been developed over the last decades. They have helped to reconstruct eukaryotic and prokaryotic metabolic models, which have contributed to fields such as genetic engineering, drug discovery, phenotype prediction, and other model-driven discoveries. However, using these programs requires a high level of bioinformatics skill. Moreover, the functionalities required to build models are scattered across multiple tools, so users need knowledge of and experience with several of them.
RESULTS Here we present ChiMera, which combines tools for model reconstruction, prediction, and visualization. ChiMera uses CarveMe in its reconstruction module, generating a gap-filled draft reconstruction that can produce growth predictions using flux balance analysis for gram-positive and gram-negative bacteria. ChiMera also contains two modules for metabolic network visualization. The first generates maps of the most important pathways, e.g., glycolysis, nucleotide and amino acid biosynthesis, fatty acid oxidation and biosynthesis, and core metabolism. The second produces a genome-wide metabolic map, which can be used to retrieve KEGG pathway information for each compound in the model. A module to investigate gene essentiality and knockouts is also included.
CONCLUSIONS Overall, ChiMera uses automation to combine a variety of tools that perform model creation, gap-filling, flux balance analysis (FBA), and metabolic network visualization. ChiMera models readily provide metabolic insights that can aid genetic engineering projects, phenotype prediction, and model-driven discoveries.
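ChiMera's growth predictions and knockout module rest on flux balance analysis: maximise an objective flux subject to the steady-state constraint S·v = 0 and per-reaction flux bounds. The sketch below illustrates that idea on a hypothetical five-reaction toy network using SciPy's `linprog`, not CarveMe or any ChiMera code. The network, reaction names, and bounds are invented for illustration, and knockouts here act on reactions rather than genes (genome-scale tools map gene knockouts to reactions through GPR rules).

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not from ChiMera). Metabolites: A, B, C.
# R1: -> A (uptake), R2: A -> B, R3: A -> C, R4: B -> (biomass drain), R5: C -> (secretion)
REACTIONS = ["R1_uptake", "R2_A_to_B", "R3_A_to_C", "R4_biomass", "R5_secrete_C"]
S = np.array([
    [1, -1, -1, 0, 0],   # A
    [0, 1, 0, -1, 0],    # B
    [0, 0, 1, 0, -1],    # C
], dtype=float)
# Illustrative (lower, upper) flux bounds; uptake is capped at 10
DEFAULT_BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
OBJECTIVE = REACTIONS.index("R4_biomass")


def fba(bounds=DEFAULT_BOUNDS):
    """Maximise the biomass flux subject to S v = 0 and the given flux bounds."""
    c = np.zeros(S.shape[1])
    c[OBJECTIVE] = -1.0  # linprog minimises, so negate the objective to maximise
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, res.x


def knockout(reaction):
    """Re-run FBA with one reaction's flux forced to zero; return the biomass flux."""
    bounds = list(DEFAULT_BOUNDS)  # copy so the defaults stay untouched
    bounds[REACTIONS.index(reaction)] = (0, 0)
    growth, _ = fba(bounds)
    return growth
```

With uptake capped at 10, `fba()` should return a biomass flux of 10; knocking out `R2_A_to_B` drops it to 0 (an essential reaction in this toy), while knocking out `R3_A_to_C` leaves it at 10.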
Affiliation(s)
- Gustavo Tamasco
- Ribeirão Preto School of Medicine (FMRP), University of São Paulo (USP), Ribeirão Preto, SP, Brazil
- Manish Kumar
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0760, USA
- Karsten Zengler
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0760, USA
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093-0412, USA
- Center for Microbiome Innovation, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0403, USA
- Rafael Silva-Rocha
- Ribeirão Preto School of Medicine (FMRP), University of São Paulo (USP), Ribeirão Preto, SP, Brazil
- Ricardo Roberto da Silva
- School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP, Brazil