1
Karp PD, Paley S, Caspi R, Kothari A, Krummenacker M, Midford PE, Moore LR, Subhraveti P, Gama-Castro S, Tierrafria VH, Lara P, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Sun G, Ahn-Horst TA, Choi H, Juenemann R, Knudsen CNM, Covert MW, Collado-Vides J, Paulsen I. The EcoCyc database (2025). EcoSal Plus 2025:eesp00192024. [PMID: 40304522 DOI: 10.1128/ecosalplus.esp-0019-2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Accepted: 03/18/2025] [Indexed: 05/02/2025]
Abstract
EcoCyc is a bioinformatics database (DB) available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed via EcoCyc.org. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and the procedures by which this content is generated.
Affiliation(s)
- Peter D Karp: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Suzanne Paley: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Ron Caspi: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Anamika Kothari: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Markus Krummenacker: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Peter E Midford: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Lisa R Moore: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Pallavi Subhraveti: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Socorro Gama-Castro: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Víctor H Tierrafria: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Paloma Lara: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Luis Muñiz-Rascado: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- César Bonavides-Martinez: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Alberto Santos-Zavaleta: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Amanda Mackie: School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia
- Gwanggyu Sun: Department of Bioengineering, Stanford University, Stanford, California, USA
- Travis A Ahn-Horst: Department of Bioengineering, Stanford University, Stanford, California, USA
- Heejo Choi: Department of Bioengineering, Stanford University, Stanford, California, USA
- Riley Juenemann: Department of Bioengineering, Stanford University, Stanford, California, USA
- Cyrus N M Knudsen: Department of Bioengineering, Stanford University, Stanford, California, USA
- Markus W Covert: Department of Bioengineering, Stanford University, Stanford, California, USA
- Julio Collado-Vides: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Ian Paulsen: School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia
2
Varghese S, Jisha M, Rajeshkumar K, Gajbhiye V, Alrefaei AF, Jeewon R. Endophytic fungi: A future prospect for breast cancer therapeutics and drug development. Heliyon 2024; 10:e33995. [PMID: 39091955 PMCID: PMC11292557 DOI: 10.1016/j.heliyon.2024.e33995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 06/26/2024] [Accepted: 07/02/2024] [Indexed: 08/04/2024] Open
Abstract
Globally, breast cancer is a primary contributor to cancer-related fatalities and illnesses among women. Consequently, there is a pressing need for safe and effective treatments for breast cancer. Bioactive compounds from endophytic fungi that live in symbiosis with medicinal plants have garnered significant interest in pharmaceutical research due to their extensive chemical composition and prospective medicinal attributes. This review underscores the potential of fungal endophytes as a promising resource for the development of innovative anticancer agents specifically tailored for breast cancer therapy. The diversity of endophytic fungi residing in medicinal plants, success stories of key endophytic bioactive metabolites tested against breast cancer, and the current progress with regard to in vivo studies and clinical trials on endophytic fungal metabolites in breast cancer research form the underlying theme of this article. A thorough compilation of putative anticancer compounds sourced from endophytic fungi that have demonstrated therapeutic potential against breast cancer, spanning the period from 1990 to 2022, is presented. This review article also outlines the latest trends in endophyte-based drug discovery, including the use of artificial intelligence, machine learning, multi-omics approaches, and high-throughput strategies. The challenges and future prospects associated with fungal endophytes as substitutive sources for developing anticancer drugs targeting breast cancer are also highlighted.
Affiliation(s)
- Sherin Varghese: School of Biosciences, Mahatma Gandhi University, Kottayam, Kerala, 686560, India
- M.S. Jisha: School of Biosciences, Mahatma Gandhi University, Kottayam, Kerala, 686560, India
- K.C. Rajeshkumar: National Fungal Culture Collection of India (NFCCI), Biodiversity and Palaeobiology (Fungi) Gr., Agharkar Research Institute, G.G. Agharkar Road, Pune, 411 004, Maharashtra, India
- Virendra Gajbhiye: Nanobioscience Group, Agharkar Research Institute, G.G. Agharkar Road, Pune, 411 004, Maharashtra, India
- Abdulwahed Fahad Alrefaei: Department of Zoology, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia
- Rajesh Jeewon: Department of Zoology, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia; Department of Health Sciences, Faculty of Medicine and Health Sciences, University of Mauritius, Reduit, Mauritius
3
Karp PD, Paley S, Caspi R, Kothari A, Krummenacker M, Midford PE, Moore LR, Subhraveti P, Gama-Castro S, Tierrafria VH, Lara P, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Sun G, Ahn-Horst TA, Choi H, Covert MW, Collado-Vides J, Paulsen I. The EcoCyc Database (2023). EcoSal Plus 2023; 11:eesp00022023. [PMID: 37220074 PMCID: PMC10729931 DOI: 10.1128/ecosalplus.esp-0002-2023] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 03/13/2023] [Accepted: 04/04/2023] [Indexed: 01/28/2024]
Abstract
EcoCyc is a bioinformatics database available online at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed online. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and the procedures by which this content is generated.
Affiliation(s)
- Peter D. Karp: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Suzanne Paley: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Ron Caspi: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Anamika Kothari: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Markus Krummenacker: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Peter E. Midford: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Lisa R. Moore: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Pallavi Subhraveti: Bioinformatics Research Group, SRI International, Menlo Park, California, USA
- Socorro Gama-Castro: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Victor H. Tierrafria: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Paloma Lara: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Luis Muñiz-Rascado: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- César Bonavides-Martinez: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Alberto Santos-Zavaleta: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Amanda Mackie: Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Gwanggyu Sun: Department of Bioengineering, Stanford University, Stanford, California, USA
- Travis A. Ahn-Horst: Department of Bioengineering, Stanford University, Stanford, California, USA
- Heejo Choi: Department of Bioengineering, Stanford University, Stanford, California, USA
- Markus W. Covert: Department of Bioengineering, Stanford University, Stanford, California, USA
- Julio Collado-Vides: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Ian Paulsen: School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia
4
Carter EL, Constantinidou C, Alam MT. Applications of genome-scale metabolic models to investigate microbial metabolic adaptations in response to genetic or environmental perturbations. Brief Bioinform 2023; 25:bbad439. [PMID: 38048080 PMCID: PMC10694557 DOI: 10.1093/bib/bbad439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 09/21/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023] Open
Abstract
Microorganisms regularly encounter environmental perturbations that require metabolic adaptations to ensure the organism can survive under the new conditions. To study the mechanisms of metabolic adaptation in such conditions, various experimental and computational approaches have been used. Genome-scale metabolic models (GEMs) are one of the most powerful approaches to study metabolism, providing a platform to study the systems-level adaptations of an organism to different environments, which could otherwise be infeasible experimentally. In this review, we describe the application of GEMs in understanding how microbes reprogram their metabolic system as a result of environmental variation. In particular, we provide the details of metabolic model reconstruction approaches, various algorithms and tools for model simulation, consequences of genetic perturbations, integration of '-omics' datasets for creating context-specific models, and their application in studying metabolic adaptation due to changes in environmental conditions.
Affiliation(s)
- Elena Lucy Carter: Warwick Medical School, University of Warwick, Coventry, CV4 7HL, UK
5
Rajagopal S, Hmar RV, Mookherjee D, Ghatak A, Shanbhag AP, Katagihallimath N, Venkatraman J, Ks R, Datta S. Validated In Silico Population Model of Escherichia coli. ACS Synth Biol 2022; 11:2672-2684. [PMID: 35801944 DOI: 10.1021/acssynbio.2c00097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Flux balance analysis (FBA) and ordinary differential equation models have been instrumental in depicting the metabolic functioning of a cell. Nevertheless, they demonstrate a population's average behavior (summation of individuals), thereby portraying homogeneity. However, living organisms such as Escherichia coli contain more biochemical reactions than engaging metabolites, making them an underdetermined and degenerate system. This results in a heterogeneous population with varying metabolic patterns. We have formulated a population systems biology model that predicts this degeneracy by emulating a diverse metabolic makeup with unique biochemical signatures. The model mimics the universally accepted experimental view that a subpopulation of bacteria, even under normal growth conditions, renders a unique biochemical state, leading to the synthesis of metabolites and persister progenitors of antibiotic resistance and biofilms. We validate the platform's predictions by producing commercially important heterologous (isobutanol) and homologous (shikimate) metabolites. The predicted fluxes are tested in vitro, resulting in 32- and 42-fold increases in the production of isobutanol and shikimate, respectively. Moreover, we authenticate the platform by mimicking a bacterial population in the presence of glyphosate, a metabolic pathway inhibitor. Here, we observe a fraction of subsisting persisters despite inhibition, thus affirming the signature of a heterogeneous populace. The platform has multiple uses based on the disposition of the user.
Affiliation(s)
- Sreenath Rajagopal: Bugworks Research India Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560065, India
- Rothangmawi Victoria Hmar: Biomoneta Research Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560092, India
- Debdatto Mookherjee: Bugworks Research India Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560065, India
- Arindam Ghatak: Biomoneta Research Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560092, India; Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata 700073, India
- Anirudh P Shanbhag: Bugworks Research India Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560065, India; Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata 700073, India
- Nainesh Katagihallimath: Bugworks Research India Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560065, India
- Janani Venkatraman: Biomoneta Research Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560092, India
- Ramanujan Ks: Biomoneta Research Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560092, India
- Santanu Datta: Bugworks Research India Private Limited, C-CAMP, National Center for Biological Sciences (TIFR), Bangalore 560065, India
6
Endophytic Fungi: Key Insights, Emerging Prospects, and Challenges in Natural Product Drug Discovery. Microorganisms 2022; 10:microorganisms10020360. [PMID: 35208814 PMCID: PMC8876476 DOI: 10.3390/microorganisms10020360] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 01/03/2022] [Revised: 01/25/2022] [Accepted: 02/01/2022] [Indexed: 12/01/2022] Open
Abstract
Plant-associated endophytes define an important symbiotic association in nature and are established bio-reservoirs of plant-derived natural products. Endophytes colonize the internal tissues of a plant without causing any disease symptoms or apparent changes. Recently, there has been a growing interest in endophytes because of their beneficial effects on the production of novel metabolites of pharmacological significance. Studies have highlighted the socio-economic implications of endophytic fungi in agriculture, medicine, and the environment, with considerable success. Endophytic fungi-mediated biosynthesis of well-known metabolites includes taxol from Taxomyces andreanae, azadirachtin A and B from Eupenicillium parvum, vincristine from Fusarium oxysporum, and quinine from Phomopsis sp. The discovery of the billion-dollar anticancer drug taxol was a landmark in endophyte biology/research and established new paradigms for the metabolic potential of plant-associated endophytes. In addition, endophytic fungi have emerged as potential prolific producers of antimicrobials, antiseptics, and antibiotics of plant origin. Although extensively studied as a “production platform” of novel pharmacological metabolites, the molecular mechanisms of plant–endophyte dynamics remain less understood/explored for their efficient utilization in drug discovery. The emerging trends in endophytic fungi-mediated biosynthesis of novel bioactive metabolites, success stories of key pharmacological metabolites, strategies to overcome the existing challenges in endophyte biology, and future direction in endophytic fungi-based drug discovery form the underlying theme of this article.
7
Frades I, Foguet C, Cascante M, Araúzo-Bravo MJ. Genome Scale Modeling to Study the Metabolic Competition between Cells in the Tumor Microenvironment. Cancers (Basel) 2021; 13:4609. [PMID: 34572839 PMCID: PMC8470216 DOI: 10.3390/cancers13184609] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/06/2021] [Revised: 09/06/2021] [Accepted: 09/09/2021] [Indexed: 12/31/2022] Open
Abstract
The tumor's physiology emerges from the dynamic interplay of numerous cell types, such as cancer cells, immune cells and stromal cells, within the tumor microenvironment. Immune and cancer cells compete for nutrients within the tumor microenvironment, leading to a metabolic battle between these cell populations. Tumor cells can reprogram their metabolism to meet the high demand of building blocks and ATP for proliferation, and to gain an advantage over the action of immune cells. The study of the metabolic reprogramming mechanisms underlying cancer requires the quantification of metabolic fluxes which can be estimated at the genome-scale with constraint-based or kinetic modeling. Constraint-based models use a set of linear constraints to simulate steady-state metabolic fluxes, whereas kinetic models can simulate both the transient behavior and steady-state values of cellular fluxes and concentrations. The integration of cell- or tissue-specific data enables the construction of context-specific models that reflect cell-type- or tissue-specific metabolic properties. While the available modeling frameworks enable limited modeling of the metabolic crosstalk between tumor and immune cells in the tumor stroma, future developments will likely involve new hybrid kinetic/stoichiometric formulations.
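For orientation, the constraint-based approach mentioned in this abstract is commonly written as a linear program (flux balance analysis). The form below is the standard textbook statement, not a formulation taken from the cited review; the symbols $S$, $v$, $c$, $v^{\min}$, and $v^{\max}$ are notation assumed here.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{s.t.} \quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for each reaction } j .
\end{aligned}
```

Here $S$ is the stoichiometric matrix (metabolites by reactions), $v$ is the vector of reaction fluxes, and $c$ selects the objective, typically a biomass reaction. The equality $S\,v = 0$ is the steady-state condition, and a gene knockout is usually simulated by setting both bounds of the affected reactions to zero.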
Affiliation(s)
- Itziar Frades: Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, 20009 San Sebastian, Spain
- Carles Foguet: Department of Biochemistry and Molecular Biomedicine, Institute of Biomedicine of University of Barcelona, Faculty of Biology, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD) (CB17/04/00023) and Metabolomics Node at Spanish National Bioinformatics Institute (INB-ISCIII-ES-ELIXIR), Instituto de Salud Carlos III (ISCIII), 28020 Madrid, Spain
- Marta Cascante: Department of Biochemistry and Molecular Biomedicine, Institute of Biomedicine of University of Barcelona, Faculty of Biology, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD) (CB17/04/00023) and Metabolomics Node at Spanish National Bioinformatics Institute (INB-ISCIII-ES-ELIXIR), Instituto de Salud Carlos III (ISCIII), 28020 Madrid, Spain
- Marcos J. Araúzo-Bravo: Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, 20009 San Sebastian, Spain; Max Planck Institute of Molecular Biomedicine, 48167 Münster, Germany; Centro de Investigación Biomédica en Red de Fragilidad y Envejecimiento Saludable (CIBERfes), 28015 Madrid, Spain; Translational Bioinformatics Network (TransBioNet), 8001 Barcelona, Spain; Ikerbasque, Basque Foundation for Science, 48012 Bilbao, Spain
8
Aghdam SA, Brown AMV. Deep learning approaches for natural product discovery from plant endophytic microbiomes. Environmental Microbiome 2021; 16:6. [PMID: 33758794 PMCID: PMC7972023 DOI: 10.1186/s40793-021-00375-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 11/27/2020] [Accepted: 02/21/2021] [Indexed: 05/10/2023]
Abstract
Plant microbiomes are not only diverse, but also appear to host a vast pool of secondary metabolites holding great promise for bioactive natural products and drug discovery. Yet, most microbes within plants appear to be uncultivable, and for those that can be cultivated, their metabolic potential lies largely hidden through regulatory silencing of biosynthetic genes. The recent explosion of powerful interdisciplinary approaches, including multi-omics methods to address multi-trophic interactions and artificial intelligence-based computational approaches to infer distribution of function, together present a paradigm shift in high-throughput approaches to natural product discovery from plant-associated microbes. Arguably, the key to characterizing and harnessing this biochemical capacity depends on a novel, systematic approach to characterize the triggers that turn on secondary metabolite biosynthesis through molecular or genetic signals from the host plant, members of the rich 'in planta' community, or from the environment. This review explores breakthrough approaches for natural product discovery from plant microbiomes, emphasizing the promise of deep learning as a tool for endophyte bioprospecting, endophyte biochemical novelty prediction, and endophyte regulatory control. It concludes with a proposed pipeline to harness global databases (genomic, metabolomic, regulomic, and chemical) to uncover and unsilence desirable natural products. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s40793-021-00375-0.
Affiliation(s)
- Shiva Abdollahi Aghdam: Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX 79409 USA
- Amanda May Vivian Brown: Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX 79409 USA
9
Pathway Tools Visualization of Organism-Scale Metabolic Networks. Metabolites 2021; 11:metabo11020064. [PMID: 33499002 PMCID: PMC7911265 DOI: 10.3390/metabo11020064] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/24/2020] [Revised: 01/12/2021] [Accepted: 01/12/2021] [Indexed: 12/20/2022] Open
Abstract
Metabolomics, synthetic biology, and microbiome research demand information about organism-scale metabolic networks. The convergence of genome sequencing and computational inference of metabolic networks has enabled great progress toward satisfying that demand by generating metabolic reconstructions from the genomes of thousands of sequenced organisms. Visualization of whole metabolic networks is critical for aiding researchers in understanding, analyzing, and exploiting those reconstructions. We have developed bioinformatics software tools that automatically generate a full metabolic-network diagram for an organism, and that enable searching and analyses of the network. The software generates metabolic-network diagrams for unicellular organisms, for multi-cellular organisms, and for pan-genomes and organism communities. Search tools enable users to find genes, metabolites, enzymes, reactions, and pathways within a diagram. The diagrams are zoomable to enable researchers to study local neighborhoods in detail and to see the big picture. The diagrams also serve as tools for comparison of metabolic networks and for interpreting high-throughput datasets, including transcriptomics, metabolomics, and reaction fluxes computed by metabolic models. These data can be overlaid on the metabolic charts to produce animated zoomable displays of metabolic flux and metabolite abundance. The BioCyc.org website contains whole-network diagrams for more than 18,000 sequenced organisms. The ready availability of organism-specific metabolic network diagrams and associated tools for almost any sequenced organism are useful for researchers working to better understand the metabolism of their organism and to interpret high-throughput datasets in a metabolic context.
10
Systematically gap-filling the genome-scale metabolic model of CHO cells. Biotechnol Lett 2020; 43:73-87. [PMID: 33040240 DOI: 10.1007/s10529-020-03021-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/04/2020] [Accepted: 10/03/2020] [Indexed: 10/23/2022]
Abstract
OBJECTIVE: Chinese hamster ovary (CHO) cells are the leading cell factories for producing recombinant proteins in the biopharmaceutical industry. In this regard, constraint-based metabolic models are useful platforms to perform computational analysis of cell metabolism. These models need to be regularly updated in order to include the latest biochemical data of the cells, and to increase their predictive power. Here, we provide an update to iCHO1766, the metabolic model of CHO cells. RESULTS: We expanded the existing model of Chinese hamster metabolism with the help of four gap-filling approaches, leading to the addition of 773 new reactions and 335 new genes. We incorporated these into an updated genome-scale metabolic network model of CHO cells, named iCHO2101. In this updated model, the number of reactions and pathways capable of carrying flux is substantially increased. CONCLUSIONS: The present CHO model is an important step towards more complete metabolic models of CHO cells.
11
Abstract
While there has been much study of bacterial gene dispensability, there is a lack of comprehensive genome-scale examinations of the impact of gene deletion on growth in different carbon sources. In this context, a lot can be learned from such experiments in the model microbe Escherichia coli where much is already understood and there are existing tools for the investigation of carbon metabolism and physiology (1). Gene deletion studies have practical potential in the field of antibiotic drug discovery where there is emerging interest in bacterial central metabolism as a target for new antibiotics (2). Furthermore, some carbon utilization pathways have been shown to be critical for initiating and maintaining infection for certain pathogens and sites of infection (3–5). Here, with the use of high-throughput solid medium phenotyping methods, we have generated kinetic growth measurements for 3,796 genes under 30 different carbon source conditions. This data set provides a foundation for research that will improve our understanding of genes with unknown function, aid in predicting potential antibiotic targets, validate and advance metabolic models, and help to develop our understanding of E. coli metabolism. Central metabolism is a topic that has been studied for decades, and yet, this process is still not fully understood in Escherichia coli, perhaps the most amenable and well-studied model organism in biology. To further our understanding, we used a high-throughput method to measure the growth kinetics of each of 3,796 E. coli single-gene deletion mutants in 30 different carbon sources. In total, there were 342 genes (9.01%) encompassing a breadth of biological functions that showed a growth phenotype on at least 1 carbon source, demonstrating that carbon metabolism is closely linked to a large number of processes in the cell. 
We identified 74 genes that showed low growth in 90% of conditions, defining a set of genes which are essential in nutrient-limited media, regardless of the carbon source. The data are compiled into a Web application, Carbon Phenotype Explorer (CarPE), to facilitate easy visualization of growth curves for each mutant strain in each carbon source. Our experimental data matched closely with the predictions from the EcoCyc metabolic model, which uses flux balance analysis to predict growth phenotypes. From our comparisons to the model, we found that, unexpectedly, phosphoenolpyruvate carboxylase (ppc) was required for robust growth in most carbon sources other than most tricarboxylic acid (TCA) cycle intermediates. We also identified 51 poorly annotated genes that showed a low growth phenotype in at least 1 carbon source, which allowed us to form hypotheses about the functions of these genes. From this list, we further characterized the ydhC gene and demonstrated its role in adenosine efflux.
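The two gene counts in this abstract (genes with a phenotype on at least one carbon source, and genes with low growth in 90% of conditions) correspond to simple filters over a gene-by-condition table. The sketch below is illustrative only: the toy data, the gene names, and the function `summarize_phenotypes` are hypothetical, and the abstract does not say how a "low growth" call is made, so the calls are assumed to be given as booleans.

```python
def summarize_phenotypes(low_growth, fraction=0.9):
    """low_growth maps gene -> list of booleans, one per carbon source.

    Returns (genes with low growth on at least one source,
             genes with low growth on at least `fraction` of sources).
    """
    any_phenotype = sorted(g for g, calls in low_growth.items() if any(calls))
    broad = sorted(
        g for g, calls in low_growth.items()
        if calls and sum(calls) / len(calls) >= fraction
    )
    return any_phenotype, broad


# Hypothetical toy table with 10 conditions per gene (the study used 30).
toy = {
    "geneA": [True] * 10,            # low growth everywhere
    "geneB": [True] * 9 + [False],   # low growth in 90% of conditions
    "geneC": [False] * 9 + [True],   # phenotype on a single carbon source
    "geneD": [False] * 10,           # no phenotype
}
any_phenotype, broad = summarize_phenotypes(toy)

# Arithmetic check of the percentage quoted in the abstract: 342 of 3,796 genes.
percent_with_phenotype = round(100 * 342 / 3796, 2)
```

With the toy table, `any_phenotype` contains geneA, geneB, and geneC, while `broad` contains only geneA and geneB; the percentage check reproduces the 9.01% quoted in the abstract.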
|
12
|
Ong WK, Midford PE, Karp PD. Taxonomic weighting improves the accuracy of a gap-filling algorithm for metabolic models. Bioinformatics 2020; 36:1823-1830. [PMID: 31688932 PMCID: PMC7523652 DOI: 10.1093/bioinformatics/btz813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/29/2019] [Revised: 08/29/2019] [Accepted: 10/31/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The increasing availability of annotated genome sequences enables construction of genome-scale metabolic networks, which are useful tools for studying organisms of interest. However, due to incomplete genome annotations, draft metabolic models contain gaps that must be filled in a time-consuming process before they are usable. Optimization-based algorithms that fill these gaps have been developed; however, gap-filling algorithms show significant error rates and often introduce incorrect reactions. RESULTS: Here, we present a new gap-filling method that computes the costs of candidate gap-filling reactions from a universal reaction database (MetaCyc) based on taxonomic information. When gap-filling a metabolic model for an organism M (such as Escherichia coli), the cost for reaction R is based on the frequency with which R occurs in other organisms within the phylum of M (in this case, Proteobacteria). The assumption behind this method is that different taxonomic groups are biased toward using different metabolic reactions. Evaluation of the new gap-filler on randomly degraded variants of the EcoCyc metabolic model for E. coli showed an increase in the average F1-score to 99.0 (when using the variable weights by frequency method at the phylum level), compared to 91.0 using the previous MetaFlux gap-filler and 80.3 using a basic gap-filler. Evaluation on two other microbial metabolic models showed similar improvements. AVAILABILITY AND IMPLEMENTATION: The Pathway Tools software (including MetaFlux) is free for academic use and is available at http://pathwaytools.com. Additional code for reproducing the results presented here is available at www.ai.sri.com/pkarp/pubs/taxgap/supplementary.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
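The weighting idea described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual cost formula, so the linear rule below, the function name `taxonomic_costs`, the cost bounds, and the toy reaction sets are all assumptions made for illustration; this is not the MetaFlux implementation.

```python
def taxonomic_costs(candidates, phylum_reaction_sets, max_cost=10.0, min_cost=1.0):
    """Give each candidate reaction a cost that falls as its frequency among
    organisms of the same phylum rises (hypothetical linear rule)."""
    n = len(phylum_reaction_sets)
    costs = {}
    for rxn in candidates:
        # Fraction of same-phylum organisms whose reaction set contains rxn.
        freq = sum(rxn in s for s in phylum_reaction_sets) / n if n else 0.0
        costs[rxn] = max_cost - (max_cost - min_cost) * freq
    return costs


# Toy data: reaction sets of three hypothetical organisms in one phylum.
phylum = [{"R1", "R2"}, {"R1"}, {"R1", "R3"}]
costs = taxonomic_costs(["R1", "R2", "R4"], phylum)
for rxn, cost in sorted(costs.items()):
    print(rxn, round(cost, 2))
```

Under this assumed rule, a reaction found in every organism of the phylum gets the lowest cost and one found in none gets the highest, so a cost-minimizing gap-filler would prefer reactions that are common in the organism's taxonomic group.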
Affiliation(s)
- Wai Kit Ong: Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
- Peter E Midford: Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
- Peter D Karp: Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
|
13
|
Karp PD, Billington R, Caspi R, Fulcher CA, Latendresse M, Kothari A, Keseler IM, Krummenacker M, Midford PE, Ong Q, Ong WK, Paley SM, Subhraveti P. The BioCyc collection of microbial genomes and metabolic pathways. Brief Bioinform 2019; 20:1085-1093. [PMID: 29447345 PMCID: PMC6781571 DOI: 10.1093/bib/bbx085] [Citation(s) in RCA: 542] [Impact Index Per Article: 90.3] [Received: 05/18/2017] [Revised: 06/22/2017] [Indexed: 01/31/2023]
Abstract
BioCyc.org is a microbial genome Web portal that combines thousands of genomes with additional information inferred by computer programs, imported from other databases and curated from the biomedical literature by biologist curators. BioCyc also provides an extensive range of query tools, visualization services and analysis software. Recent advances in BioCyc include an expansion in the content of BioCyc in terms of both the number of genomes and the types of information available for each genome; an expansion in the amount of curated content within BioCyc; and new developments in the BioCyc software tools including redesigned gene/protein pages and metabolite pages; new search tools; a new sequence-alignment tool; a new tool for visualizing groups of related metabolic pathways; and a facility called SmartTables, which enables biologists to perform analyses that previously would have required a programmer's assistance.
|
14
|
Human Systems Biology and Metabolic Modelling: A Review-From Disease Metabolism to Precision Medicine. BioMed Research International 2019; 2019:8304260. [PMID: 31281846 PMCID: PMC6590590 DOI: 10.1155/2019/8304260] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 11/01/2018] [Revised: 02/07/2019] [Accepted: 05/20/2019] [Indexed: 01/06/2023]
Abstract
In cell and molecular biology, metabolism is the only system that can be fully simulated at genome scale. Metabolic systems biology offers powerful abstraction tools to simulate all known metabolic reactions in a cell, therefore providing a snapshot that is close to its observable phenotype. In this review, we cover the 15 years of human metabolic modelling. We show that, although the past five years have not experienced large improvements in the size of the gene and metabolite sets in human metabolic models, their accuracy is rapidly increasing. We also describe how condition-, tissue-, and patient-specific metabolic models shed light on cell-specific changes occurring in the metabolic network, therefore predicting biomarkers of disease metabolism. We finally discuss current challenges and future promising directions for this research field, including machine/deep learning and precision medicine. In the omics era, profiling patients and biological processes from a multiomic point of view is becoming more common and less expensive. Starting from multiomic data collected from patients and N-of-1 trials where individual patients constitute different case studies, methods for model-building and data integration are being used to generate patient-specific models. Coupled with state-of-the-art machine learning methods, this will allow characterizing each patient's disease phenotype and delivering precision medicine solutions, therefore leading to preventative medicine, reduced treatment, and in silico clinical trials.
|
15
|
Weinrich S, Koch S, Bonk F, Popp D, Benndorf D, Klamt S, Centler F. Augmenting Biogas Process Modeling by Resolving Intracellular Metabolic Activity. Front Microbiol 2019; 10:1095. [PMID: 31156601 PMCID: PMC6533897 DOI: 10.3389/fmicb.2019.01095] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/24/2019] [Accepted: 04/30/2019] [Indexed: 01/23/2023]
Abstract
The process of anaerobic digestion in which waste biomass is transformed to methane by complex microbial communities has been modeled for more than 16 years by parametric gray box approaches that simplify process biology and do not resolve intracellular microbial activity. Information on such activity, however, has become available in unprecedented detail by recent experimental advances in metatranscriptomics and metaproteomics. The inclusion of such data could lead to more powerful process models of anaerobic digestion that more faithfully represent the activity of microbial communities. We augmented the Anaerobic Digestion Model No. 1 (ADM1) as the standard kinetic model of anaerobic digestion by coupling it to Flux-Balance-Analysis (FBA) models of methanogenic species. Steady-state results of coupled models are comparable to standard ADM1 simulations if the energy demand for non-growth associated maintenance (NGAM) is chosen adequately. When changing a constant feed of maize silage from continuous to pulsed feeding, the final average methane production remains very similar for both standard and coupled models, while both the initial response of the methanogenic population at the onset of pulsed feeding and its dynamics between pulses deviate considerably. In contrast to ADM1, the coupled models deliver predictions of up to thousands of intracellular metabolic fluxes per species, describing intracellular metabolic pathway activity in much higher detail. Furthermore, yield coefficients which need to be specified in ADM1 are no longer required as they are implicitly encoded in the topology of the species' metabolic network. We show the feasibility of augmenting ADM1, an ordinary differential equation-based model for simulating biogas production, by FBA models implementing individual steps of anaerobic digestion.
While cellular maintenance is introduced as a new parameter, the total number of parameters is reduced as yield coefficients no longer need to be specified. The coupled models provide detailed predictions on intracellular activity of microbial species which are compatible with experimental data on enzyme synthesis activity or abundance as obtained by metatranscriptomics or metaproteomics. By providing predictions of intracellular fluxes of individual community members, the presented approach advances the simulation of microbial community driven processes and provides a direct link to validation by state-of-the-art experimental techniques.
Affiliation(s)
- Sören Weinrich: Biochemical Conversion Department, Deutsches Biomasseforschungszentrum gGmbH, Leipzig, Germany
- Sabine Koch: Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Fabian Bonk: Department of Environmental Microbiology, UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany
- Denny Popp: Department of Environmental Microbiology, UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany
- Dirk Benndorf: Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany; Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Steffen Klamt: Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Florian Centler: Department of Environmental Microbiology, UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany
|
16
|
Heirendt L, Arreckx S, Pfau T, Mendoza SN, Richelle A, Heinken A, Haraldsdóttir HS, Wachowiak J, Keating SM, Vlasov V, Magnusdóttir S, Ng CY, Preciat G, Žagare A, Chan SHJ, Aurich MK, Clancy CM, Modamio J, Sauls JT, Noronha A, Bordbar A, Cousins B, El Assal DC, Valcarcel LV, Apaolaza I, Ghaderi S, Ahookhosh M, Ben Guebila M, Kostromins A, Sompairac N, Le HM, Ma D, Sun Y, Wang L, Yurkovich JT, Oliveira MAP, Vuong PT, El Assal LP, Kuperstein I, Zinovyev A, Hinton HS, Bryant WA, Aragón Artacho FJ, Planes FJ, Stalidzans E, Maass A, Vempala S, Hucka M, Saunders MA, Maranas CD, Lewis NE, Sauter T, Palsson BØ, Thiele I, Fleming RMT. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 2019; 14:639-702. [PMID: 30787451 PMCID: PMC6635304 DOI: 10.1038/s41596-018-0098-2] [Citation(s) in RCA: 695] [Impact Index Per Article: 115.8] [Indexed: 12/23/2022]
Abstract
Constraint-based reconstruction and analysis (COBRA) provides a molecular mechanistic framework for integrative analysis of experimental molecular systems biology data and quantitative prediction of physicochemically and biochemically feasible phenotypic states. The COBRA Toolbox is a comprehensive desktop software suite of interoperable COBRA methods. It has found widespread application in biology, biomedicine, and biotechnology because its functions can be flexibly combined to implement tailored COBRA protocols for any biochemical network. This protocol is an update to the COBRA Toolbox v.1.0 and v.2.0. Version 3.0 includes new methods for quality-controlled reconstruction, modeling, topological analysis, strain and experimental design, and network visualization, as well as network integration of chemoinformatic, metabolomic, transcriptomic, proteomic, and thermochemical data. New multi-lingual code integration also enables an expansion in COBRA application scope via high-precision, high-performance, and nonlinear numerical optimization solvers for multi-scale, multi-cellular, and reaction kinetic modeling, respectively. This protocol provides an overview of all these new features and can be adapted to generate and analyze constraint-based models in a wide variety of scenarios. The COBRA Toolbox v.3.0 provides an unparalleled depth of COBRA methods.
Affiliation(s)
- Laurent Heirendt: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Sylvain Arreckx: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Thomas Pfau: Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg
- Sebastián N Mendoza: Center for Genome Regulation (Fondap 15090007), University of Chile, Santiago, Chile; Mathomics, Center for Mathematical Modeling, University of Chile, Santiago, Chile
- Anne Richelle: Department of Pediatrics, University of California, San Diego, School of Medicine, La Jolla, CA, USA
- Almut Heinken: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Hulda S Haraldsdóttir: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Jacek Wachowiak: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Sarah M Keating: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Vanja Vlasov: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Stefania Magnusdóttir: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Chiam Yu Ng: Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA
- German Preciat: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Alise Žagare: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Siu H J Chan: Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA
- Maike K Aurich: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Catherine M Clancy: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Jennifer Modamio: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- John T Sauls: Department of Physics, and Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Alberto Noronha: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Benjamin Cousins: Algorithms and Randomness Center, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
- Diana C El Assal: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Luis V Valcarcel: Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián, Spain
- Iñigo Apaolaza: Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián, Spain
- Susan Ghaderi: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Masoud Ahookhosh: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Marouen Ben Guebila: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Andrejs Kostromins: Institute of Microbiology and Biotechnology, University of Latvia, Riga, Latvia
- Nicolas Sompairac: Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France
- Hoai M Le: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Ding Ma: Department of Management Science and Engineering, Stanford University, Stanford, CA, USA
- Yuekai Sun: Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Lin Wang: Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA
- James T Yurkovich: Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Miguel A P Oliveira: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Phan T Vuong: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Lemmer P El Assal: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Inna Kuperstein: Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France
- Andrei Zinovyev: Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France
- H Scott Hinton: Utah State University Research Foundation, North Logan, UT, USA
- William A Bryant: Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
- Francisco J Planes: Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián, Spain
- Egils Stalidzans: Institute of Microbiology and Biotechnology, University of Latvia, Riga, Latvia
- Alejandro Maass: Center for Genome Regulation (Fondap 15090007), University of Chile, Santiago, Chile; Mathomics, Center for Mathematical Modeling, University of Chile, Santiago, Chile
- Santosh Vempala: Algorithms and Randomness Center, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
- Michael Hucka: Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Michael A Saunders: Department of Management Science and Engineering, Stanford University, Stanford, CA, USA
- Costas D Maranas: Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA
- Nathan E Lewis: Department of Pediatrics, University of California, San Diego, School of Medicine, La Jolla, CA, USA; Novo Nordisk Foundation Center for Biosustainability, University of California, San Diego, La Jolla, CA, USA
- Thomas Sauter: Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg
- Bernhard Ø Palsson: Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA; Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Lyngby, Denmark
- Ines Thiele: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Ronan M T Fleming: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg; Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands
|
17
|
Castillo S, Patil KR, Jouhten P. Yeast Genome-Scale Metabolic Models for Simulating Genotype-Phenotype Relations. Progress in Molecular and Subcellular Biology 2019; 58:111-133. [PMID: 30911891 DOI: 10.1007/978-3-030-13035-0_5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/09/2023]
Abstract
Understanding genotype-phenotype dependency is a universal aim for all life sciences. While the complete genotype-phenotype relations remain challenging to resolve, metabolic phenotypes are moving within the reach through genome-scale metabolic model simulations. Genome-scale metabolic models are available for commonly investigated yeasts, such as model eukaryote and domesticated fermentation species Saccharomyces cerevisiae, and automatic reconstruction methods facilitate obtaining models for any sequenced species. The models allow for investigating genotype-phenotype relations through simulations simultaneously considering the effects of nutrient availability, and redox and energy homeostasis in cells. Genome-scale models also offer frameworks for omics data integration to help to uncover how the translation of genotypes to the apparent phenotypes is regulated at different levels. In this chapter, we provide an overview of the yeast genome-scale metabolic models and the simulation approaches for using these models to interrogate genotype-phenotype relations. We review the methodological approaches according to the underlying biological reasoning in order to inspire formulating novel questions and applications that the genome-scale metabolic models could contribute to. Finally, we discuss current challenges and opportunities in the genome-scale metabolic model simulations.
Affiliation(s)
- Sandra Castillo: VTT Technical Research Centre of Finland Ltd., Tietotie 2, 02044, Espoo, Finland
- Kiran Raosaheb Patil: European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117, Heidelberg, Germany
- Paula Jouhten: VTT Technical Research Centre of Finland Ltd., Tietotie 2, 02044, Espoo, Finland
|
18
|
Karp PD, Ong WK, Paley S, Billington R, Caspi R, Fulcher C, Kothari A, Krummenacker M, Latendresse M, Midford PE, Subhraveti P, Gama-Castro S, Muñiz-Rascado L, Bonavides-Martinez C, Santos-Zavaleta A, Mackie A, Collado-Vides J, Keseler IM, Paulsen I. The EcoCyc Database. EcoSal Plus 2018; 8:10.1128/ecosalplus.ESP-0006-2018. [PMID: 30406744 PMCID: PMC6504970 DOI: 10.1128/ecosalplus.esp-0006-2018] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 06/05/2018] [Indexed: 01/28/2023]
Abstract
EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed via EcoCyc.org. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.
Affiliation(s)
- Peter D Karp: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Wai Kit Ong: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Suzanne Paley: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Ron Caspi: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Carol Fulcher: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Anamika Kothari: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Mario Latendresse: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Peter E Midford: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Socorro Gama-Castro: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Luis Muñiz-Rascado: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- César Bonavides-Martinez: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Alberto Santos-Zavaleta: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Amanda Mackie: Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Julio Collado-Vides: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
- Ingrid M Keseler: Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
- Ian Paulsen: Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
|
19
|
Tamura T, Lu W, Song J, Akutsu T. Computing Minimum Reaction Modifications in a Boolean Metabolic Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1853-1862. [PMID: 29989991 DOI: 10.1109/tcbb.2017.2777456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
In metabolic network modification, we add new enzymes and/or knock out genes to maximize biomass production with minimum side effects. Although this problem has been studied for various problem settings via mathematical models including flux balance analysis, elementary mode, and Boolean models, some important problem settings still remain to be studied. In this paper, we consider the Boolean Reaction Modification (BRM) problem, where a host metabolic network and a reference metabolic network are given in the Boolean model. The host network initially produces some toxic compounds and cannot produce some necessary compounds, whereas the reference network can produce the necessary compounds. The goal is to minimize the total number of reactions removed from the host network and reactions added from the reference network so that, in the resulting host network, the toxic compounds are not producible but the necessary compounds are. We developed integer linear programming (ILP)-based methods for BRM, and compared them with OptStrain and SimOptStrain. The results show that our method performed better for reducing the total number of added and removed reactions, while OptStrain and SimOptStrain performed better for optimizing the production of the target compound. Our developed software is freely available at http://sunflower.kuicr.kyoto-u.ac.jp/~rogi/solBRM/solBRM.html.
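The feasibility condition in the BRM problem can be illustrated with a small Python sketch. The abstract does not spell out the Boolean semantics, so the sketch assumes a common convention: a reaction fires when all of its substrates are producible, and a compound is producible when at least one reaction yields it. The function `producible`, the compound names, and the toy networks are hypothetical, and the ILP optimization itself is not reproduced; only one candidate modification is checked.

```python
def producible(reactions, seeds):
    """Fixed-point closure under the assumed Boolean semantics:
    a reaction fires once all of its substrates are producible."""
    have = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= have and not set(products) <= have:
                have |= set(products)
                changed = True
    return have


# Toy networks: each reaction is (substrates, products).
host = [(("A",), ("B",)), (("B",), ("TOX",))]
reference = [(("A",), ("N",))]

# Candidate modification: remove B -> TOX from the host, add A -> N.
modified = [host[0]] + reference
before = producible(host, {"A"})
after = producible(modified, {"A"})
print("TOX" in before, "N" in before)  # host makes the toxin, lacks N
print("TOX" in after, "N" in after)    # modified host: no toxin, N made
```

In this toy case the modification uses one removal and one addition, which is the quantity the BRM formulation seeks to minimize across all feasible modifications.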
|
20
|
Xu X, Zarecki R, Medina S, Ofaim S, Liu X, Chen C, Hu S, Brom D, Gat D, Porob S, Eizenberg H, Ronen Z, Jiang J, Freilich S. Modeling microbial communities from atrazine contaminated soils promotes the development of biostimulation solutions. ISME Journal 2018; 13:494-508. [PMID: 30291327 DOI: 10.1038/s41396-018-0288-5] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Received: 01/24/2018] [Revised: 09/10/2018] [Accepted: 09/14/2018] [Indexed: 12/26/2022]
Abstract
Microbial communities play a vital role in biogeochemical cycles, allowing the biodegradation of a wide range of pollutants. The composition of the community and the interactions between its members affect degradation rate and determine the identity of the final products. Here, we demonstrate the application of sequencing technologies and metabolic modeling approaches towards enhancing biodegradation of atrazine, a herbicide causing environmental pollution. Treatment of agricultural soil with atrazine is shown to induce significant changes in community structure and functional performance. Genome-scale metabolic models were constructed for Arthrobacter, the atrazine degrader, and four other non-atrazine-degrading species whose relative abundance in soil changed following exposure to the herbicide. By modeling community function, we show that consortia including the direct degrader and differentially abundant non-degrader species perform better than Arthrobacter alone. Simulations predict that growth and degradation enhancement derives from metabolic exchanges between community members. Based on simulations, we designed endogenous consortia optimized for enhanced degradation, whose performance was validated in vitro, and biostimulation strategies that were tested in pot experiments. Overall, our analysis demonstrates that understanding community function in its wider context, beyond the single direct degrader perspective, promotes the design of biostimulation strategies.
Collapse
Affiliation(s)
- Xihui Xu
- Department of Microbiology, Key Lab of Microbiology for Agricultural Environment, Ministry of Agriculture, College of Life Sciences, Nanjing Agricultural University, Nanjing, 210095, China.,Newe Ya'ar Research Center, Agricultural Research Organization, P.O. Box 1021, Ramat Yishay, 30095, Israel
| | - Raphy Zarecki
- Newe Ya'ar Research Center, Agricultural Research Organization, P.O. Box 1021, Ramat Yishay, 30095, Israel
| | - Shlomit Medina
- Newe Ya'ar Research Center, Agricultural Research Organization, P.O. Box 1021, Ramat Yishay, 30095, Israel
| | - Shany Ofaim
- Newe Ya'ar Research Center, Agricultural Research Organization, P.O. Box 1021, Ramat Yishay, 30095, Israel.,Faculty of Biotechnology and Food Engineering, Technion-Israel Institute of Technology, Haifa, Israel
| | - Xiaowei Liu
- Department of Microbiology, Key Lab of Microbiology for Agricultural Environment, Ministry of Agriculture, College of Life Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Chen Chen
- Department of Microbiology, Key Lab of Microbiology for Agricultural Environment, Ministry of Agriculture, College of Life Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Shunli Hu
- Department of Microbiology, Key Lab of Microbiology for Agricultural Environment, Ministry of Agriculture, College of Life Sciences, Nanjing Agricultural University, Nanjing, 210095, China
| | - Dan Brom
- Newe Ya'ar Research Center, Agricultural Research Organization, P.O. Box 1021, Ramat Yishay, 30095, Israel
| | - Daniella Gat
- Department of Environmental Hydrology and Microbiology, The Zuckerberg Institute for Water Research, Ben-Gurion University of the Negev, Sede-Boqer Campus, Sede-Boqer, 8499000, Israel
| | - Seema Porob
- Department of Environmental Hydrology and Microbiology, The Zuckerberg Institute for Water Research, Ben-Gurion University of the Negev, Sede-Boqer Campus, Sede-Boqer, 8499000, Israel
| | - Hanan Eizenberg
- Newe Ya'ar Research Center, Agricultural Research Organization, P.O. Box 1021, Ramat Yishay, 30095, Israel
| | - Zeev Ronen
- Department of Environmental Hydrology and Microbiology, The Zuckerberg Institute for Water Research, Ben-Gurion University of the Negev, Sede-Boqer Campus, Sede-Boqer, 8499000, Israel
| | - Jiandong Jiang
- Department of Microbiology, Key Lab of Microbiology for Agricultural Environment, Ministry of Agriculture, College of Life Sciences, Nanjing Agricultural University, Nanjing, 210095, China.
| | - Shiri Freilich
- Newe Ya'ar Research Center, Agricultural Research Organization, P.O. Box 1021, Ramat Yishay, 30095, Israel.
| |
|
21
|
Methods for automated genome-scale metabolic model reconstruction. Biochem Soc Trans 2018; 46:931-936. [PMID: 30065105 DOI: 10.1042/bst20170246] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 12/21/2017] [Revised: 06/04/2018] [Accepted: 06/06/2018] [Indexed: 11/17/2022]
Abstract
In the era of next-generation sequencing and ubiquitous assembly and binning of metagenomes, new putative genome sequences are being produced from isolate and microbiome samples at ever-increasing rates. Genome-scale metabolic models have enormous utility for supporting the analysis and predictive characterization of these genomes based on sequence data. As a result, tools for rapid automated reconstruction of metabolic models are becoming critically important for supporting the analysis of new genome sequences. Many tools and algorithms have now emerged to support rapid model reconstruction and analysis. Here, we compare and contrast the capabilities and output of a variety of these tools, including ModelSEED, Raven Toolbox, PathwayTools, SuBliMinal Toolbox and merlin.
|
22
|
Karp PD, Weaver D, Latendresse M. How accurate is automated gap filling of metabolic models? BMC Syst Biol 2018; 12:73. [PMID: 29914471 PMCID: PMC6006690 DOI: 10.1186/s12918-018-0593-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 02/27/2018] [Accepted: 05/31/2018] [Indexed: 12/20/2022]
Abstract
Background: Reaction gap filling is a computational technique for proposing the addition of reactions to genome-scale metabolic models to permit those models to run correctly. Gap filling completes what are otherwise incomplete models that lack fully connected metabolic networks. The models are incomplete because they are derived from annotated genomes in which not all enzymes have been identified. Here we compare the results of applying an automated likelihood-based gap filler within the Pathway Tools software with the results of manually gap filling the same metabolic model. Both gap-filling exercises were applied to the same genome-derived qualitative metabolic reconstruction for Bifidobacterium longum subsp. longum JCM 1217, and to the same modeling conditions: anaerobic growth under four nutrients producing 53 biomass metabolites. Results: The solution computed by the gap-filling program GenDev contained 12 reactions, but closer examination showed that solution was not minimal; two of the twelve reactions can be removed to yield a set of ten reactions that enable model growth. The manually curated solution contained 13 reactions, eight of which were shared with the 12-reaction computed solution. Thus, GenDev achieved recall of 61.5% and precision of 66.6%. These results suggest that although computational gap fillers are populating metabolic models with significant numbers of correct reactions, automatically gap-filled metabolic models also contain significant numbers of incorrect reactions. Conclusions: Our conclusion is that manual curation of gap-filler results is needed to obtain high-accuracy models. Many of the differences between the manual and automatic solutions resulted from using expert biological knowledge to direct the choice of reactions within the curated solution, such as reactions specific to the anaerobic lifestyle of B. longum.
Electronic supplementary material The online version of this article (10.1186/s12918-018-0593-7) contains supplementary material, which is available to authorized users.
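The recall and precision figures quoted in this abstract follow directly from the three counts it reports (12 computed reactions, 13 curated reactions, 8 shared). The sketch below recomputes them; the reaction identifiers are hypothetical placeholders invented only to reproduce those counts and are not taken from the paper. Note that 8/12 rounds to 66.7%, so the 66.6% in the abstract appears to be a truncated value.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted reaction set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    shared = predicted & reference
    return len(shared) / len(predicted), len(shared) / len(reference)


# Hypothetical reaction IDs, chosen only to match the counts in the abstract:
# 12 computed reactions, 13 manually curated reactions, 8 in common.
shared = {f"shared_{i}" for i in range(8)}
computed = shared | {f"auto_only_{i}" for i in range(4)}
curated = shared | {f"manual_only_{i}" for i in range(5)}

precision, recall = precision_recall(computed, curated)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
# prints: precision = 0.667, recall = 0.615
```

Treating the curated solution as the reference set is the convention implied by the abstract; swapping the two arguments swaps precision and recall.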
Affiliation(s)
- Peter D Karp
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, 94025, USA.
| | - Daniel Weaver
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, 94025, USA
| | - Mario Latendresse
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, 94025, USA
| |
|
23
|
Latendresse M, Karp PD. Evaluation of reaction gap-filling accuracy by randomization. BMC Bioinformatics 2018; 19:53. [PMID: 29444634 PMCID: PMC5813426 DOI: 10.1186/s12859-018-2050-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/29/2017] [Accepted: 01/31/2018] [Indexed: 12/18/2022]
Abstract
Background: Completion of genome-scale flux-balance models using computational reaction gap-filling is a widely used approach, but its accuracy is not well known. Results: We report on computational experiments of reaction gap filling in which we generated degraded versions of the EcoCyc-20.0-GEM model by randomly removing flux-carrying reactions from a growing model. We gap-filled the degraded models and compared the resulting gap-filled models with the original model. Gap-filling was performed by the Pathway Tools MetaFlux software using its General Development Mode (GenDev) and its Fast Development Mode (FastDev). We explored 12 GenDev variants including two linear solvers (SCIP and CPLEX) for solving the Mixed Integer Linear Programming (MILP) problems for gap filling; three different sets of linear constraints were applied; and two MILP methods were implemented. We compared these 13 variants according to accuracy, speed, and amount of information returned to the user. Conclusions: We observed large variation among the performance of the 13 gap-filling variants. Although no variant was best in all dimensions, we found one variant that was fast, accurate, and returned more information to the user. Some gap-filling variants were inaccurate, producing solutions that were non-minimum or invalid (did not enable model growth). The best GenDev variant showed a best average precision of 87% and a best average recall of 61%. FastDev showed an average precision of 71% and an average recall of 59%. Thus, using the most accurate variant, approximately 13% of the gap-filled reactions were incorrect (were not the reactions removed from the model), and 39% of gap-filled reactions were not found, suggesting that curation is still an important aspect of metabolic-model development.
Affiliation(s)
- Mario Latendresse
- SRI International/Artificial Intelligence Center, 333 Ravenswood Ave, Menlo Park, 94025, USA.
| | - Peter D Karp
- SRI International/Artificial Intelligence Center, 333 Ravenswood Ave, Menlo Park, 94025, USA
| |
|
24
|
Goldberg AP, Szigeti B, Chew YH, Sekar JA, Roth YD, Karr JR. Emerging whole-cell modeling principles and methods. Curr Opin Biotechnol 2017; 51:97-102. [PMID: 29275251 DOI: 10.1016/j.copbio.2017.12.013] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 08/24/2017] [Revised: 12/07/2017] [Accepted: 12/11/2017] [Indexed: 11/16/2022]
Abstract
Whole-cell computational models aim to predict cellular phenotypes from genotype by representing the entire genome, the structure and concentration of each molecular species, each molecular interaction, and the extracellular environment. Whole-cell models have great potential to transform bioscience, bioengineering, and medicine. However, numerous challenges remain to achieve whole-cell models. Nevertheless, researchers are beginning to leverage recent progress in measurement technology, bioinformatics, data sharing, rule-based modeling, and multi-algorithmic simulation to build the first whole-cell models. We anticipate that ongoing efforts to develop scalable whole-cell modeling tools will enable dramatically more comprehensive and more accurate models, including models of human cells.
Affiliation(s)
- Arthur P Goldberg
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Balázs Szigeti
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Yin Hoon Chew
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - John Ap Sekar
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Yosef D Roth
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Jonathan R Karr
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
| |
|
25
|
Johnson SR, Lange I, Srividya N, Lange BM. Bioenergetics of Monoterpenoid Essential Oil Biosynthesis in Nonphotosynthetic Glandular Trichomes. Plant Physiol 2017; 175:681-695. [PMID: 28838953 PMCID: PMC5619891 DOI: 10.1104/pp.17.00551] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/21/2017] [Accepted: 08/22/2017] [Indexed: 05/10/2023]
Abstract
The commercially important essential oils of peppermint (Mentha × piperita) and its relatives in the mint family (Lamiaceae) are accumulated in specialized anatomical structures called glandular trichomes (GTs). A genome-scale stoichiometric model of secretory phase metabolism in peppermint GTs was constructed based on current biochemical and physiological knowledge. Fluxes through the network were predicted based on metabolomic and transcriptomic data. Using simulated reaction deletions, this model predicted that two processes, the regeneration of ATP and ferredoxin (in its reduced form), exert substantial control over flux toward monoterpenes. Follow-up biochemical assays with isolated GTs indicated that oxidative phosphorylation and ethanolic fermentation were active and that cooperation to provide ATP depended on the concentration of the carbon source. We also report that GTs with high flux toward monoterpenes express, at very high levels, genes coding for a unique pair of ferredoxin and ferredoxin-NADP+ reductase isoforms. This study provides, to our knowledge, the first evidence of how bioenergetic processes determine flux through monoterpene biosynthesis in GTs.
Affiliation(s)
- Sean R Johnson
- Institute of Biological Chemistry and M.J. Murdock Metabolomics Laboratory, Washington State University, Pullman, Washington 99164-6340
| | - Iris Lange
- Institute of Biological Chemistry and M.J. Murdock Metabolomics Laboratory, Washington State University, Pullman, Washington 99164-6340
| | - Narayanan Srividya
- Institute of Biological Chemistry and M.J. Murdock Metabolomics Laboratory, Washington State University, Pullman, Washington 99164-6340
| | - B Markus Lange
- Institute of Biological Chemistry and M.J. Murdock Metabolomics Laboratory, Washington State University, Pullman, Washington 99164-6340
| |
|
26
|
Gerritsen J, Hornung B, Renckens B, van Hijum SA, Martins dos Santos VA, Rijkers GT, Schaap PJ, de Vos WM, Smidt H. Genomic and functional analysis of Romboutsia ilealis CRIB T reveals adaptation to the small intestine. PeerJ 2017; 5:e3698. [PMID: 28924494 PMCID: PMC5598433 DOI: 10.7717/peerj.3698] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 04/06/2017] [Accepted: 07/26/2017] [Indexed: 12/03/2022]
Abstract
BACKGROUND: The microbiota in the small intestine relies on its capacity to rapidly import and ferment available carbohydrates to survive in a complex and highly competitive ecosystem. Understanding how these communities function requires elucidating the role of their key players, the interactions among them and with their environment/host. METHODS: The genome of the gut bacterium Romboutsia ilealis CRIBT was sequenced with multiple technologies (Illumina paired-end, mate-pair and PacBio). The transcriptome was sequenced (Illumina HiSeq) after growth on three different carbohydrate sources, and short chain fatty acids were measured via HPLC. RESULTS: We present the complete genome of Romboutsia ilealis CRIBT, a natural inhabitant and key player of the small intestine of rats. R. ilealis CRIBT possesses a circular chromosome of 2,581,778 bp and a plasmid of 6,145 bp, carrying 2,351 and eight predicted protein coding sequences, respectively. Analysis of the genome revealed limited capacity to synthesize amino acids and vitamins, whereas multiple and partially redundant pathways for the utilization of different relatively simple carbohydrates are present. Transcriptome analysis allowed identification of the key components in the degradation of glucose, L-fucose and fructo-oligosaccharides. DISCUSSION: This revealed that R. ilealis CRIBT is adapted to a nutrient-rich environment where carbohydrates, amino acids and vitamins are abundantly available.
Affiliation(s)
- Jacoline Gerritsen
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, The Netherlands
- Winclove Probiotics, Amsterdam, The Netherlands
| | - Bastian Hornung
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, The Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
| | - Bernadette Renckens
- Nijmegen Centre for Molecular Life Sciences, CMBI, Radboud UMC, Nijmegen, The Netherlands
| | - Sacha A.F.T. van Hijum
- Nijmegen Centre for Molecular Life Sciences, CMBI, Radboud UMC, Nijmegen, The Netherlands
- NIZO, Ede, The Netherlands
| | - Vitor A.P. Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
- LifeGlimmer GmbH, Berlin, Germany
| | - Ger T. Rijkers
- Laboratory for Medical Microbiology and Immunology, St. Antonius Hospital, Nieuwegein, The Netherlands
- Department of Science, University College Roosevelt, Middelburg, The Netherlands
| | - Peter J. Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
| | - Willem M. de Vos
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, The Netherlands
- Departments of Microbiology and Immunology and Veterinary Biosciences, University of Helsinki, Helsinki, Finland
| | - Hauke Smidt
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, The Netherlands
| |
|
27
|
Hahn AS, Altman T, Konwar KM, Hanson NW, Kim D, Relman DA, Dill DL, Hallam SJ. A geographically-diverse collection of 418 human gut microbiome pathway genome databases. Sci Data 2017; 4:170035. [PMID: 28398290 PMCID: PMC5387927 DOI: 10.1038/sdata.2017.35] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/03/2016] [Accepted: 02/10/2017] [Indexed: 01/16/2023]
Abstract
Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn's disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools.
Affiliation(s)
- Aria S Hahn
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada; Koonkie Inc., Menlo Park, California 94025, USA
| | - Tomer Altman
- Biomedical Informatics, Stanford University School of Medicine, Stanford, California 94305, USA; Whole Biome, Inc., 953 Indiana Street, San Francisco, California 94107, USA
| | - Kishori M Konwar
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada; Koonkie Inc., Menlo Park, California 94025, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Niels W Hanson
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
| | - Dongjae Kim
- Department of Computer Science, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
| | - David A Relman
- Department of Microbiology and Immunology, Stanford University School of Medicine, 299 Campus Drive, Stanford, California 94305, USA; Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA; Veterans Affairs Palo Alto Health Care System, Palo Alto, California 94304, USA
| | - David L Dill
- Department of Computer Science, Stanford University, Stanford, California 94305, USA
| | - Steven J Hallam
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada; Koonkie Inc., Menlo Park, California 94025, USA; Ecosystem Services, Commercialization and Entrepreneurship (ECOSCOPE), University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
| |
|
28
|
Oyetunde T, Zhang M, Chen Y, Tang Y, Lo C. BoostGAPFILL: improving the fidelity of metabolic network reconstructions through integrated constraint and pattern-based methods. Bioinformatics 2017; 33:608-611. [PMID: 27797784 DOI: 10.1093/bioinformatics/btw684] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/19/2016] [Accepted: 10/25/2016] [Indexed: 11/13/2022]
Abstract
Motivation: Metabolic network reconstructions are often incomplete. Constraint-based and pattern-based methodologies have been used for automated gap filling of these networks, each with its own strengths and weaknesses. Moreover, since validation of hypotheses made by gap filling tools requires experimentation, it is challenging to benchmark performance and make improvements other than those related to speed and scalability. Results: We present BoostGAPFILL, an open source tool that leverages both constraint-based and machine learning methodologies for hypothesis generation in gap filling and metabolic model refinement. BoostGAPFILL uses metabolite patterns in the incomplete network, captured using a matrix factorization formulation, to constrain the set of reactions used to fill gaps in a metabolic network. We formulate a testing framework based on the available metabolic reconstructions and demonstrate the superiority of BoostGAPFILL to state-of-the-art gap filling tools. We randomly delete a number of reactions from a metabolic network and rate the different algorithms on their ability to both predict the deleted reactions from a universal set and to fill gaps. For most metabolic network reconstructions tested, BoostGAPFILL shows above 60% precision and recall, which is more than twice that of other existing tools. Availability and Implementation: MATLAB open source implementation (https://github.com/Tolutola/BoostGAPFILL). Contacts: toyetunde@wustl.edu or muhan@wustl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Muhan Zhang
- Department of Computer Science and Engineering, Washington University, Saint Louis, MO 63130, USA
| | - Yixin Chen
- Department of Computer Science and Engineering, Washington University, Saint Louis, MO 63130, USA
| | - Yinjie Tang
- Department of Energy, Environmental and Chemical Engineering
| | - Cynthia Lo
- Department of Energy, Environmental and Chemical Engineering
| |
|
29
|
Bogart E, Myers CR. Multiscale Metabolic Modeling of C4 Plants: Connecting Nonlinear Genome-Scale Models to Leaf-Scale Metabolism in Developing Maize Leaves. PLoS One 2016; 11:e0151722. [PMID: 26990967 PMCID: PMC4807923 DOI: 10.1371/journal.pone.0151722] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 10/20/2015] [Accepted: 03/03/2016] [Indexed: 11/18/2022]
Abstract
C4 plants, such as maize, concentrate carbon dioxide in a specialized compartment surrounding the veins of their leaves to improve the efficiency of carbon dioxide assimilation. Nonlinear relationships between carbon dioxide and oxygen levels and reaction rates are key to their physiology but cannot be handled with standard techniques of constraint-based metabolic modeling. We demonstrate that incorporating these relationships as constraints on reaction rates and solving the resulting nonlinear optimization problem yields realistic predictions of the response of C4 systems to environmental and biochemical perturbations. Using a new genome-scale reconstruction of maize metabolism, we build an 18000-reaction, nonlinearly constrained model describing mesophyll and bundle sheath cells in 15 segments of the developing maize leaf, interacting via metabolite exchange, and use RNA-seq and enzyme activity measurements to predict spatial variation in metabolic state by a novel method that optimizes correlation between fluxes and expression data. Though such correlations are known to be weak in general, we suggest that developmental gradients may be particularly suited to the inference of metabolic fluxes from expression data, and we demonstrate that our method predicts fluxes that achieve high correlation with the data, successfully capture the experimentally observed base-to-tip transition between carbon-importing tissue and carbon-exporting tissue, and include a nonzero growth rate, in contrast to prior results from similar methods in other systems.
Affiliation(s)
- Eli Bogart
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY, United States of America
- Institute of Biotechnology, Cornell University, Ithaca, NY, United States of America
| | - Christopher R. Myers
- Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY, United States of America
- Institute of Biotechnology, Cornell University, Ithaca, NY, United States of America
| |
|
30
|
Weber T, Kim HU. The secondary metabolite bioinformatics portal: Computational tools to facilitate synthetic biology of secondary metabolite production. Synth Syst Biotechnol 2016; 1:69-79. [PMID: 29062930 PMCID: PMC5640684 DOI: 10.1016/j.synbio.2015.12.002] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Received: 10/16/2015] [Revised: 12/10/2015] [Accepted: 12/26/2015] [Indexed: 01/02/2023]
Abstract
Natural products are among the most important sources of lead molecules for drug discovery. With the development of affordable whole-genome sequencing technologies and other ‘omics tools, the field of natural products research is currently undergoing a shift in paradigms. While, for decades, mainly analytical and chemical methods gave access to this group of compounds, nowadays genomics-based methods offer complementary approaches to find, identify and characterize such molecules. This paradigm shift has also resulted in a high demand for computational tools to assist researchers in their daily work. In this context, this review gives a summary of tools and databases that are currently available to mine, identify and characterize natural product biosynthesis pathways and their producers based on ‘omics data. A web portal called the Secondary Metabolite Bioinformatics Portal (SMBP at http://www.secondarymetabolites.org) is introduced to provide a one-stop catalog and links to these bioinformatics resources. In addition, an outlook is presented on how the existing tools and those to be developed will influence synthetic biology approaches in the natural products field.
Key Words
- A, adenylation domain
- Antibiotics
- BGC, biosynthetic gene cluster
- Bioinformatics
- Biosynthesis
- C, condensation domain
- GPR, gene-protein-reaction
- HMM, hidden Markov model
- LC, liquid chromatography
- MS, mass spectrometry
- NMR, nuclear magnetic resonance
- NRP, non-ribosomally synthesized peptide
- NRPS
- NRPS, non-ribosomal peptide synthetase
- Natural product
- PCP, peptidyl carrier protein
- PK, polyketide
- PKS
- PKS, polyketide synthase
- RiPP, ribosomally and post-translationally modified peptide
- SVM, support vector machine
Affiliation(s)
- Tilmann Weber
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kogle Alle 6, 2970 Hørsholm, Denmark
| | - Hyun Uk Kim
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kogle Alle 6, 2970 Hørsholm, Denmark; BioInformatics Research Center, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
| |
|
31
|
Ponce-de-Leon M, Calle-Espinosa J, Peretó J, Montero F. Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach. PLoS One 2015; 10:e0143626. [PMID: 26629901 PMCID: PMC4668087 DOI: 10.1371/journal.pone.0143626] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/02/2015] [Accepted: 11/06/2015] [Indexed: 01/10/2023]
Abstract
Genome-scale metabolic models usually contain inconsistencies that manifest as blocked reactions and gap metabolites. To detect recurrent inconsistencies in metabolic models, a large-scale analysis was performed using a previously published dataset of 130 genome-scale models. The results showed that a large number of reactions (~22%) are blocked in all the models where they are present. To unravel the nature of such inconsistencies, a metamodel was constructed by joining the 130 models in a single network. This metamodel was manually curated using the unconnected modules approach and then used as a reference network to perform gap-filling on each individual genome-scale model. Finally, a set of 36 models that had not been considered during the construction of the metamodel was used, as a proof of concept, to extend the metamodel with new biochemical information and to assess its impact on gap-filling results. The analysis performed on the metamodel supports four conclusions: 1) the recurrent inconsistencies found in the models were already present in the metabolic database used during the reconstruction process; 2) inconsistencies in a metabolic database can be propagated to the reconstructed models; 3) some reactions that do not appear blocked are active only as a consequence of certain classes of artifacts; and 4) the results of automatic gap-filling are highly dependent on the consistency and completeness of the metamodel or metabolic database used as the reference network. In conclusion, consistency analysis should be applied to metabolic databases in order to detect and fill gaps as well as to detect and remove artifacts and redundant information.
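To make the notion of a gap metabolite concrete, the sketch below flags metabolites that no pair of distinct reactions can both produce and consume. This is a purely topological simplification assumed here for illustration; it is not the flux-based consistency analysis or the unconnected modules approach used in the paper, and the toy network and function name are hypothetical.

```python
from collections import defaultdict


def dead_end_metabolites(reactions, exchanged=()):
    """Return metabolites that cannot be both produced and consumed by distinct reactions.

    reactions: dict mapping reaction id -> (substrates, products, reversible)
    exchanged: metabolites allowed to cross the system boundary (never flagged)
    """
    produced, consumed = defaultdict(set), defaultdict(set)
    for rid, (substrates, products, reversible) in reactions.items():
        for m in substrates:
            consumed[m].add(rid)
            if reversible:
                produced[m].add(rid)
        for m in products:
            produced[m].add(rid)
            if reversible:
                consumed[m].add(rid)

    def can_balance(m):
        # A steady state needs one reaction making m and a different one using it.
        return any(p != c for p in produced[m] for c in consumed[m])

    metabolites = set(produced) | set(consumed)
    return sorted(m for m in metabolites if m not in exchanged and not can_balance(m))


# Hypothetical toy network: A -> B, B -> C, B -> D, with A and C exchanged.
reactions = {
    "R1": (["A"], ["B"], False),
    "R2": (["B"], ["C"], False),
    "R3": (["B"], ["D"], False),
}
exchanged = {"A", "C"}
print(dead_end_metabolites(reactions, exchanged))
# prints: ['D']
```

Here D is produced by R3 but never consumed, so R3 could not carry steady-state flux; a full analysis would confirm this with flux variability calculations rather than topology alone.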
Affiliation(s)
- Miguel Ponce-de-Leon
- Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain
| | - Jorge Calle-Espinosa
- Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain
| | - Juli Peretó
- Departament de Bioquímica i Biologia Molecular and Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, C/José Beltrán 2, Paterna 46980, Spain
| | - Francisco Montero
- Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain
| |
|
32
|
Abstract
Most natural microbial systems have evolved to function in environments with temporal and spatial variations. A major limitation to understanding such complex systems is the lack of mathematical modelling frameworks that connect the genomes of individual species and temporal and spatial variations in the environment to system behaviour. The goal of this review is to introduce the emerging field of spatiotemporal metabolic modelling based on genome-scale reconstructions of microbial metabolism. The extension of flux balance analysis (FBA) to account for both temporal and spatial variations in the environment is termed spatiotemporal FBA (SFBA). Following a brief overview of FBA and its established dynamic extension, the SFBA problem is introduced and recent progress is described. Three case studies are reviewed to illustrate the current state-of-the-art and possible future research directions are outlined. The author posits that SFBA is the next frontier for microbial metabolic modelling and a rapid increase in methods development and system applications is anticipated.
Affiliation(s)
- Michael A Henson
- Department of Chemical Engineering, University of Massachusetts, Amherst, MA 01003, U.S.A.
| |
|
33
|
Zomorrodi AR, Segrè D. Synthetic Ecology of Microbes: Mathematical Models and Applications. J Mol Biol 2015; 428:837-61. [PMID: 26522937 DOI: 10.1016/j.jmb.2015.10.019] [Citation(s) in RCA: 127] [Impact Index Per Article: 12.7] [Received: 06/23/2015] [Revised: 10/17/2015] [Accepted: 10/21/2015] [Indexed: 12/29/2022]
Abstract
As the indispensable role of natural microbial communities in many aspects of life on Earth is uncovered, the bottom-up engineering of synthetic microbial consortia with novel functions is becoming an attractive alternative to engineering single-species systems. Here, we summarize recent work on synthetic microbial communities with a particular emphasis on open challenges and opportunities in environmental sustainability and human health. We next provide a critical overview of mathematical approaches, ranging from phenomenological to mechanistic, to decipher the principles that govern the function, dynamics and evolution of microbial ecosystems. Finally, we present our outlook on key aspects of microbial ecosystems and synthetic ecology that require further developments, including the need for more efficient computational algorithms, a better integration of empirical methods and model-driven analysis, the importance of improving gene function annotation, and the value of a standardized library of well-characterized organisms to be used as building blocks of synthetic communities.
Affiliation(s)
- Daniel Segrè
- Bioinformatics Program, Boston University, Boston, MA; Department of Biology, Boston University, Boston, MA; Department of Biomedical Engineering, Boston University, Boston, MA.
|
34
|
Computational Methods for Modification of Metabolic Networks. Comput Struct Biotechnol J 2015; 13:376-81. [PMID: 26106462 PMCID: PMC4477032 DOI: 10.1016/j.csbj.2015.05.004] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 03/26/2015] [Revised: 05/20/2015] [Accepted: 05/24/2015] [Indexed: 11/22/2022]
Abstract
In metabolic engineering, modifying metabolic networks is both an important biotechnology and a challenging computational task. The aim is to modify a network by adding enzymes and/or knocking out genes so as to maximize biomass production with minimal side effects. In this mini-review, we briefly review constraint-based formalizations of the Minimum Reaction Cut (MRC) problem, in which a minimum set of reactions is deleted so that the target compound becomes non-producible, from the viewpoint of flux balance analysis (FBA), elementary mode (EM), and Boolean models. The Minimum Reaction Insertion (MRI) problem, in which a minimum set of reactions is added so that the target compound becomes producible, is explained with a similar formalization approach. The relation between the accuracy of the models and the risk of overfitting is also discussed.
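To make the Boolean-model view of the MRC problem concrete, here is a minimal brute-force sketch. It is not the formalization used in the mini-review (which discusses constraint-based formulations); the toy network, the reaction names r1 to r5, the seed compound A, and the target T are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy network: reaction name -> (substrates, products).
REACTIONS = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"A"}, {"C"}),
    "r3": ({"B"}, {"T"}),
    "r4": ({"C"}, {"T"}),
    "r5": ({"B", "C"}, {"D"}),
}
SEEDS = {"A"}   # compounds assumed to be available from the medium
TARGET = "T"    # compound that should become non-producible


def producible(reactions, seeds):
    """Boolean model: a reaction fires once all of its substrates are producible."""
    have = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= have and not products <= have:
                have |= products
                changed = True
    return have


def minimum_reaction_cut(reactions, seeds, target):
    """Return a smallest set of reactions whose deletion blocks the target."""
    names = sorted(reactions)
    # Try cuts of increasing size, so the first hit is a minimum cut.
    for size in range(len(names) + 1):
        for cut in combinations(names, size):
            kept = {n: reactions[n] for n in names if n not in cut}
            if target not in producible(kept, seeds):
                return set(cut)
    return None  # the target is itself a seed, so no deletion can block it


cut = minimum_reaction_cut(REACTIONS, SEEDS, TARGET)
print(sorted(cut))  # ['r1', 'r2']
```

In this toy network no single deletion blocks T, because T can be reached through either B or C. The search returns the first minimum cut in sorted order; deleting r3 and r4 would be an equally small cut. Exhaustive enumeration grows exponentially with network size, which is one reason practical approaches rely on optimization formulations instead.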
|
35
|
Land M, Hauser L, Jun SR, Nookaew I, Leuze MR, Ahn TH, Karpinets T, Lund O, Kora G, Wassenaar T, Poudel S, Ussery DW. Insights from 20 years of bacterial genome sequencing. Funct Integr Genomics 2015; 15:141-61. [PMID: 25722247 PMCID: PMC4361730 DOI: 10.1007/s10142-015-0433-4] [Citation(s) in RCA: 433] [Impact Index Per Article: 43.3] [Received: 01/19/2015] [Revised: 02/11/2015] [Accepted: 02/12/2015] [Indexed: 12/18/2022]
Abstract
Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
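The pan- and core-genome calculation mentioned in this abstract reduces to set operations once each genome is represented as a set of gene families. The sketch below assumes that representation; the strain names and family identifiers are hypothetical, and only the final ratio uses figures quoted in the abstract (about 3100 core families out of about 89,000 families in total).

```python
# Hypothetical gene-family content of three genomes (illustrative names only).
genomes = {
    "strain_1": {"famA", "famB", "famC", "famD"},
    "strain_2": {"famA", "famB", "famC", "famE"},
    "strain_3": {"famA", "famB", "famF"},
}


def pan_and_core(genomes):
    """Pan genome = union of all families; core genome = families shared by all."""
    family_sets = list(genomes.values())
    if not family_sets:
        return set(), set()
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


pan, core = pan_and_core(genomes)
print(len(pan), len(core))  # 6 2

# Figures quoted in the abstract for more than 2000 E. coli genomes.
core_fraction = 3100 / 89000
print(f"{core_fraction:.1%}")  # 3.5%
```

With the abstract's approximate figures, the core genome accounts for roughly 3.5% of all gene families observed across the compared E. coli genomes. Real analyses also need a clustering step to assign genes to families, which this sketch takes as given.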
Affiliation(s)
- Miriam Land
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Loren Hauser
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA
- Department of Microbiology, University of Tennessee, Knoxville, TN 37996 USA
- Se-Ran Jun
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Intawat Nookaew
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Michael R. Leuze
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Tae-Hyuk Ahn
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Tatiana Karpinets
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Ole Lund
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
- Guruprased Kora
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Trudy Wassenaar
- Molecular Microbiology and Genomics Consultants, Tannenstr 7, 55576 Zotzenheim, Germany
- Suresh Poudel
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
- David W. Ussery
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
- Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
|
36
|
Transparency in metabolic network reconstruction enables scalable biological discovery. Curr Opin Biotechnol 2015; 34:105-9. [PMID: 25562137 DOI: 10.1016/j.copbio.2014.12.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 11/24/2014] [Revised: 12/11/2014] [Accepted: 12/12/2014] [Indexed: 12/19/2022]
Abstract
Reconstructing metabolic pathways has long been a focus of active research. Now, draft models can be generated from genomic annotation and used to simulate metabolic fluxes of mass and energy at the whole-cell scale. This approach has led to an explosion in the number of functional metabolic network models. However, more models have not led to expanded coverage of metabolic reactions known to occur in the biosphere. Thus, there exists opportunity to reconsider the process of reconstruction and model derivation to better support the less-scalable investigative processes of biocuration and experimentation. Realizing this opportunity to improve our knowledge of metabolism requires developing new tools that make reconstructions more useful by highlighting metabolic network knowledge limitations to guide future research.
|
37
|
Systems Approaches to Study Infectious Diseases. SYSTEMS AND SYNTHETIC BIOLOGY 2015. [DOI: 10.1007/978-94-017-9514-2_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
38
|
Cazzaniga P, Damiani C, Besozzi D, Colombo R, Nobile MS, Gaglio D, Pescini D, Molinari S, Mauri G, Alberghina L, Vanoni M. Computational strategies for a system-level understanding of metabolism. Metabolites 2014; 4:1034-87. [PMID: 25427076 PMCID: PMC4279158 DOI: 10.3390/metabo4041034] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 08/01/2014] [Revised: 11/05/2014] [Accepted: 11/12/2014] [Indexed: 12/20/2022]
Abstract
Cell metabolism is the biochemical machinery that provides energy and building blocks to sustain life. Understanding its fine regulation is of pivotal relevance in several fields, from metabolic engineering applications to the treatment of metabolic disorders and cancer. Sophisticated computational approaches are needed to unravel the complexity of metabolism. To this aim, a plethora of methods have been developed, yet it is generally hard to identify which computational strategy is most suited for the investigation of a specific aspect of metabolism. This review provides an up-to-date description of the computational methods available for the analysis of metabolic pathways, discussing their main advantages and drawbacks. In particular, attention is devoted to the identification of the appropriate scale and level of accuracy in the reconstruction of metabolic networks, and to the inference of model structure and parameters, especially when dealing with a shortage of experimental measurements. The choice of the proper computational methods to derive in silico data is then addressed, including topological analyses, constraint-based modeling and simulation of the system dynamics. A description of some computational approaches to gain new biological knowledge or to formulate hypotheses is finally provided.
Affiliation(s)
- Paolo Cazzaniga
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Chiara Damiani
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Daniela Besozzi
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Riccardo Colombo
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Marco S Nobile
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Daniela Gaglio
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Dario Pescini
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Sara Molinari
- Dipartimento di Biotecnologie e Bioscienze, Università degli Studi di Milano-Bicocca, Piazza della Scienza 2, 20126 Milano, Italy.
- Giancarlo Mauri
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Lilia Alberghina
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
- Marco Vanoni
- SYSBIO Centre of Systems Biology, Piazza della Scienza 2, 20126 Milano, Italy.
|
39
|
Joice R, Yasuda K, Shafquat A, Morgan XC, Huttenhower C. Determining microbial products and identifying molecular targets in the human microbiome. Cell Metab 2014; 20:731-741. [PMID: 25440055 PMCID: PMC4254638 DOI: 10.1016/j.cmet.2014.10.003] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Indexed: 02/08/2023]
Abstract
Human-associated microbes are the source of many bioactive microbial products (proteins and metabolites) that play key functions both in human host pathways and in microbe-microbe interactions. Culture-independent studies now provide an accelerated means of exploring novel bioactives in the human microbiome; however, intriguingly, a substantial fraction of the microbial metagenome cannot be mapped to annotated genes or isolate genomes and is thus of unknown function. Meta'omic approaches, including metagenomic sequencing, metatranscriptomics, metabolomics, and integration of multiple assay types, represent an opportunity to efficiently explore this large pool of potential therapeutics. In combination with appropriate follow-up validation, high-throughput culture-independent assays can be combined with computational approaches to identify and characterize novel and biologically interesting microbial products. Here we briefly review the state of microbial product identification and characterization and discuss possible next steps to catalog and leverage the large uncharted fraction of the microbial metagenome.
Affiliation(s)
- Regina Joice
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Koji Yasuda
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Afrah Shafquat
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Xochitl C Morgan
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
40
|
Larocque M, Chénard T, Najmanovich R. A curated C. difficile strain 630 metabolic network: prediction of essential targets and inhibitors. BMC SYSTEMS BIOLOGY 2014; 8:117. [PMID: 25315994 PMCID: PMC4207893 DOI: 10.1186/s12918-014-0117-z] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 07/30/2014] [Accepted: 10/08/2014] [Indexed: 12/12/2022]
Abstract
BACKGROUND Clostridium difficile is the leading cause of hospital-borne infections occurring when the natural intestinal flora is depleted following antibiotic treatment. Current treatments for Clostridium difficile infections present high relapse rates and new hyper-virulent and multi-resistant strains are emerging, making the study of this nosocomial pathogen necessary to find novel therapeutic targets. RESULTS We present iMLTC806cdf, an extensively curated reconstructed metabolic network for the C. difficile pathogenic strain 630. iMLTC806cdf contains 806 genes, 703 metabolites and 769 metabolic, 117 exchange and 145 transport reactions. iMLTC806cdf is the most complete and accurate metabolic reconstruction of a Gram-positive anaerobic bacterium to date. We validate the model with simulated growth assays in different media and carbon sources and use it to predict essential genes. We obtain 89.2% accuracy in the prediction of gene essentiality when compared to experimental data for B. subtilis homologs (the closest organism for which such data exists). We predict the existence of 76 essential genes and 39 essential gene pairs, a number of which are unique to C. difficile and have non-existing or predicted non-essential human homologs. For 29 of these potential therapeutic targets, we find 125 inhibitors of homologous proteins including approved drugs with the potential for drug repositioning, that when validated experimentally could serve as starting points in the development of new antibiotics. CONCLUSIONS We created a highly curated metabolic network model of C. difficile strain 630 and used it to predict essential genes as potential new therapeutic targets in the fight against Clostridium difficile infections.
Affiliation(s)
- Mathieu Larocque
- Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, J1H 5N4, Canada.
- Thierry Chénard
- Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, J1H 5N4, Canada.
- Rafael Najmanovich
- Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, J1H 5N4, Canada.
|
41
|
Schmidt R, Waschina S, Boettger-Schmidt D, Kost C, Kaleta C. Computing autocatalytic sets to unravel inconsistencies in metabolic network reconstructions. Bioinformatics 2014; 31:373-81. [PMID: 25286919 DOI: 10.1093/bioinformatics/btu658] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/28/2023]
Abstract
MOTIVATION Genome-scale metabolic network reconstructions have been established as a powerful tool for the prediction of cellular phenotypes and metabolic capabilities of organisms. In recent years, the number of network reconstructions has been constantly increasing, mostly because of the availability of novel (semi-)automated procedures, which enabled the reconstruction of metabolic models based on individual genomes and their annotation. The resulting models are widely used in numerous applications. However, the accuracy and predictive power of network reconstructions are commonly limited by inherent inconsistencies and gaps. RESULTS Here we present a novel method to validate metabolic network reconstructions based on the concept of autocatalytic sets. Autocatalytic sets correspond to collections of metabolites that, besides enzymes and a growth medium, are required to produce all biomass components in a metabolic model. These autocatalytic sets are well-conserved across all domains of life, and their identification in specific genome-scale reconstructions allows us to draw conclusions about potential inconsistencies in these models. The method is capable of detecting inconsistencies, which are neglected by other gap-finding methods. We tested our method on the Model SEED, which is the largest repository for automatically generated genome-scale network reconstructions. In this way, we were able to identify a significant number of missing pathways in several of these reconstructions. Hence, the method we report represents a powerful tool to identify inconsistencies in large-scale metabolic networks. AVAILABILITY AND IMPLEMENTATION The method is available as source code on http://users.minet.uni-jena.de/∼m3kach/ASBIG/ASBIG.zip. CONTACT christoph.kaleta@uni-jena.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ralf Schmidt
- Research Group Theoretical Systems Biology, Faculty of Biology and Pharmacy, Friedrich Schiller University Jena, 07743 Jena, Department of Bioorganic Chemistry, Experimental Ecology and Evolution Research Group, Max Planck Institute for Chemical Ecology, 07745 Jena, Department of Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, 07745 Jena and Department of Computational Biology, Institute for Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
- Silvio Waschina
- Research Group Theoretical Systems Biology, Faculty of Biology and Pharmacy, Friedrich Schiller University Jena, 07743 Jena, Department of Bioorganic Chemistry, Experimental Ecology and Evolution Research Group, Max Planck Institute for Chemical Ecology, 07745 Jena, Department of Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, 07745 Jena and Department of Computational Biology, Institute for Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
- Daniela Boettger-Schmidt
- Research Group Theoretical Systems Biology, Faculty of Biology and Pharmacy, Friedrich Schiller University Jena, 07743 Jena, Department of Bioorganic Chemistry, Experimental Ecology and Evolution Research Group, Max Planck Institute for Chemical Ecology, 07745 Jena, Department of Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, 07745 Jena and Department of Computational Biology, Institute for Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
- Christian Kost
- Research Group Theoretical Systems Biology, Faculty of Biology and Pharmacy, Friedrich Schiller University Jena, 07743 Jena, Department of Bioorganic Chemistry, Experimental Ecology and Evolution Research Group, Max Planck Institute for Chemical Ecology, 07745 Jena, Department of Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, 07745 Jena and Department of Computational Biology, Institute for Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
- Christoph Kaleta
- Research Group Theoretical Systems Biology, Faculty of Biology and Pharmacy, Friedrich Schiller University Jena, 07743 Jena, Department of Bioorganic Chemistry, Experimental Ecology and Evolution Research Group, Max Planck Institute for Chemical Ecology, 07745 Jena, Department of Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute, 07745 Jena and Department of Computational Biology, Institute for Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense M, Denmark
|
42
|
Hanson NW, Konwar KM, Hawley AK, Altman T, Karp PD, Hallam SJ. Metabolic pathways for the whole community. BMC Genomics 2014; 15:619. [PMID: 25048541 PMCID: PMC4137073 DOI: 10.1186/1471-2164-15-619] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 01/14/2014] [Accepted: 07/08/2014] [Indexed: 11/27/2022]
Abstract
Background A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. Results Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools’ performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. Conclusions This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems. 
Affiliation(s)
- Steven J Hallam
- Graduate Program in Bioinformatics, University of British Columbia, Genome Sciences Centre, 100-570 West 7th Avenue, Vancouver, British Columbia V5Z 4S6, Canada.
|
43
|
Weaver DS, Keseler IM, Mackie A, Paulsen IT, Karp PD. A genome-scale metabolic flux model of Escherichia coli K-12 derived from the EcoCyc database. BMC SYSTEMS BIOLOGY 2014; 8:79. [PMID: 24974895 PMCID: PMC4086706 DOI: 10.1186/1752-0509-8-79] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 03/22/2014] [Accepted: 06/19/2014] [Indexed: 12/14/2022]
Abstract
BACKGROUND Constraint-based models of Escherichia coli metabolic flux have played a key role in computational studies of cellular metabolism at the genome scale. We sought to develop a next-generation constraint-based E. coli model that achieved improved phenotypic prediction accuracy while being frequently updated and easy to use. We also sought to compare model predictions with experimental data to highlight open questions in E. coli biology. RESULTS We present EcoCyc-18.0-GEM, a genome-scale model of the E. coli K-12 MG1655 metabolic network. The model is automatically generated from the current state of EcoCyc using the MetaFlux software, enabling the release of multiple model updates per year. EcoCyc-18.0-GEM encompasses 1445 genes, 2286 unique metabolic reactions, and 1453 unique metabolites. We demonstrate a three-part validation of the model that breaks new ground in breadth and accuracy: (i) Comparison of simulated growth in aerobic and anaerobic glucose culture with experimental results from chemostat culture and simulation results from the E. coli modeling literature. (ii) Essentiality prediction for the 1445 genes represented in the model, in which EcoCyc-18.0-GEM achieves an improved accuracy of 95.2% in predicting the growth phenotype of experimental gene knockouts. (iii) Nutrient utilization predictions under 431 different media conditions, for which the model achieves an overall accuracy of 80.7%. The model's derivation from EcoCyc enables query and visualization via the EcoCyc website, facilitating model reuse and validation by inspection. We present an extensive investigation of disagreements between EcoCyc-18.0-GEM predictions and experimental data to highlight areas of interest to E. coli modelers and experimentalists, including 70 incorrect predictions of gene essentiality on glucose, 80 incorrect predictions of gene essentiality on glycerol, and 83 incorrect predictions of nutrient utilization. 
CONCLUSION Significant advantages can be derived from the combination of model organism databases and flux balance modeling represented by MetaFlux. Interpretation of the EcoCyc database as a flux balance model results in a highly accurate metabolic model and provides a rigorous consistency check for information stored in the database.
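The accuracy figures in this abstract can be cross-checked against the counts of incorrect predictions it reports. The short sketch below does that arithmetic. It assumes, which the abstract does not state explicitly, that the 70 incorrect essentiality predictions on glucose are counted against all 1445 genes in the model and that the 83 incorrect nutrient-utilization predictions are counted against all 431 media conditions.

```python
# Counts quoted in the abstract.
genes_in_model = 1445
wrong_essentiality_glucose = 70
media_conditions = 431
wrong_nutrient_predictions = 83


def accuracy(total, wrong):
    """Fraction of predictions that agree with experiment."""
    return (total - wrong) / total


# Assumption: each error count uses the full set as its denominator.
essentiality_acc = accuracy(genes_in_model, wrong_essentiality_glucose)
nutrient_acc = accuracy(media_conditions, wrong_nutrient_predictions)

print(f"{essentiality_acc:.1%}")  # 95.2%
print(f"{nutrient_acc:.1%}")      # 80.7%
```

Under these assumptions, both computed values match the accuracies reported in the abstract (95.2% and 80.7%), which suggests that the denominators were read correctly.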
Affiliation(s)
- Daniel S Weaver
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave., 94025 Menlo Park, CA, USA
- Ingrid M Keseler
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave., 94025 Menlo Park, CA, USA
- Amanda Mackie
- Department of Chemistry and Biomolecular Science, Macquarie University, Balaclava Rd, North Ryde NSW 2109, Australia
- Ian T Paulsen
- Department of Chemistry and Biomolecular Science, Macquarie University, Balaclava Rd, North Ryde NSW 2109, Australia
- Peter D Karp
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave., 94025 Menlo Park, CA, USA
|
44
|
Latendresse M. Efficiently gap-filling reaction networks. BMC Bioinformatics 2014; 15:225. [PMID: 24972703 PMCID: PMC4094995 DOI: 10.1186/1471-2105-15-225] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/04/2014] [Accepted: 06/24/2014] [Indexed: 11/23/2022]
Abstract
Background Flux Balance Analysis (FBA) is a genome-scale computational technique for modeling the steady-state fluxes of an organism’s reaction network. When the organism’s reaction network needs to be completed to obtain growth using FBA, without relying on the genome, the completion process is called reaction gap-filling. Currently, computational techniques used to gap-fill a reaction network compute the minimum set of reactions using Mixed-Integer Linear Programming (MILP). Depending on the number of candidate reactions used to complete the model, MILP can be computationally demanding. Results We present a computational technique, called FastGapFilling, that efficiently completes a reaction network by using only Linear Programming, not MILP. FastGapFilling creates a linear program with all candidate reactions, an objective function based on their weighted fluxes, and a variable weight on the biomass reaction: no integer variable is used. A binary search is performed by modifying the weight applied to the flux of the biomass reaction, and solving each corresponding linear program, to try reducing the number of candidate reactions to add to the network to generate a working model. We show that this method has proved effective on a series of incomplete E. coli and yeast models with, in some cases, a three orders of magnitude execution speedup compared with MILP. We have implemented FastGapFilling in MetaFlux as part of Pathway Tools (version 17.5), which is freely available to academic users, and for a fee to commercial users. Download from: biocyc.org/download.shtml. Conclusions The computational technique presented is very efficient allowing interactive completion of reaction networks of FBA models. Computational techniques based on MILP cannot offer such fast and interactive completion.
Affiliation(s)
- Mario Latendresse
- Bioinformatics Research Group/Artificial Intelligence Center, SRI International, 333 Ravenswood Ave, Menlo Park 94025, USA.
|
45
|
Keseler IM, Skrzypek M, Weerasinghe D, Chen AY, Fulcher C, Li GW, Lemmer KC, Mladinich KM, Chow ED, Sherlock G, Karp PD. Curation accuracy of model organism databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau058. [PMID: 24923819 PMCID: PMC4207230 DOI: 10.1093/database/bau058] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022]
Abstract
Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for it. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D.-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/
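The overall error rate quoted in this abstract follows directly from the reported counts (10 errors in 633 validated facts). The short sketch below reproduces that arithmetic. The Wilson score interval is an addition made here for illustration and is not reported in the study; the function names are hypothetical.

```python
import math


def error_rate(errors: int, total: int) -> float:
    """Fraction of validated assertions found to be in error."""
    if total <= 0:
        raise ValueError("total must be positive")
    return errors / total


def wilson_interval(errors: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Not part of the cited study; shown only to illustrate the
    sampling uncertainty around a small observed error rate.
    """
    p = errors / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Counts taken from the abstract: 10 errors among 633 validated facts.
rate = error_rate(10, 633)
low, high = wilson_interval(10, 633)
print(f"overall error rate: {rate:.2%}")  # overall error rate: 1.58%
print(f"approximate 95% interval: {low:.2%} to {high:.2%}")
```

The per-database counts are not given in the abstract, so only the pooled rate is recomputed here.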
Affiliation(s)
- Ingrid M Keseler, Marek Skrzypek, Deepika Weerasinghe, Albert Y Chen, Carol Fulcher, Gene-Wei Li, Kimberly C Lemmer, Katherine M Mladinich, Edmond D Chow, Gavin Sherlock, Peter D Karp
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA; Department of Genetics, Stanford University, CA 94305, USA; Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA; Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA; DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA; Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
|
46
|
High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource. Proc Natl Acad Sci U S A 2014; 111:9645-50. [PMID: 24927599 DOI: 10.1073/pnas.1401329111] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 11/18/2022]
Abstract
The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.
|
47
|
Hamilton JJ, Reed JL. Software platforms to facilitate reconstructing genome-scale metabolic networks. Environ Microbiol 2013; 16:49-59. [PMID: 24148076 DOI: 10.1111/1462-2920.12312] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Received: 08/17/2013] [Accepted: 10/12/2013] [Indexed: 12/24/2022]
Abstract
System-level analyses of microbial metabolism are facilitated by genome-scale reconstructions of microbial biochemical networks. A reconstruction provides a structured representation of the biochemical transformations occurring within an organism, as well as the genes necessary to carry out these transformations, as determined by the annotated genome sequence and experimental data. Network reconstructions also serve as platforms for constraint-based computational techniques, which facilitate biological studies in a variety of applications, including evaluation of network properties, metabolic engineering and drug discovery. Bottom-up metabolic network reconstructions have been developed for dozens of organisms, but until recently, the pace of reconstruction has failed to keep up with advances in genome sequencing. To address this problem, a number of software platforms have been developed to automate parts of the reconstruction process, thereby alleviating much of the manual effort previously required. Here, we review four such platforms in the context of established guidelines for network reconstruction. While many steps of the reconstruction process have been successfully automated, some manual evaluation of the results is still required to ensure a high-quality reconstruction. Widespread adoption of these platforms by the scientific community is underway and will be further enabled by exchangeable formats across platforms.
Affiliation(s)
- Joshua J Hamilton
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI, 53706, USA
|
48
|
Ponce-de-León M, Montero F, Peretó J. Solving gap metabolites and blocked reactions in genome-scale models: application to the metabolic network of Blattabacterium cuenoti. BMC Systems Biology 2013; 7:114. [PMID: 24176055 PMCID: PMC3819652 DOI: 10.1186/1752-0509-7-114] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/22/2013] [Accepted: 10/23/2013] [Indexed: 11/20/2022]
Abstract
Background: Metabolic reconstruction is the computation-based process that aims to elucidate the network of metabolites interconnected through reactions catalyzed by activities assigned to one or more genes. Reconstructed models may contain inconsistencies that appear as gap metabolites and blocked reactions. Although automatic methods for solving this problem have been previously developed, there are many situations where manual curation is still needed. Results: We introduce a general definition of gap metabolite that allows its detection in a straightforward manner. Moreover, a method for the detection of Unconnected Modules, defined as isolated sets of blocked reactions connected through gap metabolites, is proposed. The method has been successfully applied to the curation of iCG238, the genome-scale metabolic model for the bacterium Blattabacterium cuenoti, an obligate endosymbiont of cockroaches. Conclusion: We found the proposed approach to be a valuable tool for the curation of genome-scale metabolic models. The outcome of its application to the genome-scale model B. cuenoti iCG238 is a more accurate model version named B. cuenoti iMP240.
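To make the notions of gap metabolite and blocked reaction in this abstract concrete, here is a minimal structural sketch. It assumes a simplified definition, not the paper's general one: a gap metabolite is an internal metabolite that is only produced or only consumed, and a reaction is treated as blocked when it touches a gap metabolite, with the check repeated until nothing changes. The function names, the data layout and the toy network are all hypothetical.

```python
def find_gap_metabolites(reactions, exchanged=frozenset()):
    """Return internal metabolites that are only produced or only consumed.

    reactions: dict mapping a reaction name to (stoichiometry, reversible),
    where stoichiometry maps metabolite -> coefficient (negative = consumed).
    exchanged: metabolites that may cross the system boundary.
    """
    produced, consumed = set(), set()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            if coeff > 0 or reversible:
                produced.add(met)
            if coeff < 0 or reversible:
                consumed.add(met)
    candidates = (produced | consumed) - set(exchanged)
    return {m for m in candidates if not (m in produced and m in consumed)}


def find_blocked_reactions(reactions, exchanged=frozenset()):
    """Iteratively remove reactions that involve a gap metabolite."""
    active = dict(reactions)
    blocked = set()
    while True:
        gaps = find_gap_metabolites(active, exchanged)
        newly = {
            name
            for name, (stoich, _) in active.items()
            if any(met in gaps for met in stoich)
        }
        if not newly:
            return blocked
        blocked |= newly
        for name in newly:
            del active[name]


# Toy network (hypothetical): A is taken up and C is secreted.
toy = {
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, False),
    "R3": ({"C": -1, "D": 1}, False),  # D is never consumed
    "R4": ({"E": -1, "B": 1}, False),  # E is never produced
}
boundary = {"A", "C"}
print(sorted(find_gap_metabolites(toy, boundary)))    # ['D', 'E']
print(sorted(find_blocked_reactions(toy, boundary)))  # ['R3', 'R4']
```

This check is purely topological. A flux-based test (for example, flux variability analysis) can flag additional blocked reactions that a structural pass of this kind misses.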
Affiliation(s)
- Francisco Montero
- Departamento de Bioquímica y Biología Molecular I, Facultad de Ciencias Químicas, Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28045, Spain.
|
49
|
Semi-automated curation of metabolic models via flux balance analysis: a case study with Mycoplasma gallisepticum. PLoS Comput Biol 2013; 9:e1003208. [PMID: 24039564 PMCID: PMC3764002 DOI: 10.1371/journal.pcbi.1003208] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/27/2012] [Accepted: 07/19/2013] [Indexed: 11/19/2022]
Abstract
Primarily used for metabolic engineering and synthetic biology, genome-scale metabolic modeling shows tremendous potential as a tool for fundamental research and curation of metabolism. Through a novel integration of flux balance analysis and genetic algorithms, a strategy to curate metabolic networks and facilitate identification of metabolic pathways that may not be directly inferable solely from genome annotation was developed. Specifically, metabolites involved in unknown reactions can be determined, and potentially erroneous pathways can be identified. The procedure developed allows for new fundamental insight into metabolism, as well as acting as a semi-automated curation methodology for genome-scale metabolic modeling. To validate the methodology, a genome-scale metabolic model for the bacterium Mycoplasma gallisepticum was created. Several reactions not predicted by the genome annotation were postulated and validated via the literature. The model predicted an average growth rate of 0.358±0.12 h−1, closely matching the experimentally determined growth rate of M. gallisepticum of 0.244±0.03 h−1. This work presents a powerful algorithm for facilitating the identification and curation of previously known and new metabolic pathways, as well as presenting the first genome-scale reconstruction of M. gallisepticum.
Flux balance analysis (FBA) is a powerful approach for genome-scale metabolic modeling. It provides metabolic engineers with a tool for manipulating, predicting, and optimizing metabolism for biotechnological and biomedical purposes. However, we posit that it can also be used as a tool for fundamental research in understanding and curating metabolic networks. Specifically, by using a genetic algorithm integrated with FBA, we developed a curation approach to identify missing reactions, incomplete reactions, and erroneous reactions. Additionally, it was possible to take advantage of the ensemble information from the genetic algorithm to identify the most critical reactions for curation. We tested our strategy using Mycoplasma gallisepticum as our model organism. Using the genome annotation as the basis, the preliminary genome-scale metabolic model consisted of 446 metabolites involved in 380 reactions. Carrying out our analysis, we found over 80 incorrect reactions and 16 missing reactions. Based upon the guidance of the algorithm, we were able to curate and resolve all discrepancies. The model predicted an average bacterial growth rate of 0.358±0.12 h−1 compared to the experimentally observed 0.244±0.03 h−1. Thus, our approach facilitated the curation of a genome-scale metabolic network and generated a high-quality metabolic model.
|
50
|
Fearnley LG, Davis MJ, Ragan MA, Nielsen LK. Extracting reaction networks from databases-opening Pandora's box. Brief Bioinform 2013; 15:973-83. [PMID: 23946492 PMCID: PMC4239801 DOI: 10.1093/bib/bbt058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/12/2023]
Abstract
Large quantities of information describing the mechanisms of biological pathways continue to be collected in publicly available databases. At the same time, experiments have increased in scale, and biologists increasingly use pathways defined in online databases to interpret the results of experiments and generate hypotheses. Emerging computational techniques that exploit the rich biological information captured in reaction systems require formal standardized descriptions of pathways to extract these reaction networks and avoid the alternative: time-consuming and largely manual literature-based network reconstruction. Here, we systematically evaluate the effects of commonly used knowledge representations on the seemingly simple task of extracting a reaction network describing signal transduction from a pathway database. We show that this process is in fact surprisingly difficult, and the pathway representations adopted by various knowledge bases have dramatic consequences for reaction network extraction, connectivity, capture of pathway crosstalk and in the modelling of cell-cell interactions. Researchers constructing computational models built from automatically extracted reaction networks must therefore consider the issues we outline in this review to maximize the value of existing pathway knowledge.
|