1
Emerging methods for genome-scale metabolic modeling of microbial communities. Trends Endocrinol Metab 2024:S1043-2760(24)00062-6. [PMID: 38575441 DOI: 10.1016/j.tem.2024.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/28/2024] [Accepted: 02/29/2024] [Indexed: 04/06/2024]
Abstract
Genome-scale metabolic models (GEMs) are consolidating as platforms for studying mixed microbial populations, by combining biological data and knowledge with mathematical rigor. However, deploying these models to answer research questions can be challenging due to the increasing number of available computational tools, the lack of universal standards, and their inherent limitations. Here, we present a comprehensive overview of foundational concepts for building and evaluating genome-scale models of microbial communities. We then compare tools in terms of requirements, capabilities, and applications. Next, we highlight the current pitfalls and open challenges to consider when adopting existing tools and developing new ones. Our compendium can be relevant for the expanding community of modelers, both at the entry and experienced levels.
2
Mechanism-aware and multimodal AI: beyond model-agnostic interpretation. Trends Cell Biol 2024; 34:85-89. [PMID: 38087709 DOI: 10.1016/j.tcb.2023.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 11/03/2023] [Accepted: 11/07/2023] [Indexed: 02/04/2024]
Abstract
Artificial intelligence (AI) is widely used for exploiting multimodal biomedical data, with increasingly accurate predictions and model-agnostic interpretations, which are however also agnostic to biological mechanisms. Combining metabolic modelling, 'omics, and imaging data via multimodal AI can generate predictions that can be interpreted mechanistically and transparently, therefore with significantly higher therapeutic potential.
3
Uncovering potential diagnostic and pathophysiological roles of α-synuclein and DJ-1 in melanoma. Cancer Med 2024; 13. [PMID: 38189631 PMCID: PMC10807602 DOI: 10.1002/cam4.6900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/20/2023] [Accepted: 12/19/2023] [Indexed: 01/09/2024] Open
Abstract
BACKGROUND Melanoma, the most lethal skin cancer type, occurs more frequently in Parkinson's disease (PD), and PD is more frequent in melanoma patients, suggesting disease mechanisms overlap. α-synuclein, a protein that accumulates in PD brain, and the oncogene DJ-1, which is associated with PD autosomal recessive forms, are both elevated in melanoma cells. Whether this indicates melanoma progression or constitutes a protective response remains unclear. We hereby investigated the molecular mechanisms through which α-synuclein and DJ-1 interact, suggesting novel biomarkers and targets in melanoma. METHODS The Cancer Genome Atlas (TCGA) expression profiles derived from UCSC Xena were used to obtain α-synuclein and DJ-1 expression and correlated with survival in skin cutaneous melanoma (SKCM). Immunohistochemistry determined the expression in metastatic melanoma lymph nodes. Protein-protein interactions (PPIs) and molecular docking assessed protein binding and affinity with chemotherapeutic drugs. Further validation was performed using in vitro cellular models and ELISA immunoassays. RESULTS α-synuclein and DJ-1 were upregulated in primary and metastatic SKCM. Aggregated α-synuclein was selectively detected in metastatic melanoma lymph nodes. α-synuclein overexpression in SK-MEL-28 cells induced the expression of DJ-1, supporting PPI and a positive correlation in melanoma patients. Molecular docking revealed a stable protein complex, with differential binding to chemotherapy drugs such as temozolomide, dacarbazine, and doxorubicin. Parallel reduction of both proteins in temozolomide-treated SK-MEL-28 spheroids suggests drug binding may affect protein interaction and/or stability. CONCLUSION α-synuclein, together with DJ-1, may play a role in melanoma progression and chemosensitivity, constituting novel targets for therapeutic intervention, and possible biomarkers for melanoma.
4
Abstract
Data are the most important elements of bioinformatics: Computational analysis of bioinformatics data, in fact, can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can be even more helpful, because each of these different data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data plays a pivotal role in running a successful bioinformatics study. In the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been labelled -omics data, as a unique name to refer to them, and the integration of these omics data has gained importance in all biological areas. Even if this omics data integration is useful and relevant, due to its heterogeneity, it is not uncommon to make mistakes during the integration phases. We therefore decided to present these ten quick tips to perform an omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Even if we designed our ten guidelines for beginners, using simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all the bioinformaticians performing omics data integration, including experts.
5
Editorial: Artificial intelligence for data discovery and reuse in endocrinology and metabolism. Front Endocrinol (Lausanne) 2023; 14:1180254. [PMID: 37214239 PMCID: PMC10196622 DOI: 10.3389/fendo.2023.1180254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 04/18/2023] [Indexed: 05/24/2023] Open
6
Integration of epigenetic regulatory mechanisms in heart failure. Basic Res Cardiol 2023; 118:16. [PMID: 37140699 PMCID: PMC10158703 DOI: 10.1007/s00395-023-00986-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 03/27/2023] [Accepted: 04/10/2023] [Indexed: 05/05/2023]
Abstract
The number of "omics" approaches is continuously growing. Among others, epigenetics has appeared as an attractive area of investigation by the cardiovascular research community, notably considering its association with disease development. Complex diseases such as cardiovascular diseases have to be tackled using methods integrating different omics levels, so called "multi-omics" approaches. These approaches combine and co-analyze different levels of disease regulation. In this review, we present and discuss the role of epigenetic mechanisms in regulating gene expression and provide an integrated view of how these mechanisms are interlinked and regulate the development of cardiac disease, with a particular attention to heart failure. We focus on DNA, histone, and RNA modifications, and discuss the current methods and tools used for data integration and analysis. Enhancing the knowledge of these regulatory mechanisms may lead to novel therapeutic approaches and biomarkers for precision healthcare and improved clinical outcomes.
7
Glycosylation spectral signatures for glioma grade discrimination using Raman spectroscopy. BMC Cancer 2023; 23:174. [PMID: 36809974 PMCID: PMC9942363 DOI: 10.1186/s12885-023-10588-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 01/12/2023] [Accepted: 01/27/2023] [Indexed: 02/23/2023] Open
Abstract
BACKGROUND Gliomas are the most common brain tumours with the high-grade glioblastoma representing the most aggressive and lethal form. Currently, there is a lack of specific glioma biomarkers that would aid tumour subtyping and minimally invasive early diagnosis. Aberrant glycosylation is an important post-translational modification in cancer and is implicated in glioma progression. Raman spectroscopy (RS), a vibrational spectroscopic label-free technique, has already shown promise in cancer diagnostics. METHODS RS was combined with machine learning to discriminate glioma grades. Raman spectral signatures of glycosylation patterns were used in serum samples and fixed tissue biopsy samples, as well as in single cells and spheroids. RESULTS Glioma grades in fixed tissue patient samples and serum were discriminated with high accuracy. Discrimination between higher malignant glioma grades (III and IV) was achieved with high accuracy in tissue, serum, and cellular models using single cells and spheroids. Biomolecular changes were assigned to alterations in glycosylation corroborated by analysing glycan standards and other changes such as carotenoid antioxidant content. CONCLUSION RS combined with machine learning could pave the way for more objective and less invasive grading of glioma patients, serving as a useful tool to facilitate glioma diagnosis and delineate biomolecular glioma progression changes.
8
Metatranscriptomics-guided genome-scale metabolic modeling of microbial communities. Cell Rep Methods 2023; 3:100383. [PMID: 36814842 PMCID: PMC9939383 DOI: 10.1016/j.crmeth.2022.100383] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/28/2022] [Revised: 10/07/2022] [Accepted: 12/12/2022] [Indexed: 01/09/2023]
Abstract
Multi-omics data integration via mechanistic models of metabolism is a scalable and flexible framework for exploring biological hypotheses in microbial systems. However, although most microorganisms are unculturable, such multi-omics modeling is limited to isolate microbes or simple synthetic communities. Here, we developed an approach for modeling microbial activity and interactions that leverages the reconstruction of metagenome-assembled genomes and associated genome-centric metatranscriptomes. At its core, we designed a method for condition-specific metabolic modeling of microbial communities through the integration of metatranscriptomic data. Using this approach, we explored the behavior of anaerobic digestion consortia driven by hydrogen availability and human gut microbiota dysbiosis associated with Crohn's disease, identifying condition-dependent amino acid requirements in archaeal species and a reduced short-chain fatty acid exchange network associated with disease, respectively. Our approach can be applied to complex microbial communities, allowing a mechanistic contextualization of multi-omics data on a metagenome scale.
9
Machine Learning Methods for Survival Analysis with Clinical and Transcriptomics Data of Breast Cancer. Methods Mol Biol 2023; 2553:325-393. [PMID: 36227551 DOI: 10.1007/978-1-0716-2617-7_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Breast cancer is one of the most common cancers in women worldwide, which causes an enormous number of deaths annually. However, early diagnosis of breast cancer can improve survival outcomes enabling simpler and more cost-effective treatments. The recent increase in data availability provides unprecedented opportunities to apply data-driven and machine learning methods to identify early-detection prognostic factors capable of predicting the expected survival and potential sensitivity to treatment of patients, with the final aim of enhancing clinical outcomes. This tutorial presents a protocol for applying machine learning models in survival analysis for both clinical and transcriptomic data. We show that integrating clinical and mRNA expression data is essential to explain the multiple biological processes driving cancer progression. Our results reveal that machine-learning-based models such as random survival forests, gradient boosted survival model, and survival support vector machine can outperform the traditional statistical methods, i.e., Cox proportional hazard model. The highest C-index among the machine learning models was recorded when using survival support vector machine, with a value 0.688, whereas the C-index recorded using the Cox model was 0.677. Shapley Additive Explanation (SHAP) values were also applied to identify the feature importance of the models and their impact on the prediction outcomes.
10
Clinical stratification improves the diagnostic accuracy of small omics datasets within machine learning and genome-scale metabolic modelling methods. Comput Biol Med 2022; 151:106244. [PMID: 36343407 DOI: 10.1016/j.compbiomed.2022.106244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 10/07/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND Recently, multi-omic machine learning architectures have been proposed for the early detection of cancer. However, for rare cancers and their associated small datasets, it is still unclear how to use the available multi-omics data to achieve a mechanistic prediction of cancer onset and progression, due to the limited data available. Hepatoblastoma is the most frequent liver cancer in infancy and childhood, and its incidence has lately been increasing in several developed countries. Even though some studies have been conducted to understand the causes of its onset and discover potential biomarkers, the role of metabolic rewiring has not been investigated in depth so far. METHODS Here, we propose and implement an interpretable multi-omics pipeline that combines mechanistic knowledge from genome-scale metabolic models with machine learning algorithms, and we use it to characterise the underlying mechanisms controlling hepatoblastoma. RESULTS AND CONCLUSIONS While the obtained machine learning models generally present a high diagnostic classification accuracy, our results show that the type of omics combinations used as input to the machine learning models strongly affects the detection of important genes, reactions and metabolic pathways linked to hepatoblastoma. Our method also suggests that, in the context of computer-aided diagnosis of cancer, optimal diagnostic accuracy can be achieved by adopting a combination of omics that depends on the patient's clinical characteristics.
11
Whole-genome sequencing and genome-scale metabolic modeling of Chromohalobacter canadensis 85B to explore its salt tolerance and biotechnological use. Microbiologyopen 2022; 11:e1328. [PMID: 36314754 PMCID: PMC9597258 DOI: 10.1002/mbo3.1328] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/04/2022] [Accepted: 10/01/2022] [Indexed: 11/06/2022] Open
Abstract
Salt tolerant organisms are increasingly being used for the industrial production of high-value biomolecules due to their better adaptability compared to mesophiles. Chromohalobacter canadensis is one of the early halophiles to show promising biotechnology potential, which has not been explored to date. Advanced high throughput technologies such as whole-genome sequencing allow in-depth insight into the potential of organisms while at the frontiers of systems biology. At the same time, genome-scale metabolic models (GEMs) enable phenotype predictions through a mechanistic representation of metabolism. Here, we sequence and analyze the genome of C. canadensis 85B, and we use it to reconstruct a GEM. We then analyze the GEM using flux balance analysis and validate it against literature data on C. canadensis. We show that C. canadensis 85B is a metabolically versatile organism with many features for stress and osmotic adaptation. Pathways to produce ectoine and polyhydroxybutyrates were also predicted. The GEM reveals the ability to grow on several carbon sources in a minimal medium and reproduce osmoadaptation phenotypes. Overall, this study reveals insights from the genome of C. canadensis 85B, providing genomic data and a draft GEM that will serve as the first steps towards a better understanding of its metabolism, for novel applications in industrial biotechnology.
12
Loss of full-length dystrophin expression results in major cell-autonomous abnormalities in proliferating myoblasts. eLife 2022; 11:75521. [PMID: 36164827 PMCID: PMC9514850 DOI: 10.7554/elife.75521] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/12/2021] [Accepted: 09/02/2022] [Indexed: 12/05/2022] Open
Abstract
Duchenne muscular dystrophy (DMD) affects myofibers and muscle stem cells, causing progressive muscle degeneration and repair defects. It was unknown whether dystrophic myoblasts—the effector cells of muscle growth and regeneration—are affected. Using transcriptomic, genome-scale metabolic modelling and functional analyses, we demonstrate, for the first time, convergent abnormalities in primary mouse and human dystrophic myoblasts. In Dmdmdx myoblasts lacking full-length dystrophin, the expression of 170 genes was significantly altered. Myod1 and key genes controlled by MyoD (Myog, Mymk, Mymx, epigenetic regulators, ECM interactors, calcium signalling and fibrosis genes) were significantly downregulated. Gene ontology analysis indicated enrichment in genes involved in muscle development and function. Functionally, we found increased myoblast proliferation, reduced chemotaxis and accelerated differentiation, which are all essential for myoregeneration. The defects were caused by the loss of expression of full-length dystrophin, as similar and not exacerbated alterations were observed in dystrophin-null Dmdmdx-βgeo myoblasts. Corresponding abnormalities were identified in human DMD primary myoblasts and a dystrophic mouse muscle cell line, confirming the cross-species and cell-autonomous nature of these defects. The genome-scale metabolic analysis in human DMD myoblasts showed alterations in the rate of glycolysis/gluconeogenesis, leukotriene metabolism, and mitochondrial beta-oxidation of various fatty acids. These results reveal the disease continuum: DMD defects in satellite cells, the myoblast dysfunction affecting muscle regeneration, which is insufficient to counteract muscle loss due to myofiber instability. Contrary to the established belief, our data demonstrate that DMD abnormalities occur in myoblasts, making these cells a novel therapeutic target for the treatment of this lethal disease.
13
Computational profiling of natural compounds as promising inhibitors against the spike proteins of SARS-CoV-2 wild-type and the variants of concern, viral cell-entry process, and cytokine storm in COVID-19. J Cell Biochem 2022; 123:964-986. [PMID: 35342986 DOI: 10.1002/jcb.30243] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/14/2021] [Revised: 03/11/2022] [Accepted: 03/14/2022] [Indexed: 12/16/2022]
Abstract
The continuous spread and evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the rapid surge in infection cases in the coronavirus disease 2019 (COVID-19) evoke a dire need for effective therapeutics. In this study, we explored the inhibitory potential of a library of 605 phytocompounds, selected from Indian medicinal plants with reported antiviral and anti-inflammatory activities, against the receptor-binding domain of spike proteins of the SARS-CoV-2 wild-type and the variants of concern, including variants B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), B.1.617.2 (Delta), and B.1.1.529 (Omicron). Our approach was based on extensive molecular docking, assessment of drug-likeness, and robust molecular dynamics simulations. We also identified promising inhibitory candidates against the host (human) proteins associated with SARS-CoV-2 spike activation and attachment, namely, ACE2 receptor, proteases TMPRSS2 and CTSL, and the endocytic regulator AAK1. In addition, we screened promising inhibitory compounds against the human proinflammatory cytokines- IL-6, IL-1β, TNF-α, and IFN-γ, that are associated with the adverse cytokine storm in COVID-19 patients. Our analysis returned an encouraging list of promising inhibitory candidates that includes: abietatriene against the spike proteins of the SARS-CoV-2 wild-type and the variants of concern; taraxerol against the human ACE2, CTSL and TNF-α; β-amyrin against the human TMPRSS2; cynaroside against the human AAK1 and IL-1β; and friedelin against the human IL-6 and IFN-γ. Our findings provide substantial evidence for the inhibitory potential of these compounds and encourage further in vitro and in vivo studies to validate their use as safe and effective therapeutics against COVID-19.
14
Abstract
The human transcriptome comprises a complex network of coding and non-coding RNAs implicated in a myriad of biological functions. Non-coding RNAs exhibit highly organized spatial and temporal expression patterns and are emerging as critical regulators of differentiation, homeostasis, and pathological states, including in the cardiovascular system. This review defines the current knowledge gaps, unmet methodological needs, and describes the challenges in dissecting and understanding the role and regulation of the non-coding transcriptome in cardiovascular disease. These challenges include poor annotation of the non-coding genome, determination of the cellular distribution of transcripts, assessment of the role of RNA processing and identification of cell-type specific changes in cardiovascular physiology and disease. We highlight similarities and differences in the hurdles associated with the analysis of the non-coding and protein-coding transcriptomes. In addition, we discuss how the lack of consensus and absence of standardized methods affect reproducibility of data. These shortcomings should be defeated in order to make significant scientific progress and foster the development of clinically applicable non-coding RNA-based therapeutic strategies to lessen the burden of cardiovascular disease.
15
Using machine learning as a surrogate model for agent-based simulations. PLoS One 2022; 17:e0263150. [PMID: 35143521 PMCID: PMC8830643 DOI: 10.1371/journal.pone.0263150] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/24/2021] [Accepted: 01/12/2022] [Indexed: 02/02/2023] Open
Abstract
In this proof-of-concept work, we evaluate the performance of multiple machine-learning methods as surrogate models for use in the analysis of agent-based models (ABMs). Analysing agent-based modelling outputs can be challenging, as the relationships between input parameters can be non-linear or even chaotic even in relatively simple models, and each model run can require significant CPU time. Surrogate modelling, in which a statistical model of the ABM is constructed to facilitate detailed model analyses, has been proposed as an alternative to computationally costly Monte Carlo methods. Here we compare multiple machine-learning methods for ABM surrogate modelling in order to determine the approaches best suited as a surrogate for modelling the complex behaviour of ABMs. Our results suggest that, in most scenarios, artificial neural networks (ANNs) and gradient-boosted trees outperform Gaussian process surrogates, currently the most commonly used method for the surrogate modelling of complex computational models. ANNs produced the most accurate model replications in scenarios with high numbers of model runs, although training times were longer than the other methods. We propose that agent-based modelling would benefit from using machine-learning methods for surrogate modelling, as this can facilitate more robust sensitivity analyses for the models while also reducing CPU time consumption when calibrating and analysing the simulation.
16
Integrating genome-scale metabolic modelling and transfer learning for human gene regulatory network reconstruction. Bioinformatics 2022; 38:487-493. [PMID: 34499112 DOI: 10.1093/bioinformatics/btab647] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/22/2020] [Revised: 07/23/2021] [Accepted: 09/06/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Gene regulation is responsible for controlling numerous physiological functions and dynamically responding to environmental fluctuations. Reconstructing the human network of gene regulatory interactions is thus paramount to understanding the cell functional organization across cell types, as well as to elucidating pathogenic processes and identifying molecular drug targets. Although significant effort has been devoted towards this direction, existing computational methods mainly rely on gene expression levels, possibly ignoring the information conveyed by mechanistic biochemical knowledge. Moreover, except for a few recent attempts, most of the existing approaches only consider the information of the organism under analysis, without exploiting the information of related model organisms. RESULTS We propose a novel method for the reconstruction of the human gene regulatory network, based on a transfer learning strategy that synergically exploits information from human and mouse, conveyed by gene-related metabolic features generated in silico from gene expression data. Specifically, we learn a predictive model from metabolic activity inferred via tissue-specific metabolic modelling of artificial gene knockouts. Our experiments show that the combination of our transfer learning approach with the constructed metabolic features provides a significant advantage in terms of reconstruction accuracy, as well as additional clues on the contribution of each constructed metabolic feature. AVAILABILITY AND IMPLEMENTATION The method, the datasets and all the results obtained in this study are available at: https://doi.org/10.6084/m9.figshare.c.5237687. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
17
Genome Sequencing Variations in the Octodon degus, an Unconventional Natural Model of Aging and Alzheimer's Disease. Front Aging Neurosci 2022; 14:894994. [PMID: 35860672 PMCID: PMC9291219 DOI: 10.3389/fnagi.2022.894994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 05/31/2022] [Indexed: 11/25/2022] Open
Abstract
The degu (Octodon degus) is a diurnal long-lived rodent that can spontaneously develop molecular and behavioral changes that mirror those seen in human aging. With age some degu, but not all individuals, develop cognitive decline and brain pathology like that observed in Alzheimer's disease including neuroinflammation, hyperphosphorylated tau and amyloid plaques, together with other co-morbidities associated with aging such as macular degeneration, cataracts, alterations in circadian rhythm, diabetes and atherosclerosis. Here we report the whole-genome sequencing and analysis of the degu genome, which revealed unique features and molecular adaptations consistent with aging and Alzheimer's disease. We identified single nucleotide polymorphisms in genes associated with Alzheimer's disease including a novel apolipoprotein E (Apoe) gene variant that correlated with an increase in amyloid plaques in brain and modified the in silico predicted degu APOE protein structure and functionality. The reported genome of an unconventional long-lived animal model of aging and Alzheimer's disease offers the opportunity for understanding molecular pathways involved in aging and should help advance biomedical research into treatments for Alzheimer's disease.
18
Abstract
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. In order to accommodate the volume and heterogeneity of such diverse data types and aid in their interpretation when they are combined with a multi-scale predictive model, machine learning is a useful tool that can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, the utilization of GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. Firstly, we focus on the merits of adopting an integrative systems biology led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for the combination of machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM .
19
Protocol for hybrid flux balance, statistical, and machine learning analysis of multi-omic data from the cyanobacterium Synechococcus sp. PCC 7002. STAR Protoc 2021; 2:100837. [PMID: 34632416 PMCID: PMC8488602 DOI: 10.1016/j.xpro.2021.100837] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022] Open
Abstract
Combining a computational framework for flux balance analysis with machine learning improves the accuracy of predicting metabolic activity across conditions, while enabling mechanistic interpretation. This protocol presents a guide to condition-specific metabolic modeling that integrates regularized flux balance analysis with machine learning approaches to extract key features from transcriptomic and fluxomic data. We demonstrate the protocol as applied to Synechococcus sp. PCC 7002; we also outline how it can be adapted to any species or community with available multi-omic data. For complete details on the use and execution of this protocol, please refer to Vijayakumar et al. (2020).
|
20
|
Discovering Essential Multiple Gene Effects Through Large Scale Optimization: An Application to Human Cancer Metabolism. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2339-2352. [PMID: 32248120 DOI: 10.1109/tcbb.2020.2973386] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/11/2023]
Abstract
Computational modelling of metabolic processes has proven to be a useful approach to formulate our knowledge and improve our understanding of core biochemical systems that are crucial to maintaining cellular functions. Towards understanding the broader role of metabolism on cellular decision-making in health and disease conditions, it is important to integrate the study of metabolism with other core regulatory systems and omics within the cell, including gene expression patterns. After quantitatively integrating gene expression profiles with a genome-scale reconstruction of human metabolism, we propose a set of combinatorial methods to reverse engineer gene expression profiles and to find pairs and higher-order combinations of genetic modifications that simultaneously optimize multi-objective cellular goals. This enables us to suggest classes of transcriptomic profiles that are most suitable to achieve given metabolic phenotypes. We demonstrate how our techniques are able to compute beneficial, neutral or "toxic" combinations of gene expression levels. We test our methods on nine tissue-specific cancer models, comparing our outcomes with the corresponding normal cells, identifying genes as targets for potential therapies. Our methods open the way to a broad class of applications that require an understanding of the interplay among genotype, metabolism, and cellular behaviour, at scale.
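The search for pairs of genetic modifications that jointly serve several cellular objectives can be illustrated with a Pareto filter. The sketch below is only a toy under stated assumptions: the gene names, the two objectives, and the additive scoring of pairs are hypothetical stand-ins, whereas the abstract describes evaluating candidates against a genome-scale model.

```python
from itertools import combinations


def dominates(a, b):
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the sorted names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )


# Hypothetical effect of each single modification on two objectives
# (arbitrary integer units), for example growth and product yield.
single_effects = {
    "g1": (10, -5),
    "g2": (-2, 20),
    "g3": (3, 4),
    "g4": (-10, -10),
}

# Toy assumption: the effect of a pair is the sum of the single effects.
# A real study would re-evaluate the metabolic model for every pair.
pair_effects = {
    f"{a}+{b}": (
        single_effects[a][0] + single_effects[b][0],
        single_effects[a][1] + single_effects[b][1],
    )
    for a, b in combinations(sorted(single_effects), 2)
}

front = pareto_front(pair_effects)
print(front)
```

Pairs left on the front represent different trade-offs between the two objectives; every other pair is worse than some front member on both counts or equal on one and worse on the other.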
|
21
|
Abstract
The use of data mining and modeling methods in the service industry is a promising avenue for optimizing current processes in a targeted manner, ultimately reducing costs and improving customer experience. However, the introduction of such tools in already established pipelines often must adapt to the way data is sampled and to its content. In this study, we tackle the challenge of characterizing and predicting customer experience when only process log data with time-stamp information are available, without any ground-truth feedback from the customers. As a case study, we consider the context of a contact center managed by TeleWare and analyze phone call logs covering a two-month span. We develop an approach to interpret the phone call process events registered in the logs and infer concrete points of improvement in the service management. Our approach is based on latent tree modeling and multi-class Naïve Bayes classification, which jointly allow us to infer a spectrum of customer experiences and test their predictability based on the current data sampling strategy. Moreover, such an approach can overcome limitations in customer feedback collection and sharing across organizations, thus having wide applicability and being complementary to tools relying on more heavily constrained data.
|
22
|
Genome-scale metabolic modelling of SARS-CoV-2 in cancer cells reveals an increased shift to glycolytic energy production. FEBS Lett 2021; 595:2350-2365. [PMID: 34409594 PMCID: PMC8427129 DOI: 10.1002/1873-3468.14180] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/18/2021] [Revised: 08/02/2021] [Accepted: 08/15/2021] [Indexed: 01/08/2023]
Abstract
Cancer is considered a high‐risk condition for severe illness resulting from COVID‐19. The interaction between severe acute respiratory syndrome coronavirus‐2 (SARS‐CoV‐2) and human metabolism is key to elucidating the risk posed by COVID‐19 for cancer patients and identifying effective treatments, yet it is largely uncharacterised on a mechanistic level. We present a genome‐scale map of short‐term metabolic alterations triggered by SARS‐CoV‐2 infection of cancer cells. Through transcriptomic‐ and proteomic‐informed genome‐scale metabolic modelling, we characterise the role of RNA and fatty acid biosynthesis in conjunction with a rewiring in energy production pathways and enhanced cytokine secretion. These findings link together complementary aspects of viral invasion of cancer cells, while providing mechanistic insights that can inform the development of treatment strategies.
|
23
|
Author Correction: Integrated multi‑omics analysis of ovarian cancer using variational autoencoders. Sci Rep 2021; 11:16671. [PMID: 34381128 PMCID: PMC8357958 DOI: 10.1038/s41598-021-95882-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
|
24
|
Situating agent-based modelling in population health research. Emerg Themes Epidemiol 2021; 18:10. [PMID: 34330302 PMCID: PMC8325181 DOI: 10.1186/s12982-021-00102-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/07/2020] [Accepted: 07/23/2021] [Indexed: 11/21/2022] Open
Abstract
Today's most troublesome population health challenges are often driven by social and environmental determinants, which are difficult to model using traditional epidemiological methods. We agree with those who have argued for the wider adoption of agent-based modelling (ABM) in taking on these challenges. However, while ABM has been used occasionally in population health, we argue that for ABM to be most effective in the field it should be used as a means for answering questions normally inaccessible to the traditional epidemiological toolkit. In an effort to clearly illustrate the utility of ABM for population health research, and to clear up persistent misunderstandings regarding the method's conceptual underpinnings, we offer a detailed presentation of the core concepts of complex systems theory, and summarise why simulations are essential to the study of complex systems. We then examine the current state of the art in ABM for population health, and propose that agent-based models are well suited to the study of the 'wicked' problems in population health, and could make significant contributions to theory and intervention development in these areas.
|
25
|
Multimodal regularised linear models with flux balance analysis for mechanistic integration of omics data. Bioinformatics 2021; 37:3546-3552. [PMID: 33974036 DOI: 10.1093/bioinformatics/btab324] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/16/2020] [Revised: 01/06/2021] [Accepted: 04/27/2021] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION High-throughput biological data, thanks to technological advances, have become cheaper to collect, leading to the availability of vast amounts of omic data of different types. In parallel, the in silico reconstruction and modelling of metabolic systems is now acknowledged as a key tool to complement experimental data on a large scale. The integration of model- and data-driven information is therefore emerging as a new challenge in systems biology, with no clear guidance on how to better take advantage of the inherent multi-source and multi-omic nature of these data types while preserving mechanistic interpretation. RESULTS Here we investigate different regularisation techniques for high-dimensional data derived from the integration of gene expression profiles with metabolic flux data, extracted from strain-specific metabolic models, to improve cellular growth rate predictions. To this end, we propose ad hoc extensions of previous regularisation frameworks including group, view-specific and principal component regularisation, and experimentally compare them using data from 1,143 Saccharomyces cerevisiae strains. We observe a divergence between methods in terms of regression accuracy and integration effectiveness based on the type of regularisation employed. In multi-omic regression tasks, when learning from experimental and model-generated omic data, our results demonstrate the competitiveness and ease of interpretation of multimodal regularised linear models compared to data-hungry methods based on neural networks. AVAILABILITY All data, models, and code produced in this work are available on GitHub at https://github.com/Angione-Lab/HybridGroupIPFLasso_pc2Lasso. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
26
|
Integrated multi-omics analysis of ovarian cancer using variational autoencoders. Sci Rep 2021; 11:6265. [PMID: 33737557 PMCID: PMC7973750 DOI: 10.1038/s41598-021-85285-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/19/2020] [Accepted: 02/28/2021] [Indexed: 02/06/2023] Open
Abstract
Cancer is a complex disease that deregulates cellular functions at various molecular levels (e.g., DNA, RNA, and proteins). Integrated multi-omics analysis of data from these levels is necessary to understand the aberrant cellular functions accountable for cancer and its development. In recent years, Deep Learning (DL) approaches have become a useful tool in integrated multi-omics analysis of cancer data. However, high-dimensional multi-omics data are generally imbalanced, with too many molecular features and relatively few patient samples. This imbalance makes DL-based integrated multi-omics analysis difficult. DL-based dimensionality reduction techniques, including the variational autoencoder (VAE), are a potential solution for balancing high-dimensional multi-omics data. However, there are few VAE-based integrated multi-omics analyses, and they are limited to pancancer. In this work, we did an integrated multi-omics analysis of ovarian cancer using the compressed features learned through VAE and an improved version of VAE, namely Maximum Mean Discrepancy VAE (MMD-VAE). First, we designed and developed a DL architecture for VAE and MMD-VAE. Then we used the architecture for mono-omics, integrated di-omics and tri-omics data analysis of ovarian cancer through cancer sample identification, molecular subtype clustering and classification, and survival analysis. The results show that MMD-VAE and VAE-based compressed features can respectively classify the transcriptional subtypes of the TCGA datasets with an accuracy in the range of 93.2-95.5% and 87.1-95.7%. Also, survival analysis results show that VAE and MMD-VAE based compressed representations of omics data can be used in cancer prognosis. Based on the results, we can conclude that (i) VAE and MMD-VAE outperform existing dimensionality reduction techniques, (ii) integrated multi-omics analyses perform better than or similarly to their mono-omics counterparts, and (iii) MMD-VAE performs better than VAE on most omics datasets.
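For readers unfamiliar with the maximum mean discrepancy named in MMD-VAE, the following sketch computes a biased estimate of the squared MMD between two small samples using a Gaussian kernel. It is a minimal illustration, not the architecture from the paper: the function names, the bandwidth, and the sample values are assumptions, and in an MMD-VAE the two samples would typically be encoded latent vectors and draws from the prior.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""

    def mean_kernel(first, second):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in first for b in second)
        return total / (len(first) * len(second))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical two-dimensional samples.
latent_codes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2)]
prior_draws = [(3.0, 3.0), (3.1, 3.0), (2.9, 3.2)]

same = mmd_squared(latent_codes, latent_codes)
different = mmd_squared(latent_codes, prior_draws)
print(same, different)
```

The estimate is zero when both samples are identical and grows as the two samples move apart, which is what makes it usable as a penalty that pulls the latent distribution towards the prior.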
|
27
|
From genes to cognition: Octodon degus, an animal model for AD translational research. Alzheimers Dement 2020. [DOI: 10.1002/alz.047726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
28
|
A Hybrid Flux Balance Analysis and Machine Learning Pipeline Elucidates Metabolic Adaptation in Cyanobacteria. iScience 2020; 23:101818. [PMID: 33354660 PMCID: PMC7744713 DOI: 10.1016/j.isci.2020.101818] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/26/2020] [Revised: 10/23/2020] [Accepted: 11/13/2020] [Indexed: 01/20/2023] Open
Abstract
Machine learning has recently emerged as a promising tool for inferring multi-omic relationships in biological systems. At the same time, genome-scale metabolic models (GSMMs) can be integrated with such multi-omic data to refine phenotypic predictions. In this work, we use a multi-omic machine learning pipeline to analyze a GSMM of Synechococcus sp. PCC 7002, a cyanobacterium with large potential to produce renewable biofuels. We use regularized flux balance analysis to observe flux response between conditions across photosynthesis and energy metabolism. We then incorporate principal-component analysis, k-means clustering, and LASSO regularization to reduce dimensionality and extract key cross-omic features. Our results suggest that combining metabolic modeling with machine learning elucidates mechanisms used by cyanobacteria to cope with fluctuations in light intensity and salinity that cannot be detected using transcriptomics alone. Furthermore, GSMMs introduce critical mechanistic details that improve the performance of omic-based machine learning methods. Highlights: A pipeline for metabolic modeling in Synechococcus sp. PCC 7002 is presented. Metabolic fluxes display clear differences in pathway activity across conditions. Omic-informed GSMMs provide critical mechanistic details within machine learning. Combining GSMM and machine learning improves methods based on transcriptomics alone.
|
29
|
Seeing the wood for the trees: a forest of methods for optimization and omic-network integration in metabolic modelling. Brief Bioinform 2019; 19:1218-1235. [PMID: 28575143 DOI: 10.1093/bib/bbx053] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/13/2017] [Indexed: 11/13/2022] Open
Abstract
Metabolic modelling has entered a mature phase with dozens of methods and software implementations available to the practitioner and the theoretician. It is not easy for a modeller to be able to see the wood (or the forest) for the trees. Driven by this analogy, we here present a 'forest' of principal methods used for constraint-based modelling in systems biology. This provides a tree-based view of methods available to prospective modellers, also available in interactive version at http://modellingmetabolism.net, where it will be kept updated with new methods after the publication of the present manuscript. Our updated classification of existing methods and tools highlights the most promising in the different branches, with the aim to develop a vision of how existing methods could hybridize and become more complex. We then provide the first hands-on tutorial for multi-objective optimization of metabolic models in R. We finally discuss the implementation of multi-view machine learning approaches in poly-omic integration. Throughout this work, we demonstrate the optimization of trade-offs between multiple metabolic objectives, with a focus on omic data integration through machine learning. We anticipate that the combination of a survey, a perspective on multi-view machine learning and a step-by-step R tutorial should be of interest for both the beginner and the advanced user.
|
30
|
Abstract
Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred independently for the most part, whereas the potential of their integration for biological, biomedical, and biotechnological research is less known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically-agnostic learning process.
|
31
|
Social dynamics modeling of chrono-nutrition. PLoS Comput Biol 2019; 15:e1006714. [PMID: 30699206 PMCID: PMC6370249 DOI: 10.1371/journal.pcbi.1006714] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/05/2017] [Revised: 02/11/2019] [Accepted: 12/14/2018] [Indexed: 12/13/2022] Open
Abstract
Gut microbiota and human relationships are closely connected to each other. What we eat reflects our body-mind connection and synchronizes with people around us. However, how this impacts gut microbiota and, conversely, how gut bacteria influence our dietary behaviors has not been explored yet. To quantify the complex dynamics of this interplay between gut and human behaviors, we explore the "gut-human behavior axis" and its evolutionary dynamics in a real-world scenario represented by the social multiplex network. We consider a dual type of similarity, homophily and gut similarity, in addition to psychological and unconscious biases. We analyze the dynamics of social and gut microbial communities, quantifying the impact of human behaviors on diets and gut microbial composition and, backwards, through a control mechanism. Meal timing mechanisms and "chrono-nutrition" play a crucial role in feeding behaviors, along with the quality and quantity of food intake. Considering a population of shift workers, we explore the dynamic interplay between their eating behaviors and gut microbiota, modeling the social dynamics of chrono-nutrition in a multiplex network. Our findings allow us to quantify the relation between human behaviors and gut microbiota through the methodological introduction of gut metabolic modeling and statistical estimators able to capture their dynamic interplay. Moreover, we find that the timing of gut microbial communities is slower than social interactions and shift-working, and the impact of shift-working on the dynamics of chrono-nutrition is a fluctuation of strategies with a major propensity for defection (e.g. high-fat meals). A deeper understanding of the relation between gut microbiota and dietary behavioral patterns, which also embeds the related social aspects, improves the overall knowledge about metabolic models and their implications for human health, opening the possibility to design promising social therapeutic dietary interventions.
|
32
|
CiliateGEM: an open-project and a tool for predictions of ciliate metabolic variations and experimental condition design. BMC Bioinformatics 2018; 19:442. [PMID: 30497359 PMCID: PMC6266953 DOI: 10.1186/s12859-018-2422-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The study of cell metabolism is becoming central in several fields such as biotechnology, evolution/adaptation and human disease investigations. Here we present CiliateGEM, the first metabolic network reconstruction draft of the freshwater ciliate Tetrahymena thermophila. We also provide the tools and resources to simulate different growth conditions and to predict metabolic variations. CiliateGEM can be extended to other ciliates in order to set up a meta-model, i.e. a metabolic network reconstruction valid for all ciliates. Ciliates are complex unicellular eukaryotes of presumably monophyletic origin, with a phylogenetic position that is equidistant from plants and animals. These cells represent a new concept of unicellular system with a high degree of species and population biodiversity and of cell complexity. Ciliates perform in a single cell all the functions of a pluricellular organism, including locomotion, feeding, digestion, and sexual processes. RESULTS After generating the model, we performed an in silico simulation in the presence and absence of glucose. The lack of this nutrient caused a 32.1% reduction in biomass synthesis. Despite the glucose starvation, growth did not stop, owing to the use of alternative carbon sources such as amino acids. CONCLUSIONS The future models obtained from CiliateGEM may represent a new approach to describe the metabolism of ciliates. This tool will be a useful resource for the ciliate research community in order to extend these species as model organisms in different research fields. An improved understanding of ciliate metabolism could be relevant to elucidate the basis of biological phenomena like genotype-phenotype relationships, population genetics, and cilia-related disease mechanisms.
|
33
|
Abstract
BACKGROUND Ageing can be classified in two different ways, chronological ageing and biological ageing. While chronological age is a measure of the time that has passed since birth, biological (also known as transcriptomic) ageing is defined by how time and the environment affect an individual in comparison to other individuals of the same chronological age. Recent research studies have shown that transcriptomic age is associated with certain genes, and that each of those genes has an effect size. Using these effect sizes we can calculate the transcriptomic age of an individual from their age-associated gene expression levels. The limitation of this approach is that it does not consider how these changes in gene expression affect the metabolism of individuals and hence their observable cellular phenotype. RESULTS We propose a method based on poly-omic constraint-based models and machine learning in order to further the understanding of transcriptomic ageing. We use normalised CD4 T-cell gene expression data from peripheral blood mononuclear cells in 499 healthy individuals to create individual metabolic models. These models are then combined with a transcriptomic age predictor and chronological age to provide new insights into the differences between transcriptomic and chronological ageing. As a result, we propose a novel metabolic age predictor. CONCLUSIONS We show that our poly-omic predictors provide a more detailed analysis of transcriptomic ageing compared to gene-based approaches, and represent a basis for furthering our knowledge of the ageing mechanisms in human cells.
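The calculation of transcriptomic age from effect sizes mentioned in the background can be sketched as a weighted sum over age-associated genes. The gene names, effect sizes, and expression values below are hypothetical, and the sketch assumes a purely linear score; a published predictor would normally also rescale this score to units of years.

```python
def transcriptomic_score(expression, effect_sizes):
    """Weighted sum of expression levels over the genes that have a known
    effect size. Genes missing from either input are ignored."""
    shared_genes = expression.keys() & effect_sizes.keys()
    return sum(effect_sizes[gene] * expression[gene] for gene in shared_genes)


# Hypothetical normalised expression levels for one individual.
expression = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}

# Hypothetical per-gene effect sizes from an age-association study.
effect_sizes = {"GENE_A": 1.5, "GENE_B": 2.0, "GENE_D": 3.0}

score = transcriptomic_score(expression, effect_sizes)
print(score)
```

Only GENE_A and GENE_B contribute here, because GENE_C has no effect size and GENE_D was not measured. This is also the limitation the abstract points to: the score uses expression alone and says nothing about the resulting metabolic state.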
|
34
|
In silico engineering of Pseudomonas metabolism reveals new biomarkers for increased biosurfactant production. PeerJ 2018; 6:e6046. [PMID: 30588397 PMCID: PMC6301282 DOI: 10.7717/peerj.6046] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/02/2018] [Accepted: 10/30/2018] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND Rhamnolipids, biosurfactants with a wide range of biomedical applications, are amphiphilic molecules produced on the surfaces of or excreted extracellularly by bacteria including Pseudomonas aeruginosa. However, Pseudomonas putida is a non-pathogenic model organism with greater metabolic versatility and potential for industrial applications. METHODS We investigate in silico the metabolic capabilities of P. putida for rhamnolipids biosynthesis using statistical, metabolic and synthetic engineering approaches after introducing key genes (RhlA and RhlB) from P. aeruginosa into a genome-scale model of P. putida. This pipeline combines machine learning methods with multi-omic modelling, and drives the engineered P. putida model toward an optimal production and export of rhamnolipids out of the membrane. RESULTS We identify a substantial increase in synthesis of rhamnolipids by the engineered model compared to the control model. We apply statistical and machine learning techniques on the metabolic reaction rates to identify distinct features on the structure of the variables and individual components driving the variation of growth and rhamnolipids production. We finally provide a computational framework for integrating multi-omics data and identifying latent pathways and genes for the production of rhamnolipids in P. putida. CONCLUSIONS We anticipate that our results will provide a versatile methodology for integrating multi-omics data for topological and functional analysis of P. putida toward maximization of biosurfactant production.
|
35
|
Optimization of Multi-Omic Genome-Scale Models: Methodologies, Hands-on Tutorial, and Perspectives. Methods Mol Biol 2018; 1716:389-408. [PMID: 29222764 DOI: 10.1007/978-1-4939-7528-0_18] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 01/21/2023]
Abstract
Genome-scale metabolic models are valuable tools for assessing the metabolic potential of living organisms. Being downstream of gene expression, metabolism is increasingly being used as an indicator of the phenotypic outcome for drugs and therapies. We here present a review of the principal methods used for constraint-based modelling in systems biology, and explore how the integration of multi-omic data can be used to improve phenotypic predictions of genome-scale metabolic models. We believe that the large-scale comparison of the metabolic response of an organism to different environmental conditions will be an important challenge for genome-scale models. Therefore, within the context of multi-omic methods, we describe a tutorial for multi-objective optimization using the metabolic and transcriptomics adaptation estimator (METRADE), implemented in MATLAB. METRADE uses microarray and codon usage data to model bacterial metabolic response to environmental conditions (e.g., antibiotics, temperatures, heat shock). Finally, we discuss key considerations for the integration of multi-omic networks into metabolic models, towards automatically extracting knowledge from such models.
|
36
|
Modelling pyruvate dehydrogenase under hypoxia and its role in cancer metabolism. Royal Society Open Science 2017; 4:170360. [PMID: 29134060 PMCID: PMC5666243 DOI: 10.1098/rsos.170360] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/20/2017] [Accepted: 09/25/2017] [Indexed: 05/18/2023]
Abstract
Metabolism is the only biological system that can be fully modelled at genome scale. As a result, metabolic models have been increasingly used to study the molecular mechanisms of various diseases. Hypoxia, a state of low oxygen tension, is a well-known characteristic of many cancer cells. Pyruvate dehydrogenase (PDH) controls the flux of metabolites between glycolysis and the tricarboxylic acid cycle and is a key enzyme in metabolic reprogramming in cancer metabolism. Here, we develop and manually curate a constraint-based metabolic model to investigate the mechanism of pyruvate dehydrogenase under hypoxia. Our results characterize the activity of pyruvate dehydrogenase and its decline during hypoxia. This decline results in lactate accumulation, consistent with recent hypoxia studies and a well-known feature of cancer metabolism. We apply machine-learning techniques to the flux datasets to identify reactions that drive these variations. We also identify distinct features in the structure of the variables and individual metabolic components in the switch from normoxia to hypoxia. Our results provide a framework for future studies by integrating multi-omics data to predict condition-specific metabolic phenotypes under hypoxia.
|
37
|
Integrating splice-isoform expression into genome-scale models characterizes breast cancer metabolism. Bioinformatics 2017; 34:494-501. [DOI: 10.1093/bioinformatics/btx562] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/06/2017] [Accepted: 09/06/2017] [Indexed: 12/20/2022] Open
|
38
|
Making life difficult for Clostridium difficile: augmenting the pathogen's metabolic model with transcriptomic and codon usage data for better therapeutic target characterization. BMC Systems Biology 2017; 11:25. [PMID: 28209199 PMCID: PMC5314682 DOI: 10.1186/s12918-017-0395-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/23/2016] [Accepted: 01/13/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Clostridium difficile is a bacterium which can infect various animal species, including humans. Infection with this bacterium is a leading healthcare-associated illness. A better understanding of this organism and the relationship between its genotype and phenotype is essential to the search for an effective treatment. Genome-scale metabolic models contain all known biochemical reactions of a microorganism and can be used to investigate this relationship. RESULTS We present icdf834, an updated metabolic network of C. difficile that builds on iMLTC806cdf and features 1227 reactions, 834 genes, and 807 metabolites. We used this metabolic network to reconstruct the metabolic landscape of this bacterium. The standard metabolic model cannot account for changes in the bacterial metabolism in response to different environmental conditions. To account for this limitation, we also integrated transcriptomic data, which details the gene expression of the bacterium in a wide array of environments. Importantly, to bridge the gap between gene expression levels and protein abundance, we accounted for the synonymous codon usage bias of the bacterium in the model. To our knowledge, this is the first time codon usage has been quantified and integrated into a metabolic model. The metabolic fluxes were defined as a function of protein abundance. To determine potential therapeutic targets using the model, we conducted gene essentiality and metabolic pathway sensitivity analyses and calculated flux control coefficients. We obtained 92.3% accuracy in predicting gene essentiality when compared to experimental data for C. difficile R20291 (ribotype 027) homologs. We validated our context-specific metabolic models using sensitivity and robustness analyses and compared model predictions with literature on C. difficile. The model predicts interesting facets of the bacterium's metabolism, such as changes in the bacterium's growth in response to different environmental conditions. 
CONCLUSIONS After an extensive validation process, we used icdf834 to obtain state-of-the-art predictions of therapeutic targets for C. difficile. We show how context-specific metabolic models augmented with codon usage information can be a beneficial resource for better understanding C. difficile and for identifying novel therapeutic targets. We remark that our approach can also be applied to investigate, and search for treatments against, other pathogens.
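The abstract above mentions quantifying synonymous codon usage bias but does not say which measure was used. As a hedged illustration only, the sketch below computes relative synonymous codon usage (RSCU), one common measure: the count of each codon divided by the mean count of all codons encoding the same amino acid. The function name `rscu` and the toy sequence are assumptions for illustration, not the authors' method or code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return RSCU per codon for one coding sequence (hypothetical helper).

    RSCU = observed codon count / mean count over its synonymous family.
    Stop codons and amino acids absent from the sequence are skipped.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        mean = total / len(family)
        for c in family:
            result[c] = counts[c] / mean
    return result


# Toy example: four alanine codons, three GCT and one GCC.
values = rscu("GCTGCTGCTGCC")
print(values["GCT"], values["GCC"], values["GCA"])
```

An RSCU above 1 marks a codon used more often than expected under uniform synonymous usage; how such values are mapped to protein abundance in a metabolic model is a separate modeling choice not covered by this sketch.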
|
39
|
Erratum: Predictive analytics of environmental adaptability in multi-omic network models. Sci Rep 2016; 6:26266. [PMID: 27199183 PMCID: PMC4873815 DOI: 10.1038/srep26266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022] Open
|
40
|
|
41
|
Abstract
BACKGROUND Genomic, transcriptomic, and metabolic variations shape the complex adaptation landscape of bacteria to varying environmental conditions. Elucidating the genotype-phenotype relation paves the way for the prediction of such effects, but methods for characterizing the relationship between multiple environmental factors are still lacking. Here, we tackle the problem of extracting network-level information from collections of environmental conditions, by integrating the multiple omic levels at which the bacterial response is measured. RESULTS To this end, we model a large compendium of growth conditions as a multiplex network consisting of transcriptomic and fluxomic layers, and we propose a multi-omic network approach to infer similarity of growth conditions by integrating layers of the multiplex network. Each node of the network represents a single condition, while edges are similarities between conditions, as measured by phenotypic and transcriptomic properties on different layers of the network. We then fuse these layers into one network, therefore capturing a global network of conditions and the associated similarities across two omic levels. We apply this multi-omic fusion to an updated genome-scale reconstruction of Escherichia coli that includes underground metabolism and new gene-protein-reaction associations. CONCLUSIONS Our method can be readily used to evaluate and cross-compare different collections of conditions among different species. Acquiring multi-omic information on the topology of the space of experimental conditions makes it possible to infer the position and to build condition-specific models of untested or incomplete profiles for which experimental data is not available. Our weighted network fusion method for genome-scale models is freely available at https://github.com/maxconway/SNFtool .
|
42
|
Predictive analytics of environmental adaptability in multi-omic network models. Sci Rep 2015; 5:15147. [PMID: 26482106 PMCID: PMC4611489 DOI: 10.1038/srep15147] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/20/2015] [Accepted: 09/14/2015] [Indexed: 01/22/2023] Open
Abstract
Bacterial phenotypic traits and lifestyles in response to diverse environmental conditions depend on changes in the internal molecular environment. However, predicting bacterial adaptability is still difficult outside of laboratory-controlled conditions. Many molecular levels can contribute to the adaptation to a changing environment: pathway structure, codon usage, metabolism. To measure adaptability to changing environmental conditions and over time, we develop a multi-omic model of Escherichia coli that accounts for metabolism, gene expression and codon usage at both transcription and translation levels. After the integration of multiple omics into the model, we propose a multiobjective optimization algorithm to find the allowable and optimal metabolic phenotypes through concurrent maximization or minimization of multiple metabolic markers. In the condition space, we propose Pareto hypervolume and spectral analysis as estimators of short-term multi-omic (transcriptomic and metabolic) evolution, thus enabling comparative analysis of metabolic conditions. We therefore compare, evaluate and cluster different experimental conditions, models and bacterial strains according to their metabolic response in a multidimensional objective space, rather than in the original space of microarray data. We finally validate our methods on a phenomics dataset of growth conditions. Our framework, named METRADE, is freely available as a MATLAB toolbox.
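The abstract above proposes Pareto hypervolume as an estimator for comparing conditions. As a rough illustration of the quantity itself, the sketch below computes the hypervolume of a set of two-objective points, assuming both objectives are maximized and a reference point marks the worst acceptable values. The function name `hypervolume_2d`, the default reference point, and the sample points are assumptions; this is not the METRADE implementation, which is a MATLAB toolbox.

```python
def hypervolume_2d(points, reference=(0.0, 0.0)):
    """Area dominated by `points` above `reference` (two maximized objectives).

    Hypothetical helper: sweeps points from largest to smallest first
    objective and adds a rectangle each time the second objective improves.
    """
    rx, ry = reference
    # Only points strictly better than the reference contribute any area.
    candidates = [p for p in points if p[0] > rx and p[1] > ry]
    candidates.sort(key=lambda p: p[0], reverse=True)

    area = 0.0
    best_y = ry
    for x, y in candidates:
        if y > best_y:  # dominated points never raise best_y and are skipped
            area += (x - rx) * (y - best_y)
            best_y = y
    return area


# Toy example: three non-dominated points and one dominated point (1, 1).
front = [(1, 3), (2, 2), (3, 1), (1, 1)]
print(hypervolume_2d(front))  # 6.0
```

A larger hypervolume means the front covers more of the objective space relative to the reference point, so the value can serve as a single score when comparing fronts computed with the same reference.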
|
43
|
Multi-Target Analysis and Design of Mitochondrial Metabolism. PLoS One 2015; 10:e0133825. [PMID: 26376088 PMCID: PMC4574446 DOI: 10.1371/journal.pone.0133825] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/16/2014] [Accepted: 07/02/2015] [Indexed: 12/30/2022] Open
Abstract
Analyzing and optimizing biological models is often identified as a research priority in biomedical engineering. An important feature of a model should be the ability to find the best condition in which an organism has to be grown in order to reach specific optimal output values chosen by the researcher. In this work, we consider a mitochondrial model analyzed with flux-balance analysis. The optimal design and assessment of these models is achieved through single- and/or multi-objective optimization techniques driven by epsilon-dominance and identifiability analysis. Our optimization algorithm searches for the values of the flux rates that optimize multiple cellular functions simultaneously. The optimization of the fluxes of the metabolic network includes not only input fluxes, but also internal fluxes. Faster convergence with robust candidate solutions is enabled by a relaxed Pareto dominance, which regulates the granularity of the approximation of the desired Pareto front. We find that the maximum ATP production is linked to a total consumption of NADH, and reaching the maximum amount of NADH leads to an increasing demand for NADH from the external environment. Furthermore, the identifiability analysis characterizes the type and the stage of three monogenic diseases. Finally, we propose a new methodology to extend any constraint-based model using protein abundances.
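The relaxed Pareto dominance mentioned in the abstract above can be illustrated with additive epsilon-dominance: a point epsilon-dominates another if, after adding a tolerance `eps` to each of its objectives, it is at least as good everywhere. The sketch below is a simplified archive filter under that definition, assuming all objectives are maximized. The names `eps_dominates` and `eps_archive` and the sample points are assumptions, and the paper's actual algorithm may differ.

```python
def eps_dominates(a, b, eps):
    """True if `a` epsilon-dominates `b` (maximization, additive epsilon)."""
    return all(x + eps >= y for x, y in zip(a, b))


def eps_archive(points, eps):
    """Keep a coarse approximation of the Pareto front (hypothetical helper).

    A candidate is rejected if an archived point already epsilon-dominates
    it; when accepted, it evicts archived points it dominates outright.
    Larger `eps` yields a sparser archive, i.e. a coarser granularity.
    """
    archive = []
    for p in points:
        if any(eps_dominates(q, p, eps) for q in archive):
            continue
        archive = [q for q in archive if not eps_dominates(p, q, 0.0)]
        archive.append(p)
    return archive


# Toy example: (1.1, 2.95) lies within eps = 0.2 of (1.0, 3.0) and is dropped.
candidates = [(1.0, 3.0), (1.1, 2.95), (2.0, 2.0), (3.0, 1.0)]
print(eps_archive(candidates, 0.2))  # [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(len(eps_archive(candidates, 0.0)))  # 4
```

With `eps = 0` the filter reduces to ordinary (weak) Pareto dominance, which is why all four mutually non-dominated points survive in the second call.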
|
44
|
A Hybrid of Metabolic Flux Analysis and Bayesian Factor Modeling for Multiomic Temporal Pathway Activation. ACS Synth Biol 2015; 4:880-9. [PMID: 25856685 DOI: 10.1021/sb5003407] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 01/19/2023]
Abstract
The growing availability of multiomic data provides a highly comprehensive view of cellular processes at the levels of mRNA, proteins, metabolites, and reaction fluxes. However, due to probabilistic interactions between components depending on the environment and on the time course, casual, sometimes rare interactions may cause important effects in the cellular physiology. To date, interactions at the pathway level cannot be measured directly, and methodologies to predict pathway cross-correlations from reaction fluxes are still missing. Here, we develop a multiomic approach of flux-balance analysis combined with Bayesian factor modeling with the aim of detecting pathway cross-correlations and predicting metabolic pathway activation profiles. Starting from gene expression profiles measured in various environmental conditions, we associate a flux rate profile with each condition. We then infer pathway cross-correlations and identify the degrees of pathway activation with respect to the conditions and time course using Bayesian factor modeling. We test our framework on the most recent metabolic reconstruction of Escherichia coli in both static and dynamic environments, thus predicting the functionality of particular groups of reactions and how it varies over time. In a dynamic environment, our method can be readily used to characterize the temporal progression of pathway activation in response to given stimuli.
|
45
|
Bioremediation in marine ecosystems: a computational study combining ecological modeling and flux balance analysis. Front Genet 2014; 5:319. [PMID: 25309577 PMCID: PMC4162388 DOI: 10.3389/fgene.2014.00319] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 06/04/2014] [Accepted: 08/26/2014] [Indexed: 11/13/2022] Open
Abstract
The pressure to find effective bioremediation methodologies for contaminated ecosystems has led to the large-scale identification of microbial species and metabolic degradation pathways. However, little attention has been paid to the study of bioremediation in marine food webs and to the definition of integrated strategies for reducing bioaccumulation in species. We propose a novel computational framework for analysing the multiscale effects of bioremediation at the ecosystem level, based on coupling food web bioaccumulation models and metabolic models of degrading bacteria. The combination of techniques from synthetic biology and ecological network analysis allows the specification of arbitrary scenarios of contaminant removal and the evaluation of strategies based on natural or synthetic microbial strains. In this study, we derive a bioaccumulation model of polychlorinated biphenyls (PCBs) in the Adriatic food web, and we extend a metabolic reconstruction of Pseudomonas putida KT2440 (iJN746) with the aerobic pathway of PCB degradation. We assess the effectiveness of different bioremediation scenarios in reducing PCB concentrations in species, and we study indices of species centrality to measure their importance in contaminant diffusion via feeding links. The analysis of the Adriatic Sea case study suggests that our framework could represent a practical tool in the design of effective remediation strategies, while also providing insights into the ecological role of microbial communities within food webs.
|
46
|
A design automation framework for computational bioenergetics in biological networks. Mol Biosyst 2014; 9:2554-64. [PMID: 23925151 DOI: 10.1039/c3mb25558a] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 02/01/2023]
Abstract
The bioenergetic activity of mitochondria can be thoroughly investigated by using computational methods. In particular, in our work we focus on ATP and NADH, namely the metabolites representing the production of energy in the cell. We develop a computational framework to perform an exhaustive investigation at the level of species, reactions, genes and metabolic pathways. The framework integrates several methods implementing the state-of-the-art algorithms for many-objective optimization, sensitivity, and identifiability analysis applied to biological systems. We use this computational framework to analyze three case studies related to the human mitochondria and the algal metabolism of Chlamydomonas reinhardtii, formally described with algebraic differential equations or flux balance analysis. Integrating the results of our framework applied to interacting organelles would provide a general-purpose method for assessing the production of energy in a biological network.
|
47
|
Pareto optimality in organelle energy metabolism analysis. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1032-1044. [PMID: 24334395 DOI: 10.1109/tcbb.2013.95] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 06/03/2023]
Abstract
In lower and higher eukaryotes, energy is collected or transformed in compartments, the organelles. The rich variety of size, characteristics, and density of the organelles makes it difficult to build a general picture. In this paper, we use Pareto-front analysis to investigate the optimization of energy metabolism in mitochondria and chloroplasts. Using the Pareto optimality principle, we compare models of organelle metabolism on the basis of single- and multiobjective optimization, approximation techniques (the Bayesian Automatic Relevance Determination), robustness, and pathway sensitivity analysis. Finally, we report the first analysis of the metabolic model for the hydrogenosome of Trichomonas vaginalis, an organelle found in several protozoan parasites. Our analysis has shown the importance of Pareto optimality for such comparison and for insights into the evolution of the metabolism from cytoplasmic to organelle bound, involving a model order reduction. We report that Pareto fronts represent an asymptotic analysis useful to describe the metabolism of an organism aimed at maximizing concurrently two or more metabolite concentrations.
|
48
|
Efficient behavior of photosynthetic organelles via Pareto optimality, identifiability, and sensitivity analysis. ACS Synth Biol 2013; 2:274-88. [PMID: 23654280 DOI: 10.1021/sb300102k] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
In this work, we develop methodologies for analyzing and cross-comparing metabolic models. We investigate three important metabolic networks to discuss the complexity of biological organization of organisms, modeling, and system properties. In particular, we analyze these metabolic networks because of their biotechnological and basic science importance: the photosynthetic carbon metabolism in a general leaf, the Rhodobacter sphaeroides bacterium, and the Chlamydomonas reinhardtii alga. We adopt single- and multi-objective optimization algorithms to maximize the CO₂ uptake rate and the production of metabolites of industrial interest or for ecological purposes. We focus both on the level of genes (e.g., finding genetic manipulations to increase the production of one or more metabolites) and on finding enzyme concentrations that improve CO₂ consumption. We find that R. sphaeroides is able to absorb up to 57.452 mmol h⁻¹ gDW⁻¹ of CO₂, while C. reinhardtii reaches a maximum of 6.7331. We report that the Pareto front analysis proves extremely useful to compare different organisms, as well as providing the possibility to investigate them with the same framework. By using the sensitivity and robustness analysis, our framework identifies the most sensitive and fragile components of the biological systems we take into account, allowing us to compare their models. We adopt the identifiability analysis to detect functional relations among enzymes; we observe that RuBisCO, GAPDH, and FBPase belong to the same functional group, as suggested also by the sensitivity analysis.
|
49
|
|
50
|
Abstract
MOTIVATION Metabolic engineering algorithms provide means to optimize a biological process, leading to improved production of a biotechnologically interesting molecule. It is therefore important to understand how to act on a metabolic pathway in order to obtain the best results in terms of production. In this work, we present a computational framework that searches for optimal and robust microbial strains that are able to produce target molecules. Our framework performs three tasks: it evaluates the parameter sensitivity of the microbial model, searches for the optimal genetic or flux design, and finally calculates the robustness of the microbial strains. We are able to combine the exploration of species, reactions, pathways and knockout parameter spaces with the Pareto-optimality principle. RESULTS Our framework also provides theoretical and practical guidelines for design automation. The statistical cross comparison of our new optimization procedures, performed with respect to currently widely used algorithms for bacteria (e.g. Escherichia coli) over multiple objective functions, reveals good performance over a variety of biotechnological products. AVAILABILITY http://www.dmi.unict.it/nicosia/pathDesign.html. CONTACT nicosia@dmi.unict.it or pl219@cam.ac.uk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|