1
Ferreira MADM, Silveira WBD, Nikoloski Z. Protein constraints in genome-scale metabolic models: Data integration, parameter estimation, and prediction of metabolic phenotypes. Biotechnol Bioeng 2024; 121:915-930. [PMID: 38178617 DOI: 10.1002/bit.28650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 10/24/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024]
Abstract
Genome-scale metabolic models provide a valuable resource to study metabolism and cell physiology. These models are employed with approaches from the constraint-based modeling framework to predict metabolic and physiological phenotypes. The prediction performance of genome-scale metabolic models can be improved by including protein constraints. The resulting protein-constrained models consider data on turnover numbers (kcat) and facilitate the integration of protein abundances. In this systematic review, we present and discuss the current state-of-the-art regarding the estimation of kinetic parameters used in protein-constrained models. We also highlight how data-driven and constraint-based approaches can aid the estimation of turnover numbers and their usage in improving predictions of cellular phenotypes. Finally, we identify standing challenges in protein-constrained metabolic models and provide a perspective regarding future approaches to improve the predictive performance.
Affiliation(s)
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
- Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
2
Ponomarenko EA, Krasnov GS, Kiseleva OI, Kryukova PA, Arzumanian VA, Dolgalev GV, Ilgisonis EV, Lisitsa AV, Poverennaya EV. Workability of mRNA Sequencing for Predicting Protein Abundance. Genes (Basel) 2023; 14:2065. [PMID: 38003008 PMCID: PMC10671741 DOI: 10.3390/genes14112065] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/07/2023] [Revised: 11/03/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023] Open
Abstract
Transcriptomics methods (RNA-Seq, PCR) today are more routine and reproducible than proteomics methods, i.e., both mass spectrometry and immunochemical analysis. For this reason, most scientific studies are limited to assessing the level of mRNA content. At the same time, protein content (and its post-translational status) largely determines the cell's state and behavior. Such a forced extrapolation of conclusions from the transcriptome to the proteome often seems unjustified. The ratios of "transcript-protein" pairs can vary by several orders of magnitude for different genes. As a rule, the correlation coefficient between transcriptome-proteome levels for different tissues does not exceed 0.3-0.5. Several characteristics determine the ratio between the content of mRNA and protein: among them, the rate of movement of the ribosome along the mRNA and the number of free ribosomes in the cell, the availability of tRNA, the secondary structure, and the localization of the transcript. The technical features of the experimental methods also significantly influence the measured transcript and protein levels of the corresponding gene, and hence the outcome of the comparison. Given the above biological features and the performance of experimental and bioinformatic approaches, one may develop various models to predict proteomic profiles based on transcriptomic data. This review is devoted to the ability of RNA sequencing methods to predict protein abundance.
Affiliation(s)
- George S. Krasnov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
3
Moura Ferreira MAD, Wendering P, Arend M, Batista da Silveira W, Nikoloski Z. Accurate prediction of in vivo protein abundances by coupling constraint-based modelling and machine learning. Metab Eng 2023; 80:184-192. [PMID: 37802292 DOI: 10.1016/j.ymben.2023.09.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/28/2023] [Revised: 09/10/2023] [Accepted: 09/25/2023] [Indexed: 10/08/2023]
Abstract
Quantification of how different environmental cues affect protein allocation can provide important insights for understanding cell physiology. While absolute quantification of proteins can be obtained by resource-intensive mass-spectrometry-based technologies, prediction of protein abundances offers another way to obtain insights into protein allocation. Here we present CAMEL, a framework that couples constraint-based modelling with machine learning to predict protein abundance for any environmental condition. This is achieved by building machine learning models that leverage static features, derived from protein sequences, and condition-dependent features predicted from protein-constrained metabolic models. Our findings demonstrate that CAMEL results in excellent prediction of protein allocation in E. coli (average Pearson correlation of at least 0.9), and moderate performance in S. cerevisiae (average Pearson correlation of at least 0.5). Therefore, CAMEL outperformed contending approaches without using molecular read-outs from unseen conditions and provides a valuable tool for using protein allocation in biotechnological applications.
Affiliation(s)
- Philipp Wendering
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
- Marius Arend
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
4
Ferreira M, Ventorim R, Almeida E, Silveira S, Silveira W. Protein Abundance Prediction Through Machine Learning Methods. J Mol Biol 2021; 433:167267. [PMID: 34563548 DOI: 10.1016/j.jmb.2021.167267] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/16/2021] [Revised: 09/09/2021] [Accepted: 09/17/2021] [Indexed: 10/20/2022]
Abstract
Proteins are responsible for most physiological processes, and their abundance provides crucial information for systems biology research. However, absolute protein quantification, as determined by mass spectrometry, still has limitations in capturing the protein pool. Protein abundance is impacted by translation kinetics, which rely on features of codons. In this study, we evaluated the effect of codon usage bias of genes on protein abundance. Notably, we observed differences regarding codon usage patterns between genes coding for highly abundant proteins and genes coding for less abundant proteins. Analysis of synonymous codon usage and evolutionary selection showed a clear split between the two groups. Our machine learning models predicted protein abundances from codon usage metrics with remarkable accuracy, achieving strong correlation with experimental data. Upon integration of the predicted protein abundance in enzyme-constrained genome-scale metabolic models, the simulated phenotypes closely matched experimental data, which demonstrates that our predictive models are valuable tools for systems metabolic engineering approaches.
Affiliation(s)
- Mauricio Ferreira
- Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil. https://twitter.com/@mauriciomyces
- Rafaela Ventorim
- Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil.
- Eduardo Almeida
- Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil. https://twitter.com/@elm_almeida
- Sabrina Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil. https://twitter.com/@sabrina_as
- Wendel Silveira
- Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil.
5
Wang M, Gong C, Amakye WK, Ren J. Exploring the Mechanisms of Anti‐Aβ42 Aggregation Activity of Walnut‐derived Peptides using Transcriptomics and Proteomics in vitro. EFOOD 2021; 2:247-258. [DOI: 10.53365/efood.k/144885] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/19/2021] [Accepted: 12/14/2021] [Indexed: 11/05/2022] Open
Abstract
Inhibiting β‐amyloid (Aβ) aggregation is of significance in finding potential candidates for Alzheimer's disease (AD) treatment. Accumulating evidence suggests that nutrition is important for improving cognition and reducing AD risk. Walnut has been widely used as a functional food for brain health; however, the underlying mechanisms remain unknown. Here, we investigated the molecular-level alterations in an Arctic mutant Aβ42-induced aggregation cell model by RNA‐seq and iTRAQ approaches after interventions with the walnut‐derived peptides Pro‐Pro‐Lys‐Asn‐Trp (PW5) and Trp‐Pro‐Pro‐Lys‐Asn (WN5). PW5 or WN5 could significantly decrease abnormal Aβ42 aggregates. However, resultant alterations in transcriptome (substantially unchanged) were inconsistent with proteomic data (marked change). Proteomic analysis revealed 184 and 194 differentially expressed proteins unique to PW5 and WN5 treatment, respectively, for inhibiting Aβ42 protein production or increasing protein degradation via the mismatch repair pathways. Our study provides new insights into the effectiveness of food‐derived peptides for anti‐Aβ42 aggregation in AD.
Affiliation(s)
- Min Wang
- School of Food Science and Engineering, South China University of Technology, Wushan, 510641 Guangzhou, China
- Congcong Gong
- School of Food Science and Engineering, South China University of Technology, Wushan, 510641 Guangzhou, China
- William Kwame Amakye
- School of Food Science and Engineering, South China University of Technology, Wushan, 510641 Guangzhou, China
- Jiaoyan Ren
- School of Food Science and Engineering, South China University of Technology, Wushan, 510641 Guangzhou, China
6
Giudice G, Petsalaki E. Proteomics and phosphoproteomics in precision medicine: applications and challenges. Brief Bioinform 2019; 20:767-777. [PMID: 29077858 PMCID: PMC6585152 DOI: 10.1093/bib/bbx141] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 07/01/2017] [Revised: 09/21/2017] [Indexed: 12/11/2022] Open
Abstract
Recent advances in proteomics allow the accurate measurement of abundances for thousands of proteins and phosphoproteins from multiple samples in parallel. Therefore, for the first time, we have the opportunity to measure the proteomic profiles of thousands of patient samples or disease model cell lines in a systematic way, to identify the precise underlying molecular mechanism and discover personalized biomarkers, networks and treatments. Here, we review examples of successful use of proteomics and phosphoproteomics data sets, as well as their integration with other omics data sets, with the aim of precision medicine. We will discuss the bioinformatics challenges posed by the generation, analysis and integration of such large data sets and present potential reasons why proteomics profiling and biomarkers are not currently widely used in the clinical setting. We will finally discuss ways to contribute to the better use of proteomics data in precision medicine and the clinical setting.
Affiliation(s)
- Girolamo Giudice
- European Molecular Biology Laboratory, European Bioinformatics Institute
7
Abstract
Analysis of genomic data is often complicated by the presence of missing values, which may arise due to cost or other reasons. The prevailing approach of single imputation is generally invalid if the imputation model is misspecified. In this paper, we propose a robust score statistic based on imputed data for testing the association between a phenotype and a genomic variable with (partially) missing values. We fit a semiparametric regression model for the genomic variable against an arbitrary function of the linear predictor in the phenotype model and impute each missing value by its estimated posterior expectation. We show that the score statistic with such imputed values is asymptotically unbiased under general missing-data mechanisms, even when the imputation model is misspecified. We develop a spline-based method to estimate the semiparametric imputation model and derive the asymptotic distribution of the corresponding score statistic with a consistent variance estimator using sieve approximation theory and empirical process theory. The proposed test is computationally feasible regardless of the number of independent variables in the imputation model. We demonstrate the advantages of the proposed method over existing methods through extensive simulation studies and provide an application to a major cancer genomics study.
Affiliation(s)
- Kin Yau Wong
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
- Donglin Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
- D Y Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
8
Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel) 2019; 10:E87. [PMID: 30696086 PMCID: PMC6410075 DOI: 10.3390/genes10020087] [Citation(s) in RCA: 176] [Impact Index Per Article: 29.3] [Received: 12/02/2018] [Revised: 01/08/2019] [Accepted: 01/21/2019] [Indexed: 12/11/2022] Open
Abstract
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Affiliation(s)
- Bilal Mirza
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Wei Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Jie Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Howard Choi
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Neo Christopher Chung
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
- Peipei Ping
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Medicine (Cardiology), University of California Los Angeles, Los Angeles, CA 90095, USA.
9
Kumar D, Bansal G, Narang A, Basak T, Abbas T, Dash D. Integrating transcriptome and proteome profiling: Strategies and applications. Proteomics 2016; 16:2533-2544. [PMID: 27343053 DOI: 10.1002/pmic.201600140] [Citation(s) in RCA: 124] [Impact Index Per Article: 13.8] [Received: 03/16/2016] [Revised: 06/12/2016] [Accepted: 06/23/2016] [Indexed: 12/17/2022]
Abstract
Discovering the gene expression signature associated with a cellular state is one of the basic quests in the majority of biological studies. For most of the clinical and cellular manifestations, these molecular differences may be exhibited across multiple layers of gene regulation like genomic variations, gene expression, protein translation and post-translational modifications. These system-wide variations are dynamic in nature and their crosstalk is overwhelmingly complex, thus analyzing them separately may not be very informative. This necessitates the integrative analysis of such multiple layers of information to understand the interplay of the individual components of the biological system. Recent developments in high-throughput RNA sequencing and mass spectrometric (MS) technologies to probe transcripts and proteins have made these the preferred methods for understanding global gene regulation. Subsequently, improvements in "big-data" analysis techniques enable novel conclusions to be drawn from integrative transcriptomic-proteomic analysis. The unified analyses of both these data types have been rewarding for several biological objectives like improving genome annotation, predicting RNA-protein quantities, deciphering gene regulations, discovering disease markers and drug targets. There are different ways in which transcriptomics and proteomics data can be integrated, each aiming for different research objectives. Here, we review various studies, approaches and computational tools targeted for integrative analysis of these two high-throughput omics methods.
Affiliation(s)
- Dhirendra Kumar
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Gourja Bansal
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Ankita Narang
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India
- Trayambak Basak
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
- Tahseen Abbas
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
- Debasis Dash
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), CSIR-IGIB South Campus, New Delhi, India
10
Waltman PH, Guo J, Reistetter EN, Purvine S, Ansong CK, van Baren MJ, Wong CH, Wei CL, Smith RD, Callister SJ, Stuart JM, Worden AZ. Identifying Aspects of the Post-Transcriptional Program Governing the Proteome of the Green Alga Micromonas pusilla. PLoS One 2016; 11:e0155839. [PMID: 27434306 PMCID: PMC4951065 DOI: 10.1371/journal.pone.0155839] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/15/2016] [Accepted: 05/05/2016] [Indexed: 11/18/2022] Open
Abstract
Micromonas is a unicellular motile alga within the Prasinophyceae, a green algal group that is related to land plants. This picoeukaryote (<2 μm diameter) is widespread in the marine environment but is not well understood at the cellular level. Here, we examine shifts in mRNA and protein expression over the course of the day-night cycle using triplicated mid-exponential, nutrient replete cultures of Micromonas pusilla CCMP1545. Samples were collected at key transition points during the diel cycle for evaluation using high-throughput LC-MS proteomics. In conjunction, matched mRNA samples from the same time points were sequenced using pair-ended directional Illumina RNA-Seq to investigate the dynamics and relationship between the mRNA and protein expression programs of M. pusilla. Similar to a prior study of the marine cyanobacterium Prochlorococcus, we found significant divergence in the mRNA and proteomics expression dynamics in response to the light:dark cycle. Additionally, expressional responses of genes and the proteins they encoded could also be variable within the same metabolic pathway, such as we observed in the oxygenic photosynthesis pathway. A regression framework was used to predict protein levels from both mRNA expression and gene-specific sequence-based features. Several features in the genome sequence were found to influence protein abundance including codon usage as well as 3’ UTR length and structure. Collectively, our studies provide insights into the regulation of the proteome over a diel cycle as well as the relationships between transcriptional and translational programs in the widespread marine green alga Micromonas.
Affiliation(s)
- Peter H. Waltman
- University of California at Santa Cruz, Baskin School of Engineering, Santa Cruz, California, 95064, United States of America
- Jian Guo
- Monterey Bay Aquarium Research Institute, Moss Landing, California, United States of America
- Emily Nahas Reistetter
- Monterey Bay Aquarium Research Institute, Moss Landing, California, United States of America
- Samuel Purvine
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, 99352, United States of America
- Charles K. Ansong
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, 99352, United States of America
- Marijke J. van Baren
- Monterey Bay Aquarium Research Institute, Moss Landing, California, United States of America
- Chee-Hong Wong
- U.S. Department of Energy (DOE) Joint Genome Institute (JGI), Walnut Creek, California, 94598, United States of America
- Chia-Lin Wei
- U.S. Department of Energy (DOE) Joint Genome Institute (JGI), Walnut Creek, California, 94598, United States of America
- Richard D. Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, 99352, United States of America
- Stephen J. Callister
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, 99352, United States of America
- Joshua M. Stuart
- University of California at Santa Cruz, Baskin School of Engineering, Santa Cruz, California, 95064, United States of America
- Alexandra Z. Worden
- Monterey Bay Aquarium Research Institute, Moss Landing, California, United States of America
- University of California Santa Cruz, Department of Ocean Sciences, Santa Cruz, California, 95064, United States of America
- Integrated Microbial Biodiversity Program, Canadian Institute for Advanced Research, Toronto, Canada, M5G 1Z8
11
Lin D, Zhang J, Li J, Xu C, Deng HW, Wang YP. An integrative imputation method based on multi-omics datasets. BMC Bioinformatics 2016; 17:247. [PMID: 27329642 PMCID: PMC4915152 DOI: 10.1186/s12859-016-1122-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 09/30/2015] [Accepted: 06/05/2016] [Indexed: 12/26/2022] Open
Abstract
Background: Integrative analysis of multi-omics data is becoming increasingly important to unravel functional mechanisms of complex diseases. However, the currently available multi-omics datasets inevitably suffer from missing values due to technical limitations and various constraints in experiments. These missing values severely hinder integrative analysis of multi-omics data. Current imputation methods mainly focus on using single omics data while ignoring biological interconnections and information imbedded in multi-omics data sets. Results: In this study, a novel multi-omics imputation method was proposed to integrate multiple correlated omics datasets for improving the imputation accuracy. Our method was designed to: 1) combine the estimates of missing value from individual omics data itself as well as from other omics, and 2) simultaneously impute multiple missing omics datasets by an iterative algorithm. We compared our method with five imputation methods using single omics data at different noise levels, sample sizes and data missing rates. The results demonstrated the advantage and efficiency of our method, consistently in terms of the imputation error and the recovery of mRNA-miRNA network structure. Conclusions: We concluded that our proposed imputation method can utilize more biological information to minimize the imputation error and thus can improve the performance of downstream analysis such as genetic regulatory network construction.
Affiliation(s)
- Dongdong Lin
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
- Jigang Zhang
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA
- Jingyao Li
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
- Chao Xu
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA
- Hong-Wen Deng
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA
- Yu-Ping Wang
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA
12
Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res 2016; 15:1116-25. [DOI: 10.1021/acs.jproteome.5b00981] [Citation(s) in RCA: 286] [Impact Index Per Article: 31.8] [Indexed: 01/21/2023]
Affiliation(s)
- Cosmin Lazar
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Laurent Gatto
- Computational Proteomics Unit, Cambridge CB2 1GA, United Kingdom
- Cambridge Center for Proteomics, Cambridge CB2 1GA, United Kingdom
- Myriam Ferro
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Christophe Bruley
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Thomas Burger
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CNRS, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
13
Korte HL, Fels SR, Christensen GA, Price MN, Kuehl JV, Zane GM, Deutschbauer AM, Arkin AP, Wall JD. Genetic basis for nitrate resistance in Desulfovibrio strains. Front Microbiol 2014; 5:153. [PMID: 24795702 PMCID: PMC4001038 DOI: 10.3389/fmicb.2014.00153] [Citation(s) in RCA: 162] [Impact Index Per Article: 14.7] [Received: 01/27/2014] [Accepted: 03/21/2014] [Indexed: 12/31/2022] Open
Abstract
Nitrate is an inhibitor of sulfate-reducing bacteria (SRB). In petroleum production sites, amendments of nitrate and nitrite are used to prevent SRB production of sulfide that causes souring of oil wells. A better understanding of nitrate stress responses in the model SRB, Desulfovibrio vulgaris Hildenborough and Desulfovibrio alaskensis G20, will strengthen predictions of environmental outcomes of nitrate application. Nitrate inhibition of SRB has historically been considered to result from the generation of small amounts of nitrite, to which SRB are quite sensitive. Here we explored the possibility that nitrate might inhibit SRB by a mechanism other than through nitrite inhibition. We found that nitrate-stressed D. vulgaris cultures grown in lactate-sulfate conditions eventually grew in the presence of high concentrations of nitrate, and their resistance continued through several subcultures. Nitrate consumption was not detected over the course of the experiment, suggesting adaptation to nitrate. With high-throughput genetic approaches employing TnLE-seq for D. vulgaris and a pooled mutant library of D. alaskensis, we determined the fitness of many transposon mutants of both organisms in nitrate stress conditions. We found that several mutants, including homologs present in both strains, had a greatly increased ability to grow in the presence of nitrate but not nitrite. The mutated genes conferring nitrate resistance included the gene encoding the putative Rex transcriptional regulator (DVU0916/Dde_2702), as well as a cluster of genes (DVU0251-DVU0245/Dde_0597-Dde_0605) that is poorly annotated. Follow-up studies with individual D. vulgaris transposon and deletion mutants confirmed high-throughput results. We conclude that, in D. vulgaris and D. alaskensis, nitrate resistance in wild-type cultures is likely conferred by spontaneous mutations. Furthermore, the mechanisms that confer nitrate resistance may be different from those that confer nitrite resistance.
Affiliation(s)
- Hannah L Korte
- Department of Biochemistry, University of Missouri, Columbia, MO, USA; Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA
- Samuel R Fels
- Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA; Department of Molecular Microbiology and Immunology, University of Missouri, Columbia, MO, USA
- Geoff A Christensen
- Department of Biochemistry, University of Missouri, Columbia, MO, USA; Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA
- Morgan N Price
- Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jennifer V Kuehl
- Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Grant M Zane
- Department of Biochemistry, University of Missouri, Columbia, MO, USA; Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA
- Adam M Deutschbauer
- Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Adam P Arkin
- Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Judy D Wall
- Department of Biochemistry, University of Missouri, Columbia, MO, USA; Ecosystems and Networks Integrated with Genes and Molecular Assemblies, Berkeley, CA, USA; Department of Molecular Microbiology and Immunology, University of Missouri, Columbia, MO, USA
|
14
|
Mehdi AM, Patrick R, Bailey TL, Bodén M. Predicting the dynamics of protein abundance. Mol Cell Proteomics 2014; 13:1330-40. [PMID: 24532840 DOI: 10.1074/mcp.m113.033076] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022] Open
Abstract
Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA-protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation efficiency. 
The software and data used in this research are available at http://bioinf.scmb.uq.edu.au/proteinabundance/.
Affiliation(s)
- Ahmed M Mehdi
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, Australia
|
15
|
Gibbs DL, Gralinski L, Baric RS, McWeeney SK. Multi-omic network signatures of disease. Front Genet 2014; 4:309. [PMID: 24432028 PMCID: PMC3882664 DOI: 10.3389/fgene.2013.00309] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 09/20/2013] [Accepted: 12/19/2013] [Indexed: 12/12/2022] Open
Abstract
To better understand dynamic disease processes, integrated multi-omic methods are needed, yet comparing different types of omic data remains difficult. Integrative solutions benefit experimenters by eliminating potential biases that come with single-omic analysis. We have developed the methods needed to explore whether a relationship exists between co-expression network models built from transcriptomic and proteomic data types, and whether this relationship can be used to improve the disease signature discovery process. A naïve, correlation-based method is utilized for comparison. Using publicly available infectious disease time series data, we analyzed the related co-expression structure of the transcriptome and proteome in response to SARS-CoV infection in mice. Transcript and peptide expression data were filtered using quality scores and subset by taking the intersection on mapped Entrez IDs. Using this data set, independent co-expression networks were built. The networks were integrated by constructing a bipartite module graph based on module member overlap, module summary correlation, and correlation to phenotypes of interest. Compared to the module-level results, the naïve approach is hindered by a lack of correlation across data types, less significant enrichment results, and little functional overlap across data types. Our module graph approach avoids these problems, resulting in an integrated omic signature of disease progression, which allows prioritization across data types for downstream experiment planning. Integrated modules exhibited related functional enrichments and could suggest novel interactions in response to infection. These disease- and platform-independent methods can be used to realize the full potential of multi-omic network signatures. The data (experiment SM001) are publicly available through the NIAID Systems Virology (https://www.systemsvirology.org) and PNNL (http://omics.pnl.gov) web portals.
Phenotype data is found in the supplementary information. The ProCoNA package is available as part of Bioconductor 2.13.
Affiliation(s)
- David L Gibbs
- McWeeney Lab, Division of Bioinformatics and Computational Biology, Oregon Health & Science University, Portland, OR, USA
- Lisa Gralinski
- Baric Lab, Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ralph S Baric
- Baric Lab, Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Shannon K McWeeney
- McWeeney Lab, Division of Bioinformatics and Computational Biology, Oregon Health & Science University, Portland, OR, USA; McWeeney Lab, OHSU Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
|
16
|
Yang Y, Wang J, Yuan T, Bu D, Yang J, Sun P. Proteome profile of bovine ruminal epithelial tissue based on GeLC-MS/MS. Biotechnol Lett 2013; 35:1831-8. [PMID: 23974490 DOI: 10.1007/s10529-013-1291-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/12/2013] [Accepted: 06/25/2013] [Indexed: 12/15/2022]
Abstract
The proteome of rumen epithelial tissue was analysed by SDS-PAGE coupled with LC-MS/MS. 813 non-redundant proteins were identified of which 7.4 % featured membrane-spanning domains and 15.4 % harboured a signal peptide. According to the gene ontology annotation, the most abundant proteins exhibited binding activities related to their molecular functions, were proteins of cellular components or belonged to various metabolic processes. A predominant group of canonical pathways in the rumen epithelial tissue was identified using the IPA software. The GeLC-MS/MS approach was used to characterise the entire protein expression repertoire in rumen tissue, providing a more detailed understanding of the important biological processes in the rumen.
Affiliation(s)
- Yongxin Yang
- State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
|
17
|
Ma NL, Rahmat Z, Lam SS. A review of the "Omics" approach to biomarkers of oxidative stress in Oryza sativa. Int J Mol Sci 2013; 14:7515-41. [PMID: 23567269 PMCID: PMC3645701 DOI: 10.3390/ijms14047515] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 01/31/2013] [Revised: 03/20/2013] [Accepted: 03/20/2013] [Indexed: 12/27/2022] Open
Abstract
Physiological and ecological constraints that cause the slow growth and depleted production of crops have raised a major concern in the agriculture industry as they represent a possible threat of short food supply in the future. The key feature that regulates the stress signaling pathway is always related to the reactive oxygen species (ROS). The accumulation of ROS in plant cells would leave traces of biomarkers at the genome, proteome, and metabolome levels, which could be identified with the recent technological breakthrough coupled with improved performance of bioinformatics. This review highlights the recent breakthrough in molecular strategies (comprising transcriptomics, proteomics, and metabolomics) in identifying oxidative stress biomarkers and the arising opportunities and obstacles observed in research on biomarkers in rice. The major issue in incorporating bioinformatics to validate the biomarkers from different omic platforms for the use of rice-breeding programs is also discussed. The development of powerful techniques for identification of oxidative stress-related biomarkers and the integration of data from different disciplines shed light on the oxidative response pathways in plants.
Affiliation(s)
- Nyuk Ling Ma
- Department of Biology, Faculty of Science and Technology, University Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia
- Zaidah Rahmat
- Department of Biotechnology and Medical Engineering, Faculty of Biosciences and Medical Engineering, University Technology Malaysia, 81310 Johor Bahru, Johor, Malaysia
- Su Shiung Lam
- Department of Engineering Science, Faculty of Science and Technology, University Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia
|
18
|
Garcia-Manteiga JM. Data Analysis and Interpretation in Metabolomics. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
Metabolomics represents the new ‘omics’ approach of the functional genomics era. It consists in the identification and quantification of all small molecules, namely metabolites, in a given biological system. While metabolomics refers to the analysis of any possible biological system, metabonomics is specifically applied to disease and physiopathological situations. The data collected within these approaches is highly integrative of the other higher levels and is hence amenable to be explored with a top-down systems biology point of view. The aim of this chapter is to give a global view of the state of the art in metabolomics describing the two analytical techniques usually used to give rise to this kind of data, nuclear magnetic resonance, NMR, and mass spectrometry. In addition, the author will focus on the different data analysis tools that can be applied to such studies to extract information with special interest at the attempts to integrate metabolomics with other ‘omics’ approaches and its relevance in systems biology modeling.
|
19
|
Haider S, Pal R. Integrated analysis of transcriptomic and proteomic data. Curr Genomics 2013; 14:91-110. [PMID: 24082820 PMCID: PMC3637682 DOI: 10.2174/1389202911314020003] [Citation(s) in RCA: 285] [Impact Index Per Article: 23.8] [Received: 09/10/2012] [Revised: 01/09/2013] [Accepted: 01/22/2013] [Indexed: 12/14/2022] Open
Abstract
Until recently, understanding the regulatory behavior of cells has been pursued through independent analysis of the transcriptome or the proteome. Based on the central dogma, it was generally assumed that there exists a direct correspondence between mRNA transcript levels and protein expression. However, recent studies have shown that the correlation between mRNA and protein expression can be low due to various factors such as different half-lives and post-transcriptional machinery. Thus, a joint analysis of the transcriptomic and proteomic data can provide useful insights that may not be deciphered from individual analysis of mRNA or protein expression. This article reviews the existing major approaches for joint analysis of transcriptomic and proteomic data. We categorize the different approaches into eight main categories based on the initial algorithm and final analysis goal. We further present analogies with other domains and discuss the existing research problems in this area.
Affiliation(s)
- Ranadip Pal
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
|
20
|
A practical data processing workflow for multi-OMICS projects. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013; 1844:52-62. [PMID: 23501674 DOI: 10.1016/j.bbapap.2013.02.029] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 12/03/2012] [Revised: 02/15/2013] [Accepted: 02/20/2013] [Indexed: 12/11/2022]
Abstract
Multi-OMICS approaches aim at the integration of quantitative data obtained for different biological molecules in order to understand their interrelation and the functioning of larger systems. This paper deals with several data integration and data processing issues that frequently occur within this context. To this end, the data processing workflow within the PROFILE project is presented, a multi-OMICS project that aims at the identification of novel biomarkers and the development of new therapeutic targets for seven important liver diseases. Furthermore, a software tool called CrossPlatformCommander is sketched, which facilitates several steps of the proposed workflow in a semi-automatic manner. Application of the software is presented for the detection of novel biomarkers, their ranking and annotation with existing knowledge using the example of corresponding Transcriptomics and Proteomics data sets obtained from patients suffering from hepatocellular carcinoma. Additionally, a linear regression analysis of Transcriptomics vs. Proteomics data is presented and its performance assessed. It was shown that, for capturing profound relations between Transcriptomics and Proteomics data, a simple linear regression analysis is not sufficient, and that implementation and evaluation of alternative statistical approaches are needed. Additionally, the integration of multivariate variable selection and classification approaches is intended for further development of the software. Although this paper focuses only on the combination of data obtained from quantitative Proteomics and Transcriptomics experiments, several approaches and data integration steps are also applicable for other OMICS technologies. Keeping specific restrictions in mind, the suggested workflow (or at least parts of it) may be used as a template for similar projects that make use of different high-throughput techniques.
This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
|
21
|
Torres-García W, Ashili S, Kelbauskas L, Johnson RH, Zhang W, Runger GC, Meldrum DR. A statistical framework for multiparameter analysis at the single-cell level. MOLECULAR BIOSYSTEMS 2012; 8:804-17. [DOI: 10.1039/c2mb05429a] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
|
22
|
Tsiliki G, Kossida S. Fusion methodologies for biomedical data. J Proteomics 2011; 74:2774-85. [PMID: 21767675 DOI: 10.1016/j.jprot.2011.07.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 03/23/2011] [Revised: 06/13/2011] [Accepted: 07/01/2011] [Indexed: 12/12/2022]
Abstract
Data fusion methods are powerful tools for integrating the different views of an organism provided by various types of experimental data. We describe various methodologies for integrating and drawing inferences from a collection of biomedical data, primarily focusing on protein and gene expression data. Computational experiments performed using biomedical data, including known protein-protein interactions, hydropathy profiles, gene expression data and amino acid sequences, demonstrate the utility of this approach. Overall, studies agree that methodologies using carefully selected data of various types to predict particular classes, groups and interactions perform better than when applied to a single type of data.
Affiliation(s)
- Georgia Tsiliki
- Bioinformatics and Medical Informatics Group, Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 115 27, Athens, Greece
|
23
|
Keller KL, Wall JD. Genetics and molecular biology of the electron flow for sulfate respiration in Desulfovibrio. Front Microbiol 2011; 2:135. [PMID: 21747813 PMCID: PMC3129016 DOI: 10.3389/fmicb.2011.00135] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 01/22/2011] [Accepted: 06/10/2011] [Indexed: 11/25/2022] Open
Abstract
Progress in the genetic manipulation of the Desulfovibrio strains has provided an opportunity to explore electron flow pathways during sulfate respiration. Most bacteria in this genus couple the oxidation of organic acids or ethanol with the reduction of sulfate, sulfite, or thiosulfate. Fermentation of pyruvate in the absence of an alternative terminal electron acceptor, disproportionation of fumarate, and growth on H2 with CO2 during sulfate reduction are exhibited by some strains. The ability to produce or consume H2 provides Desulfovibrio strains with the capacity to participate as either partner in interspecies H2 transfer. Interestingly, the mechanisms of energy conversion, pathways of electron flow and the parameters determining the pathways used remain to be elucidated. Recent application of molecular genetic tools for the exploration of the metabolism of Desulfovibrio vulgaris Hildenborough has provided several new datasets that might provide insights and constraints to the electron flow pathways. These datasets include (1) gene expression changes measured in microarrays for cells cultured with different electron donors and acceptors, (2) relative mRNA abundances for cells growing exponentially in defined medium with lactate as carbon source and electron donor plus sulfate as terminal electron acceptor, and (3) a random transposon mutant library selected on medium containing lactate plus sulfate supplemented with yeast extract. Studies of directed mutations eliminating apparent key components, the quinone-interacting membrane-bound oxidoreductase (Qmo) complex, the Type 1 tetraheme cytochrome c3 (Tp1-c3), or the Type 1 cytochrome c3:menaquinone oxidoreductase (Qrc) complex, suggest a greater flexibility in electron flow than previously considered. The new datasets revealed the absence of random transposons in the genes encoding an enzyme with homology to Coo membrane-bound hydrogenase.
From this result, we infer that Coo hydrogenase plays an important role in D. vulgaris growth on lactate plus sulfate. These observations along with those reported previously have been combined in a model showing dual pathways of electrons from the oxidation of both lactate and pyruvate during sulfate respiration. Continuing genetic and biochemical analyses of key genes in Desulfovibrio strains will allow further clarification of a general model for sulfate respiration.
Affiliation(s)
- Kimberly L Keller
- Department of Biochemistry, University of Missouri, Columbia, MO, USA
|
24
|
Prediction and Characterization of Missing Proteomic Data in Desulfovibrio vulgaris. Comp Funct Genomics 2011; 2011:780973. [PMID: 21687592 PMCID: PMC3114432 DOI: 10.1155/2011/780973] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 10/25/2010] [Revised: 12/17/2010] [Accepted: 03/01/2011] [Indexed: 11/17/2022] Open
Abstract
Proteomic datasets are often incomplete due to identification range and sensitivity issues. It becomes important to develop methodologies to estimate missing proteomic data, allowing better interpretation of proteomic datasets and metabolic mechanisms underlying complex biological systems. In this study, we applied an artificial neural network to approximate the relationships between cognate transcriptomic and proteomic datasets of Desulfovibrio vulgaris, and to predict protein abundance for the proteins not experimentally detected, based on several relevant predictors, such as mRNA abundance, cellular role and triple codon counts. The results showed that the coefficients of determination for the trained neural network models ranged from 0.47 to 0.68, providing better modeling than several previous regression models. The validity of the trained neural network model was evaluated using biological information (i.e. operons). To seek understanding of mechanisms causing missing proteomic data, we used a multivariate logistic regression analysis, and the result suggested that some key factors, such as protein instability index, aliphatic index, mRNA abundance, effective number of codons (N(c)) and codon adaptation index (CAI) values, may influence whether a given expressed protein can be detected. In addition, we demonstrated that biological interpretation can be improved by use of imputed proteomic datasets.
|
25
|
Torres-García W, Brown SD, Johnson RH, Zhang W, Runger GC, Meldrum DR. Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets. MOLECULAR BIOSYSTEMS 2011; 7:1093-104. [PMID: 21212895 DOI: 10.1039/c0mb00260g] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
Despite significant improvements in recent years, proteomic datasets currently available still suffer from a large number of missing values. Integrative analyses based upon incomplete proteomic and transcriptomic datasets could seriously bias the biological interpretation. In this study, we applied a non-linear data-driven stochastic gradient boosted trees (GBT) model to impute missing proteomic values using a temporal transcriptomic and proteomic dataset of Shewanella oneidensis. In this dataset, gene expression was measured after the cells were exposed to 1 mM potassium chromate for 5, 30, 60, and 90 min, while protein abundance was measured for 45 and 90 min. With the ultimate objective of imputing protein values for experimentally undetected samples at 45 and 90 min, we applied a serial set of algorithms to capture relationships between temporal gene and protein expression. This work follows four main steps: (1) a quality control step for gene expression reliability, (2) mRNA imputation, (3) protein prediction, and (4) validation. Initially, an S control chart approach is performed on gene expression replicates to remove unwanted variability. Then, we focused on the missing measurements of gene expression through nonlinear smoothing-spline curve fitting. This method identifies temporal relationships among transcriptomic data at different time points and enables imputation of mRNA abundance at 45 min. After mRNA imputation was validated by biological constraints (i.e. operons), we used a data-driven GBT model to impute protein abundance for the proteins experimentally undetected in the 45 and 90 min samples, based on relevant predictors such as temporal mRNA gene expression data and cellular functional roles.
The imputed protein values were validated using biological constraints such as operon and pathway information through a permutation test to investigate whether dispersion measures are indeed smaller for known biological groups than for any set of random genes. Finally, we demonstrated that such missing value imputation improved characterization of the temporal response of S. oneidensis to chromate.
Affiliation(s)
- Wandaliz Torres-García
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287-5906, USA
|
26
|
Aittokallio T. Dealing with missing values in large-scale studies: microarray data imputation and beyond. Brief Bioinform 2009; 11:253-64. [DOI: 10.1093/bib/bbp059] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.8] [Indexed: 11/13/2022] Open
|
27
|
Zhang W, Li F, Nie L. Integrating multiple 'omics' analysis for microbial biology: application and methodologies. MICROBIOLOGY-SGM 2009; 156:287-301. [PMID: 19910409 DOI: 10.1099/mic.0.034793-0] [Citation(s) in RCA: 285] [Impact Index Per Article: 17.8] [Indexed: 01/08/2023]
Abstract
Recent advances in various 'omics' technologies enable quantitative monitoring of the abundance of various biological molecules in a high-throughput manner, and thus allow determination of their variation between different biological states on a genomic scale. Several popular 'omics' platforms that have been used in microbial systems biology include transcriptomics, which measures mRNA transcript levels; proteomics, which quantifies protein abundance; metabolomics, which determines abundance of small cellular metabolites; interactomics, which resolves the whole set of molecular interactions in cells; and fluxomics, which establishes dynamic changes of molecules within a cell over time. However, no single 'omics' analysis can fully unravel the complexities of fundamental microbial biology. Therefore, integration of multiple layers of information, the multi-'omics' approach, is required to acquire a precise picture of living micro-organisms. In spite of this being a challenging task, some attempts have been made recently to integrate heterogeneous 'omics' datasets in various microbial systems and the results have demonstrated that the multi-'omics' approach is a powerful tool for understanding the functional principles and dynamics of total cellular systems. This article reviews some basic concepts of various experimental 'omics' approaches, recent application of the integrated 'omics' for exploring metabolic and regulatory mechanisms in microbes, and advances in computational and statistical methodologies associated with integrated 'omics' analyses. Online databases and bioinformatic infrastructure available for integrated 'omics' analyses are also briefly discussed.
Affiliation(s)
- Weiwen Zhang
- Center for Ecogenomics, Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501, USA
- Feng Li
- Division of Biometrics II, Office of Biometrics/OTS/CDER/FDA, Silver Spring, MD 20993-0002, USA
- Lei Nie
- Division of Biometrics IV, Office of Biometrics/OTS/CDER/FDA, Silver Spring, MD 20993-0002, USA
|