1
Pawłowski PH, Zielenkiewicz P. Predicting the S. cerevisiae Gene Expression Score by a Machine Learning Classifier. Life (Basel) 2025; 15:723. [PMID: 40430151 PMCID: PMC12113619 DOI: 10.3390/life15050723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2025] [Revised: 04/27/2025] [Accepted: 04/28/2025] [Indexed: 05/29/2025]
Abstract
The topic of this work is gene expression and its score, analyzed globally as a function of various factors using machine learning techniques. The expression score (ES) of a gene characterizes its activity and, thus, its importance for cellular processes. ES may depend on many different factors (attributes). A machine learning classifier (random forest) was selected, trained, and optimized on the Waikato Environment for Knowledge Analysis (WEKA) platform, yielding the most accurate attribute-dependent prediction of the ES of Saccharomyces cerevisiae genes. To this end, data from the Saccharomyces Genome Database (SGD), presenting ES values corresponding to a wide spectrum of attributes, were used, revised, classified, and balanced, and the significance of the considered attributes was evaluated. The resulting random forest model indicates the most important attributes determining the classes of low, moderate, and high ES. These cover both the experimental conditions and the genetic, physical, statistical, and logistic features. During validation, the model correctly classified 84.1% of the instances of a previously unseen test set.
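WEKA's random forest itself is not reproduced here. The following is a minimal Python sketch of the surrounding steps the abstract names: discretizing ES into low, moderate, and high classes; balancing the classes; and scoring a test set as the percentage of correctly classified instances. The cut-off values are hypothetical parameters. Random undersampling is only one common balancing technique and is assumed here, not taken from the paper.

```python
import random
from collections import defaultdict


def es_class(score, low_cut, high_cut):
    """Map a numeric expression score (ES) to a class label.

    low_cut and high_cut are hypothetical thresholds, not values from the paper.
    """
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "moderate"
    return "high"


def undersample(instances, seed=0):
    """Balance classes by randomly keeping only as many instances per class
    as the smallest class has. `instances` is a list of (attributes, label)."""
    by_label = defaultdict(list)
    for attrs, label in instances:
        by_label[label].append((attrs, label))
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def percent_correct(predicted, actual):
    """Percentage of test instances whose predicted class matches the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)
```

For example, `percent_correct(["low", "high"], ["low", "moderate"])` returns `50.0`. Undersampling discards data from the larger classes. Oversampling the smaller classes is the usual alternative when data are scarce.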
Affiliation(s)
- Piotr H. Pawłowski
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-093 Warsaw, Poland;
- Piotr Zielenkiewicz
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-093 Warsaw, Poland;
- Laboratory of Systems Biology, Institute of Experimental Plant Biology and Biotechnology, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland
2
Adeniji AA, Chukwuneme CF, Conceição EC, Ayangbenro AS, Wilkinson E, Maasdorp E, de Oliveira T, Babalola OO. Unveiling novel features and phylogenomic assessment of indigenous Priestia megaterium AB-S79 using comparative genomics. Microbiol Spectr 2025; 13:e0146624. [PMID: 39969228 PMCID: PMC11960082 DOI: 10.1128/spectrum.01466-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 12/12/2024] [Indexed: 02/20/2025]
Abstract
Priestia megaterium strain AB-S79, isolated from active gold mine soil, previously showed in vitro heavy metal resistance and has a 5.7 Mb genome useful for biotechnological exploitation. This study used web-based bioinformatic resources to analyze the genomic relatedness of P. megaterium AB-S79, decipher its secondary metabolite biosynthetic gene clusters (BGCs), and clarify its taxonomy. Genes were highly conserved across the 14 P. megaterium genomes examined here. The pangenome comprised a total of 61,397 protein-coding genes, 59,745 homolog protein family hits, and 1,652 singleton protein family hits. There were also 7,735 protein families, including 1,653 singleton families and 6,082 homolog families. OrthoVenn3 comparison of AB-S79 protein sequences with 13 other P. megaterium strains, 7 other Priestia spp., and 6 other Bacillus spp. highlighted AB-S79's unique genomic and evolutionary traits. antiSMASH identified two key transcription factor binding site regulators in AB-S79's genome, the zinc-responsive repressor (Zur) and the antibiotic production activator (AbrC3), plus putative enzymes for the biosynthesis of terpenes and ranthipeptides. AB-S79 also harbors BGCs for two unique siderophores (synechobactins and schizokinens), phosphonate, a dienelactone hydrolase family protein, and the phenazine biosynthesis protein (phzF), which is significant for this study. Phosphonate in particular showed specificity for P. megaterium, validating the effect of gene family expansion and contraction. P. megaterium AB-S79 appears to be a viable source of value-added compounds. Thus, this study contributes to the theoretical framework for the systematic metabolic and genetic exploitation of P. megaterium, particularly its value-yielding strains.
IMPORTANCE This study explores microbial natural product discovery using genome mining, focusing on Priestia megaterium. Key findings highlight the potential of P. megaterium, particularly strain AB-S79, for biotechnological applications. The research shows a limited output of P. megaterium genome sequences from Africa, emphasizing the importance of the native strain AB-S79. Additionally, the study underlines the strain's diverse metabolic capabilities, reinforcing its suitability as a model for microbial cell factories and its foundational role in future biotechnological exploitation.
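The pangenome counts reported above are internally consistent. The protein-coding gene total equals the homolog hits plus the singleton hits, and the protein family total equals the homolog families plus the singleton families. The short check below uses only the numbers stated in the abstract:

```python
# Pangenome counts for the 14 P. megaterium genomes, as reported in the abstract.
homolog_hits, singleton_hits = 59_745, 1_652
homolog_families, singleton_families = 6_082, 1_653

total_genes = homolog_hits + singleton_hits
total_families = homolog_families + singleton_families

print(f"protein-coding genes: {total_genes:,}")
print(f"protein families:     {total_families:,}")
```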
Affiliation(s)
- Adetomiwa Ayodele Adeniji
- Centre for Epidemic Response & Innovation, School of Data Science & Computational Thinking, Stellenbosch University, Cape Town, South Africa
- Food Security & Safety Focus Area, Faculty of Natural & Agricultural Sciences, North-West University, Mmabatho, South Africa
- Chinenyenwa Fortune Chukwuneme
- Department of Natural Sciences, Faculty of Applied & Computer Sciences, Vaal University of Technology, Vanderbijlpark, South Africa
- Emilyn Costa Conceição
- SAMRC Centre for Tuberculosis Research, Division of Molecular Biology & Human Genetics, Faculty of Medicine & Health Sciences, Stellenbosch University, Cape Town, South Africa
- Ayansina Segun Ayangbenro
- Food Security & Safety Focus Area, Faculty of Natural & Agricultural Sciences, North-West University, Mmabatho, South Africa
- Eduan Wilkinson
- Centre for Epidemic Response & Innovation, School of Data Science & Computational Thinking, Stellenbosch University, Cape Town, South Africa
- Elizna Maasdorp
- SAMRC Centre for Tuberculosis Research, Division of Immunology, Faculty of Medicine & Health Sciences, Stellenbosch University, Cape Town, South Africa
- Tulio de Oliveira
- Centre for Epidemic Response & Innovation, School of Data Science & Computational Thinking, Stellenbosch University, Cape Town, South Africa
- Olubukola Oluranti Babalola
- Food Security & Safety Focus Area, Faculty of Natural & Agricultural Sciences, North-West University, Mmabatho, South Africa
- Department of Life Sciences, Faculty of Natural Sciences, Imperial College, Berkshire, United Kingdom
3
Ferreira MADM, Silveira WBD, Nikoloski Z. Protein constraints in genome-scale metabolic models: Data integration, parameter estimation, and prediction of metabolic phenotypes. Biotechnol Bioeng 2024; 121:915-930. [PMID: 38178617 DOI: 10.1002/bit.28650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 10/24/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024]
Abstract
Genome-scale metabolic models provide a valuable resource to study metabolism and cell physiology. These models are employed with approaches from the constraint-based modeling framework to predict metabolic and physiological phenotypes. The prediction performance of genome-scale metabolic models can be improved by including protein constraints. The resulting protein-constrained models consider data on turnover numbers (kcat) and facilitate the integration of protein abundances. In this systematic review, we present and discuss the current state of the art regarding the estimation of kinetic parameters used in protein-constrained models. We also highlight how data-driven and constraint-based approaches can aid the estimation of turnover numbers and their use in improving predictions of cellular phenotypes. Finally, we identify standing challenges in protein-constrained metabolic models and provide a perspective on future approaches to improve predictive performance.
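Enzyme-constrained models commonly use the following formulation (for example, GECKO-style models). Each flux v_j is limited by k_cat,j times the amount of its enzyme, and the summed enzyme mass cannot exceed a measured protein pool. The review surveys several variants. The sketch below assumes only this simplest pooled form, and its function names and units are illustrative rather than taken from the review.

```python
def enzyme_mass(flux, kcat, mol_weight):
    """Minimum enzyme mass (g/gDW) needed to carry `flux`.

    Units assumed: flux in mmol/gDW/h, kcat in 1/h, mol_weight in g/mmol.
    flux / kcat gives mmol enzyme per gDW; multiplying by mol_weight gives g/gDW.
    """
    if kcat <= 0:
        raise ValueError("kcat must be positive")
    return flux / kcat * mol_weight


def within_protein_pool(fluxes, kcats, mol_weights, pool):
    """True if the summed enzyme mass for all reactions fits within `pool` (g/gDW)."""
    used = sum(enzyme_mass(v, k, mw) for v, k, mw in zip(fluxes, kcats, mol_weights))
    return used <= pool
```

In an actual model this is imposed as a linear constraint inside the flux-balance optimization (with reversible reactions split so that fluxes are non-negative), not checked afterwards. The sketch only shows how kcat values turn fluxes into enzyme demand.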
Affiliation(s)
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
- Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
4
Moura Ferreira MAD, Wendering P, Arend M, Batista da Silveira W, Nikoloski Z. Accurate prediction of in vivo protein abundances by coupling constraint-based modelling and machine learning. Metab Eng 2023; 80:184-192. [PMID: 37802292 DOI: 10.1016/j.ymben.2023.09.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/28/2023] [Revised: 09/10/2023] [Accepted: 09/25/2023] [Indexed: 10/08/2023]
Abstract
Quantification of how different environmental cues affect protein allocation can provide important insights for understanding cell physiology. While absolute quantification of proteins can be obtained by resource-intensive mass-spectrometry-based technologies, prediction of protein abundances offers another way to gain insights into protein allocation. Here we present CAMEL, a framework that couples constraint-based modelling with machine learning to predict protein abundance for any environmental condition. This is achieved by building machine learning models that leverage static features, derived from protein sequences, and condition-dependent features predicted from protein-constrained metabolic models. Our findings demonstrate that CAMEL yields excellent predictions of protein allocation in E. coli (average Pearson correlation of at least 0.9) and moderate performance in S. cerevisiae (average Pearson correlation of at least 0.5). CAMEL outperformed competing approaches without using molecular read-outs from unseen conditions and provides a valuable tool for using protein allocation in biotechnological applications.
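CAMEL's performance is summarized as an average Pearson correlation between predicted and measured abundances. As a reminder of what that metric computes, here is a minimal standard-library sketch. The abundance values are invented for illustration, and CAMEL's own averaging scheme across conditions is not reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical log-scale abundances for five proteins in one condition.
measured = [2.1, 3.4, 1.0, 4.8, 2.9]
predicted = [2.0, 3.0, 1.5, 4.5, 3.1]
print(round(pearson(measured, predicted), 3))
```

Pearson correlation rewards getting the relative ordering and linear trend right, not the absolute scale. Abundance data are therefore usually log-transformed before it is computed.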
Affiliation(s)
- Philipp Wendering
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
- Marius Arend
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany.
5
Moreira-Ramos S, Arias L, Flores R, Katz A, Levicán G, Orellana O. Synonymous mutations in the phosphoglycerate kinase 1 gene induce an altered response to protein misfolding in Schizosaccharomyces pombe. Front Microbiol 2023; 13:1074741. [PMID: 36713198 PMCID: PMC9875302 DOI: 10.3389/fmicb.2022.1074741] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/19/2022] [Accepted: 12/20/2022] [Indexed: 01/13/2023]
Abstract
Background: Proteostasis refers to the processes that regulate the biogenesis, folding, trafficking, and degradation of proteins. Any alteration in these processes can lead to cell malfunction. Protein synthesis, a key proteostatic process, is highly regulated at multiple levels to ensure adequate adaptation to environmental and physiological challenges such as stressors, proteotoxic conditions, and aging, among other factors. Because alterations in protein translation can lead to protein misfolding, examining how translation is regulated may also help to elucidate in part how proteostasis is controlled. Codon usage bias has been implicated in the fine-tuning of translation rate, as more frequent codons might be read faster than their less frequent counterparts. Thus, alterations in codon usage due to synonymous mutations may alter translation kinetics and thereby affect the folding of the nascent polypeptide without altering its primary structure. To date, it has been difficult to predict the effect of synonymous mutations on protein folding and cellular fitness due to a scarcity of relevant data. The purpose of this work was therefore to assess the effect of synonymous mutations in discrete regions of the gene that encodes the highly expressed enzyme 3-phosphoglycerate kinase 1 (pgk1) in the fission yeast Schizosaccharomyces pombe. Results: By systematically replacing synonymous codons along pgk1, we found slightly altered protein folding and activity in a region-specific manner. However, alterations in protein aggregation and heat stress, as well as changes in proteasome activity, occurred independently of the mutated region. Concomitantly, reduced mRNA levels of the chaperones Hsp9 and Hsp16 were observed. Conclusion: Taken together, these data suggest that the codon usage bias of the gene encoding this highly expressed protein is an important regulator of protein function and proteostasis.
Affiliation(s)
- Sandra Moreira-Ramos
- Programa de Biología Celular y Molecular, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Loreto Arias
- Programa de Biología Celular y Molecular, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Rodrigo Flores
- Programa de Biología Celular y Molecular, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Assaf Katz
- Programa de Biología Celular y Molecular, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Gloria Levicán
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile (USACH), Santiago, Chile
- Omar Orellana
- Programa de Biología Celular y Molecular, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile
6
Korenskaia AE, Matushkin YG, Lashin SA, Klimenko AI. Bioinformatic Assessment of Factors Affecting the Correlation between Protein Abundance and Elongation Efficiency in Prokaryotes. Int J Mol Sci 2022; 23:11996. [PMID: 36233299 PMCID: PMC9570070 DOI: 10.3390/ijms231911996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 09/23/2022] [Accepted: 09/30/2022] [Indexed: 11/07/2022]
Abstract
Protein abundance is crucial for most genetically regulated cell functions in prokaryotic organisms to act properly. Therefore, developing bioinformatic methods for assessing the efficiency of different stages of gene expression is of great importance for predicting actual protein abundance. One of these steps is the evaluation of translation elongation efficiency based on mRNA sequence features, such as codon usage bias and mRNA secondary structure properties. In this study, we evaluated correlation coefficients between experimentally measured protein abundance and predicted elongation efficiency characteristics for 26 prokaryotes, including non-model organisms, belonging to diverse taxonomic groups. The algorithm for assessing elongation efficiency takes into account not only codon bias but also the number and energy of secondary structures in mRNA, if these demonstrate an impact on the predicted elongation efficiency of the ribosomal protein genes. The results show that, for a number of organisms, secondary structures are a better predictor of protein abundance than codon usage bias. The bioinformatic analysis revealed several factors associated with the value of the correlation coefficient. The first factor is the elongation efficiency optimization type: organisms whose genomes are optimized for codon usage only have significantly higher correlation coefficients. The second factor is taxonomic identity: bacteria belonging to the class Bacilli tend to have higher correlation coefficients within the analyzed set. The third factor is growth rate, which was higher for organisms with higher correlation coefficients between protein abundance and predicted translation elongation efficiency. The obtained results can be useful for further improving methods of protein abundance prediction.
Affiliation(s)
- Aleksandra E. Korenskaia
- Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
- Yury G. Matushkin
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
- Sergey A. Lashin
- Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
- Alexandra I. Klimenko
- Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
7
Ferreira M, Ventorim R, Almeida E, Silveira S, Silveira W. Protein Abundance Prediction Through Machine Learning Methods. J Mol Biol 2021; 433:167267. [PMID: 34563548 DOI: 10.1016/j.jmb.2021.167267] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/16/2021] [Revised: 09/09/2021] [Accepted: 09/17/2021] [Indexed: 10/20/2022]
Abstract
Proteins are responsible for most physiological processes, and their abundance provides crucial information for systems biology research. However, absolute protein quantification, as determined by mass spectrometry, still has limitations in capturing the protein pool. Protein abundance is impacted by translation kinetics, which rely on features of codons. In this study, we evaluated the effect of codon usage bias of genes on protein abundance. Notably, we observed differences regarding codon usage patterns between genes coding for highly abundant proteins and genes coding for less abundant proteins. Analysis of synonymous codon usage and evolutionary selection showed a clear split between the two groups. Our machine learning models predicted protein abundances from codon usage metrics with remarkable accuracy, achieving strong correlation with experimental data. Upon integration of the predicted protein abundance in enzyme-constrained genome-scale metabolic models, the simulated phenotypes closely matched experimental data, which demonstrates that our predictive models are valuable tools for systems metabolic engineering approaches.
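The abstract does not list the exact codon usage metrics used. One widely used metric is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so 1.0 means no bias. The sketch below assumes RSCU as an illustrative example and, for brevity, covers only a small hypothetical subset of the standard genetic code.

```python
from collections import Counter

# Hypothetical subset of the standard genetic code: synonymous codon families.
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(cds):
    """Relative synonymous codon usage for the codons listed in SYNONYMS.

    RSCU = observed count / (family total / family size).
    Values above 1.0 mark codons used more often than uniform usage predicts.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values
```

For example, `rscu("AAAAAAAAG")` gives about 1.33 for AAA and 0.67 for AAG, because two of the three lysine codons are AAA. Per-gene values like these can then serve as features for a regression model.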
Affiliation(s)
- Mauricio Ferreira
- Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil. https://twitter.com/@mauriciomyces
- Rafaela Ventorim
- Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil.
- Eduardo Almeida
- Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil. https://twitter.com/@elm_almeida
- Sabrina Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil. https://twitter.com/@sabrina_as
- Wendel Silveira
- Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil.
8
Dai X, Xu F, Wang S, Mundra PA, Zheng J. PIKE-R2P: Protein-protein interaction network-based knowledge embedding with graph neural network for single-cell RNA to protein prediction. BMC Bioinformatics 2021; 22:139. [PMID: 34078261 PMCID: PMC8170782 DOI: 10.1186/s12859-021-04022-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/08/2021] [Accepted: 02/11/2021] [Indexed: 12/05/2022]
Abstract
Background: Recent advances in the simultaneous measurement of RNA and protein abundances at the single-cell level provide a unique opportunity to predict protein abundance from scRNA-seq data using machine learning models. However, existing machine learning methods have not sufficiently considered the relationships among proteins. Results: We formulate this task in a multi-label prediction framework where multiple proteins are linked to each other at the single-cell level. We then propose a novel method for single-cell RNA to protein prediction, named PIKE-R2P, which incorporates protein–protein interactions (PPI) and prior knowledge embedding into a graph neural network. Compared with existing methods, PIKE-R2P significantly improved prediction performance, with smaller errors and higher correlations with the gold standard measurements. Conclusion: The superior performance of PIKE-R2P indicates that adding prior knowledge of PPI to graph neural networks can be a powerful strategy for cross-modality prediction of protein abundances at the single-cell level.
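Graph neural networks propagate information along edges, here PPI links, so that each protein's representation is informed by its interaction partners. The sketch below shows only a generic, unweighted mean-aggregation step of the kind such layers build on, over a hypothetical PPI graph. PIKE-R2P's actual layers, learned weights, and knowledge embeddings are not reproduced.

```python
def aggregate(features, edges):
    """One round of mean neighbor aggregation on an undirected graph.

    features: dict mapping node -> list[float] (all vectors the same length).
    edges: iterable of (u, v) pairs; every endpoint must appear in `features`.
    Each node's new vector is the mean of its own vector and its neighbors'.
    """
    neighbors = {node: {node} for node in features}  # include a self-loop
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    out = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        out[node] = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
    return out
```

A trainable layer would multiply the aggregated vectors by a learned weight matrix and apply a nonlinearity. Stacking several such rounds lets information from proteins more than one interaction apart reach each node.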
Affiliation(s)
- Xinnan Dai
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, Shanghai, 201210, China
- Fan Xu
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, Shanghai, 201210, China
- Shike Wang
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, Shanghai, 201210, China
- Piyushkumar A Mundra
- Molecular Oncology Group, Cancer Research UK Manchester Institute, The University of Manchester, Alderley Park, Manchester, UK
- Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, Shanghai, 201210, China.
9
Tsang O, Wong JWH. Proteogenomic interrogation of cancer cell lines: an overview of the field. Expert Rev Proteomics 2021; 18:221-232. [PMID: 33877947 DOI: 10.1080/14789450.2021.1914594] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
Introduction: Cancer cell lines (CCLs) have been a major resource for cancer research. Over the past couple of decades, they have been instrumental in omic profiling method development and as model systems for generating new knowledge in cell and cancer biology. More recently, with the increasing amount of genomic, transcriptomic, and proteomic data being generated in hundreds of CCLs, there is growing potential for integrative proteogenomic data analyses. Areas covered: In this review, we first describe the most commonly used proteome profiling methods in CCLs. We then discuss how these proteomics data can be integrated with genomics data for proteogenomics analyses. Finally, we highlight some of the recent biological discoveries that have arisen from proteogenomics analyses of CCLs. Expert opinion: Proteogenomics analyses of CCLs have so far enabled the discovery of novel proteins and proteoforms. They have also improved our understanding of biological processes, including post-transcriptional regulation of protein abundance and the presentation of antigens by major histocompatibility complex alleles. With proteomics data expected to be generated in hundreds to thousands of CCLs in the coming years, there will be further potential for large-scale proteogenomics analyses and data integration with the phenotypically well-characterized CCLs.
Affiliation(s)
- Olson Tsang
- Centre for PanorOmic Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Jason W H Wong
- Centre for PanorOmic Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
- School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
10
Deng L, Handler DCL, Multari DH, Haynes PA. Comparison of protein and peptide fractionation approaches in protein identification and quantification from Saccharomyces cerevisiae. J Chromatogr B Analyt Technol Biomed Life Sci 2021; 1162:122453. [PMID: 33279813 DOI: 10.1016/j.jchromb.2020.122453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2020] [Revised: 11/09/2020] [Accepted: 11/12/2020] [Indexed: 11/29/2022]
Abstract
Shotgun proteomics is a high-throughput technology developed with the aim of investigating as many proteins as possible in cells in a given experiment. However, the depth and coverage of protein discovery and data generation vary with the technical strategy selected. In this study, three different sample preparation approaches, combined with peptide or protein fractionation methods, were applied to identify and quantify proteins from log-phase yeast lysate: sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), filter-aided sample preparation coupled with gas phase fractionation (FASP-GPF), and FASP with high pH reversed-phase fractionation (FASP-HpH). Fractions were initially analyzed and compared using nanoflow liquid chromatography-tandem mass spectrometry (nanoLC-MS/MS) employing data-dependent acquisition on a linear ion trap instrument. The number of fractions and analytical replicates was adjusted so that each experiment used a similar amount of mass spectrometric instrument time. A second set of experiments compared FASP-GPF, SDS-PAGE, and FASP-HpH using a Q Exactive Orbitrap mass spectrometer. Compared with the linear ion trap mass spectrometer, the Q Exactive Orbitrap enabled a substantial increase in protein identifications and an even greater increase in peptide identifications. This shows that the main advantage of the higher-resolution instrument is increased proteome coverage. A total of 1035, 1357, and 2134 proteins were identified by FASP-GPF, SDS-PAGE, and FASP-HpH, respectively. Combining results from the Orbitrap experiments, a total of 2269 proteins were found, 94% of which were identified using the FASP-HpH method. Therefore, FASP-HpH is the optimal choice among these approaches when applied to this type of sample.
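The 94% figure can be reproduced from the Orbitrap counts. This assumes that all 2134 FASP-HpH proteins are contained in the combined set of 2269, which the abstract implies but does not state outright:

```python
fasp_hph = 2134        # proteins identified by FASP-HpH on the Orbitrap
combined_total = 2269  # all proteins found across the Orbitrap experiments

share = fasp_hph / combined_total
print(f"FASP-HpH share of combined identifications: {share:.1%}")
```

The remaining 135 proteins (2269 - 2134) were found only by FASP-GPF or SDS-PAGE. This is why combining methods still adds some coverage even though FASP-HpH dominates.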
Affiliation(s)
- Liting Deng
- Department of Molecular Sciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- David C L Handler
- Department of Molecular Sciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Dylan H Multari
- Department of Molecular Sciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia
- Paul A Haynes
- Department of Molecular Sciences, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia; Biomolecular Discovery Research Centre, Faculty of Science and Engineering, Macquarie University, Sydney, NSW, Australia.
11
Eicher T, Patt A, Kautto E, Machiraju R, Mathé E, Zhang Y. Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge. BMC Bioinformatics 2019; 20:669. [PMID: 31861998 PMCID: PMC6923881 DOI: 10.1186/s12859-019-3253-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulation and the mechanisms underlying altered phenotypes. Further, integration of data on proteome and transcriptome levels can validate gene signatures associated with a phenotype. However, proteomic data are not as abundant as genomic data, so it is beneficial to use genomic features to predict protein abundances when matching proteomic samples or measurements within samples are lacking. Results: We evaluate and compare four data-driven models for predicting proteomic data from mRNA measured in breast and ovarian cancers, using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forest, LASSO, and fuzzy logic approaches can predict protein abundance levels with median correlations between ground truth and predicted values of 0.2 to 0.5. However, the most accurately predicted proteins differ considerably between approaches. Conclusions: In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
Affiliation(s)
- Tara Eicher
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
- Andrew Patt
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Esko Kautto
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Raghu Machiraju
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Ewy Mathé
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Yan Zhang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- The Ohio State University Comprehensive Cancer Center (OSUCCC - James), Columbus, OH, 43210, USA
12
Innovating the Concept and Practice of Two-Dimensional Gel Electrophoresis in the Analysis of Proteomes at the Proteoform Level. Proteomes 2019; 7:proteomes7040036. [PMID: 31671630 PMCID: PMC6958347 DOI: 10.3390/proteomes7040036] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 06/21/2019] [Revised: 09/15/2019] [Accepted: 10/28/2019] [Indexed: 12/21/2022]
Abstract
Two-dimensional gel electrophoresis (2DE) is an important and well-established technical platform enabling extensive top-down proteomic analysis. However, the long-held but now largely outdated conventional concepts of 2DE have clearly impacted its application to in-depth investigations of proteomes at the level of protein species/proteoforms. It is time to popularize a new concept of 2DE for proteomics. With the development and enrichment of the proteome concept, any given “protein” is now recognized to consist of a series of proteoforms. Thus, it is the proteoform, rather than the canonical protein, that is the basic unit of a proteome, and each proteoform has a specific isoelectric point (pI) and relative mass (Mr). Accordingly, using 2DE, each proteoform can routinely be resolved and arrayed according to its different pI and Mr. Each detectable spot contains multiple proteoforms derived from the same gene, as well as from different genes. Proteoforms derived from the same gene are distributed into different spots in a 2DE pattern. High-resolution 2DE is thus actually an initial level of separation to address proteome complexity and is effectively a pre-fractionation method prior to analysis using mass spectrometry (MS). Furthermore, stable isotope-labeled 2DE coupled with high-sensitivity liquid chromatography-tandem MS (LC-MS/MS) has tremendous potential for the large-scale detection, identification, and quantification of the proteoforms that constitute proteomes.
13
Zhan X, Yang H, Peng F, Li J, Mu Y, Long Y, Cheng T, Huang Y, Li Z, Lu M, Li N, Li M, Liu J, Jungblut PR. How many proteins can be identified in a 2DE gel spot within an analysis of a complex human cancer tissue proteome? Electrophoresis 2018; 39:965-980. [PMID: 29205401 DOI: 10.1002/elps.201700330] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 04/30/2017] [Revised: 11/03/2017] [Accepted: 11/17/2017] [Indexed: 01/28/2023]
Affiliation(s)
- Xianquan Zhan
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- The State Key Laboratory of Medical Genetics; Central South University; Changsha Hunan P. R. China
- Haiyan Yang
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Fang Peng
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Jianglin Li
- Molecular Science and Biomedicine Laboratory, State Key Laboratory for Chemo/Biosensing and Chemometrics, College of Biology, Hunan University; Changsha Hunan P. R. China
- Yun Mu
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Ying Long
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Tingting Cheng
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Yuda Huang
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Zhao Li
- Department of Neurosurgery; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Miaolong Lu
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Na Li
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Maoyu Li
- Key Laboratory of Cancer Proteomics of Chinese Ministry of Health; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Hunan Engineering Laboratory for Structural Biology and Drug Design; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- State Local Joint Engineering Laboratory for Anticancer Drugs; Xiangya Hospital, Central South University; Changsha Hunan P. R. China
- Jianping Liu
- Bio-Analytical Chemistry Research Laboratory, Modern Analytical Testing Center; Central South University; Changsha Hunan P. R. China
- Peter R. Jungblut
- Max Planck Institute for Infection Biology, Core Facility Protein Analysis; Berlin Germany
14
Perl K, Ushakov K, Pozniak Y, Yizhar-Barnea O, Bhonker Y, Shivatzki S, Geiger T, Avraham KB, Shamir R. Reduced changes in protein compared to mRNA levels across non-proliferating tissues. BMC Genomics 2017; 18:305. [PMID: 28420336 PMCID: PMC5395847 DOI: 10.1186/s12864-017-3683-9] [Citation(s) in RCA: 88] [Impact Index Per Article: 11.0] [Received: 10/26/2016] [Accepted: 04/04/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND The quantitative relations between RNA and protein are fundamental to biology and are still not fully understood. Across taxa, it was demonstrated that the protein-to-mRNA ratio in steady state varies in a direction that lessens the change in protein levels as a result of changes in the transcript abundance. Evidence for this behavior in tissues is sparse. We tested this phenomenon in new data that we produced for the mouse auditory system, and in previously published tissue datasets. A joint analysis of the transcriptome and proteome was performed across four datasets: inner-ear mouse tissues, mouse organ tissues, lymphoblastoid primate samples and human cancer cell lines. RESULTS We show that the protein levels are more conserved than the mRNA levels in all datasets, and that changes in transcription are associated with translational changes that exert opposite effects on the final protein level, in all tissues except cancer. Finally, we observe that some functions are enriched in the inner ear on the mRNA level but not in protein. CONCLUSIONS We suggest that partial buffering between transcription and translation ensures that proteins can be made rapidly in response to a stimulus. Accounting for the buffering can improve the prediction of protein levels from mRNA levels.
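The buffering result can be made concrete with a small calculation. The sketch below is a hypothetical illustration, not the analysis pipeline of Perl et al.: it assumes log2 mRNA and protein levels for the same genes in two tissues, compares the spread of protein changes with the spread of mRNA changes, and correlates the mRNA change with the change in the log protein-to-mRNA ratio. The function names and all input numbers are invented. One caveat: because the mRNA change also appears inside the ratio change, measurement noise alone pushes this naive correlation toward negative values, so a real analysis would need to correct for that.

```python
import math
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def buffering_summary(mrna_a, mrna_b, prot_a, prot_b):
    """Compare mRNA and protein changes between two tissues (hypothetical helper).

    All inputs are log2 abundances of the same genes, in the same order,
    measured in tissue A and tissue B.
    """
    d_mrna = [b - a for a, b in zip(mrna_a, mrna_b)]
    d_prot = [b - a for a, b in zip(prot_a, prot_b)]
    # Change in log2(protein / mRNA), i.e. the translational component.
    d_ratio = [p - m for p, m in zip(d_prot, d_mrna)]
    return {
        "sd_mrna_change": statistics.stdev(d_mrna),
        "sd_protein_change": statistics.stdev(d_prot),
        # Negative r: ratio changes oppose mRNA changes (buffering). Since d_mrna
        # also appears inside d_ratio, noise biases this value toward negative.
        "r_mrna_vs_ratio": pearson(d_mrna, d_ratio),
    }


# Invented log2 abundances for five genes in two tissues.
summary = buffering_summary(
    mrna_a=[5.0, 7.2, 3.1, 9.4, 6.0],
    mrna_b=[6.8, 6.1, 4.9, 8.0, 6.3],
    prot_a=[10.2, 12.0, 8.5, 14.1, 11.0],
    prot_b=[11.1, 11.6, 9.3, 13.5, 11.1],
)
print(summary)
```

With these invented numbers, protein changes are smaller than mRNA changes and the ratio change runs opposite to the mRNA change, which is the pattern the abstract describes.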
Affiliation(s)
- Kobi Perl
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 6997801, Israel
- Kathy Ushakov
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Yair Pozniak
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Ofer Yizhar-Barnea
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Yoni Bhonker
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Shaked Shivatzki
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Tamar Geiger
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Karen B Avraham
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- Ron Shamir
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 6997801, Israel
15
Structural hot spots for the solubility of globular proteins. Nat Commun 2016; 7:10816. [PMID: 26905391 PMCID: PMC4770091 DOI: 10.1038/ncomms10816] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 01/28/2015] [Accepted: 01/25/2016] [Indexed: 12/25/2022]
Abstract
Natural selection shapes protein solubility to physiological requirements, and recombinant applications that require higher protein concentrations are often problematic. This raises the question of whether the solubility of natural protein sequences can be improved. We here show an anti-correlation between the number of aggregation-prone regions (APRs) in a protein sequence and its solubility, suggesting that mutational suppression of APRs provides a simple strategy to increase protein solubility. We show that mutations at specific positions within a protein structure can act as APR suppressors without affecting protein stability. These hot spots for protein solubility are both structure- and sequence-dependent but can be computationally predicted. We demonstrate this by reducing the aggregation of human α-galactosidase and protective antigen of Bacillus anthracis through mutation. Our results indicate that many proteins possess hot spots that allow protein solubility to be adapted independently of structure and function.
Mutations in aggregation-prone regions of recombinant proteins often improve their solubility, although they might cause negative effects on their structure and function. Here, the authors identify protein hot spots that can be exploited to optimize solubility without compromising stability.
16
Lawless C, Holman SW, Brownridge P, Lanthaler K, Harman VM, Watkins R, Hammond DE, Miller RL, Sims PFG, Grant CM, Eyers CE, Beynon RJ, Hubbard SJ. Direct and Absolute Quantification of over 1800 Yeast Proteins via Selected Reaction Monitoring. Mol Cell Proteomics 2016; 15:1309-22. [PMID: 26750110 PMCID: PMC4824857 DOI: 10.1074/mcp.m115.054288] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 08/02/2015] [Indexed: 11/06/2022]
Abstract
Defining intracellular protein concentration is critical in molecular systems biology. Although strategies for determining relative protein changes are available, defining robust absolute values in copies per cell has proven significantly more challenging. Here we present a reference data set quantifying over 1800 Saccharomyces cerevisiae proteins by direct means using protein-specific stable-isotope labeled internal standards and selected reaction monitoring (SRM) mass spectrometry, far exceeding any previous study. This was achieved by careful design of over 100 QconCAT recombinant proteins as standards, defining 1167 proteins in terms of copies per cell and upper limits on a further 668, with robust CVs routinely less than 20%. The selected reaction monitoring-derived proteome is compared with existing quantitative data sets, highlighting the disparities between methodologies. Coupled with a quantification of the transcriptome by RNA-seq taken from the same cells, these data support revised estimates of several fundamental molecular parameters: a total protein count of ∼100 million molecules-per-cell, a median of ∼1000 proteins-per-transcript, and a linear model of protein translation explaining 70% of the variance in translation rate. This work contributes a “gold-standard” reference yeast proteome (including 532 values based on high quality, dual peptide quantification) that can be widely used in systems models and for other comparative studies.
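Direct quantification with labeled internal standards rests on isotope dilution: a known amount of a stable-isotope-labeled (heavy) standard peptide is spiked into the sample, and the light-to-heavy signal ratio scales that known amount to the endogenous one. The sketch below shows only this arithmetic, plus the coefficient of variation (CV) used to judge replicate agreement. It assumes complete digestion and one quantifying peptide per protein; the function names, spike amount, cell count and ratios are hypothetical, and this is not the authors' QconCAT processing pipeline.

```python
import statistics

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def copies_per_cell(light_heavy_ratio: float, standard_fmol: float, n_cells: float) -> float:
    """Isotope-dilution estimate of protein copies per cell (simplified).

    light_heavy_ratio: signal ratio of the endogenous (light) peptide to the
        spiked stable-isotope-labeled (heavy) standard peptide.
    standard_fmol: amount of heavy standard spiked into the sample, in fmol.
    n_cells: number of cells the sample was derived from.
    Assumes complete digestion and a 1:1 peptide-to-protein stoichiometry.
    """
    endogenous_mol = light_heavy_ratio * standard_fmol * 1e-15  # fmol -> mol
    return endogenous_mol * AVOGADRO / n_cells


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate light/heavy ratios for one protein (illustrative only).
ratios = [0.82, 0.90, 0.76]
estimates = [copies_per_cell(r, standard_fmol=50.0, n_cells=1e7) for r in ratios]
cv = coefficient_of_variation(estimates)
print([round(e) for e in estimates], f"CV = {cv:.1f}%")
```

Because copies per cell is linear in the ratio, the CV of the estimates equals the CV of the replicate ratios; the abstract's "CVs routinely less than 20%" refers to this kind of replicate spread.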
Affiliation(s)
- Craig Lawless
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Stephen W Holman
- Centre for Proteome Research, Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Philip Brownridge
- Centre for Proteome Research, Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Karin Lanthaler
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Victoria M Harman
- Centre for Proteome Research, Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Rachel Watkins
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Dean E Hammond
- Centre for Proteome Research, Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Rebecca L Miller
- Centre for Proteome Research, Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Paul F G Sims
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Christopher M Grant
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Claire E Eyers
- Centre for Proteome Research, Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Robert J Beynon
- Centre for Proteome Research, Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
- Simon J Hubbard
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
17
Rizzetto S, Priami C, Csikász-Nagy A. Qualitative and Quantitative Protein Complex Prediction Through Proteome-Wide Simulations. PLoS Comput Biol 2015; 11:e1004424. [PMID: 26492574 PMCID: PMC4619657 DOI: 10.1371/journal.pcbi.1004424] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/18/2014] [Accepted: 06/22/2015] [Indexed: 12/18/2022]
Abstract
Despite recent progress in proteomics, most protein complexes are still unknown. Identification of these complexes will help us understand cellular regulatory mechanisms and support the development of new drugs. It is therefore important to establish detailed information about the composition and abundance of protein complexes, but existing algorithms can only give qualitative predictions. Herein, we propose a new approach based on stochastic simulations of protein complex formation that integrates multi-source data (such as protein abundances, domain-domain interactions and functional annotations) to predict alternative forms of protein complexes together with their abundances. This method, called SiComPre (Simulation based Complex Prediction), achieves better qualitative prediction of yeast and human protein complexes than existing methods and is the first to predict protein complex abundances. Furthermore, we show that SiComPre can be used to predict complexome changes upon drug treatment, using the example of bortezomib. SiComPre is the first method to produce quantitative predictions on the abundance of molecular complexes while performing the best qualitative predictions. As new data on tissue-specific protein complexes become available, SiComPre will be able to predict qualitative and quantitative differences in the complexome in various tissue types and under various conditions.
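The abstract describes SiComPre as based on stochastic simulation of complex formation. As a minimal illustration of that general idea, the sketch below applies Gillespie's stochastic simulation algorithm (direct method) to a single reversible binding reaction A + B ⇌ AB. Using Gillespie's method here is an assumption made for illustration; this is not SiComPre's code, which also integrates domain-domain interactions and functional annotations. The function name, molecule counts and rate constants are hypothetical.

```python
import random


def gillespie_binding(a0: int, b0: int, k_on: float, k_off: float,
                      t_end: float, seed=None):
    """Simulate A + B <-> AB with Gillespie's direct method (hypothetical helper).

    Returns a list of (time, A, B, AB) states, starting at time 0 with no complex.
    k_on is a stochastic rate constant per A-B pair, k_off per complex.
    """
    rng = random.Random(seed)
    a, b, ab, t = a0, b0, 0, 0.0
    trajectory = [(t, a, b, ab)]
    while True:
        rate_bind = k_on * a * b      # propensity of A + B -> AB
        rate_unbind = k_off * ab      # propensity of AB -> A + B
        total = rate_bind + rate_unbind
        if total == 0:                # no reaction can fire
            break
        t += rng.expovariate(total)   # exponential waiting time to the next event
        if t > t_end:
            break
        # Pick the reaction that fires, with probability proportional to its propensity.
        if rng.random() * total < rate_bind:
            a, b, ab = a - 1, b - 1, ab + 1
        else:
            a, b, ab = a + 1, b + 1, ab - 1
        trajectory.append((t, a, b, ab))
    return trajectory


traj = gillespie_binding(a0=100, b0=80, k_on=0.001, k_off=0.1, t_end=50.0, seed=1)
t_last, a, b, ab = traj[-1]
print(f"t = {t_last:.2f}: free A = {a}, free B = {b}, complex AB = {ab}")
```

In this toy model the complex abundance is read off the end of the trajectory. Averaging it over many seeds or over late time points would give a steadier estimate of the kind of abundance prediction the abstract describes.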
Affiliation(s)
- Simone Rizzetto
- The Microsoft Research-University of Trento Centre for Computational Systems Biology, Rovereto, Italy
- Corrado Priami
- The Microsoft Research-University of Trento Centre for Computational Systems Biology, Rovereto, Italy
- Department of Mathematics, University of Trento, Povo (TN), Italy
- Attila Csikász-Nagy
- Department of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach, San Michele all'Adige, Italy
- Randall Division of Cell and Molecular Biophysics and Institute for Mathematical and Molecular Biomedicine, King's College London, London, United Kingdom