1
Bhushan V, Nita-Lazar A. Recent Advancements in Subcellular Proteomics: Growing Impact of Organellar Protein Niches on the Understanding of Cell Biology. J Proteome Res 2024; 23:2700-2722. [PMID: 38451675 PMCID: PMC11296931 DOI: 10.1021/acs.jproteome.3c00839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
The mammalian cell is a complex entity, with membrane-bound and membrane-less organelles playing vital roles in regulating cellular homeostasis. Organellar protein niches drive discrete biological processes and cell functions, thus maintaining cell equilibrium. Cellular processes such as signaling, growth, proliferation, motility, and programmed cell death require dynamic protein movements between cell compartments. Aberrant protein localization is associated with a wide range of diseases. Therefore, analyzing the subcellular proteome of the cell can provide a comprehensive overview of cellular biology. With recent advancements in mass spectrometry, imaging technology, computational tools, and deep machine learning algorithms, studies pertaining to subcellular protein localization and their dynamic distributions are gaining momentum. These studies reveal changing interaction networks because of "moonlighting proteins" and serve as a discovery tool for disease network mechanisms. Consequently, this review aims to provide a comprehensive repository for recent advancements in subcellular proteomics subcontexting methods, challenges, and future perspectives for method developers. In summary, subcellular proteomics is crucial to the understanding of the fundamental cellular mechanisms and the associated diseases.
Affiliation(s)
- Vanya Bhushan
  - Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
- Aleksandra Nita-Lazar
  - Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
2
Veres T, Kerestély M, Kovács BM, Keresztes D, Schulc K, Seitz E, Vassy Z, Veres DV, Csermely P. Cellular forgetting, desensitisation, stress and ageing in signalling networks. When do cells refuse to learn more? Cell Mol Life Sci 2024; 81:97. [PMID: 38372750 PMCID: PMC10876757 DOI: 10.1007/s00018-024-05112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 11/09/2023] [Accepted: 01/02/2024] [Indexed: 02/20/2024]
Abstract
Recent findings show that single, non-neuronal cells are also able to learn signalling responses, developing cellular memory. In cellular learning, nodes of signalling networks strengthen their interactions, e.g. by the conformational memory of intrinsically disordered proteins, protein translocation, miRNAs, lncRNAs, chromatin memory and signalling cascades. This can be described by a generalized, unicellular Hebbian learning process, where those signalling connections which participate in learning become stronger. Here we review those scenarios where cellular signalling is not only repeated a few times (when learning occurs), but becomes too frequent, too large, or too complex and overloads the cell. This leads to desensitisation of signalling networks by decoupling signalling components, receptor internalization, and consequent downregulation. These molecular processes are examples of anti-Hebbian learning and 'forgetting' of signalling networks. Stress can be perceived as signalling overload inducing the desensitisation of signalling pathways. Ageing occurs by the summative effects of cumulative stress downregulating signalling. We propose that cellular learning, desensitisation, stress and ageing may be placed along the same axis of more and more intensive (prolonged or repeated) signalling. We discuss how cells might discriminate between repeated and unexpected signals, and highlight the Hebbian and anti-Hebbian mechanisms behind the fold-change detection in the NF-κB signalling pathway. We list drug design methods using Hebbian learning (such as chemically-induced proximity) and clinical treatment modalities inducing (cancer, drug allergies) desensitisation or avoiding drug-induced desensitisation. A better discrimination between cellular learning, desensitisation and stress may open novel directions in drug design, e.g. helping to overcome drug resistance.
Affiliation(s)
- Tamás Veres
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
- Márk Kerestély
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
- Borbála M Kovács
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
- Dávid Keresztes
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
- Klára Schulc
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
  - Division of Oncology, Department of Internal Medicine and Oncology, Semmelweis University, Budapest, Hungary
- Erik Seitz
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
- Zsolt Vassy
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
- Dániel V Veres
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
  - Turbine Ltd, Budapest, Hungary
- Peter Csermely
  - Department of Molecular Biology, Semmelweis University, Budapest, Hungary
3
Wang L, Goldwag J, Bouyea M, Barra J, Matteson K, Maharjan N, Eladdadi A, Embrechts MJ, Intes X, Kruger U, Barroso M. Spatial topology of organelle is a new breast cancer cell classifier. iScience 2023; 26:107229. [PMID: 37519903 PMCID: PMC10384275 DOI: 10.1016/j.isci.2023.107229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 05/10/2023] [Accepted: 06/23/2023] [Indexed: 08/01/2023]
Abstract
Genomics and proteomics have been central to identifying tumor cell populations, but more accurate approaches to classify cell subtypes are still lacking. We propose a new methodology to accurately classify cancer cells based on their organelle spatial topology. Herein, we developed an organelle topology-based cell classification pipeline (OTCCP), which integrates artificial intelligence (AI) and imaging quantification to analyze organelle spatial distribution and inter-organelle topology. OTCCP was used to classify a panel of human breast cancer cells, grown as 2D monolayers or 3D tumor spheroids, using early endosomes, mitochondria, and their inter-organelle contacts. Organelle topology allows for a highly precise differentiation between cell lines of different subtypes and aggressiveness. These findings lay the groundwork for using organelle topological profiling as a fast and efficient method for phenotyping breast cancer function as well as a discovery tool to advance our understanding of cancer cell biology at the subcellular level.
Affiliation(s)
- Ling Wang
  - Department of Molecular and Cellular Physiology, Albany Medical College, Albany, NY 12208, USA
- Joshua Goldwag
  - Department of Molecular and Cellular Physiology, Albany Medical College, Albany, NY 12208, USA
- Megan Bouyea
  - Department of Molecular and Cellular Physiology, Albany Medical College, Albany, NY 12208, USA
- Jonathan Barra
  - Department of Molecular and Cellular Physiology, Albany Medical College, Albany, NY 12208, USA
- Kailie Matteson
  - Department of Molecular and Cellular Physiology, Albany Medical College, Albany, NY 12208, USA
- Niva Maharjan
  - Department of Mathematics, The College of Saint Rose, Albany, NY 12203, USA
- Amina Eladdadi
  - Department of Mathematics, The College of Saint Rose, Albany, NY 12203, USA
- Mark J. Embrechts
  - Department of Industrial and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Xavier Intes
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Uwe Kruger
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Margarida Barroso
  - Department of Molecular and Cellular Physiology, Albany Medical College, Albany, NY 12208, USA
4
Mou M, Pan Z, Lu M, Sun H, Wang Y, Luo Y, Zhu F. Application of Machine Learning in Spatial Proteomics. J Chem Inf Model 2022; 62:5875-5895. [PMID: 36378082 DOI: 10.1021/acs.jcim.2c01161] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/16/2022]
Abstract
Spatial proteomics is an interdisciplinary field that investigates the localization and dynamics of proteins, and it has gained extensive attention in recent years, especially subcellular proteomics. Considerable evidence indicates that the subcellular localization of proteins is associated with various cellular processes and disease progression. Mass spectrometry (MS)-based and imaging-based experimental approaches have been developed to acquire large-scale spatial proteomic data. To allow the reliable analysis of increasingly complex spatial proteomics data, machine learning (ML) methods have been widely used in both MS-based and imaging-based spatial proteomic data analysis pipelines. Here, we comprehensively survey the applications of ML in spatial proteomics from the following aspects: (1) data resources for the spatial proteome are comprehensively introduced; (2) the roles of different ML algorithms in data analysis pipelines are elaborated; (3) successful applications of spatial proteomics and several analytical tools integrating ML methods are presented; (4) challenges existing in modern ML-based spatial proteomics studies are discussed. This review provides guidelines for researchers seeking to apply ML methods to analyze spatial proteomic data and can facilitate insightful understanding of cell biology as well as future research in the medical and drug discovery communities.
Affiliation(s)
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Huaicheng Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
5
Khandakji MN, Mifsud B. Gene-specific machine learning model to predict the pathogenicity of BRCA2 variants. Front Genet 2022; 13:982930. [PMID: 36246618 PMCID: PMC9561395 DOI: 10.3389/fgene.2022.982930] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2022] [Accepted: 09/12/2022] [Indexed: 11/18/2022]
Abstract
Background: Existing BRCA2-specific variant pathogenicity prediction algorithms focus on the prediction of the functional impact of a subtype of variants alone. General variant effect predictors are applicable to all subtypes, but are trained on putative benign and pathogenic variants and do not account for gene-specific information, such as hotspots of pathogenic variants. Local, gene-specific information has been shown to aid variant pathogenicity prediction; therefore, our aim was to develop a BRCA2-specific machine learning model to predict pathogenicity of all types of BRCA2 variants. Methods: We developed an XGBoost-based machine learning model to predict pathogenicity of BRCA2 variants. The model utilizes general variant information such as position, frequency, and consequence for the canonical BRCA2 transcript, as well as deleteriousness prediction scores from several tools. We trained the model on 80% of the variants expert-reviewed by the Evidence-Based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium and tested its performance on the remaining 20%, as well as on an independent set of variants of uncertain significance with experimentally determined functional scores. Results: The novel gene-specific model predicted the pathogenicity of ENIGMA BRCA2 variants with an accuracy of 99.9%. The model also performed excellently on predicting the functional consequence of the independent set of variants (accuracy was up to 91.3%). Conclusion: This new, gene-specific model is an accurate method for interpreting the pathogenicity of variants in the BRCA2 gene. It is a valuable addition for variant classification and can prioritize unreviewed variants for functional analysis or expert review.
Affiliation(s)
- Mohannad N. Khandakji
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Ar-Rayyan, Qatar
  - Hamad Medical Corporation, Doha, Qatar
- Borbala Mifsud
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Ar-Rayyan, Qatar
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
  - *Correspondence: Borbala Mifsud,
6
Mendik P, Kerestély M, Kamp S, Deritei D, Kunšič N, Vassy Z, Csermely P, Veres DV. Translocating proteins compartment-specifically alter the fate of epithelial-mesenchymal transition in a compartmentalized Boolean network model. NPJ Syst Biol Appl 2022; 8:19. [PMID: 35680961 PMCID: PMC9184490 DOI: 10.1038/s41540-022-00228-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Accepted: 05/20/2022] [Indexed: 11/13/2022]
Abstract
Regulation of translocating proteins is crucial in defining cellular behaviour. Epithelial-mesenchymal transition (EMT) is important in cellular processes, such as cancer progression. Several orchestrators of EMT, such as key transcription factors, are known to translocate. We show that translocating proteins become enriched in EMT-signalling. To simulate the compartment-specific functions of translocating proteins, we created a compartmentalized Boolean network model. This model successfully reproduced known biological traits of EMT and, as a novel feature, it also captured organelle-specific functions of proteins. Our results predicted that glycogen synthase kinase-3 beta (GSK3B) compartment-specifically alters the fate of EMT; amongst others, the activation of nuclear GSK3B halts transforming growth factor beta-1 (TGFB)-induced EMT. Moreover, our results recapitulated that the nuclear activation of glioma associated oncogene transcription factors (GLI) is needed to achieve a complete EMT. Compartmentalized network models will be useful to uncover novel control mechanisms of biological processes. Our algorithmic procedures can be automatically rerun on the https://translocaboole.linkgroup.hu website, which provides a framework for similar future studies.
Affiliation(s)
- Péter Mendik
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Márk Kerestély
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Dávid Deritei
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Nina Kunšič
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Zsolt Vassy
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Péter Csermely
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
- Daniel V Veres
  - Department of Molecular Biology, Institute of Biochemistry and Molecular Biology, Semmelweis University, Budapest, Hungary
  - Turbine Ltd, Budapest, Hungary
7
Christopher JA, Stadler C, Martin CE, Morgenstern M, Pan Y, Betsinger CN, Rattray DG, Mahdessian D, Gingras AC, Warscheid B, Lehtiö J, Cristea IM, Foster LJ, Emili A, Lilley KS. Subcellular proteomics. Nat Rev Methods Primers 2021; 1:32. [PMID: 34549195 PMCID: PMC8451152 DOI: 10.1038/s43586-021-00029-y] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Accepted: 03/15/2021] [Indexed: 12/11/2022]
Abstract
The eukaryotic cell is compartmentalized into subcellular niches, including membrane-bound and membrane-less organelles. Proteins localize to these niches to fulfil their function, enabling discrete biological processes to occur in synchrony. Dynamic movement of proteins between niches is essential for cellular processes such as signalling, growth, proliferation, motility and programmed cell death, and mutations causing aberrant protein localization are associated with a wide range of diseases. Determining the location of proteins in different cell states and cell types and how proteins relocalize following perturbation is important for understanding their functions, related cellular processes and pathologies associated with their mislocalization. In this Primer, we cover the major spatial proteomics methods for determining the location, distribution and abundance of proteins within subcellular structures. These technologies include fluorescent imaging, protein proximity labelling, organelle purification and cell-wide biochemical fractionation. We describe their workflows, data outputs and applications in exploring different cell biological scenarios, and discuss their main limitations. Finally, we describe emerging technologies and identify areas that require technological innovation to allow better characterization of the spatial proteome.
Affiliation(s)
- Josie A. Christopher
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
  - Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
- Charlotte Stadler
  - Department of Protein Sciences, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
- Claire E. Martin
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Marcel Morgenstern
  - Institute of Biology II, Biochemistry and Functional Proteomics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Yanbo Pan
  - Department of Oncology and Pathology, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
- Cora N. Betsinger
  - Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- David G. Rattray
  - Department of Biochemistry & Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Diana Mahdessian
  - Department of Protein Sciences, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
- Anne-Claude Gingras
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Bettina Warscheid
  - Institute of Biology II, Biochemistry and Functional Proteomics, Faculty of Biology, University of Freiburg, Freiburg, Germany
  - BIOSS and CIBSS Signaling Research Centers, University of Freiburg, Freiburg, Germany
- Janne Lehtiö
  - Department of Oncology and Pathology, Karolinska Institutet, Science for Life Laboratory, Solna, Sweden
- Ileana M. Cristea
  - Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- Leonard J. Foster
  - Department of Biochemistry & Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Andrew Emili
  - Center for Network Systems Biology, Boston University, Boston, MA, USA
- Kathryn S. Lilley
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
  - Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
8
Schulc K, Nagy ZT, Kamp S, Molnár J, Veres DV, Csermely P, Kovács BM. Modular Reorganization of Signaling Networks during the Development of Colon Adenoma and Carcinoma. J Phys Chem B 2021; 125:1716-1726. [PMID: 33562960 PMCID: PMC8023713 DOI: 10.1021/acs.jpcb.0c09307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/09/2022]
Abstract
Network science is an emerging tool in systems biology and oncology, providing novel, system-level insight into the development of cancer. The aim of this project was to study the signaling networks in the process of oncogenesis to explore the adaptive mechanisms taking part in the cancerous transformation of healthy cells. For this purpose, colon cancer proved to be an excellent candidate, as its preliminary phase, adenoma, has a long evolution time. In our work, transcriptomic data were collected from normal colon, colon adenoma, and colon cancer samples to calculate link (i.e., network edge) weights as approximative proxies for protein abundances, and link weights were included in the Human Cancer Signaling Network. Here we show that the adenoma phase clearly differs from the normal and cancer states in terms of a more scattered link weight distribution and enlarged network diameter. Modular analysis shows the rearrangement of the apoptosis- and the cell-cycle-related modules, whose pathway enrichment analysis supports the relevance of targeted therapy. Our work enriches the system-wide assessment of cancer development, showing specific changes for the adenoma state.
Affiliation(s)
- Klára Schulc
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Zsolt T Nagy
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Daniel V Veres
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
  - Turbine Ltd, Budapest, Hungary
- Peter Csermely
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Borbála M Kovács
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
9
Ke Y, Rao J, Zhao H, Lu Y, Xiao N, Yang Y. Accurate prediction of genome-wide RNA secondary structure profile based on extreme gradient boosting. Bioinformatics 2021; 36:4576-4582. [PMID: 32467966 DOI: 10.1093/bioinformatics/btaa534] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/18/2019] [Revised: 05/01/2020] [Accepted: 05/23/2020] [Indexed: 12/20/2022]
Abstract
Motivation: RNA secondary structure plays a vital role in fundamental cellular processes, and identification of RNA secondary structure is a key step to understand RNA functions. Recently, a few experimental methods were developed to profile genome-wide RNA secondary structure, i.e. the pairing probability of each nucleotide, through high-throughput sequencing techniques. However, these high-throughput methods have low precision and cannot cover all nucleotides due to limited sequencing coverage. Results: Here, we have developed a new method for the prediction of genome-wide RNA secondary structure profile from RNA sequence based on the extreme gradient boosting technique. The method achieves predictions with areas under the receiver operating characteristic curve (AUC) >0.9 on three different datasets, and an AUC of 0.888 in another independent test on the recently released Zika virus data. These AUCs are consistently >5% greater than those of the CROSS method, recently developed based on a shallow neural network. Further analysis on the 1000 Genomes Project data showed that our predicted unpaired probabilities are highly correlated (>0.8) with the minor allele frequencies at synonymous mutations, non-synonymous mutations, and mutations in untranslated regions, and these correlations were higher than those generated by RNAplfold. Moreover, the prediction over all human mRNA was consistent with the previous observation that there is a periodic distribution of unpaired probability on codons. The accurate predictions by our method indicate that such a model trained on genome-wide experimental data might be an alternative to analytical methods. Availability and implementation: GRASP is available for academic use at https://github.com/sysu-yanglab/GRASP. Supplementary information: Supplementary data are available online.
Affiliation(s)
- Yaobin Ke
  - School of Data and Computer Science, Guangzhou 510000, China
- Jiahua Rao
  - School of Data and Computer Science, Guangzhou 510000, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Guangzhou 510000, China
- Yutong Lu
  - School of Data and Computer Science, Guangzhou 510000, China
- Nong Xiao
  - School of Data and Computer Science, Guangzhou 510000, China
- Yuedong Yang
  - School of Data and Computer Science, Guangzhou 510000, China
  - Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of Ministry of Education, Guangzhou 510000, China
10
Chen S, Zhou W, Tu J, Li J, Wang B, Mo X, Tian G, Lv K, Huang Z. A Novel XGBoost Method to Infer the Primary Lesion of 20 Solid Tumor Types From Gene Expression Data. Front Genet 2021; 12:632761. [PMID: 33613644 PMCID: PMC7886791 DOI: 10.3389/fgene.2021.632761] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/24/2020] [Accepted: 01/06/2021] [Indexed: 11/19/2022]
Abstract
Purpose: To establish a suitable machine learning model that identifies the primary lesions of primary metastatic tumors using an integrated learning approach, improving the accuracy and efficiency of primary lesion diagnosis. Methods: After deleting the features whose expression level is lower than the threshold, we use two methods to perform feature selection and use XGBoost for classification. After the optimal model is selected through 10-fold cross-validation, it is verified on an independent test set. Results: Selecting features with around 800 genes for training, the R2-score of a 10-fold CV of training data can reach 96.38%, and the R2-score of test data can reach 83.3%. Conclusion: These findings suggest that by combining tumor data with machine learning methods, each cancer has its corresponding classification accuracy, which can be used to predict the location of primary metastatic tumors. The machine-learning-based method can be used as an orthogonal diagnostic method, allowing the model's output to be judged against actual clinical pathological conditions.
Affiliation(s)
- Sijie Chen
  - Department of Mathematics, Ocean University of China, Qingdao, China
- Wenjing Zhou
  - Department of Oncology, Hiser Medical Center of Qingdao, Qingdao, China
- Jinghui Tu
  - Department of Mathematics, Ocean University of China, Qingdao, China
- Jian Li
  - Department of Mathematics, Ocean University of China, Qingdao, China
- Bo Wang
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  - Geneis Beijing Co., Ltd., Beijing, China
- Xiaofei Mo
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  - Geneis Beijing Co., Ltd., Beijing, China
- Geng Tian
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  - Geneis Beijing Co., Ltd., Beijing, China
- Kebo Lv
  - Department of Mathematics, Ocean University of China, Qingdao, China
- Zhijian Huang
  - Department of Breast Surgical Oncology, Fujian Cancer Hospital & Fujian Medical University Cancer Hospital, Fuzhou, China
11
Ribeiro DM, Prod'homme A, Teixeira A, Zanzoni A, Brun C. The role of 3'UTR-protein complexes in the regulation of protein multifunctionality and subcellular localization. Nucleic Acids Res 2020; 48:6491-6502. [PMID: 32484544 PMCID: PMC7337931 DOI: 10.1093/nar/gkaa462] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/10/2019] [Revised: 04/24/2020] [Accepted: 05/24/2020] [Indexed: 02/06/2023]
Abstract
Multifunctional proteins often perform their different functions when localized in different subcellular compartments. However, the mechanisms leading to their localization are largely unknown. Recently, 3'UTRs were found to regulate the cellular localization of newly synthesized proteins through the formation of 3'UTR-protein complexes. Here, we investigate the formation of 3'UTR-protein complexes involving multifunctional proteins by exploiting large-scale protein-protein and protein-RNA interaction networks. Focusing on 238 human 'extreme multifunctional' (EMF) proteins, we predicted 1411 3'UTR-protein complexes involving 54% of those proteins and evaluated their role in regulating protein cellular localization and multifunctionality. We find that EMF proteins lacking localization addressing signals, yet present at both the nucleus and cell surface, often form 3'UTR-protein complexes, and that the formation of these complexes could provide EMF proteins with the diversity of interaction partners necessary to their multifunctionality. Our findings are reinforced by archetypal moonlighting proteins predicted to form 3'UTR-protein complexes. Finally, the formation of 3'UTR-protein complexes that involves up to 17% of the proteins in the human protein-protein interaction network, may be a common and yet underestimated protein trafficking mechanism, particularly suited to regulate the localization of multifunctional proteins.
Affiliation(s)
- Diogo M Ribeiro
  - Aix Marseille Univ, Inserm, TAGC, UMR_S1090, Marseille, France
- Adrien Teixeira
  - Aix Marseille Univ, Inserm, TAGC, UMR_S1090, Marseille, France
- Andreas Zanzoni
  - Aix Marseille Univ, Inserm, TAGC, UMR_S1090, Marseille, France
- Christine Brun
  - Aix Marseille Univ, Inserm, TAGC, UMR_S1090, Marseille, France
  - CNRS, Marseille, France
12
Kennedy MA, Hofstadter WA, Cristea IM. TRANSPIRE: A Computational Pipeline to Elucidate Intracellular Protein Movements from Spatial Proteomics Data Sets. J Am Soc Mass Spectrom 2020; 31:1422-1439. [PMID: 32401031 PMCID: PMC7737664 DOI: 10.1021/jasms.0c00033] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 06/11/2023]
Abstract
Protein localization is paramount to protein function, and the intracellular movement of proteins underlies the regulation of numerous cellular processes. Given advances in spatial proteomics, the investigation of protein localization at a global scale has become attainable. Also becoming apparent is the need for dedicated analytical frameworks that allow the discovery of global intracellular protein movement events. Here, we describe TRANSPIRE, a computational pipeline that facilitates TRanslocation ANalysis of SPatIal pRotEomics data sets. TRANSPIRE leverages synthetic translocation profiles generated from organelle marker proteins to train a probabilistic Gaussian process classifier that predicts changes in protein distribution. This output is then integrated with information regarding co-translocating proteins and complexes and enriched gene ontology associations to discern the putative regulation and function of movement. We validate TRANSPIRE performance for predicting nuclear-cytoplasmic shuttling events. Analyzing an existing data set of nuclear and cytoplasmic proteomes during Kaposi Sarcoma-associated herpesvirus (KSHV)-induced cellular mRNA decay, we confirm that TRANSPIRE readily discerns expected translocations of RNA binding proteins. We next investigate protein translocations during infection with human cytomegalovirus (HCMV), a β-herpesvirus known to induce global organelle remodeling. We find that HCMV infection induces broad changes in protein localization, with over 800 proteins predicted to translocate during virus replication. Evident are protein movements related to HCMV modulation of host defense, metabolism, cellular trafficking, and Wnt signaling. For example, the low-density lipoprotein receptor (LDLR) translocates to the lysosome early in infection in conjunction with its degradation, which we validate by targeted mass spectrometry. Using microscopy, we also validate the translocation of the multifunctional kinase DAPK3, a movement that may contribute to HCMV activation of Wnt signaling.
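The idea of "synthetic translocation profiles generated from organelle marker proteins" can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that a synthetic profile is built by concatenating a marker's fractionation profile from one condition with a marker's profile from a second condition, and that the label is the pair of organelles involved. The function name, the data layout, and the toy numbers are all hypothetical and are not taken from the TRANSPIRE code base.

```python
from itertools import product

def synthetic_translocations(markers_a, markers_b):
    """Build labeled training examples from organelle marker profiles.

    markers_a / markers_b map an organelle name to a list of marker
    profiles (lists of floats) measured in condition A / condition B.
    Each example concatenates one condition-A profile with one
    condition-B profile; the label is (source organelle, destination
    organelle). This construction is an assumption for illustration.
    """
    examples = []
    for src, dst in product(sorted(markers_a), sorted(markers_b)):
        for prof_a, prof_b in product(markers_a[src], markers_b[dst]):
            examples.append((prof_a + prof_b, (src, dst)))
    return examples

# Toy two-fraction profiles (hypothetical values).
control = {"nucleus": [[0.8, 0.2]], "cytoplasm": [[0.1, 0.9]]}
infected = {"nucleus": [[0.7, 0.3]], "cytoplasm": [[0.2, 0.8]]}

training = synthetic_translocations(control, infected)

# Examples whose source and destination differ represent translocations.
moving = [label for _, label in training if label[0] != label[1]]
```

The resulting profile and label pairs would then serve as training data for a probabilistic classifier; the classifier itself is omitted from this sketch.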
Affiliation(s)
- Michelle A Kennedy
- Department of Molecular Biology, Princeton University, Washington Road, Princeton, New Jersey 08544, United States
- William A Hofstadter
- Department of Molecular Biology, Princeton University, Washington Road, Princeton, New Jersey 08544, United States
- Ileana M Cristea
- Department of Molecular Biology, Princeton University, Washington Road, Princeton, New Jersey 08544, United States
|
13
|
Lv X, Chen J, Lu Y, Chen Z, Xiao N, Yang Y. Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting. J Chem Inf Model 2020; 60:2388-2395. [PMID: 32203653 DOI: 10.1021/acs.jcim.0c00064] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/14/2022]
Abstract
Accurately predicting the impact of point mutation on protein stability has crucial roles in protein design and engineering. In this study, we proposed a novel method (BoostDDG) to predict stability changes upon point mutations from protein sequences based on the extreme gradient boosting. We extracted features comprehensively from evolutional information and predicted structures and performed feature selection by a strategy of sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. Finally, we found that 14 features from six groups led to the highest Pearson correlation coefficient (PCC) of 0.535, which is consistent with the 0.540 on an independent test. Our method was indicated to consistently outperform other sequence-based methods on three precompiled test sets, and 7363 variants on two proteins (PTEN and TPMT). These results highlighted that BoostDDG is a powerful tool for predicting stability changes upon point mutations from protein sequences.
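Sequential forward selection scored by the Pearson correlation coefficient, as mentioned in the abstract above, can be sketched as follows. This is a simplified illustration, not the BoostDDG implementation: a toy additive predictor stands in for extreme gradient boosting, scoring is done on the same data rather than by homologue-based cross-validation, and the feature names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def forward_select(features, score):
    """Greedily add the feature that most improves score(subset).

    Stops as soon as no remaining feature improves the best score.
    """
    selected, best = [], float("-inf")
    remaining = sorted(features)
    while remaining:
        trial_score, trial = max((score(selected + [f]), f) for f in remaining)
        if trial_score <= best:
            break
        selected.append(trial)
        remaining.remove(trial)
        best = trial_score
    return selected, best

# Hypothetical toy data: the target is exactly f1 + f2.
columns = {"f1": [1, 2, 3, 4], "f2": [1, 0, 1, 0], "noise": [4, 1, 3, 2]}
target = [2, 2, 4, 4]

def score(subset):
    # Stand-in model (assumption): predict by summing the chosen features.
    pred = [sum(columns[f][i] for f in subset) for i in range(len(target))]
    return pearson(pred, target)

selected, best = forward_select(columns, score)
```

On this toy data the procedure picks `f1`, then `f2`, and stops before adding `noise`, because adding it lowers the correlation.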
Affiliation(s)
- Xuan Lv
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China
- Jianwen Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Yutong Lu
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Zhiguang Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Nong Xiao
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China; School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
- Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China; Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University, Ministry of Education, Guangzhou, Guangdong 510275, China
|
14
|
Temporal dynamics of protein complex formation and dissociation during human cytomegalovirus infection. Nat Commun 2020; 11:806. [PMID: 32041945 PMCID: PMC7010728 DOI: 10.1038/s41467-020-14586-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 08/22/2019] [Accepted: 01/10/2020] [Indexed: 12/23/2022]
Abstract
The co-evolution and co-existence of viral pathogens with their hosts for millions of years is reflected in dynamic virus-host protein-protein interactions (PPIs) that are intrinsic to the spread of infections. Here, we investigate the system-wide dynamics of protein complexes throughout infection with the herpesvirus, human cytomegalovirus (HCMV). Integrating thermal shift assays and mass spectrometry quantification with virology and microscopy, we monitor the temporal formation and dissociation of hundreds of functional protein complexes and the dynamics of host-host, virus-host, and virus-virus PPIs. We establish pro-viral roles for cellular protein complexes and translocating proteins. We show the HCMV receptor integrin beta 1 dissociates from extracellular matrix proteins, becoming internalized with CD63, which is necessary for virus production. Moreover, this approach facilitates characterization of essential viral proteins, such as pUL52. This study of temporal protein complex dynamics provides insights into mechanisms of HCMV infection and a resource for biological and therapeutic studies.
Here, Hashimoto et al. apply mass spectrometry-based thermal proximity coaggregation to characterize the temporal dynamics of virus-host protein-protein interactions during human cytomegalovirus (HCMV) infection, uncovering proviral functions including the internalization of the HCMV receptor integrin beta 1 with CD63.
|
15
|
Learning of Signaling Networks: Molecular Mechanisms. Trends Biochem Sci 2020; 45:284-294. [PMID: 32008897 DOI: 10.1016/j.tibs.2019.12.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/01/2019] [Revised: 11/28/2019] [Accepted: 12/31/2019] [Indexed: 01/03/2023]
Abstract
Molecular processes of neuronal learning have been well described. However, learning mechanisms of non-neuronal cells are not yet fully understood at the molecular level. Here, we discuss molecular mechanisms of cellular learning, including conformational memory of intrinsically disordered proteins (IDPs) and prions, signaling cascades, protein translocation, RNAs [miRNA and long noncoding RNA (lncRNA)], and chromatin memory. We hypothesize that these processes constitute the learning of signaling networks and correspond to a generalized Hebbian learning process of single, non-neuronal cells, and we discuss how cellular learning may open novel directions in drug design and inspire new artificial intelligence methods.
|