1
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024]
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functions, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by prediction tool taxonomies that highlight their relevant area and examples for uncomplicated categorization and ease of understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools to facilitate readers and can be considered a quick guide for researchers to identify and explore recent advancements in the literature.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
2
Zahiri Z, Mehrshad N, Mehrshad M. DF-Phos: Prediction of Protein Phosphorylation Sites by Deep Forest. J Biochem 2024; 175:447-456. [PMID: 38153271 DOI: 10.1093/jb/mvad116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/29/2023]
Abstract
Phosphorylation is the most important and most studied post-translational modification (PTM), and it plays a crucial role in protein function studies and experimental design. Many significant studies have been performed to predict phosphorylation sites using various machine-learning methods. Recently, several studies have claimed that deep learning-based methods are the best way to predict phosphorylation sites, because deep learning, as an advanced machine learning method, can automatically detect complex representations of phosphorylation patterns from raw sequences and thus offers a powerful tool to improve phosphorylation site prediction. In this study, we report DF-Phos, a new phosphosite predictor based on the Deep Forest. In DF-Phos, the feature vector produced by the CkSAApair method is used as input to a Deep Forest framework for predicting phosphorylation sites. The results of 10-fold cross-validation show that the Deep Forest method achieves the highest performance among the available methods. We implemented DF-Phos as a Python program, which is freely available for non-commercial use at https://github.com/zahiriz/DF-Phos. Moreover, users can use it for various PTM predictions.
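The abstract does not spell out how the CkSAApair feature vector is built. The sketch below assumes it refers to the commonly used composition of k-spaced amino acid pairs encoding: for each gap k, count every ordered residue pair separated by k positions and normalize by the number of counted pairs. The function name `cksaap`, the `k_max` default, and the handling of non-standard residues are illustrative assumptions, not details taken from DF-Phos.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """Return normalized k-spaced pair frequencies for k = 0..k_max.

    Hypothetical helper: the output has 400 * (k_max + 1) entries.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Pair residue i with the residue k + 1 positions downstream.
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip padding or non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features
```

A vector like this, computed on a fixed-length window centered on a candidate serine, threonine, or tyrosine, is the kind of input a tree-ensemble classifier could consume; the window length is left to the caller here.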
Affiliation(s)
- Zeynab Zahiri
- Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Nasser Mehrshad
- Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Maliheh Mehrshad
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, 750 07 Sweden
3
da Silva ANR, Pereira GRC, Bonet LFS, Outeiro TF, De Mesquita JF. In silico analysis of alpha-synuclein protein variants and posttranslational modifications related to Parkinson's disease. J Cell Biochem 2024; 125:e30523. [PMID: 38239037 DOI: 10.1002/jcb.30523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 12/11/2023] [Accepted: 12/29/2023] [Indexed: 03/12/2024]
Abstract
Parkinson's disease (PD) is among the most prevalent neurodegenerative disorders, affecting over 10 million people worldwide. The protein encoded by the SNCA gene, alpha-synuclein (ASYN), is the major component of Lewy body (LB) aggregates, a histopathological hallmark of PD. Mutations and posttranslational modifications (PTMs) in ASYN are known to influence protein aggregation and LB formation, possibly playing a crucial role in PD pathogenesis. In this work, we applied computational methods to characterize the effects of missense mutations and PTMs on the structure and function of ASYN. Missense mutations in ASYN were compiled from the literature and databases and underwent a comprehensive predictive analysis. Phosphorylation and SUMOylation sites of ASYN were retrieved from databases and predicted by algorithms. ConSurf was used to estimate the evolutionary conservation of ASYN amino acids. Molecular dynamics (MD) simulations of ASYN wild-type and variants A30G, A30P, A53T, and G51D were performed using the GROMACS package. Seventy-seven missense mutations in ASYN were compiled. Although most mutations were not predicted to affect ASYN stability, aggregation propensity, amyloid formation, or chaperone binding, the analyzed mutations received relatively high rates of deleterious predictions and predominantly occurred at evolutionarily conserved sites within the protein. Moreover, our predictive analyses suggested that the following mutations may be harmful to ASYN and are, consequently, potential targets for future investigation: K6N, T22I, K34E, G36R, G36S, V37F, L38P, G41D, and K102E. The MD analyses pointed to remarkable flexibility and essential dynamics alterations at nearly all domains of the studied variants, which could lead to impaired contact between NAC and the C-terminal domain, triggering protein aggregation. These alterations may have functional implications for ASYN and provide important insight into the molecular mechanism of PD, supporting the design of future biomedical research and improvements in existing therapies for the disease.
Affiliation(s)
- Aloma N R da Silva
- Bioinformatics and Computational Biology Laboratory, Department of Genetics and Molecular Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Gabriel R C Pereira
- Bioinformatics and Computational Biology Laboratory, Department of Genetics and Molecular Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Luiz Felippe Sarmento Bonet
- Bioinformatics and Computational Biology Laboratory, Department of Genetics and Molecular Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Tiago Fleming Outeiro
- Department of Experimental Neurodegeneration, Center for Biostructural Imaging of Neurodegeneration, University Medical Center Göttingen, Göttingen, Germany
- Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Max Planck Institute for Experimental Medicine, Göttingen, Germany
- Joelma F De Mesquita
- Bioinformatics and Computational Biology Laboratory, Department of Genetics and Molecular Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
4
Esmaili F, Pourmirzaei M, Ramazi S, Shojaeilangari S, Yavari E. A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1266-1285. [PMID: 37863385 PMCID: PMC11082408 DOI: 10.1016/j.gpb.2023.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/16/2022] [Revised: 01/16/2023] [Accepted: 03/23/2023] [Indexed: 10/22/2023]
Abstract
Post-translational modifications (PTMs) have key roles in extending the functional diversity of proteins and, as a result, regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize the body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. First, we comprehensively review all related databases and introduce all steps regarding dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which are divided into two computational groups: algorithmic and machine learning (ML). Additionally, it is shown that there are two main approaches for p-site prediction by ML: conventional and end-to-end deep learning methods, both of which are given an overview. Moreover, this review introduces the most important feature extraction techniques, which have mostly been used in p-site prediction. Finally, we create three test sets from new proteins related to the released version of the database of protein post-translational modifications (dbPTM) in 2022 based on general and human species. Evaluating online p-site prediction tools on newly added proteins introduced in the dbPTM 2022 release, distinct from those in the dbPTM 2019 release, reveals their limitations. In other words, the actual performance of these online p-site prediction tools on unseen proteins is notably lower than the results reported in their respective research papers.
Affiliation(s)
- Farzaneh Esmaili
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Mahdi Pourmirzaei
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Shahin Ramazi
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran 14115-111, Iran
- Seyedehsamaneh Shojaeilangari
- Biomedical Engineering Group, Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology (IROST), Tehran 33535-111, Iran
- Elham Yavari
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
5
Dutta Gupta O, Karbat I, Pal K. Understanding the Molecular Regulation of Serotonin Receptor 5-HTR1B-β-Arrestin1 Complex in Stress and Anxiety Disorders. J Mol Neurosci 2023; 73:664-677. [PMID: 37580644 DOI: 10.1007/s12031-023-02146-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2023] [Accepted: 07/31/2023] [Indexed: 08/16/2023]
Abstract
The serotonin receptor subtype 5-HTR1B is widely distributed in the brain and has an important role in various behavioral implications, including neurological conditions and psychiatric disorders. The neuromodulatory action of 5-HTR1B largely depends upon its arrestin-mediated signaling pathway. In this study, we investigated the role of the unusually long intracellular loop 3 (ICL3) region of the serotonin receptor 5-HTR1B in its interaction with β-arrestin1 (Arr2) to compensate for the absence of the long cytoplasmic tail. Molecular modeling and docking tools were employed to obtain a suitable molecular conformation of the ICL3 region in complex with Arr2, which dictates the specific complex formation of 5-HTR1B with Arr2. This reveals the novel molecular mechanism of phosphorylated ICL3-mediated GPCR-arrestin interaction in the absence of the long cytoplasmic tail. The in-cell disulfide cross-linking experiments and molecular dynamics simulations of the complex further validate the model of the 5-HTR1B-ICL3-Arr2 complex. Two serine residues (Ser281 and Ser295) within the 5-HTR1B-ICL3 region were found to occupy the electropositive pocket of Arr2 in our model and might be crucial for phosphorylation and specific Arr2 binding. The alignment studies of these residues showed them to be conserved only across 5-HTR1B mammalian species. Thus, our studies were able to predict a molecular conformation of 5-HTR1B-Arr2 and identify the role of the long ICL3 in the signaling process, which might be crucial in designing targeted drugs (biased agonists) that promote GPCR-Arr2 signaling to deter the effects of stress and anxiety-like disorders.
Affiliation(s)
- Oindrilla Dutta Gupta
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, 700126, Kolkata, West Bengal, India
- Izhar Karbat
- Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001, Rehovot, Israel
- Kuntal Pal
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, 700126, Kolkata, West Bengal, India
- School of Biosciences and Technology (SBST), Vellore Institute of Technology, 632014, Vellore, Tamil Nadu, India
6
Ahmed F, Dehzangi I, Hasan MM, Shatabda S. Accurately predicting microbial phosphorylation sites using evolutionary and structural features. Gene 2023; 851:146993. [DOI: 10.1016/j.gene.2022.146993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 10/05/2022] [Accepted: 10/14/2022] [Indexed: 11/27/2022]
7
Guo X, He H, Yu J, Shi S. PKSPS: a novel method for predicting kinase of specific phosphorylation sites based on maximum weighted bipartite matching algorithm and phosphorylation sequence enrichment analysis. Brief Bioinform 2021; 23:6398688. [PMID: 34661630 DOI: 10.1093/bib/bbab436] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/29/2021] [Revised: 09/10/2021] [Accepted: 09/21/2021] [Indexed: 11/14/2022]
Abstract
With the development of biotechnology, a large number of phosphorylation sites have been experimentally confirmed and collected, but only a few of them have kinase annotations. Since experimental methods to detect kinases at specific phosphorylation sites are expensive and accidental, some computational methods have been proposed to predict the kinase of these sites, but most methods consider only sequence information or only functional network information. In this study, a new method, Predicting Kinase of Specific Phosphorylation Sites (PKSPS), is developed to predict kinases of specific phosphorylation sites in human proteins by combining PKSPS-Net with PKSPS-Seq, which together consider protein-protein interaction (PPI) network information and sequence information. For PKSPS-Net, kinase-kinase and substrate-substrate similarity are quantified based on the topological similarity of proteins in the PPI network, and a maximum weighted bipartite matching algorithm is proposed to predict kinase-substrate relationships. In PKSPS-Seq, phosphorylation sequence enrichment analysis is used to analyze the similarity of local sequences around phosphorylation sites and predict the kinase of specific phosphorylation sites (KSP). PKSPS proved more effective than either PKSPS-Net or PKSPS-Seq alone on different sets of kinases. Further comparison results show that the PKSPS method performs better than existing methods. Finally, the case study demonstrates the effectiveness of PKSPS in predicting kinases of specific phosphorylation sites. The open source code and data of PKSPS can be obtained from https://github.com/guoxinyunncu/PKSPS.
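To make the matching step more concrete, here is a minimal sketch of maximum weighted bipartite matching on a small kinase-by-substrate score matrix. It is not the PKSPS implementation: it uses a brute-force search over assignments for readability, whereas a practical tool would use a polynomial-time solver (for example, SciPy's `linear_sum_assignment`). The function name, the score values, and the assumption that there are no more kinases than substrates are all hypothetical.

```python
from itertools import permutations


def max_weight_matching(scores):
    """Brute-force maximum weighted bipartite matching.

    scores[i][j] is a hypothetical similarity score between kinase i and
    substrate j. Assumes len(scores) <= len(scores[0]); the search is
    factorial in size, so this is only suitable for tiny examples.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    best_total, best_pairs = float("-inf"), None
    # Try every way of giving each kinase a distinct substrate.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs


# Illustrative scores: 2 kinases (rows) x 3 substrates (columns).
example_scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
total, pairs = max_weight_matching(example_scores)
print(total, pairs)
```

The example is chosen so that greedily taking the single highest score first (kinase 0 with substrate 0) does not give the best overall assignment, which is the reason for optimizing the matching as a whole.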
Affiliation(s)
- Xinyun Guo
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
- Huan He
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
- Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China
8
Aneskievich BJ, Shamilov R, Vinogradova O. Intrinsic disorder in integral membrane proteins. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2021; 183:101-134. [PMID: 34656327 DOI: 10.1016/bs.pmbts.2021.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
The well-defined roles and specific protein-protein interactions of many integral membrane proteins (IMPs), such as those functioning as receptors for extracellular matrix proteins and soluble growth factors, easily align with considering IMP structure as a classical "lock-and-key" concept. Nevertheless, continued advances in understanding protein conformation, such as those which established the widespread existence of intrinsically disordered proteins (IDPs) and especially intrinsically disordered regions (IDRs) in otherwise three-dimensionally organized proteins, call for ongoing reevaluation of transmembrane proteins. Here, we present basic traits of IDPs and IDRs, and, for some select single-span IMPs, consider the potential functional advantages intrinsic disorder might provide and the possible conformational impact of disease-associated mutations. For transmembrane proteins in general, we highlight several investigational approaches, such as biophysical and computational methods, stressing the importance of integrating them to produce a more-complete mechanistic model of disorder-containing IMPs. These procedures, when synergized with in-cell assessments, will likely be key in translating in silico and in vitro results to improved understanding of IMP conformational flexibility in normal cell physiology as well as disease, and will help to extend their potential as therapeutic targets.
Affiliation(s)
- Brian J Aneskievich
- Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT, United States
- Rambon Shamilov
- Graduate Program in Pharmacology and Toxicology, Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT, United States
- Olga Vinogradova
- Department of Pharmaceutical Sciences, University of Connecticut, Storrs, CT, United States
9
D’Amore C, Salvi M. Editorial of Special Issue "Protein Post-Translational Modifications in Signal Transduction and Diseases". Int J Mol Sci 2021; 22:ijms22052232. [PMID: 33668127 PMCID: PMC7956322 DOI: 10.3390/ijms22052232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2021] [Revised: 02/20/2021] [Accepted: 02/22/2021] [Indexed: 11/16/2022]
Abstract
The making of a protein is based on the combination of 20 different monomers (22 considering selenocysteine and pyrrolysine, the latter present only in some archaea and bacteria), giving the possibility of building a variety of structures from the simplest to the most complex, rigid or highly dynamic, and suited to carry out a wide range of structural and functional roles [...].