1
Esmaili F, Pourmirzaei M, Ramazi S, Shojaeilangari S, Yavari E. A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction. Genomics, Proteomics & Bioinformatics 2023; 21:1266-1285. [PMID: 37863385 PMCID: PMC11082408 DOI: 10.1016/j.gpb.2023.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/16/2022] [Revised: 01/16/2023] [Accepted: 03/23/2023] [Indexed: 10/22/2023]
Abstract
Post-translational modifications (PTMs) have key roles in extending the functional diversity of proteins and, as a result, regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation modification is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize this body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. At first, we comprehensively review all related databases and introduce all steps regarding dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which are divided into two computational groups: algorithmic and machine learning (ML). Additionally, it is shown that there are basically two main approaches for p-site prediction by ML: conventional and end-to-end deep learning methods, both of which are given an overview. Moreover, this review introduces the most important feature extraction techniques, which have mostly been used in p-site prediction. Finally, we create three test sets from new proteins related to the released version of the database of protein post-translational modifications (dbPTM) in 2022 based on general and human species. Evaluating online p-site prediction tools on newly added proteins introduced in the dbPTM 2022 release, distinct from those in the dbPTM 2019 release, reveals their limitations. In other words, the actual performance of these online p-site prediction tools on unseen proteins is notably lower than the results reported in their respective research papers.
Affiliation(s)
- Farzaneh Esmaili
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Mahdi Pourmirzaei
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Shahin Ramazi
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran 14115-111, Iran
- Seyedehsamaneh Shojaeilangari
- Biomedical Engineering Group, Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology (IROST), Tehran 33535-111, Iran
- Elham Yavari
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
2
Manwar Hussain MR, Iqbal Z, Qazi WM, Hoessli DC. Charge and Polarity Preferences for N-Glycosylation: A Genome-Wide In Silico Study and Its Implications Regarding Constitutive Proliferation and Adhesion of Carcinoma Cells. Front Oncol 2018. [PMID: 29541627 PMCID: PMC5835500 DOI: 10.3389/fonc.2018.00029] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022]
Abstract
The structural and functional diversity of the human proteome is mediated by N- and O-linked glycosylations that define the individual properties of extracellular and membrane-associated proteins. In this study, we utilized different computational tools to perform in silico based genome-wide mapping of 1,117 human proteins and unravel the contribution of both penultimate and vicinal amino acids for the asparagine-based, site-specific N-glycosylation. Our results correlate the non-canonical involvement of charge and polarity environment of classified amino acids (designated as L, O, A, P, and N groups) in the N-glycosylation process, as validated by NetNGlyc predictions, and 130 literature-reported human proteins. From our results, particular charge and polarity combinations of non-polar aliphatic, acidic, basic, and aromatic polar side chain environment of both penultimate and vicinal amino acids were found to promote the N-glycosylation process. However, the alteration in side-chain charge and polarity environment of genetic variants, particularly in the vicinity of Asn-containing epitope, may induce constitutive glycosylation (e.g., aberrant glycosylation at preferred and non-preferred sites) of membrane proteins causing constitutive proliferation and triggering epithelial-to-mesenchymal transition. The current genome-wide mapping of 1,117 proteins (2,909 asparagine residues) was used to explore charge- and polarity-based mechanistic constraints in N-glycosylation, and discuss alterations of the neoplastic phenotype that can be ascribed to N-glycosylation at preferred and non-preferred sites.
Affiliation(s)
- Muhammad Ramzan Manwar Hussain
- Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics (CAS), Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Zeeshan Iqbal
- Institute of Molecular Sciences & Bioinformatics, Lahore, Pakistan; Department of Physics, GC University Lahore, Lahore, Pakistan
- Wajahat M Qazi
- Center for Intelligent Machines and Robotics, Department of Computer Science, COMSATS Institute of Information Technology, Lahore, Pakistan
- Daniel C Hoessli
- Institute of Molecular Sciences & Bioinformatics, Lahore, Pakistan; Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan
3
Iqbal Z, Hoessli DC, Qazi WM, Ahmad M, Shakoori AR, Nasir-ud-Din. Exploring the sequence context of phosphorylatable amino acids: the contribution of the upgraded MAPRes tool. J Cell Biochem 2014; 116:370-9. [PMID: 25258092 DOI: 10.1002/jcb.24983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2014] [Accepted: 09/19/2014] [Indexed: 11/10/2022]
Abstract
Several models that predict where post-translational modifications are likely to occur and formulate the corresponding association rules are available to analyze the functional potential of a protein sequence, but an algorithm incorporating the functional groups of the involved amino acids in the sequence analyses process is not yet available. In its previous version, MAPRes was utilized to investigate the influence of the surrounding amino acids of post-translationally and co-translationally modifiable sites. The MAPRes has been upgraded to take into account the different biophysical and biochemical properties of the amino acids that have the potential to influence different post-translational modifications (PTMs). In the present study, the upgraded version of MAPRes was implemented on phosphorylated Ser/Thr/Tyr data by considering the polarity and charge of the surrounding amino acids. The patterns mined by MAPRes incorporating structural information on polarity and charge of amino acids suggest distinct structure-function relationships for phosphorylated serines in a multifunctional protein such as the insulin-receptor substrate-1 (IRS-1) protein. The new version of MAPRes is freely available at http://www.imsb.edu.pk/Database.htm.
Affiliation(s)
- Zeeshan Iqbal
- Institute of Molecular Sciences and Bioinformatics, Lahore, Pakistan
4
Liu R, France B, George S, Rallo R, Zhang H, Xia T, Nel AE, Bradley K, Cohen Y. Association rule mining of cellular responses induced by metal and metal oxide nanoparticles. Analyst 2014; 139:943-53. [PMID: 24260774 DOI: 10.1039/c3an01409f] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 12/12/2022]
Abstract
Relationships among fourteen different biological responses (including ten signaling pathway activities and four cytotoxicity effects) of murine macrophage (RAW264.7) and bronchial epithelial (BEAS-2B) cells exposed to six metal and metal oxide nanoparticles (NPs) were analyzed using both statistical and data mining approaches. Both the pathway activities and cytotoxicity effects were assessed using high-throughput screening (HTS) over an exposure period of up to 24 h and concentration range of 0.39-200 mg L⁻¹. HTS data were processed by outlier removal, normalization, and hit-identification (for significantly regulated cellular responses) to arrive at reliable multiparametric bioactivity profiles for the NPs. Association rule mining was then applied to the bioactivity profiles followed by a pruning process to remove redundant rules. The non-redundant association rules indicated that "significant regulation" of one or more cellular responses implies regulation of other (associated) cellular response types. Pairwise correlation analysis (via Pearson's χ² test) and self-organizing map clustering of the different cellular response types indicated consistency with the identified non-redundant association rules. Furthermore, in order to explore the potential use of association rules as a tool for data-driven hypothesis generation, specific pathway activity experiments were carried out for ZnO NPs. The experimental results confirmed the association rule identified for the p53 pathway and mitochondrial superoxide levels (via MitoSox reagent) and further revealed that blocking of the transcriptional activity of p53 lowered the MitoSox signal. The present approach of using association rule mining for data-driven hypothesis generation has important implications for streamlining multi-parameter HTS assays, improving the understanding of NP toxicity mechanisms, and selection of endpoints for the development of nanomaterial structure-activity relationships.
Affiliation(s)
- Rong Liu
- Institute of the Environment and Sustainability, University of California, Los Angeles, CA 90095, USA
5
Ijaz A. SUMOhunt: Combining Spatial Staging between Lysine and SUMO with Random Forests to Predict SUMOylation. ISRN Bioinformatics 2013; 2013:671269. [PMID: 25937950 PMCID: PMC4393069 DOI: 10.1155/2013/671269] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 04/24/2013] [Accepted: 05/28/2013] [Indexed: 11/20/2022]
Abstract
Modification with SUMO protein has many key roles in eukaryotic systems which renders the identification of its target proteins and sites of considerable importance. Information regarding the SUMOylation of a protein may tell us about its subcellular localization, function, and spatial orientation. This modification occurs at particular and not all lysine residues in a given protein. In competition with biochemical means of modified-site recognition, computational methods are strong contenders in the prediction of SUMOylation-undergoing sites on proteins. In this research, physicochemical properties of amino acids retrieved from AAIndex, especially those involved in docking of modifier and target proteins and optimal presentation of target lysine, in combination with sequence information and random forest-based classifier presented in WEKA have been used to develop a prediction model, SUMOhunt, with statistics significantly better than all previous predictors. In this model 97.56% accuracy, 100% sensitivity, 94% specificity, and 0.95 MCC have been achieved which shows that proposed amino acid properties have a significant role in SUMO attachment. SUMOhunt will hence bring great reliability and efficiency in SUMOylation prediction.
Affiliation(s)
- Amna Ijaz
- National Institute of Biotechnology and Genetic Engineering, P.O. Box 577, Jhang Road, Faisalabad, Pakistan
6
Iqbal Z, Hoessli DC, Kaleem A, Munir J, Saleem M, Afzal I, Shakoori AR, Nasir-Ud-Din. Influence of the sequence environment and properties of neighboring amino acids on amino-acetylation: relevance for structure-function analysis. J Cell Biochem 2012; 114:874-87. [PMID: 23097243 DOI: 10.1002/jcb.24426] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/17/2012] [Accepted: 10/15/2012] [Indexed: 12/13/2022]
Abstract
Protein function is regulated by co-translational modifications and post-translational modifications (PTMs) such as phosphorylation, glycosylation, and acetylation, which induce proteins to perform multiple tasks in a specified environment. Acetylation takes place post-translationally on the ε-amino group of Lys in histone proteins, allowing regulation of gene expression. Furthermore, amino group acetylation also occurs co-translationally on Ser, Thr, Gly, Met, and Ala, possibly contributing to the stability of proteins. In this work, the influence of amino acids next to acetylated sites has been investigated by using MAPRes (Mining Association Patterns among preferred amino acid residues in the vicinity of amino acids targeted for PTMs). MAPRes was utilized to examine the sequence patterns vicinal to modified and non-modified residues, taking into account their charge and polarity. The PTMs data were further sub-divided according to their sub-cellular location (nuclear, mitochondrial, and cytoplasmic), and their association patterns were mined. The association patterns mined by MAPRes for acetylated and non-acetylated residues are consistent with the existing literature but also revealed novel patterns. These rules have been utilized to describe the acetylation and its effects on the protein structure-function relationship.
Affiliation(s)
- Zeeshan Iqbal
- Institute of Molecular Sciences and Bioinformatics, Lahore, Pakistan
7
Tai YM, Chiu HW. Comorbidity study of ADHD: applying association rule mining (ARM) to National Health Insurance Database of Taiwan. Int J Med Inform 2009; 78:e75-83. [PMID: 19853501 DOI: 10.1016/j.ijmedinf.2009.09.005] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 04/13/2009] [Revised: 09/17/2009] [Accepted: 09/17/2009] [Indexed: 11/24/2022]
Abstract
OBJECTIVE: This paper intends to apply association rule mining (ARM) to explore the labyrinthian network of ADHD comorbidity, and to examine the practicality of ARM in comorbidity studies using clinic databases.
METHODS: From clinic records of enrollees of Taiwan National Health Insurance (NHI), 18,321 youngsters aged 18 or less with diagnosis of ADHD in 2001 were recruited as case group in this study. And all their clinic diagnoses made from 2000 to 2002, as comorbidity, were categorized according to "The International Classification of Disease, 9th Revision, Clinical Modification" (ICD-9-CM) diagnosis system. For comparison, fourfold non-ADHD controls were recruited from 2001s NHI enrollees on a random base but matched gender and age of cases. ARM was done with Apriori algorithm to examine the strengths of associations among those diagnoses. The support and confidence values of ARM results were examined. Comorbidity rates and relative risk (RR) ratios of both groups of each diagnosis were compared one another.
RESULTS: ADHD case group has apparently higher risk of comorbidity with psychiatric comorbidity than with other physical illnesses. From results of ARM, developmental delay (DD) appears as an important node between ADHD and anxiety disorder (support: 5.12%, confidence: 97.42%), mild mental retardation (support: 4.42%, confidence: 92.09%) and autism (support: 6.49%, confidence: 94.93%).
CONCLUSIONS: The finding of this study, an important role of DD between ADHD and other psychiatric comorbidity, supports neurological findings in developmental delay of ADHD children's front cortex, as well as some epidemiology findings. This study also demonstrated the practicality of ARM in comorbidity studies using enormous clinic databases like NHIRD.
Affiliation(s)
- Yueh-Ming Tai
- Department of Children and Adolescent Psychiatry, Beitou Armed Forces Hospital, Taiwan
8
Ahmad I, Mehmood A, Khurshid A, Qazi WM, Hoessli DC, Walker-Nasir E, Shakoori AR, Nasir-ud-Din. Phosphoproteome sequence analysis and significance: Mining association patterns around phosphorylation sites utilizing MAPRes. J Cell Biochem 2009; 108:64-74. [DOI: 10.1002/jcb.22220] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]