1
Cao J, Xu Y. Predicting cysteine reactivity changes upon phosphorylation using XGBoost. FEBS Open Bio 2024; 14:51-62. [PMID: 37964470 PMCID: PMC10761938 DOI: 10.1002/2211-5463.13737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/11/2023] [Accepted: 10/27/2023] [Indexed: 11/16/2023] Open
Abstract
Cysteine reactivity serves as a significant indicator of protein function and can be affected by phosphorylation events. Experimental approaches have been developed to investigate this effect, but the scale is still relatively limited. Machine-learning approaches promise to accelerate the investigation of these phenomena. In this study, protein sequence information, distances to the closest phosphorylation sites, and the membership score of the intrinsically disordered region were used to represent the cysteine. Following the feature selection using an elastic net model, two groups of binary classifiers based on XGBoost were built to predict the occurrence and the direction of the reactivity change as a response to phosphorylation events, respectively. In addition, function enrichment analysis was performed on proteins/genes predicted to have reactivity changes. XGBoost performed the best in the independent test with AUC of 0.8192 and 0.9203 for the prediction of the change's occurrence and direction, respectively. The use of two binary classifiers successively resulted in an accuracy of 0.7568 in predicting whether reactivity would be unchanged, increased, or decreased. The enrichment analysis revealed the association of proteins carrying reactivity-changed cysteine residues with various disease-related pathways, particularly cancer, autosomal dominant diseases, and viral infections. Changes in cysteine reactivity influenced by phosphorylation are site-specific and can be predicted by XGBoost algorithms. Our model provides an efficient alternative way to explore the cysteine reactivity upon phosphorylation at the proteome-wide level, facilitating the investigation of protein functions and their clinical insights. Our code is available on GitHub (https://github.com/DarinaOsamu/predictors-of-cysteine-reactivity-changes).
Affiliation(s)
- Jing Cao
- Department of Statistics, University of Science and Technology Beijing, China
- Yan Xu
- Department of Statistics, University of Science and Technology Beijing, China
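The successive use of two binary classifiers described in the abstract of this entry can be sketched as below. This is a minimal illustration, not the authors' code: the two scoring functions are hypothetical stand-ins for trained XGBoost models, and the function names, the toy features, and the 0.5 thresholds are all assumptions.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Scorer = Callable[[Features], float]  # returns a probability in [0, 1]


def predict_three_class(
    x: Features,
    p_changed: Scorer,
    p_increased: Scorer,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Cascade two binary decisions into unchanged / increased / decreased."""
    # Stage 1: does the reactivity change at all?
    if p_changed(x) < threshold:
        return "unchanged"
    # Stage 2: reached only for predicted changes; decide the direction.
    return "increased" if p_increased(x) >= threshold else "decreased"


# Hypothetical stand-ins for the two trained models.
def toy_occurrence(x: Features) -> float:
    return 0.9 if x[0] < 10.0 else 0.1


def toy_direction(x: Features) -> float:
    return 0.8 if x[1] > 0.5 else 0.2


labels = [
    predict_three_class(x, toy_occurrence, toy_direction)
    for x in ([25.0, 0.9], [3.0, 0.9], [3.0, 0.1])
]
print(labels)
```

Because the second model is consulted only after the first predicts a change, errors of the first stage carry into the three-class result, which is one plausible reason a three-class accuracy can sit below either binary AUC.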
2
Li X, Gluth A, Zhang T, Qian WJ. Thiol redox proteomics: Characterization of thiol-based post-translational modifications. Proteomics 2023; 23:e2200194. [PMID: 37248656 PMCID: PMC10764013 DOI: 10.1002/pmic.202200194] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/11/2022] [Revised: 05/12/2023] [Accepted: 05/16/2023] [Indexed: 05/31/2023]
Abstract
Redox post-translational modifications on cysteine thiols (redox PTMs) have profound effects on protein structure and function, thus enabling regulation of various biological processes. Redox proteomics approaches aim to characterize the landscape of redox PTMs at the systems level. These approaches facilitate studies of condition-specific, dynamic processes implicating redox PTMs and have furthered our understanding of redox signaling and regulation. Mass spectrometry (MS) is a powerful tool for such analyses which has been demonstrated by significant advances in redox proteomics during the last decade. A group of well-established approaches involves the initial blocking of free thiols followed by selective reduction of oxidized PTMs and subsequent enrichment for downstream detection. Alternatively, novel chemoselective probe-based approaches have been developed for various redox PTMs. Direct detection of redox PTMs without any enrichment has also been demonstrated given the sensitivity of contemporary MS instruments. This review discusses the general principles behind different analytical strategies and covers recent advances in redox proteomics. Several applications of redox proteomics are also highlighted to illustrate how large-scale redox proteomics data can lead to novel biological insights.
Affiliation(s)
- Xiaolu Li
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
- Austin Gluth
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
- Department of Biological Systems Engineering, Washington State University, Richland, WA 99354
- Tong Zhang
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
- Wei-Jun Qian
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
3
Hu Y, Wang Z, Shen C, Jiang C, Zhu Z, Liang P, Li H, Zeng Q, Xue Y, Wu Y, Wang Y, Liu L, Zhu H, Yi Y, Liu Q. Influence of the pKa value on the antioxidant activity of licorice flavonoids under solvent-mediated effects. Arch Pharm (Weinheim) 2023; 356:e2200470. [PMID: 36707412 DOI: 10.1002/ardp.202200470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 11/25/2022] [Accepted: 11/28/2022] [Indexed: 01/29/2023]
Abstract
Licorice flavonoids (LCFs) have been widely used in food care and medical treatment due to their significant antioxidant activities. However, the molecular mechanism of their antioxidant activity remains unclear. Therefore, network pharmacology, ADMET, density functional theory (DFT), molecular docking, and molecular dynamics (MD) simulation were employed to explore the molecular mechanism of the antioxidant effects of LCF. The network pharmacology and ADMET studies showed that the active molecules of kumatakenin (pKa = 6.18), licoflavonol (pKa = 6.86), and topazolin (pKa = 6.21) in LCF are key antioxidant components and have good biosafety. Molecular docking and MD simulation studies demonstrated that active molecules interacted with amino acid residues in target proteins to form stable protein-ligand complexes and exert their antioxidant effects. DFT studies showed that the antioxidant activity of LCF could be significantly modulated under the solvent-mediated effect. In addition, based on the derivation of the Henderson-Hasselbalch and van't Hoff formulas, the functional relationships between the reaction-free energy (ΔG) of LCF and the pH and pKa values were established. The results showed that active molecules with larger pKa values will be more conducive to the improvement of their antioxidant activity under solvent-mediated effects. In conclusion, this study found that increasing the pKa value of LCF would be an effective strategy to improve their antioxidant activity under the effect of solvent mediation. The pKa value of an LCF will be a direct standard to evaluate its solvent-mediated antioxidant activity. This study will provide theoretical guidance for the development of natural antioxidants.
Affiliation(s)
- Yi Hu
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Zhuxian Wang
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Chunyan Shen
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- CuiPing Jiang
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Zhaoming Zhu
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Peiyi Liang
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Hui Li
- Department of Traditional Chinese Medicine, Guangzhou Red Cross Hospital, Jinan University, Guangzhou, China
- Quanfu Zeng
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Yaqi Xue
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Yufan Wu
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Yuan Wang
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Li Liu
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Hongxia Zhu
- Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Yankui Yi
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
- Qiang Liu
- School of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
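The Henderson-Hasselbalch relation named in the abstract of this entry links pH and pKa to the fraction of a compound that is deprotonated. The sketch below evaluates that fraction for the three pKa values quoted in the abstract. It does not reproduce the authors' ΔG relationship; the pH of 7.4 and the treatment of each flavonoid as a simple monoprotic acid are assumptions made only for illustration.

```python
def deprotonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a monoprotic acid present as A- at a given pH.

    Henderson-Hasselbalch: pH = pKa + log10([A-]/[HA]).
    """
    ratio = 10.0 ** (ph - pka)  # [A-]/[HA]
    return ratio / (1.0 + ratio)


# pKa values as quoted in the abstract.
PKA = {"kumatakenin": 6.18, "licoflavonol": 6.86, "topazolin": 6.21}
ASSUMED_PH = 7.4  # illustrative value, not taken from the paper

fractions = {
    name: deprotonated_fraction(ASSUMED_PH, pka) for name, pka in PKA.items()
}
for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
```

At a fixed pH, a larger pKa gives a smaller deprotonated fraction, so the three compounds are ordered by pKa in reverse.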
4
Mazo M, Khudobin R, Balabaev N, Belov N, Ryzhikh V, Nikiforov R, Chatterjee R, Banerjee S. Structure and free volume of fluorine-containing polyetherimides with pendant di-tert-butyl groups investigated by molecular dynamics simulation. POLYMER 2022. [DOI: 10.1016/j.polymer.2022.125318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
5
Ning Q, Li J. DLF-Sul: a multi-module deep learning framework for prediction of S-sulfinylation sites in proteins. Brief Bioinform 2022; 23:6658856. [PMID: 35945138 DOI: 10.1093/bib/bbac323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 07/16/2022] [Accepted: 07/18/2022] [Indexed: 11/14/2022] Open
Abstract
Protein S-sulfinylation is an important posttranslational modification that regulates a variety of cell and protein functions. This modification has been linked to signal transduction, redox homeostasis and neuronal transmission in studies. Therefore, identification of S-sulfinylation sites is crucial to understanding its structure and function, which is critical in cell biology and human diseases. In this study, we propose a multi-module deep learning framework named DLF-Sul for identification of S-sulfinylation sites in proteins. First, three types of features are extracted, including binary encoding, BLOSUM62 and amino acid index. Then, sequential features are further extracted based on these three types of features using a bidirectional long short-term memory network. Next, a multi-head self-attention mechanism is utilized to filter the effective attribute information, and a residual connection helps to reduce information loss. Furthermore, a convolutional neural network is employed to extract local deep feature information. Finally, fully connected layers act as a classifier that maps samples to the corresponding labels. Performance metrics on the independent test set, including sensitivity, specificity, accuracy, Matthews correlation coefficient and area under curve, reach 91.80%, 92.36%, 92.08%, 0.8416 and 96.40%, respectively. The results show that DLF-Sul is an effective tool for predicting S-sulfinylation sites. The source code is available on the website https://github.com/ningq669/DLF-Sul.
Affiliation(s)
- Qiao Ning
- Information Science and Technology College, Dalian Maritime University, Dalian 116026, China
- Jinmou Li
- Information Science and Technology College, Dalian Maritime University, Dalian 116026, China
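Of the three feature types listed in the abstract of this entry, binary encoding is the simplest to show. The sketch below one-hot encodes a sequence window centred on a cysteine. It is a generic illustration rather than the DLF-Sul implementation: the window size, the `X` padding symbol, the residue ordering, and the example sequence are all assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def window(seq: str, center: int, flank: int) -> str:
    """Return residues center-flank .. center+flank, padded with PAD."""
    return "".join(
        seq[i] if 0 <= i < len(seq) else PAD
        for i in range(center - flank, center + flank + 1)
    )


def binary_encode(peptide: str) -> list[int]:
    """Concatenate one 20-dimensional one-hot vector per residue.

    Padding (or any non-standard residue) becomes an all-zero vector.
    """
    vec: list[int] = []
    for res in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(res)
        if idx >= 0:
            one_hot[idx] = 1
        vec.extend(one_hot)
    return vec


seq = "MKTCYLAG"          # hypothetical sequence
cys = seq.index("C")      # position of the candidate cysteine
pep = window(seq, cys, flank=4)
encoded = binary_encode(pep)
print(pep, len(encoded), sum(encoded))
```

A window of 2 × flank + 1 residues yields a vector of 20 × (2 × flank + 1) entries, which a sequence model can then consume position by position.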
6
Mohammadi A, Zahiri J, Mohammadi S, Khodarahmi M, Arab SS. PSSMCOOL: A Comprehensive R Package for Generating Evolutionary-based Descriptors of Protein Sequences from PSSM Profiles. BIOLOGY METHODS AND PROTOCOLS 2022; 7:bpac008. [PMID: 35388370 PMCID: PMC8977839 DOI: 10.1093/biomethods/bpac008] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/03/2021] [Revised: 01/21/2022] [Indexed: 11/14/2022]
Abstract
Position-specific scoring matrix (PSSM), also called profile, is broadly used for representing the evolutionary history of a given protein sequence. Several investigations reported that the PSSM-based feature descriptors can improve the prediction of various protein attributes such as interaction, function, subcellular localization, secondary structure, disorder regions, and accessible surface area. While plenty of algorithms have been suggested for extracting evolutionary features from PSSM in recent years, there is not any integrated standalone tool for providing these descriptors. Here, we introduce PSSMCOOL, a flexible comprehensive R package that generates 38 PSSM-based feature vectors. To our best knowledge, PSSMCOOL is the first PSSM-based feature extraction tool implemented in R. With the growing demand for exploiting machine-learning algorithms in computational biology, this package would be a practical tool for machine-learning predictions.
Affiliation(s)
- Alireza Mohammadi
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Javad Zahiri
- Department of Neuroscience, University of California San Diego, California, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Saber Mohammadi
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Mohsen Khodarahmi
- Department of Radiology, Shahid Madani Hospital, Karaj, Iran
- Bahar Medical Imaging Center, Karaj, Iran
- Dr. Khodarahmi Medical Imaging Center, Karaj, Iran
- Seyed Shahriar Arab
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
7
Stoichiometric Thiol Redox Proteomics for Quantifying Cellular Responses to Perturbations. Antioxidants (Basel) 2021; 10:antiox10030499. [PMID: 33807006 PMCID: PMC8004825 DOI: 10.3390/antiox10030499] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/26/2021] [Revised: 03/16/2021] [Accepted: 03/17/2021] [Indexed: 12/14/2022] Open
Abstract
Post-translational modifications regulate the structure and function of proteins that can result in changes to the activity of different pathways. These include modifications altering the redox state of thiol groups on protein cysteine residues, which are sensitive to oxidative environments. While mass spectrometry has advanced the identification of protein thiol modifications and expanded our knowledge of redox-sensitive pathways, the quantitative aspect of this technique is critical for the field of redox proteomics. In this review, we describe how mass spectrometry-based redox proteomics has enabled researchers to accurately quantify the stoichiometry of reversible oxidative modifications on specific cysteine residues of proteins. We will describe advancements in the methodology that allow for the absolute quantitation of thiol modifications, as well as recent reports that have implemented this approach. We will also highlight the significance and application of such measurements and why they are informative for the field of redox biology.
8
Guillaubez JV, Pitrat D, Bretonnière Y, Lemoine J, Girod M. Unbiased Detection of Cysteine Sulfenic Acid by 473 nm Photodissociation Mass Spectrometry: Toward Facile In Vivo Oxidative Status of Plasma Proteins. Anal Chem 2021; 93:2907-2915. [PMID: 33522244 DOI: 10.1021/acs.analchem.0c04484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Cysteine (Cys) is prone to diverse post-translational modifications in proteins, including oxidation into sulfenic acid (Cys-SOH) by reactive oxygen species generated under oxidative stress. Detection of low-concentration and metastable Cys-SOH within complex biological matrices is challenging due to the dynamic concentration range of proteins in the samples. Herein, visible laser-induced dissociation (LID) implemented in a mass spectrometer was used for streamlining the detection of Cys oxidized proteins owing to proper derivatization of Cys-SOH with a chromophore tag functionalized with a cyclohexanedione group. Once grafted, peptides undergo a high fragmentation yield under LID, leading concomitantly to informative backbone ions and to a chromophore reporter ion. Seventy-nine percent of the Cys-containing tryptic peptides derived from human serum albumin and serotransferrin tracked by parallel reaction monitoring (PRM) were detected as targets subjected to oxidation. These candidates as well as Cys-containing peptides predicted by in silico trypsin digestion of five other human plasma proteins were then tracked in real plasma samples to pinpoint the endogenous Cys-SOH subpopulation. Most of the targeted peptides were detected in all plasma samples by LID-PRM, with significant differences in their relative amounts. By eliminating the signal of interfering co-eluted compounds, LID-PRM surpasses conventional HCD (higher-energy collisional dissociation)-PRM in detecting grafted Cys-SOH-containing peptides and allows now to foresee clinical applications in large human cohorts.
Affiliation(s)
- Jean-Valery Guillaubez
- Univ Lyon, CNRS, Université Claude Bernard Lyon 1, Institut des Sciences Analytiques, UMR 5280, 5 rue de la Doua, Villeurbanne F-69100, France
- Delphine Pitrat
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Lyon I, Laboratoire de Chimie, F-69342 Lyon, France
- Yann Bretonnière
- Univ Lyon, ENS de Lyon, CNRS UMR 5182, Université Lyon I, Laboratoire de Chimie, F-69342 Lyon, France
- Jérôme Lemoine
- Univ Lyon, CNRS, Université Claude Bernard Lyon 1, Institut des Sciences Analytiques, UMR 5280, 5 rue de la Doua, Villeurbanne F-69100, France
- Marion Girod
- Univ Lyon, CNRS, Université Claude Bernard Lyon 1, Institut des Sciences Analytiques, UMR 5280, 5 rue de la Doua, Villeurbanne F-69100, France
9
RAM-PGK: Prediction of Lysine Phosphoglycerylation Based on Residue Adjacency Matrix. Genes (Basel) 2020; 11:genes11121524. [PMID: 33419274 PMCID: PMC7766696 DOI: 10.3390/genes11121524] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/30/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 11/29/2022] Open
Abstract
Background: Post-translational modification (PTM) is a biological process that is associated with the modification of proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, out of which, lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have been shown to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system of phosphoglycerylated lysine residues. The development of a predictor for phosphoglycerylation prediction is not a first, but it is necessary as the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues. Results: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset was employed for this purpose, which contained experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted them into feature vectors, which are used to build the phosphoglycerylation predictor. Conclusion: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in terms of performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Matthews correlation coefficient.
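The five metrics reported in the abstract above all derive from one confusion matrix, and the sketch below shows the standard formulas. The counts are hypothetical, chosen only to show how a small positive class can combine moderate sensitivity and specificity with very low precision; they are not the RAM-PGK results and are not meant to reproduce them.

```python
import math


def safe_div(num: float, den: float) -> float:
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "precision": safe_div(tp, tp + fp),
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts: 30 positive sites among 1030 candidates.
m = binary_metrics(tp=18, fp=350, tn=650, fn=12)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With so few positives, even a modest false-positive rate produces far more false positives than true positives, which pulls precision and the Matthews correlation coefficient down while accuracy stays close to specificity.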
10
Le NQK, Yapp EKY, Nagasundaram N, Chua MCH, Yeh HY. Computational identification of vesicular transport proteins from sequences using deep gated recurrent units architecture. Comput Struct Biotechnol J 2019; 17:1245-1254. [PMID: 31921391 PMCID: PMC6944713 DOI: 10.1016/j.csbj.2019.09.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 05/06/2019] [Revised: 09/07/2019] [Accepted: 09/11/2019] [Indexed: 11/20/2022] Open
Abstract
Protein function prediction is one of the most well-studied topics, attracting attention from countless researchers in the field of computational biology. Implementing deep neural networks that help improve the prediction of protein function, however, is still a major challenge. In this research, we suggested a new strategy that includes gated recurrent units and position-specific scoring matrix profiles to predict vesicular transportation proteins, a biological function of great importance. Although this function is difficult to identify, our model is able to achieve accuracies of 82.3% and 85.8% in the cross-validation and independent dataset, respectively. We also solve the problem of imbalance in the dataset via tuning class weight in the deep learning model. The results generated showed sensitivity, specificity, MCC, and AUC to have values of 79.2%, 82.9%, 0.52, and 0.861, respectively. Our strategy shows superiority in results on the same dataset against all other state-of-the-art algorithms. This work offers a technique for the discovery of more proteins, particularly proteins connected with vesicular transport. In addition, our accomplishment could encourage the use of gated recurrent units architecture in protein function prediction.
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
- Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Edward Kien Yee Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, #08-04, Innovis, 138634, Singapore
- N. Nagasundaram
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
- Matthew Chin Heng Chua
- Institute of Systems Science, 25 Heng Mui Keng Terrace, National University of Singapore, 119615, Singapore
- Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
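The abstract of this entry mentions handling class imbalance by tuning class weights. One common starting point, sketched below, is to weight each class inversely to its frequency so that both classes contribute equally to the loss. This heuristic and the 20/80 label split are assumptions for illustration; the abstract does not state which weights the authors settled on.

```python
from collections import Counter
from typing import Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical imbalanced label set: 20 positives, 80 negatives.
labels = [1] * 20 + [0] * 80
weights = balanced_class_weights(labels)

# Total weight carried by each class after reweighting.
per_class_total = {cls: weights[cls] * labels.count(cls) for cls in weights}
print(weights, per_class_total)
```

Such weights are typically passed to the training loss as a first guess and then adjusted against validation sensitivity and specificity.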
11
Structure and Properties of High and Low Free Volume Polymers Studied by Molecular Dynamics Simulation. COMPUTATION 2019. [DOI: 10.3390/computation7020027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
Using molecular dynamics, a comparative study was performed of two pairs of glassy polymers, low permeability polyetherimides (PEIs) and highly permeable Si-containing polytricyclononenes. All calculations were made with 32 independent models for each polymer. In both cases, the accessible free volume (AFV) increases with decreasing probe size. However, for a zero-size probe, the curves for both types of polymers cross the ordinate in the vicinity of 40%. The size distributions of free volume in PEIs and in the highly permeable polymers differ significantly. In the former case, they are represented by relatively narrow peaks, with the maxima in the range of 0.5–1.0 Å for all the probes from H2 to Xe. In the case of highly permeable Si-containing polymers, much broader peaks are observed to extend up to 7–8 Å for all the gaseous probes. The obtained size distributions of free volume and accessible volume explain the differences in the selectivity of the studied polymers. The surface area of AFV is found for PEIs using Delaunay tessellation. Its analysis and the chemical nature of the groups that form the surface of free volume elements are presented and discussed.