1
Pratyush P, Kc DB. Advances in Prediction of Posttranslational Modification Sites Known to Localize in Protein Supersecondary Structures. Methods Mol Biol 2025; 2870:117-151. [PMID: 39543034 DOI: 10.1007/978-1-0716-4213-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
Posttranslational modifications (PTMs) play a crucial role in modulating the structure, function, localization, and interactions of proteins, with many PTMs being localized within supersecondary structures, such as helical pairs. These modifications can significantly influence the conformation and stability of these structures. For instance, phosphorylation introduces negative charges that alter electrostatic interactions, while acetylation or methylation of lysine residues affects the stability and interactions of alpha helices or beta strands. Given the pivotal role of supersecondary structures in the overall protein architecture, their modulation by PTMs is essential for protein functionality. This chapter explores the latest advancements in predicting sites for the five PTMs (phosphorylation, acetylation, glycosylation, methylation, and ubiquitination) known to be localized within supersecondary structures. The chapter highlights the recent advances in the prediction of these PTM sites, including the use of global contextualized embeddings from protein language models, integration of structural information, utilization of reliable positive and negative sites, and application of contrastive learning. These methodologies and emerging trends offer a roadmap for novel innovations in addressing PTM prediction challenges, particularly those linked to supersecondary structures.
Affiliation(s)
- Pawel Pratyush
  - Computer Science Department, Michigan Technological University, Houghton, MI, USA
  - Computer Science Department, Rochester Institute of Technology, Henrietta, NY, USA
- Dukka B Kc
  - Computer Science Department, Michigan Technological University, Houghton, MI, USA
  - Computer Science Department, Rochester Institute of Technology, Henrietta, NY, USA
2
Tu G, Wang X, Xia R, Song B. m6A-TCPred: a web server to predict tissue-conserved human m6A sites using machine learning approach. BMC Bioinformatics 2024; 25:127. [PMID: 38528499 PMCID: PMC10962094 DOI: 10.1186/s12859-024-05738-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 03/11/2024] [Indexed: 03/27/2024]
Abstract
BACKGROUND: N6-methyladenosine (m6A) is the most prevalent post-transcriptional modification in eukaryotic cells. It plays a crucial role in regulating various biological processes, and dysregulation of m6A status is involved in multiple human diseases, including cancer. A number of prediction frameworks have been proposed for high-accuracy identification of putative m6A sites; however, none has targeted the direct discrimination of tissue-conserved m6A-modified residues from non-conserved ones at base resolution. RESULTS: We report here m6A-TCPred, a computational tool for predicting tissue-conserved m6A residues using m6A profiling data from 23 human tissues. By taking advantage of traditional sequence-based characteristics and additional genome-derived information, m6A-TCPred successfully captured distinct patterns between potentially tissue-conserved m6A modifications and non-conserved ones, with an average AUROC of 0.871 and 0.879 on cross-validation and independent datasets, respectively. CONCLUSION: Our results have been integrated into an online platform: a database holding 268,115 high-confidence m6A sites with their conservation information across 23 human tissues, and a web server to predict the conservation status of user-provided m6A collections. The web interface of m6A-TCPred is freely accessible at www.rnamd.org/m6ATCPred.
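The AUROC values quoted in this abstract can be read as the probability that a randomly chosen conserved site receives a higher score than a randomly chosen non-conserved site. The following is a minimal sketch of that pairwise formulation, not the m6A-TCPred evaluation code; the function name `auroc` and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = tissue-conserved site, 0 = non-conserved site.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auroc(labels, scores), 3))
```

In this toy case 8 of the 9 positive/negative pairs are ordered correctly. The pairwise loop is quadratic in the number of sites, so a rank-based implementation would be preferable for data sets of the size described in the abstract.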
Affiliation(s)
- Gang Tu
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
- Xuan Wang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L7 8TX, UK
- Rong Xia
  - Department of Financial and Actuarial Mathematics, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
- Bowen Song
  - Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing, 210023, China
3
Mejia‐Rodriguez D, Kim H, Sadler N, Li X, Bohutskyi P, Valiev M, Qian W, Cheung MS. PTM-Psi: A python package to facilitate the computational investigation of post-translational modification on protein structures and their impacts on dynamics and functions. Protein Sci 2023; 32:e4822. [PMID: 37902126 PMCID: PMC10659954 DOI: 10.1002/pro.4822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 10/31/2023]
Abstract
Post-translational modification (PTM) of a protein occurs after it has been synthesized from its genetic template, and involves chemical modifications of the protein's specific amino acid residues. Despite the central role played by PTM in regulating molecular interactions, particularly those driven by reversible redox reactions, it remains challenging to interpret PTMs in terms of protein dynamics and function because there is a combinatorially enormous number of ways to modify amino acids in response to changes in the protein environment. In this study, we provide a workflow that allows users to interpret how perturbations caused by PTMs affect a protein's properties, dynamics, and interactions with its binding partners based on inferred or experimentally determined protein structure. This Python-based workflow, called PTM-Psi, integrates several established open-source software packages, thereby enabling the user to infer protein structure from sequence, develop force fields for non-standard amino acids using quantum mechanics, calculate free energy perturbations through molecular dynamics simulations, and score the bound complexes via docking algorithms. Using the S-nitrosylation of several cysteines on the GAP2 protein as an example, we demonstrate the utility of PTM-Psi for interpreting sequence-structure-function relationships derived from thiol redox proteomics data. We show that the solvent-exposed S-nitrosylated cysteine indirectly affects the catalytic reaction of another, buried cysteine over a distance in the GAP2 protein through the movement of the two ligands. Our workflow tracks the PTMs on residues that are responsive to changes in the redox environment and lays the foundation for the automation of molecular and systems biology modeling.
Affiliation(s)
- Daniel Mejia‐Rodriguez
  - Physical Sciences Division, Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
- Hoshin Kim
  - Physical Sciences Division, Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
- Natalie Sadler
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
- Xiaolu Li
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
- Pavlo Bohutskyi
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
  - Biological Systems Engineering, Washington State University, Richland, Washington, USA
- Marat Valiev
  - Physical Sciences Division, Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
- Wei‐Jun Qian
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
- Margaret S. Cheung
  - Physical Sciences Division, Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
  - Environmental Molecular Sciences Laboratory, Richland, Washington, USA
  - University of Washington, Seattle, Washington, USA
4
Chen W, Ding Z, Zang Y, Liu X. Characterization of Proteoform Post-Translational Modifications by Top-Down and Bottom-Up Mass Spectrometry in Conjunction with Annotations. J Proteome Res 2023; 22:3178-3189. [PMID: 37728997 PMCID: PMC10563160 DOI: 10.1021/acs.jproteome.3c00207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Indexed: 09/22/2023]
Abstract
Many proteoforms can be produced from a gene due to genetic mutations, alternative splicing, post-translational modifications (PTMs), and other variations. PTMs in proteoforms play critical roles in cell signaling, protein degradation, and other biological processes. Mass spectrometry (MS) is the primary technique for investigating PTMs in proteoforms, and two alternative MS approaches, top-down and bottom-up, have complementary strengths. The combination of the two approaches has the potential to increase the sensitivity and accuracy in PTM identification and characterization. In addition, protein and PTM knowledge bases, such as UniProt, provide valuable information for PTM characterization and verification. Here, we present a software pipeline PTM-TBA (PTM characterization by Top-down and Bottom-up MS and Annotations) for identifying and localizing PTMs in proteoforms by integrating top-down and bottom-up MS as well as PTM annotations. We assessed PTM-TBA using a technical triplicate of bottom-up and top-down MS data of SW480 cells. On average, database search of the top-down MS data identified 2000 mass shifts, 814.5 (40.7%) of which were matched to 11 common PTMs and 423 of which were localized. Of the mass shifts identified by top-down MS, PTM-TBA verified 435 mass shifts using the bottom-up MS data and UniProt annotations.
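The step of matching observed mass shifts to common PTMs can be illustrated with a small lookup. This is a sketch under stated assumptions, not the PTM-TBA implementation: the four PTMs listed, the 0.01 Da tolerance, the function name `match_mass_shift`, and the observed shifts are all chosen for illustration, and the abstract does not say which 11 PTMs or which tolerance the pipeline uses.

```python
# Monoisotopic mass shifts in daltons for a few common PTMs (illustrative subset).
COMMON_PTMS = {
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def match_mass_shift(shift_da, tolerance_da=0.01):
    """Return the names of PTMs whose mass lies within the tolerance of the shift."""
    return [
        name
        for name, mass in COMMON_PTMS.items()
        if abs(shift_da - mass) <= tolerance_da
    ]


# Hypothetical observed mass shifts from a top-down database search.
observed = [79.9661, 42.0109, 100.5]
matches = {shift: match_mass_shift(shift) for shift in observed}
matched_fraction = sum(1 for names in matches.values() if names) / len(observed)

for shift, names in matches.items():
    print(shift, names or "unmatched")
print(f"matched fraction: {matched_fraction:.1%}")
```

The matched fraction computed here is the same kind of figure as the 40.7% (814.5 of 2000 mass shifts) reported in the abstract. Matching by mass alone does not localize a modification to a residue, which is why the abstract reports localized shifts as a separate, smaller count.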
Affiliation(s)
- Wenrong Chen
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Zhengming Ding
  - Department of Computer Science, Tulane School of Science and Engineering, Tulane University, New Orleans, Louisiana 70118, United States
- Yong Zang
  - Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Xiaowen Liu
  - Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, Louisiana 70112, United States
  - Deming Department of Medicine, Tulane University, New Orleans, Louisiana 70112, United States
5
Afshinpour M, Smith LA, Chakravarty S. AQcalc: A web server that identifies weak molecular interactions in protein structures. Protein Sci 2023; 32:e4762. [PMID: 37596782 PMCID: PMC10503417 DOI: 10.1002/pro.4762] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/27/2023] [Revised: 07/25/2023] [Accepted: 08/15/2023] [Indexed: 08/20/2023]
Abstract
Weak molecular interactions play an important role in protein structure and function. Computational tools that identify weak molecular interactions are, therefore, valuable for the study of proteins. Here, we present AQcalc, a web server (https://aqcalcbiocomputing.com/) that can be used to identify anion-quadrupole (AQ) interactions, which are weak interactions involving aromatic residue (Trp, Tyr, and Phe) ring edges and anions (Asp, Glu, and phosphate ion) both within proteins and at their interfaces (protein-protein, protein-nucleic acids, and protein-lipid bilayer). AQcalc identifies AQ interactions as well as clusters involving AQ, cation-π, and salt bridges, among others. Utilizing AQcalc we analyzed weak interactions in protein models, even in the absence of experimental structures, to understand the contributions of weak interactions to deleterious structural changes, including those associated with oncogenic and germline disease variants. We identified several deleterious variants with disrupted AQ interactions (comparable in frequency to cation-π disruptions). Amyloid fibrils utilize AQ to bury anions at frequencies that far exceed those observed for globular proteins. AQ interactions were detected three and five times more frequently than the hydrogen-bonded AQ (HBAQ) in fibril structures and protein-lipid bilayer interfaces, respectively. By contrast, AQ and HBAQ interactions were detected with similar frequencies in globular proteins. Collectively, these findings suggest AQcalc will be effective in facilitating fine structural analysis. As other web utilities designed to identify protein residue interaction networks do not report AQ interactions, wide use of AQcalc will enrich our understanding of residue interaction networks and facilitate hypothesis testing by identifying and experimentally characterizing these comparably weak but important interactions.
Affiliation(s)
- Maral Afshinpour
  - Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Logan A. Smith
  - Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Suvobrata Chakravarty
  - Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
6
Chen W, Ding Z, Zang Y, Liu X. Characterization of proteoform post-translational modifications by top-down and bottom-up mass spectrometry in conjunction with UniProt annotations. bioRxiv 2023:2023.04.04.535618. [PMID: 37066296 PMCID: PMC10104052 DOI: 10.1101/2023.04.04.535618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Many proteoforms can be produced from a gene due to genetic mutations, alternative splicing, post-translational modifications (PTMs), and other variations. PTMs in proteoforms play critical roles in cell signaling, protein degradation, and other biological processes. Mass spectrometry (MS) is the primary technique for investigating PTMs in proteoforms, and two alternative MS approaches, top-down and bottom-up, have complementary strengths. The combination of the two approaches has the potential to increase the sensitivity and accuracy in PTM identification and characterization. In addition, protein and PTM knowledgebases, such as UniProt, provide valuable information for PTM characterization and validation. Here, we present a software pipeline called PTM-TBA (PTM characterization by Top-down, Bottom-up MS and Annotations) for identifying and localizing PTMs in proteoforms by integrating top-down and bottom-up MS as well as UniProt annotations. We identified 1,662 mass shifts from a top-down MS data set of SW480 cells, 545 (33%) of which were matched to 12 common PTMs, and 351 of which were localized. PTM-TBA validated 346 of the 1,662 mass shifts using UniProt annotations or a bottom-up MS data set of SW480 cells.
7
Jia J, Wu G, Li M, Qiu W. pSuc-EDBAM: Predicting lysine succinylation sites in proteins based on ensemble dense blocks and an attention module. BMC Bioinformatics 2022; 23:450. [PMID: 36316638 PMCID: PMC9620660 DOI: 10.1186/s12859-022-05001-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/16/2022] [Accepted: 10/25/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Lysine succinylation is a newly discovered protein post-translational modification. Predicting succinylation sites helps in investigating treatments for metabolic diseases. However, because biological experimental approaches are costly and inefficient, it is necessary to develop efficient computational approaches. RESULTS: In this paper, we propose a novel predictor based on ensemble dense blocks and an attention module, called pSuc-EDBAM, which adopts one-hot encoding to derive the feature maps of protein sequences and generates low-level feature maps through a 1-D CNN. The ensemble dense blocks are then used to capture feature information at different levels during feature learning. We also introduce an attention module to evaluate the importance of different features. The experimental results show that Acc reaches 74.25% and MCC reaches 0.2927 on the testing dataset, which suggests that pSuc-EDBAM outperforms the existing predictors. CONCLUSIONS: The results of ten-fold cross-validation on the training dataset and an independent test on the testing dataset show that pSuc-EDBAM outperforms the existing succinylation site predictors and can predict potential succinylation sites effectively. pSuc-EDBAM is feasible and yields credible predictions, which may also provide a valuable reference for other related research. For the convenience of experimental scientists, a user-friendly web server has been established (http://bioinfo.wugenqiang.top/pSuc-EDBAM/), through which the desired results can be easily obtained.
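The one-hot encoding step mentioned in this abstract can be sketched as follows. This is an illustration only, not the pSuc-EDBAM code: the flank size of 3, the "-" padding character, the helper names `window_around` and `one_hot`, and the example sequence are all assumptions, since the abstract does not state the window length or padding scheme.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_around(sequence, position, flank):
    """Extract 2*flank+1 residues centered on a 0-based position, padding with '-'."""
    padded = "-" * flank + sequence + "-" * flank
    return padded[position : position + 2 * flank + 1]


def one_hot(window):
    """Encode each residue as a 20-dimensional indicator vector; padding stays all zeros."""
    rows = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        if residue in INDEX:
            row[INDEX[residue]] = 1
        rows.append(row)
    return rows


# Hypothetical protein fragment; every lysine (K) is a candidate succinylation site.
seq = "MKTAYIAKQR"
windows = [window_around(seq, i, 3) for i, aa in enumerate(seq) if aa == "K"]
encoded = [one_hot(w) for w in windows]

print(windows)
print(len(encoded[0]), "x", len(encoded[0][0]))
```

Each candidate site becomes a matrix of shape (window length, 20), which is the kind of input a 1-D convolutional layer can slide over along the sequence axis.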
Affiliation(s)
- Jianhua Jia
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 China
- Genqiang Wu
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 China
- Meifang Li
  - Computer Department, Nanchang Institute of Technology, Nanchang, 330044 China
- Wangren Qiu
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 China
8
Zhu F, Yang S, Meng F, Zheng Y, Ku X, Luo C, Hu G, Liang Z. Leveraging Protein Dynamics to Identify Functional Phosphorylation Sites using Deep Learning Models. J Chem Inf Model 2022; 62:3331-3345. [PMID: 35816597 DOI: 10.1021/acs.jcim.2c00484] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 02/05/2023]
Abstract
Post-translational modifications (PTMs) modulate protein structure and dynamics, so their accurate prediction is of great significance for understanding cellular processes. With the rapid growth of protein data at different "omics" levels, machine learning models have greatly enriched the prediction of PTMs. However, most machine learning models rely only on protein sequence and use little structural information. The lack of systematic dynamics analysis underlying PTMs largely limits PTM functional predictions. In this research, we present two dynamics-centric deep learning models, namely, cDL-PAU and cDL-FuncPhos, which incorporate sequence, structure, and dynamics-based features to elucidate the molecular basis and underlying functional landscape of PTMs. cDL-PAU achieved satisfactory area under the curve (AUC) scores of 0.804-0.888 for predicting phosphorylation, acetylation, and ubiquitination (PAU) sites, while cDL-FuncPhos achieved an AUC value of 0.771 for predicting functional phosphorylation (FuncPhos) sites, displaying reliable improvements. Feature selection shows that the dynamics-based coupling and commute ability contribute strongly to discovering PAU sites and FuncPhos sites, suggesting an allosteric propensity for important PTMs. The application of cDL-FuncPhos to three oncoproteins not only corroborates its strong performance in FuncPhos prioritization but also provides insight into the physical basis for the functions. The source code and data set of cDL-PAU and cDL-FuncPhos are available at https://github.com/ComputeSuda/PTM_ML.
Affiliation(s)
- Fei Zhu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Sijie Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Fanwang Meng
  - Department of Chemistry and Chemical Biology, McMaster University, Hamilton L8S 4L8, Ontario, Canada
- Yuxiang Zheng
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Xin Ku
  - Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Cheng Luo
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Guang Hu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Zhongjie Liang
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
  - Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
9
Chen Z, Liu X, Zhao P, Li C, Wang Y, Li F, Akutsu T, Bain C, Gasser RB, Li J, Yang Z, Gao X, Kurgan L, Song J. iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets. Nucleic Acids Res 2022; 50:W434-W447. [PMID: 35524557 PMCID: PMC9252729 DOI: 10.1093/nar/gkac351] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 03/21/2022] [Revised: 04/22/2022] [Accepted: 04/25/2022] [Indexed: 01/07/2023]
Abstract
The rapid accumulation of molecular data motivates development of innovative approaches to computationally characterize sequences, structures and functions of biological and chemical molecules in an efficient, accessible and accurate manner. Notwithstanding several computational tools that characterize protein or nucleic acids data, there are no one-stop computational toolkits that comprehensively characterize a wide range of biomolecules. We address this vital need by developing a holistic platform that generates features from sequence and structural data for a diverse collection of molecule types. Our freely available and easy-to-use iFeatureOmega platform generates, analyzes and visualizes 189 representations for biological sequences, structures and ligands. To the best of our knowledge, iFeatureOmega provides the largest scope when directly compared to the current solutions, in terms of the number of feature extraction and analysis approaches and coverage of different molecules. We release three versions of iFeatureOmega including a webserver, command line interface and graphical interface to satisfy needs of experienced bioinformaticians and less computer-savvy biologists and biochemists. With the assistance of iFeatureOmega, users can encode their molecular data into representations that facilitate construction of predictive models and analytical studies. We highlight benefits of iFeatureOmega based on three research applications, demonstrating how it can be used to accelerate and streamline research in bioinformatics, computational biology, and cheminformatics areas. The iFeatureOmega webserver is freely available at http://ifeatureomega.erc.monash.edu and the standalone versions can be downloaded from https://github.com/Superzchen/iFeatureOmega-GUI/ and https://github.com/Superzchen/iFeatureOmega-CLI/.
Affiliation(s)
- Zhen Chen
  - Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
  - Center for Crop Genome Engineering, Henan Agricultural University, Zhengzhou 450046, China
- Xuhan Liu
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Pei Zhao
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Chen Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Yanan Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Fuyi Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Chris Bain
  - Monash Data Future Institutes, Monash University, Melbourne, Victoria 3800, Australia
- Robin B Gasser
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Junzhou Li
  - Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
- Zuoren Yang
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
  - Monash Data Future Institutes, Monash University, Melbourne, Victoria 3800, Australia
10
Xia Y, Jiang M, Luo Y, Feng G, Jia G, Zhang H, Wang P, Ge R. SuccSPred2.0: A Two-Step Model to Predict Succinylation Sites Based on Multifeature Fusion and Selection Algorithm. J Comput Biol 2022; 29:1085-1094. [PMID: 35714347 DOI: 10.1089/cmb.2022.0109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Protein succinylation is a type of post-translational modification discovered in the past decade. Experiments have verified that it plays an important role in biological structure and function. However, wet-lab identification of succinylation sites is time-consuming and laborious, and traditional technology cannot keep pace with the rapid growth of biological sequence data sets. In this study, a new computational method named SuccSPred2.0 was proposed to identify succinylation sites in protein sequences based on multifeature fusion and the maximal information coefficient (MIC) method. SuccSPred2.0 was implemented with a two-step strategy. First, high-dimensional features were reduced by linear discriminant analysis to prevent overfitting. Subsequently, the MIC method was employed to select important features for the classifiers that predict succinylation sites. In comparative experiments on 10-fold cross-validation and independent test data sets, SuccSPred2.0 obtained promising improvements and was superior to previous tools in identifying succinylation sites in the given proteins.
Affiliation(s)
- Yixiao Xia
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Minchao Jiang
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Yizhang Luo
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Guanwen Feng
  - Xi'an Key Laboratory of Big Data and Intelligent Vision, School of Computer Science and Technology, Xidian University, Xi'an, China
- Gangyong Jia
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Hua Zhang
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Pu Wang
  - Computer School, Hubei University of Arts and Science, Xiangyang, China
- Ruiquan Ge
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
11
Jia J, Wu G, Qiu W. pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm. Front Cell Dev Biol 2022; 10:894874. [PMID: 35686053 PMCID: PMC9170990 DOI: 10.3389/fcell.2022.894874] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/12/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022]
Abstract
As a widespread type of protein post-translational modification discovered in recent years, succinylation plays a key role in protein conformational regulation and cellular function regulation. Numerous studies have shown that succinylation modifications are closely associated with the development of many diseases. In order to gain insight into the mechanism of succinylation, it is vital to identify lysine succinylation sites. However, experimental identification of succinylation sites is time-consuming and laborious, and traditional identification tools are unable to keep pace with the rapid growth of datasets. Therefore, to solve this problem, we developed a new predictor named pSuc-FFSEA, which can predict succinylation sites in protein sequences by feature fusion and a stacking ensemble algorithm. Specifically, the sequence information and physicochemical properties were first extracted using EBGW, One-Hot, continuous bag-of-words, chaos game representation, and AAF_DWT. Feature selection was then performed with LASSO to select the optimal subset of features for the classifier. A two-layer stacking ensemble classifier was then designed, with three classifiers (SVM, broad learning system, and LightGBM) as the base classifiers of the first layer and a logistic regression classifier as the meta-classifier of the second layer. In order to further improve the model prediction accuracy and reduce the computational effort, Bayesian optimization and grid search were used to optimize the hyperparameters of the classifier. Finally, rigorous 10-fold cross-validation indicated that our predictor was robust and outperformed previous prediction tools, achieving an average prediction accuracy of 0.7773 ± 0.0120.
Besides, for the convenience of most experimental scientists, a user-friendly and comprehensive web server for pSuc-FFSEA has been established at https://bio.cangmang.xyz/pSuc-FFSEA, by which one can easily obtain the expected data and results without going through the complicated mathematics.
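To make the One-Hot step named in the abstract concrete, the following is a minimal sketch of encoding sequence windows centered on lysine residues. It is not the authors' implementation: the window half-width `flank`, the padding symbol `X`, and the function names are assumptions chosen for illustration, and the other encodings (EBGW, continuous bag-of-words, chaos game representation, AAF_DWT) are not shown.

```python
# Illustrative sketch only; names and window size are hypothetical.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for windows that run past the sequence ends


def lysine_windows(sequence, flank=3):
    """Return (1-based position, window) for every lysine, padded to 2*flank+1 residues."""
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def one_hot(window):
    """Flatten a window into 20 binary values per residue; padding maps to all zeros."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


windows = lysine_windows("MKTAYK", flank=2)
features = [one_hot(window) for _, window in windows]
```

Each candidate site becomes a fixed-length binary vector, which is the form a downstream feature-selection step and classifier would consume.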
Affiliation(s)
- Jianhua Jia
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, China
| | - Genqiang Wu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, China
| | - Wangren Qiu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, China
|
12
|
Kannan S, Krishnankutty R, Souchelnytskyi S. Novel Post-translational Modifications in Human Serum Albumin. Protein Pept Lett 2022; 29:473-484. [DOI: 10.2174/0929866529666220318152509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Revised: 01/11/2022] [Accepted: 01/25/2022] [Indexed: 11/22/2022]
Abstract
Aim: This study aims to identify novel post-translational modifications in human serum albumin by mass spectrometry.
Background: Serum albumin is the most abundant protein in plasma, has many physiological functions, and is in contact with most of the cells and tissues of the human body. Post-translational modifications (PTMs) may affect functions, stability, and localization of albumin.
Objective: Identify novel PTMs in human serum albumin by mass spectrometry.
Methods: Human serum albumin (HSA) was used for tryptic digestion in-solution or in-gel. Mass spectrometry was applied to identify PTMs in HSA. Three-dimensional modeling was applied to explore the potential impact of PTMs on known functions of albumin.
Results: Here, we report the identification of 61 novel PTMs of human serum albumin. Phosphorylation, glycosylation, nitrosylation, deamidation, methylation, acetylation, palmitoylation, geranylation, and farnesylation are some examples of the identified PTMs. Mass spectrometry was used for the identification of PTMs in a purified HSA and HSA from the human plasma. Three-dimensional modeling of albumin with selected PTMs showed the location of these PTMs in the regions involved in albumin interactions with drugs, metals, and fatty acids. The location of PTMs in these regions may modify the binding capacity of albumin.
Conclusion: This report adds 61 novel PTMs to the catalog of human albumin.
Affiliation(s)
- Surya Kannan
- College of Medicine, QU Health, Qatar University, Doha, Qatar
| | | | - Serhiy Souchelnytskyi
- College of Medicine, QU Health, Qatar University, Doha, Qatar
- Oranta Cancer Diagnostics AB, Uppsala, 75263, Sweden
- Lviv National University, Lviv, 79010, Ukraine
|
13
|
Methodological advances in the design of peptide-based vaccines. Drug Discov Today 2022; 27:1367-1380. [DOI: 10.1016/j.drudis.2022.03.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/12/2021] [Revised: 12/02/2021] [Accepted: 03/07/2022] [Indexed: 12/11/2022]
|
14
|
de Brevern AG, Rebehmed J. Current status of PTMs structural databases: applications, limitations and prospects. Amino Acids 2022; 54:575-590. [PMID: 35020020 DOI: 10.1007/s00726-021-03119-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/26/2021] [Accepted: 12/20/2021] [Indexed: 12/11/2022]
Abstract
Protein 3D structures, determined by their amino acid sequences, underpin crucial biological functions. Post-translational modifications (PTMs) play an essential role in regulating these functions by altering the physicochemical properties of proteins. By virtue of their importance, several PTM databases have been developed and released over the past decades, but very few of these databases incorporate real 3D structural data. Since PTMs influence the function of the protein and their aberrant states are frequently implicated in human diseases, providing structural insights to understand the influence and dynamics of PTMs is crucial for unraveling the underlying processes. This review is dedicated to the current status of databases providing 3D structural data on PTM sites in proteins. Some of these databases are general, covering multiple types of PTMs in different organisms, while others are specific to one particular type of PTM, class of proteins or organism. The importance of these databases is illustrated with two major types of in silico applications: predicting PTM sites in proteins using machine learning approaches and investigating protein structure-function relationships involving PTMs. Finally, these databases suffer from multiple problems and care must be taken when analyzing the PTM data.
Affiliation(s)
- Alexandre G de Brevern
- Université de Paris, INSERM, UMR_S 1134, DSIMB, 75739, Paris, France.,Université de la Réunion, INSERM, UMR_S 1134, DSIMB, 97715, Saint-Denis de La Réunion, France.,Laboratoire d'Excellence GR-Ex, 75739, Paris, France
| | - Joseph Rebehmed
- Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon.
|
15
|
Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021; 23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022] Open
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive or negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
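As one concrete illustration of the PU setting described above, the sketch below applies the label-frequency correction of Elkan and Noto, a well-known PU technique that is not necessarily among the 29 applications surveyed. It assumes labeled positives are selected completely at random from all positives and that some classifier already outputs a score approximating P(labeled | x); the function names and the example scores are hypothetical.

```python
# Illustrative sketch only; scores would come from any probabilistic classifier
# trained to separate labeled positives from unlabeled samples.

def estimate_label_frequency(positive_scores):
    """Estimate c = P(labeled | truly positive) as the mean score on held-out labeled positives."""
    if not positive_scores:
        raise ValueError("need at least one held-out labeled positive")
    return sum(positive_scores) / len(positive_scores)


def corrected_probability(score, c):
    """Rescale P(labeled | x) into an estimate of P(positive | x), capped at 1."""
    return min(1.0, score / c)


held_out_positive_scores = [0.4, 0.6, 0.5]  # made-up values for the example
c = estimate_label_frequency(held_out_positive_scores)
unlabeled_scores = [0.2, 0.9]  # made-up values for the example
corrected = [corrected_probability(s, c) for s in unlabeled_scores]
```

The correction matters because a classifier trained on labeled-versus-unlabeled data systematically underestimates the positive probability by the factor c.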
Affiliation(s)
- Fuyi Li
- Monash University, Australia
| | | | - André Leier
- Department of Genetics, UAB School of Medicine, USA
| | - Meiya Han
- Department of Biochemistry and Molecular Biology, Monash University, Australia
| | | | - Jing Xu
- Computer Science and Technology from Nankai University, China
| | - Xiaoyu Wang
- Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
| | - Shirui Pan
- University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Cangzhi Jia
- College of Science, Dalian Maritime University, Australia
| | - Yang Zhang
- Northwestern Polytechnical University, China
| | - Geoffrey I Webb
- Faculty of Information Technology at Monash University, Australia
| | - Lachlan J M Coin
- Department of Clinical Pathology, University of Melbourne, Australia
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry of Molecular Biology, Monash University, Australia
| | - Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
|
16
|
Zhang H, He J, Hu G, Zhu F, Jiang H, Gao J, Zhou H, Lin H, Wang Y, Chen K, Meng F, Hao M, Zhao K, Luo C, Liang Z. Dynamics of Post-Translational Modification Inspires Drug Design in the Kinase Family. J Med Chem 2021; 64:15111-15125. [PMID: 34668699 DOI: 10.1021/acs.jmedchem.1c01076] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
Abstract
Post-translational modification (PTM) of proteins plays important roles in the regulation of cellular function and disease pathogenesis. The systematic analysis of PTM dynamics presents great opportunities to enlarge the target space by PTM allosteric regulation. Here, we present a framework that integrates sequence, structural topology, and particular dynamics features to characterize the functional context and druggability of PTMs in the well-known kinase family. Machine learning models with these biophysical features could successfully predict PTMs. Moreover, PTMs were found to be significantly enriched in the reported allosteric pockets, and the allosteric potential of PTM pockets was thus proposed through these biophysical features. Finally, the covalent inhibitor DC-Srci-6668 targeting the PTM pocket in c-Src kinase was identified, which inhibited the phosphorylation and locked c-Src in the inactive state. Our findings represent a crucial step toward PTM-inspired drug design in the kinase family.
Affiliation(s)
- Huimin Zhang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China.,Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China.,University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
| | - Jixiao He
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Fei Zhu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Hao Jiang
- Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
| | - Jing Gao
- Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
| | - Hu Zhou
- Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
| | - Hua Lin
- Biomedical Research Center of South China, College of Life Sciences, Fujian Normal University, 1 Keji Road, Fuzhou 350117, China
| | - Yingjuan Wang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Kaixian Chen
- Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China.,University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China
| | - Fanwang Meng
- Department of Chemistry and Chemical Biology, McMaster University, Hamilton, ON L8S 4L8, Canada
| | - Minghong Hao
- Ensem Therapeutics, Inc., 200 Boston Avenue, Medford, Massachusetts 02155, United States
| | - Kehao Zhao
- School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai 264005, China
| | - Cheng Luo
- Drug Discovery and Design Center, the Center for Chemical Biology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.,School of Life Science and Technology, Shanghai Tech University, 100 Haike Road, Shanghai 201210, China.,University of Chinese Academy of Sciences (UCAS), 19 Yuquan Road, Beijing 100049, China.,School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
| | - Zhongjie Liang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
|
17
|
Jia C, Zhang M, Fan C, Li F, Song J. Formator: Predicting Lysine Formylation Sites Based on the Most Distant Undersampling and Safe-Level Synthetic Minority Oversampling. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1937-1945. [PMID: 31804942 DOI: 10.1109/tcbb.2019.2957758] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 06/10/2023]
Abstract
Lysine formylation is a reversible type of protein post-translational modification and has been found to be involved in a myriad of biological processes, including modulation of chromatin conformation and gene expression in histones and other nuclear proteins. Accurate identification of lysine formylation sites is essential for elucidating the underlying molecular mechanisms of formylation. Traditional experimental methods are time-consuming and expensive. As such, it is desirable and necessary to develop computational methods for accurate prediction of formylation sites. In this study, we propose a novel predictor, termed Formator, for identifying lysine formylation sites from sequence information. Formator is developed using the ensemble learning (EL) strategy based on four individual support vector machine classifiers via a voting system. Moreover, the most distant undersampling and Safe-Level-SMOTE oversampling techniques were integrated to deal with the data imbalance problem of the training dataset. Four effective feature extraction methods, namely bi-profile Bayes (BPB), k-nearest neighbor (KNN), amino acid physicochemical properties (AAindex), and composition and transition (CTD), were employed to encode the surrounding sequence features of potential formylation sites. Extensive empirical studies show that Formator achieved accuracies of 87.24 and 74.96 percent on the jackknife test and the independent test, respectively. Performance comparison results on the independent test indicate that Formator outperforms the existing prediction tool, LFPred, suggesting that it has great potential to serve as a useful tool in identifying novel lysine formylation sites and facilitating hypothesis-driven experimental efforts.
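The abstract states that four classifiers are combined "via a voting system" but does not give the voting rule. As a minimal sketch under that caveat, the following shows plain hard majority voting over binary predictions; breaking a 2-2 tie toward the negative class is an assumption made here, as are the function name and the example votes.

```python
# Illustrative sketch only; the actual Formator voting rule is not specified in the abstract.

def majority_vote(predictions):
    """Return 1 if strictly more than half of the binary predictions are 1, else 0.

    With an even number of voters a tie is possible; this sketch resolves
    ties toward the negative class (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    positives = sum(predictions)
    return 1 if positives * 2 > len(predictions) else 0


# Hypothetical outputs of four classifiers, one per feature encoding
# (BPB, KNN, AAindex, CTD), for a single candidate lysine site.
votes = [1, 1, 1, 0]
label = majority_vote(votes)
```

A soft-voting variant that averages classifier scores would avoid ties altogether, at the cost of requiring calibrated scores from each classifier.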
|
18
|
Kamacioglu A, Tuncbag N, Ozlu N. Structural analysis of mammalian protein phosphorylation at a proteome level. Structure 2021; 29:1219-1229.e3. [PMID: 34192515 DOI: 10.1016/j.str.2021.06.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/26/2021] [Revised: 04/07/2021] [Accepted: 06/04/2021] [Indexed: 10/21/2022]
Abstract
Phosphorylation is an essential post-translational modification for almost all cellular processes. Several global phosphoproteomics analyses have revealed phosphorylation profiles under different conditions. Beyond identification of phospho-sites, protein structures add another layer of information about their functionality. In this study, we systematically characterize phospho-sites based on their 3D locations in the protein and establish a location map for phospho-sites. More than 250,000 phospho-sites have been analyzed, of which 8,686 sites match at least one structure and are stratified based on their respective 3D positions. Core phospho-sites possess two distinct groups based on their dynamicity. Dynamic core phosphorylations are significantly more functional compared with static ones. The dynamic core and the interface phospho-sites are the most functional among all 3D phosphorylation groups. Our analysis provides global characterization and stratification of phospho-sites from a structural perspective that can be utilized for predicting functional relevance and filtering out false positives in phosphoproteomic studies.
Affiliation(s)
- Altug Kamacioglu
- Department of Molecular Biology and Genetics, Koc University, Istanbul, Turkey
| | - Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey; School of Medicine, Koc University, 34450 Istanbul, Turkey; Koc University Research Center for Translational Medicine (KUTTAM), 34450 Istanbul, Turkey.
| | - Nurhan Ozlu
- Department of Molecular Biology and Genetics, Koc University, Istanbul, Turkey; School of Medicine, Koc University, 34450 Istanbul, Turkey; Koc University Research Center for Translational Medicine (KUTTAM), 34450 Istanbul, Turkey.
|
19
|
Krassowski M, Pellegrina D, Mee MW, Fradet-Turcotte A, Bhat M, Reimand J. ActiveDriverDB: Interpreting Genetic Variation in Human and Cancer Genomes Using Post-translational Modification Sites and Signaling Networks (2021 Update). Front Cell Dev Biol 2021; 9:626821. [PMID: 33834021 PMCID: PMC8021862 DOI: 10.3389/fcell.2021.626821] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/06/2020] [Accepted: 02/08/2021] [Indexed: 12/14/2022] Open
Abstract
Deciphering the functional impact of genetic variation is required to understand phenotypic diversity and the molecular mechanisms of inherited disease and cancer. While millions of genetic variants are now mapped in genome sequencing projects, distinguishing functional variants remains a major challenge. Protein-coding variation can be interpreted using post-translational modification (PTM) sites that are core components of cellular signaling networks controlling molecular processes and pathways. ActiveDriverDB is an interactive proteo-genomics database that uses more than 260,000 experimentally detected PTM sites to predict the functional impact of genetic variation in disease, cancer and the human population. Using machine learning tools, we prioritize proteins and pathways with enriched PTM-specific amino acid substitutions that potentially rewire signaling networks via induced or disrupted short linear motifs of kinase binding. We then map these effects to site-specific protein interaction networks and drug targets. In the 2021 update, we increased the PTM datasets by nearly 50%, included glycosylation, sumoylation and succinylation as new types of PTMs, and updated the workflows to interpret inherited disease mutations. We added a recent phosphoproteomics dataset reflecting the cellular response to SARS-CoV-2 to predict the impact of human genetic variation on COVID-19 infection and disease course. Overall, we estimate that 16-21% of known amino acid substitutions affect PTM sites among pathogenic disease mutations, somatic mutations in cancer genomes and germline variants in the human population. These data underline the potential of interpreting genetic variation through the lens of PTMs and signaling networks. The open-source database is freely available at www.ActiveDriverDB.org.
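The core idea of interpreting substitutions relative to PTM sites can be sketched as a distance check between a variant position and the nearest annotated site. The category names and the window widths (±2 and ±7 residues) below are assumptions for illustration and are not stated in the abstract; the function names and the example variants are likewise hypothetical.

```python
import re

# Illustrative sketch only; thresholds and category labels are assumed.

def parse_substitution(text):
    """Split a protein-level substitution such as 'S15A' into (reference, position, alternate)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", text)
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def classify_against_ptm_sites(position, ptm_sites, proximal=2, distal=7):
    """Label a variant by its distance (in residues) to the nearest PTM site."""
    if not ptm_sites:
        return "none"
    distance = min(abs(position - site) for site in ptm_sites)
    if distance == 0:
        return "direct"
    if distance <= proximal:
        return "proximal"
    if distance <= distal:
        return "distal"
    return "none"


ptm_sites = [15, 40]  # hypothetical modified positions in one protein
labels = {
    variant: classify_against_ptm_sites(parse_substitution(variant)[1], ptm_sites)
    for variant in ["S15A", "R17H", "G22D", "L30P"]
}
```

A flanking window, rather than an exact-position match, reflects that substitutions near a modified residue can disrupt the short linear motifs recognized by modifying enzymes.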
Affiliation(s)
- Michal Krassowski
- Nuffield Department of Women’s and Reproductive Health, Medical Sciences Division, University of Oxford, Oxford, United Kingdom
| | - Diogo Pellegrina
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, ON, Canada
| | - Miles W. Mee
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, ON, Canada
| | - Amelie Fradet-Turcotte
- Department of Molecular Biology, Medical Biochemistry and Pathology, Universite Laval, Quebec, QC, Canada
- Oncology Division, Centre Hospitalier Universitaire (CHU) de Quebec-Universite Laval Research Center, Quebec, QC, Canada
| | - Mamatha Bhat
- Multiorgan Transplant Program, University Health Network, Toronto, ON, Canada
- Division of Gastroenterology & Hepatology, Department of Medicine, University of Toronto, Toronto, ON, Canada
| | - Jüri Reimand
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
20
|
Recent Advances in Predicting Protein S-Nitrosylation Sites. Biomed Res Int 2021; 2021:5542224. [PMID: 33628788 PMCID: PMC7892234 DOI: 10.1155/2021/5542224] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/07/2021] [Revised: 01/24/2021] [Accepted: 01/25/2021] [Indexed: 01/09/2023]
Abstract
Protein S-nitrosylation (SNO) is the covalent modification of cysteine residues by nitric oxide (NO) and its derivatives. SNO plays an essential role among the reversible posttranslational modifications of proteins. The accurate prediction of SNO sites is crucial for revealing the biological mechanisms of NO regulation and for related drug development. Identification of SNO sites in proteins is currently a very active research topic. In this review, we briefly summarize recent advances in computationally identifying SNO sites. The challenges and future perspectives for identifying SNO sites are also discussed. We anticipate that this review will provide insights into research on SNO site prediction.
|
21
|
Yin J, Li F, Zhou Y, Mou M, Lu Y, Chen K, Xue J, Luo Y, Fu J, He X, Gao J, Zeng S, Yu L, Zhu F. INTEDE: interactome of drug-metabolizing enzymes. Nucleic Acids Res 2021; 49:D1233-D1243. [PMID: 33045737 PMCID: PMC7779056 DOI: 10.1093/nar/gkaa755] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 07/20/2020] [Revised: 08/19/2020] [Accepted: 09/22/2020] [Indexed: 12/15/2022] Open
Abstract
Drug-metabolizing enzymes (DMEs) are critical determinants of drug safety and efficacy, and the interactome of DMEs has attracted extensive attention. There are 3 major interaction types in an interactome: microbiome-DME interaction (MICBIO), xenobiotics-DME interaction (XEOTIC) and host protein-DME interaction (HOSPPI). The interaction data of each type are essential for drug metabolism, and the collective consideration of multiple types has implications for the future practice of precision medicine. However, no database has been designed to systematically provide the data of all types of DME interactions. Here, a database of the Interactome of Drug-Metabolizing Enzymes (INTEDE) was therefore constructed to offer these interaction data. First, 1047 unique DMEs (448 host and 599 microbial) were confirmed, for the first time, using their metabolizing drugs. Second, for these newly confirmed DMEs, all types of their interactions (3359 MICBIOs between 225 microbial species and 185 DMEs; 47 778 XEOTICs between 4150 xenobiotics and 501 DMEs; 7849 HOSPPIs between 565 human proteins and 566 DMEs) were comprehensively collected and then provided, which enabled the crosstalk analysis among multiple types. Because of the huge amount of accumulated data, INTEDE made it possible to generalize key features for revealing disease etiology and optimizing clinical treatment. INTEDE is freely accessible at: https://idrblab.org/intede/.
Affiliation(s)
- Jiayi Yin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ying Zhou
- The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yinjing Lu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Kangli Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Jia Xue
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Jianbo Fu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Xu He
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Jianqing Gao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China
| | - Su Zeng
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China
| | - Lushan Yu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China
|
22
|
Wang P, Zhang Q, Li S, Cheng B, Xue H, Wei Z, Shao T, Liu ZX, Cheng H, Wang Z. iCysMod: an integrative database for protein cysteine modifications in eukaryotes. Brief Bioinform 2021; 22:6066620. [PMID: 33406221 DOI: 10.1093/bib/bbaa400] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/25/2020] [Revised: 11/23/2020] [Accepted: 12/07/2020] [Indexed: 01/06/2023] Open
Abstract
As important post-translational modifications, protein cysteine modifications (PCMs) occurring at the cysteine thiol group play critical roles in the regulation of various biological processes in eukaryotes. Due to the rapid advancement of high-throughput proteomics technologies, a large number of PCM events have been identified but remain to be curated. Thus, an integrated resource of eukaryotic PCMs will be useful for the research community. In this work, we developed an integrative database for protein cysteine modifications in eukaryotes (iCysMod), which curated and hosted 108 030 PCM events for 85 747 experimentally identified sites on 31 483 proteins from 48 eukaryotes for 8 types of PCMs, including oxidation, S-nitrosylation (-SNO), S-glutathionylation (-SSG), disulfide formation (-SSR), S-sulfhydration (-SSH), S-sulfenylation (-SOH), S-sulfinylation (-SO2H) and S-palmitoylation (-S-palm). Then, browse and search options were provided for accessing the dataset, while various detailed information about the PCM events was well organized for visualization. With the human dataset in iCysMod, the sequence features around the cysteine modification sites for each PCM type were analyzed, and the results indicated that various types of PCMs presented distinct sequence recognition preferences. Moreover, different PCMs can crosstalk with each other to synergistically orchestrate specific biological processes, and 37 841 PCM events involved in 119 types of PCM co-occurrences at the same cysteine residues were finally obtained. Taken together, we anticipate that the iCysMod database will provide a useful resource for eukaryotic PCMs to facilitate related research, while the online service is freely available at http://icysmod.omicsbio.info.
Affiliation(s)
- Panqin Wang
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
| | - Qingfeng Zhang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Shihua Li
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
| | - Ben Cheng
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
| | - Han Xue
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
| | - Zhen Wei
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
| | - Tian Shao
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
| | - Ze-Xian Liu
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Han Cheng
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
| | - Zhenlong Wang
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, China
|
23
|
Peng D, Li H, Hu B, Zhang H, Chen L, Lin S, Zuo Z, Xue Y, Ren J, Xie Y. PTMsnp: A Web Server for the Identification of Driver Mutations That Affect Protein Post-translational Modification. Front Cell Dev Biol 2020; 8:593661. [PMID: 33240890 PMCID: PMC7683509 DOI: 10.3389/fcell.2020.593661] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/11/2020] [Accepted: 10/21/2020] [Indexed: 11/16/2022] Open
Abstract
High-throughput sequencing technologies have identified millions of genetic mutations in multiple human diseases. However, the interpretation of the pathogenesis of these mutations and the discovery of driver genes that dominate disease progression remain a major challenge. Combining functional features such as protein post-translational modification (PTM) with genetic mutations is an effective way to predict such alterations. Here, we present PTMsnp, a web server that implements a Bayesian hierarchical model to identify driver genetic mutations targeting PTM sites. PTMsnp accepts genetic mutations in a standard variant call format or tabular format as input and outputs several interactive charts of PTM-related mutations that potentially affect PTMs. Additional functional annotations are performed to evaluate the impact of PTM-related mutations on protein structure and function, as well as to classify variants relevant to Mendelian disease. A total of 411,574 modification sites from 33 different types of PTMs and 1,776,848 somatic mutations from TCGA across 33 different cancer types are integrated into the web server, enabling identification of candidate cancer driver genes based on PTM. Applications of PTMsnp to the cancer cohorts and a GWAS dataset of type 2 diabetes identified a set of potential drivers together with several known disease-related genes, indicating its reliability in distinguishing disease-related mutations and providing potential molecular targets for new therapeutic strategies. PTMsnp is freely available at: http://ptmsnp.renlab.org.
Affiliation(s)
- Di Peng
- Precision Medicine Institute, The First Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Huiqin Li
- Precision Medicine Institute, The First Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Bosu Hu
- Precision Medicine Institute, The First Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Hongwan Zhang
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou, China
| | - Li Chen
- Precision Medicine Institute, The First Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Shaofeng Lin
- Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Zhixiang Zuo
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou, China
| | - Yu Xue
- Key Laboratory of Molecular Biophysics of Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Jian Ren
- Precision Medicine Institute, The First Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou, China
| | - Yubin Xie
- Precision Medicine Institute, The First Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| |
|
24
|
Chen X, Xiong Y, Liu Y, Chen Y, Bi S, Zhu X. m5CPred-SVM: a novel method for predicting m5C sites of RNA. BMC Bioinformatics 2020; 21:489. [PMID: 33126851 PMCID: PMC7602301 DOI: 10.1186/s12859-020-03828-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/30/2020] [Accepted: 10/21/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision. Through accurate identification of 5-methylcytosine (m5C) sites on RNA, researchers can better understand the exact role of 5-cytosine-methylation in these biological functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, the accuracy and efficiency of these methods are not yet satisfactory and need further improvement. RESULTS In this work, we have developed a new computational method, m5CPred-SVM, to identify m5C sites in three species, H. sapiens, M. musculus and A. thaliana. To build this model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated based on RNA segments and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, our proposed method, m5CPred-SVM, was compared with several existing methods, and the result showed that m5CPred-SVM offered substantially higher prediction accuracy than previously published methods. It is expected that our method, m5CPred-SVM, can become a useful tool for accurate identification of m5C sites. CONCLUSION In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites of three different species. The result shows that our model outperformed the existing state-of-the-art models. Our model is available for users through a web server at https://zhulab.ahu.edu.cn/m5CPred-SVM.
Affiliation(s)
- Xiao Chen
- School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
| | - Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240 China
| | - Yinbo Liu
- School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
| | - Yuqing Chen
- School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
| | - Shoudong Bi
- School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
| | - Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, 230036 Anhui China
| |
|
25
|
Bonne Køhler J, Jers C, Senissar M, Shi L, Derouiche A, Mijakovic I. Importance of protein Ser/Thr/Tyr phosphorylation for bacterial pathogenesis. FEBS Lett 2020; 594:2339-2369. [PMID: 32337704 DOI: 10.1002/1873-3468.13797] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 03/01/2020] [Revised: 04/16/2020] [Accepted: 04/20/2020] [Indexed: 12/13/2022]
Abstract
Protein phosphorylation regulates a large variety of biological processes in all living cells. In pathogenic bacteria, the study of serine, threonine, and tyrosine (Ser/Thr/Tyr) phosphorylation has shed light on the course of infectious diseases, from adherence to host cells to pathogen virulence, replication, and persistence. Mass spectrometry (MS)-based phosphoproteomics has provided global maps of Ser/Thr/Tyr phosphosites in bacterial pathogens. Despite recent developments, a quantitative and dynamic view of phosphorylation events that occur during bacterial pathogenesis is currently lacking. Temporal, spatial, and subpopulation resolution of phosphorylation data is required to identify key regulatory nodes underlying bacterial pathogenesis. Herein, we discuss how technological improvements in sample handling, MS instrumentation, data processing, and machine learning should improve bacterial phosphoproteomic datasets and the information extracted from them. Such information is expected to significantly extend the current knowledge of Ser/Thr/Tyr phosphorylation in pathogenic bacteria and should ultimately contribute to the design of novel strategies to combat bacterial infections.
Affiliation(s)
- Julie Bonne Køhler
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
| | - Carsten Jers
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
| | - Mériem Senissar
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
| | - Lei Shi
- Systems and Synthetic Biology Division, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Abderahmane Derouiche
- Systems and Synthetic Biology Division, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Ivan Mijakovic
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark; Systems and Synthetic Biology Division, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
| |
|
26
|
Chen H, Li F, Wang L, Jin Y, Chi CH, Kurgan L, Song J, Shen J. Systematic evaluation of machine learning methods for identifying human-pathogen protein-protein interactions. Brief Bioinform 2020; 22:5847611. [PMID: 32459334 DOI: 10.1093/bib/bbaa068] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/31/2020] [Revised: 03/31/2020] [Accepted: 04/01/2020] [Indexed: 12/11/2022]
Abstract
In recent years, high-throughput experimental techniques have significantly enhanced the accuracy and coverage of protein-protein interaction identification, including human-pathogen protein-protein interactions (HP-PPIs). Despite this progress, experimental methods are, in general, expensive in terms of both time and labour costs, especially considering that there are enormous amounts of potential protein-interacting partners. Developing computational methods to predict interactions between humans and bacterial pathogens has thus become critical and meaningful, in both facilitating the detection of interactions and mining incomplete interaction maps. In this paper, we present a systematic evaluation of machine learning-based computational methods for human-bacterium protein-protein interactions (HB-PPIs). We first reviewed a vast number of publicly available databases of HP-PPIs and then critically evaluated the availability of these databases. Benefitting from its well-structured nature, we subsequently preprocessed the data and identified six bacterium pathogens that could be used to study bacterium subjects in which a human was the host. Additionally, we thoroughly reviewed the literature on 'host-pathogen interactions' whereby existing models were summarized that we used to jointly study the impact of different feature representation algorithms and evaluate the performance of existing machine learning computational models. Owing to the abundance of sequence information and the limited scale of other protein-related information, we adopted the primary protocol from the literature and dedicated our analysis to a comprehensive assessment of sequence information and machine learning models. A systematic evaluation of machine learning models and a wide range of feature representation algorithms based on sequence information are presented as a comparison survey towards the prediction performance evaluation of HB-PPIs.
|
27
|
Li F, Chen J, Ge Z, Wen Y, Yue Y, Hayashida M, Baggag A, Bensmail H, Song J. Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework. Brief Bioinform 2020; 22:2126-2140. [PMID: 32363397 DOI: 10.1093/bib/bbaa049] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 12/11/2019] [Revised: 02/25/2020] [Accepted: 03/11/2020] [Indexed: 12/12/2022]
Abstract
Promoters are short consensus sequences of DNA, which are responsible for transcription activation or the repression of all genes. There are many types of promoters in bacteria with important roles in initiating gene transcription. Therefore, solving promoter-identification problems has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for identifying both promoters and their respective classification. SELECTOR combines the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features, and uses five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests using benchmark datasets and independent tests using the newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in both general and specific types of promoter prediction in Escherichia coli. Furthermore, this novel framework provides essential interpretations that aid understanding of model success by leveraging the powerful Shapley Additive exPlanation algorithm, thereby highlighting the most important features relevant for predicting both general and specific types of promoters and overcoming the limitations of existing 'Black-box' approaches that are unable to reveal causal relationships from large amounts of initially encoded features.
Affiliation(s)
- Fuyi Li
- Northwest A&F University, China; Department of Biochemistry and Molecular Biology and the Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Australia
| | - Jinxiang Chen
- Biomedicine Discovery Institute and the Department of Biochemistry and Molecular Biology, Monash University from the College of Information Engineering, Northwest A&F University, China
| | - Zongyuan Ge
- Monash University and also serves as a Deep Learning Specialist at NVIDIA AI Technology Centre. Before joining Monash, he was a research scientist at IBM Research Australia doing research in medical AI during 2016-2018. His research interests are AI, computer vision, medical image, robotics and deep learning
| | - Ya Wen
- computer technology from Ningxia University, China
| | - Yanwei Yue
- medical science from Southern Medical University, China
| | - Morihiro Hayashida
- informatics from Kyoto University, Japan, in 2005. He is an Assistant Professor in the Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, Japan
| | - Abdelkader Baggag
- computer science from the University of Minnesota. He is a Senior Scientist at the Qatar Computing Research Institute (QCRI) and has a joint appointment as an Associate Professor at Hamad Bin Khalifa University (HBKU) in the Division of Information and Computing Technology. His research interests include data mining, linear algebra and machine learning
| | - Halima Bensmail
- University of Pierre & Marie Curie (Paris 6) in France. She is currently a Principal Scientist at QCRI-HBKU and a joint Associate Professor at the College of Computer and Science Engineering, HBKU
| | - Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Australia. He is also affiliated with the Monash Centre for Data Science, Faculty of Information Technology, Monash University. His research interests include bioinformatics, computational biology, machine learning, data mining, and pattern recognition
| |
|
28
|
Li F, Leier A, Liu Q, Wang Y, Xiang D, Akutsu T, Webb GI, Smith AI, Marquez-Lago T, Li J, Song J. Procleave: Predicting Protease-specific Substrate Cleavage Sites by Combining Sequence and Structural Information. Genomics Proteomics Bioinformatics 2020; 18:52-64. [PMID: 32413515 PMCID: PMC7393547 DOI: 10.1016/j.gpb.2019.08.002] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 04/30/2019] [Revised: 08/08/2019] [Accepted: 10/23/2019] [Indexed: 10/29/2022]
Abstract
Proteases are enzymes that cleave and hydrolyse the peptide bonds between two specific amino acid residues of target substrate proteins. Protease-controlled proteolysis plays a key role in the degradation and recycling of proteins, which is essential for various physiological processes. Thus, solving the substrate identification problem will have important implications for the precise understanding of functions and physiological roles of proteases, as well as for therapeutic target identification and pharmaceutical applicability. Consequently, there is a great demand for bioinformatics methods that can predict novel substrate cleavage events with high accuracy by utilizing both sequence and structural information. In this study, we present Procleave, a novel bioinformatics approach for predicting protease-specific substrates and specific cleavage sites by taking into account both their sequence and 3D structural information. Structural features of known cleavage sites were represented by discrete values using a LOWESS data-smoothing optimization method, which turned out to be critical for the performance of Procleave. The optimal approximations of all structural parameter values were encoded in a conditional random field (CRF) computational framework, alongside sequence and chemical group-based features. Here, we demonstrate the outstanding performance of Procleave through extensive benchmarking and independent tests. Procleave is capable of correctly identifying most cleavage sites in the case study. Importantly, when applied to the human structural proteome encompassing 17,628 protein structures, Procleave suggests a number of potential novel target substrates and their corresponding cleavage sites of different proteases. Procleave is implemented as a webserver and is freely accessible at http://procleave.erc.monash.edu/.
Affiliation(s)
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Andre Leier
- School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA
| | - Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Yanan Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Dongxu Xiang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - A Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
| | - Tatiana Marquez-Lago
- School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA.
| | - Jian Li
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia.
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia.
| |
|