1

Zhra M, Qasem RJ, Aldossari F, Saleem R, Aljada A. A Comprehensive Exploration of Caspase Detection Methods: From Classical Approaches to Cutting-Edge Innovations. Int J Mol Sci 2024; 25:5460. [PMID: 38791499 PMCID: PMC11121653 DOI: 10.3390/ijms25105460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/07/2024] [Accepted: 05/09/2024] [Indexed: 05/26/2024]
Abstract
The activation of caspases is a crucial event in, and an indicator of, programmed cell death, also known as apoptosis. These enzymes play a central role in cancer biology and are considered a promising target for current and future therapeutic interventions. Traditional methods of measuring caspase activity, such as antibody-based methods, have provided fundamental insights into caspase biological functions and are considered essential tools in cell and cancer biology, pharmacology and toxicology, and drug discovery. However, these traditional methods, though extensively used, are now recognized as having various shortcomings, and they fall short of meeting the needs of the rapid and expansive progress in caspase research. For these reasons, detection methods for caspases, and for the network of pathways involved in their activation and downstream signaling, have been continuously improved. Over the past decade, newer methods based on state-of-the-art technologies have been introduced to the biomedical community. These methods enable both temporal and spatial monitoring of the activity of caspases and their downstream substrates with enhanced accuracy and precision. They include fluorescent-labeled inhibitors (FLIs) for live imaging, single-cell live imaging, fluorescence resonance energy transfer (FRET) sensors, and activatable multifunctional probes for in vivo imaging. Recently, the use of mass spectrometry (MS) techniques to investigate these enzymes has expanded the repertoire of tools available for identifying and quantifying caspase substrates, cleavage products, and post-translational modifications, and for unveiling the complex regulatory networks involved.
Collectively, these methods are enabling researchers to unravel many of the complex cellular processes involved in apoptosis and are helping to generate a clearer and more comprehensive understanding of caspase-mediated proteolysis during apoptosis. Herein, we provide a comprehensive review of the various assays and detection methods as they have evolved over the years, so as to encourage further exploration of these enzymes, which should have direct implications for the advancement of therapeutics for cancer and other diseases.
Affiliation(s)
- Mahmoud Zhra
- Department of Biochemistry and Molecular Medicine, College of Medicine, Alfaisal University, Riyadh 11533, Saudi Arabia
- Rani J. Qasem
- Department of Pharmacology and Pharmacy Practice, College of Pharmacy, Middle East University, Amman 11831, Jordan
- Fai Aldossari
- Zoology Department, College of Science, King Saud University, Riyadh 12372, Saudi Arabia
- Rimah Saleem
- Department of Biochemistry and Molecular Medicine, College of Medicine, Alfaisal University, Riyadh 11533, Saudi Arabia
- Ahmad Aljada
- Department of Biochemistry and Molecular Medicine, College of Medicine, Alfaisal University, Riyadh 11533, Saudi Arabia

2

Mu L, Song J, Akutsu T, Mori T. DiCleave: a deep learning model for predicting human Dicer cleavage sites. BMC Bioinformatics 2024; 25:13. [PMID: 38195423 PMCID: PMC10775615 DOI: 10.1186/s12859-024-05638-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 01/03/2024] [Indexed: 01/11/2024]
Abstract
BACKGROUND MicroRNAs (miRNAs) are a class of non-coding RNAs that play a pivotal role as regulators of gene expression. These miRNAs are typically about 20 to 25 nucleotides long. The maturation of miRNAs requires Dicer cleavage at specific sites within the precursor miRNAs (pre-miRNAs). Machine learning-based approaches for cleavage site prediction, such as PHDcleav and LBSizeCleav, have recently been reported. ReCGBM, a gradient boosting-based model, demonstrates superior performance compared with existing methods. Nonetheless, ReCGBM operates solely as a binary classifier despite the presence of two cleavage sites in a typical pre-miRNA. Previous approaches have also used only a fraction of the structural information in pre-miRNAs, often overlooking comprehensive secondary structure information. A new model is therefore needed to address these limitations. RESULTS In this study, we developed a deep learning model for predicting the presence of a Dicer cleavage site within a pre-miRNA segment. This model was enhanced by an autoencoder that learned the secondary structure embeddings of pre-miRNA. Benchmarking experiments demonstrated that the performance of our model was comparable to that of ReCGBM in the binary classification tasks. In addition, our model excelled in multi-class classification tasks, making it a more versatile and practical solution than ReCGBM. CONCLUSIONS Our proposed model exhibited superior performance compared with the current state-of-the-art model, underscoring the effectiveness of a deep learning approach in predicting Dicer cleavage sites. Furthermore, our model could be trained using only sequence and secondary structure information. Its capacity to accommodate multi-class classification tasks enhances its practical utility.
Affiliation(s)
- Lixuan Mu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
- Tomoya Mori
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan

3

Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc Natl Acad Sci U S A 2023; 120:e2303590120. [PMID: 37729196 PMCID: PMC10523478 DOI: 10.1073/pnas.2303590120] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/05/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
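The abstract does not spell out PGCN's layers. As a minimal sketch of the general graph-convolution idea only, the code below applies one layer of the widely used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W), to a toy three-residue graph and mean-pools the result into a graph-level vector. The edge weights (standing in for interaction-energy magnitudes), node features, identity weight matrix, and mean readout are all assumptions, not the authors' choices.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj    -- n x n symmetric matrix of non-negative edge weights (assumed here to be
              magnitudes of pairwise residue interaction energies)
    feats  -- n x f node-feature matrix X
    weight -- f x h weight matrix W (learned in practice, fixed here)
    """
    n = len(adj)
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalization D^-1/2 (A + I) D^-1/2.
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]
    f = len(feats[0])
    # Aggregate neighbor features: norm @ X.
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Project with W and apply ReLU.
    h = len(weight[0])
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f))) for o in range(h)]
            for i in range(n)]


def mean_readout(node_emb):
    """Mean-pool node embeddings into a single graph-level vector."""
    n = len(node_emb)
    return [sum(row[o] for row in node_emb) / n for o in range(len(node_emb[0]))]


# Toy 3-residue chain 0 - 1 - 2 with made-up weights and features.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
emb = gcn_layer(adj, feats, identity)
print(mean_readout(emb))
```

A graph-level vector like this could feed a classifier that labels a protease-substrate complex as cleaved or uncleaved; whether PGCN pools this way is not stated in the abstract.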
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Joseph H. Lubin
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Guanyang Wang
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854

4

Li F, Wang C, Guo X, Akutsu T, Webb GI, Coin LJM, Kurgan L, Song J. ProsperousPlus: a one-stop and comprehensive platform for accurate protease-specific substrate cleavage prediction and machine-learning model construction. Brief Bioinform 2023; 24:bbad372. [PMID: 37874948 DOI: 10.1093/bib/bbad372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/30/2023] [Accepted: 09/29/2023] [Indexed: 10/26/2023]
Abstract
Proteases contribute to a broad spectrum of cellular functions. Given the relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. Although many protease-specific predictors of substrate cleavage sites have been developed, these efforts are outpaced by the growth of protease substrate cleavage data. In particular, since data for more than 100 protease types are available and this number continues to grow, it becomes impractical to publish a predictor for each new protease type; it may instead be better to provide a computational platform that helps users quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming experience or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.
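The models ProsperousPlus builds are not detailed in the abstract. A common sequence-only baseline for protease cleavage-site prediction, shown here as a minimal sketch rather than the platform's method, takes the eight residues spanning P4-P4' around each candidate scissile bond and scores them with a position-specific log-odds matrix built from known cleavage windows. The uniform background, the pseudocount of 1, and the toy training windows are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window(seq, cut, left=4, right=4):
    """Residues P4..P4' for a scissile bond between seq[cut-1] and seq[cut]; None if too close to an end."""
    if cut - left < 0 or cut + right > len(seq):
        return None
    return seq[cut - left:cut + right]


def build_log_odds(windows, pseudocount=1.0):
    """Per-position log2(observed / background), assuming a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        matrix.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                       for aa in AMINO_ACIDS})
    return matrix


def score(matrix, win):
    """Sum of position-specific log-odds; non-standard residues contribute 0."""
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(win))


def scan(matrix, seq):
    """Score every candidate scissile bond in seq, best first, as (cut, score) pairs."""
    hits = []
    for cut in range(1, len(seq)):
        win = window(seq, cut)
        if win is not None:
            hits.append((cut, score(matrix, win)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up P4-P4' training windows with Asp at P1, for illustration only.
training = ["DEVDGSAA", "DEVDGTLK", "DQVDGSPA"]
pssm = build_log_odds(training)
print(scan(pssm, "AAAADEVDGSAAKK")[0])  # best candidate bond and its score
```

A trained classifier on richer features would normally replace the plain log-odds sum, but the windowing step stays the same.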
Affiliation(s)
- Fuyi Li
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Cong Wang
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Xudong Guo
- College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
- Monash Data Futures Institute, Monash University, VIC 3800, Australia
- Lachlan J M Coin
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jiangning Song
- Monash Data Futures Institute, Monash University, VIC 3800, Australia
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia

5

Matveev EV, Safronov VV, Ponomarev GV, Kazanov MD. Predicting Structural Susceptibility of Proteins to Proteolytic Processing. Int J Mol Sci 2023; 24:10761. [PMID: 37445939 DOI: 10.3390/ijms241310761] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/29/2023] [Revised: 06/16/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023]
Abstract
The importance of 3D protein structure in proteolytic processing is well known. However, despite the plethora of existing methods for predicting proteolytic sites, only a few of them utilize the structural features of potential substrates as predictors. Moreover, to our knowledge, there is currently no method available for predicting the structural susceptibility of protein regions to proteolysis. We developed such a method using data from CutDB, a database that contains experimentally verified proteolytic events. For prediction, we utilized structural features that have been shown to influence proteolysis in earlier studies, such as solvent accessibility, secondary structure, and temperature factor. Additionally, we introduced new structural features, including length of protruded loops and flexibility of protein termini. To maximize the prediction quality of the method, we carefully curated the training set, selected an appropriate machine learning method, and sampled negative examples to determine the optimal positive-to-negative class size ratio. We demonstrated that combining our method with models of protease primary specificity can outperform existing bioinformatics methods for the prediction of proteolytic sites. We also discussed the possibility of utilizing this method for bioinformatics prediction of other post-translational modifications.
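The abstract mentions sampling negative examples to find the optimal positive-to-negative class size ratio. A minimal sketch of that step, assuming candidate negatives are all residue positions not annotated as cleavage sites, draws a reproducible random subset at each ratio under test. The function names, seed handling, and ratio grid are hypothetical, and the authors' actual sampling scheme may differ.

```python
import random


def sample_negatives(seq_len, positive_sites, ratio, seed=0):
    """Return sorted negative positions: `ratio` negatives per positive, capped by availability."""
    positives = set(positive_sites)
    candidates = [i for i in range(seq_len) if i not in positives]
    k = min(len(candidates), int(round(ratio * len(positives))))
    rng = random.Random(seed)  # fixed seed makes each dataset reproducible
    return sorted(rng.sample(candidates, k))


def build_dataset(seq_len, positive_sites, ratio, seed=0):
    """Label annotated cleavage sites 1 and sampled non-sites 0."""
    negatives = sample_negatives(seq_len, positive_sites, ratio, seed)
    return [(i, 1) for i in sorted(set(positive_sites))] + [(i, 0) for i in negatives]


# Sweep a few ratios; each dataset would then be used to train and validate a model.
for r in (1, 2, 5):
    data = build_dataset(seq_len=300, positive_sites=[40, 120, 210], ratio=r)
    print(r, sum(1 for _, label in data if label == 0))  # prints 1 3, then 2 6, then 5 15
```

Comparing validation scores across the sweep is one straightforward way to pick the ratio.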
Affiliation(s)
- Evgenii V Matveev
- Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Moscow 127051, Russia
- Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow 117998, Russia
- Vyacheslav V Safronov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119991, Russia
- Gennady V Ponomarev
- Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Moscow 127051, Russia
- Marat D Kazanov
- Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- A.A. Kharkevich Institute for Information Transmission Problems, Moscow 127051, Russia
- Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow 117998, Russia
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey

6

Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. bioRxiv 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage - editing - of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from hepatitis C virus (HCV) and the tobacco etch virus (TEV) protease. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data.
The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
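Node and edge ablation, as mentioned in the abstract, can be sketched generically: knock out one node (here by zeroing its features) or one edge at a time and record how much the model's score drops relative to the intact graph. The `predict` and `predict_graph` callables and the toy linear scorer below are hypothetical stand-ins for a trained PGCN; the abstract does not say exactly how the authors ablated graph elements.

```python
def node_ablation(predict, node_feats):
    """Importance of node i = baseline score - score with node i's features zeroed."""
    baseline = predict(node_feats)
    importances = []
    for i in range(len(node_feats)):
        ablated = [row[:] for row in node_feats]  # copy so the input is untouched
        ablated[i] = [0.0] * len(node_feats[i])   # knock out node i
        importances.append(baseline - predict(ablated))
    return importances


def edge_ablation(predict_graph, adj, node_feats):
    """Importance of each undirected edge (i, j) = baseline - score with that edge removed."""
    baseline = predict_graph(adj, node_feats)
    importances = {}
    for i in range(len(adj)):
        for j in range(i + 1, len(adj)):
            if adj[i][j] != 0:
                pruned = [row[:] for row in adj]
                pruned[i][j] = pruned[j][i] = 0.0
                importances[(i, j)] = baseline - predict_graph(pruned, node_feats)
    return importances


def toy_predict(feats):
    """Hypothetical stand-in for a trained model: a fixed weighted sum of node features."""
    return sum(2.0 * row[0] + 0.5 * row[1] for row in feats)


print(node_ablation(toy_predict, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))  # [2.0, 0.5, 2.5]
```

Large drops flag graph elements the model relies on; with a real model, those would be compared against known protease:substrate recognition constraints.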
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Joseph H. Lubin
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
- Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Guanyang Wang
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
- Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
- Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ

7

Wang H, Julien O. CaspSites: A Database and Web Application for Experimentally Observed Human Caspase Substrates Using N-Terminomics. J Proteome Res 2023; 22:454-461. [PMID: 36696595 DOI: 10.1021/acs.jproteome.2c00620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
CaspSites is a free-to-use database and web application for experimentally observed human caspase substrates using N-terminomics. It can be accessed and used by all users at the web URL www.caspsites.org. CaspSites stores cleavage site information identified for human caspases 1-9 in lysates and apoptotic cells, collected from their corresponding published studies. The database can be queried, viewed, and exported using the search page of the web application. The main parameters offered are protein substrate, cleavage site (P4-P4') residues, and individual caspase data sets, which can be connected using OR, AND, or NOT logical operators for custom user-built queries. CaspSites will be regularly updated with new experimental findings for understudied caspases, providing researchers insight into the distinctive roles human caspases play in cellular processes by identifying their target proteins in relation to each other.
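CaspSites' own query engine is not described beyond the parameters and operators above. As a hypothetical in-memory sketch of how such custom queries compose, each record below holds a substrate identifier, its P4-P4' residues, and the caspase data set it came from, and predicates are combined with AND, OR, and NOT helpers. The records are invented placeholders, not CaspSites entries, and no CaspSites API is implied.

```python
from dataclasses import dataclass
from typing import Callable

POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


@dataclass(frozen=True)
class CleavageSite:
    substrate: str  # protein substrate identifier
    site: str       # eight residues P4..P1 | P1'..P4'
    caspase: str    # data set label, e.g. "CASP3"


Predicate = Callable[[CleavageSite], bool]


def and_(*preds: Predicate) -> Predicate:
    return lambda rec: all(p(rec) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    return lambda rec: any(p(rec) for p in preds)


def not_(pred: Predicate) -> Predicate:
    return lambda rec: not pred(rec)


def caspase_is(name: str) -> Predicate:
    return lambda rec: rec.caspase == name


def residue_at(position: str, residue: str) -> Predicate:
    idx = POSITIONS.index(position)  # raises ValueError for an unknown position label
    return lambda rec: rec.site[idx] == residue


def query(records, pred: Predicate):
    return [rec for rec in records if pred(rec)]


# Placeholder records, not real CaspSites entries.
records = [
    CleavageSite("PROT_A", "DEVDGSAA", "CASP3"),
    CleavageSite("PROT_B", "DEVDGTLK", "CASP7"),
    CleavageSite("PROT_C", "LEHDSGTA", "CASP9"),
]

# (CASP3 OR CASP7) AND Asp at P4 AND NOT Ser at P2'
hits = query(records, and_(or_(caspase_is("CASP3"), caspase_is("CASP7")),
                           residue_at("P4", "D"),
                           not_(residue_at("P2'", "S"))))
print([h.substrate for h in hits])  # ['PROT_B']
```

Building queries from small composable predicates keeps each parameter (substrate, residue position, data set) independent, so any combination of operators can be nested.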
Affiliation(s)
- Henry Wang
- Department of Biochemistry, University of Alberta, Edmonton, Alberta T6G2H7, Canada
- Olivier Julien
- Department of Biochemistry, University of Alberta, Edmonton, Alberta T6G2H7, Canada

8

Hou P, Wang X, Wang H, Wang T, Yu Z, Xu C, Zhao Y, Wang W, Zhao Y, Chu F, Chang H, Zhu H, Lu J, Zhang F, Liang X, Li X, Wang S, Gao Y, He H. The ORF7a protein of SARS-CoV-2 initiates autophagy and limits autophagosome-lysosome fusion via degradation of SNAP29 to promote virus replication. Autophagy 2023; 19:551-569. [PMID: 35670302 PMCID: PMC9851267 DOI: 10.1080/15548627.2022.2084686] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Indexed: 01/27/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is closely related to various cellular aspects associated with autophagy. However, how SARS-CoV-2 subverts the macroautophagy/autophagy pathway remains largely unclear. In this study, we demonstrate that overexpression of the SARS-CoV-2 ORF7a protein activates LC3-II and leads to the accumulation of autophagosomes in multiple cell lines, while knockdown of the viral ORF7a gene via shRNAs targeting ORF7a sgRNA during SARS-CoV-2 infection decreased autophagy levels. Mechanistically, the ORF7a protein initiates autophagy via the AKT-MTOR-ULK1-mediated pathway, but ORF7a limits the progression of autophagic flux by activating CASP3 (caspase 3) to cleave the SNAP29 protein at aspartic acid residue 30 (D30), ultimately impairing complete autophagy. Importantly, autophagosomes that accumulate during SARS-CoV-2 infection promote progeny virus production: ORF7a downregulates SNAP29, resulting in failure of autophagosome fusion with lysosomes, which promotes viral replication.
Taken together, our study reveals a mechanism by which SARS-CoV-2 utilizes the autophagic machinery to facilitate its own propagation via ORF7a.

Abbreviations: 3-MA: 3-methyladenine; ACE2: angiotensin converting enzyme 2; ACTB/β-actin: actin beta; ATG7: autophagy related 7; Baf A1: bafilomycin A1; BECN1: beclin 1; CASP3: caspase 3; COVID-19: coronavirus disease 2019; GFP: green fluorescent protein; hpi: hour post-infection; hpt: hour post-transfection; MAP1LC3/LC3: microtubule associated protein 1 light chain 3; MERS: Middle East respiratory syndrome; MTOR: mechanistic target of rapamycin kinase; ORF: open reading frame; PARP: poly(ADP-ribose) polymerase; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2; shRNAs: short hairpin RNAs; siRNA: small interfering RNA; SNAP29: synaptosome associated protein 29; SQSTM1/p62: sequestosome 1; STX17: syntaxin 17; TCID50: 50% tissue culture infectious dose; TEM: transmission electron microscopy; TUBB: tubulin, beta; ULK1: unc-51 like autophagy activating kinase 1.
Affiliation(s)
- Peili Hou
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Xuefeng Wang
- Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun, China
- Hongmei Wang
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China (corresponding authors: Hongmei Wang, Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, Shandong 250014, China; Yuwei Gao, Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun, Jilin 130122, China; Hongbin He, Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan 250014, China)
- Tiecheng Wang
- Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun, China
- Zhangping Yu
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Chunqing Xu
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Yudong Zhao
- Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun, China
- Wenqi Wang
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun, China
- Yong Zhao
- College of Veterinary Medicine, Shanxi Agricultural University, Jinzhong, China
- Fengyun Chu
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Huasong Chang
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Hongchao Zhu
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Jiahui Lu
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Fuzhen Zhang
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Xue Liang
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Xingyu Li
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Song Wang
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China
- Yuwei Gao
- Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun, China
- Hongbin He
- Ruminant Diseases Research Center, College of Life Sciences, Shandong Normal University, Jinan, China

9

Wang H, Li H, Gao W, Xie J. PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy. Anal Biochem 2022; 658:114935. [PMID: 36206844 DOI: 10.1016/j.ab.2022.114935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 12/30/2022]
Abstract
Identification of ubiquitination sites is central to many biological experiments. Ubiquitination is a type of post-translational protein modification (PTM). It is a key mechanism for increasing protein diversity and plays a vital role in regulating cell function. In recent years, many models have been developed to predict ubiquitination sites in humans, mice and yeast. However, few studies have predicted ubiquitination sites in Arabidopsis thaliana. In view of this, a deep network model named PrUb-EL is proposed to predict ubiquitination sites in Arabidopsis thaliana. First, six features based on the protein sequence are extracted with the amino acid index database (AAindex), dipeptide deviation from expected mean (DDE), dipeptide composition (DPC), blocks substitution matrix (BLOSUM62), enhanced amino acid composition (EAAC) and binary encoding. Second, the synthetic minority over-sampling technique (SMOTE) is used to process the imbalanced data set. Then a new classifier named DG is presented, which includes a Dense block, a Residual block and a Gated recurrent unit (GRU) block. Finally, each of the six feature extraction methods is integrated into the DG model, and an ensemble learning strategy is used to obtain the final prediction result. Experimental results show that PrUb-EL has good predictive ability, with accuracy (ACC) and area under the ROC curve (auROC) values of 91.00% and 97.70%, respectively, using 5-fold cross-validation. In the independent test, the ACC and auROC values are 88.58% and 96.09%, respectively. Compared with previous studies, our model shows significantly improved performance, making it an excellent method for identifying ubiquitination sites in Arabidopsis thaliana. The datasets and code used for the article are available at https://github.com/Tom-Wangy/PreUb-EL.git.
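Two of the named steps are simple to state concretely. Dipeptide composition (DPC) maps a sequence window to the frequencies of the 400 possible dipeptides, and SMOTE creates synthetic minority examples by interpolating between a minority sample and one of its k nearest minority neighbors. The sketch below illustrates both under those textbook definitions in plain Python; it is not PrUb-EL's code (which is in the linked repository), and the example window, k, and seed are assumptions.

```python
import math
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dpc(seq):
    """Dipeptide composition: share of each of the 400 dipeptides among overlapping pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = len(pairs) or 1  # avoid dividing by zero for length-1 input
    return [counts[d] / total for d in DIPEPTIDES]


def smote(minority, n_new, k=5, seed=0):
    """Synthesize n_new minority samples by interpolating toward a random k-nearest neighbor."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [x for j, x in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda x: math.dist(base, x))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


vec = dpc("ACDA")  # overlapping pairs AC, CD, DA
print(len(vec))    # 400
```

Each synthetic point lies on the segment between two real minority samples, which is why SMOTE adds plausible examples rather than duplicates.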
Affiliation(s)
- Houqiang Wang
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Hong Li
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Weifeng Gao
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Jin Xie
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China

10

Zuo Y, Hong Y, Zeng X, Zhang Q, Liu X. MLysPRED: graph-based multi-view clustering and multi-dimensional normal distribution resampling techniques to predict multiple lysine sites. Brief Bioinform 2022; 23:6661182. [PMID: 35953081 DOI: 10.1093/bib/bbac277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/17/2022] [Revised: 06/11/2022] [Accepted: 06/14/2022] [Indexed: 11/13/2022]
Abstract
Post-translational modification of lysine residues (K-PTM) is one of the most widely studied types of PTM. Some lysine residues in proteins can undergo successive or cascading covalent modifications, such as acetylation, crotonylation, methylation and succinylation. The covalent modification of lysine residues may have particular relevance to basic research and drug development. Although many computational methods have been developed to predict lysine PTMs, existing K-PTM prediction methods have each modeled and learned only a single class of K-PTM. This study aims to fill this gap by building a multi-label computational model that can be used directly to predict multiple K-PTMs in proteins. A multi-label prediction model, MLysPRED, is proposed to identify multiple lysine sites using features generated from human protein sequences. In MLysPRED, three multi-label sequence encoding algorithms (MLDBPB, MLPSDAAP, MLPSTAAP) are proposed and combined with three encoding strategies (CHHAA, DR and Kmer) to convert preprocessed lysine sequences into effective numerical features. A multidimensional normal distribution oversampling technique and a graph-based multi-view clustering under-sampling algorithm were proposed for the first time and incorporated to reduce the proportion of the original training samples, and a multi-label nearest neighbor algorithm is used for classification. MLysPRED achieved an Aiming of 92.21%, Coverage of 94.98%, Accuracy of 89.63%, Absolute-True of 81.46% and Absolute-False of 0.0682 on the independent datasets. Additionally, comparison with five existing predictors indicated that MLysPRED is promising for predicting multiple K-PTMs in proteins. For the convenience of experimental scientists, 'MLysPRED' has been deployed as a user-friendly web-server at http://47.100.136.41:8181.
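The five figures reported above are example-based multi-label metrics. For N samples with true label set L_k, predicted set L*_k, and M labels in total: Aiming averages |L_k ∩ L*_k| / |L*_k|, Coverage averages |L_k ∩ L*_k| / |L_k|, Accuracy averages |L_k ∩ L*_k| / |L_k ∪ L*_k|, Absolute-True is the fraction of exact matches, and Absolute-False averages (|L_k ∪ L*_k| - |L_k ∩ L*_k|) / M, so lower is better for Absolute-False only. The sketch below implements these definitions; treating an empty denominator as contributing 0 is an assumption, and the label names and predictions are placeholders.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Example-based multi-label metrics: Aiming, Coverage, Accuracy, Absolute-True, Absolute-False."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need equal-length, non-empty lists of label sets")
    n = len(true_sets)

    def ratio(num, den):
        return num / den if den else 0.0  # assumption: empty denominator contributes 0

    aiming = coverage = accuracy = abs_true = abs_false = 0.0
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        inter, union = len(t & p), len(t | p)
        aiming += ratio(inter, len(p))        # precision-like, per sample
        coverage += ratio(inter, len(t))      # recall-like, per sample
        accuracy += ratio(inter, union)       # Jaccard overlap
        abs_true += 1.0 if t == p else 0.0    # exact match
        abs_false += (union - inter) / n_labels  # Hamming-style loss
    return {
        "Aiming": aiming / n,
        "Coverage": coverage / n,
        "Accuracy": accuracy / n,
        "Absolute-True": abs_true / n,
        "Absolute-False": abs_false / n,
    }


# Placeholder example with four K-PTM labels.
truth = [{"acetyl", "methyl"}, {"succinyl"}, {"crotonyl", "acetyl"}]
pred = [{"acetyl"}, {"succinyl"}, {"crotonyl", "acetyl", "methyl"}]
print(multilabel_metrics(truth, pred, n_labels=4))
```

In this toy case the first sample misses a label (lowering Coverage) and the third adds a spurious one (lowering Aiming), which is how the two metrics separate under- and over-prediction.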
Affiliation(s)
- Yun Zuo
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Yue Hong
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Xiangxiang Zeng
- School of Information Science and Engineering, Hunan University, Changsha, China
- Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology (DLUT), China
- Xiangrong Liu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
11
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. Methods Mol Biol 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM) is a ubiquitous phenomenon in both eukaryotes and prokaryotes that gives rise to enormous proteomic diversity. PTMs mostly come in two flavors: covalent modification of the polypeptide chain and proteolytic cleavage. Understanding and characterizing PTMs is a fundamental step toward understanding the underpinnings of biology. Recent advances in experimental approaches, mainly mass spectrometry-based approaches, have helped immensely in identifying and characterizing PTMs. However, experimental approaches alone are not enough to understand and characterize the more than 450 different types of PTMs, and complementary computational approaches are becoming popular. Owing to advances in deep learning (DL) and the explosion of DL applications across many fields, computational PTM prediction has also seen the development of a plethora of DL-based approaches. In this book chapter, we first review some recent DL-based approaches to PTM site prediction. We also review recent advances in predicting a less-studied PTM, proteolytic cleavage. We describe advances in PTM prediction by highlighting the deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools. Finally, we provide an outlook and possible future research directions for DL-based PTM prediction.
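Many DL-based site predictors begin with the same input step. They cut a fixed-length sequence window centred on the candidate residue and encode it as a one-hot matrix for a convolutional or recurrent network. The minimal sketch below assumes a window half-width of 7 and pads positions past either end of the sequence with a hypothetical gap symbol 'X'. Window sizes, padding symbols and encodings differ between the reviewed tools.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO_ACIDS + "X"  # 'X' pads positions beyond the sequence ends


def extract_window(seq: str, center: int, half_width: int = 7) -> str:
    """Return the 2*half_width+1 residues centred on `center` (0-based),
    padding with 'X' where the window runs off either end."""
    window = []
    for i in range(center - half_width, center + half_width + 1):
        window.append(seq[i] if 0 <= i < len(seq) else "X")
    return "".join(window)


def one_hot(window: str) -> List[List[int]]:
    """One-hot encode a window: one row per position, one column per symbol.
    Non-standard residues are mapped to 'X'."""
    rows = []
    for aa in window:
        aa = aa if aa in ALPHABET else "X"
        rows.append([1 if aa == symbol else 0 for symbol in ALPHABET])
    return rows


seq = "MKTAYIAKQRQISFVKSHFSRQ"
win = extract_window(seq, center=1, half_width=3)
print(win)                          # XXMKTAY
matrix = one_hot(win)
print(len(matrix), len(matrix[0]))  # 7 21
```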
12
Matrikines as mediators of tissue remodelling. Adv Drug Deliv Rev 2022; 185:114240. [PMID: 35378216 DOI: 10.1016/j.addr.2022.114240] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 12/08/2021] [Revised: 02/21/2022] [Accepted: 03/26/2022] [Indexed: 11/21/2022]
Abstract
Extracellular matrix (ECM) proteins confer biomechanical properties, maintain cell phenotype and mediate tissue repair (via release of sequestered cytokines and proteases). In contrast to intracellular proteomes, where proteins are monitored and replaced over short time periods, many ECM proteins function for years (decades in humans) without replacement. The longevity of abundant ECM proteins, such as collagen I and elastin, leaves them vulnerable to damage accumulation and their host organs prone to chronic, age-related diseases. However, ECM protein fragmentation can potentially produce peptide cytokines (matrikines) which may exacerbate and/or ameliorate age- and disease-related ECM remodelling. In this review, we discuss ECM composition, function and degradation and highlight examples of endogenous matrikines. We then critically and comprehensively analyse published studies of matrix-derived peptides used as topical skin treatments, before considering the potential for improvements in the discovery and delivery of novel matrix-derived peptides to skin and internal organs. From this, we conclude that while the translational impact of matrix-derived peptide therapeutics is evident, the mechanisms of action of these peptides are poorly defined. Further, well-designed, multimodal studies are required.
13
Bahatyrevich-Kharitonik B, Medina-Guzman R, Flores-Cortes A, García-Cruzado M, Kavanagh E, Burguillos MA. Cell Death Related Proteins Beyond Apoptosis in the CNS. Front Cell Dev Biol 2022; 9:825747. [PMID: 35096845 PMCID: PMC8794922 DOI: 10.3389/fcell.2021.825747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2021] [Accepted: 12/28/2021] [Indexed: 12/14/2022]
Abstract
Cell death related (CDR) proteins are a diverse group of proteins whose original function was ascribed to apoptotic cell death signaling. Reports of non-apoptotic functions of CDR proteins have recently increased. In this minireview, we comment on recent studies of CDR proteins outside the field of apoptosis in the CNS, covering areas such as the inflammasome and non-apoptotic cell death, cytoskeleton reorganization, synaptic plasticity, mitophagy, neurodegeneration and calcium signaling, among others. Furthermore, we discuss the evolution of proteomic techniques used to predict caspase substrates that could potentially explain their non-apoptotic roles. Finally, we address new concepts in the field of non-apoptotic functions of CDR proteins that require further research, such as the effect of sexual dimorphism on non-apoptotic CDR protein function and the emergence of zymogen-specific caspase functions.
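A simple baseline for nominating candidate caspase cleavage sites is a sequence-motif scan. Caspases cleave after an aspartate at P1, and the executioner caspases-3/-7 prefer a DxxD motif such as DEVD. The sketch below is a deliberately naive scan that assumes the P4 = D, P1 = D pattern. It is not one of the proteomic methods the review discusses and will return many false positives, because real specificity also depends on P1', structure and accessibility.

```python
import re
from typing import List, Tuple

# Lookahead so overlapping D-x-x-D motifs are all reported (assumed pattern).
DXXD = re.compile(r"(?=(D..D))")


def candidate_caspase_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (p1_position, motif) pairs. p1_position is the 1-based index of
    the P1 aspartate; cleavage would occur after this residue."""
    hits = []
    for m in DXXD.finditer(seq):
        p1_index = m.start() + 3  # 0-based index of the P1 Asp
        hits.append((p1_index + 1, m.group(1)))
    return hits


print(candidate_caspase_sites("MSDEVDGAPDQLDG"))
```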
Affiliation(s)
- Bazhena Bahatyrevich-Kharitonik
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Sevilla, and Instituto de Biomedicina de Sevilla (IBiS), Hospital Universitario Virgen del Rocío/CSIC, Seville, Spain
- Rafael Medina-Guzman
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Sevilla, and Instituto de Biomedicina de Sevilla (IBiS), Hospital Universitario Virgen del Rocío/CSIC, Seville, Spain
- Alicia Flores-Cortes
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Sevilla, and Instituto de Biomedicina de Sevilla (IBiS), Hospital Universitario Virgen del Rocío/CSIC, Seville, Spain
- Marta García-Cruzado
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Sevilla, and Instituto de Biomedicina de Sevilla (IBiS), Hospital Universitario Virgen del Rocío/CSIC, Seville, Spain
- Edel Kavanagh
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Sevilla, and Instituto de Biomedicina de Sevilla (IBiS), Hospital Universitario Virgen del Rocío/CSIC, Seville, Spain
- Miguel Angel Burguillos
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Sevilla, and Instituto de Biomedicina de Sevilla (IBiS), Hospital Universitario Virgen del Rocío/CSIC, Seville, Spain
14
Ho CT, Huang YW, Chen TR, Lo CH, Lo WC. Discovering the Ultimate Limits of Protein Secondary Structure Prediction. Biomolecules 2021; 11:1627. [PMID: 34827624 PMCID: PMC8615938 DOI: 10.3390/biom11111627] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/21/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 12/29/2022]
Abstract
Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. About 300 algorithms have been published over the past seven decades, with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; since then, it has long stayed at 81-86%. In the 1990s, the theoretical limit of three-state SSP accuracy was estimated to be 88%. Thus, SSP is now generally considered either no longer challenging or too difficult to improve further. However, we found that the limit of three-state SSP might be underestimated. In addition, there is still much room for improving segment-based and eight-state SSP, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4-5 percentage points higher than previously expected, indicating that SSP remains challenging. The estimated limit of eight-state SSP is 84-87%. Several proposals for improving future SSP algorithms are made based on these results. We hope that these findings will help advance the development of SSP and its applications.
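Three-state accuracy (Q3) is the fraction of residues whose predicted state (helix H, strand E, coil C) matches the reference. Eight-state DSSP labels must first be reduced to three states. The sketch below assumes one commonly used mapping: H, G, I to H; E, B to E; everything else to C. Other reductions exist, and the choice itself shifts reported accuracies. The example strings are hypothetical.

```python
# One common DSSP 8-to-3 state reduction (an assumption; other mappings exist).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "-": "C", "C": "C", "P": "C",
}


def reduce_to_three(states: str) -> str:
    """Map an 8-state DSSP string to 3 states; unknown symbols become coil."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)


def q_accuracy(predicted: str, reference: str) -> float:
    """Per-residue accuracy (Q3 or Q8, depending on the alphabet used)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


reference_8 = "HHHHGGTTEEEEBS--"
predicted_3 = "HHHHHHCCEEEECCCC"
print(q_accuracy(predicted_3, reduce_to_three(reference_8)))
```

The single mismatch here falls on the isolated bridge residue B. Its assignment to E or C depends entirely on the reduction convention.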
Affiliation(s)
- Chia-Tzu Ho
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Yu-Wei Huang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Teng-Ruei Chen
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chia-Hua Lo
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Wei-Cheng Lo
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
15
Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021; 29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]
Abstract
Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. An effective method for predicting the secretory proteins of malaria parasites is essential for developing effective cures and treatments. Biochemical assays can provide the details needed to identify secretory proteins accurately, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies of different computational methods. We also discussed how machine learning can improve the ability of algorithms to identify proteins secreted by malaria parasites.
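Classical machine learning identifiers of this kind typically represent each protein as a fixed-length vector of sequence descriptors before classification. Amino acid composition (AAC) is one of the simplest such descriptors: the frequency of each of the 20 standard residues. The sketch below uses AAC purely as an illustrative assumption, not as the feature set of any specific malaria secretory-protein predictor.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> List[float]:
    """Return the 20-dimensional AAC vector. Fractions are taken over standard
    residues only; non-standard letters (e.g. 'X') are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for aa in seq.upper():
        if aa in counts:
            counts[aa] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = amino_acid_composition("MKKLLAX")
print(dict(zip(AMINO_ACIDS, features)))
```

Because every protein maps to the same 20 dimensions regardless of length, the vectors can be fed directly to standard classifiers.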
Affiliation(s)
- Ting Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Jiamao Chen
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Qian Zhang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Kyle Hippe
- Department of Computer Science, Pacific Lutheran University, United States
- Cassandra Hunt
- Department of Computer Science, Pacific Lutheran University, United States
- Thu Le
- Department of Computer Science, Pacific Lutheran University, United States
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, United States
- Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
16
Zhao YW, Zhang S, Ding H. Recent development of machine learning methods in sumoylation sites prediction. Curr Med Chem 2021; 29:894-907. [PMID: 34525906 DOI: 10.2174/0929867328666210915112030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 07/24/2021] [Accepted: 08/07/2021] [Indexed: 11/22/2022]
Abstract
Sumoylation is an important reversible post-translational modification of proteins that mediates a variety of cellular processes. SUMO-modified proteins can change their subcellular localization, activity and stability. Sumoylation also plays an important role in cellular processes such as transcriptional regulation and signal transduction. Abnormal sumoylation is involved in many diseases, including neurodegeneration and immune-related diseases, as well as in the development of cancer. Therefore, identification of sumoylation sites (SUMO sites) is fundamental to understanding their molecular mechanisms and regulatory roles. In contrast to labor-intensive and costly experimental approaches, in silico prediction of sumoylation sites has attracted much attention for its accuracy, convenience and speed. At present, many computational prediction models have been used to identify SUMO sites, but they have not been comprehensively summarized and reviewed. Therefore, the research progress of relevant models is summarized and discussed in this paper. We briefly summarize the development of bioinformatics methods for sumoylation site prediction, focusing mainly on benchmark dataset construction, feature extraction, machine learning methods, published results and online tools. We hope this review will be helpful to experimental researchers.
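Many SUMO sites fall within the canonical consensus motif ψKxE, where ψ is a large hydrophobic residue and x is any residue. Matching this consensus is a simple baseline that machine learning predictors aim to improve on. The sketch below assumes ψ ∈ {I, L, V, M, F}; definitions of ψ vary between studies. It flags only consensus lysines, whereas many real SUMO sites are non-consensus.

```python
import re
from typing import List

# psi-K-x-E consensus; psi is assumed to be one of I, L, V, M, F.
# The lookahead lets overlapping motifs be found.
SUMO_CONSENSUS = re.compile(r"(?=[ILVMF]K.E)")


def consensus_sumo_lysines(seq: str) -> List[int]:
    """Return 1-based positions of lysines that sit in a psi-K-x-E motif."""
    # m.start() is the 0-based index of psi; the lysine is the next residue.
    return [m.start() + 2 for m in SUMO_CONSENSUS.finditer(seq)]


print(consensus_sumo_lysines("MAVKAEPLKTEK"))
```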
Affiliation(s)
- Yi-Wei Zhao
- School of Medicine, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shihua Zhang
- College of Life Science and Health, Wuhan University of Science and Technology, Wuhan 430065, China
- Hui Ding
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
17
Yang YH, Wang JS, Yuan SS, Liu ML, Su W, Lin H, Zhang ZY. A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods. Curr Med Chem 2021; 29:789-806. [PMID: 34514982 DOI: 10.2174/0929867328666210910125802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]
Abstract
Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5'-triphosphate (ATP) is one such ligand, playing a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions and signaling. Knowing the ATP-binding residues of proteins is helpful for protein function annotation and drug design. However, with the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP-binding residues is cost-ineffective and time-consuming. To address this problem, computational methods have been developed to predict ATP-binding residues. In this review, we briefly summarize the application of machine learning methods to detecting ATP-binding residues of proteins. We expect this review to be helpful for further research.
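ATP-binding residues are a small minority of all residues in a protein. Plain accuracy can therefore look high even for a predictor that finds no binding sites. Imbalance-aware measures such as the Matthews correlation coefficient (MCC) are commonly reported alongside accuracy for this kind of per-residue task. The minimal sketch below assumes binary per-residue labels (1 = ATP-binding) and uses hypothetical data.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = positive).
    Returns 0.0 when any marginal count is zero (a common convention)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Calling every residue "non-binding" gives 90% accuracy on this toy chain,
# but an MCC of 0.
labels = [1] + [0] * 9
print(mcc(labels, [0] * 10))
```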
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054. China
- Jia-Shu Wang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Meng-Lu Liu
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Su
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
18
Yan K, Wen J, Liu JX, Xu Y, Liu B. Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2008-2016. [PMID: 31940548 DOI: 10.1109/tcbb.2020.2966450] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/10/2023]
Abstract
Protein fold recognition is one of the most essential steps in protein structure prediction; it aims to classify proteins into known protein folds. There are two main computational approaches: template-based methods, which rely on alignment scores between query-template protein pairs, and machine learning methods, which rely on a feature representation and a classifier. Each approach has its own advantages and disadvantages. Can these methods be combined to build more accurate predictors for protein fold recognition? In this study, we made an initial attempt and proposed two novel algorithms: TSVM-fold and ESVM-fold. TSVM-fold is based on Support Vector Machines (SVMs) and uses a set of pairwise sequence similarity scores generated by three complementary template-based methods: HHblits, SPARKS-X, and DeepFR. These scores measure the global relationships between query sequences and templates, and the resulting features were fed into the SVMs for prediction. TSVM-fold was then further combined with the HHblits algorithm to improve its generalization ability; the combined method is called ESVM-fold. Experimental results on two rigorous benchmark datasets (LE and YK) showed that the proposed methods outperform several state-of-the-art methods, indicating that TSVM-fold and ESVM-fold are efficient predictors for protein fold recognition.
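The core idea described for TSVM-fold is to turn a query's alignment scores against a set of templates, produced by several template-based methods, into a single feature vector for a classifier. The sketch below shows only that feature-construction step, plus a nearest-template baseline for contrast. The method keys, template names, scores and fold labels are all hypothetical placeholders, and the SVM itself is omitted.

```python
from typing import Dict, List

# scores[method][template] = similarity score of the query to that template.
Scores = Dict[str, Dict[str, float]]


def score_feature_vector(scores: Scores, methods: List[str], templates: List[str]) -> List[float]:
    """Concatenate per-method scores in a fixed (method, template) order, so
    every query yields a vector of length len(methods) * len(templates).
    Missing scores default to 0.0."""
    return [scores.get(m, {}).get(t, 0.0) for m in methods for t in templates]


def nearest_template_fold(scores: Scores, method: str, fold_of: Dict[str, str]) -> str:
    """Template-based baseline: assign the fold of the top-scoring template."""
    best = max(scores[method], key=scores[method].get)
    return fold_of[best]


methods = ["hhblits", "sparksx", "deepfr"]  # hypothetical keys
templates = ["t1", "t2", "t3"]
fold_of = {"t1": "fold_A", "t2": "fold_B", "t3": "fold_A"}
query_scores = {
    "hhblits": {"t1": 35.2, "t2": 12.0, "t3": 30.1},
    "sparksx": {"t1": 4.1, "t2": 5.3, "t3": 3.9},
    "deepfr": {"t1": 0.71, "t3": 0.65},
}
print(score_feature_vector(query_scores, methods, templates))
print(nearest_template_fold(query_scores, "hhblits", fold_of))
```

In this toy case the two template methods disagree on the nearest template. A learned classifier over the combined vector can weigh that evidence instead of trusting one method's top hit.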
19
Jia C, Zhang M, Fan C, Li F, Song J. Formator: Predicting Lysine Formylation Sites Based on the Most Distant Undersampling and Safe-Level Synthetic Minority Oversampling. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1937-1945. [PMID: 31804942 DOI: 10.1109/tcbb.2019.2957758] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 06/10/2023]
Abstract
Lysine formylation is a reversible type of protein post-translational modification that has been found to be involved in a myriad of biological processes, including modulation of chromatin conformation and gene expression in histones and other nuclear proteins. Accurate identification of lysine formylation sites is essential for elucidating the underlying molecular mechanisms of formylation. Traditional experimental methods are time-consuming and expensive, so it is desirable to develop computational methods for accurate prediction of formylation sites. In this study, we propose a novel predictor, termed Formator, for identifying lysine formylation sites from sequence information. Formator is developed using an ensemble learning (EL) strategy that combines four individual support vector machine classifiers via a voting system. Moreover, the most distant undersampling and Safe-Level-SMOTE oversampling techniques were integrated to deal with the data imbalance of the training dataset. Four effective feature extraction methods, namely bi-profile Bayes (BPB), k-nearest neighbor (KNN), amino acid physicochemical properties (AAindex), and composition and transition (CTD), were employed to encode the sequence features surrounding potential formylation sites. Extensive empirical studies show that Formator achieved accuracies of 87.24% and 74.96% on the jackknife test and the independent test, respectively. Performance comparison on the independent test indicates that Formator outperforms the existing prediction tool LFPred, suggesting that it has great potential to serve as a useful tool for identifying novel lysine formylation sites and facilitating hypothesis-driven experimental efforts.
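Formator uses Safe-Level-SMOTE, a variant of SMOTE that adjusts where synthetic points are placed according to how many minority-class neighbours each sample has. The sketch below implements only plain SMOTE, the underlying interpolation step, as a simplification. Each synthetic sample lies on the segment between a randomly chosen minority sample and one of its k nearest minority neighbours. The feature vectors are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_synthetic: int, k: int = 3, seed: int = 0) -> List[Vector]:
    """Plain SMOTE (not Safe-Level-SMOTE): interpolate between minority samples
    and their k nearest minority neighbours under Euclidean distance."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Sort the other minority samples by distance to x; keep the k nearest.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_synthetic=5)
print(len(new_samples))
```

Safe-Level-SMOTE would additionally bias `gap` toward the sample that lies in a denser minority region, which reduces synthetic points in noisy borderline areas.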
20
Conde-Rubio MDC, Mylonas R, Widmann C. The proteolytic landscape of cells exposed to non-lethal stresses is shaped by executioner caspases. Cell Death Discov 2021; 7:164. [PMID: 34226511 PMCID: PMC8257705 DOI: 10.1038/s41420-021-00539-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/06/2021] [Revised: 04/26/2021] [Accepted: 05/29/2021] [Indexed: 02/06/2023]
Abstract
Cells constantly adapt to environmental changes to ensure their proper functioning. When exposed to stresses, cells activate specific pathways to elicit adaptive modifications. These changes can be mediated by selective modulation of gene and protein expression as well as by post-translational modifications, such as phosphorylation and proteolytic processing. Protein cleavage, as a controlled and limited post-translational modification, is involved in diverse physiological processes such as the maintenance of protein homeostasis, activation of repair pathways, apoptosis and the regulation of proliferation. Here we used quantitative proteomics to assess the proteolytic landscape in two cell lines subjected to low cisplatin concentrations as a mild, non-lethal stress paradigm. This landscape was compared with the one obtained in the same cells stimulated with apoptosis-inducing cisplatin concentrations. These analyses were performed in wild-type cells and in cells lacking the two main executioner caspases, caspase-3 and caspase-7. Ninety-two proteins were found to be cleaved at one or a few sites (discrete cleavage) under low-stress conditions, compared with four hundred and fifty-three in apoptotic cells. Many of the proteins cleaved in stressed cells were also cleaved under apoptotic conditions. As expected, ~90% of the cleavage events in apoptotic cells were dependent on caspase-3/caspase-7. Strikingly, upon exposure to non-lethal stresses, no discrete cleavage was detected in cells lacking caspase-3 and caspase-7. This indicates that the proteolytic landscape in stressed viable cells fully depends on the activity of executioner caspases. These results suggest that the so-called executioner caspases fulfill important stress-adaptive functions distinct from their role in apoptosis. Mass spectrometry data are available via ProteomeXchange with identifier PXD023488.
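The caspase dependence reported above comes down to a set comparison. A cleavage event counts as caspase-3/-7-dependent if it is detected in wild-type cells but not in cells lacking both caspases. The minimal sketch below shows that bookkeeping using hypothetical (protein, position) identifiers. It does not reproduce the study's quantitative detection thresholds.

```python
from typing import Set, Tuple

Site = Tuple[str, int]  # (protein identifier, cleavage position), hypothetical


def caspase_dependence(wild_type: Set[Site], caspase_null: Set[Site]) -> float:
    """Fraction of wild-type cleavage events that are absent from cells
    lacking caspase-3 and caspase-7."""
    if not wild_type:
        return 0.0
    dependent = wild_type - caspase_null
    return len(dependent) / len(wild_type)


wt = {("P1", 120), ("P2", 45), ("P3", 300), ("P4", 77)}
dko = {("P3", 300)}
print(caspase_dependence(wt, dko))
```

Under this framing, the non-lethal-stress result corresponds to an empty knockout set and therefore a dependence of 1.0.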
Affiliation(s)
- Roman Mylonas
- Protein Analysis Facility, University of Lausanne, Génopode, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Amphipole, Lausanne, Switzerland
- Christian Widmann
- Department of Biomedical Sciences, University of Lausanne, Bugnon 7, Lausanne, Switzerland
21
Chen YZ, Wang ZZ, Wang Y, Ying G, Chen Z, Song J. nhKcr: a new bioinformatics tool for predicting crotonylation sites on human nonhistone proteins based on deep learning. Brief Bioinform 2021; 22:6277413. [PMID: 34002774 DOI: 10.1093/bib/bbab146] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/22/2021] [Revised: 03/18/2021] [Accepted: 03/25/2021] [Indexed: 12/20/2022]
Abstract
Lysine crotonylation (Kcr) is a newly discovered type of protein post-translational modification that has been reported to be involved in various pathophysiological processes. High-resolution mass spectrometry is the primary approach for identifying Kcr sites. However, experimental approaches for identifying Kcr sites are often time-consuming and expensive compared with computational approaches. To date, several predictors for Kcr sites have been developed, most of which predict crotonylation sites either on histones alone or on histone and nonhistone proteins mixed together. These methods are highly diverse in their algorithms, encoding schemes, feature selection techniques and performance assessment strategies. However, none of them was designed specifically for predicting Kcr sites on nonhistone proteins. It is therefore desirable to develop an effective predictor for identifying Kcr sites from the large amount of nonhistone sequence data. For this purpose, we first provide a comprehensive review of six methods for predicting crotonylation sites. Second, we develop a novel deep learning-based computational framework, termed CNNrgb, for Kcr site prediction on nonhistone proteins by integrating different types of features. We benchmark its performance against multiple commonly used machine learning classifiers (including random forest, LogitBoost, naïve Bayes and logistic regression) using both 10-fold cross-validation and an independent test. The results show that the proposed CNNrgb framework achieves the best performance, with high computational efficiency on large datasets. Moreover, to facilitate investigation of Kcr sites on human nonhistone proteins, we implement an online server called nhKcr and compare it with other existing tools to illustrate the utility and robustness of our method. The nhKcr web server and all the datasets used in this study are freely accessible at http://nhKcr.erc.monash.edu/.
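The 10-fold cross-validation used to benchmark CNNrgb splits the labelled sites into ten folds, then trains on nine folds and tests on the held-out one, rotating through all ten. The minimal stratified sketch below assumes binary site labels and keeps roughly the same positive/negative ratio in every fold. The abstract does not state the paper's exact splitting procedure, and the data are hypothetical.

```python
import random
from typing import List, Tuple


def stratified_kfold(labels: List[int], k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs. Each class is shuffled and
    dealt round-robin across folds so class ratios stay similar per fold."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


labels = [1] * 20 + [0] * 80
splits = stratified_kfold(labels, k=10)
print([sum(labels[i] for i in test) for _, test in splits])  # 2 positives per fold
```

Stratification matters here because Kcr-positive lysines are typically far rarer than negatives, and an unlucky unstratified split could leave a fold with almost no positives.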
Affiliation(s)
- Yong-Zi Chen
- Laboratory of Tumor Cell Biology, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
- Guoguang Ying
- Laboratory of Tumor Cell Biology in Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, China
- Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Australia
22
Zhang D, Xu ZC, Su W, Yang YH, Lv H, Yang H, Lin H. iCarPS: a computational tool for identifying protein carbonylation sites by novel encoded features. Bioinformatics 2021; 37:171-177. [PMID: 32766811 DOI: 10.1093/bioinformatics/btaa702] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 05/11/2020] [Revised: 07/12/2020] [Accepted: 07/28/2020] [Indexed: 12/13/2022]
Abstract
MOTIVATION Protein carbonylation is one of the most important oxidative stress-induced post-translational modifications and is generally characterized by its stability, irreversibility and relatively early formation. It plays a significant role in orchestrating various biological processes and has already been shown to be related to many diseases. However, experimental technologies for identifying carbonylation sites are not only costly and time-consuming but also unable to process large numbers of proteins at a time. Thus, rapid and effective identification of carbonylation sites by computational methods could provide key clues for the analysis of the occurrence and development of diseases. RESULTS In this study, we developed a predictor called iCarPS to identify carbonylation sites based on sequence information. A novel feature encoding scheme, called residues conical coordinates combined with their physicochemical properties, was proposed to formulate carbonylated and non-carbonylated protein samples. To remove potentially redundant features and improve prediction performance, a feature selection technique was applied. The accuracy and robustness of iCarPS were demonstrated by experiments on training and independent datasets. Comparison with other published methods showed that the proposed method performs strongly for carbonylation site identification. AVAILABILITY AND IMPLEMENTATION Based on the proposed model, a user-friendly webserver and a software package were constructed, which can be freely accessed at http://lin-group.cn/server/iCarPS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dan Zhang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Wei Su
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yu-He Yang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lv
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
23
Li S, Yu K, Wu G, Zhang Q, Wang P, Zheng J, Liu ZX, Wang J, Gao X, Cheng H. pCysMod: Prediction of Multiple Cysteine Modifications Based on Deep Learning Framework. Front Cell Dev Biol 2021; 9:617366. [PMID: 33732693 PMCID: PMC7959776 DOI: 10.3389/fcell.2021.617366] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/14/2020] [Accepted: 01/12/2021] [Indexed: 12/18/2022]
Abstract
Thiol groups on cysteines can undergo multiple post-translational modifications (PTMs), acting as a molecular switch that maintains redox homeostasis and regulates a series of cell signaling transductions. Identifying the various protein cysteine modifications is crucial for dissecting their underlying regulatory mechanisms. Compared with time-consuming and labor-intensive experimental methods, computational methods have attracted intense research interest due to their convenience and low cost. Here, we developed pCysMod, the first comprehensive deep learning-based tool for predicting multiple protein cysteine modifications, including S-nitrosylation, S-palmitoylation, S-sulfenylation, S-sulfhydration, and S-sulfinylation. Experimentally verified cysteine sites curated from the literature and sites collected from other databases and prediction tools were integrated as a benchmark dataset. Several protein sequence features were extracted and combined in a deep learning model, whose hyperparameters were optimized by particle swarm optimization. Cross-validation indicated that the model was robust and outperformed existing tools, achieving average AUCs of 0.793, 0.807, 0.796, 0.793, and 0.876 for S-nitrosylation, S-palmitoylation, S-sulfenylation, S-sulfhydration, and S-sulfinylation, respectively, demonstrating that pCysMod is stable and suitable for protein cysteine modification prediction. In addition, we constructed a comprehensive protein cysteine modification prediction web server based on this model to help researchers find potential modification sites in their proteins of interest; it can be accessed at http://pcysmod.omicsbio.info. This work is expected to promote the study of protein cysteine modification and contribute to clarifying the biological regulatory mechanisms of cysteine modification within and among cells.
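Each AUC value reported above can be read as the probability that a randomly chosen modified cysteine receives a higher predictor score than a randomly chosen unmodified one. The minimal sketch below computes AUC in that rank-based (Mann-Whitney) form, counting ties as one half. It compares all positive/negative pairs, which is fine for illustration but slow on large datasets. The scores and labels are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg),
    estimated over all positive/negative pairs (O(n_pos * n_neg))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.2, 0.1]))
```

Here five of the six positive/negative pairs are ranked correctly, giving an AUC of 5/6. An AUC of 0.5 would correspond to random scoring.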
Affiliation(s)
- Shihua Li
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Kai Yu
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Guandi Wu
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Qingfeng Zhang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Panqin Wang
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Jian Zheng
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Ze-Xian Liu
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Jichao Wang
- CAS Key Lab of Biobased Materials, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, China
- Xinjiao Gao
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, China
- Han Cheng
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
24
Liu P, Song J, Lin CY, Akutsu T. ReCGBM: a gradient boosting-based method for predicting human dicer cleavage sites. BMC Bioinformatics 2021; 22:63. [PMID: 33568063 PMCID: PMC7877110 DOI: 10.1186/s12859-021-03993-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/10/2020] [Accepted: 02/02/2021] [Indexed: 11/30/2022]
Abstract
Background Human dicer is an enzyme that cleaves pre-miRNAs into miRNAs. Several models have been developed to predict human dicer cleavage sites, including PHDCleav and LBSizeCleav. Given an input sequence, these models predict whether the sequence contains a cleavage site. However, they consider each sequence independently and lack interpretability. An accurate and explainable predictor that exploits relations between different sequences is therefore needed to improve understanding of the mechanism by which human dicer cleaves pre-miRNAs. Results In this study, we develop ReCGBM, an accurate and explainable predictor of human dicer cleavage sites. We design relational features and class features as inputs to a LightGBM model. Computational experiments show that ReCGBM outperforms existing methods. Further, we find that features close to the center of the pre-miRNA are more important and contribute significantly to the performance improvement of the method. Conclusions The results of this study show that ReCGBM is an interpretable and accurate predictor. In addition, the feature-importance analysis suggests that future predictors may benefit from considering more informative features close to the center of the pre-miRNA.
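ReCGBM is described as feeding relational and class features into a LightGBM model and then inspecting feature importance. LightGBM adds histogram-based split finding, leaf-wise tree growth and other optimizations, none of which are reproduced here. Under those simplifying assumptions, the sketch below shows only the core gradient-boosting idea: depth-1 trees (stumps) fitted to squared-loss residuals, plus a simple split-count measure of feature importance. The toy data are hypothetical.

```python
from collections import Counter

def fit_stump(xs, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    for f in range(len(xs[0])):
        for t in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= t]
            right = [r for x, r in zip(xs, residuals) if x[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]  # (feature, threshold, left_value, right_value)

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with depth-1 trees under squared loss."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual (up to a constant).
        residuals = [y - p for y, p in zip(ys, preds)]
        f, t, lv, rv = fit_stump(xs, residuals)
        stumps.append((f, t, lv, rv))
        preds = [p + learning_rate * (lv if x[f] <= t else rv)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(
            lv if x[f] <= t else rv for f, t, lv, rv in stumps)

    return predict, stumps

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
xs = [(i, (i * 7) % 5) for i in range(10)]
ys = [0.0 if i < 5 else 1.0 for i in range(10)]
predict, stumps = gradient_boost(xs, ys)
split_counts = Counter(f for f, _, _, _ in stumps)  # split-count importance
```

Because every stump here splits on feature 0, the split-count importance concentrates on that feature. The same kind of reasoning, applied to richer models, underlies statements about which sequence positions matter most.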
Affiliation(s)
- Pengyu Liu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
- Chun-Yu Lin
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices, National Chiao Tung University, Hsinchu, 300, Taiwan
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan
25
Na J, Newman JA, Then CK, Syed J, Vendrell I, Torrecilla I, Ellermann S, Ramadan K, Fischer R, Kiltie AE. SPRTN protease-cleaved MRE11 decreases DNA repair and radiosensitises cancer cells. Cell Death Dis 2021; 12:165. [PMID: 33558481 PMCID: PMC7870818 DOI: 10.1038/s41419-021-03437-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/05/2020] [Revised: 01/07/2021] [Accepted: 01/11/2021] [Indexed: 12/21/2022]
Abstract
The human MRE11/RAD50/NBS1 (MRN) complex plays a crucial role in sensing and repairing DNA double-strand breaks (DSBs). MRE11 possesses both 3'-5' exonuclease and endonuclease activities and forms the core of the multifunctional MRN complex. We previously identified a C-terminally truncated form of MRE11 (TR-MRE11) associated with post-translational MRE11 degradation. Here, we identified SPRTN as the protease essential for the formation of TR-MRE11 and characterised the role of this MRE11 form in the DNA damage response (DDR). Using tandem mass spectrometry and site-directed mutagenesis, we located the SPRTN-dependent cleavage site in MRE11 between amino acids 559 and 580. Although TR-MRE11 still interacted with its constitutive core complex partners RAD50 and NBS1, both nuclease activities of truncated MRE11 were dramatically reduced because of deficient DNA binding. Furthermore, loss of the MRE11 C-terminus decreased homologous recombination (HR) repair efficiency, very likely because recruitment of TR-MRE11 to sites of DNA damage was abolished, which in turn increased cellular radiosensitivity. The presence of this DNA repair-defective TR-MRE11 could explain our previous finding that high MRE11 protein expression, as measured by immunohistochemistry, correlates with improved survival after radical radiotherapy in bladder cancer patients.
Affiliation(s)
- Juri Na
- MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Joseph A Newman
- Centre for Medicines Discovery, University of Oxford, Oxford, UK
- Chee Kin Then
- MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Junetha Syed
- MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Iolanda Vendrell
- MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Ignacio Torrecilla
- MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Sophie Ellermann
- MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Kristijan Ramadan
- MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Roman Fischer
- Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Anne E Kiltie
- MRC Oxford Institute for Radiation Oncology, Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
26
Atkin-Smith GK, Miles MA, Tixeira R, Lay FT, Duan M, Hawkins CJ, Phan TK, Paone S, Mathivanan S, Hulett MD, Chen W, Poon IKH. Plexin B2 Is a Regulator of Monocyte Apoptotic Cell Disassembly. Cell Rep 2020; 29:1821-1831.e3. [PMID: 31722200 DOI: 10.1016/j.celrep.2019.10.014] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 07/18/2019] [Revised: 09/10/2019] [Accepted: 10/03/2019] [Indexed: 12/18/2022]
Abstract
Billions of cells undergo apoptosis daily and often fragment into small, membrane-bound extracellular vesicles termed apoptotic bodies (ApoBDs). We demonstrate that apoptotic monocytes undergo a highly coordinated disassembly process, forming long, beaded protrusions (termed beaded apoptopodia) that fragment to release ApoBDs. We find that plexin B2 (PlexB2), a transmembrane receptor that regulates axonal guidance in neurons, is enriched in the ApoBDs of THP1 monocytes and is a caspase-3/7 substrate. To determine whether PlexB2 is involved in the disassembly of apoptotic monocytes, we generate PlexB2-deficient THP1 monocytes and demonstrate that lack of PlexB2 impairs the formation of beaded apoptopodia and ApoBDs. Consequently, loss of PlexB2 in apoptotic THP1 monocytes impairs their uptake by both professional and non-professional phagocytes. Altogether, these data identify PlexB2 as a positive regulator of apoptotic monocyte disassembly and demonstrate the importance of this process in apoptotic cell clearance.
Affiliation(s)
- Georgia K Atkin-Smith
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Mark A Miles
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Rochelle Tixeira
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Fung T Lay
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Mubing Duan
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Christine J Hawkins
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Thanh Kha Phan
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Stephanie Paone
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Suresh Mathivanan
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Mark D Hulett
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Weisan Chen
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- Ivan K H Poon
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
27
Li S, Yu K, Wang D, Zhang Q, Liu ZX, Zhao L, Cheng H. Deep learning based prediction of species-specific protein S-glutathionylation sites. Biochim Biophys Acta Proteins Proteom 2020; 1868:140422. [DOI: 10.1016/j.bbapap.2020.140422] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/16/2020] [Revised: 03/12/2020] [Accepted: 03/26/2020] [Indexed: 02/08/2023]
28
Juan SH, Chen TR, Lo WC. A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy. PLoS One 2020; 15:e0235153. [PMID: 32603341 PMCID: PMC7326220 DOI: 10.1371/journal.pone.0235153] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/17/2019] [Accepted: 06/09/2020] [Indexed: 01/06/2023]
Abstract
Protein secondary structure prediction is a classic topic in computational structural biology with a variety of applications. Over the past decade, state-of-the-art algorithms have achieved prediction accuracies above 80%; meanwhile, the time cost of prediction has increased rapidly because of the exponential growth of the fundamental protein sequence data. Based on literature studies and preliminary observations of how the size and homology of the fundamental protein dataset relate to the speed and accuracy of predictions, we proposed two hypotheses that might help identify the main factors influencing the efficiency of secondary structure prediction. Experiments that reduced the size and homology of the fundamental protein dataset supported these hypotheses. Shrinking the dataset substantially reduced the time cost of prediction at the expense of a slight decrease in accuracy, whereas reducing its homology instead increased accuracy. Moreover, Shannon information entropy could explain how accuracy was influenced by the size and homology of the dataset. Based on these findings, we proposed that an appropriate combination of size and homology reduction of the protein dataset could speed up secondary structure prediction while preserving the high accuracy of state-of-the-art algorithms. When the proposed strategy was tested with the 2018 fundamental protein dataset provided by the Universal Protein Resource (UniProt), prediction speed increased more than 20-fold while all accuracy measures remained comparably high. These findings should help improve the efficiency of research and applications that depend on protein secondary structure prediction. To facilitate future implementations of the proposed strategy, we have established a database of size- and homology-reduced protein datasets at http://10.life.nctu.edu.tw/UniRefNR.
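The abstract invokes Shannon information entropy, H = -Σ p_i log2(p_i), to explain how dataset size and homology affect accuracy. It does not say which distribution the entropy was computed over. Assuming, for illustration only, that it is taken over the pooled amino acid composition of a dataset, a minimal sketch is shown below. The function name and the toy sequence sets are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(sequences):
    """Shannon entropy (in bits) of the pooled residue composition of `sequences`."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)  # count every residue character
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy sets: a low-diversity set vs. one using all 20 residues evenly.
low = shannon_entropy(["AAAA", "AAAL"])
high = shannon_entropy(["ACDEFGHIKLMNPQRSTVWY"])
```

Higher entropy means the residues are spread more evenly, i.e. the set is more diverse. This is one intuitive reason why removing redundant, homologous sequences can preserve useful information while shrinking a dataset.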
Affiliation(s)
- Sheng-Hung Juan
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Teng-Ruei Chen
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Wei-Cheng Lo
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- The Center for Bioinformatics Research, National Chiao Tung University, Hsinchu, Taiwan
29
Tange N, Hayakawa F, Yasuda T, Odaira K, Yamamoto H, Hirano D, Sakai T, Terakura S, Tsuzuki S, Kiyoi H. Staurosporine and venetoclax induce the caspase-dependent proteolysis of MEF2D-fusion proteins and apoptosis in MEF2D-fusion (+) ALL cells. Biomed Pharmacother 2020; 128:110330. [PMID: 32504922 DOI: 10.1016/j.biopha.2020.110330] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/19/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 01/01/2023]
Abstract
MEF2D-fusion (M-fusion) genes are newly discovered recurrent gene abnormalities detected in approximately 5% of acute lymphoblastic leukemia (ALL) cases. Introducing them into cells has been reported to transform cell lines or increase the colony formation of bone marrow cells, suggesting that they support cell survival; this prompted us to look for drugs that target M-fusion proteins. To identify compounds that reduce MEF2D protein levels, we developed a high-throughput screening system using 293T cells stably expressing a fusion of MEF2D and luciferase, in which MEF2D protein levels can be measured easily by luciferase assay. Using this system, we screened 3766 compounds with known pharmaceutical activities and selected staurosporine as a potential inducer of MEF2D proteolysis. Staurosporine induced the proteolysis of M-fusion proteins in M-fusion (+) ALL cell lines. This proteolysis was inhibited by caspase inhibitors but not by proteasome inhibitors, suggesting that it is caspase dependent. Consistent with this result, the growth inhibitory effects of staurosporine were stronger in M-fusion (+) ALL cell lines than in M-fusion-negative cell lines, and caspase inhibitors blocked staurosporine-induced apoptosis. We identified the caspase cleavage site of MEF2D-HNRNPUL1 and confirmed that a cleavage-resistant mutant was resistant to staurosporine-induced proteolysis. Based on these results, we investigated venetoclax, a Food and Drug Administration (FDA)-approved drug that also activates caspases. Venetoclax had effects similar to those of staurosporine, namely proteolysis of M-fusion proteins and strong growth inhibition of M-fusion (+) ALL cell lines. The present study provides novel insights into drug screening strategies and the clinical indications of venetoclax.
Affiliation(s)
- Naoyuki Tange
- Department of Hematology and Oncology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Fumihiko Hayakawa
- Department of Hematology and Oncology, Nagoya University Graduate School of Medicine, Nagoya, Japan; Department of Pathophysiological Laboratory Sciences, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Takahiko Yasuda
- Clinical Research Center, Nagoya Medical Center, National Hospital Organization, Nagoya, Japan
- Koya Odaira
- Department of Pathophysiological Laboratory Sciences, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Hideyuki Yamamoto
- Department of Hematology and Oncology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Daiki Hirano
- Department of Hematology and Oncology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Toshiyasu Sakai
- Department of Hematology and Oncology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Seitaro Terakura
- Department of Hematology and Oncology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Shinobu Tsuzuki
- Department of Biochemistry, Aichi Medical University, School of Medicine, Japan
- Hitoshi Kiyoi
- Department of Hematology and Oncology, Nagoya University Graduate School of Medicine, Nagoya, Japan
30
Feng CQ, Zhang ZY, Zhu XJ, Lin Y, Chen W, Tang H, Lin H. iTerm-PseKNC: a sequence-based tool for predicting bacterial transcriptional terminators. Bioinformatics 2020; 35:1469-1477. [PMID: 30247625 DOI: 10.1093/bioinformatics/bty827] [Citation(s) in RCA: 156] [Impact Index Per Article: 31.2] [Received: 08/12/2018] [Revised: 09/13/2018] [Accepted: 09/20/2018] [Indexed: 12/31/2022]
Abstract
MOTIVATION Transcription termination is an important regulatory step in gene expression. Without a terminator, transcription cannot stop, resulting in abnormal gene expression. Detecting terminators can help determine operon structure in bacteria and improve genome annotation. Accurate identification of transcriptional terminators is therefore essential for research on transcriptional regulation. RESULTS In this study, we developed a new predictor, 'iTerm-PseKNC', based on a support vector machine (SVM) to identify transcription terminators. A binomial distribution approach was used to select the optimal feature subset derived from pseudo k-tuple nucleotide composition (PseKNC). In 5-fold cross-validation, the proposed method achieved an accuracy of 95%. To further evaluate the generalization ability of 'iTerm-PseKNC', the model was tested on independent datasets of experimentally confirmed Rho-independent terminators in the Escherichia coli and Bacillus subtilis genomes. All terminators in E. coli and 87.5% of those in B. subtilis were correctly identified, suggesting that the proposed model could become a powerful tool for bacterial terminator recognition. AVAILABILITY AND IMPLEMENTATION For the convenience of experimental researchers, a web server for 'iTerm-PseKNC' was established at http://lin-group.cn/server/iTerm-PseKNC/, which lets users obtain results without working through the underlying mathematical equations.
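PseKNC combines plain k-tuple nucleotide frequencies with additional sequence-order correlation terms derived from physicochemical properties. Only the k-tuple part is sketched here. The `binomial_tail` helper shows one common way a binomial distribution can rank such features: it measures how strongly a k-mer is over-represented in one class relative to its overall frequency. Whether iTerm-PseKNC computes its scores exactly this way is an assumption. The function names and the counts in the usage lines are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb

def kmer_composition(seq, k):
    """Normalized frequencies of all 4**k DNA k-tuples; non-ACGT k-mers are ignored."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid dividing by zero
    return {m: counts[m] / total for m in kmers}

def binomial_tail(n_in_class, class_total, prior):
    """P(X >= n_in_class) for X ~ Binomial(class_total, prior).

    A small value means the k-mer occurs in this class more often than its
    overall frequency `prior` would predict, i.e. it is more discriminative.
    """
    return sum(comb(class_total, i) * prior ** i * (1 - prior) ** (class_total - i)
               for i in range(n_in_class, class_total + 1))

comp = kmer_composition("ACGTAC", 2)  # dinucleotides: AC, CG, GT, TA, AC
# Hypothetical counts: a k-mer seen 30 times among 200 k-mers in one class,
# with an overall frequency of 0.05 across both classes.
p = binomial_tail(30, 200, 0.05)
```

Features would then be sorted by this tail probability, smallest first, and a subset chosen (for example by cross-validated accuracy) before training the classifier.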
Affiliation(s)
- Chao-Qin Feng
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhao-Yue Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xiao-Juan Zhu
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yan Lin
- Key Laboratory for Animal Disease Resistance Nutrition of the Ministry of Education, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China
- Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan, China
- Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
31
Li HF, Wang XF, Tang H. Predicting Bacteriophage Enzymes and Hydrolases by Using Combined Features. Front Bioeng Biotechnol 2020; 8:183. [PMID: 32266225 PMCID: PMC7105632 DOI: 10.3389/fbioe.2020.00183] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/02/2020] [Accepted: 02/24/2020] [Indexed: 12/19/2022]
Abstract
Bacteriophages are viruses that infect bacteria, and they have been applied in the treatment of pathogenic bacterial infections. Phage enzymes and hydrolases play the most important role in destroying bacterial cells. Correctly identifying phage-encoded hydrolases is not only beneficial for studying their function but also conducive to antibacterial drug discovery. This work therefore aims to recognize enzymes and hydrolases in phages. A combination of different features was used to represent the samples. A feature selection technique based on analysis of variance (ANOVA) was used to optimize the features, and classification was performed with a support vector machine (SVM). Prediction proceeds in two steps: the first identifies phage enzymes, and the second determines whether a phage enzyme is a hydrolase. Jackknife cross-validation showed that the method achieved overall accuracies of 85.1% and 94.3% for the two predictions, respectively, demonstrating that the proposed method is promising.
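The phage study ranks features with analysis of variance before SVM classification. Below is a minimal sketch of ANOVA-based feature ranking. It assumes the standard one-way ANOVA F statistic (between-class mean square divided by within-class mean square) is the score. The SVM and jackknife steps are not included, and the toy feature matrix is hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature whose values are split by class."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return float("inf")  # no within-class spread at all
    return (ss_between / (k - 1)) / ms_within

def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest F score."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        groups = [[r[j] for r, y in zip(rows, labels) if y == c] for c in classes]
        scores.append((anova_f(groups), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Hypothetical data: feature 0 tracks the class label, feature 1 does not.
rows = [(1, 5), (2, 1), (3, 9), (4, 2), (5, 8), (6, 3)]
labels = [0, 0, 0, 1, 1, 1]
order = rank_features(rows, labels)
```

A typical workflow then adds features one at a time in this order and keeps the subset that gives the best cross-validated (here, jackknife) accuracy.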
Affiliation(s)
- Hong-Fei Li
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China
- School of Computer and Information Engineering, Henan Normal University, Henan, China
- Xian-Fang Wang
- School of Computer and Information Engineering, Henan Normal University, Henan, China
- Hua Tang
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou, China
32
Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020; 26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]
Abstract
The number of human deaths caused by malaria is increasing day by day. Mitochondrial proteins of the malaria parasite play vital roles in the organism, so developing effective drugs and vaccines against infection requires accurate identification of these proteins. Although biochemical experiments can provide precise details about mitochondrial proteins, they are expensive and time-consuming. In this review, we summarize machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compare the strategies used to construct these computational methods. Finally, we discuss future directions for algorithm-based recognition of mitochondrial proteins.
Affiliation(s)
- Ting Liu
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
- Hua Tang
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
33
Zhu Y, Jia C, Li F, Song J. Inspector: a lysine succinylation predictor based on edited nearest-neighbor undersampling and adaptive synthetic oversampling. Anal Biochem 2020; 593:113592. [DOI: 10.1016/j.ab.2020.113592] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/19/2019] [Revised: 01/14/2020] [Accepted: 01/17/2020] [Indexed: 12/13/2022]
34
Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, Chou KC. iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform 2020; 20:638-658. [PMID: 29897410 PMCID: PMC6556904 DOI: 10.1093/bib/bby028] [Citation(s) in RCA: 128] [Impact Index Per Article: 25.6] [Received: 01/02/2018] [Revised: 03/02/2018] [Indexed: 01/03/2023]
Abstract
Regulation of proteolysis plays a critical role in a myriad of important cellular processes. The key to better understanding the mechanisms that control this process is to identify the specific substrates that each protease targets. To address this, we have developed iProt-Sub, a powerful bioinformatics tool for the accurate prediction of protease-specific substrates and their cleavage sites. Importantly, iProt-Sub is a significantly advanced version of its successful predecessor, PROSPER. It provides optimized cleavage site prediction models with better prediction performance and covers more species-specific proteases (4 major protease families and 38 different proteases). iProt-Sub integrates heterogeneous sequence and structural features and uses a two-step feature selection procedure to remove redundant and irrelevant features in an effort to improve cleavage site prediction accuracy. Features used by iProt-Sub are encoded by 11 different sequence encoding schemes, including local amino acid sequence profile, secondary structure, solvent accessibility and native disorder. These schemes allow a more accurate representation of the specificity of the 38 proteases and better training of the prediction models. Benchmarking experiments using cross-validation and independent tests showed that iProt-Sub achieves better performance than several existing generic tools. We anticipate that iProt-Sub will be a powerful tool for proteome-wide prediction of protease-specific substrates and their cleavage sites, and will facilitate hypothesis-driven functional interrogation of protease-specific substrate cleavage and proteolytic events.
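Sequence-based cleavage-site encodings such as the local amino acid profiles mentioned for iProt-Sub are commonly computed over a short window around each candidate scissile bond. The sketch below shows only that windowing step. It uses Schechter-Berger positions P4 to P4' and, as a caspase-relevant assumption, aspartate as the P1 residue. The window size, the padding character and the function name are illustrative choices, not iProt-Sub's actual settings.

```python
def cleavage_windows(seq, p1_residues="D", n_left=4, n_right=4, pad="X"):
    """Yield (1-based P1 position, window) for each candidate P1 residue.

    The window covers P4..P1 followed by P1'..P4' (Schechter-Berger notation):
    n_left residues ending at P1, then n_right residues after the scissile bond.
    Positions beyond either terminus are filled with `pad`.
    """
    padded = pad * n_left + seq + pad * n_right
    for i, aa in enumerate(seq):
        if aa in p1_residues:
            # Residue i sits at index i + n_left in `padded`, so the window
            # starts n_left - 1 positions before it, i.e. at index i + 1.
            start = i + 1
            yield i + 1, padded[start:start + n_left + n_right]

# Hypothetical peptide containing the classic caspase-3/7 motif DEVD|G.
for pos, window in cleavage_windows("MDEVDGA"):
    print(pos, window[:4] + "|" + window[4:])
```

Each window would then be turned into numeric features, such as one-hot residues, profile scores, or predicted structure and disorder, before model training.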
Affiliation(s)
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Yanan Wang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
- Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA and Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
35
Li F, Leier A, Liu Q, Wang Y, Xiang D, Akutsu T, Webb GI, Smith AI, Marquez-Lago T, Li J, Song J. Procleave: Predicting Protease-specific Substrate Cleavage Sites by Combining Sequence and Structural Information. Genomics Proteomics Bioinformatics 2020; 18:52-64. [PMID: 32413515 PMCID: PMC7393547 DOI: 10.1016/j.gpb.2019.08.002] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 04/30/2019] [Revised: 08/08/2019] [Accepted: 10/23/2019] [Indexed: 10/29/2022]
Abstract
Proteases are enzymes that cleave and hydrolyse the peptide bonds between two specific amino acid residues of target substrate proteins. Protease-controlled proteolysis plays a key role in the degradation and recycling of proteins, which is essential for various physiological processes. Thus, solving the substrate identification problem will have important implications for the precise understanding of functions and physiological roles of proteases, as well as for therapeutic target identification and pharmaceutical applicability. Consequently, there is a great demand for bioinformatics methods that can predict novel substrate cleavage events with high accuracy by utilizing both sequence and structural information. In this study, we present Procleave, a novel bioinformatics approach for predicting protease-specific substrates and specific cleavage sites by taking into account both their sequence and 3D structural information. Structural features of known cleavage sites were represented by discrete values using a LOWESS data-smoothing optimization method, which turned out to be critical for the performance of Procleave. The optimal approximations of all structural parameter values were encoded in a conditional random field (CRF) computational framework, alongside sequence and chemical group-based features. Here, we demonstrate the outstanding performance of Procleave through extensive benchmarking and independent tests. Procleave is capable of correctly identifying most cleavage sites in the case study. Importantly, when applied to the human structural proteome encompassing 17,628 protein structures, Procleave suggests a number of potential novel target substrates and their corresponding cleavage sites of different proteases. Procleave is implemented as a webserver and is freely accessible at http://procleave.erc.monash.edu/.
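Procleave is reported to represent structural features of known cleavage sites as discrete values using LOWESS smoothing. The sketch below implements only a basic LOWESS pass: tricube-weighted local linear fits, without Cleveland's robustness reweighting iterations. How Procleave converts the smoothed curves into discrete values is not described here, and the sketch does not attempt it. The `frac` parameter name and the toy profile are assumptions.

```python
def lowess(xs, ys, frac=0.5):
    """Single-pass LOWESS: tricube-weighted local linear fit at every x."""
    n = len(xs)
    r = min(n, max(2, round(frac * n)))  # points in each local neighbourhood
    fitted = []
    for x0 in xs:
        # Bandwidth = distance to the r-th nearest neighbour of x0.
        h = sorted(abs(x - x0) for x in xs)[r - 1] or 1e-12
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]  # tricube
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return fitted

# Hypothetical per-residue structural profile with one noisy spike at index 5.
xs = list(range(10))
profile = [0.0] * 10
profile[5] = 10.0
smoothed = lowess(xs, profile)
```

Because each fit is local and weighted, an isolated spike is pulled toward its neighbours while genuinely linear trends pass through unchanged.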
Affiliation(s)
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Andre Leier
- School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Yanan Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Dongxu Xiang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- A Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Tatiana Marquez-Lago
- School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Jian Li
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
36
Marini S, Vitali F, Rampazzi S, Demartini A, Akutsu T. Protease target prediction via matrix factorization. Bioinformatics 2019; 35:923-929. [PMID: 30169576 DOI: 10.1093/bioinformatics/bty746] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/05/2018] [Revised: 08/20/2018] [Accepted: 08/27/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION Protein cleavage is an important cellular event, involved in a myriad of processes, from apoptosis to immune response. Bioinformatics provides in silico tools, such as machine learning-based models, to guide the discovery of targets for the proteases responsible for protein cleavage. State-of-the-art models are limited in scope to specific protease families (such as caspases), and do not explicitly include biological or medical knowledge (such as hierarchical protein domain similarity or gene-gene interactions). To fill this gap, we present a novel approach for protease target prediction based on data integration. RESULTS By representing protease-protein target information in the form of relational matrices, we design a model that (a) is general and not limited to a single protease family, and (b) leverages the available knowledge, managing extremely sparse data from heterogeneous data sources, including primary sequence, pathways, domains and interactions. When compared with other algorithms on test data, our approach performs better, even against models that focus specifically on a single protease family. AVAILABILITY AND IMPLEMENTATION https://gitlab.com/smarini/MaDDA/ (Matlab code and data used). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
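The MaDDA abstract frames protease-target prediction as completing sparse relational matrices. The sketch below illustrates the general idea with a plain masked low-rank factorization fitted by alternating least squares (ALS). This is a swapped-in, simplified technique, not the paper's data-fusion model or its Matlab code, and `masked_als`, `rank` and `lam` are hypothetical names.

```python
import numpy as np


def masked_als(R, mask, rank=1, lam=1e-3, n_iter=200, seed=0):
    """Fit R ~ U @ V.T using only the entries where mask is True.

    Returns the dense reconstruction; its unobserved cells act as scores
    for candidate (protease, target) pairs. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = rng.normal(size=(n_rows, rank))
    V = rng.normal(size=(n_cols, rank))
    reg = lam * np.eye(rank)  # ridge term keeps each solve well-posed
    for _ in range(n_iter):
        # Update each row factor from that row's observed entries only.
        for i in range(n_rows):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ R[i, obs])
        # Update each column factor from that column's observed entries only.
        for j in range(n_cols):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ R[obs, j])
    return U @ V.T
```

On a toy rank-1 matrix with one hidden cell, the reconstruction fills that cell in from the observed structure. Real protease-target data are far sparser and, per the abstract, are integrated with side information such as sequences, pathways, domains and interactions.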
Affiliation(s)
- Simone Marini
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Francesca Vitali
- Department of Medicine, Center for Biomedical Informatics and Biostatistics, BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Sara Rampazzi
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
- Andrea Demartini
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
37
Ju Z, Wang SY. Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction via the Chou's 5-steps Rule and General Pseudo Components. Curr Genomics 2019; 20:592-601. [PMID: 32581647 PMCID: PMC7290059 DOI: 10.2174/1389202921666191223154629] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/01/2019] [Revised: 10/19/2019] [Accepted: 11/07/2019] [Indexed: 01/06/2023]
Abstract
Introduction Neddylation is a highly dynamic and reversible post-translational modification. Abnormal neddylation has previously been shown to be closely related to some human diseases. Detecting neddylation sites is essential for elucidating the regulatory mechanisms of protein neddylation. Objective Because detecting lysine neddylation sites by traditional experimental methods is often expensive and time-consuming, it is imperative to design computational methods to identify neddylation sites. Methods In this study, a bioinformatics tool named NeddPred is developed to identify potential protein neddylation sites. Bi-profile Bayes feature extraction is used to encode neddylation sites, and a fuzzy support vector machine model is used to overcome noise and class imbalance in the prediction. Results NeddPred achieved a Matthews correlation coefficient of 0.7082 and an area under the receiver operating characteristic curve of 0.9769. Independent tests show that NeddPred significantly outperforms the existing lysine neddylation site predictor NeddyPreddy. Conclusion NeddPred can therefore complement existing tools for the prediction of neddylation sites. A user-friendly webserver for NeddPred is accessible at 123.206.31.171/NeddPred/.
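Bi-profile Bayes (BPB) encoding, used by NeddPred, scores each position of a fixed-length window twice: once by how often that residue occurs at that position among positive training windows, and once among negative windows. The sketch below assumes this common formulation with plain frequencies and no pseudocounts; the helper names are illustrative, not NeddPred's code.

```python
from collections import Counter


def positional_frequencies(windows):
    """Per-position residue frequencies for a list of equal-length windows."""
    total = len(windows)
    profile = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        profile.append({res: c / total for res, c in counts.items()})
    return profile


def bpb_encode(window, pos_profile, neg_profile):
    """Bi-profile Bayes vector: positive-profile values, then negative-profile values.

    A residue never seen at a position gets 0.0 (no pseudocounts in this sketch).
    """
    pos_part = [pos_profile[i].get(r, 0.0) for i, r in enumerate(window)]
    neg_part = [neg_profile[i].get(r, 0.0) for i, r in enumerate(window)]
    return pos_part + neg_part
```

For a window of length L this yields a 2L-dimensional vector. In cross-validation, the two profiles should be computed from the training folds only; otherwise the encoding leaks label information into the test fold.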
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
- Shi-Yun Wang
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
38
Li F, Wang Y, Li C, Marquez-Lago TT, Leier A, Rawlings ND, Haffari G, Revote J, Akutsu T, Chou KC, Purcell AW, Pike RN, Webb GI, Ian Smith A, Lithgow T, Daly RJ, Whisstock JC, Song J. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform 2019; 20:2150-2166. [PMID: 30184176 PMCID: PMC6954447 DOI: 10.1093/bib/bby077] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 06/14/2018] [Revised: 07/26/2018] [Accepted: 08/01/2018] [Indexed: 01/06/2023]
Abstract
The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. 
We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.
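The review separates scoring-function-based predictors from machine-learning ones. As a hedged illustration of the scoring-function family in general (not PROSPERous or any other benchmarked tool), the sketch below builds a position-specific log-odds matrix from aligned cleavage-site windows and scores a candidate window by summing per-position values. The 20-letter alphabet, the pseudocount of 1 and the uniform background are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows, pseudocount=1.0):
    """Log2-odds matrix from equal-length cleavage windows vs a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(windows[0])):
        column = [w[pos] for w in windows]
        denom = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / denom / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(window, pssm):
    """Sum of per-position log-odds; higher means more like the training sites."""
    return sum(pssm[i][aa] for i, aa in enumerate(window))
```

The pseudocount keeps unseen residues from producing log(0). In practice, a score threshold would be chosen on validation data, and the background would usually be taken from proteome-wide residue frequencies rather than a uniform distribution.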
Affiliation(s)
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yanan Wang
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich 8093, Switzerland
- Tatiana T Marquez-Lago
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gholamreza Haffari
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Jerico Revote
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Anthony W Purcell
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Robert N Pike
- La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- A Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Trevor Lithgow
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, Victoria 3800, Australia
- Roger J Daly
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- James C Whisstock
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
39
Chen Z, Liu X, Li F, Li C, Marquez-Lago T, Leier A, Akutsu T, Webb GI, Xu D, Smith AI, Li L, Chou KC, Song J. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinform 2019; 20:2267-2290. [PMID: 30285084 PMCID: PMC6954452 DOI: 10.1093/bib/bby089] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 06/14/2018] [Revised: 08/17/2018] [Accepted: 08/18/2018] [Indexed: 12/22/2022]
Abstract
Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. 
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate that this comprehensive review and the application of deep learning will provide a practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in related fields.
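MUscADEL is described as using deep bidirectional LSTM networks. To show what "bidirectional" means, the NumPy sketch below runs one LSTM over the sequence and a second, separately parameterized LSTM over the reversed sequence, then concatenates the two hidden states at each position. It is an untrained forward pass with hypothetical weight shapes and names, not MUscADEL's architecture; in practice the input rows would be encoded residues, for example one-hot vectors.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_pass(x, W, U, b):
    """Run one LSTM over x (T x d) and return its hidden states (T x H).

    W: (4H x d), U: (4H x H), b: (4H,); gate order is input, forget, output, candidate.
    """
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in x:
        z = W @ x_t + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g      # cell state update
        h = o * np.tanh(c)     # hidden state
        outputs.append(h)
    return np.array(outputs)


def init_params(d, H, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    return (rng.normal(scale=0.1, size=(4 * H, d)),
            rng.normal(scale=0.1, size=(4 * H, H)),
            np.zeros(4 * H))


def bilstm(x, fwd_params, bwd_params):
    """Concatenate forward states with re-reversed backward states (T x 2H)."""
    h_fwd = lstm_pass(x, *fwd_params)
    h_bwd = lstm_pass(x[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)
```

At each position, the forward half summarizes the residues to its left and the backward half the residues to its right. This two-sided context is the usual motivation for bidirectional models in site prediction.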
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
- Xuhan Liu
- Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg, Leiden, The Netherlands
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- Institute of Molecular Systems Biology, ETH Zürich, Auguste-Piccard-Hof, Zürich, Switzerland
- Tatiana Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Dakang Xu
- Faculty of Medical Laboratory Science, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Department of Molecular and Translational Science, Faculty of Medicine, Hudson Institute of Medical Research, Monash University, Melbourne, VIC, Australia
- Alexander Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Lei Li
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
40
Chen Z, Zhao P, Li F, Wang Y, Smith AI, Webb GI, Akutsu T, Baggag A, Bensmail H, Song J. Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences. Brief Bioinform 2019; 21:1676-1696. [DOI: 10.1093/bib/bbz112] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 07/04/2019] [Revised: 07/31/2019] [Accepted: 08/07/2019] [Indexed: 12/14/2022]
Abstract
RNA post-transcriptional modifications play a crucial role in a myriad of biological processes and cellular functions. To date, more than 160 RNA modifications have been discovered; accurate identification of RNA-modification sites is therefore fundamental for a better understanding of RNA-mediated biological functions and mechanisms. However, due to limitations in experimental methods, systematic identification of different types of RNA-modification sites remains a major challenge. Recently, more than 20 computational methods have been developed to identify RNA-modification sites in tandem with high-throughput experimental methods, most of which can predict only a single type of RNA-modification site. These methods show high diversity in their dataset size, data quality, core algorithms, features extracted and feature selection techniques and evaluation strategies. Therefore, there is an urgent need to revisit these methods and summarize their methodologies, in order to improve and further develop computational techniques to identify and characterize RNA-modification sites from the large amounts of sequence data. With this goal in mind, first, we provide a comprehensive survey on a large collection of 27 state-of-the-art approaches for predicting N1-methyladenosine and N6-methyladenosine sites. We cover a variety of important aspects that are crucial for the development of successful predictors, including the dataset quality, operating algorithms, sequence and genomic features, feature selection, model performance evaluation and software utility. In addition, we also provide our thoughts on potential strategies to improve the model performance. Second, we propose a computational approach called DeepPromise based on deep learning techniques for simultaneous prediction of N1-methyladenosine and N6-methyladenosine. 
To extract the sequence context surrounding the modification sites, three feature encodings, including enhanced nucleic acid composition, one-hot encoding, and RNA embedding, were used as the input to seven consecutive layers of convolutional neural networks (CNNs), respectively. Moreover, DeepPromise further combined the prediction score of the CNN-based models and achieved around 43% higher area under receiver-operating curve (AUROC) for m1A site prediction and 2–6% higher AUROC for m6A site prediction, respectively, when compared with several existing state-of-the-art approaches on the independent test. In-depth analyses of characteristic sequence motifs identified from the convolution-layer filters indicated that nucleotide presentation at proximal positions surrounding the modification sites contributed most to the classification, whereas those at distal positions also affected classification but to different extents. To maximize user convenience, a web server was developed as an implementation of DeepPromise and made publicly available at http://DeepPromise.erc.monash.edu/, with the server accepting both RNA sequences and genomic sequences to allow prediction of two types of putative RNA-modification sites.
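DeepPromise feeds, among other inputs, one-hot and enhanced nucleic acid composition (ENAC) encodings into CNNs. The sketch below shows both encodings for a short RNA window, assuming the common ENAC definition as nucleotide frequencies within a sliding sub-window. The sub-window size of 5, the treatment of T as U for genomic input and the function names are illustrative assumptions, not the published settings. The RNA-embedding input is not shown.

```python
import numpy as np

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """L x 4 binary matrix over A, C, G, U; T is read as U, other letters give zero rows."""
    seq = seq.upper().replace("T", "U")
    mat = np.zeros((len(seq), len(NUCLEOTIDES)))
    for i, nt in enumerate(seq):
        j = NUCLEOTIDES.find(nt)
        if j >= 0:
            mat[i, j] = 1.0
    return mat


def enac(seq, window=5):
    """Nucleotide frequencies in each length-`window` sliding sub-window.

    Assumes len(seq) >= window; returns a (L - window + 1) x 4 matrix.
    """
    oh = one_hot(seq)
    return np.array([oh[s:s + window].mean(axis=0)
                     for s in range(len(seq) - window + 1)])
```

One-hot keeps exact per-position identity, while ENAC trades positional precision for smoothed local composition. Giving a network both views is one plausible reason to combine several encodings.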
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, China
- Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Fuyi Li
- Northwest A&F University, China
- A Ian Smith
- Prince Henry's Institute Melbourne and Monash University, Australia
- Abdelkader Baggag
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
- Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Victoria 3800, Australia
41
Yang R, Zhang C, Gao R, Zhang L, Song Q. Predicting FAD Interacting Residues with Feature Selection and Comprehensive Sequence Descriptors. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:2046-2056. [PMID: 29993986 DOI: 10.1109/tcbb.2018.2824332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The function of a flavoprotein is determined to a great extent by the binding sites on its surface that interact with flavin adenine dinucleotide (FAD). Malfunction or dysregulation of FAD binding leads to a range of diseases. Therefore, accurately identifying FAD interacting residues (FIRs) provides insights into the molecular mechanisms of flavoprotein-related biological processes and disease progression. In this paper, a new computational method is proposed for identifying FIRs from protein sequences. Various sequence-derived discriminative features are explored. We analyze how these features differ between FIRs and non-FIRs. We also investigate the predictive capabilities of both individual features and combinations of features. A Relief algorithm followed by incremental feature selection (relief-IFS) is then adopted to search for the optimal features. Finally, a random forest (RF) model is used to predict FIRs based on the optimal features. In a 5-fold cross-validation test, the proposed method performs well, with a sensitivity of 0.847, a specificity of 0.933, an accuracy of 0.890, and a Matthews correlation coefficient (MCC) of 0.782, thereby outperforming previous methods. These results indicate that our method is relatively successful at predicting FIRs.
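The FIR abstract reports four summary metrics. The Python sketch below computes them from a binary confusion matrix. The reported accuracy (0.890) equals the mean of the reported sensitivity (0.847) and specificity (0.933), which is what a class-balanced test set produces. Assuming such a balance, with hypothetical counts of 1000 FIRs and 1000 non-FIRs, the formulas give an MCC of about 0.783. This matches the reported 0.782 to within the rounding of the published sensitivity and specificity.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical balanced test set: 1000 FIRs and 1000 non-FIRs.
sens, spec, acc, mcc = binary_metrics(tp=847, fn=153, tn=933, fp=67)
print(f"Sn={sens:.3f} Sp={spec:.3f} Acc={acc:.3f} MCC={mcc:.3f}")
```

Running it prints `Sn=0.847 Sp=0.933 Acc=0.890 MCC=0.783`. Unlike accuracy, MCC stays informative when classes are imbalanced, which is common in residue-level prediction.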
42
Wang F, Guan ZX, Dao FY, Ding H. A Brief Review of the Computational Identification of Antifreeze Protein. CURR ORG CHEM 2019. [DOI: 10.2174/1385272823666190718145613] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Many cold-adapted organisms produce antifreeze proteins (AFPs) that prevent cell fluids from freezing by controlling the growth of ice crystals. AFPs have been found in various species, including vertebrates, invertebrates, plants, bacteria, and fungi. The AFPs from fish, insects and plants are highly diverse, which makes identifying AFPs a challenging task in computational proteomics. With the accumulation of known AFPs and the development of machine learning methods, it has become possible to construct high-throughput tools to identify AFPs in a timely manner. In this review, we briefly survey the application of machine learning methods to antifreeze protein identification from several aspects, including published benchmark datasets, sequence descriptors, classification algorithms and published methods. We hope that this review will provide new ideas and directions for research on identifying antifreeze proteins.
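Amino acid composition (AAC) and dipeptide composition (DPC) are among the simplest sequence descriptors used by machine-learning classifiers of the kind this review surveys. The sketch below computes both, assuming the standard 20-letter alphabet and silently skipping non-standard residues. The function names and the normalization choices are illustrative, not taken from any specific AFP predictor.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """20-dimensional amino acid composition (fractions of standard residues)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues) or 1
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dpc(seq):
    """400-dimensional dipeptide composition over adjacent residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    n = len(valid) or 1
    counts = {d: 0 for d in DIPEPTIDES}
    for p in valid:
        counts[p] += 1
    return [counts[d] / n for d in DIPEPTIDES]
```

Both descriptors have a fixed length regardless of protein length, so they can be fed directly to standard classifiers. The trade-off is that they discard almost all ordering information beyond adjacent pairs.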
Affiliation(s)
- Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
43
Li F, Chen J, Leier A, Marquez-Lago T, Liu Q, Wang Y, Revote J, Smith AI, Akutsu T, Webb GI, Kurgan L, Song J. DeepCleave: a deep learning predictor for caspase and matrix metalloprotease substrates and cleavage sites. Bioinformatics 2019; 36:1057-1065. [PMID: 31566664 PMCID: PMC8215920 DOI: 10.1093/bioinformatics/btz721] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 06/07/2019] [Revised: 08/13/2019] [Accepted: 09/25/2019] [Indexed: 01/31/2023]
Abstract
MOTIVATION Proteases are enzymes that cleave target substrate proteins by catalyzing the hydrolysis of peptide bonds between specific amino acids. While the functional proteolysis regulated by proteases plays a central role in 'life and death' cellular processes, many of the corresponding substrates and their cleavage sites have not yet been found. The availability of accurate predictors of substrates and cleavage sites would facilitate understanding of the functions and physiological roles of proteases. Deep learning is a promising approach for the development of accurate predictors of substrate cleavage events. RESULTS We propose DeepCleave, the first deep learning-based predictor of protease-specific substrates and cleavage sites. DeepCleave uses protein substrate sequence data as input and employs convolutional neural networks with transfer learning to train accurate predictive models. The high predictive performance of our models stems from the use of high-quality cleavage site features extracted from the substrate sequences through the deep learning process, and from the application of transfer learning, multiple kernels and an attention layer in the design of the deep network. Empirical tests against several related state-of-the-art methods demonstrate that DeepCleave outperforms these methods in predicting caspase and matrix metalloprotease substrate-cleavage sites. AVAILABILITY AND IMPLEMENTATION The DeepCleave webserver and source code are freely available at http://deepcleave.erc.monash.edu/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
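DeepCleave is described as a convolutional neural network over substrate sequences, with transfer learning, multiple kernels and an attention layer, none of which are reproduced here. The NumPy sketch below isolates the basic building block: one 1-D convolution filter sliding over a one-hot encoded protein window, followed by ReLU and global max pooling. As in deep-learning libraries, the "convolution" is computed as a cross-correlation. The hand-set filter that rewards a DEVD tetrapeptide is a hypothetical stand-in for a learned kernel.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_protein(seq):
    """L x 20 one-hot matrix; non-standard residues become all-zero rows."""
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        j = AMINO_ACIDS.find(aa)
        if j >= 0:
            mat[i, j] = 1.0
    return mat


def conv1d_valid(x, kernel, bias=0.0):
    """'Valid' 1-D cross-correlation of x (L x C) with kernel (K x C).

    Returns L - K + 1 activations, one per window start.
    """
    K = kernel.shape[0]
    return np.array([np.sum(x[s:s + K] * kernel) + bias
                     for s in range(x.shape[0] - K + 1)])


def motif_kernel(motif):
    """Hypothetical hand-built filter: +1 for each motif residue at its offset."""
    return one_hot_protein(motif)


x = one_hot_protein("GGSDEVDGAA")
act = np.maximum(conv1d_valid(x, motif_kernel("DEVD"), bias=-3.0), 0.0)  # ReLU
print(act.max(), int(act.argmax()))
```

Only the window starting at index 3, where DEVD sits, survives the ReLU, so global max pooling returns 1.0. A trained network learns many such kernels, with real-valued weights, instead of using hand-written ones.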
Affiliation(s)
- André Leier
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Tatiana Marquez-Lago
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Yanze Wang
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Jerico Revote
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- A Ian Smith
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
44
Bao Y, Marini S, Tamura T, Kamada M, Maegawa S, Hosokawa H, Song J, Akutsu T. Toward more accurate prediction of caspase cleavage sites: a comprehensive review of current methods, tools and features. Brief Bioinform 2019; 20:1669-1684. [PMID: 29860277 PMCID: PMC6917222 DOI: 10.1093/bib/bby041] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/26/2018] [Revised: 04/16/2018] [Indexed: 12/20/2022]
Abstract
As one of the few irreversible protein post-translational modifications, proteolytic cleavage is involved in nearly all aspects of cellular activity, ranging from gene regulation to cell life-cycle regulation. Among the various protease-specific types of proteolytic cleavage, cleavage by caspases and granzyme B is considered essential in the initiation and execution of programmed cell death and inflammation. Although a number of substrates for both types of proteolytic cleavage have been experimentally identified, the complete repertoire of caspase and granzyme B substrates remains to be fully characterized. To tackle this issue and complement experimental efforts for substrate identification, systematic bioinformatics studies of known cleavage sites provide important insights into caspase/granzyme B substrate specificity and facilitate the discovery of novel substrates. In this article, we review and benchmark 12 state-of-the-art sequence-based bioinformatics approaches and tools for caspase/granzyme B cleavage prediction. We evaluate and compare these methods in terms of their input/output, algorithms used, prediction performance, validation methods and software availability and utility. In addition, we construct independent data sets consisting of caspase/granzyme B substrates from different species and accordingly assess the predictive power of these predictors for the identification of cleavage sites. We find that the prediction results are highly variable among different predictors. Furthermore, we experimentally validate the predictions in a case study by performing a caspase cleavage assay. We anticipate that this comprehensive review and survey analysis will provide an insightful resource for biologists and bioinformaticians who are interested in using and/or developing tools for caspase/granzyme B cleavage prediction.
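A naive baseline for caspase cleavage prediction is a consensus-motif scan. Caspases cleave after an aspartate at P1, and DEVD is the classic preferred tetrapeptide for caspase-3/7. The sketch below reports the P1 aspartate ending each match of a motif, by default the looser D-x-x-D pattern. It is for illustration only, not one of the 12 benchmarked tools, and it ignores granzyme B specificity, P1' preferences and structural context.

```python
import re


def caspase_candidate_sites(seq, pattern="D..D"):
    """1-based positions of the P1 Asp ending each motif match.

    Cleavage is assumed to occur between P1 (the returned position) and P1'.
    A lookahead is used so that overlapping motifs are all reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [m.start() + len(m.group(1)) for m in regex.finditer(seq)]
```

For example, `caspase_candidate_sites("AADEVDGA")` reports position 6, the second aspartate of DEVD. Such rules typically over-predict badly on real proteomes, which is one motivation for the learned predictors this review benchmarks.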
Affiliation(s)
- Yu Bao
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Simone Marini
- Department of Computational Medicine and Bioinformatics, University of Michigan, 1241 E. Catherine St., 5940 Buhl, Ann Arbor 48109-5618, USA
- Takeyuki Tamura
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Mayumi Kamada
- Graduate School of Medicine, Kyoto University, Sakyo-ku, Kyoto 606-8507, Japan
- Shingo Maegawa
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Hiroshi Hosokawa
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Jiangning Song
- Monash Biomedicine Discovery Institute, Monash Centre for Data Science and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
45
Zhang M, Li F, Marquez-Lago TT, Leier A, Fan C, Kwoh CK, Chou KC, Song J, Jia C. MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics 2019; 35:2957-2965. [PMID: 30649179 PMCID: PMC6736106 DOI: 10.1093/bioinformatics/btz016] [Received: 10/30/2018] [Revised: 12/09/2018] [Accepted: 01/05/2019]
Abstract
MOTIVATION Promoters are short DNA consensus sequences that are localized proximal to the transcription start sites of genes, allowing transcription initiation of particular genes. However, precise prediction of promoters remains a challenging task because individual promoters often differ from the consensus at one or more positions. RESULTS In this study, we present a new multi-layer computational approach, called MULTiPly, for recognizing promoters and their specific types. MULTiPly takes into account the sequences themselves, including both local information, such as k-tuple nucleotide composition and dinucleotide-based auto covariance, and global information of the entire samples, based on bi-profile Bayes and k-nearest neighbour feature encodings. Specifically, the F-score feature selection method was applied to identify the best-performing single feature type, to which other feature types were then added to further improve the prediction performance of MULTiPly. Benchmarking experiments on the benchmark dataset and comparisons with five state-of-the-art tools show that MULTiPly achieves better prediction performance in 5-fold cross-validation and jackknife tests. Moreover, the superiority of MULTiPly was also validated on a newly constructed independent test dataset. MULTiPly is expected to be a useful tool for discovering both general and specific types of promoters in the post-genomic era. AVAILABILITY AND IMPLEMENTATION The MULTiPly webserver and curated datasets are freely available at http://flagshipnt.erc.monash.edu/MULTiPly/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
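k-tuple nucleotide composition, one of the local sequence encodings named in the MULTiPly abstract above, is commonly computed as the normalized frequency of every possible length-k DNA word. The following is a minimal Python sketch under stated assumptions: overlapping windows, normalization by the number of valid windows, and skipping of windows containing non-ACGT characters. It illustrates the general encoding only; it is not MULTiPly's own code, and the function name `kmer_composition` is hypothetical.

```python
from itertools import product


def kmer_composition(seq: str, k: int) -> dict[str, float]:
    """Normalized frequencies of all 4**k DNA k-tuples in seq (hypothetical helper).

    Assumptions: overlapping windows are counted, and windows containing
    any character other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    # Enumerate every possible k-tuple so the feature vector has a fixed length.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # false for windows containing N or other symbols
            counts[window] += 1
            total += 1
    if total == 0:
        # No valid window: return an all-zero vector rather than dividing by zero.
        return {km: 0.0 for km in kmers}
    return {km: c / total for km, c in counts.items()}
```

For example, `kmer_composition("ACGTACGT", 2)` sees seven overlapping windows, so `AC`, `CG` and `GT` each receive 2/7, `TA` receives 1/7, and the remaining 12 dinucleotides are 0. Returning a fixed-length mapping over all 4**k words keeps feature vectors aligned across sequences of different lengths.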
Affiliation(s)
- Meng Zhang
- School of Science, Dalian Maritime University, Dalian, China
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- Tatiana T Marquez-Lago
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- André Leier
- Department of Genetics, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Cunshuo Fan
- College of Information Engineering, Northwest A&F University, Yangling, China
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian, China
- College of Information Engineering, Northwest A&F University, Yangling, China
46
Liu ZX, Yu K, Dong J, Zhao L, Liu Z, Zhang Q, Li S, Du Y, Cheng H. Precise Prediction of Calpain Cleavage Sites and Their Aberrance Caused by Mutations in Cancer. Front Genet 2019; 10:715. [PMID: 31440276 PMCID: PMC6694742 DOI: 10.3389/fgene.2019.00715] [Received: 05/05/2019] [Accepted: 07/05/2019]
Abstract
As a widespread post-translational modification of proteins, calpain-mediated cleavage regulates a broad range of cellular processes, including proliferation, differentiation, cytoskeletal reorganization, and apoptosis. Identifying the proteins that undergo calpain cleavage, and the specific sites at which they are cleaved, is the necessary foundation for understanding the exact molecular mechanisms and regulatory roles of calpain-mediated cleavage. In contrast with time-consuming and labor-intensive experimental methods, computational approaches for detecting calpain cleavage sites have attracted wide attention due to their efficiency and convenience. In this study, we established a novel computational tool named DeepCalpain (http://deepcalpain.cancerbio.info/) for predicting potential calpain cleavage sites using a deep neural network and the particle swarm optimization algorithm. Through critical evaluation and comparison, DeepCalpain exhibited superior performance against other existing tools. Meanwhile, we found that protein interactions could enrich the calpain-substrate regulatory relationship. Since calpain-mediated cleavage is critical for cancer development and progression, we used DeepCalpain to comprehensively analyze calpain cleavage-associated mutations across 11 cancers, which demonstrated that calpain-mediated cleavage events were affected by mutations and heavily implicated in the regulation of cancer cells. These prediction and analysis results may help reveal the regulatory mechanism of calpain cleavage in biological pathways and different cancer types, and may open new avenues for the diagnosis and treatment of cancers.
Affiliation(s)
- Ze-Xian Liu
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Kai Yu
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Jingsi Dong
- Lung Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Linhong Zhao
- Institute of Life Sciences, Southeast University, Nanjing, China
- Zekun Liu
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Qingfeng Zhang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Shihua Li
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Yimeng Du
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
- Han Cheng
- School of Life Sciences, Zhengzhou University, Zhengzhou, China
47
Wang J, Yang B, An Y, Marquez-Lago T, Leier A, Wilksch J, Hong Q, Zhang Y, Hayashida M, Akutsu T, Webb GI, Strugnell RA, Song J, Lithgow T. Systematic analysis and prediction of type IV secreted effector proteins by machine learning approaches. Brief Bioinform 2019; 20:931-951. [PMID: 29186295 PMCID: PMC6585386 DOI: 10.1093/bib/bbx164] [Received: 08/20/2017] [Revised: 11/08/2017]
Abstract
In the course of infecting their hosts, pathogenic bacteria secrete numerous effectors, namely, bacterial proteins that pervert host cell biology. Many Gram-negative bacteria, including context-dependent human pathogens, use a type IV secretion system (T4SS) to translocate effectors directly into the cytosol of host cells. Various type IV secreted effectors (T4SEs) have been experimentally validated to play crucial roles in virulence by manipulating host cell gene expression and other processes. Consequently, the identification of novel effector proteins is an important step in increasing our understanding of host-pathogen interactions and bacterial pathogenesis. Here, we train and compare six machine learning models, namely, Naïve Bayes (NB), K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector machines (SVMs) and multilayer perceptron (MLP), for the identification of T4SEs using 10 types of selected features and 5-fold cross-validation. Our study shows that: (1) including different but complementary features generally enhances predictive performance for T4SEs; (2) ensemble models, obtained by integrating individual single-feature models, exhibit significantly improved predictive performance; and (3) the 'majority voting strategy' led to more stable and accurate classification when applied to an ensemble of distinct single-feature models. We further developed a new method to effectively predict T4SEs, Bastion4 (Bacterial secretion effector predictor for T4SS), and we show that our ensemble classifier clearly outperforms two recent prediction tools. In summary, we developed a state-of-the-art T4SE predictor by conducting a comprehensive performance evaluation of different machine learning algorithms along with a detailed analysis of single- and multi-feature selections.
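The 'majority voting strategy' mentioned in the Bastion4 abstract above combines the labels of several single-feature classifiers by taking, for each protein, the label predicted by most models. A minimal Python sketch of hard majority voting for binary labels follows. It assumes labels are already available as 0/1 per model and breaks ties toward the negative class; both choices are assumptions made for illustration, not details taken from the Bastion4 paper, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine per-model binary labels by hard majority vote (hypothetical helper).

    predictions[m][i] is model m's label for sample i
    (1 = predicted effector, 0 = predicted non-effector).
    Assumption: a tie is resolved as 0 (non-effector).
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must score the same samples")
    combined = []
    for i in range(n_samples):
        # Count how many models voted 1 and how many voted 0 for sample i.
        votes = Counter(p[i] for p in predictions)
        combined.append(1 if votes[1] > votes[0] else 0)
    return combined
```

With three single-feature models returning `[1, 0, 1, 1]`, `[1, 1, 0, 0]` and `[0, 0, 1, 1]`, `majority_vote` returns `[1, 0, 1, 1]`. Using an odd number of models avoids ties entirely; with an even number, the tie-breaking rule becomes a design decision that affects sensitivity versus specificity.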
Affiliation(s)
- Jiawei Wang
- Biomedicine Discovery Institute and the Department of Microbiology at Monash University, Australia
- Bingjiao Yang
- National Engineering Research Center for Equipment and Technology of Cold Strip Rolling, College of Mechanical Engineering from Yanshan University, China
- Yi An
- College of Information Engineering, Northwest A&F University, China
- Tatiana Marquez-Lago
- Department of Genetics, University of Alabama at Birmingham (UAB) School of Medicine, USA
- André Leier
- Department of Genetics and the Informatics Institute, University of Alabama at Birmingham (UAB) School of Medicine, USA
- Jonathan Wilksch
- Department of Microbiology and Immunology at the University of Melbourne, Australia
- Yang Zhang
- Computer Science and Engineering in 2015 from Northwestern Polytechnical University, China
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
- Geoffrey I Webb
- Faculty of Information Technology, Monash Centre for Data Science, Monash University
- Richard A Strugnell
- Department of Microbiology and Immunology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne
- Jiangning Song
- Department of Biochemistry and Molecular Biology, Monash University, Australia
- Trevor Lithgow
- Department of Microbiology at Monash University, Australia
48
Maes E, Oeyen E, Boonen K, Schildermans K, Mertens I, Pauwels P, Valkenborg D, Baggerman G. The challenges of peptidomics in complementing proteomics in a clinical context. Mass Spectrom Rev 2019; 38:253-264. [PMID: 30372792 DOI: 10.1002/mas.21581] [Received: 09/14/2016] [Accepted: 10/01/2018]
Abstract
Naturally occurring peptides, including growth factors, hormones, and neurotransmitters, represent an important class of biomolecules and have crucial roles in human physiology. The study of these peptides in clinical samples is therefore as relevant as ever. Compared to more routine proteomics applications in clinical research, peptidomics research questions are more challenging and have special requirements with regard to sample handling, experimental design, and bioinformatics. In this review, we describe the issues that confront peptidomics in a clinical context. After these hurdles are (partially) overcome, peptidomics will be ready for a successful translation into medical practice.
Affiliation(s)
- Evelyne Maes
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Food and Bio-Based Products, AgResearch Ltd., Lincoln, New Zealand
- Eline Oeyen
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Karin Schildermans
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Inge Mertens
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Patrick Pauwels
- Molecular Pathology Unit, Department of Pathology, Antwerp University Hospital, Edegem, Belgium
- Dirk Valkenborg
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Center for Statistics, Hasselt University, Diepenbeek, Belgium
- Geert Baggerman
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
49
Tixeira R, Phan TK, Caruso S, Shi B, Atkin-Smith GK, Nedeva C, Chow JDY, Puthalakath H, Hulett MD, Herold MJ, Poon IKH. ROCK1 but not LIMK1 or PAK2 is a key regulator of apoptotic membrane blebbing and cell disassembly. Cell Death Differ 2019; 27:102-116. [PMID: 31043701 DOI: 10.1038/s41418-019-0342-5] [Received: 06/06/2018] [Revised: 04/15/2019] [Accepted: 04/17/2019]
Abstract
Many cell types are known to undergo a series of morphological changes during the progression of apoptosis, leading to their disassembly into smaller membrane-bound vesicles known as apoptotic bodies (ApoBDs). In particular, the formation of circular bulges called membrane blebs on the surface of apoptotic cells is a key morphological step required for a number of cell types to generate ApoBDs. Although apoptotic membrane blebbing is thought to be regulated by kinases including ROCK1, PAK2 and LIMK1, it is unclear whether these kinases exhibit overlapping roles in the disassembly of apoptotic cells. Utilising both pharmacological and CRISPR/Cas9 gene editing based approaches, we identified ROCK1 but not PAK2 or LIMK1 as a key non-redundant positive regulator of apoptotic membrane blebbing as well as ApoBD formation. Functionally, we have established an experimental system to either inhibit or enhance ApoBD formation and demonstrated the importance of apoptotic cell disassembly in the efficient uptake of apoptotic materials by various phagocytes. Unexpectedly, we also noted that ROCK1 could play a role in regulating the onset of secondary necrosis. Together, these data shed light on both the mechanism and function of cell disassembly during apoptosis.
Affiliation(s)
- Rochelle Tixeira
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Thanh Kha Phan
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Sarah Caruso
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Bo Shi
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Georgia K Atkin-Smith
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Christina Nedeva
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Jenny D Y Chow
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Hamsa Puthalakath
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Mark D Hulett
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
- Marco J Herold
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
- Ivan K H Poon
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, 3086, Australia
50
Chen Z, Zhao P, Li F, Marquez-Lago TT, Leier A, Revote J, Zhu Y, Powell DR, Akutsu T, Webb GI, Chou KC, Smith AI, Daly RJ, Li J, Song J. iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Brief Bioinform 2019; 21:1047-1057. [DOI: 10.1093/bib/bbz041] [Received: 01/20/2019] [Revised: 02/28/2019] [Accepted: 03/13/2019]
Abstract
With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this; however, all of these tools have limitations in terms of effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit that integrates feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users who only want to upload their data set and select the functions to be calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and supports four feature output formats to facilitate direct use of the output or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is freely available via an online web server and a stand-alone toolkit.
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, 38 Dengzhou Road, Qingdao, 266021, Shandong, China
- Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang, 455000, China
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatiana T Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Jerico Revote
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yan Zhu
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- David R Powell
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- A Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Roger J Daly
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Jian Li
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia