1
Chu DH, An JY, Nie XM. An Effective Computational Method for Predicting Self-Interacting Proteins Based on VGGNet Convolutional Neural Network and Gray-Level Co-occurrence Matrix. Evol Bioinform Online 2024; 20:11769343241292224. [PMID: 39464790 PMCID: PMC11503870 DOI: 10.1177/11769343241292224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 10/01/2024] [Indexed: 10/29/2024] [Open Access]
Abstract
Introduction: Predicting self-interacting proteins (SIPs) is a crucial area of research in predicting protein functions, as well as in understanding gene-disease and disease-drug associations. These interactions are integral to numerous cellular processes and play pivotal roles within cells. However, traditional methods for identifying SIPs through biological experiments are often expensive, time-consuming, and have long cycles. Therefore, the development of effective computational methods for accurately predicting SIPs is not only necessary but also presents a significant challenge. Results: In this research, we introduce a novel computational prediction technique, VGGNGLCM, which leverages protein sequence data. This method integrates the VGGNet deep convolutional neural network (VGGN) with the Gray-Level Co-occurrence Matrix (GLCM) to detect self-interacting proteins. Specifically, we first used the Position Specific Scoring Matrix (PSSM) to capture protein evolutionary information and extracted key features from the PSSM using GLCM. We then employed VGGNet as a predictive classifier, leveraging its capabilities for powerful learning and classification prediction: the extracted features were input into the VGGNet deep convolutional neural network to identify self-interacting proteins. To evaluate the performance of the VGGNGLCM model, we conducted experiments using yeast and human datasets, achieving average accuracies of 95.68% and 97.72%, respectively. Additionally, we compared the prediction performance of the VGGNet classifier with that of the Convolutional Neural Network (CNN) and the state-of-the-art Support Vector Machine (SVM) using the same feature extraction method. We also compared the prediction ability of VGGNGLCM with other existing approaches. The comparison results further demonstrate the superior performance of VGGNGLCM over other prediction models in this domain.
Conclusion: The experimental verification further strengthens the evidence that VGGNGLCM is effective and robust compared to existing methods. It also highlights the high accuracy and robustness of the VGGNGLCM model in predicting self-interacting proteins (SIPs). Consequently, we believe that the VGGNGLCM method serves as a valuable computational tool and can catalyze extensive bioinformatics research related to SIP prediction.
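The GLCM feature step described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`quantize`, `glcm`, `glcm_features`), the number of gray levels, the single horizontal offset, and the three texture statistics are all assumptions chosen for illustration, and the toy matrix merely stands in for a real PSI-BLAST PSSM.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def quantize(matrix: Matrix, levels: int) -> List[List[int]]:
    """Map raw scores (e.g. PSSM entries) onto integer gray levels 0..levels-1."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix
    return [[min(levels - 1, int((v - lo) / span * levels)) for v in row]
            for row in matrix]


def glcm(gray: List[List[int]], levels: int,
         offset: Tuple[int, int] = (0, 1)) -> Matrix:
    """Normalized co-occurrence matrix for one (row, column) offset."""
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    total = 0
    for r, row in enumerate(gray):
        for c, level in enumerate(row):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < len(gray) and 0 <= c2 < len(gray[r2]):
                counts[level][gray[r2][c2]] += 1
                total += 1
    if total == 0:
        return counts
    return [[v / total for v in row] for row in counts]


def glcm_features(p: Matrix) -> Dict[str, float]:
    """Three common texture statistics computed from a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


# Toy 4x4 matrix standing in for a PSSM (hypothetical values).
toy_pssm = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
features = glcm_features(glcm(quantize(toy_pssm, levels=4), levels=4))
print(features)
```

In practice, statistics from several offsets and directions would typically be concatenated into one feature vector before being passed to a classifier; the single offset here keeps the sketch short.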
Affiliation(s)
- Dan-Hua Chu
- School of Mathematics, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Xiao-Mei Nie
- The Library of China University of Mining and Technology, Xuzhou, Jiangsu, China
2
Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier. BMC Bioinformatics 2022; 23:518. [PMID: 36457083 PMCID: PMC9713954 DOI: 10.1186/s12859-022-04880-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 08/03/2022] [Indexed: 12/04/2022] [Open Access]
Abstract
BACKGROUND: Self-interacting proteins (SIPs), in which two or more copies of a protein expressed by one gene interact with each other, play a central role in the regulation of most living cells and cellular functions. Although high-throughput experimental techniques can provide large amounts of SIP data, experimental identification of SIPs still has several shortcomings: it is time-consuming, costly, inefficient, and inherently prone to high false-positive rates. It is therefore increasingly important to develop efficient and accurate automatic approaches that supplement experimental methods and accelerate the prediction of SIPs from protein sequence information. RESULTS: In this paper, we present a novel framework, termed GLCM-WSRC (gray level co-occurrence matrix-weighted sparse representation based classification), for predicting SIPs automatically based on protein evolutionary information from protein primary sequences. More specifically, we first convert the protein sequence into a Position Specific Scoring Matrix (PSSM) containing protein sequence evolutionary information, using the Position Specific Iterated BLAST (PSI-BLAST) tool. Second, using an efficient feature extraction approach, GLCM, we extract salient and invariant feature vectors from the PSSM, and then apply a pre-processing operation, the adaptive synthetic (ADASYN) technique, to balance the SIPs dataset and generate new feature vectors for classification. Finally, we employ an efficient and reliable WSRC model to identify SIPs according to the known information of self-interacting and non-interacting proteins.
CONCLUSIONS: Extensive experimental results show that the proposed approach achieves high prediction performance, with 98.10% accuracy on the yeast dataset and 91.51% accuracy on the human dataset, which suggests that the proposed model could be a useful tool for large-scale self-interacting protein prediction and other bioinformatics detection tasks in the future.
3
Zingg D, Bhin J, Yemelyanenko J, Kas SM, Rolfs F, Lutz C, Lee JK, Klarenbeek S, Silverman IM, Annunziato S, Chan CS, Piersma SR, Eijkman T, Badoux M, Gogola E, Siteur B, Sprengers J, de Klein B, de Goeij-de Haas RR, Riedlinger GM, Ke H, Madison R, Drenth AP, van der Burg E, Schut E, Henneman L, van Miltenburg MH, Proost N, Zhen H, Wientjens E, de Bruijn R, de Ruiter JR, Boon U, de Korte-Grimmerink R, van Gerwen B, Féliz L, Abou-Alfa GK, Ross JS, van de Ven M, Rottenberg S, Cuppen E, Chessex AV, Ali SM, Burn TC, Jimenez CR, Ganesan S, Wessels LFA, Jonkers J. Truncated FGFR2 is a clinically actionable oncogene in multiple cancers. Nature 2022; 608:609-617. [PMID: 35948633 PMCID: PMC9436779 DOI: 10.1038/s41586-022-05066-5] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 01/19/2021] [Accepted: 07/03/2022] [Indexed: 12/13/2022]
Abstract
Somatic hotspot mutations and structural amplifications and fusions that affect fibroblast growth factor receptor 2 (encoded by FGFR2) occur in multiple types of cancer [1]. However, clinical responses to FGFR inhibitors have remained variable [1–9], emphasizing the need to better understand which FGFR2 alterations are oncogenic and therapeutically targetable. Here we apply transposon-based screening [10,11] and tumour modelling in mice [12,13], and find that the truncation of exon 18 (E18) of Fgfr2 is a potent driver mutation. Human oncogenomic datasets revealed a diverse set of FGFR2 alterations, including rearrangements, E1–E17 partial amplifications, and E18 nonsense and frameshift mutations, each causing the transcription of E18-truncated FGFR2 (FGFR2ΔE18). Functional in vitro and in vivo examination of a compendium of FGFR2ΔE18 and full-length variants pinpointed FGFR2-E18 truncation as a single-driver alteration in cancer. By contrast, the oncogenic competence of FGFR2 full-length amplifications depended on a distinct landscape of cooperating driver genes. This suggests that genomic alterations that generate stable FGFR2ΔE18 variants are actionable therapeutic targets, which we confirmed in preclinical mouse and human tumour models, and in a clinical trial. We propose that cancers containing any FGFR2 variant with a truncated E18 should be considered for FGFR-targeted therapies. Truncation of exon 18 of FGFR2 (FGFR2ΔE18) is a potent driver mutation in mice and humans, and FGFR-targeted therapy should be considered for patients with cancer expressing stable FGFR2ΔE18 variants.
Affiliation(s)
- Daniel Zingg
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Jinhyuk Bhin
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands; Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Julia Yemelyanenko
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Sjors M Kas
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Frank Rolfs
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands; OncoProteomics Laboratory, Department of Medical Oncology, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Catrin Lutz
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Sjoerd Klarenbeek
- Experimental Animal Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Stefano Annunziato
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Chang S Chan
- Department of Medicine, Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA; Department of Medicine and Pharmacology, Rutgers University, Piscataway, NJ, USA
- Sander R Piersma
- OncoProteomics Laboratory, Department of Medical Oncology, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Timo Eijkman
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Madelon Badoux
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Ewa Gogola
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Bjørn Siteur
- Mouse Clinic for Cancer and Aging, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Justin Sprengers
- Mouse Clinic for Cancer and Aging, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Bim de Klein
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Richard R de Goeij-de Haas
- OncoProteomics Laboratory, Department of Medical Oncology, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Gregory M Riedlinger
- Department of Medicine and Pharmacology, Rutgers University, Piscataway, NJ, USA; Department of Pathology, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
- Hua Ke
- Department of Medicine, Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA; Department of Medicine and Pharmacology, Rutgers University, Piscataway, NJ, USA
- Anne Paulien Drenth
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Eline van der Burg
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Eva Schut
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Linda Henneman
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands; Mouse Clinic for Cancer and Aging, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Martine H van Miltenburg
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Natalie Proost
- Mouse Clinic for Cancer and Aging, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Ellen Wientjens
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Roebi de Bruijn
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands; Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Julian R de Ruiter
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands; Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Ute Boon
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
- Bastiaan van Gerwen
- Mouse Clinic for Cancer and Aging, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Luis Féliz
- Incyte Biosciences International, Morges, Switzerland
- Ghassan K Abou-Alfa
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Medicine, Weill Medical College at Cornell University, New York, NY, USA
- Jeffrey S Ross
- Foundation Medicine, Cambridge, MA, USA; Upstate University Hospital, Upstate Medical University, Syracuse, NY, USA
- Marieke van de Ven
- Mouse Clinic for Cancer and Aging, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Sven Rottenberg
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Institute of Animal Pathology, Vetsuisse Faculty, University of Bern, Bern, Switzerland; Bern Center for Precision Medicine, University of Bern, Bern, Switzerland
- Edwin Cuppen
- Oncode Institute, Utrecht, The Netherlands; Hartwig Medical Foundation, Amsterdam, The Netherlands; Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Connie R Jimenez
- OncoProteomics Laboratory, Department of Medical Oncology, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Shridar Ganesan
- Department of Medicine, Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA; Department of Medicine and Pharmacology, Rutgers University, Piscataway, NJ, USA
- Lodewyk F A Wessels
- Oncode Institute, Utrecht, The Netherlands; Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Jos Jonkers
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands; Oncode Institute, Utrecht, The Netherlands
4
Liu Z, Li H, Jin Z, Li Y, Guo F, He Y, Liu X, Qi Y, Yuan L, He F, Li D. Exploration of Target Spaces in the Human Genome for Protein and Peptide Drugs. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:780-794. [PMID: 35338014 PMCID: PMC9881050 DOI: 10.1016/j.gpb.2021.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2020] [Revised: 10/20/2021] [Accepted: 11/01/2021] [Indexed: 01/31/2023]
Abstract
After decades of development, protein and peptide drugs have now grown into a major drug class in the marketplace. Target identification and validation are crucial for the discovery of protein and peptide drugs, and bioinformatics prediction of targets based on the characteristics of known target proteins will help improve the efficiency and success rate of target selection. However, owing to the developmental history in the pharmaceutical industry, previous systematic exploration of the target spaces has mainly focused on traditional small-molecule drugs, while studies related to protein and peptide drugs are lacking. Here, we systematically explore the target spaces in the human genome specifically for protein and peptide drugs. Compared with other proteins, both successful protein and peptide drug targets have many special characteristics, and are also significantly different from those of small-molecule drugs in many aspects. Based on these features, we develop separate effective genome-wide target prediction models for protein and peptide drugs. Finally, a user-friendly web server, Predictor Of Protein and PeptIde drugs' therapeutic Targets (POPPIT) (http://poppit.ncpsb.org.cn/), is established, which provides not only target prediction specifically for protein and peptide drugs but also abundant annotations for predicted targets.
Affiliation(s)
- Zhongyang Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China; School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China; College of Chemistry and Environmental Science, Hebei University, Baoding 071002, China (corresponding author)
- Honglei Li
- Suzhou Geneworks Technology Co., Ltd., Suzhou 215028, China
- Zhaoyu Jin
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Yang Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Feifei Guo
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Yangzhige He
- Department of Medical Research Center, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, Beijing 100730, China
- Xinyue Liu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Yaning Qi
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China; College of Life Sciences, Hebei University, Baoding 071002, China
- Liying Yuan
- College of Life Sciences, Hebei University, Baoding 071002, China
- Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China (corresponding author)
- Dong Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China; School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China (corresponding author)
5
Li Y, Zhang YR, Zhang P, Li DX, Xiao TL. Protein–Protein Interactions Prediction Base on Multiple Information Fusion via Graph Representation Learning. J BIOMATER TISS ENG 2022. [DOI: 10.1166/jbt.2022.2953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Protein–protein interactions (PPIs) have a critical impact on the processes of biological cells. Traditional biological experiments for identifying PPIs are costly in labor, materials, and time. Therefore, there is a great need for computational methods to predict PPIs. Most existing computational methods are based on the sequence characteristics or internal structural characteristics of proteins, and most rely on a single type of feature. Therefore, we propose a novel method to predict PPIs based on multiple information fusion through graph representation learning. Specifically, the attributes of each protein are first computed from its known sequence using k-mers. Then, the known protein interaction pairs are used to construct an adjacency graph, and a graph representation learning method, the graph convolutional network, is used to fuse the attributes of each protein with the graph structure information to obtain features containing multiple kinds of information. Finally, the multi-information features are fed into a random forest classifier for prediction and classification. Experimental results indicate that our method achieves an accuracy of 78.83% and an AUC of 86.10%. In conclusion, our method has excellent application prospects for predicting unknown PPIs.
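The k-mer attribute step mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `kmer_frequencies`, the choice of overlapping windows, the default k = 3, and the normalization to relative frequencies are not taken from the paper, and the example sequence is invented.

```python
from collections import Counter
from typing import Dict


def kmer_frequencies(sequence: str, k: int = 3) -> Dict[str, float]:
    """Relative frequency of each overlapping k-mer in a protein sequence."""
    seq = sequence.upper()
    windows = len(seq) - k + 1  # number of overlapping windows of length k
    if k <= 0 or windows <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(windows))
    return {kmer: count / windows for kmer, count in counts.items()}


# Hypothetical six-residue sequence, used only to show the call.
print(kmer_frequencies("MKVMKV", k=3))
```

A per-protein vector of this kind could then serve as the node attributes that a graph convolutional network combines with the interaction graph, as the abstract describes.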
Affiliation(s)
- Yan Li
- School of Economics and Management, Shangluo University, Shangluo, 726000, China
- Yu-Ren Zhang
- The School of Computer Sciences, BaoJi University of Arts and Sciences, Baoji, 721016, China
- Ping Zhang
- The School of Computer Sciences, BaoJi University of Arts and Sciences, Baoji, 721016, China
- Dong-Xu Li
- The School of Computer Sciences, BaoJi University of Arts and Sciences, Baoji, 721016, China
- Tian-Long Xiao
- The School of Computer Sciences, BaoJi University of Arts and Sciences, Baoji, 721016, China
6
Jia LN, Yan X, You ZH, Zhou X, Li LP, Wang L, Song KJ. NLPEI: A Novel Self-Interacting Protein Prediction Model Based on Natural Language Processing and Evolutionary Information. Evol Bioinform Online 2020; 16:1176934320984171. [PMID: 33488064 PMCID: PMC7768313 DOI: 10.1177/1176934320984171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/09/2020] [Accepted: 12/01/2020] [Indexed: 12/13/2022] [Open Access]
Abstract
The study of protein self-interactions (SIPs) can not only reveal the function of proteins at the molecular level, but is also crucial to understand activities such as growth, development, differentiation, and apoptosis, providing an important theoretical basis for exploring the mechanism of major diseases. With the rapid advances in biotechnology, a large number of SIPs have been discovered. However, due to the long period and high cost inherent to biological experiments, the gap between the identification of SIPs and the accumulation of data is growing. Therefore, fast and accurate computational methods are needed to effectively predict SIPs. In this study, we designed a new method, NLPEI, for predicting SIPs based on natural language understanding theory and evolutionary information. Specifically, we first understand the protein sequence as natural language and use natural language processing algorithms to extract its features. Then, we use the Position-Specific Scoring Matrix (PSSM) to represent the evolutionary information of the protein and extract its features through the Stacked Auto-Encoder (SAE) algorithm of deep learning. Finally, we fuse the natural language features of proteins with evolutionary features and make accurate predictions by Extreme Learning Machine (ELM) classifier. In the SIPs gold standard data sets of human and yeast, NLPEI achieved 94.19% and 91.29% prediction accuracy. Compared with different classifier models, different feature models, and other existing methods, NLPEI obtained the best results. These experimental results indicated that NLPEI is an effective tool for predicting SIPs and can provide reliable candidates for biological experiments.
Affiliation(s)
- Li-Na Jia
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Xin Yan
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- School of Foreign Languages, Zaozhuang University, Zaozhuang, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Xi Zhou
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Li-Ping Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lei Wang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.
- Ke-Jian Song
- School of information engineering, Jiangxi University of Science and Technology, Ganzhou, China
7
An JY, Zhou Y, Yan ZJ, Zhao YJ. Predicting Self-Interacting Proteins Using a Recurrent Neural Network and Protein Evolutionary Information. Evol Bioinform Online 2020; 16:1176934320924674. [PMID: 32550764 PMCID: PMC7278102 DOI: 10.1177/1176934320924674] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/14/2020] [Accepted: 04/16/2020] [Indexed: 11/15/2022] [Open Access]
Abstract
Self-interacting proteins (SIPs) play crucial roles in the biological activities of organisms. Many high-throughput methods can be used to identify SIPs. However, these methods are both time-consuming and expensive. Developing effective computational approaches for identifying SIPs is a challenging task. In this article, we present a novel computational method called RNN-SIFT, which combines the recurrent neural network (RNN) with scale invariant feature transform (SIFT) to predict SIPs based on protein evolutionary information. The main advantage of the proposed RNN-SIFT model is that it uses SIFT to extract key features by exploring the evolutionary information embedded in the position-specific scoring matrix constructed by Position-Specific Iterated BLAST, and employs an RNN classifier to perform classification based on the extracted features. Extensive experiments show that RNN-SIFT obtained average accuracies of 94.34% and 97.12% on the yeast and human datasets, respectively. We also compared our performance with the back propagation neural network (BPNN), the state-of-the-art support vector machine (SVM), and other existing methods. The experimental results show that the performance of RNN-SIFT is significantly better than that of the BPNN, SVM, and other previous methods in the domain. Therefore, we conclude that the proposed RNN-SIFT model is a useful tool for predicting SIPs, as well as for solving other bioinformatics tasks. To facilitate wider study and encourage future proteomics research, a freely available web server called RNN-SIFT-SIPs was developed at http://219.219.62.123:8888/RNNSIFT/ including the source code and the SIP datasets.
Affiliation(s)
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Zi-Ji Yan
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Yu-Jun Zhao
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
8
Cattoglio C, Pustova I, Darzacq X, Tjian R, Hansen AS. Assessing Self-interaction of Mammalian Nuclear Proteins by Co-immunoprecipitation. Bio Protoc 2020; 10:e3526. [PMID: 33654750 PMCID: PMC7842838 DOI: 10.21769/bioprotoc.3526] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/12/2019] [Revised: 01/25/2020] [Accepted: 01/13/2020] [Indexed: 11/02/2022] [Open Access]
Abstract
Protein-protein interactions constitute the molecular foundations of virtually all biological processes. Co-immunoprecipitation (CoIP) experiments are probably the most widely used method to probe both heterotypic and homotypic protein-protein interactions. Recent advances in super-resolution microscopy have revealed that several nuclear proteins such as transcription factors are spatially distributed into local high-concentration clusters in mammalian cells, suggesting that many nuclear proteins self-interact. These observations have further underscored the need for orthogonal biochemical approaches for testing if self-association occurs, and if so, what the mechanisms are. Here, we describe a CoIP protocol specifically optimized to test self-association of endogenously tagged nuclear proteins (self-CoIP), and to evaluate the role of nucleic acids in such self-interaction. This protocol has proven reliable and robust in our hands, and it can be used to test both homotypic and heterotypic (CoIP) protein-protein interactions.
Affiliation(s)
- Claudia Cattoglio
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Li Ka Shing Center for Biomedical and Health Sciences, Berkeley, CA, USA
- CIRM Center of Excellence, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
- Iryna Pustova
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Li Ka Shing Center for Biomedical and Health Sciences, Berkeley, CA, USA
- CIRM Center of Excellence, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
- Xavier Darzacq
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Li Ka Shing Center for Biomedical and Health Sciences, Berkeley, CA, USA
- CIRM Center of Excellence, University of California, Berkeley, Berkeley, CA, USA
- Robert Tjian
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Li Ka Shing Center for Biomedical and Health Sciences, Berkeley, CA, USA
- CIRM Center of Excellence, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
- Anders S. Hansen
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Li Ka Shing Center for Biomedical and Health Sciences, Berkeley, CA, USA
- CIRM Center of Excellence, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
9
Qu J, Zhao Y, Zhang L, Cai SB, Ming Z, Wang CC. Computational Models for Self-Interacting Proteins Prediction. Protein Pept Lett 2019; 27:392-399. [PMID: 31880240 DOI: 10.2174/0929866527666191227141713] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/30/2019] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 11/22/2022]
Abstract
Self-Interacting Proteins (SIPs), in which two or more copies of the same protein interact with each other, play significant roles in cellular functions and in the evolution of Protein Interaction Networks (PINs). Knowing whether a protein can act on itself is important for understanding its functions. Previous studies on SIPs have focused on their structures and functions, while their overall properties have received less attention. Identifying SIPs is therefore one of the most important tasks in biomedical research, as it helps in understanding the function and mechanism of proteins. High-throughput methods can be used for SIPs prediction, but they can be costly, time-consuming and challenging. It is therefore urgent to design computational models for the identification of SIPs. In this review, the concept and function of SIPs are introduced in detail. We further introduce SIPs data and some excellent computational models that have been designed for SIPs prediction. In particular, most existing approaches were developed using machine learning combined with different feature extraction methods. Finally, we discuss several difficult problems in developing computational models for SIPs prediction.
Affiliation(s)
- Jia Qu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Yan Zhao
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Shu-Bin Cai
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
- Zhong Ming
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
- Chun-Chun Wang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
10
Chen ZH, You ZH, Li LP, Wang YB, Qiu Y, Hu PW. Identification of self-interacting proteins by integrating random projection classifier and finite impulse response filter. BMC Genomics 2019; 20:928. [PMID: 31881833 PMCID: PMC6933882 DOI: 10.1186/s12864-019-6301-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023] Open
Abstract
Background Identification of protein-protein interactions (PPIs) is crucial for understanding biological processes and investigating the cellular functions of genes. Self-interacting proteins (SIPs) are those in which two or more identical proteins can interact with each other, and they are a specific type of PPI. More and more researchers are paying attention to SIPs detection, and several prediction models have been proposed, but some problems remain. Hence, there is an urgent need for an efficient computational model for SIPs prediction. Results In this study, we developed an effective model to predict SIPs, called RP-FIRF, which merges the Random Projection (RP) classifier and the Finite Impulse Response Filter (FIRF). More specifically, each protein sequence was first transformed into a Position Specific Scoring Matrix (PSSM) using Position Specific Iterated BLAST (PSI-BLAST). Then, to extract discriminative SIPs features and improve prediction performance, the FIRF method was applied to the PSSM. The RP classifier was used to perform the classification and predict novel SIPs. We evaluated the performance of the proposed RP-FIRF model and compared it with the state-of-the-art support vector machine (SVM) on human and yeast datasets. The proposed model achieved high average accuracies of 97.89% and 97.35%, respectively, using five-fold cross-validation. To further evaluate the proposed method, we also compared it with six other existing methods; the experimental results demonstrated that our model surpasses the previous approaches. Conclusion Experimental results show that self-interacting proteins are accurately predicted by the proposed model on human and yeast datasets. This shows that the proposed model can predict SIPs effectively. Thus, the RP-FIRF model is an automatic decision support method that should provide useful insights into the recognition of SIPs.
Affiliation(s)
- Zhan-Heng Chen
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhu-Hong You
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Li-Ping Li
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Yan-Bin Wang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Yu Qiu
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
11
Chen ZH, You ZH, Zhang WB, Wang YB, Cheng L, Alghazzawi D. Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model. Genes (Basel) 2019; 10:genes10110924. [PMID: 31726752 PMCID: PMC6896115 DOI: 10.3390/genes10110924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2019] [Revised: 11/05/2019] [Accepted: 11/06/2019] [Indexed: 11/22/2022] Open
Abstract
Self-interacting proteins (SIPs) are of paramount importance in current molecular biology. A number of traditional biological experimental methods for identifying SIPs have been developed in the past few years. However, these methods are costly, time-consuming and inefficient, which often limits their use. Computational methods have therefore emerged to meet this need. In this paper, we propose for the first time a novel deep learning model combined with a natural language processing (NLP) method to predict potential SIPs from protein sequence information. More specifically, each protein sequence is represented de novo as a series of k-mers. Then, we obtain the global vectors representation of each protein sequence using the NLP technique. Finally, based on known self-interacting and non-interacting proteins, a multi-grained cascade forest model is trained to predict SIPs. Comprehensive experiments were performed on yeast and human datasets, yielding accuracies of 91.45% and 93.12%, respectively. The experimental results show that amino acid semantic information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work has potential applications to various biological classification problems.
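The k-mer step mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes overlapping k-mers taken with a stride of one residue; the abstract does not state the value of k or the stride, so the function name `kmers`, the default `k=3`, and the toy sequence are illustrative assumptions rather than the authors' settings.

```python
def kmers(sequence: str, k: int = 3) -> list[str]:
    """Return all overlapping k-mers of a protein sequence, left to right.

    Assumption: stride of one residue; the abstract does not specify this.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Toy sequence for illustration only, not taken from the paper's datasets.
words = kmers("MKTAYIAK", k=3)
print(words)
```

The resulting list of "words" is the kind of input a word-embedding method such as global vectors would consume; the embedding and the cascade forest classifier are outside the scope of this sketch.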
Affiliation(s)
- Zhan-Heng Chen
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Correspondence: Tel.: +86-991-3835-823
- Wen-Bo Zhang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yan-Bin Wang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Li Cheng
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Daniyal Alghazzawi
  - Department of Information Systems, King Abdulaziz University, Jeddah 21589, Saudi Arabia
12
Chen ZH, Li LP, He Z, Zhou JR, Li Y, Wong L. An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation. Front Genet 2019; 10:90. [PMID: 30881376 PMCID: PMC6405691 DOI: 10.3389/fgene.2019.00090] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/11/2018] [Accepted: 01/29/2019] [Indexed: 12/23/2022] Open
Abstract
Self-interacting proteins (SIPs), in which two or more identical copies can interact with each other, play significant roles in cellular processes and cell functions. Although a number of experimental methods have been designed to detect SIPs, they remain extremely time-consuming, expensive, and challenging. Therefore, there is an urgent need to develop computational methods for predicting SIPs. In this study, we propose a deep forest based predictor for accurate prediction of SIPs using protein sequence information. More specifically, a novel feature representation method, which integrates the position-specific scoring matrix (PSSM) with the wavelet transform, is introduced. To evaluate the performance of the proposed method, cross-validation tests are performed on two widely used benchmark datasets. The experimental results show that the proposed model achieved high accuracies of 95.43% and 93.65% on human and yeast datasets, respectively. The AUC values of the proposed method on the yeast and human datasets are 0.9203 and 0.9586, respectively. To further show the advantage of the proposed method, it is compared with several existing methods. The results demonstrate that the proposed model is better than other SIPs prediction methods. This work can offer biologists an effective architecture for detecting new SIPs.
Affiliation(s)
- Zhan-Heng Chen
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
- Li-Ping Li
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Zhou He
  - College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
- Ji-Ren Zhou
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yangming Li
  - ECTET, Rochester Institute of Technology, Rochester, NY, United States
- Leon Wong
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - University of Chinese Academy of Sciences, Beijing, China
13
Chen ZH, You ZH, Li LP, Wang YB, Wong L, Yi HC. Prediction of Self-Interacting Proteins from Protein Sequence Information Based on Random Projection Model and Fast Fourier Transform. Int J Mol Sci 2019; 20:ijms20040930. [PMID: 30795499 PMCID: PMC6412412 DOI: 10.3390/ijms20040930] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/05/2018] [Revised: 01/06/2019] [Accepted: 01/07/2019] [Indexed: 12/30/2022] Open
Abstract
Predicting self-interacting proteins (SIPs) is an important problem in bioinformatics. In SIPs, two or more identical proteins expressed from one gene can interact with each other. This plays a major role in the evolution of protein-protein interactions (PPIs) and cellular functions. Owing to the limitations of experimental identification of self-interacting proteins, it is increasingly important to develop a useful tool for predicting SIPs from protein sequence information. Therefore, we propose a novel prediction model called RP-FFT that merges the Random Projection (RP) model and the Fast Fourier Transform (FFT) for detecting SIPs. First, each protein sequence was transformed into a Position Specific Scoring Matrix (PSSM) using Position Specific Iterated BLAST (PSI-BLAST). Second, the features of protein sequences were extracted by applying the FFT method to the PSSM. Lastly, we evaluated the performance of RP-FFT and compared the RP classifier with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets; under five-fold cross-validation, the RP-FFT model obtained high average accuracies of 96.28% and 91.87% on the human and yeast datasets, respectively. The experimental results demonstrated that our RP-FFT prediction model is reasonable and robust.
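As a rough illustration of extracting frequency-domain features from a PSSM, the sketch below computes discrete Fourier transform magnitudes along each PSSM column and keeps the first few coefficients. This is only a sketch under stated assumptions: the abstract does not say how the FFT output is turned into a feature vector, so the column-wise layout, the `n_coeffs` parameter, and the function names are hypothetical. A naive DFT from the standard library is used instead of an optimized FFT so that the example has no dependencies.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def pssm_spectrum(pssm, n_coeffs):
    """Concatenate the first n_coeffs DFT magnitudes of every PSSM column.

    pssm is a list of rows (one per residue); each row holds one score per
    amino-acid type. Keeping a fixed n_coeffs gives a fixed-length vector
    regardless of sequence length (an assumption, not the paper's recipe).
    """
    n_cols = len(pssm[0])
    features = []
    for j in range(n_cols):
        column = [row[j] for row in pssm]
        features.extend(dft_magnitudes(column)[:n_coeffs])
    return features


# Toy 4x2 matrix for illustration; a real PSSM has 20 columns.
toy = [[1, 1], [1, -1], [1, 1], [1, -1]]
print(pssm_spectrum(toy, n_coeffs=2))
```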
Affiliation(s)
- Zhan-Heng Chen
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Li-Ping Li
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Yan-Bin Wang
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Leon Wong
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Hai-Cheng Yi
  - The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
14
An JY, Zhou Y, Zhang L, Niu Q, Wang DF. Improving Self-interacting Proteins Prediction Accuracy Using Protein Evolutionary Information and Weighed-Extreme Learning Machine. Curr Bioinform 2019. [DOI: 10.2174/1574893613666180209161152] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/28/2023]
Abstract
Background: Self-interacting proteins (SIPs) play an essential role in various aspects of the structural and functional organization of the cell.
Objective: In this study, we present a novel sequence-based computational approach for predicting self-interacting proteins using a Weighed-Extreme Learning Machine (WELM) model combined with an Autocorrelation (AC) descriptor protein feature representation.
Method: The major advantage of the proposed method lies in adopting an effective feature extraction method to represent candidate self-interacting proteins using the evolutionary information embedded in the PSI-BLAST-constructed Position Specific Scoring Matrix (PSSM), and then employing a reliable and effective WELM classifier to perform the classification.
Result: To evaluate its performance, the proposed approach is applied to yeast and human SIP datasets. The experimental results show that our method obtained prediction accuracies of 93.43% and 98.15% on the yeast and human datasets, respectively. Extensive experiments are carried out to compare our approach with the SVM classifier and existing sequence-based methods on the yeast and human datasets. Experimental results show that the performance of our method is better than that of several other state-of-the-art methods.
Conclusion: The proposed method is suitable for SIPs detection and performs very well in identifying SIPs. To facilitate extensive studies for future proteomics research, we developed a freely available web server called WELM-AC-SIPs in Hypertext Preprocessor (PHP) for predicting SIPs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/WELMAC/.
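To make the idea of an autocorrelation-style descriptor on a PSSM concrete, here is a minimal Python sketch of one commonly used formulation (column-wise auto-covariance at several lags). The abstract does not give the exact AC formula or the maximum lag, so the definition below, the function name `autocovariance`, and the `max_lag` parameter are assumptions made for illustration.

```python
def autocovariance(pssm, max_lag):
    """Column-wise auto-covariance features of a PSSM for lags 1..max_lag.

    pssm is a list of rows (one per residue), each with one score per
    amino-acid type. For lag g and column j the feature is the mean of
    (P[i][j] - mean_j) * (P[i+g][j] - mean_j) over all valid positions i.
    Requires max_lag < number of rows.
    """
    length = len(pssm)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    n_cols = len(pssm[0])
    # Mean score of each column over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy single-column matrix; a real PSSM has 20 columns.
print(autocovariance([[1], [3], [1], [3]], max_lag=2))
```

With 20 columns this yields a vector of length 20 × `max_lag`, independent of sequence length, which is what makes such descriptors convenient inputs for a fixed-size classifier.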
Affiliation(s)
- Ji-Yong An
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
- Yong Zhou
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
- Lei Zhang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
- Qiang Niu
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
- Da-Fu Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 21116, China
15
Wang YB, You ZH, Li X, Jiang TH, Cheng L, Chen ZH. Prediction of protein self-interactions using stacked long short-term memory from protein sequences information. BMC SYSTEMS BIOLOGY 2018; 12:129. [PMID: 30577794 PMCID: PMC6302371 DOI: 10.1186/s12918-018-0647-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/19/2023]
Abstract
BACKGROUND Self-interacting proteins (SIPs) play a critical role in a series of life functions in most living cells. Research on SIPs is an important part of molecular biology. Although numerous SIPs data have been provided, traditional experimental methods are labor-intensive, time-consuming and costly, and can only yield limited results relative to real-world needs. Hence, it is urgent to develop an efficient computational SIPs prediction method to fill the gap. Deep learning technologies have been shown to produce disruptive performance improvements in many areas, but the effectiveness of deep learning methods for SIPs prediction has not been verified. RESULTS We developed a deep learning model for predicting SIPs by constructing a Stacked Long Short-Term Memory (SLSTM) neural network that contains "dropout". We extracted features from protein sequences using a novel feature extraction scheme that combined Zernike Moments (ZMs) with the Position Specific Weight Matrix (PSWM). The capability of the proposed approach was assessed on S. cerevisiae and Human SIPs datasets. The results indicate that the deep learning approach can effectively resist data skew and achieve good accuracies of 95.69% and 97.88%, respectively. To demonstrate the advantages of deep learning, we compared the results of the SLSTM-based method with the widely used Support Vector Machine (SVM) method and several other well-known methods on the same datasets. CONCLUSION The results show that our method is overall superior to the other existing state-of-the-art techniques. As far as we know, this study is the first to apply a deep learning method to predict SIPs, and practical experimental results reveal its potential in SIPs identification.
Affiliation(s)
- Yan-Bin Wang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
- Xiao Li
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
- Tong-Hai Jiang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
- Li Cheng
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
- Zhan-Heng Chen
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, 830011 China
  - University of Chinese Academy of Sciences, Beijing, 100049 China
16
Wang YB, You ZH, Li LP, Huang DS, Zhou FF, Yang S. Improving Prediction of Self-interacting Proteins Using Stacked Sparse Auto-Encoder with PSSM profiles. Int J Biol Sci 2018; 14:983-991. [PMID: 29989064 PMCID: PMC6036743 DOI: 10.7150/ijbs.23817] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/13/2017] [Accepted: 03/29/2018] [Indexed: 12/05/2022] Open
Abstract
Self-interacting proteins (SIPs) play a significant role in the execution of most important molecular processes in cells, such as signal transduction, gene expression regulation, immune response and enzyme activation. Although traditional experimental methods can be used to generate SIPs data, relying only on biological techniques is very expensive and time-consuming. Therefore, it is important and urgent to develop an efficient computational method for SIPs detection. In this study, we present a novel SIPs identification method based on machine learning technology by combining the Zernike Moments (ZMs) descriptor on the Position Specific Scoring Matrix (PSSM) with Probabilistic Classification Vector Machines (PCVM) and a Stacked Sparse Auto-Encoder (SSAE). More specifically, an efficient feature extraction technique called ZMs is first used to generate feature vectors from the PSSM; then, a deep neural network is employed to reduce the feature dimensions and noise; finally, the Probabilistic Classification Vector Machine is used to perform the classification. The prediction performance of the proposed method is evaluated on S. cerevisiae and Human SIPs datasets via cross-validation. The experimental results indicate that the proposed method can achieve good accuracies of 92.55% and 97.47%, respectively. To further evaluate the advantage of our scheme for SIPs prediction, we also compared the PCVM classifier with the Support Vector Machine (SVM) and other existing techniques on the same datasets. Comparison results reveal that the proposed strategy outperforms other methods and could be a useful tool for identifying SIPs.
Affiliation(s)
- Yan-Bin Wang
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Li-Ping Li
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Caoan Road 4800, Shanghai 201804, China
- Feng-Feng Zhou
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Shan Yang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
17
Li JQ, You ZH, Li X, Ming Z, Chen X. PSPEL: In Silico Prediction of Self-Interacting Proteins from Amino Acids Sequences Using Ensemble Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1165-1172. [PMID: 28092572 DOI: 10.1109/tcbb.2017.2649529] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 06/06/2023]
Abstract
Self-interacting proteins (SIPs) play an important role in various aspects of the structural and functional organization of the cell. Detecting SIPs is one of the most important issues in current molecular biology. Although a large amount of SIPs data has been generated by experimental methods, wet laboratory approaches are both time-consuming and costly. In addition, they yield high false negative and false positive rates. Thus, there is a great need for in silico methods to predict SIPs accurately and efficiently. In this study, a new sequence-based method is proposed to predict SIPs. The evolutionary information contained in the Position-Specific Scoring Matrix (PSSM) is extracted from proteins with known sequences. Then, the features are fed to an ensemble classifier to distinguish self-interacting from non-self-interacting proteins. When applied to Saccharomyces cerevisiae and Human SIPs data sets, the proposed method achieves high accuracies of 86.86 and 91.30 percent, respectively. Our method also performs well when compared with the SVM classifier and previous methods. Consequently, the proposed method can be considered a promising new tool for predicting SIPs.
18
An JY, Zhang L, Zhou Y, Zhao YJ, Wang DF. Computational methods using weighed-extreme learning machine to predict protein self-interactions with protein evolutionary information. J Cheminform 2017; 9:47. [PMID: 29086182 PMCID: PMC5561767 DOI: 10.1186/s13321-017-0233-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/01/2017] [Accepted: 08/05/2017] [Indexed: 02/07/2023] Open
Abstract
Self-interacting proteins (SIPs) are important for their biological activity owing to the inherent interaction amongst their secondary structures or domains. However, due to the limitations of experimental self-interaction detection, one major challenge in SIPs prediction is how to exploit computational approaches for SIPs detection based on the evolutionary information contained in protein sequences. In this work, we present a novel computational approach named WELM-LAG, which combines the Weighed-Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) to predict SIPs based on protein sequence. The major improvement of our method lies in an effective feature extraction method that represents candidate self-interacting proteins by exploring the evolutionary information embedded in the PSI-BLAST-constructed position specific scoring matrix (PSSM), followed by a reliable and robust WELM classifier to carry out classification. In addition, the Principal Component Analysis (PCA) approach is used to reduce the impact of noise. The WELM-LAG method gave very high average accuracies of 92.94% and 96.74% on yeast and human datasets, respectively. Meanwhile, we compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets. Comparative results indicated that our approach is very promising and may provide a cost-effective alternative for predicting SIPs. In addition, we developed a freely available web server called WELM-LAG-SIPs to predict SIPs. The web server is available at http://219.219.62.123:8888/WELMLAG/.
Affiliation(s)
- Ji-Yong An
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
- Lei Zhang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
- Yong Zhou
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
- Yu-Jun Zhao
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
- Da-Fu Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116 Jiangsu China
19
Highly accurate prediction of protein self-interactions by incorporating the average block and PSSM information into the general PseAAC. J Theor Biol 2017; 432:80-86. [PMID: 28802824 DOI: 10.1016/j.jtbi.2017.08.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/12/2017] [Revised: 08/05/2017] [Accepted: 08/08/2017] [Indexed: 11/23/2022]
Abstract
Determining whether proteins can interact with their partners is a challenging task in fundamental research. Protein self-interaction (SIP) is a special case of PPIs that plays a key role in the regulation of cellular functions. Due to the limitations of experimental self-interaction identification, it is very important to develop an effective tool for predicting SIPs based on protein sequences. In this study, we developed a novel computational method called RVM-AB that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) for detecting SIPs from protein sequences. First, the Average Blocks (AB) feature extraction method is employed to represent protein sequences on a Position Specific Scoring Matrix (PSSM). Second, the Principal Component Analysis (PCA) method is used to reduce the dimension of the AB vector and thereby the influence of noise. Then, by employing the Relevance Vector Machine (RVM) algorithm, the performance of RVM-AB is assessed and compared with the state-of-the-art support vector machine (SVM) classifier and other existing methods on yeast and human datasets. In the five-fold test experiment, the RVM-AB model achieved very high accuracies of 93.01% and 97.72% on yeast and human datasets, respectively, which are significantly better than those of the SVM-based method and other previous methods. The experimental results showed that the RVM-AB prediction model is efficient and robust. It can be an automatic decision support tool for detecting SIPs. To facilitate extensive studies for future proteomics research, the RVMAB server is freely available for academic use at http://219.219.62.123:8888/SIP_AB.
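The block-averaging idea behind an Average Blocks style descriptor can be sketched in a few lines of Python. The version below splits the PSSM rows into a fixed number of contiguous blocks and averages each column within each block. The abstract does not state the number of blocks, so the default `n_blocks=20` and the function name `average_blocks` are assumptions for illustration, and the PCA and RVM steps are not shown.

```python
def average_blocks(pssm, n_blocks=20):
    """Average each PSSM column within n_blocks contiguous row blocks.

    pssm is a list of rows (one per residue), each with one score per
    amino-acid type. The output has n_blocks * n_columns values, so
    proteins of different lengths map to vectors of the same size.
    """
    length = len(pssm)
    if length < n_blocks:
        raise ValueError("sequence must have at least n_blocks residues")
    n_cols = len(pssm[0])
    features = []
    for b in range(n_blocks):
        # Integer boundaries; every block is non-empty when length >= n_blocks.
        start = b * length // n_blocks
        end = (b + 1) * length // n_blocks
        block = pssm[start:end]
        for j in range(n_cols):
            features.append(sum(row[j] for row in block) / len(block))
    return features


# Toy 4x2 matrix split into two blocks; a real PSSM has 20 columns.
print(average_blocks([[1, 2], [3, 4], [5, 6], [7, 8]], n_blocks=2))
```

With 20 blocks and 20 PSSM columns this would give a 400-dimensional vector, which is the kind of input a dimensionality-reduction step such as PCA would then compress.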
20
An JY, You ZH, Chen X, Huang DS, Yan G, Wang DF. Robust and accurate prediction of protein self-interactions from amino acids sequence using evolutionary information. MOLECULAR BIOSYSTEMS 2017; 12:3702-3710. [PMID: 27759121 DOI: 10.1039/c6mb00599c] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/22/2022]
Abstract
Self-interacting proteins (SIPs) play an essential role in cellular functions and the evolution of protein interaction networks (PINs). Due to the limitations of experimental self-interaction proteins detection technology, it is a very important task to develop a robust and accurate computational approach for SIPs prediction. In this study, we propose a novel computational method for predicting SIPs from protein amino acids sequence. Firstly, a novel feature representation scheme based on Local Binary Pattern (LBP) is developed, in which the evolutionary information, in the form of multiple sequence alignments, is taken into account. Then, by employing the Relevance Vector Machine (RVM) classifier, the performance of our proposed method is evaluated on yeast and human datasets using a five-fold cross-validation test. The experimental results show that the proposed method can achieve high accuracies of 94.82% and 97.28% on yeast and human datasets, respectively. For further assessing the performance of our method, we compared it with the state-of-the-art Support Vector Machine (SVM) classifier, and other existing methods, on the same datasets. Comparison results demonstrate that the proposed method is very promising and could provide a cost-effective alternative for predicting SIPs. In addition, to facilitate extensive studies for future proteomics research, a web server is freely available for academic use at .
Affiliation(s)
- Ji-Yong An: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
- Zhu-Hong You: Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Ürümqi 830011, China
- Xing Chen: School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- De-Shuang Huang: School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, China
- Guiying Yan: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Da-Fu Wang: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
|
21
|
Traustadóttir GÁ, Jensen CH, Garcia Ramirez JJ, Beck HC, Sheikh SP, Andersen DC. The non-canonical NOTCH1 ligand Delta-like 1 homolog (DLK1) self interacts in mammals. Int J Biol Macromol 2017; 97:460-467. [DOI: 10.1016/j.ijbiomac.2017.01.067] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/03/2016] [Revised: 01/12/2017] [Accepted: 01/13/2017] [Indexed: 12/11/2022]
|
22
|
An JY, You ZH, Chen X, Huang DS, Li ZW, Liu G, Wang Y. Identification of self-interacting proteins by exploring evolutionary information embedded in PSI-BLAST-constructed position specific scoring matrix. Oncotarget 2016; 7:82440-82449. [PMID: 27732957 PMCID: PMC5347703 DOI: 10.18632/oncotarget.12517] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 06/29/2016] [Accepted: 09/28/2016] [Indexed: 01/31/2023] Open
Abstract
Self-interacting proteins (SIPs) play an essential role in a wide range of biological processes, such as gene expression regulation, signal transduction, enzyme activation and immune response. Because experimental identification of self-interacting proteins is limited, developing an effective sequence-based computational method to detect SIPs is very important. In this study, we propose a novel computational approach called RVMBIGP that combines the Relevance Vector Machine (RVM) model and Bi-gram probability (BIGP) to predict SIPs from protein sequences. The proposed prediction model consists of the following steps: (1) an effective feature extraction method named BIGP is used to represent protein sequences on a Position Specific Scoring Matrix (PSSM); (2) Principal Component Analysis (PCA) is employed to integrate the useful information and reduce the influence of noise; (3) the robust RVM classifier is used to carry out classification. On the yeast and human datasets, the proposed RVMBIGP model achieves accuracies of 95.48% and 98.80%, respectively. The experimental results show that the proposed method is very promising and may provide a cost-effective alternative for SIP identification. In addition, to facilitate extensive studies for future proteomics research, the RVMBIGP server is freely available for academic use at http://219.219.62.123:8888/RVMBIGP.
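Step (1) of this abstract can be illustrated with a short Python sketch of a bi-gram feature computed from a PSSM. The abstract does not give the formula; this sketch assumes a commonly used definition in which feature (p, q) is the sum, over consecutive sequence positions i and i+1, of the product of column p at position i and column q at position i+1, with PSSM entries treated as probabilities. The function name `bigram_features` and the toy input are hypothetical; PCA and the RVM classifier are not shown.

```python
def bigram_features(pssm):
    """Flattened bi-gram matrix of a PSSM given as a list of rows.

    Assumption: entry (p, q) accumulates pssm[i][p] * pssm[i + 1][q] over all
    consecutive row pairs, giving n_cols * n_cols features (400 for 20 columns).
    """
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Walk over consecutive residue positions (row i and row i + 1).
    for current, following in zip(pssm, pssm[1:]):
        for p in range(n_cols):
            for q in range(n_cols):
                bigram[p][q] += current[p] * following[q]
    # Flatten row by row into a single feature vector.
    return [value for row in bigram for value in row]


# Toy usage with a 2-column matrix so the result is easy to check by hand.
toy = [[1.0, 0.0],
       [0.0, 1.0],
       [1.0, 0.0]]
print(bigram_features(toy))  # [0.0, 1.0, 1.0, 0.0]
```

In the toy input the sequence alternates between column 0 and column 1, so only the two mixed transitions (0 to 1 and 1 to 0) receive weight.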
Affiliation(s)
- Ji-Yong An: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
- Zhu-Hong You: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xing Chen: School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- De-Shuang Huang: School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Zheng-Wei Li: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
- Gang Liu: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Yin Wang: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China
|
23
|
SPAR: a random forest-based predictor for self-interacting proteins with fine-grained domain information. Amino Acids 2016; 48:1655-65. [PMID: 27074717 DOI: 10.1007/s00726-016-2226-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 09/21/2015] [Accepted: 03/30/2016] [Indexed: 02/07/2023]
Abstract
Protein self-interaction, i.e. the interaction between two or more identical proteins expressed by one gene, plays an important role in the regulation of cellular functions. Considering the limitations of experimental self-interaction identification, it is necessary to design specific bioinformatics tools for self-interacting protein (SIP) prediction from protein sequence information. In this study, we proposed an improved computational approach for SIP prediction, termed SPAR (Self-interacting Protein Analysis serveR). First, we developed an improved encoding scheme named critical residues substitution (CRS), in which fine-grained domain-domain interaction information was taken into account. Then, using the Random Forest algorithm, the performance of CRS was evaluated and compared with several other encoding schemes commonly used for sequence-based protein-protein interaction prediction. In tenfold cross-validation tests on a balanced training dataset, CRS performed best, with an average accuracy of up to 72.01%. We further integrated CRS with other encoding schemes and identified the most important features using the mRMR (minimum redundancy maximum relevance) feature selection method. Our SPAR model with the selected features achieved an average accuracy of 92.09% on the independent human test set (the ratio of positives to negatives was about 1:11). We also evaluated the performance of SPAR on an independent yeast test set (the ratio of positives to negatives was about 1:8) and obtained an average accuracy of 76.96%. The results demonstrate that SPAR is capable of achieving reasonable performance in cross-species application. The SPAR server is freely available for academic use at http://systbio.cau.edu.cn/zzdlab/spar/.
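Because the two independent test sets in this abstract are imbalanced, it can help to work out the accuracy that a trivial classifier would reach by always predicting the larger class. The short Python sketch below does this arithmetic for the stated ratios. The ratios are given only as approximate ("about 1:11" and "about 1:8"), so the results are approximate too, and the function name `majority_baseline_accuracy` is a hypothetical helper, not part of SPAR.

```python
def majority_baseline_accuracy(n_pos, n_neg):
    """Accuracy obtained by always predicting the larger of two classes."""
    if n_pos < 0 or n_neg < 0 or n_pos + n_neg == 0:
        raise ValueError("class counts must be non-negative and not both zero")
    return max(n_pos, n_neg) / (n_pos + n_neg)


# Ratios of positives to negatives as stated (approximately) in the abstract.
human_baseline = majority_baseline_accuracy(1, 11)
yeast_baseline = majority_baseline_accuracy(1, 8)
print(f"human (1:11): {human_baseline:.2%}")  # human (1:11): 91.67%
print(f"yeast (1:8): {yeast_baseline:.2%}")   # yeast (1:8): 88.89%
```

Comparing a reported accuracy against this baseline gives context for results obtained on imbalanced test sets.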
|
24
|
Liu Z, Guo F, Gu J, Wang Y, Li Y, Wang D, Lu L, Li D, He F. Similarity-based prediction for Anatomical Therapeutic Chemical classification of drugs by integrating multiple data sources. Bioinformatics 2015; 31:1788-95. [DOI: 10.1093/bioinformatics/btv055] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 07/08/2014] [Accepted: 01/26/2015] [Indexed: 11/13/2022] Open
|
25
|
Li N, Xu Z, Zhai L, Li Y, Fan F, Zheng J, Xu P, He F. Rapid development of proteomics in China: from the perspective of the Human Liver Proteome Project and technology development. Science China. Life Sciences 2014; 57:1162-1171. [PMID: 25119674 DOI: 10.1007/s11427-014-4714-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/14/2014] [Accepted: 07/01/2014] [Indexed: 12/17/2022]
Abstract
Proteomics focuses on the systematic identification and quantification of entire proteomes and the interpretation of proteins' biological functions. During the last decade, proteomics in China has grown much faster than other research fields in the life sciences. At the beginning of the second decade of the 21st century, the rapid development of high-resolution and high-speed mass spectrometry has made proteomics a powerful tool for studying the mechanisms underlying physiological and pathological processes in organisms. This article provides a brief overview of proteomics technology development and representative scientific progress of the Human Liver Proteome Project (HLPP) in China over the past three years.
Affiliation(s)
- Ning Li: State Key Laboratory of Proteomics, National Engineering Research Center for Protein Drugs, Beijing Proteome Research Center, National Center for Protein Sciences Beijing, Beijing Institute of Radiation Medicine, Beijing, 102206, China
|