1
|
Emami N, Ferdousi R. HormoNet: a deep learning approach for hormone-drug interaction prediction. BMC Bioinformatics 2024; 25:87. [PMID: 38418979 PMCID: PMC10903040 DOI: 10.1186/s12859-024-05708-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Accepted: 02/16/2024] [Indexed: 03/02/2024] Open
Abstract
Several experimental evidences have shown that the human endogenous hormones can interact with drugs in many ways and affect drug efficacy. The hormone drug interactions (HDI) are essential for drug treatment and precision medicine; therefore, it is essential to understand the hormone-drug associations. Here, we present HormoNet to predict the HDI pairs and their risk level by integrating features derived from hormone and drug target proteins. To the best of our knowledge, this is one of the first attempts to employ deep learning approach for prediction of HDI prediction. Amino acid composition and pseudo amino acid composition were applied to represent target information using 30 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied synthetic minority over-sampling technique technique. Additionally, we constructed novel datasets for HDI prediction and the risk level of their interaction. HormoNet achieved high performance on our constructed hormone-drug benchmark datasets. The results provide insights into the understanding of the relationship between hormone and a drug, and indicate the potential benefit of reducing risk levels of interactions in designing more effective therapies for patients in drug treatments. Our benchmark datasets and the source codes for HormoNet are available in: https://github.com/EmamiNeda/HormoNet .
Collapse
Affiliation(s)
- Neda Emami
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran.
| | - Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
| |
Collapse
|
2
|
Pesaranghader A, Matwin S, Sokolova M, Grenier JC, Beiko RG, Hussin J. OUP accepted manuscript. Bioinformatics 2022; 38:3051-3061. [PMID: 35536192 PMCID: PMC9154256 DOI: 10.1093/bioinformatics/btac304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2021] [Revised: 02/12/2022] [Indexed: 11/24/2022] Open
Abstract
Motivation There is a plethora of measures to evaluate functional similarity (FS) of genes based on their co-expression, protein–protein interactions and sequence similarity. These measures are typically derived from hand-engineered and application-specific metrics to quantify the degree of shared information between two genes using their Gene Ontology (GO) annotations. Results We introduce deepSimDEF, a deep learning method to automatically learn FS estimation of gene pairs given a set of genes and their GO annotations. deepSimDEF’s key novelty is its ability to learn low-dimensional embedding vector representations of GO terms and gene products and then calculate FS using these learned vectors. We show that deepSimDEF can predict the FS of new genes using their annotations: it outperformed all other FS measures by >5–10% on yeast and human reference datasets on protein–protein interactions, gene co-expression and sequence homology tasks. Thus, deepSimDEF offers a powerful and adaptable deep neural architecture that can benefit a wide range of problems in genomics and proteomics, and its architecture is flexible enough to support its extension to any organism. Availability and implementation Source code and data are available at https://github.com/ahmadpgh/deepSimDEF Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | - Stan Matwin
- Faculty of Computer Science, Dalhousie University, Halifax B3H 4R2, Canada
- Institute for Big Data Analytics, Dalhousie University, Halifax B3H 4R2, Canada
- Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
| | - Marina Sokolova
- Institute for Big Data Analytics, Dalhousie University, Halifax B3H 4R2, Canada
- Faculty of Medicine and Faculty of Engineering, University of Ottawa, Ottawa K1H 8M5, Canada
| | | | - Robert G Beiko
- Faculty of Computer Science, Dalhousie University, Halifax B3H 4R2, Canada
- Institute for Big Data Analytics, Dalhousie University, Halifax B3H 4R2, Canada
| | | |
Collapse
|
3
|
Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B. DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier. Interdiscip Sci 2021; 14:311-330. [PMID: 34731411 DOI: 10.1007/s12539-021-00488-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/12/2022]
Abstract
Accurate prediction of drug-target interactions (DTIs), which is often used in the fields of drug discovery and drug repositioning, is regarded a key challenge in the study of drug science. In this paper, a new method called DeepStack-DTIs is proposed to predict DTIs. First, for the target protein, pseudo-position specific score matrix, pseudo amino acid composition and SPIDER3 are used to extract the different feature information of the target protein. Meanwhile, the path-based fingerprint features of each drug are extracted. Then, the synthetic minority oversampling technique (SMOTE) and light gradient boosting machine (LightGBM) are used for data balancing and feature selection, respectively. Finally, the processed features are input to the deep-stacked ensemble classifier composed of gated recurrent unit (GRU), deep neural network (DNN), support vector machine (SVM), eXtreme gradient boosting (XGBoost) and logistic regression (LR) to predict DTIs. Under the five-fold cross-validation and compared with existing methods, the proposed method achieves higher prediction accuracy on the gold standard dataset. To evaluate the predictive power of DeepStack-DTIs, we validate the method on another dataset and predict the drug-target interaction network. The results indicate that DeepStack-DTIs has excellent predictive ability than the other methods, and provides novel insights for the prediction of DTIs. A novel method DeepStack-DTIs for drug-target interactions prediction. PsePSSM, PseAAC, SPIDER3 and FP2 are fused to convert protein sequence and drug molecule information into digital information, respectively. The SMOTE algorithm is used to balance the dataset and LightGBM feature selection algorithm is employed to remove redundant and irrelevant features to select the optimal feature subset. This optimal feature subset is inputted into the deep-stacked ensemble classifier to predict drug-target interactions. The experimental results show DeepStack-DTIs method can significantly improve the prediction accuracy of drug-target interactions.
Collapse
Affiliation(s)
- Yan Zhang
- College of Mechanical and Electrical Engineering, Qingdao University of Science and Technology, Qingdao, 266061, China.,College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Zhiwen Jiang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Cheng Chen
- School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
| | - Qinqin Wei
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Haiming Gu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China. .,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China. .,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China.
| |
Collapse
|
4
|
Liu Y, Jin S, Song L, Han Y, Yu B. Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier. J Mol Graph Model 2021; 107:107962. [PMID: 34198216 DOI: 10.1016/j.jmgm.2021.107962] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2020] [Revised: 05/03/2021] [Accepted: 06/02/2021] [Indexed: 01/29/2023]
Abstract
Ubiquitination is a common and reversible post-translational protein modification that regulates apoptosis and plays an important role in protein degradation and cell diseases. However, experimental identification of protein ubiquitination sites is usually time-consuming and labor-intensive, so it is necessary to establish effective predictors. In this study, we propose a ubiquitination sites prediction method based on multi-view features, namely UbiSite-XGBoost. Firstly, we use seven single-view features encoding methods to convert protein sequence fragments into digital information. Secondly, the least absolute shrinkage and selection operator (LASSO) is applied to remove the redundant information and get the optimal feature subsets. Finally, these features are inputted into the eXtreme gradient boosting (XGBoost) classifier to predict ubiquitination sites. Five-fold cross-validation shows that the AUC values of Set1-Set6 datasets are 0.8258, 0.7592, 0.7853, 0.8345, 0.8979 and 0.8901, respectively. The synthetic minority oversampling technique (SMOTE) is employed in Set4-Set6 unbalanced datasets, and the AUC values are 0.9777, 0.9782 and 0.9860, respectively. In addition, we have constructed three independent test datasets which the AUC values are 0.8007, 0.6897 and 0.7280, respectively. The results show that the proposed method UbiSite-XGBoost is superior to other ubiquitination prediction methods and it provides new guidance for the identification of ubiquitination sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/UbiSite-XGBoost/.
Collapse
Affiliation(s)
- Yushuang Liu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Shuping Jin
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Lili Song
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Yu Han
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China.
| |
Collapse
|
5
|
Zervou MA, Doutsi E, Pavlidis P, Tsakalides P. Structural classification of proteins based on the computationally efficient recurrence quantification analysis and horizontal visibility graphs. Bioinformatics 2021; 37:1796-1804. [PMID: 34048559 DOI: 10.1093/bioinformatics/btab407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Revised: 04/13/2021] [Accepted: 05/27/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Protein structural class prediction is one of the most significant problems in bioinformatics, as it has a prominent role in understanding the function and evolution of proteins. Designing a computationally efficient but at the same time accurate prediction method remains a pressing issue, especially for sequences that we cannot obtain a sufficient amount of homologous information from existing protein sequence databases. Several studies demonstrate the potential of utilizing chaos game representation (CGR) along with time series analysis tools such as recurrence quantification analysis (RQA), complex networks, horizontal visibility graphs (HVG) and others. However, the majority of existing works involve a large amount of features and they require an exhaustive, time consuming search of the optimal parameters. To address the aforementioned problems, this work adopts the generalized multidimensional recurrence quantification analysis (GmdRQA) as an efficient tool that enables to process concurrently a multidimensional time series and reduce the number of features. In addition, two data-driven algorithms, namely average mutual information (AMI) and false nearest neighbors (FNN), are utilized to define in a fast yet precise manner the optimal GmdRQA parameters. RESULTS The classification accuracy is improved by the combination of GmdRQA with the HVG. Experimental evaluation on a real benchmark dataset demonstrates that our methods achieve similar performance with the state-of-the-art but with a smaller computational cost. AVAILABILITY The code to reproduce all the results is available at https://github.com/aretiz/protein_structure_classification/tree/main. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Michaela Areti Zervou
- Department of Computer Science, University of Crete, Heraklion, 700 13, Greece.,Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 700 13, Greece
| | - Effrosyni Doutsi
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 700 13, Greece
| | - Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 700 13, Greece
| | - Panagiotis Tsakalides
- Department of Computer Science, University of Crete, Heraklion, 700 13, Greece.,Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 700 13, Greece
| |
Collapse
|
6
|
Recent Advances in the Prediction of Protein Structural Classes: Feature Descriptors and Machine Learning Algorithms. CRYSTALS 2021. [DOI: 10.3390/cryst11040324] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
In the postgenomic age, rapid growth in the number of sequence-known proteins has been accompanied by much slower growth in the number of structure-known proteins (as a result of experimental limitations), and a widening gap between the two is evident. Because protein function is linked to protein structure, successful prediction of protein structure is of significant importance in protein function identification. Foreknowledge of protein structural class can help improve protein structure prediction with significant medical and pharmaceutical implications. Thus, a fast, suitable, reliable, and reasonable computational method for protein structural class prediction has become pivotal in bioinformatics. Here, we review recent efforts in protein structural class prediction from protein sequence, with particular attention paid to new feature descriptors, which extract information from protein sequence, and the use of machine learning algorithms in both feature selection and the construction of new classification models. These new feature descriptors include amino acid composition, sequence order, physicochemical properties, multiprofile Bayes, and secondary structure-based features. Machine learning methods, such as artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), random forest, deep learning, and examples of their application are discussed in detail. We also present our view on possible future directions, challenges, and opportunities for the applications of machine learning algorithms for prediction of protein structural classes.
Collapse
|
7
|
Emami N, Ferdousi R. AptaNet as a deep learning approach for aptamer-protein interaction prediction. Sci Rep 2021; 11:6074. [PMID: 33727685 PMCID: PMC7971039 DOI: 10.1038/s41598-021-85629-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2020] [Accepted: 03/03/2021] [Indexed: 02/08/2023] Open
Abstract
Aptamers are short oligonucleotides (DNA/RNA) or peptide molecules that can selectively bind to their specific targets with high specificity and affinity. As a powerful new class of amino acid ligands, aptamers have high potentials in biosensing, therapeutic, and diagnostic fields. Here, we present AptaNet-a new deep neural network-to predict the aptamer-protein interaction pairs by integrating features derived from both aptamers and the target proteins. Aptamers were encoded by using two different strategies, including k-mer and reverse complement k-mer frequency. Amino acid composition (AAC) and pseudo amino acid composition (PseAAC) were applied to represent target information using 24 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied a neighborhood cleaning algorithm. The predictor was constructed based on a deep neural network, and optimal features were selected using the random forest algorithm. As a result, 99.79% accuracy was achieved for the training dataset, and 91.38% accuracy was obtained for the testing dataset. AptaNet achieved high performance on our constructed aptamer-protein benchmark dataset. The results indicate that AptaNet can help identify novel aptamer-protein interacting pairs and build more-efficient insights into the relationship between aptamers and proteins. Our benchmark dataset and the source codes for AptaNet are available in: https://github.com/nedaemami/AptaNet .
Collapse
Affiliation(s)
- Neda Emami
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran.
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
| |
Collapse
|
8
|
Abdennaji I, Zaied M, Girault JM. Prediction of protein structural class based on symmetrical recurrence quantification analysis. Comput Biol Chem 2021; 92:107450. [PMID: 33631460 DOI: 10.1016/j.compbiolchem.2021.107450] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2020] [Accepted: 02/03/2021] [Indexed: 11/19/2022]
Abstract
Protein structural class prediction for low similarity sequences is a significant challenge and one of the deeply explored subjects. This plays an important role in drug design, folding recognition of protein, functional analysis and several other biology applications. In this paper, we worked with two benchmark databases existing in the literature (1) 25PDB and (2) 1189 to apply our proposed method for predicting protein structural class. Initially, we transformed protein sequences into DNA sequences and then into binary sequences. Furthermore, we applied symmetrical recurrence quantification analysis (the new approach), where we got 8 features from each symmetry plot computation. Moreover, the machine learning algorithms such as Linear Discriminant Analysis (LDA), Random Forest (RF) and Support Vector Machine (SVM) are used. In addition, comparison was made to find the best classifier for protein structural class prediction. Results show that symmetrical recurrence quantification as feature extraction method with RF classifier outperformed existing methods with an overall accuracy of 100% without overfitting.
Collapse
Affiliation(s)
- Ines Abdennaji
- Research Team in Intelligent Machines, National School of Engineers of Gabes, B.P. W, 6072 Gabes, Tunisia; GSII ESEO - LAUM UMR CNRS 6613, 49000 Angers, France.
| | - Mourad Zaied
- Research Team in Intelligent Machines, National School of Engineers of Gabes, B.P. W, 6072 Gabes, Tunisia; GSII ESEO - LAUM UMR CNRS 6613, 49000 Angers, France
| | - Jean-Marc Girault
- Research Team in Intelligent Machines, National School of Engineers of Gabes, B.P. W, 6072 Gabes, Tunisia; GSII ESEO - LAUM UMR CNRS 6613, 49000 Angers, France
| |
Collapse
|
9
|
Li J, Wang B, He Y, Wen L, Nan H, Zheng F, Liu H, Lu S, Wu M, Zhang H. A review of the interaction between anthocyanins and proteins. FOOD SCI TECHNOL INT 2020; 27:470-482. [PMID: 33059464 DOI: 10.1177/1082013220962613] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Anthocyanins have good physiological functions, but they are unstable. The interaction between anthocyanins and proteins can improve the stability, nutritional and functional properties of the complex. This paper reviews the structural changes of complex of anthocyanins interacting with proteins from different sources. By circular dichroism (CD) spectroscopy, it was found that the contents of α-helix (from 15.90%-42.40% to 17.60%-52.80%) or β-sheet (from 29.00%-50.00% to 29.40%-57.00%) of the anthocyanins-proteins complex increased. Fourier transform infrared spectroscopy showed that the regions of amide I (from 1627.87-1641.41 cm-1 to 1643.34-1651.02 cm-1) and amide II (from 1537.00-1540.25 cm-1 to 1539.00-1543.75 cm-1) of anthocyanins-proteins complex were shifted. Fluorescence spectroscopy showed that the fluorescence intensity of the complex decreased from 150-5100 to 40-3900 a.u. The thermodynamic analysis showed that there were hydrophobic interactions, electrostatic and hydrogen bonding interactions between anthocyanins and proteins. The kinetic analysis showed that the half-life and activation energy of the complex increased. The stability, antioxidant, digestion, absorption, and emulsification of the complex were improved. This provides a reference for the study and application of anthocyanins and proteins interactions.
Collapse
Affiliation(s)
- Jia Li
- College of Food Science and Engineering, Jilin Agricultural University, Changchun, China
| | - Bixiang Wang
- College of Food Science and Engineering, Jilin Agricultural University, Changchun, China
| | - Yang He
- College of Food Science and Engineering, Jilin Agricultural University, Changchun, China
| | - Liankui Wen
- College of Food Science and Engineering, Jilin Agricultural University, Changchun, China
| | - Hailong Nan
- Vitis amurensis Rupr, Industry Service Center of Liuhe County, Tonghua, China
| | - Fei Zheng
- Jilin Ginseng Academy, Changchun University of Chinese Medicine, Changchun, China
| | - He Liu
- College of Food Science and Engineering, Jilin Agricultural University, Changchun, China
| | - Siyan Lu
- College of Food Science and Engineering, Jilin Agricultural University, Changchun, China
| | - Manyu Wu
- College of Food Science and Engineering, Jilin Agricultural University, Changchun, China
| | - Haoran Zhang
- College of Food Science and Engineering, Jilin Agricultural University, Changchun, China
| |
Collapse
|
10
|
Chen C, Zhang Q, Yu B, Yu Z, Lawrence PJ, Ma Q, Zhang Y. Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier. Comput Biol Med 2020; 123:103899. [DOI: 10.1016/j.compbiomed.2020.103899] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2020] [Revised: 06/28/2020] [Accepted: 06/28/2020] [Indexed: 10/23/2022]
|
11
|
Abstract
During the last three decades or so, many efforts have been made to study the protein cleavage
sites by some disease-causing enzyme, such as HIV (Human Immunodeficiency Virus) protease
and SARS (Severe Acute Respiratory Syndrome) coronavirus main proteinase. It has become increasingly
clear <i>via</i> this mini-review that the motivation driving the aforementioned studies is quite wise,
and that the results acquired through these studies are very rewarding, particularly for developing peptide
drugs.
Collapse
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
Collapse
|
12
|
|
13
|
Wang M, Cui X, Yu B, Chen C, Ma Q, Zhou H. SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-04792-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
|
14
|
Apurva M, Mazumdar H. Predicting structural class for protein sequences of 40% identity based on features of primary and secondary structure using Random Forest algorithm. Comput Biol Chem 2020; 84:107164. [DOI: 10.1016/j.compbiolchem.2019.107164] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2019] [Revised: 10/25/2019] [Accepted: 11/10/2019] [Indexed: 02/08/2023]
|
15
|
Prediction of Extracellular Matrix Proteins by Fusing Multiple Feature Information, Elastic Net, and Random Forest Algorithm. MATHEMATICS 2020. [DOI: 10.3390/math8020169] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
Extracellular matrix (ECM) proteins play an important role in a series of biological processes of cells. The study of ECM proteins is helpful to further comprehend their biological functions. We propose ECMP-RF (extracellular matrix proteins prediction by random forest) to predict ECM proteins. Firstly, the features of the protein sequence are extracted by combining encoding based on grouped weight, pseudo amino-acid composition, pseudo position-specific scoring matrix, a local descriptor, and an autocorrelation descriptor. Secondly, the synthetic minority oversampling technique (SMOTE) algorithm is employed to process the class imbalance data, and the elastic net (EN) is used to reduce the dimension of the feature vectors. Finally, the random forest (RF) classifier is used to predict the ECM proteins. Leave-one-out cross-validation shows that the balanced accuracy of the training and testing datasets is 97.3% and 97.9%, respectively. Compared with other state-of-the-art methods, ECMP-RF is significantly better than other predictors.
Collapse
|
16
|
Jiang Z, Wang D, Wu P, Chen Y, Shang H, Wang L, Xie H. Predicting subcellular localization of multisite proteins using differently weighted multi-label k-nearest neighbors sets. Technol Health Care 2020; 27:185-193. [PMID: 31045538 PMCID: PMC6598103 DOI: 10.3233/thc-199018] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
BACKGROUND: For a protein to execute its function, ensuring its correct subcellular localization is essential. In addition to biological experiments, bioinformatics is widely used to predict and determine the subcellular localization of proteins. However, single-feature extraction methods cannot effectively handle the huge amount of data and multisite localization of proteins. Thus, we developed a pseudo amino acid composition (PseAAC) method and an entropy density technique to extract feature fusion information from subcellular multisite proteins. OBJECTIVE: Predicting multiplex protein subcellular localization and achieve high prediction accuracy. METHOD: To improve the efficiency of predicting multiplex protein subcellular localization, we used the multi-label k-nearest neighbors algorithm and assigned different weights to various attributes. The method was evaluated using several performance metrics with a dataset consisting of protein sequences with single-site and multisite subcellular localizations. RESULTS: Evaluation experiments showed that the proposed method significantly improves the optimal overall accuracy rate of multiplex protein subcellular localization. CONCLUSION: This method can help to more comprehensively predict protein subcellular localization toward better understanding protein function, thereby bridging the gap between theory and application toward improved identification and monitoring of drug targets.
Collapse
Affiliation(s)
- Zhongting Jiang
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
| | - Dong Wang
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China.,CAS Key Laboratory of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu, China.,Key Laboratory of Medicinal Plant and Animal Resources of Qinghai-Tibet Plateau in Qinghai Province, Qinghai Normal University, Xining, Qinghai, China
| | - Peng Wu
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
| | - Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
| | - Huijie Shang
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
| | - Luyao Wang
- School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
| | - Huichun Xie
- Key Laboratory of Medicinal Plant and Animal Resources of Qinghai-Tibet Plateau in Qinghai Province, Qinghai Normal University, Xining, Qinghai, China
| |
Collapse
|
17
|
Some illuminating remarks on molecular genetics and genomics as well as drug development. Mol Genet Genomics 2020; 295:261-274. [PMID: 31894399 DOI: 10.1007/s00438-019-01634-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023]
Abstract
Facing the explosive growth of biological sequences unearthed in the post-genomic age, one of the most important but also most difficult problems in computational biology is how to express a biological sequence with a discrete model or a vector, but still keep it with considerable sequence-order information or its special pattern. To deal with such a challenging problem, the ideas of "pseudo amino acid components" and "pseudo K-tuple nucleotide composition" have been proposed. The ideas and their approaches have further stimulated the birth for "distorted key theory", "wenxing diagram", and substantially strengthening the power in treating the multi-label systems, as well as the establishment of the famous "5-steps rule". All these logic developments are quite natural that are very useful not only for theoretical scientists but also for experimental scientists in conducting genetics/genomics analysis and drug development. Presented in this review paper are also their future perspectives; i.e., their impacts will become even more significant and propounding.
Collapse
|
18
|
Shao YT, Liu XX, Lu Z, Chou KC. pLoc_Deep-mHum: Predict Subcellular Localization of Human Proteins by Deep Learning. ACTA ACUST UNITED AC 2020. [DOI: 10.4236/ns.2020.127042] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
19
|
Shao Y, Chou KC. pLoc_Deep-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by Deep Learning. ACTA ACUST UNITED AC 2020. [DOI: 10.4236/ns.2020.126034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
20
|
Wang S, Wang X. Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion. BMC Bioinformatics 2019; 20:701. [PMID: 31874617 PMCID: PMC6929547 DOI: 10.1186/s12859-019-3276-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Protein structural class predicting is a heavily researched subject in bioinformatics that plays a vital role in protein functional analysis, protein folding recognition, rational drug design and other related fields. However, when traditional feature expression methods are adopted, the features usually contain considerable redundant information, which leads to a very low recognition rate of protein structural classes. RESULTS We constructed a prediction model based on wavelet denoising using different feature expression methods. A new fusion idea, first fuse and then denoise, is proposed in this article. Two types of pseudo amino acid compositions are utilized to distill feature vectors. Then, a two-dimensional (2-D) wavelet denoising algorithm is used to remove the redundant information from two extracted feature vectors. The two feature vectors based on parallel 2-D wavelet denoising are fused, which is known as PWD-FU-PseAAC. The related source codes are available at https://github.com/Xiaoheng-Wang12/Wang-xiaoheng/tree/master. CONCLUSIONS Experimental verification of three low-similarity datasets suggests that the proposed model achieves notably good results as regarding the prediction of protein structural classes.
Collapse
Affiliation(s)
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China.
| | - Xiaoheng Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
| |
Collapse
|
21
|
Chou KC. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis. Curr Top Med Chem 2019; 19:2283-2300. [DOI: 10.2174/1568026619666191018100141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Revised: 08/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023]
Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progresses in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for an in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one is for inspecting the global prediction quality, while the other for the local prediction quality. All the predictors covered in this review have a userfriendly web-server, through which the majority of experimental scientists can easily obtain their desired data without the need to go through the complicated mathematics.
Collapse
Affiliation(s)
- Kuo-Chen Chou
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
| |
Collapse
|
22
|
Chou KC. Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs. Curr Med Chem 2019; 26:4918-4943. [PMID: 31060481 DOI: 10.2174/0929867326666190507082559] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 12/16/2022]
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most
of the functions critical to the cell’s survival are performed by these proteins located in its different
organelles, usually called ‘‘subcellular locations”. Information of subcellular localization
for a protein can provide useful clues about its function. To reveal the intricate pathways at the
cellular level, knowledge of the subcellular localization of proteins in a cell is prerequisite.
Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine
the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing
and selecting the right targets for drug development. Unfortunately, it is both timeconsuming
and costly to determine the subcellular locations of proteins purely based on experiments.
With the avalanche of protein sequences generated in the post-genomic age, it is highly
desired to develop computational methods for rapidly and effectively identifying the subcellular
locations of uncharacterized proteins based on their sequences information alone. Actually,
considerable progresses have been achieved in this regard. This review is focused on those
methods, which have the capacity to deal with multi-label proteins that may simultaneously
exist in two or more subcellular location sites. Protein molecules with this kind of characteristic
are vitally important for finding multi-target drugs, a current hot trend in drug development.
Focused in this review are also those methods that have use-friendly web-servers established so
that the majority of experimental scientists can use them to get the desired results without the
need to go through the detailed mathematics involved.
Collapse
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
Collapse
|
23
|
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most
of the functions critical to the cell’s survival are performed by these proteins located in its different
organelles, usually called ‘‘subcellular locations”. Information of subcellular localization
for a protein can provide useful clues about its function. To reveal the intricate pathways at the
cellular level, knowledge of the subcellular localization of proteins in a cell is prerequisite.
Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine
the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing
and selecting the right targets for drug development. Unfortunately, it is both timeconsuming
and costly to determine the subcellular locations of proteins purely based on experiments.
With the avalanche of protein sequences generated in the post-genomic age, it is highly
desired to develop computational methods for rapidly and effectively identifying the subcellular
locations of uncharacterized proteins based on their sequences information alone. Actually,
considerable progresses have been achieved in this regard. This review is focused on those
methods, which have the capacity to deal with multi-label proteins that may simultaneously
exist in two or more subcellular location sites. Protein molecules with this kind of characteristic
are vitally important for finding multi-target drugs, a current hot trend in drug development.
Focused in this review are also those methods that have use-friendly web-servers established so
that the majority of experimental scientists can use them to get the desired results without the
need to go through the detailed mathematics involved.
Collapse
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
Collapse
|
24
|
Chou KC. Proposing Pseudo Amino Acid Components is an Important Milestone for Proteome and Genome Analyses. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09910-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
|
25
|
|
26
|
Lin J, Chen H, Li S, Liu Y, Li X, Yu B. Accurate prediction of potential druggable proteins based on genetic algorithm and Bagging-SVM ensemble classifier. Artif Intell Med 2019; 98:35-47. [DOI: 10.1016/j.artmed.2019.07.005] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2018] [Revised: 03/03/2019] [Accepted: 07/18/2019] [Indexed: 12/14/2022]
|
27
|
Tian B, Wu X, Chen C, Qiu W, Ma Q, Yu B. Predicting protein–protein interactions by fusing various Chou's pseudo components and using wavelet denoising approach. J Theor Biol 2019; 462:329-346. [DOI: 10.1016/j.jtbi.2018.11.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2018] [Revised: 11/08/2018] [Accepted: 11/15/2018] [Indexed: 12/26/2022]
|
28
|
Contreras-Torres E. Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou's PseAAC. J Theor Biol 2018; 454:139-145. [DOI: 10.1016/j.jtbi.2018.05.033] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2018] [Revised: 05/23/2018] [Accepted: 05/28/2018] [Indexed: 11/24/2022]
|
29
|
Sudha P, Ramyachitra D, Manikandan P. Enhanced Artificial Neural Network for Protein Fold Recognition and Structural Class Prediction. GENE REPORTS 2018. [DOI: 10.1016/j.genrep.2018.07.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
30
|
Predicting protein submitochondrial locations by incorporating the pseudo-position specific scoring matrix into the general Chou's pseudo-amino acid composition. J Theor Biol 2018; 450:86-103. [DOI: 10.1016/j.jtbi.2018.04.026] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2018] [Revised: 04/10/2018] [Accepted: 04/16/2018] [Indexed: 01/16/2023]
|
31
|
Yu B, Li S, Qiu W, Wang M, Du J, Zhang Y, Chen X. Prediction of subcellular location of apoptosis proteins by incorporating PsePSSM and DCCA coefficient based on LFDA dimensionality reduction. BMC Genomics 2018; 19:478. [PMID: 29914358 PMCID: PMC6006758 DOI: 10.1186/s12864-018-4849-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2017] [Accepted: 06/01/2018] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Apoptosis is associated with some human diseases, including cancer, autoimmune disease, neurodegenerative disease and ischemic damage, etc. Apoptosis proteins subcellular localization information is very important for understanding the mechanism of programmed cell death and the development of drugs. Therefore, the prediction of subcellular localization of apoptosis protein is still a challenging task. RESULTS In this paper, we propose a novel method for predicting apoptosis protein subcellular localization, called PsePSSM-DCCA-LFDA. Firstly, the protein sequences are extracted by combining pseudo-position specific scoring matrix (PsePSSM) and detrended cross-correlation analysis coefficient (DCCA coefficient), then the extracted feature information is reduced dimensionality by LFDA (local Fisher discriminant analysis). Finally, the optimal feature vectors are input to the SVM classifier to predict subcellular location of the apoptosis proteins. The overall prediction accuracy of 99.7, 99.6 and 100% are achieved respectively on the three benchmark datasets by the most rigorous jackknife test, which is better than other state-of-the-art methods. CONCLUSION The experimental results indicate that our method can significantly improve the prediction accuracy of subcellular localization of apoptosis proteins, which is quite high to be able to become a promising tool for further proteomics studies. The source code and all datasets are available at https://github.com/QUST-BSBRC/PsePSSM-DCCA-LFDA/ .
Collapse
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China. .,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China. .,School of Life Sciences, University of Science and Technology of China, Hefei, 230027, China.
| | - Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Wenying Qiu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Minghui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.,Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Junwei Du
- College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Yusen Zhang
- School of Mathematics and Statistics, Shandong University at Weihai, Weihai, 264209, China
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 21116, China
| |
Collapse
|
32
|
DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC. J Theor Biol 2018; 452:22-34. [PMID: 29753757 DOI: 10.1016/j.jtbi.2018.05.006] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2018] [Revised: 04/21/2018] [Accepted: 05/04/2018] [Indexed: 11/21/2022]
Abstract
A DNA-binding protein (DNA-BP) is a protein that can bind and interact with a DNA. Identification of DNA-BPs using experimental methods is expensive as well as time consuming. As such, fast and accurate computational methods are sought for predicting whether a protein can bind with a DNA or not. In this paper, we focus on building a new computational model to identify DNA-BPs in an efficient and accurate way. Our model extracts meaningful information directly from the protein sequences, without any dependence on functional domain or structural information. After feature extraction, we have employed Random Forest (RF) model to rank the features. Afterwards, we have used Recursive Feature Elimination (RFE) method to extract an optimal set of features and trained a prediction model using Support Vector Machine (SVM) with linear kernel. Our proposed method, named as DNA-binding Protein Prediction model using Chou's general PseAAC (DPP-PseAAC), demonstrates superior performance compared to the state-of-the-art predictors on standard benchmark dataset. DPP-PseAAC achieves accuracy values of 93.21%, 95.91% and 77.42% for 10-fold cross-validation test, jackknife test and independent test respectively. The source code of DPP-PseAAC, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/DNABinding. A publicly accessible web interface has also been established at: http://77.68.43.135:8080/DPP-PseAAC/.
Collapse
|
33
|
Feng P, Yang H, Ding H, Lin H, Chen W, Chou KC. iDNA6mA-PseKNC: Identifying DNA N 6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics 2018; 111:96-102. [PMID: 29360500 DOI: 10.1016/j.ygeno.2018.01.005] [Citation(s) in RCA: 193] [Impact Index Per Article: 27.6] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2017] [Revised: 12/24/2017] [Accepted: 01/07/2018] [Indexed: 11/29/2022]
Abstract
N6-methyladenine (6mA) is one kind of post-replication modification (PTM or PTRM) occurring in a wide range of DNA sequences. Accurate identification of its sites will be very helpful for revealing the biological functions of 6mA, but it is time-consuming and expensive to determine them by experiments alone. Unfortunately, so far, no bioinformatics tool is available to do so. To fill in such an empty area, we have proposed a novel predictor called iDNA6mA-PseKNC that is established by incorporating nucleotide physicochemical properties into Pseudo K-tuple Nucleotide Composition (PseKNC). It has been observed via rigorous cross-validations that the predictor's sensitivity (Sn), specificity (Sp), accuracy (Acc), and stability (MCC) are 93%, 100%, 96%, and 0.93, respectively. For the convenience of most experimental scientists, a user-friendly web server for iDNA6mA-PseKNC has been established at http://lin-group.cn/server/iDNA6mA-PseKNC, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Collapse
Affiliation(s)
- Pengmian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA.
| | - Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, Tangshan 063000, China; Gordon Life Science Institute, Boston, MA 02478, USA.
| | - Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA.
| |
Collapse
|
34
|
Accurate prediction of subcellular location of apoptosis proteins combining Chou's PseAAC and PsePSSM based on wavelet denoising. Oncotarget 2017; 8:107640-107665. [PMID: 29296195 PMCID: PMC5746097 DOI: 10.18632/oncotarget.22585] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2017] [Accepted: 10/30/2017] [Indexed: 02/05/2023] Open
Abstract
Apoptosis proteins subcellular localization information are very important for understanding the mechanism of programmed cell death and the development of drugs. The prediction of subcellular localization of an apoptosis protein is still a challenging task because the prediction of apoptosis proteins subcellular localization can help to understand their function and the role of metabolic processes. In this paper, we propose a novel method for protein subcellular localization prediction. Firstly, the features of the protein sequence are extracted by combining Chou's pseudo amino acid composition (PseAAC) and pseudo-position specific scoring matrix (PsePSSM), then the feature information of the extracted is denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to the SVM classifier to predict subcellular location of apoptosis proteins. Quite promising predictions are obtained using the jackknife test on three widely used datasets and compared with other state-of-the-art methods. The results indicate that the method proposed in this paper can remarkably improve the prediction accuracy of apoptosis protein subcellular localization, which will be a supplementary tool for future proteomics research.
Collapse
|
35
|
Xiao X, Cheng X, Su S, Mao Q, Chou KC. pLoc-mGpos: Incorporate Key Gene Ontology Information into General PseAAC for Predicting Subcellular Localization of Gram-Positive Bacterial Proteins. ACTA ACUST UNITED AC 2017. [DOI: 10.4236/ns.2017.99032] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|