1
|
Dewaker V, Morya VK, Kim YH, Park ST, Kim HS, Koh YH. Revolutionizing oncology: the role of Artificial Intelligence (AI) as an antibody design, and optimization tools. Biomark Res 2025; 13:52. [PMID: 40155973 PMCID: PMC11954232 DOI: 10.1186/s40364-025-00764-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2025] [Accepted: 03/13/2025] [Indexed: 04/01/2025] Open
Abstract
Antibodies play a crucial role in defending the human body against diseases, including life-threatening conditions like cancer. They mediate immune responses against foreign antigens and, in some cases, self-antigens. Over time, antibody-based technologies have evolved from monoclonal antibodies (mAbs) to chimeric antigen receptor T cells (CAR-T cells), significantly impacting biotechnology, diagnostics, and therapeutics. Although these advancements have enhanced therapeutic interventions, the integration of artificial intelligence (AI) is revolutionizing antibody design and optimization. This review explores recent AI advancements, including large language models (LLMs), diffusion models, and generative AI-based applications, which have transformed antibody discovery by accelerating de novo generation, enhancing immune response precision, and optimizing therapeutic efficacy. Through advanced data analysis, AI enables the prediction and design of antibody sequences, 3D structures, complementarity-determining regions (CDRs), paratopes, epitopes, and antigen-antibody interactions. These AI-powered innovations address longstanding challenges in antibody development, significantly improving speed, specificity, and accuracy in therapeutic design. By integrating computational advancements with biomedical applications, AI is driving next-generation cancer therapies, transforming precision medicine, and enhancing patient outcomes.
Collapse
Affiliation(s)
- Varun Dewaker
- Institute of New Frontier Research Team, Hallym University, Chuncheon-Si, Gangwon-Do, 24252, Republic of Korea
| | - Vivek Kumar Morya
- Department of Orthopedic Surgery, Hallym University Dongtan Sacred Hospital, Hwaseong-Si, 18450, Republic of Korea
| | - Yoo Hee Kim
- Department of Biomedical Gerontology, Ilsong Institute of Life Science, Hallym University, Seoul, 07247, Republic of Korea
| | - Sung Taek Park
- Institute of New Frontier Research Team, Hallym University, Chuncheon-Si, Gangwon-Do, 24252, Republic of Korea
- Department of Obstetrics and Gynecology, Kangnam Sacred-Heart Hospital, Hallym University Medical Center, Hallym University College of Medicine, Seoul, 07441, Republic of Korea
- EIONCELL Inc, Chuncheon-Si, 24252, Republic of Korea
| | - Hyeong Su Kim
- Institute of New Frontier Research Team, Hallym University, Chuncheon-Si, Gangwon-Do, 24252, Republic of Korea.
- Department of Internal Medicine, Division of Hemato-Oncology, Kangnam Sacred-Heart Hospital, Hallym University Medical Center, Hallym University College of Medicine, Seoul, 07441, Republic of Korea.
- EIONCELL Inc, Chuncheon-Si, 24252, Republic of Korea.
| | - Young Ho Koh
- Department of Biomedical Gerontology, Ilsong Institute of Life Science, Hallym University, Seoul, 07247, Republic of Korea.
| |
Collapse
|
2
|
Dong B, Liu Z, Xu D, Hou C, Dong G, Zhang T, Wang G. SERT-StructNet: Protein secondary structure prediction method based on multi-factor hybrid deep model. Comput Struct Biotechnol J 2024; 23:1364-1375. [PMID: 38596312 PMCID: PMC11001767 DOI: 10.1016/j.csbj.2024.03.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/11/2024] Open
Abstract
Protein secondary structure prediction (PSSP) is a pivotal research endeavour that plays a crucial role in the comprehensive elucidation of protein functions and properties. Current prediction methodologies are focused on deep-learning techniques, particularly focusing on multi-factor features. Diverging from existing approaches, in this study, we placed special emphasis on the effects of amino acid properties and protein secondary structure propensity scores (SSPs) on secondary structure during the meticulous selection of multi-factor features. This differential feature-selection strategy results in a distinctive and effective amalgamation of the sequence and property features. To harness these multi-factor features optimally, we introduced a hybrid deep feature extraction model. The model initially employs mechanisms such as dilated convolution (D-Conv) and a channel attention network (SENet) for local feature extraction and targeted channel enhancement. Subsequently, a combination of recurrent neural network variants (BiGRU and BiLSTM), along with a transformer module, was employed to achieve global bidirectional information consideration and feature enhancement. This approach to multi-factor feature input and multi-level feature processing enabled a comprehensive exploration of intricate associations among amino acid residues in protein sequences, yielding a Q 3 accuracy of 84.9% and an Sov score of 85.1%. The overall performance surpasses that of the comparable methods. This study introduces a novel and efficient method for determining the PSSP domain, which is poised to deepen our understanding of the practical applications of protein molecular structures.
Collapse
Affiliation(s)
- Benzhi Dong
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Zheng Liu
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Dali Xu
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Chang Hou
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Guanghui Dong
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Tianjiao Zhang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| |
Collapse
|
3
|
Yi W, Sun A, Liu M, Liu X, Zhang W, Dai Q. Comparative Study on Feature Selection in Protein Structure and Function Prediction. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:1650693. [PMID: 36267316 PMCID: PMC9578875 DOI: 10.1155/2022/1650693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Accepted: 09/14/2022] [Indexed: 11/18/2022]
Abstract
Many effective methods extract and fuse different protein features to study the relationship between protein sequence, structure, and function, but different methods have preferences in solving the research of protein structure and function, which requires selecting valuable and contributing features to design more effective prediction methods. This work mainly focused on the feature selection methods in the study of protein structure and function, and systematically compared and analyzed the efficiency of different feature selection methods in the prediction of protein structures, protein disorders, protein molecular chaperones, and protein solubility. The results show that the feature selection method based on nonlinear SVM performs best in protein structure prediction, protein solubility prediction, protein molecular chaperone prediction, and protein solubility prediction. After selection, the accuracy of features is improved by 13.16% ~71%, especially the Kmer features and PSSM features of proteins.
Collapse
Affiliation(s)
- Wenjing Yi
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Ao Sun
- College of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Manman Liu
- College of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Wei Zhang
- College of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
| |
Collapse
|
4
|
Wang Y, Xu Y, Yang Z, Liu X, Dai Q. Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5529389. [PMID: 34055035 PMCID: PMC8123985 DOI: 10.1155/2021/5529389] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/07/2021] [Accepted: 04/28/2021] [Indexed: 11/20/2022]
Abstract
Many combinations of protein features are used to improve protein structural class prediction, but the information redundancy is often ignored. In order to select the important features with strong classification ability, we proposed a recursive feature selection with random forest to improve protein structural class prediction. We evaluated the proposed method with four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction. Only less than 5% features are used, but the prediction accuracy is improved by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
Collapse
Affiliation(s)
- Yaoxin Wang
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Yingjie Xu
- Qixin School, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Zhenyu Yang
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
| |
Collapse
|
5
|
Abdennaji I, Zaied M, Girault JM. Prediction of protein structural class based on symmetrical recurrence quantification analysis. Comput Biol Chem 2021; 92:107450. [PMID: 33631460 DOI: 10.1016/j.compbiolchem.2021.107450] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2020] [Accepted: 02/03/2021] [Indexed: 11/19/2022]
Abstract
Protein structural class prediction for low similarity sequences is a significant challenge and one of the deeply explored subjects. This plays an important role in drug design, folding recognition of protein, functional analysis and several other biology applications. In this paper, we worked with two benchmark databases existing in the literature (1) 25PDB and (2) 1189 to apply our proposed method for predicting protein structural class. Initially, we transformed protein sequences into DNA sequences and then into binary sequences. Furthermore, we applied symmetrical recurrence quantification analysis (the new approach), where we got 8 features from each symmetry plot computation. Moreover, the machine learning algorithms such as Linear Discriminant Analysis (LDA), Random Forest (RF) and Support Vector Machine (SVM) are used. In addition, comparison was made to find the best classifier for protein structural class prediction. Results show that symmetrical recurrence quantification as feature extraction method with RF classifier outperformed existing methods with an overall accuracy of 100% without overfitting.
Collapse
Affiliation(s)
- Ines Abdennaji
- Research Team in Intelligent Machines, National School of Engineers of Gabes, B.P. W, 6072 Gabes, Tunisia; GSII ESEO - LAUM UMR CNRS 6613, 49000 Angers, France.
| | - Mourad Zaied
- Research Team in Intelligent Machines, National School of Engineers of Gabes, B.P. W, 6072 Gabes, Tunisia; GSII ESEO - LAUM UMR CNRS 6613, 49000 Angers, France
| | - Jean-Marc Girault
- Research Team in Intelligent Machines, National School of Engineers of Gabes, B.P. W, 6072 Gabes, Tunisia; GSII ESEO - LAUM UMR CNRS 6613, 49000 Angers, France
| |
Collapse
|
6
|
Wang S, Wang X. Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion. BMC Bioinformatics 2019; 20:701. [PMID: 31874617 PMCID: PMC6929547 DOI: 10.1186/s12859-019-3276-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Protein structural class predicting is a heavily researched subject in bioinformatics that plays a vital role in protein functional analysis, protein folding recognition, rational drug design and other related fields. However, when traditional feature expression methods are adopted, the features usually contain considerable redundant information, which leads to a very low recognition rate of protein structural classes. RESULTS We constructed a prediction model based on wavelet denoising using different feature expression methods. A new fusion idea, first fuse and then denoise, is proposed in this article. Two types of pseudo amino acid compositions are utilized to distill feature vectors. Then, a two-dimensional (2-D) wavelet denoising algorithm is used to remove the redundant information from two extracted feature vectors. The two feature vectors based on parallel 2-D wavelet denoising are fused, which is known as PWD-FU-PseAAC. The related source codes are available at https://github.com/Xiaoheng-Wang12/Wang-xiaoheng/tree/master. CONCLUSIONS Experimental verification of three low-similarity datasets suggests that the proposed model achieves notably good results as regarding the prediction of protein structural classes.
Collapse
Affiliation(s)
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China.
| | - Xiaoheng Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
| |
Collapse
|
7
|
Xi B, Tao J, Liu X, Xu X, He P, Dai Q. RaaMLab: A MATLAB toolbox that generates amino acid groups and reduced amino acid modes. Biosystems 2019; 180:38-45. [PMID: 30904554 DOI: 10.1016/j.biosystems.2019.03.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2018] [Revised: 12/25/2018] [Accepted: 03/06/2019] [Indexed: 01/31/2023]
Abstract
Amino acid (AA) classification and its different biophysical and chemical characteristics have been widely applied to analyze and predict the structural, functional, expression and interaction profiles of proteins and peptides. We present RaaMLab, a free and open-source MATLAB toolbox, to facilitate studies on proteins and peptides, to generate AA groups and to extract the structural and physicochemical features of reduced AAs (RedAA). This toolbox offers 4 kinds of databases, including the physicochemical properties of AAs and their groupings, 49 AA classification methods and 5 types of biophysicochemical features of RedAAs. These factors can be easily computed based on user-defined alphabet size and AA properties of AA groupings. RaaMLab is an open source freely available at https://github.com/bioinfo0706/RaaMLab. This website also contains a tutorial, extensive documentation and examples.
Collapse
Affiliation(s)
- Baohang Xi
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
| | - Jin Tao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
| | - Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
| | - Xinnan Xu
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
| | - Pingan He
- College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
| | - Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China.
| |
Collapse
|
8
|
A novel feature selection method to predict protein structural class. Comput Biol Chem 2018; 76:118-129. [DOI: 10.1016/j.compbiolchem.2018.06.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2018] [Revised: 05/14/2018] [Accepted: 06/30/2018] [Indexed: 01/05/2023]
|
9
|
Development of Easily Accessible Electricity Consumption Model Using Open Data and GA-SVR. ENERGIES 2018. [DOI: 10.3390/en11020373] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
10
|
Yu B, Lou L, Li S, Zhang Y, Qiu W, Wu X, Wang M, Tian B. Prediction of protein structural class for low-similarity sequences using Chou’s pseudo amino acid composition and wavelet denoising. J Mol Graph Model 2017; 76:260-273. [DOI: 10.1016/j.jmgm.2017.07.012] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2017] [Revised: 07/11/2017] [Accepted: 07/12/2017] [Indexed: 11/25/2022]
|
11
|
|
12
|
Kong L, Kong L, Jing R. Improving the Prediction of Protein Structural Class for Low-Similarity Sequences by Incorporating Evolutionaryand Structural Information. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2016. [DOI: 10.20965/jaciii.2016.p0402] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Protein structural class prediction is beneficial to study protein function, regulation and interactions. However, protein structural class prediction for low-similarity sequences (i.e., below 40% in pairwise sequence similarity) remains a challenging problem at present. In this study, a novel computational method is proposed to accurately predict protein structural class for low-similarity sequences. This method is based on support vector machine in conjunction with integrated features from evolutionary information generated with position specific iterative basic local alignment search tool (PSI-BLAST) and predicted secondary structure. Various prediction accuracies evaluated by the jackknife tests are reported on two widely-used low-similarity benchmark datasets (25PDB and 1189), reaching overall accuracies 89.3% and 87.9%, which are significantly higher than those achieved by state-of-the-art in protein structural class prediction. The experimental results suggest that our method could serve as an effective alternative to existing methods in protein structural classification, especially for low-similarity sequences.
Collapse
|
13
|
Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015; 2015:370756. [PMID: 26788119 PMCID: PMC4693000 DOI: 10.1155/2015/370756] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/31/2015] [Revised: 11/19/2015] [Accepted: 12/01/2015] [Indexed: 11/17/2022]
Abstract
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
Collapse
|
14
|
Li X, Liu T, Tao P, Wang C, Chen L. A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination. Comput Biol Chem 2015; 59 Pt A:95-100. [DOI: 10.1016/j.compbiolchem.2015.08.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2014] [Revised: 08/30/2015] [Accepted: 08/30/2015] [Indexed: 12/11/2022]
|
15
|
Fan M, Zheng B, Li L. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure. J Bioinform Comput Biol 2015; 13:1550022. [DOI: 10.1142/s0219720015500225] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although a lot of efforts have been made, it still remains a challenging problem for prediction of protein structural class solely from protein sequences. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were taken to test and compare the proposed method using four benchmark datasets in low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better compared with the existing methods. The source code and dataset are available on request.
Collapse
Affiliation(s)
- Ming Fan
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, China
| | - Bin Zheng
- Hunan Mechanical and Electrical Polytechnic, Chang Sha 410151, China
| | - Lihua Li
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, China
| |
Collapse
|
16
|
Qin Y, Zheng X, Wang J, Chen M, Zhou C. Prediction of protein structural class based on Linear Predictive Coding of PSI-BLAST profiles. Open Life Sci 2015. [DOI: 10.1515/biol-2015-0055] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
AbstractKnowledge of protein structure plays a key role in the analysis of protein functions, protein binding, rational drug design, and many other related fields and applications. In this study, a novel feature extraction model based on linear predictive coding (LPC) and position-specific score matrices (PSSM) was proposed to predict structural class from protein sequences. First, the PSI-BLAST tool was employed to transform the original protein sequences into PSSMs. Then, the LPC, a signal processing tool, was applied to extract the features from PSSMs. The selected features were finally fed to a support vector machine to perform the prediction. Cross-validation tests on the four benchmark datasets Z277, Z498, 1189 and 25PDB, showed a significant leap in overall accuracy using the proposed method. Compared to existing methods, our method achieved better performance in prediction of protein structural class.
Collapse
|
17
|
Wang J, Wang C, Cao J, Liu X, Yao Y, Dai Q. Prediction of protein structural classes for low-similarity sequences using reduced PSSM and position-based secondary structural features. Gene 2015; 554:241-8. [DOI: 10.1016/j.gene.2014.10.037] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2014] [Revised: 10/19/2014] [Accepted: 10/22/2014] [Indexed: 10/24/2022]
|
18
|
Hayat M, Iqbal N. Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2014; 116:184-192. [PMID: 24997484 DOI: 10.1016/j.cmpb.2014.06.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/13/2014] [Revised: 06/09/2014] [Accepted: 06/13/2014] [Indexed: 06/03/2023]
Abstract
Proteins control all biological functions in living species. Protein structure is comprised of four major classes including all-α class, all-β class, α+β, and α/β. Each class performs different function according to their nature. Owing to the large exploration of protein sequences in the databanks, the identification of protein structure classes is difficult through conventional methods with respect to cost and time. Looking at the importance of protein structure classes, it is thus highly desirable to develop a computational model for discriminating protein structure classes with high accuracy. For this purpose, we propose a silco method by incorporating Pseudo Average Chemical Shift and Support Vector Machine. Two feature extraction schemes namely Pseudo Amino Acid Composition and Pseudo Average Chemical Shift are used to explore valuable information from protein sequences. The performance of the proposed model is assessed using four benchmark datasets 25PDB, 1189, 640 and 399 employing jackknife test. The success rates of the proposed model are 84.2%, 85.0%, 86.4%, and 89.2%, respectively on the four datasets. The empirical results reveal that the performance of our proposed model compared to existing models is promising in the literature so far and might be useful for future research.
Collapse
Affiliation(s)
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
| | - Nadeem Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| |
Collapse
|
19
|
Mohan A, Rao MD, Sunderrajan S, Pennathur G. Automatic classification of protein structures using physicochemical parameters. Interdiscip Sci 2014; 6:176-86. [PMID: 25205495 DOI: 10.1007/s12539-013-0199-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2013] [Revised: 11/12/2013] [Accepted: 12/05/2013] [Indexed: 11/26/2022]
Abstract
Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
Collapse
Affiliation(s)
- Abhilash Mohan
- The Center for Biotechnology, Anna University, Chennai, 600025, Tamilnadu, India
| | | | | | | |
Collapse
|
20
|
Ding S, Yan S, Qi S, Li Y, Yao Y. A protein structural classes prediction method based on PSI-BLAST profile. J Theor Biol 2014; 353:19-23. [DOI: 10.1016/j.jtbi.2014.02.034] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2013] [Revised: 01/27/2014] [Accepted: 02/24/2014] [Indexed: 11/27/2022]
|
21
|
Zhang L, Zhao X, Kong L. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition. J Theor Biol 2014; 355:105-10. [PMID: 24735902 DOI: 10.1016/j.jtbi.2014.04.008] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2013] [Revised: 02/26/2014] [Accepted: 04/04/2014] [Indexed: 10/25/2022]
Abstract
Knowledge of protein structural class plays an important role in characterizing the overall folding type of a given protein. At present, it is still a challenge to extract sequence information solely using protein sequence for protein structural class prediction with low similarity sequence in the current computational biology. In this study, a novel sequence representation method is proposed based on position specific scoring matrix for protein structural class prediction. By defined evolutionary difference formula, varying length proteins are expressed as uniform dimensional vectors, which can represent evolutionary difference information between the adjacent residues of a given protein. To perform and evaluate the proposed method, support vector machine and jackknife tests are employed on three widely used datasets, 25PDB, 1189 and 640 datasets with sequence similarity lower than 25%, 40% and 25%, respectively. Comparison of our results with the previous methods shows that our method may provide a promising method to predict protein structural class especially for low-similarity sequences.
Collapse
Affiliation(s)
- Lichao Zhang
- College of Marine Life Science, Ocean University of China, Yushan Road, Qingdao 266003, PR China
| | - Xiqiang Zhao
- College of Mathematical Science, Ocean University of China, Songling Road, Qingdao 266100, PR China.
| | - Liang Kong
- College of Mathematics and Information Technology, Hebei Normal University of Science and Technology, Qinhuangdao 066004, PR China
| |
Collapse
|
22
|
Kong L, Zhang L. Novel structure-driven features for accurate prediction of protein structural class. Genomics 2014; 103:292-7. [DOI: 10.1016/j.ygeno.2014.04.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2013] [Revised: 04/05/2014] [Accepted: 04/07/2014] [Indexed: 11/25/2022]
|
23
|
Wang J, Li Y, Liu X, Dai Q, Yao Y, He P. High-accuracy prediction of protein structural classes using PseAA structural properties and secondary structural patterns. Biochimie 2014; 101:104-12. [PMID: 24412731 DOI: 10.1016/j.biochi.2013.12.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2013] [Accepted: 12/30/2013] [Indexed: 10/25/2022]
Abstract
Since introduction of PseAAs and functional domains, promising results have been achieved in protein structural class predication, but some challenges still exist in the representation of the PseAA structural correlation and structural domains. This paper proposed a high-accuracy prediction method using novel PseAA structural properties and secondary structural patterns, reflecting the long-range and local structural properties of the PseAAs and certain compact structural domains. The proposed prediction method was tested against the competing prediction methods with four experiments. The experiment results indicate that the proposed method achieved the best performance. Its overall accuracies for datasets 25 PDB, D640, FC699 and 1189 are 88.8%, 90.9%, 96.4% and 87.4%, which are 4.5%, 7.6%, 2% and 3.9% higher than the existing best-performing method. This understanding can be used to guide development of more powerful methods for protein structural class prediction. The software and supplement material are freely available at http://bioinfo.zstu.edu.cn/PseAA-SSP.
Collapse
Affiliation(s)
- Junru Wang
- College of Mechanical Engineering and Automation, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
| | - Yan Li
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
| | - Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
| | - Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China.
| | - Yuhua Yao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
| | - Pingan He
- College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
| |
Collapse
|
24
|
Zhang S, Liang Y, Yuan X. Improving the prediction accuracy of protein structural class: Approached with alternating word frequency and normalized Lempel–Ziv complexity. J Theor Biol 2014; 341:71-7. [DOI: 10.1016/j.jtbi.2013.10.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2013] [Revised: 09/08/2013] [Accepted: 10/08/2013] [Indexed: 10/26/2022]
|
25
|
Hayat M, Tahir M, Khan SA. Prediction of protein structure classes using hybrid space of multi-profile Bayes and bi-gram probability feature spaces. J Theor Biol 2013; 346:8-15. [PMID: 24384128 DOI: 10.1016/j.jtbi.2013.12.015] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2013] [Revised: 10/30/2013] [Accepted: 12/12/2013] [Indexed: 11/28/2022]
Abstract
Proteins are the executants of biological functions in living organisms. Comprehension of protein structure is a challenging problem in the era of proteomics, computational biology, and bioinformatics because of its pivotal role in protein folding patterns. Owing to the large exploration of protein sequences in protein databanks and intricacy of protein structures, experimental and theoretical methods are insufficient for prediction of protein structure classes. Therefore, it is highly desirable to develop an accurate, reliable, and high throughput computational model to predict protein structure classes correctly from polygenetic sequences. In this regard, we propose a promising model employing hybrid descriptor space in conjunction with optimized evidence-theoretic K-nearest neighbor algorithm. Hybrid space is the composition of two descriptor spaces including Multi-profile Bayes and bi-gram probability. In order to enhance the generalization power of the classifier, we have selected high discriminative descriptors from the hybrid space using particle swarm optimization, a well-known evolutionary feature selection technique. Performance evaluation of the proposed model is performed using the jackknife test on three low similarity benchmark datasets including 25PDB, 1189, and 640. The success rates of the proposed model are 87.0%, 86.6%, and 88.4%, respectively on the three benchmark datasets. The comparative analysis exhibits that our proposed model has yielded promising results compared to the existing methods in the literature. In addition, our proposed prediction system might be helpful in future research particularly in cases where the major focus of research is on low similarity datasets.
Collapse
Affiliation(s)
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
| | - Muhammad Tahir
- Department of Computer Science, National University of Computer and Emerging Science, Peshawar, Pakistan
| | - Sher Afzal Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| |
Collapse
|
26
|
Using protein granularity to extract the protein sequence features. J Theor Biol 2013; 331:48-53. [DOI: 10.1016/j.jtbi.2013.04.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2012] [Revised: 04/16/2013] [Accepted: 04/18/2013] [Indexed: 11/21/2022]
|
27
|
Dai Q, Li Y, Liu X, Yao Y, Cao Y, He P. Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position. BMC Bioinformatics 2013; 14:152. [PMID: 23641706 PMCID: PMC3652764 DOI: 10.1186/1471-2105-14-152] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2012] [Accepted: 04/03/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many content-based statistical features of secondary structural elements (CBF-PSSEs) have been proposed and achieved promising results in protein structural class prediction, but until now position distribution of the successive occurrences of an element in predicted secondary structure sequences hasn't been used. It is necessary to extract some appropriate position-based features of the secondary structural elements for prediction task. RESULTS We proposed some position-based features of predicted secondary structural elements (PBF-PSSEs) and assessed their intrinsic ability relative to the available CBF-PSSEs, which not only offers a systematic and quantitative experimental assessment of these statistical features, but also naturally complements the available comparison of the CBF-PSSEs. We also analyzed the performance of the CBF-PSSEs combined with the PBF-PSSE and further constructed a new combined feature set, PBF11CBF-PSSE. Based on these experiments, novel valuable guidelines for the use of PBF-PSSEs and CBF-PSSEs were obtained. CONCLUSIONS PBF-PSSEs and CBF-PSSEs have a compelling impact on protein structural class prediction. When combining with the PBF-PSSE, most of the CBF-PSSEs get a great improvement over the prediction accuracies, so the PBF-PSSEs and the CBF-PSSEs have to work closely so as to make significant and complementary contributions to protein structural class prediction. Besides, the proposed PBF-PSSE's performance is extremely sensitive to the choice of parameter k. In summary, our quantitative analysis verifies that exploring the position information of predicted secondary structural elements is a promising way to improve the abilities of protein structural class prediction.
Collapse
Affiliation(s)
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| | - Yan Li
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| | - Xiaoqing Liu
- College of Science, Hangzhou Dianzi University, Hangzhou, 310018, China
| | - Yuhua Yao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| | - Yunjie Cao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| | - Pingan He
- College of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
| |
Collapse
|
28
|
Chen L, Lu J, Zhang J, Feng KR, Zheng MY, Cai YD. Predicting chemical toxicity effects based on chemical-chemical interactions. PLoS One 2013; 8:e56517. [PMID: 23457578 PMCID: PMC3574107 DOI: 10.1371/journal.pone.0056517] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2012] [Accepted: 01/10/2013] [Indexed: 12/02/2022] Open
Abstract
Toxicity is a major contributor to high attrition rates of new chemical entities in drug discoveries. In this study, an order-classifier was built to predict a series of toxic effects based on data concerning chemical-chemical interactions under the assumption that interactive compounds are more likely to share similar toxicity profiles. According to their interaction confidence scores, the order from the most likely toxicity to the least was obtained for each compound. Ten test groups, each of them containing one training dataset and one test dataset, were constructed from a benchmark dataset consisting of 17,233 compounds. By a Jackknife test on each of these test groups, the 1st order prediction accuracies of the training dataset and the test dataset were all approximately 79.50%, substantially higher than the rate of 25.43% achieved by random guesses. Encouraged by the promising results, we expect that our method will become a useful tool in screening out drugs with high toxicity.
Collapse
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Jing Lu
- Drug Discovery and Design Center (DDDC), Shanghai Institute of Materia Medica, Shanghai, China
| | - Jian Zhang
- Department of Ophthalmology, Shanghai First People’s Hospital Affiliated to Shanghai Jiaotong University, Shanghai, China
| | - Kai-Rui Feng
- Simcyp Limited, Blades Enterprise Centre, Sheffield, United Kingdom
| | - Ming-Yue Zheng
- Drug Discovery and Design Center (DDDC), Shanghai Institute of Materia Medica, Shanghai, China
- * E-mail: (MYZ); (YDC)
| | - Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, China
- * E-mail: (MYZ); (YDC)
| |
Collapse
|
29
|
|
30
|
Gao YF, Chen L, Cai YD, Feng KY, Huang T, Jiang Y. Predicting metabolic pathways of small molecules and enzymes based on interaction information of chemicals and proteins. PLoS One 2012; 7:e45944. [PMID: 23029334 PMCID: PMC3448724 DOI: 10.1371/journal.pone.0045944] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2012] [Accepted: 08/23/2012] [Indexed: 12/25/2022] Open
Abstract
Metabolic pathway analysis, one of the most important fields in biochemistry, is pivotal to understanding the maintenance and modulation of the functions of an organism. Good comprehension of metabolic pathways is critical to understanding the mechanisms of some fundamental biological processes. Given a small molecule or an enzyme, how may one identify the metabolic pathways in which it may participate? Answering such a question is a first important step in understanding a metabolic pathway system. By utilizing the information provided by chemical-chemical interactions, chemical-protein interactions, and protein-protein interactions, a novel method was proposed by which to allocate small molecules and enzymes to 11 major classes of metabolic pathways. A benchmark dataset consisting of 3,348 small molecules and 654 enzymes of yeast was constructed to test the method. It was observed that the first order prediction accuracy evaluated by the jackknife test was 79.56% in identifying the small molecules and enzymes in a benchmark dataset. Our method may become a useful vehicle in predicting the metabolic pathways of small molecules and enzymes, providing a basis for some further analysis of the pathway systems.
Collapse
Affiliation(s)
- Yu-Fei Gao
- Department of Surgery, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, China
| | - Kai-Yan Feng
- Beijing Genomics Institute, Shenzhen Beishan Industrial zone, Shenzhen, China
| | - Tao Huang
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Center for Bioinformation Technology, Shanghai, China
| | - Yang Jiang
- Department of Surgery, China-Japan Union Hospital of Jilin University, Changchun, China
| |
Collapse
|
31
|
Ahmadi Adl A, Nowzari-Dalini A, Xue B, Uversky VN, Qian X. Accurate prediction of protein structural classes using functional domains and predicted secondary structure sequences. J Biomol Struct Dyn 2012; 29:623-33. [DOI: 10.1080/07391102.2011.672626] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
32
|
Zhang S, Ye F, Yuan X. Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM. J Biomol Struct Dyn 2012; 29:634-42. [PMID: 22545994 DOI: 10.1080/07391102.2011.672627] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
33
|
Liu T, Geng X, Zheng X, Li R, Wang J. Accurate prediction of protein structural class using auto covariance transformation of PSI-BLAST profiles. Amino Acids 2011; 42:2243-9. [PMID: 21698456 DOI: 10.1007/s00726-011-0964-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2011] [Accepted: 06/11/2011] [Indexed: 02/07/2023]
Abstract
Computational prediction of protein structural class based solely on sequence data remains a challenging problem in protein science. Existing methods differ in the protein sequence representation models and prediction engines adopted. In this study, a powerful feature extraction method, which combines position-specific score matrix (PSSM) with auto covariance (AC) transformation, is introduced. Thus, a sample protein is represented by a series of discrete components, which could partially incorporate the long-range sequence order information and evolutionary information reflected from the PSI-BLAST profile. To verify the performance of our method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison of our results with existing methods shows that our method provides the state-of-the-art performance for structural class prediction. A Web server that implements the proposed method is freely available at http://202.194.133.5/xinxi/AAC_PSSM_AC/index.htm.
Collapse
Affiliation(s)
- Taigang Liu
- College of Information Sciences and Engineering, Shandong Agricultural University, Taian, 271018, China
| | | | | | | | | |
Collapse
|
34
|
Naveed M, Khan AU. GPCR-MPredictor: multi-level prediction of G protein-coupled receptors using genetic ensemble. Amino Acids 2011; 42:1809-23. [DOI: 10.1007/s00726-011-0902-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2010] [Accepted: 03/26/2011] [Indexed: 11/27/2022]
|
35
|
He P, Xu X, Chen L. Manifold Mapping Machine. Neurocomputing 2011. [DOI: 10.1016/j.neucom.2010.11.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
36
|
Pugalenthi G, Kandaswamy KK, Suganthan PN, Sowdhamini R, Martinetz T, Kolatkar PR. SMpred: a support vector machine approach to identify structural motifs in protein structure without using evolutionary information. J Biomol Struct Dyn 2011; 28:405-14. [PMID: 20919755 DOI: 10.1080/07391102.2010.10507369] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Abstract
Knowledge of three dimensional structure is essential to understand the function of a protein. Although the overall fold is made from the whole details of its sequence, a small group of residues, often called as structural motifs, play a crucial role in determining the protein fold and its stability. Identification of such structural motifs requires sufficient number of sequence and structural homologs to define conservation and evolutionary information. Unfortunately, there are many structures in the protein structure databases have no homologous structures or sequences. In this work, we report an SVM method, SMpred, to identify structural motifs from single protein structure without using sequence and structural homologs. SMpred method was trained and tested using 132 proteins domains containing 581 motifs. SMpred method achieved 78.79% accuracy with 79.06% sensitivity and 78.53% specificity. The performance of SMpred was evaluated with MegaMotifBase using 188 proteins containing 1161 motifs. Out of 1161 motifs, SMpred correctly identified 1503 structural motifs reported in MegaMotifBase. Further, we showed that SMpred is useful approach for the length deviant superfamilies and single member superfamilies. This result suggests the usefulness of our approach for facilitating the identification of structural motifs in protein structure in the absence of sequence and structural homologs. The dataset and executable for the SMpred algorithm is available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/SMpred.htm.
Collapse
Affiliation(s)
- Ganesan Pugalenthi
- Laboratory of Structural Biochemistry, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672
| | | | | | | | | | | |
Collapse
|
37
|
Zhang S, Ding S, Wang T. High-accuracy prediction of protein structural class for low-similarity sequences based on predicted secondary structure. Biochimie 2011; 93:710-4. [PMID: 21237245 DOI: 10.1016/j.biochi.2011.01.001] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2010] [Accepted: 01/04/2011] [Indexed: 11/30/2022]
Abstract
Information on the structural classes of proteins has been proven to be important in many fields of bioinformatics. Prediction of protein structural class for low-similarity sequences is a challenge problem. In this study, 11 features (including 8 re-used features and 3 newly-designed features) are rationally utilized to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 and 25PDB with sequence similarity lower than 40% and 25%, respectively. Comparison of our results with other methods shows that our proposed method is very promising and may provide a cost-effective alternative to predict protein structural class in particular for low-similarity datasets.
Collapse
Affiliation(s)
- Shengli Zhang
- School of Mathematical Sciences, Dalian University of Technology, Ganjingzi District, Dalian, Liaoning, PR China.
| | | | | |
Collapse
|
38
|
Ken-Li Lin, Chin-Teng Lin, Pal NR. Incremental Mountain Clustering Method to Find Building Blocks for Constructing Structures of Proteins. IEEE Trans Nanobioscience 2010; 9:278-88. [DOI: 10.1109/tnb.2010.2095467] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
39
|
Sahu SS, Panda G. A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction. Comput Biol Chem 2010; 34:320-7. [DOI: 10.1016/j.compbiolchem.2010.09.002] [Citation(s) in RCA: 147] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2010] [Revised: 09/28/2010] [Accepted: 09/28/2010] [Indexed: 10/19/2022]
|
40
|
Liu T, Zheng X, Wang J. Prediction of protein structural class for low-similarity sequences using support vector machine and PSI-BLAST profile. Biochimie 2010; 92:1330-4. [DOI: 10.1016/j.biochi.2010.06.013] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2010] [Accepted: 06/16/2010] [Indexed: 11/25/2022]
|
41
|
Zheng X, Li C, Wang J. An information-theoretic approach to the prediction of protein structural class. J Comput Chem 2010; 31:1201-6. [PMID: 19777491 DOI: 10.1002/jcc.21406] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
An information-theoretical approach, which combines a sequence decomposition technique and a fuzzy clustering algorithm, is proposed for prediction of protein structural class. This approach could bypass the process of selecting and comparing sequence features as done previously. First, distances between each pair of protein sequences are estimated using a conditional decomposition technique in information theory. Then, the fuzzy k-nearest neighbor algorithm is used to identify the structural class of a protein given as set of sample sequences. To verify the strength of our method, we choose three widely used datasets constructed by Chou and Zhou. It is shown by the Jackknife test that our approach represents an improvement in the prediction of accuracy over existing methods.
Collapse
Affiliation(s)
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
| | | | | |
Collapse
|
42
|
Anand A, Pugalenthi G, Fogel GB, Suganthan PN. An approach for classification of highly imbalanced data using weighting and undersampling. Amino Acids 2010; 39:1385-91. [DOI: 10.1007/s00726-010-0595-2] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2009] [Accepted: 04/07/2010] [Indexed: 10/19/2022]
|
43
|
Zhang N, Duan G, Gao S, Ruan J, Zhang T. Prediction of the parallel/antiparallel orientation of beta-strands using amino acid pairing preferences and support vector machines. J Theor Biol 2010; 263:360-8. [PMID: 20035768 DOI: 10.1016/j.jtbi.2009.12.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2009] [Revised: 11/05/2009] [Accepted: 12/17/2009] [Indexed: 10/20/2022]
|
44
|
Liu T, Zheng X, Wang J. Prediction of protein structural class using a complexity-based distance measure. Amino Acids 2009; 38:721-8. [PMID: 19330425 DOI: 10.1007/s00726-009-0276-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2008] [Accepted: 03/11/2009] [Indexed: 11/30/2022]
Abstract
Knowledge of structural class plays an important role in understanding protein folding patterns. So it is necessary to develop effective and reliable computational methods for prediction of protein structural class. To this end, we present a new method called NN-CDM, a nearest neighbor classifier with a complexity-based distance measure. Instead of extracting features from protein sequences as done previously, distance between each pair of protein sequences is directly evaluated by a complexity measure of symbol sequences. Then the nearest neighbor classifier is adopted as the predictive engine. To verify the performance of this method, jackknife cross-validation tests are performed on several benchmark datasets. Results show that our approach achieves a high prediction accuracy over some classical methods.
Collapse
Affiliation(s)
- Taigang Liu
- Department of Applied Mathematics, Dalian University of Technology, 116024 Dalian, China.
| | | | | |
Collapse
|
45
|
Ghanty P, Pal N. Prediction of Protein Folds: Extraction of New Features, Dimensionality Reduction, and Fusion of Heterogeneous Classifiers. IEEE Trans Nanobioscience 2009; 8:100-10. [DOI: 10.1109/tnb.2009.2016488] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
46
|
Xiao X, Lin WZ, Chou KC. Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes. J Comput Chem 2008; 29:2018-24. [PMID: 18381630 DOI: 10.1002/jcc.20955] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Using the pseudo amino acid (PseAA) composition to represent the sample of a protein can incorporate a considerable amount of sequence pattern information so as to improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article the grey modeling approach is introduced that is particularly efficient in coping with complicated systems such as the one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted for its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition that can catch the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introduction of the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can be also used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, protease type, among many others.
Collapse
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333000, China.
| | | | | |
Collapse
|
47
|
Chen K, Kurgan LA, Ruan J. Prediction of protein structural class using novel evolutionary collocation-based sequence representation. J Comput Chem 2008; 29:1596-604. [PMID: 18293306 DOI: 10.1002/jcc.20918] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Knowledge of structural classes is useful in understanding of folding patterns in proteins. Although existing structural class prediction methods applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To this end, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of the structural class prediction with the proposed representation. The best, classifier support vector machine achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for prediction of structural classes. Our comprehensive comparison shows superiority of the proposed representation, which results in error rate reductions that range between 14% and 26% when compared with predictions of the best-performing, previously published classifiers on the considered datasets. The study also shows that, for the benchmark dataset that includes sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets that include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of the structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.
Collapse
Affiliation(s)
- Ke Chen
- Department of Electrical and Computer Engineering, ECERF, University of Alberta, Edmonton, Alberta, Canada
| | | | | |
Collapse
|
48
|
Xiao X, Wang P, Chou KC. Predicting protein structural classes with pseudo amino acid composition: an approach using geometric moments of cellular automaton image. J Theor Biol 2008; 254:691-6. [PMID: 18634802 DOI: 10.1016/j.jtbi.2008.06.016] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2008] [Revised: 06/18/2008] [Accepted: 06/18/2008] [Indexed: 11/28/2022]
Abstract
A novel approach was developed for predicting the structural classes of proteins based on their sequences. It was assumed that proteins belonging to the same structural class must bear some sort of similar texture on the images generated by the cellular automaton evolving rule [Wolfram, S., 1984. Cellular automation as models of complexity. Nature 311, 419-424]. Based on this, two geometric invariant moment factors derived from the image functions were used as the pseudo amino acid components [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct., Funct., Genet. (Erratum: ibid., 2001, vol. 44, 60) 43, 246-255] to formulate the protein samples for statistical prediction. The success rates thus obtained on a previously constructed benchmark dataset are quite promising, implying that the cellular automaton image can help to reveal some inherent and subtle features deeply hidden in a pile of long and complicated amino acid sequences.
Collapse
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 33300, China.
| | | | | |
Collapse
|
49
|
Feng Y, Luo L. Use of tetrapeptide signals for protein secondary-structure prediction. Amino Acids 2008; 35:607-14. [PMID: 18431531 DOI: 10.1007/s00726-008-0089-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2007] [Accepted: 03/04/2008] [Indexed: 10/22/2022]
Abstract
This paper develops a novel sequence-based method, tetra-peptide-based increment of diversity with quadratic discriminant analysis (TPIDQD for short), for protein secondary-structure prediction. The proposed TPIDQD method is based on tetra-peptide signals and is used to predict the structure of the central residue of a sequence fragment. The three-state overall per-residue accuracy (Q (3)) is about 80% in the threefold cross-validated test for 21-residue fragments in the CB513 dataset. The accuracy can be further improved by taking long-range sequence information (fragments of more than 21 residues) into account in prediction. The results show the tetra-peptide signals can indeed reflect some relationship between an amino acid's sequence and its secondary structure, indicating the importance of tetra-peptide signals as the protein folding code in the protein structure prediction.
Collapse
Affiliation(s)
- Yonge Feng
- Laboratory of Theoretical Biophysics, Faculty of Science and Technology, Inner Mongolia University, Hohhot, 010021, China.
| | | |
Collapse
|
50
|
Zhang SW, Chen W, Yang F, Pan Q. Using Chou's pseudo amino acid composition to predict protein quaternary structure: a sequence-segmented PseAAC approach. Amino Acids 2008; 35:591-8. [PMID: 18427713 DOI: 10.1007/s00726-008-0086-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2008] [Accepted: 02/28/2008] [Indexed: 12/11/2022]
Abstract
In the protein universe, many proteins are composed of two or more polypeptide chains, generally referred to as subunits, which associate through noncovalent interactions and, occasionally, disulfide bonds to form protein quaternary structures. It has long been known that the functions of proteins are closely related to their quaternary structures; some examples include enzymes, hemoglobin, DNA polymerase, and ion channels. However, it is extremely labor-expensive and even impossible to quickly determine the structures of hundreds of thousands of protein sequences solely from experiments. Since the number of protein sequences entering databanks is increasing rapidly, it is highly desirable to develop computational methods for classifying the quaternary structures of proteins from their primary sequences. Since the concept of Chou's pseudo amino acid composition (PseAAC) was introduced, a variety of approaches, such as residue conservation scores, von Neumann entropy, multiscale energy, autocorrelation function, moment descriptors, and cellular automata, have been utilized to formulate the PseAAC for predicting different attributes of proteins. Here, in a different approach, a sequence-segmented PseAAC is introduced to represent protein samples. Meanwhile, multiclass SVM classifier modules were adopted to classify protein quaternary structures. As a demonstration, the dataset constructed by Chou and Cai [(2003) Proteins 53:282-289] was adopted as a benchmark dataset. The overall jackknife success rates thus obtained were 88.2-89.1%, indicating that the new approach is quite promising for predicting protein quaternary structure.
Collapse
Affiliation(s)
- Shao-Wu Zhang
- College of Automation, Northwestern Polytechnical University, 710072, Xi'an, China.
| | | | | | | |
Collapse
|