1
|
Identification of adaptor proteins using the ANOVA feature selection technique. Methods 2022; 208:42-47. [DOI: 10.1016/j.ymeth.2022.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 10/01/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022]
|
2
|
Chen Y, Wei E, Chen Y, He P, Wang R, Wang Q, Tang X, Zhang Y, Zhu F, Shen Z. Identification and subcellular localization analysis of membrane protein Ycf 1 in the microsporidian Nosema bombycis. PeerJ 2022; 10:e13530. [PMID: 35833014 PMCID: PMC9272817 DOI: 10.7717/peerj.13530] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/24/2022] [Accepted: 05/11/2022] [Indexed: 01/22/2023]
Abstract
Microsporidia are obligate intracellular parasites that infect a wide range of vertebrates and invertebrates, including humans and insects such as silkworms and bees. The microsporidium Nosema bombycis causes pebrine in Bombyx mori, the most destructive disease in the sericulture industry. Although membrane proteins are involved in a wide range of cellular functions and take part in many important metabolic pathways, few reports on the membrane proteins of microsporidia exist to date. We screened a putative membrane protein, Ycf 1, from the midgut transcriptome of N. bombycis-infected silkworms. Gene cloning and bioinformatics analysis showed that the Ycf 1 gene contains a complete open reading frame (ORF) of 969 bp encoding a 322-amino-acid polypeptide with one signal peptide and one transmembrane domain. Indirect immunofluorescence showed that the Ycf 1 protein is distributed on the plasma membrane. Expression pattern analysis showed that the Ycf 1 gene is expressed in all developmental stages of N. bombycis. Knockdown of the Ycf 1 gene by RNAi effectively inhibited the proliferation of N. bombycis. These results indicate that Ycf 1 is a membrane protein that plays an important role in the life cycle of N. bombycis.
Affiliation(s)
- Yong Chen
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
| | - Erjun Wei
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
| | - Ying Chen
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
| | - Ping He
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
| | - Runpeng Wang
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
| | - Qiang Wang
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
- Chinese Academy of Agricultural Sciences, Institute of Sericulture, Zhenjiang, China
| | - Xudong Tang
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
- Chinese Academy of Agricultural Sciences, Institute of Sericulture, Zhenjiang, China
| | - Yiling Zhang
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
- Chinese Academy of Agricultural Sciences, Institute of Sericulture, Zhenjiang, China
| | - Feng Zhu
- Zaozhuang University, Zaozhuang, Shandong, China
| | - Zhongyuan Shen
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
- Chinese Academy of Agricultural Sciences, Institute of Sericulture, Zhenjiang, China
| |
|
3
|
Zhang X, Chen L. Prediction of membrane protein types by fusing protein-protein interaction and protein sequence information. Biochim Biophys Acta Proteins Proteom 2020; 1868:140524. [PMID: 32858174 DOI: 10.1016/j.bbapap.2020.140524] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/26/2020] [Revised: 07/17/2020] [Accepted: 07/30/2020] [Indexed: 11/30/2022]
Abstract
Membrane proteins are gatekeepers to the cell and are essential in determining cell function. Identifying the types of membrane proteins is an essential problem in cell biology, but doing so with traditional experimental methods is time-consuming and expensive. The alternative is to design effective computational methods that provide quick and reliable predictions. To date, several computational methods have been proposed for this purpose, and several of them use features extracted from the sequence information of individual proteins. Recently, networks have become increasingly popular for tackling protein-related problems, because they organize proteins at the system level and give an overview of all proteins. However, this form weakens the essential properties of individual proteins, such as their sequence information. In this study, a novel feature fusion scheme was proposed that integrates information from protein sequences and the protein-protein interaction network. The fused features of a protein were defined as a linear combination of the sequence features of all proteins in the network, where the combination coefficients were the probabilities yielded by the random walk with restart algorithm with that protein as the seed node. Several models with such fused features and different classification algorithms were built and evaluated. Their performance in predicting the type of membrane proteins improved compared with models using only the sequence features or only the network information.
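The fusion scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `rwr` and `fuse`, the restart probability of 0.8, the column normalization of the adjacency matrix, and the toy three-protein network are all assumptions made for illustration.

```python
def rwr(adj, seed, restart=0.8, tol=1e-10, max_iter=1000):
    """Random walk with restart on a small network given as an adjacency matrix.

    Assumes every node has at least one neighbour, so that the
    column-normalized matrix is stochastic. The restart value is illustrative.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalize so each column sums to 1 (transition probabilities).
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def fuse(adj, seq_features, seed, restart=0.8):
    """Fused feature vector of the seed protein: a linear combination of the
    sequence features of all proteins, weighted by the RWR probabilities."""
    probs = rwr(adj, seed, restart)
    dim = len(seq_features[0])
    return [sum(probs[k] * seq_features[k][d] for k in range(len(probs)))
            for d in range(dim)]


# Toy network: protein 0 - protein 1 - protein 2 (hypothetical data).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[0.2, 0.5], [0.9, 0.1], [0.4, 0.4]]
fused_for_protein_0 = fuse(adjacency, features, seed=0)
```

Because the weights are probabilities that sum to 1, the fused vector is a weighted average of the sequence feature vectors, dominated by the seed protein and its close network neighbours.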
Affiliation(s)
- Xiaolin Zhang
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.
| |
|
4
|
Abstract
During the last three decades or so, many efforts have been made to study the protein cleavage sites of disease-causing enzymes, such as HIV (Human Immunodeficiency Virus) protease and SARS (Severe Acute Respiratory Syndrome) coronavirus main proteinase. This mini-review makes it increasingly clear that the motivation driving these studies is well founded, and that the results acquired through them are very rewarding, particularly for developing peptide drugs.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
|
5
|
Chou KC. Proposing Pseudo Amino Acid Components is an Important Milestone for Proteome and Genome Analyses. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09910-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/28/2022]
|
6
|
Prediction of membrane protein types by exploring local discriminative information from evolutionary profiles. Anal Biochem 2019; 564-565:123-132. [DOI: 10.1016/j.ab.2018.10.027] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/25/2018] [Revised: 10/23/2018] [Accepted: 10/25/2018] [Indexed: 11/17/2022]
|
7
|
Agrawal P, Patiyal S, Kumar R, Kumar V, Singh H, Raghav PK, Raghava GPS. ccPDB 2.0: an updated version of datasets created and compiled from Protein Data Bank. Database (Oxford) 2019; 2019:5298333. [PMID: 30689843 PMCID: PMC6343045 DOI: 10.1093/database/bay142] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/31/2018] [Accepted: 12/09/2018] [Indexed: 12/20/2022]
Abstract
ccPDB 2.0 (http://webs.iiitd.edu.in/raghava/ccpdb) is an updated version of the manually curated database ccPDB, which maintains datasets required for developing methods to predict the structure and function of proteins. The number of datasets compiled from the literature increased from 45 to 141 in ccPDB 2.0. Similarly, the number of protein structures used for creating datasets increased from ~74 000 to ~137 000 (PDB March 2018 release). ccPDB 2.0 provides the same web services and flexible tools that were present in the previous version of the database. In the updated version, links to a number of methods developed in the past few years have also been incorporated. This updated resource is built on responsive templates that are compatible with smartphones (mobile, iPhone, iPad, tablets etc.) and large-screen gadgets. In summary, ccPDB 2.0 is a user-friendly web-based platform that provides comprehensive and up-to-date information about datasets.
Affiliation(s)
- Piyush Agrawal
- Bioinformatics Center, CSIR-Institute of Microbial Technology, India; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
| | - Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
| | - Rajesh Kumar
- Bioinformatics Center, CSIR-Institute of Microbial Technology, India; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
| | - Vinod Kumar
- Bioinformatics Center, CSIR-Institute of Microbial Technology, India; Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
| | - Harinder Singh
- J. Craig Venter Institute 9605 Medical Center Drive, Suite 150 Rockville, MD, USA
| | - Pawan Kumar Raghav
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
| |
|
8
|
Sankari ES, Manimegalai D. Predicting membrane protein types by incorporating a novel feature set into Chou's general PseAAC. J Theor Biol 2018; 455:319-328. [DOI: 10.1016/j.jtbi.2018.07.032] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 03/22/2018] [Revised: 06/27/2018] [Accepted: 07/23/2018] [Indexed: 10/28/2022]
|
9
|
Sankari ES, Manimegalai D. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets. J Theor Biol 2017; 435:208-217. [PMID: 28941868 DOI: 10.1016/j.jtbi.2017.09.018] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 04/13/2017] [Revised: 09/15/2017] [Accepted: 09/18/2017] [Indexed: 12/19/2022]
Abstract
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Given the large number of uncharacterized protein sequences in databases, traditional methods are very time-consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since the datasets used here are imbalanced, the performance of various decision tree classifiers, such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods, such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among these classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. Another finding is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers, such as DT, Adaboost, Rotation forest and Random forest, are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers.
Affiliation(s)
- E Siva Sankari
- Department of CSE, Government College of Engineering, Tirunelveli, Tamil Nadu, India.
| | - D Manimegalai
- Department of IT, National Engineering College, Kovilpatti, Tamil Nadu, India.
| |
|
10
|
|
11
|
Liu B, Wu H, Chou KC. Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences. Natural Science 2017. [DOI: 10.4236/ns.2017.94007] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Indexed: 11/20/2022]
|
12
|
Butt AH, Rasool N, Khan YD. A Treatise to Computational Approaches Towards Prediction of Membrane Protein and Its Subtypes. J Membr Biol 2016; 250:55-76. [DOI: 10.1007/s00232-016-9937-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 04/24/2016] [Accepted: 11/02/2016] [Indexed: 10/20/2022]
|
13
|
Saidijam M, Azizpour S, Patching SG. Amino acid composition analysis of human secondary transport proteins and implications for reliable membrane topology prediction. J Biomol Struct Dyn 2016; 35:929-949. [PMID: 27159787 DOI: 10.1080/07391102.2016.1167622] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 01/27/2023]
Abstract
Secondary transporters in humans are a large group of proteins that transport a wide range of ions, metals, organic and inorganic solutes involved in energy transduction, control of membrane potential and osmotic balance, metabolic processes and in the absorption or efflux of drugs and xenobiotics. They are also emerging as important targets for development of new drugs and as target sites for drug delivery to specific organs or tissues. We have performed amino acid composition (AAC) and phylogenetic analyses and membrane topology predictions for 336 human secondary transport proteins and used the results to confirm protein classification and to look for trends and correlations with structural domains and specific substrates and/or function. Some proteins showed statistically high contents of individual amino acids or of groups of amino acids with similar physicochemical properties. One recurring trend was a correlation between high contents of charged and/or polar residues with misleading results in predictions of membrane topology, which was especially prevalent in Mitochondrial Carrier family proteins. We demonstrate how charged or polar residues located in the middle of transmembrane helices can interfere with their identification by membrane topology tools resulting in missed helices in the prediction. Comparison of AAC in the human proteins with that in 235 secondary transport proteins from Escherichia coli revealed similar overall trends along with differences in average contents for some individual amino acids and groups of similar amino acids that are presumed to result from a greater number of functions and complexity in the higher organism.
Affiliation(s)
- Massoud Saidijam
- Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Sonia Azizpour
- Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Simon G Patching
- School of BioMedical Sciences and the Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds LS2 9JT, UK
| |
|
14
|
Liu X, Wang J, Yin M, Edwards B, Xu P. Supervised learning of sparse context reconstruction coefficients for data representation and classification. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-2042-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
|
15
|
|
16
|
|
17
|
Saidijam M, Patching SG. Amino acid composition analysis of secondary transport proteins from Escherichia coli with relation to functional classification, ligand specificity and structure. J Biomol Struct Dyn 2015; 33:2205-20. [DOI: 10.1080/07391102.2014.998283] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/09/2023]
Affiliation(s)
- Massoud Saidijam
- Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Simon G. Patching
- Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
| |
|
18
|
Chen W, Lin H, Chou KC. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol Biosyst 2015; 11:2620-34. [DOI: 10.1039/c5mb00155b] [Citation(s) in RCA: 262] [Impact Index Per Article: 26.2] [Indexed: 11/21/2022]
Abstract
With the avalanche of DNA/RNA sequences generated in the post-genomic age, it is urgent to develop automated methods for analyzing the relationship between the sequences and their functions.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
| | - Hao Lin
- Gordon Life Science Institute, Boston, USA
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics
| | - Kuo-Chen Chou
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
| |
|
19
|
Chen P, Huang JZ, Gao X. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone. BMC Bioinformatics 2014; 15 Suppl 15:S4. [PMID: 25474163 PMCID: PMC4271564 DOI: 10.1186/1471-2105-15-s15-s4] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/28/2022]
Abstract
Background: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, the state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. Results: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding sites and non-ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions: Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.
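The step of constructing several balanced data sets from highly imbalanced samples can be sketched as follows. This is only one plausible reading: the abstract does not say how the negative samples are drawn, so random undersampling without replacement, the function name `balanced_subsets`, and the toy residue labels are assumptions. The random forest training itself is omitted; one classifier would be trained on each returned subset.

```python
import random


def balanced_subsets(positives, negatives, n_sets, seed=0):
    """Build n_sets balanced data sets by pairing all positive samples with
    an equally sized random draw from the (larger) negative class.

    Sampling negatives without replacement inside each subset is an
    assumption; the source does not specify the partitioning scheme.
    """
    if len(negatives) < len(positives):
        raise ValueError("expected negatives to be the majority class")
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_sets):
        sampled_negatives = rng.sample(negatives, len(positives))
        subsets.append((list(positives), sampled_negatives))
    return subsets


# Hypothetical data: 3 binding residues versus 12 non-binding residues.
binding = ["b1", "b2", "b3"]
non_binding = ["n%d" % i for i in range(12)]
subsets = balanced_subsets(binding, non_binding, n_sets=4)
```

Each subset sees every positive sample but a different slice of the negatives, so an ensemble over the subsets uses more of the majority class than a single undersampled classifier would.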
|
20
|
Nanni L, Lumini A, Brahnam S. An empirical study of different approaches for protein classification. ScientificWorldJournal 2014; 2014:236717. [PMID: 25028675 PMCID: PMC4084589 DOI: 10.1155/2014/236717] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 03/24/2014] [Revised: 05/05/2014] [Accepted: 05/07/2014] [Indexed: 01/05/2023]
Abstract
Many domains would benefit from reliable and efficient systems for automatic protein classification. An area of particular interest in recent studies on automatic protein classification is the exploration of new methods for extracting features from a protein that work well for specific problems. These methods, however, are not generalizable and have proven useful in only a few domains. Our goal is to evaluate several feature extraction approaches for representing proteins by testing them across multiple datasets. Different types of protein representations are evaluated: those starting from the position specific scoring matrix (PSSM) of the proteins, those derived from the amino-acid sequence, two matrix representations, and features taken from the 3D tertiary structure of the protein. We also test new variants of protein descriptors. We develop our system experimentally by comparing and combining different descriptors taken from the protein representations. Each descriptor is used to train a separate support vector machine (SVM), and the results are combined by sum rule. Some stand-alone descriptors work well on some datasets but not on others. Through fusion, the different descriptors provide a performance that works well across all tested datasets, in some cases performing better than the state-of-the-art.
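Fusion by sum rule, as mentioned in this abstract, adds the per-class scores produced by the individual classifiers and picks the class with the largest total. The sketch below assumes that each classifier outputs one comparable score per class; the function name `sum_rule` and the example scores are hypothetical.

```python
def sum_rule(score_lists):
    """Combine per-class scores from several classifiers by summing them.

    score_lists holds one list of class scores per classifier, all in the
    same class order. Returns the winning class index and the summed scores.
    """
    n_classes = len(score_lists[0])
    totals = [sum(scores[c] for scores in score_lists)
              for c in range(n_classes)]
    best = max(range(n_classes), key=totals.__getitem__)
    return best, totals


# Hypothetical scores from three descriptor-specific classifiers, two classes.
scores = [[0.60, 0.40], [0.20, 0.80], [0.45, 0.55]]
predicted_class, summed = sum_rule(scores)
```

The sum rule only makes sense when the classifiers' scores are on a comparable scale, which is why SVM outputs are usually normalized or converted to probabilities before fusion.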
Affiliation(s)
- Loris Nanni
- Dipartimento di Ingegneria dell'Informazione, Via Gradenigo 6/A, 35131 Padova, Italy
| | | | - Sheryl Brahnam
- Computer Information Systems, Missouri State University, 901 South National, Springfield, MO 65804, USA
| |
|
21
|
Prediction of multi-type membrane proteins in human by an integrated approach. PLoS One 2014; 9:e93553. [PMID: 24676214 PMCID: PMC3968155 DOI: 10.1371/journal.pone.0093553] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 10/23/2013] [Accepted: 03/05/2014] [Indexed: 11/29/2022]
Abstract
Membrane proteins are involved in various cellular processes and perform various important functions, which are mainly associated with their types. However, identifying membrane protein types with traditional biophysical methods is very time-consuming and expensive. Although some computational tools for predicting membrane protein types have been developed, most of them can recognize only one type per protein. They are therefore less effective than they could be, because one membrane protein can have several types at the same time. To our knowledge, few methods handling multiple types of membrane proteins have been reported. In this study, we proposed an integrated approach to predict multiple types of membrane proteins by employing sequence homology and a protein-protein interaction network. The prediction accuracies reached 87.65%, 81.39% and 70.79%, respectively, by the leave-one-out test on three datasets. The approach outperformed the nearest neighbor algorithm adopting pseudo amino acid composition. The method is anticipated to be an alternative tool for identifying membrane protein types. New metrics for evaluating the performance of methods dealing with multi-label problems were also presented. The program of the method is available upon request.
|
22
|
Han GS, Yu ZG, Anh V. A two-stage SVM method to predict membrane protein types by incorporating amino acid classifications and physicochemical properties into a general form of Chou's PseAAC. J Theor Biol 2013; 344:31-9. [PMID: 24316387 DOI: 10.1016/j.jtbi.2013.11.017] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 08/16/2013] [Revised: 10/16/2013] [Accepted: 11/24/2013] [Indexed: 01/12/2023]
Abstract
Membrane proteins play important roles in many biochemical processes and are also attractive targets of drug discovery for various diseases. The elucidation of membrane protein types provides clues for understanding the structure and function of proteins. Recently we developed a novel system for predicting protein subnuclear localizations. In this paper, we propose a simplified version of our system for predicting membrane protein types directly from primary protein structures, which incorporates amino acid classifications and physicochemical properties into a general form of pseudo-amino acid composition. In this simplified system, we design a two-stage multi-class support vector machine combined with a two-step optimal feature selection process, which proves very effective in our experiments. The performance of the present method is evaluated on two benchmark datasets consisting of five types of membrane proteins. The overall prediction accuracies for the five types are 93.25% and 96.61% via the jackknife test and the independent dataset test, respectively. These results indicate that our method is effective and valuable for predicting membrane protein types. A web server for the proposed method is available at http://www.juemengt.com/jcc/memty_page.php.
Affiliation(s)
- Guo-Sheng Han
- School of Mathematics and Computational Science, Xiangtan University, Hunan 411105, China
| | - Zu-Guo Yu
- School of Mathematics and Computational Science, Xiangtan University, Hunan 411105, China; School of Mathematical Science, Queensland University of Technology, GPO Box 2434, Brisbane Q 4001, Australia.
| | - Vo Anh
- School of Mathematical Science, Queensland University of Technology, GPO Box 2434, Brisbane Q 4001, Australia
| |
|
23
|
Deng X, Liu Q, Hu Y, Deng Y. TOPPER: topology prediction of transmembrane protein based on evidential reasoning. ScientificWorldJournal 2013; 2013:123731. [PMID: 23401665 PMCID: PMC3562663 DOI: 10.1155/2013/123731] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/28/2012] [Accepted: 10/18/2012] [Indexed: 01/16/2023]
Abstract
The topology prediction of transmembrane proteins is an active research field in bioinformatics and molecular biology. It is a typical pattern recognition problem. Various algorithms have been developed to predict transmembrane protein topology, since experimental techniques are restricted by many stringent conditions. Usually, these individual prediction algorithms depend on various principles, such as the hydrophobicity or charges of residues. In this paper, an evidential topology prediction method for transmembrane proteins is proposed based on evidential reasoning, called TOPPER (topology prediction of transmembrane protein based on evidential reasoning). In the proposed method, the prediction results of multiple individual prediction algorithms are transformed into BPAs (basic probability assignments) according to the confusion matrix. The final prediction result is then obtained by combining the individual predictions based on Dempster's rule of combination. The experimental results show that the proposed method is superior to the individual prediction algorithms, which illustrates its effectiveness.
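Dempster's rule of combination, which this abstract uses to merge the BPAs of individual predictors, can be sketched as below. The rule itself is standard; the function name `dempster_combine`, the residue-state labels `M`, `I` and `O`, and the mass values are hypothetical, and the step that derives BPAs from a confusion matrix is not reproduced here.

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments (BPAs) with Dempster's rule.

    Each BPA maps a frozenset of hypotheses (a focal element) to its mass.
    Mass falling on empty intersections is treated as conflict and
    normalized away.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical BPAs for one residue from two predictors.
# M = membrane, I = inside, O = outside; THETA is the whole frame.
THETA = frozenset({"M", "I", "O"})
predictor_1 = {frozenset({"M"}): 0.7, THETA: 0.3}
predictor_2 = {frozenset({"M"}): 0.6, frozenset({"I"}): 0.1, THETA: 0.3}
fused_bpa = dempster_combine(predictor_1, predictor_2)
```

Assigning part of each predictor's mass to the whole frame expresses how much that predictor is distrusted, which is the role the confusion matrix plays in the method described above.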
Affiliation(s)
- Xinyang Deng
- School of Computer and Information Science, Southwest University, Chongqing 400715, China
| | - Qi Liu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Department of Biomedical Informatics, Medical Center, Vanderbilt University, Nashville, TN 37235, USA
| | - Yong Hu
- Institute of Business Intelligence and Knowledge Discovery, Guangdong University of Foreign Studies, Sun Yat-sen University, Guangzhou 510006, China
| | - Yong Deng
- School of Computer and Information Science, Southwest University, Chongqing 400715, China
- School of Engineering, Vanderbilt University, Nashville, TN 37235, USA
| |
|
24
|
Ding C, Yuan LF, Guo SH, Lin H, Chen W. Identification of mycobacterial membrane proteins and their types using over-represented tripeptide compositions. J Proteomics 2012; 77:321-8. [DOI: 10.1016/j.jprot.2012.09.006] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Received: 05/11/2012] [Revised: 08/18/2012] [Accepted: 09/08/2012] [Indexed: 11/25/2022]
|
25
|
An empirical study on the matrix-based protein representations and their combination with sequence-based approaches. Amino Acids 2012; 44:887-901. [PMID: 23108592 DOI: 10.1007/s00726-012-1416-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 03/15/2012] [Accepted: 10/03/2012] [Indexed: 10/27/2022]
Abstract
Many domains have a stake in the development of reliable systems for automatic protein classification. Of particular interest in recent studies of automatic protein classification is the exploration of new methods for extracting features from a protein that enhance classification for specific problems. These methods have proven very useful in one or two domains, but they have failed to generalize well across several domains (i.e. classification problems). In this paper, we evaluate several feature extraction approaches for representing proteins with the aim of sequence-based protein classification. Several protein representations are evaluated, those starting from: the position specific scoring matrix (PSSM) of the proteins; the amino-acid sequence; a matrix representation of the protein, of dimension (length of the protein) ×20, obtained using the substitution matrices for representing each amino-acid as a vector. A valuable result is that a texture descriptor can be extracted from the PSSM protein representation which improves the performance of standard descriptors based on the PSSM representation. Experimentally, we develop our systems by comparing several protein descriptors on nine different datasets. Each descriptor is used to train a support vector machine (SVM) or an ensemble of SVM. Although different stand-alone descriptors work well on some datasets (but not on others), we have discovered that fusion among classifiers trained using different descriptors obtains a good performance across all the tested datasets. Matlab code/Datasets used in the proposed paper are available at http://www.bias.csr.unibo.it\nanni\PSSM.rar.
|
26
|
Cheng X, Xiao X, Wu ZC, Wang P, Lin WZ. Swfoldrate: predicting protein folding rates from amino acid sequence with sliding window method. Proteins 2012; 81:140-8. [PMID: 22933332 DOI: 10.1002/prot.24171] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 05/02/2012] [Revised: 07/20/2012] [Accepted: 08/25/2012] [Indexed: 01/18/2023]
Abstract
Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, the long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on the sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal feature selection procedure combines forward feature selection with sequential backward selection. The method was demonstrated on a large dataset using the jackknife cross-validation test. The predictor was built on numerous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp.
Affiliation(s)
- Xiang Cheng
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
| |
|
27
|
Du P, Wang X, Xu C, Gao Y. PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou's pseudo-amino acid compositions. Anal Biochem 2012; 425:117-9. [PMID: 22459120 DOI: 10.1016/j.ab.2012.03.015] [Citation(s) in RCA: 223] [Impact Index Per Article: 17.2] [Received: 02/08/2012] [Revised: 03/17/2012] [Accepted: 03/20/2012] [Indexed: 01/05/2023]
Abstract
The pseudo-amino acid composition has been widely used to convert complicated protein sequences with various lengths to fixed length digital feature vectors while keeping considerable sequence order information. However, so far the only software available to the public is the web server PseAAC (http://www.csbio.sjtu.edu.cn/bioinf/PseAAC), which has some limitations in dealing with large-scale datasets. Here, we propose a new cross-platform stand-alone software program, called PseAAC-Builder (http://www.pseb.sf.net), which can be used to generate various modes of Chou's pseudo-amino acid composition in a much more efficient and flexible way. It is anticipated that PseAAC-Builder may become a useful tool for studying various protein attributes.
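How a pseudo-amino acid composition turns sequences of different lengths into fixed-length vectors can be sketched as follows. The code follows the commonly cited type 1 form (20 composition terms plus `lam` sequence-order correlation terms) and is not taken from PseAAC-Builder. The function names, the defaults `lam=3` and `w=0.05`, and especially the placeholder property scale in the usage lines are assumptions; real use would supply measured physicochemical scales such as hydrophobicity.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Rescale a per-residue property to zero mean and unit variance
    over the 20 standard amino acids."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / 20
    sd = (sum((v - mean) ** 2 for v in values) / 20) ** 0.5
    return {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, props, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    seq must contain only the 20 standard residues and be longer than lam.
    props is a list of dicts mapping each residue to a property value.
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    scaled = [standardize(p) for p in props]
    n = len(seq)
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        # Average squared property difference between residues k apart.
        total = 0.0
        for i in range(n - k):
            total += sum((p[seq[i + k]] - p[seq[i]]) ** 2
                         for p in scaled) / len(scaled)
        thetas.append(total / (n - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Placeholder scale (alphabet position), NOT a real physicochemical property.
toy_scale = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
short_vec = pseaac("MKTAYIAKQR", [toy_scale])
long_vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [toy_scale])
```

Both vectors have 23 components regardless of sequence length, and the last `lam` components carry the sequence-order information that plain amino acid composition discards.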
Affiliation(s)
- Pufeng Du
- School of Computer Science and Technology, Tianjin University, Tianjin 300072, China.
| |
|