1. Montezano D, Bernstein R, Copeland MM, Slusky JSG. General features of transmembrane beta barrels from a large database. Proc Natl Acad Sci U S A 2023; 120:e2220762120. [PMID: 37432995 PMCID: PMC10629564 DOI: 10.1073/pnas.2220762120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Accepted: 06/03/2023] [Indexed: 07/13/2023]
Abstract
Large datasets contribute new insights to subjects formerly investigated by exemplars. We used coevolution data to create a large, high-quality database of transmembrane β-barrels (TMBB). By applying simple feature detection on generated evolutionary contact maps, our method (IsItABarrel) achieves 95.88% balanced accuracy when discriminating among protein classes. Moreover, comparison with IsItABarrel revealed a high rate of false positives in previous TMBB algorithms. In addition to being more accurate than previous datasets, our database (available online) contains 1,938,936 bacterial TMBB proteins from 38 phyla, making it 17 and 2.2 times larger than the previous sets TMBB-DB and OMPdb, respectively. We anticipate that due to its quality and size, the database will serve as a useful resource where high-quality TMBB sequence data are required. We found that TMBBs can be divided into 11 types, three of which have not been previously reported. We find tremendous variance in proteome percentage among TMBB-containing organisms, with some using 6.79% of their proteome for TMBBs and others using as little as 0.27%. The distribution of the lengths of the TMBBs is suggestive of previously hypothesized duplication events. In addition, we find that the C-terminal β-signal varies among different classes of bacteria, though its consensus sequence is LGLGYRF. However, this β-signal is only characteristic of prototypical TMBBs. The ten non-prototypical barrel types have other C-terminal motifs, and it remains to be determined whether these alternative motifs facilitate TMBB insertion or perform any other signaling function.
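Balanced accuracy, the metric quoted for IsItABarrel, is usually defined as the mean of the per-class recalls, so a rare class counts as much as a common one. The sketch below assumes that standard definition; the labels are invented for illustration and do not reproduce the paper's evaluation protocol.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (sensitivity); insensitive to class imbalance."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Made-up labels: 8 barrels, 2 non-barrels, one non-barrel misclassified.
y_true = ["TMBB"] * 8 + ["other"] * 2
y_pred = ["TMBB"] * 8 + ["TMBB", "other"]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

Plain accuracy on the same toy labels would be 0.9; the balanced figure is lower because the minority class is recalled only half the time.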
Affiliation(s)
- Daniel Montezano
- Computational Biology Program, University of Kansas, Lawrence, KS66045
- Rebecca Bernstein
- Computational Biology Program, University of Kansas, Lawrence, KS66045
- Joanna S. G. Slusky
- Computational Biology Program, University of Kansas, Lawrence, KS66045
- Department of Molecular Biosciences, University of Kansas, Lawrence, KS66045
2. Yan R, Lin J, Chen Z, Wang X, Huang L, Cai W, Zhang Z. Prediction of outer membrane proteins by combining the position- and composition-based features of sequence profiles. Mol Biosyst 2014; 10:1004-13. [DOI: 10.1039/c3mb70435a] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
3. Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins. Comput Biol Chem 2013; 46:16-22. [DOI: 10.1016/j.compbiolchem.2013.05.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 04/12/2013] [Revised: 04/24/2013] [Accepted: 05/03/2013] [Indexed: 01/15/2023]
4. Zuo YC, Chen W, Fan GL, Li QZ. A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins. Amino Acids 2012; 44:573-80. [PMID: 22851052 DOI: 10.1007/s00726-012-1374-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 11/28/2011] [Accepted: 07/17/2012] [Indexed: 11/25/2022]
Abstract
The successful prediction of thermophilic proteins is useful for designing stable enzymes that are functional at high temperature. We have used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a 2-class K-nearest neighbor classifier to classify thermophilic and mesophilic proteins, and the resulting KNN-ID classifier successfully predicted thermophilic proteins. Instead of extracting features from protein sequences as done previously, our approach was based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences was first calculated to quantify how similar the two sequences are. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with multiple recently published methods showed that the KNN-ID proposed in this study outperforms the other methods. The improved predictive performance indicates that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy was discussed further. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm .
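A minimal sketch of a KNN classifier driven by an increment-of-diversity distance follows. It assumes the standard definitions D(X) = N ln N − Σ nᵢ ln nᵢ over residue counts and ID(X, Y) = D(X+Y) − D(X) − D(Y), which is zero when two compositions are proportional; the abstract does not give KNN-ID's exact normalization or K, so `k=3` and the toy sequences are illustrative assumptions.

```python
import math
from collections import Counter

def diversity(counts):
    """Diversity D(X) = N ln N - sum_i n_i ln n_i over symbol counts."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts.values() if c > 0)

def increment_of_diversity(seq_a, seq_b):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); zero when the compositions are proportional."""
    ca, cb = Counter(seq_a), Counter(seq_b)
    return diversity(ca + cb) - diversity(ca) - diversity(cb)

def knn_id_predict(query, training, k=3):
    """Majority label among the k training sequences with the smallest ID to the query."""
    ranked = sorted(training, key=lambda item: increment_of_diversity(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up training set (not the paper's data): (sequence, class).
training = [
    ("EKRIEKRVEEKKR", "thermophilic"),
    ("KEIRKEEVRKIEK", "thermophilic"),
    ("EEKRRKVIEKRKE", "thermophilic"),
    ("QNSTQGSNTAQSD", "mesophilic"),
    ("SNQTDGQSANTQS", "mesophilic"),
    ("TQSNGQDSQTNAS", "mesophilic"),
]
print(knn_id_predict("KEERIKRVEKE", training))  # thermophilic
print(knn_id_predict("QSNTQDSGN", training))    # mesophilic
```

Because ID works directly on symbol counts, no explicit feature vector is built; the distance itself carries the composition comparison.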
Affiliation(s)
- Yong-Chun Zuo
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
5. E-komon T, Burchmore R, Herzyk P, Davies R. Predicting the outer membrane proteome of Pasteurella multocida based on consensus prediction enhanced by results integration and manual confirmation. BMC Bioinformatics 2012; 13:63. [PMID: 22540951 PMCID: PMC3403877 DOI: 10.1186/1471-2105-13-63] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 07/01/2011] [Accepted: 04/27/2012] [Indexed: 01/26/2023]
Abstract
Background Outer membrane proteins (OMPs) of Pasteurella multocida have various functions related to virulence and pathogenesis and represent important targets for vaccine development. Various bioinformatic algorithms can predict outer membrane localization and discriminate OMPs by structure or function. The designation of a confident prediction framework by integrating different predictors followed by consensus prediction, results integration and manual confirmation will improve the prediction of the outer membrane proteome. Results In the present study, we used 10 different predictors classified into three groups (subcellular localization, transmembrane β-barrel protein and lipoprotein predictors) to identify putative OMPs from two available P. multocida genomes: those of avian strain Pm70 and porcine non-toxigenic strain 3480. Predicted proteins in each group were filtered by optimized criteria for consensus prediction: at least two positive predictions for the subcellular localization predictors, three for the transmembrane β-barrel protein predictors and one for the lipoprotein predictors. The consensus predicted proteins were integrated from each group into a single list of proteins. We further incorporated a manual confirmation step including a public database search against PubMed and sequence analyses, e.g. sequence and structural homology, conserved motifs/domains, functional prediction, and protein-protein interactions to enhance the confidence of prediction. As a result, we were able to confidently predict 98 putative OMPs from the avian strain genome and 107 OMPs from the porcine strain genome with 83% overlap between the two genomes. Conclusions The bioinformatic framework developed in this study has increased the number of putative OMPs identified in P. multocida and allowed these OMPs to be identified with a higher degree of confidence. Our approach can be applied to investigate the outer membrane proteomes of other Gram-negative bacteria.
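The per-group consensus thresholds described above (at least two positive calls from the subcellular localization predictors, three from the transmembrane β-barrel predictors, one from the lipoprotein predictors) and the merge into a single list can be sketched as follows. The protein IDs and the number of predictors per group are arbitrary placeholders, and the manual confirmation step is not modeled.

```python
# Minimum positive calls required within each predictor group (thresholds from the abstract).
THRESHOLDS = {"localization": 2, "tmbb": 3, "lipoprotein": 1}

def consensus_candidates(calls, thresholds=THRESHOLDS):
    """Return sorted protein IDs that meet the threshold in at least one group.

    calls maps protein ID -> {group name -> list of per-predictor booleans}.
    Proteins passing any group are merged into one list (the integration step).
    """
    selected = set()
    for protein_id, groups in calls.items():
        if any(sum(votes) >= thresholds[group] for group, votes in groups.items()):
            selected.add(protein_id)
    return sorted(selected)

# Hypothetical predictor outputs; group sizes here are arbitrary.
calls = {
    "protA": {"localization": [True, True, False], "tmbb": [True, False, False, False], "lipoprotein": [False, False]},
    "protB": {"localization": [True, False, False], "tmbb": [True, True, False, False], "lipoprotein": [False, False]},
    "protC": {"localization": [False, False, False], "tmbb": [True, True, True, False], "lipoprotein": [False, False]},
    "protD": {"localization": [False, False, False], "tmbb": [False, False, False, False], "lipoprotein": [True, False]},
}
print(consensus_candidates(calls))  # ['protA', 'protC', 'protD']
```

protB is dropped because it falls one vote short in both the localization and the β-barrel groups.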
Affiliation(s)
- Teerasak E-komon
- Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, Sir Graeme Davies Building, Glasgow G12 8QQ, UK
6. Outer membrane proteins can be simply identified using secondary structure element alignment. BMC Bioinformatics 2011; 12:76. [PMID: 21414186 PMCID: PMC3072342 DOI: 10.1186/1471-2105-12-76] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 08/02/2010] [Accepted: 03/17/2011] [Indexed: 02/04/2023]
Abstract
Background Outer membrane proteins (OMPs) are frequently found in the outer membranes of gram-negative bacteria, mitochondria and chloroplasts and have been found to play diverse functional roles. Computational discrimination of OMPs from globular proteins and other types of membrane proteins is helpful to accelerate new genome annotation and drug discovery. Results Based on the observation that almost all OMPs consist of antiparallel β-strands in a barrel shape and that their secondary structure arrangements differ from those of other types of proteins, we propose a simple method called SSEA-OMP to identify OMPs using secondary structure element alignment. Intensive benchmark experiments show that the proposed SSEA-OMP method performs better than some well-established OMP detection methods. Conclusions The major advantage of SSEA-OMP is its good prediction performance considering its simplicity. The web server that implements the method is freely accessible at http://protein.cau.edu.cn/SSEA-OMP/index.html.
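The idea of secondary structure element alignment can be illustrated with a simplified sketch: collapse a per-residue H/E/C string into one symbol per element, globally align element strings with Needleman-Wunsch, and assign the label of the best-aligning template. This is not the SSEA-OMP scoring scheme; the match/mismatch/gap values, length normalization, and toy templates are all assumptions for illustration.

```python
from itertools import groupby

def to_elements(ss):
    """Collapse a per-residue H/E/C string into one symbol per secondary structure element."""
    return "".join(symbol for symbol, _ in groupby(ss))

def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between two element strings."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

def normalized_score(query_ss, template_ss):
    """Alignment score divided by the longer element string, so long templates are not favoured."""
    q, t = to_elements(query_ss), to_elements(template_ss)
    return align_score(q, t) / max(len(q), len(t), 1)

def classify(query_ss, templates):
    """Label of the best-aligning template; templates is a list of (ss_string, label)."""
    _, best_label = max(templates, key=lambda t: normalized_score(query_ss, t[0]))
    return best_label

templates = [
    ("C" + "EEEEECC" * 8, "OMP"),                # toy 8-strand, barrel-like pattern
    ("CCHHHHHHCCEEECCHHHHHHHCC", "globular"),    # toy alpha/beta pattern
]
query = "CC" + "EEEEEECCC" * 7 + "C"
print(to_elements(query))          # CECECECECECECEC
print(classify(query, templates))  # OMP
```

Working on elements rather than residues makes the comparison insensitive to strand and loop lengths, which is the property the alternating strand pattern of a barrel exploits.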
7. iFC²: an integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content. Amino Acids 2010; 40:963-73. [PMID: 20730460 DOI: 10.1007/s00726-010-0721-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/09/2010] [Accepted: 08/06/2010] [Indexed: 10/19/2022]
Abstract
Several descriptors of protein structure at the sequence and residue levels have been recently proposed. They are widely adopted in the analysis and prediction of structural and functional characteristics of proteins. Numerous in silico methods have been developed for sequence-based prediction of these descriptors. However, many of them do not have a public web-server and only a few integrate multiple descriptors to improve the predictions. We introduce the iFC² (integrated prediction of fold, class, and content) server, which is the first to integrate three modern predictors of sequence-level descriptors. They concern fold type (PFRES), structural class (SCEC), and secondary structure content (PSSC-core). The server exploits relations between the three descriptors to implement a cross-evaluation procedure that improves over the predictions of the individual methods. iFC² annotates fold and class predictions as potentially correct or incorrect. When tested on datasets with low-similarity chains, for fold prediction iFC² labels 72% of the PFRES predictions as correct, and the accuracy of these predictions equals 82%. The accuracy of the remaining 28% of the PFRES predictions equals 38%. Similarly, our server labels over 79% of SCEC predictions as correct, and these are shown to be 98% accurate, while the remaining SCEC predictions are only 15% accurate. These results are shown to be competitive when contrasted against recent relevant web-servers. Predictions on CASP8 targets show that the content predicted by iFC² is competitive with the content computed from the tertiary structures predicted by the three best-performing methods in CASP8. The iFC² server is available at http://biomine.ece.ualberta.ca/1D/1D.html .
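The correct/incorrect labelling splits the predictions into two groups with very different accuracies; the overall accuracy is their weighted mix. A small worked sketch using the SCEC figures quoted above, treating "over 79%" as exactly 79% purely for illustration:

```python
def overall_accuracy(parts):
    """Combine (fraction_of_predictions, accuracy_within_that_fraction) pairs."""
    fractions = sum(frac for frac, _ in parts)
    if abs(fractions - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(frac * acc for frac, acc in parts)

# SCEC: 79% labelled correct (98% accurate), remaining 21% (15% accurate).
scec = overall_accuracy([(0.79, 0.98), (0.21, 0.15)])
print(round(scec, 3))  # 0.806
```

The point of the labels is visible in the spread: a user who keeps only predictions labelled correct works at 98% accuracy instead of roughly 81%.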
8. Chen K, Kurgan LA, Ruan J. Prediction of protein structural class using novel evolutionary collocation-based sequence representation. J Comput Chem 2008; 29:1596-604. [PMID: 18293306 DOI: 10.1002/jcc.20918] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.8] [Indexed: 11/11/2022]
Abstract
Knowledge of structural classes is useful for understanding folding patterns in proteins. Although existing structural class prediction methods have applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To address this, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of the structural class prediction with the proposed representation. The best classifier, a support vector machine, achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for prediction of structural classes. Our comprehensive comparison shows the superiority of the proposed representation, which results in error rate reductions ranging between 14% and 26% when compared with predictions of the best-performing, previously published classifiers on the considered datasets. The study also shows that, for the benchmark datasets that include sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets, which include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.
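Collocation features count residue pairs separated by a fixed number of intervening positions. The sketch below is a simplified, sequence-only analogue: the paper derives its collocations from PSI-BLAST profiles, which is not reproduced here, and the `max_gap` parameter and per-gap normalization are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def collocation_features(seq, max_gap=2):
    """Frequencies of residue pairs separated by g = 0..max_gap intervening residues.

    Returns 400 values per gap, each block normalized to sum to 1.
    """
    seq = seq.upper()
    features = []
    for gap in range(max_gap + 1):
        step = gap + 1
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - step):
            pair = seq[i] + seq[i + step]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features

f = collocation_features("ACDA", max_gap=1)
print(len(f))  # 800
```

Gap 0 reduces to ordinary dipeptide composition; larger gaps capture pairings such as i/i+2 that reflect alternating patterns in β-strands.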
Affiliation(s)
- Ke Chen
- Department of Electrical and Computer Engineering, ECERF, University of Alberta, Edmonton, Alberta, Canada
9. Ou YY, Gromiha M, Chen SA, Suwa M. TMBETADISC-RBF: Discrimination of β-barrel membrane proteins using RBF networks and PSSM profiles. Comput Biol Chem 2008; 32:227-31. [DOI: 10.1016/j.compbiolchem.2008.03.002] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Received: 11/09/2007] [Revised: 03/11/2008] [Accepted: 03/11/2008] [Indexed: 10/22/2022]
10. Kurgan LA, Zhang T, Zhang H, Shen S, Ruan J. Secondary structure-based assignment of the protein structural classes. Amino Acids 2008; 35:551-64. [DOI: 10.1007/s00726-008-0080-3] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 01/31/2008] [Accepted: 02/27/2008] [Indexed: 11/24/2022]
11. Martin J, de Brevern AG, Camproux AC. In silico local structure approach: a case study on outer membrane proteins. Proteins 2008; 71:92-109. [PMID: 17932925 DOI: 10.1002/prot.21659] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
The detection of Outer Membrane Proteins (OMP) in whole genomes is a topical question, and their sequence characteristics have therefore been intensively studied. This class of proteins displays a common beta-barrel architecture formed by adjacent antiparallel strands. However, due to the lack of available structures, few structural studies have been made on this class of proteins. Here we propose a novel investigation of OMP local structure based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using a hidden Markov model results in a specific structural alphabet of 20 fragments, six of which are dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in detail in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. The comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than that of globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMP than in globular structures. Moreover, SA20-OMP also captures sequential information, which can be integrated into a scoring function for structural model ranking with very promising results.
Affiliation(s)
- Juliette Martin
- INSERM UMR-S 726/Université Denis Diderot Paris 7, Equipe de Bioinformatique Génomique et Moléculaire, F-75005 Paris
12. Lin H. The modified Mahalanobis Discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition. J Theor Biol 2008; 252:350-6. [PMID: 18355838 DOI: 10.1016/j.jtbi.2008.02.004] [Citation(s) in RCA: 182] [Impact Index Per Article: 10.7] [Received: 11/08/2007] [Revised: 12/02/2007] [Accepted: 02/04/2008] [Indexed: 11/15/2022]
Abstract
Outer membrane proteins (OMPs) are beta-barrel membrane proteins that perform many biological functions. Discriminating OMPs from non-OMPs is an important task for understanding some biochemical processes. In this study, a method that combines the increment of diversity with a modified Mahalanobis Discriminant, called IDQD, is presented and applied to 208 OMPs, 206 transmembrane helical proteins (TMHPs) and 673 globular proteins (GPs), using Chou's pseudo amino acid compositions as parameters. The overall jackknife cross-validation accuracy is 93.2% for the three-class problem (OMPs, TMHPs and GPs) and 96.1% for the two-class problem (OMPs and non-OMPs). These results suggest that the method can be effectively applied to discriminate OMPs, TMHPs and GPs. They also indicate that the pseudo amino acid composition reflects the core features of membrane proteins better than the classical amino acid composition.
Affiliation(s)
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
13. Gromiha MM, Yabuki Y, Suwa M. TMB finding pipeline: novel approach for detecting beta-barrel membrane proteins in genomic sequences. J Chem Inf Model 2007; 47:2456-61. [PMID: 17958348 DOI: 10.1021/ci700222s] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
We have developed a novel approach for dissecting transmembrane beta-barrel proteins (TMBs) in genomic sequences. Its features include (i) the identification of TMBs using the preference of residue pairs in globular, transmembrane helical (TMH) and TMB proteins, (ii) elimination of globular/TMH proteins that show more than 70% sequence identity, over 80% of their residues, to proteins with known structures, (iii) elimination of globular/TMH proteins that have more than 60% sequence identity with known sequences in SWISS-PROT, and (iv) exclusion of TMH proteins using SOSUI, a prediction system for TMH proteins. Our approach picked up 7% of the proteins as TMBs in all the considered genomes. The comparison between the identified TMBs in the E. coli genome and available experimental data demonstrated that the new approach could correctly identify all 11 known TMBs whose crystal structures are available. Further, it revealed the presence of 19 TMBs with homology to known structures, 60 TMBs similar to well-annotated sequences, and 54 TMBs that have high sequence similarity with Escherichia coli beta-barrel proteins deposited in the Transport Classification Database (TCDB). Interestingly, the present approach identified TMBs from all 15 families in TCDB. In the human genome, the occurrence of TMBs varies from 0 to 3% across chromosomes. We suggest that our approach could be a step forward in the advancement of structural and functional genomics.
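The four filters (i) to (iv) form a sequential pipeline in which each step can only remove candidates. A sketch follows under loudly stated assumptions: every field of the hypothetical `Candidate` record stands for a result assumed to be precomputed elsewhere (residue-pair preference call, best structure hit, best SWISS-PROT hit, SOSUI call), and reading the coverage rule as "identity > 70% with coverage ≥ 80%" is an interpretation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical per-sequence record; all values are assumed to be precomputed."""
    name: str
    pair_pref_says_tmb: bool   # (i) residue-pair preference classifies it as a TMB
    pdb_identity: float        # (ii) best % identity to a globular/TMH protein of known structure
    pdb_coverage: float        # (ii) % of residues covered by that hit
    swissprot_identity: float  # (iii) best % identity to a known globular/TMH SWISS-PROT entry
    sosui_tmh: bool            # (iv) SOSUI predicts a transmembrane helical protein

def passes_pipeline(c: Candidate) -> bool:
    """Apply filters (i)-(iv) in order; a candidate survives only if no step rejects it."""
    if not c.pair_pref_says_tmb:
        return False
    if c.pdb_identity > 70 and c.pdb_coverage >= 80:
        return False
    if c.swissprot_identity > 60:
        return False
    return not c.sosui_tmh

candidates = [
    Candidate("seq1", True, 30.0, 90.0, 25.0, False),  # kept
    Candidate("seq2", True, 85.0, 95.0, 20.0, False),  # removed by (ii)
    Candidate("seq3", True, 10.0, 50.0, 72.0, False),  # removed by (iii)
    Candidate("seq4", True, 10.0, 50.0, 10.0, True),   # removed by (iv)
    Candidate("seq5", False, 0.0, 0.0, 0.0, False),    # removed by (i)
]
print([c.name for c in candidates if passes_pipeline(c)])  # ['seq1']
```

Ordering the cheap composition-based test first means the homology searches and SOSUI only need to run on sequences already flagged as possible TMBs.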
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan.
14. Zhang G, Fang B. LogitBoost classifier for discriminating thermophilic and mesophilic proteins. J Biotechnol 2007; 127:417-24. [PMID: 17045354 DOI: 10.1016/j.jbiotec.2006.07.020] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 04/26/2006] [Revised: 07/04/2006] [Accepted: 07/19/2006] [Indexed: 11/17/2022]
Abstract
A novel classifier, the so-called LogitBoost classifier, was introduced to discriminate the thermophilic and mesophilic proteins according to their primary structures. When the 20-amino acid composition was chosen as the feature vector, the overall accuracy of the self-consistency check and a five-fold cross-validation procedure was 97.0% and 86.6%, respectively. To test if the method was also applicable to a wide range of biological targets, an independent testing dataset was also used. The method based on LogitBoost algorithm has achieved an overall classification accuracy of 88.9%. According to the three different validation check approaches, it was demonstrated that LogitBoost outperformed AdaBoost and performed comparably with RBF neural network and support vector machine. The influence of protein size on discrimination was addressed.
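The 20-amino acid composition used as the feature vector above is simply the relative frequency of each standard residue. A minimal sketch follows; the alphabetical residue order and ignoring non-standard letters are assumptions, and the LogitBoost classifier itself is not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """20-dimensional amino acid composition in AMINO_ACIDS order; other letters are ignored."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

features = aa_composition("MKKLLAAG")
print(len(features))                     # 20
print(features[AMINO_ACIDS.index("K")])  # 0.25
```

Any classifier that accepts fixed-length numeric vectors, boosting included, can be trained on these 20 values regardless of protein length.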
Affiliation(s)
- Guangya Zhang
- Key Laboratory of Industrial Biotechnology (Hua Qiao University), Fujian Province University, Quanzhou, 362021 Fujian, PR China.
15. Gromiha MM, Yabuki Y, Kundu S, Suharnan S, Suwa M. TMBETA-GENOME: database for annotated beta-barrel membrane proteins in genomic sequences. Nucleic Acids Res 2006; 35:D314-6. [PMID: 17088282 PMCID: PMC1669718 DOI: 10.1093/nar/gkl805] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
Abstract
We have developed the database TMBETA-GENOME for annotated β-barrel membrane proteins in genomic sequences using statistical methods and machine learning algorithms. The statistical methods are based on amino acid composition, residue pair preference and motifs. In the machine learning techniques, the combination of amino acid and dipeptide compositions has been used as the main attributes. In addition, annotations have been made using criteria based on the identification of β-barrel membrane proteins and the exclusion of globular and transmembrane helical proteins. A web interface has been developed for identifying the annotated β-barrel membrane proteins in all known genomes. Users can select a genome from the three kingdoms of life (archaea, bacteria and eukaryotes) and any of five different methods. Further, statistics for all genomes are provided along with links to different algorithms and related databases. It is freely available at .
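The combination of amino acid and dipeptide compositions mentioned above yields a fixed 420-value vector (20 + 400). The sketch below assumes relative frequencies, alphabetical residue order, and skipping any dipeptide that contains a non-standard letter; the database's exact encoding is not specified in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def composition_features(seq):
    """20 amino acid frequencies followed by 400 dipeptide frequencies (420 values)."""
    seq = seq.upper()
    aa_counts = [seq.count(a) for a in AMINO_ACIDS]
    n_aa = sum(aa_counts)
    aac = [c / n_aa if n_aa else 0.0 for c in aa_counts]
    dp_counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in dp_counts:  # skips pairs containing non-standard letters
            dp_counts[pair] += 1
    n_dp = sum(dp_counts.values())
    dpc = [dp_counts[d] / n_dp if n_dp else 0.0 for d in DIPEPTIDES]
    return aac + dpc

f = composition_features("ACAC")
print(len(f))  # 420
```

The dipeptide block adds short-range neighbour information that the 20-value composition alone cannot express.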
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST) AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan.
16. Gromiha MM, Suwa M. Influence of amino acid properties for discriminating outer membrane proteins at better accuracy. Biochim Biophys Acta 2006; 1764:1493-7. [PMID: 16963325 DOI: 10.1016/j.bbapap.2006.07.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 05/08/2006] [Revised: 07/13/2006] [Accepted: 07/28/2006] [Indexed: 10/24/2022]
Abstract
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the influence of physico-chemical, energetic and conformational properties of amino acid residues for discriminating outer membrane proteins using different machine learning algorithms such as Bayes rules, logistic functions, neural networks, support vector machines and decision trees. We observed that most of the properties discriminated the OMPs with similar accuracy. Using the free energy change property, the neural network method could discriminate the OMPs from other folding types of globular and membrane proteins with a 5-fold cross-validation accuracy of 94.4% in a dataset of 1,088 proteins, which is better than that obtained with amino acid composition. The accuracy of discriminating globular proteins is 94.3% and that of transmembrane helical (TMH) proteins is 91.8%. Further, the neural network method was tested with globular proteins belonging to 30 major folding types, and it successfully excluded 99.4% of the considered 1,612 non-redundant proteins. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, CBRC, National Institute of Advanced Industrial Science and Technology, AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Tokyo 135-0064, Japan.
17. Gromiha MM, Suwa M. Discrimination of outer membrane proteins using machine learning algorithms. Proteins 2006; 63:1031-7. [PMID: 16493651 DOI: 10.1002/prot.20929] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Abstract
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying OMPs from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the performance of different methods, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees, etc. for discriminating OMPs. We found that most of the machine learning techniques discriminate OMPs with similar accuracy. The neural network-based method could discriminate the OMPs from other proteins [globular/transmembrane helical (TMH)] at the fivefold cross-validation accuracy of 91.0% in a dataset of 1,088 proteins. The accuracy of discriminating globular proteins is 88.8% and that of TMH proteins is 93.7%. Further, the neural network method is tested with globular proteins belonging to 30 different folding types and it could successfully exclude 95% of the considered proteins. The proteins with SAM domain such as knottins, rubredoxin, and thioredoxin folds are eliminated with 100% accuracy. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.
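The fivefold cross-validation accuracy reported above pools predictions over five held-out folds, each model being trained on the other four. The sketch below shows that procedure generically; the paper's neural network is not reproduced, and a hypothetical nearest-centroid classifier on made-up two-dimensional features stands in for it.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validated_accuracy(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool accuracy over all folds."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def fit_centroids(X, y):
    """Hypothetical stand-in classifier: mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

def predict_centroid(centroids, x):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))

# Made-up, well-separated toy data: 10 "OMP" and 10 "other" points.
X = [[1.0, 0.05 * i] for i in range(10)] + [[0.05 * i, 1.0] for i in range(10)]
y = ["OMP"] * 10 + ["other"] * 10
print(cross_validated_accuracy(X, y, fit_centroids, predict_centroid))  # 1.0
```

Passing `fit` and `predict` as functions keeps the cross-validation loop independent of the model, so the same loop can compare several learners on identical folds.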
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan.