1
Baltoumas FA, Karatzas E, Liu S, Ovchinnikov S, Sofianatos Y, Chen IM, Kyrpides N, Pavlopoulos G. NMPFamsDB: a database of novel protein families from microbial metagenomes and metatranscriptomes. Nucleic Acids Res 2024; 52:D502-D512. [PMID: 37811892 PMCID: PMC10767849 DOI: 10.1093/nar/gkad800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 09/19/2023] [Indexed: 10/10/2023]
Abstract
The Novel Metagenome Protein Families Database (NMPFamsDB) is a database of metagenome- and metatranscriptome-derived protein families whose members have no hits to proteins of reference genomes or to Pfam domains. Each protein family is accompanied by multiple sequence alignments, Hidden Markov Models, taxonomic information, ecosystem and geolocation metadata, sequence and structure predictions, and 3D structure models predicted with AlphaFold2. In its current version, NMPFamsDB hosts over 100 000 protein families, each with at least 100 members. The reported protein families significantly expand (more than double) the number of known protein sequence clusters from reference genomes and reveal new insights into their habitat distribution, origins, functions and taxonomy. We expect NMPFamsDB to be a valuable resource for microbial proteome-wide analyses and for further discovery and characterization of novel functions. NMPFamsDB is publicly available at http://www.nmpfamsdb.org/ or https://bib.fleming.gr/NMPFamsDB.
Affiliation(s)
- Fotis A Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- Sirui Liu
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
- Sergey Ovchinnikov
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
- Yorgos Sofianatos
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- I-Min Chen
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
- Nikos C Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
- Georgios A Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
  - Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 75 Mikras Asias Street, Athens 11527, Greece
2
Nastou KC, Tsaousis GN, Iconomidou VA. PerMemDB: A database for eukaryotic peripheral membrane proteins. Biochimica et Biophysica Acta - Biomembranes 2019; 1862:183076. [PMID: 31629694 DOI: 10.1016/j.bbamem.2019.183076] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/13/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 12/11/2022]
Abstract
The majority of all proteins in cells interact with membranes either permanently or temporarily. Peripheral membrane proteins form transient complexes with membrane proteins and/or lipids via non-covalent interactions and are of utmost importance due to the numerous cellular functions in which they participate. In an effort to collect data on this heterogeneous group of proteins, we designed and constructed a database called PerMemDB. PerMemDB is currently the most complete and comprehensive repository of data for eukaryotic peripheral membrane proteins deposited in UniProt or predicted with MBPpred, a computational method that specializes in detecting proteins that interact non-covalently with membrane lipids via membrane binding domains. The first version of the database contains 231,770 peripheral membrane proteins from 1009 organisms. All entries have cross-references to other databases, literature references and annotation of their interactions with other proteins. Moreover, thanks to the application of MBPpred, additional sequence annotation is available for the characteristic domains that allow these proteins to interact with membranes. Through the web interface of PerMemDB, users can browse the contents of the database, submit advanced text searches and run BLAST queries against the protein sequences deposited in PerMemDB. We expect this repository to serve as a source of information that will help the scientific community gain a deeper understanding of the evolution and function of peripheral membrane proteins by enhancing proteome-wide analyses. The database is available at: http://bioinformatics.biol.uoa.gr/db=permemdb.
Affiliation(s)
- Katerina C Nastou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Georgios N Tsaousis
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
- Vassiliki A Iconomidou
  - Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15701, Greece
3
Szalkai B, Grolmusz V. Near perfect protein multi-label classification with deep neural networks. Methods 2018; 132:50-56. [DOI: 10.1016/j.ymeth.2017.06.034] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/30/2017] [Revised: 05/09/2017] [Accepted: 06/30/2017] [Indexed: 10/19/2022]
4
Nastou KC, Tsaousis GN, Papandreou NC, Hamodrakas SJ. MBPpred: Proteome-wide detection of membrane lipid-binding proteins using profile Hidden Markov Models. Biochimica et Biophysica Acta - Proteins and Proteomics 2016; 1864:747-54. [DOI: 10.1016/j.bbapap.2016.03.015] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/18/2015] [Revised: 03/02/2016] [Accepted: 03/25/2016] [Indexed: 01/09/2023]
5
Asraf SS, Rajnish K, Gunasekaran P. Genomics Perspectives of Bioethanol Producing Zymomonas mobilis. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
In recent years, a continuous increase in demand for fossil fuels has led to the need for new potential fuel sources. Biofuels, in particular ethanol, are of high interest because fossil fuels are dwindling. Among ethanol producers, Zymomonas mobilis has attracted particular interest as a renewable source of bioethanol. Z. mobilis is an aerotolerant, Gram-negative, ethanol-producing bacterium that shows high ethanol yield and tolerance as well as greater productivity. This chapter focuses on recent efforts to engineer Z. mobilis, on transcriptomic and genome-based metabolomic studies, and on the bioinformatic exploitation of the available genomic data for the production of bioethanol. Recently, several bioinformatics tools have been used to predict the functional properties of the carbohydrate-active ethanologenic enzymes in Z. mobilis, and a number of approaches have been used to study the functional properties of these enzymes. Functional genomics thus seeks to apply technologies that could help improve the production of bioethanol by Z. mobilis.
6
Saraç ÖS, Atalay V, Cetin-Atalay R. GOPred: GO molecular function prediction by combined classifiers. PLoS One 2010; 5:e12382. [PMID: 20824206 PMCID: PMC2930845 DOI: 10.1371/journal.pone.0012382] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 10/13/2009] [Accepted: 06/22/2010] [Indexed: 11/18/2022]
Abstract
Functional protein annotation is an important matter for both in vivo and in silico biology. Several computational methods have been proposed that make use of a wide range of features such as motifs, domains, homology, structure and physicochemical properties. No single method performs best in all functional classification problems, because the information obtained from any of these features depends on the function to be assigned to the protein. In this study, we present a novel approach that combines different methods to better represent protein function. First, we formulated the function annotation problem as a classification problem defined on 300 different Gene Ontology (GO) terms from the molecular function aspect. We presented a method to form positive and negative training examples while taking into account the directed acyclic graph (DAG) structure and evidence codes of GO. We applied three different methods and their combinations. Results show that combining different methods improves prediction accuracy in most cases. The proposed method, GOPred, is available as an online computational annotation tool (http://kinaz.fen.bilkent.edu.tr/gopred).
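The abstract mentions forming positive and negative training examples while respecting the GO directed acyclic graph, without giving the rule. A minimal sketch of one common convention follows; it is an assumption rather than GOPred's published procedure, it ignores evidence codes, and the function names and term labels are hypothetical.

```python
from collections import defaultdict


def ancestors(term, parents):
    """All ancestors of `term` (excluding itself); `parents` maps child -> set of parents."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def descendants(term, parents):
    """All descendants of `term` (excluding itself), found by inverting `parents`."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    seen, stack = set(), list(children[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children[t])
    return seen


def split_examples(term, parents, annotations):
    """Split annotated proteins into positive and negative examples for `term`.

    Positive: annotated to `term` or any descendant (the GO true path rule).
    Negative: annotated to none of `term`, its descendants or its ancestors.
    Proteins with an ancestor annotation but no positive one are skipped as
    ambiguous, since a general annotation does not rule out the specific one.
    """
    positive_terms = {term} | descendants(term, parents)
    ancestor_terms = ancestors(term, parents)
    positives, negatives = set(), set()
    for protein, terms in annotations.items():
        if terms & positive_terms:
            positives.add(protein)
        elif not terms & ancestor_terms:
            negatives.add(protein)
    return positives, negatives


# Toy DAG with placeholder term names (not real GO identifiers).
parents = {"B": {"A"}, "C": {"B"}, "D": {"A"}}
annotations = {"p1": {"C"}, "p2": {"B"}, "p3": {"A"}, "p4": {"D"}}
print(split_examples("B", parents, annotations))
```

Skipping ancestor-only proteins avoids labeling a protein negative merely because its annotation is less specific than the target term.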
Affiliation(s)
- Ömer Sinan Saraç
  - Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Volkan Atalay
  - Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Rengul Cetin-Atalay
  - Department of Molecular Biology and Genetics, Faculty of Science, Bilkent University, Ankara, Turkey
7
Identification of protein functions using a machine-learning approach based on sequence-derived properties. Proteome Sci 2009; 7:27. [PMID: 19664241 PMCID: PMC2731080 DOI: 10.1186/1477-5956-7-27] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 03/13/2009] [Accepted: 08/09/2009] [Indexed: 02/07/2023]
Abstract
Background Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. Results A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. Conclusion We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new PNPRD features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. 
Our results indicate that sequence-based classifiers can provide good results across a broad range of proteins, that the proposed features are useful in predicting several functions, and that combining our features with traditional ones may support the creation of a discriminative feature set for specific protein functions.
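The abstract does not define the PNPRD features beyond their basis in positively and negatively charged residues. The sketch below is a hypothetical illustration of global versus local charge descriptors, not the paper's feature definitions; treating Lys and Arg as positive, Asp and Glu as negative, and splitting the sequence into equal contiguous segments are all assumptions.

```python
POSITIVE = set("KR")  # assumption: His treated as neutral
NEGATIVE = set("DE")


def charge_features(seq, n_segments=4):
    """Global and per-segment charge descriptors (hypothetical, not PNPRD itself).

    Returns [frac_pos, frac_neg, net_charge_per_residue] followed by
    (frac_pos, frac_neg) for each of `n_segments` contiguous segments.
    """
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    features = [pos / length, neg / length, (pos - neg) / length]
    # Local descriptors: split into nearly equal pieces to capture regional differences.
    for i in range(n_segments):
        segment = seq[i * length // n_segments:(i + 1) * length // n_segments]
        if segment:
            features.append(sum(aa in POSITIVE for aa in segment) / len(segment))
            features.append(sum(aa in NEGATIVE for aa in segment) / len(segment))
        else:
            features.extend([0.0, 0.0])  # more segments than residues
    return features


print(charge_features("KKDDAAAA", n_segments=2))
```

A feature vector of this kind could then be passed to a random forest, as in the study, though the feature selection step is not shown here.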
8
Faria D, Ferreira AEN, Falcão AO. Enzyme classification with peptide programs: a comparative study. BMC Bioinformatics 2009; 10:231. [PMID: 19630945 PMCID: PMC2724424 DOI: 10.1186/1471-2105-10-231] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/02/2009] [Accepted: 07/24/2009] [Indexed: 11/29/2022]
Abstract
Background Efficient and accurate prediction of protein function from sequence is one of the outstanding problems in biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded by protein sequences because they need to transform them into data of fixed length. Results We have developed a machine learning methodology, called peptide programs (PPs), that deals directly with protein sequences, and compared its performance with that of Support Vector Machines (SVMs) and BLAST in detailed enzyme classification tasks. Overall, the PPs and SVMs had similar performance in terms of Matthews Correlation Coefficient, but the PPs generally had higher precision. BLAST performed better than both methodologies overall, but the PPs had better results than BLAST and SVMs for the smaller datasets. Conclusion The higher precision of the PPs in comparison to the SVMs suggests that dealing with sequences directly is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. The fact that the PPs performed better than BLAST for the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them using a bagging strategy.
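The comparison above turns on two metrics, the Matthews Correlation Coefficient (MCC) and precision. A short sketch of both, computed from confusion-matrix counts, shows how two classifiers can have similar MCC but different precision; the counts in the usage lines are made up for illustration and are not results from the study.

```python
import math


def precision(tp, fp):
    """Fraction of positive predictions that are correct (0.0 if none were made)."""
    return tp / (tp + fp) if tp + fp else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero, where the coefficient is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical classifiers evaluated on the same 200 examples (60 true positives).
print("A:", round(mcc(40, 135, 5, 20), 3), round(precision(40, 5), 3))
print("B:", round(mcc(50, 120, 20, 10), 3), round(precision(50, 20), 3))
```

With these made-up counts, the MCC values are close (about 0.69 and 0.66) while precision differs (about 0.89 versus 0.71).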
Affiliation(s)
- Daniel Faria
  - Department of Informatics, Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal.
9
Nugent T, Jones DT. Transmembrane protein topology prediction using support vector machines. BMC Bioinformatics 2009; 10:159. [PMID: 19470175 PMCID: PMC2700806 DOI: 10.1186/1471-2105-10-159] [Citation(s) in RCA: 299] [Impact Index Per Article: 19.9] [Received: 09/26/2008] [Accepted: 05/26/2009] [Indexed: 12/02/2022]
Abstract
Background Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high-quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. Results We present a support vector machine-based (SVM) TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves a topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy, respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from . Conclusion The high accuracy of TM topology prediction, which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, makes this method ideally suited to whole-genome annotation of alpha-helical transmembrane proteins.
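The predictor described above is SVM-based, and reproducing it is beyond a short example. For contrast, the sketch below uses a different and much older technique, a Kyte-Doolittle sliding-window hydropathy scan, to flag candidate transmembrane helices. The 19-residue window and 1.6 threshold are commonly quoted defaults for this scale, and the function names are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy scale; only the 20 standard residues are handled,
# so any other letter raises KeyError.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into (start, end) spans (0-based, end exclusive)."""
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h >= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the current span
            else:
                segments.append((start, end))
    return segments


print(candidate_tm_segments("D" * 10 + "L" * 21 + "D" * 10))
```

Unlike the SVM method, this scan has no notion of topology, signal peptides or re-entrant helices, which is part of what the published predictor adds.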
Affiliation(s)
- Timothy Nugent
  - Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
10
Sarac OS, Gürsoy-Yüzügüllü O, Cetin-Atalay R, Atalay V. Subsequence-based feature map for protein function classification. Comput Biol Chem 2007; 32:122-30. [PMID: 18243801 DOI: 10.1016/j.compbiolchem.2007.11.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 08/09/2007] [Accepted: 11/30/2007] [Indexed: 11/19/2022]
Abstract
Automated classification of proteins is indispensable for the further in vivo investigation of the excessive number of unknown sequences generated by large-scale molecular biology techniques. This study describes a discriminative system based on feature space mapping, called subsequence profile map (SPMap), for the functional classification of protein sequences. SPMap takes into account the information coming from the subsequences of a protein. A group of protein sequences that belong to the same level of classification is decomposed into fixed-length subsequences, which are then clustered to obtain a representative feature space mapping. The mapping is defined as the distribution of the subsequences of a protein sequence over these clusters. The resulting feature space representation is used to train discriminative classifiers for functional families. The aim of this approach is to incorporate information from important subregions that are conserved over a family of proteins while avoiding the difficult task of explicit motif identification. The performance of the method was assessed on various protein classification tasks. Our results showed that SPMap is capable of high-accuracy classification in most of these tasks. Furthermore, SPMap is fast and scalable enough to handle large datasets.
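The mapping step can be sketched as follows, under simplifying assumptions: the cluster centers are given rather than learned by clustering, similarity is a plain identity count instead of a substitution-matrix or profile score, and each subsequence is assigned to a single best center. The function names are hypothetical.

```python
def subsequences(seq, k):
    """All overlapping length-k subsequences of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def identity_score(a, b):
    """Identical positions between two equal-length strings (stand-in for a real similarity score)."""
    return sum(x == y for x, y in zip(a, b))


def subsequence_map(seq, centers):
    """Normalized distribution of a sequence's subsequences over cluster centers.

    `centers` are representative length-k subsequences, assumed to come from
    clustering a training family; each subsequence goes to its best-matching
    center, with ties going to the first center listed.
    """
    k = len(centers[0])
    counts = [0] * len(centers)
    subs = subsequences(seq, k)
    for s in subs:
        best = max(range(len(centers)), key=lambda j: identity_score(s, centers[j]))
        counts[best] += 1
    total = len(subs)
    return [c / total for c in counts] if total else [0.0] * len(centers)


print(subsequence_map("AAAACC", ["AAA", "CCC"]))
```

The resulting fixed-length vector is what makes it possible to train a standard discriminative classifier on sequences of varying length.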
Affiliation(s)
- Omer Sinan Sarac
  - Department of Computer Engineering, Middle East Technical University, 06531 Ankara, Turkey
11
Nagarajan V, Elasri MO. Structure and function predictions of the Msa protein in Staphylococcus aureus. BMC Bioinformatics 2007; 8 Suppl 7:S5. [PMID: 18047728 PMCID: PMC2099497 DOI: 10.1186/1471-2105-8-s7-s5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023]
Abstract
BACKGROUND Staphylococcus aureus is a human pathogen that causes a wide variety of life-threatening infections using a large number of virulence factors. One of the major global regulators used by S. aureus is the staphylococcal accessory regulator (sarA). We have identified and characterized a new gene (modulator of sarA: msa) that modulates the expression of sarA. Genetic and functional analyses show that msa has a global effect on gene expression in S. aureus. However, the mechanism of Msa function is still unknown. Function predictions of Msa are complicated by the fact that it does not have a homologous partner in any other organism. This work aims to predict the structure and function of the Msa protein. RESULTS Preliminary sequence analysis showed that Msa is a putative membrane protein. It would therefore be very difficult to purify and crystallize Msa in order to acquire structural information about this protein. We have used several computational tools to predict the physico-chemical properties, secondary structural features, topology, 3D tertiary structure, binding sites, motifs/patterns/domains and cellular location. To predict several structural features, we built a consensus from the results of different algorithms. We confirm that Msa is a putative membrane protein with three transmembrane regions. We also predict that Msa has phosphorylation sites and binding sites, suggesting functions in signal transduction. CONCLUSION Based on our predictions, we hypothesise that Msa is a novel signal transducer that might be involved in the interaction of S. aureus with its environment.
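The abstract does not state how the consensus was formed. A per-residue majority vote over several topology predictions is one simple possibility; the sketch below assumes each predictor emits a string of 'M' (membrane) and '-' (other) labels, and the function names and example strings are hypothetical.

```python
def consensus(predictions, min_votes=None):
    """Per-residue vote over equal-length label strings.

    A residue is labeled 'M' when at least `min_votes` predictors call it
    membrane; by default a strict majority is required.
    """
    if not predictions:
        raise ValueError("no predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("predictions must have equal length")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    return "".join(
        "M" if sum(p[i] == "M" for p in predictions) >= min_votes else "-"
        for i in range(length)
    )


def count_segments(labels, symbol="M"):
    """Number of maximal runs of `symbol`, e.g. predicted transmembrane regions."""
    runs, previous = 0, None
    for c in labels:
        if c == symbol and previous != symbol:
            runs += 1
        previous = c
    return runs


preds = ["MMM---MMM", "MM----MMM", "-MM---MM-"]
print(consensus(preds), count_segments(consensus(preds)))
```

Raising `min_votes` trades sensitivity for agreement: with all three predictors required, fewer residues survive as membrane.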
Affiliation(s)
- Vijayaraj Nagarajan
  - Department of Biological Sciences, The University of Southern Mississippi, Hattiesburg, MS 39406, USA.
12
Fernández M, Caballero J. Analysis of protegrin structure–activity relationships: the structural characteristics important for antimicrobial activity using smoothed amino acid sequence descriptors. Molecular Simulation 2007. [DOI: 10.1080/08927020701236771] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/22/2022]
13
Waltman P, Blumer A, Kaplan D. FiberID-A technique to identify fibrous protein subclasses. Proteins 2006; 66:127-35. [PMID: 17039548 DOI: 10.1002/prot.21128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Fibrous proteins such as collagen, silk, and elastin play critical biological roles, yet they have been the subject of few projects that use computational techniques to predict either their class or their structure. In this article, we present FiberID, a simple yet effective method for identifying and distinguishing three fibrous protein subclasses from their primary sequences. Using a combination of amino acid composition and fast Fourier measurements, FiberID can classify fibrous proteins belonging to these subclasses with high accuracy by using two standard machine learning techniques (decision trees and Naïve Bayesian classifiers). After presenting our results, we highlight several fibrous sequences that FiberID regularly misclassifies as sequences of potential interest for further study. Finally, we analyze the decision trees developed by FiberID for potential insights into the structure of these proteins.
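The abstract combines amino acid composition with Fourier measurements but does not say which signal is transformed. As an assumption motivated by collagen's Gly-X-Y repeat, the sketch below computes composition and the Fourier power of a glycine indicator at period 3. It evaluates a single DFT frequency directly rather than running an FFT, and the function names are hypothetical.

```python
import cmath

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard amino acid, in AMINO_ACIDS order."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def periodic_power(seq, residue="G", period=3):
    """Normalized Fourier power of a 0/1 indicator of `residue` at frequency 1/period.

    Evaluates one DFT coefficient directly (an FFT would return every
    frequency at once). The result lies in [0, 1]; values near 1 mean the
    residue recurs strictly every `period` positions.
    """
    x = [1.0 if aa == residue else 0.0 for aa in seq.upper()]
    hits = sum(x)
    if hits == 0:
        return 0.0
    coeff = sum(v * cmath.exp(-2j * cmath.pi * k / period) for k, v in enumerate(x))
    return abs(coeff) ** 2 / hits ** 2


collagen_like = "GPP" * 10
features = composition(collagen_like) + [periodic_power(collagen_like)]
print(periodic_power(collagen_like))  # close to 1.0 for a strict Gly-X-Y repeat
```

A combined vector such as `features` is the kind of input a decision tree or naïve Bayes classifier could use, although the study's exact feature set is not reproduced here.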
Affiliation(s)
- Peter Waltman
  - Department of Computer Science, Tufts University, Medford, Massachusetts 02155, USA
14
Bravo IG, Alonso A. Mucosal human papillomaviruses encode four different E5 proteins whose chemistry and phylogeny correlate with malignant or benign growth. J Virol 2004; 78:13613-26. [PMID: 15564472 PMCID: PMC533923 DOI: 10.1128/jvi.78.24.13613-13626.2004] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.6] [Indexed: 01/28/2023]
Abstract
We performed a phylogenetic study of the E2-L2 region of human mucosal papillomaviruses (PVs) and of the proteins encoded therein. Hitherto, proteins encoded in this region were known as E5 proteins. We show that many of these proteins could be spurious translations, according to phylogenetic and chemical coherence criteria between similar protein sequences. We show that there are four separate families of E5 proteins, with different characteristics of phylogeny, chemistry, and rate of evolution. For the sake of clarity, we propose a change in the present nomenclature. E5alpha is present in groups A5, A6, A7, A9, and A11, PVs highly associated with malignant carcinomas of the cervix and penis. E5beta is present in groups A2, A3, A4, and A12, i.e., viruses associated with certain warts. E5gamma is present in group A10, and E5delta is encoded in groups A1, A8, and A10, which are associated with benign transformations. The phylogenetic relationships between mucosal human PVs are the same when considering the oncoproteins E6 and E7 and the E5 proteins, and differ from the phylogeny estimated for the structural proteins L1 and L2. In addition, the protein divergence rate is higher in early proteins than in late proteins, increasing in the order L1 < L2 < E6 ≈ E7 < E5. Moreover, the same proteins have diverged more rapidly in viruses associated with malignant transformations than in viruses associated with benign transformations. The E5 proteins therefore display evolutionary characteristics similar to those of the E6 and E7 oncoproteins. This could reflect a differential involvement of the E5 types in the transformation processes.
Affiliation(s)
- Ignacio G Bravo
  - Deutsches Krebsforschungszentrum, Im Neuenheimer Feld-242, 69120 Heidelberg, Germany.
15
Bagos PG, Liakopoulos TD, Spyropoulos IC, Hamodrakas SJ. PRED-TMBB: a web server for predicting the topology of beta-barrel outer membrane proteins. Nucleic Acids Res 2004; 32:W400-4. [PMID: 15215419 PMCID: PMC441555 DOI: 10.1093/nar/gkh417] [Citation(s) in RCA: 270] [Impact Index Per Article: 13.5] [Indexed: 11/13/2022]
Abstract
The beta-barrel outer membrane proteins constitute one of the two known structural classes of membrane proteins. Whereas there are several different web-based predictors for alpha-helical membrane proteins, currently there is no freely available prediction method for beta-barrel membrane proteins, at least with an acceptable level of accuracy. We present here a web server (PRED-TMBB, http://bioinformatics.biol.uoa.gr/PRED-TMBB) which is capable of predicting the transmembrane strands and the topology of beta-barrel outer membrane proteins of Gram-negative bacteria. The method is based on a Hidden Markov Model, trained according to the Conditional Maximum Likelihood criterion. The model was retrained and the training set now includes 16 non-homologous outer membrane proteins with structures known at atomic resolution. The user may submit one sequence at a time and has the option of choosing between three different decoding methods. The server reports the predicted topology of a given protein, a score indicating the probability of the protein being an outer membrane beta-barrel protein, posterior probabilities for the transmembrane strand prediction and a graphical representation of the assumed position of the transmembrane strands with respect to the lipid bilayer.
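PRED-TMBB decodes sequences with a Hidden Markov Model. Viterbi decoding, one standard HMM decoding algorithm, is sketched below on a toy two-state model ('M' for a membrane strand, 'O' for other) over a reduced alphabet of 'h' (hydrophobic) and 'p' (polar). The model, its probabilities and the alphabet are hypothetical and far simpler than the published model.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for `obs` under an HMM (log-space Viterbi)."""
    # best[s]: log-probability of the best path that ends in state s
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        prev_best = best
        best, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: prev_best[r] + math.log(trans_p[r][s]))
            best[s] = prev_best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
            pointers[s] = prev
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Toy parameters (hypothetical): sticky states, M prefers hydrophobic residues.
start = {"M": 0.5, "O": 0.5}
trans = {"M": {"M": 0.9, "O": 0.1}, "O": {"M": 0.1, "O": 0.9}}
emit = {"M": {"h": 0.8, "p": 0.2}, "O": {"h": 0.2, "p": 0.8}}
print(viterbi("pppphhhhhpppp", "MO", start, trans, emit))
```

Working in log space avoids numerical underflow on long sequences; posterior decoding would instead combine forward and backward probabilities per residue.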
Affiliation(s)
- Pantelis G Bagos
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens 15701, Greece.
16
Bagos PG, Liakopoulos TD, Spyropoulos IC, Hamodrakas SJ. A Hidden Markov Model method, capable of predicting and discriminating beta-barrel outer membrane proteins. BMC Bioinformatics 2004; 5:29. [PMID: 15070403 PMCID: PMC385222 DOI: 10.1186/1471-2105-5-29] [Citation(s) in RCA: 138] [Impact Index Per Article: 6.9] [Received: 11/21/2003] [Accepted: 03/15/2004] [Indexed: 11/10/2022]
Abstract
Background Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes. They come in two structural classes, the α-helical and the β-barrel membrane proteins, which differ in physicochemical characteristics, structure and localization. While transmembrane segment prediction for the α-helical integral membrane proteins appears to be an easy task nowadays, the same is much more difficult for the β-barrel membrane proteins. We developed a method, based on a Hidden Markov Model, capable of predicting the transmembrane β-strands of the outer membrane proteins of Gram-negative bacteria, and of discriminating those from water-soluble proteins in large datasets. The model is trained in a discriminative manner, aiming at maximizing the probability of correct predictions rather than the likelihood of the sequences. Results The training has been performed on a non-redundant database of 14 outer membrane proteins with structures known at atomic resolution; it has been tested with a jackknife procedure, yielding a per-residue accuracy of 84.2% and a correlation coefficient of 0.72, whereas for the self-consistency test the per-residue accuracy was 88.1% and the correlation coefficient 0.824. The total number of correctly predicted topologies is 10 out of 14 in the self-consistency test, and 9 out of 14 in the jackknife test. Furthermore, the model is capable of discriminating outer membrane from water-soluble proteins in large-scale applications, with a success rate of 88.8% and 89.2% for the correct classification of outer membrane and water-soluble proteins, respectively, the highest rates obtained in the literature. That test has been performed independently on a set of known outer membrane proteins with low sequence identity with each other and also with the proteins of the training set. Conclusion Based on the above, we developed a strategy that enabled us to screen the entire proteome of E. coli for outer membrane proteins.
The results were satisfactory; thus the method presented here appears to be suitable for screening entire proteomes for the discovery of novel outer membrane proteins. A web interface available for non-commercial users is located at: , and it is the only freely available HMM-based predictor for β-barrel outer membrane protein topology.
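For discriminating outer membrane from water-soluble proteins, a common way to score a sequence with an HMM is the log-odds of the forward-algorithm likelihood against a background (null) model. Whether the published method uses exactly this score is not stated in the abstract. The toy model below, with hypothetical parameters and a reduced 'h'/'p' alphabet, favors the alternating hydrophobic/polar pattern typical of membrane β-strands; the discriminative training described above is not shown.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, states, start_p, trans_p, emit_p):
    """log P(obs | HMM), summed over all state paths (forward algorithm, log space)."""
    alpha = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    for symbol in obs[1:]:
        new_alpha = {}
        for s in states:
            incoming = [alpha[r] + math.log(trans_p[r][s]) for r in states]
            new_alpha[s] = logsumexp(incoming) + math.log(emit_p[s][symbol])
        alpha = new_alpha
    return logsumexp(list(alpha.values()))


def log_odds(obs, states, start_p, trans_p, emit_p, background):
    """Per-residue log-odds of the HMM versus an i.i.d. background (null) model."""
    null = sum(math.log(background[c]) for c in obs)
    return (forward_log_likelihood(obs, states, start_p, trans_p, emit_p) - null) / len(obs)


# Toy two-state model (hypothetical): A faces the lipid, B faces the barrel interior.
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.1, "B": 0.9}, "B": {"A": 0.9, "B": 0.1}}
emit = {"A": {"h": 0.9, "p": 0.1}, "B": {"h": 0.1, "p": 0.9}}
background = {"h": 0.5, "p": 0.5}
for seq in ("hphphphp", "hhhhpppp"):
    print(seq, round(log_odds(seq, "AB", start, trans, emit, background), 3))
```

A positive score means the sequence is better explained by the model than by the background; in practice a threshold on such a score would be calibrated on known positive and negative sets.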
Affiliation(s)
- Pantelis G Bagos
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens 15701, Greece
- Theodore D Liakopoulos
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens 15701, Greece
- Ioannis C Spyropoulos
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens 15701, Greece
- Stavros J Hamodrakas
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens 15701, Greece
17
Vernikos GS, Gkogkas CG, Promponas VJ, Hamodrakas SJ. GeneViTo: visualizing gene-product functional and structural features in genomic datasets. BMC Bioinformatics 2003; 4:53. [PMID: 14594459 PMCID: PMC280652 DOI: 10.1186/1471-2105-4-53] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 07/22/2003] [Accepted: 10/31/2003] [Indexed: 11/17/2022]
Abstract
Background The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results GeneViTo is a Java-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application handles various types of experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to supplement the provided information, especially in cases of "poor" annotation, or to evaluate available predictions. Moreover, desired information can be output in high-quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions GeneViTo offers an inspectional view of genomic functional elements, covering data stemming both from database annotation and from analysis tools, for an overall analysis of existing genomes.
The application is compatible with Linux or Windows ME/2000/XP operating systems, provided that an appropriate Java Runtime Environment is already installed.
Affiliation(s)
- Georgios S Vernikos
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 15701, Athens, Greece
- Christos G Gkogkas
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 15701, Athens, Greece
- Vasilis J Promponas
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 15701, Athens, Greece
- Stavros J Hamodrakas
  - Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 15701, Athens, Greece
18
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2002. [PMCID: PMC2447231 DOI: 10.1002/cfg.116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]