1
|
Chen D, Li S, Chen Y. ISTRF: Identification of sucrose transporter using random forest. Front Genet 2022; 13:1012828. [PMID: 36171889 PMCID: PMC9511101 DOI: 10.3389/fgene.2022.1012828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 08/22/2022] [Indexed: 12/05/2022]
Abstract
Sucrose transporters (SUTs) are transmembrane proteins that occur widely in plants and play a significant role in sucrose transport and in sucrose-specific signal sensing. Identifying sucrose transporters is therefore important for the study of seed development and of plant flowering and growth. In this study, a random forest-based model named ISTRF was proposed to identify sucrose transporters. First, a database containing 382 SUT proteins and 911 non-SUT proteins was constructed from the UniProt and PFAM databases. Second, k-separated-bigrams-PSSM was used to represent protein sequences. Third, the Borderline-SMOTE algorithm was used to offset the effect of imbalanced training data on identification performance. Finally, the random forest algorithm was used to train the identification model. The 10-fold cross-validation results showed that k-separated-bigrams-PSSM was the most discriminative feature for identifying sucrose transporters and that the Borderline-SMOTE algorithm improved the performance of the identification model. Furthermore, random forest was superior to the other classifiers on almost all indicators. Compared with other identification models, ISTRF had the best overall performance and substantially improved the identification of sucrose transporter proteins.
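The feature step named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation of k-separated bigrams, in which entry (m, n) sums the products of PSSM column m at position i and column n at position i + k; the abstract does not state the formula, so this definition, the function name `k_separated_bigrams`, and the toy matrix are all assumptions made for illustration.

```python
def k_separated_bigrams(pssm, k=1):
    """Flattened (n_cols * n_cols) feature vector from an L x n_cols PSSM.

    Assumed definition (not taken from the abstract):
        T[m][n] = sum over i of pssm[i][m] * pssm[i + k][n],  i = 0 .. L - k - 1
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    length = len(pssm)
    n_cols = len(pssm[0]) if length else 0
    table = [[0.0] * n_cols for _ in range(n_cols)]
    for i in range(length - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                table[m][n] += row_a[m] * row_b[n]
    # Flatten row by row so every protein yields a vector of the same length.
    return [value for row in table for value in row]


# Toy 4-residue, 2-column matrix (a real PSSM has 20 columns per residue).
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
features_k1 = k_separated_bigrams(toy_pssm, k=1)
features_k2 = k_separated_bigrams(toy_pssm, k=2)
```

The output length depends only on the number of PSSM columns, not on the protein length, which is what lets sequences of different lengths share one classifier input space.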
Affiliation(s)
- Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Sai Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Yu Chen
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
|
2
|
Feng C, Wei H, Yang D, Feng B, Ma Z, Han S, Zou Q, Shi H. ORS-Pred: An optimized reduced scheme-based identifier for antioxidant proteins. Proteomics 2021; 21:e2100017. [PMID: 34009737 DOI: 10.1002/pmic.202100017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/20/2021] [Revised: 04/22/2021] [Accepted: 05/12/2021] [Indexed: 12/30/2022]
Abstract
Antioxidant proteins can terminate chain reactions caused by free radicals and protect cells from damage. To identify antioxidant proteins rapidly, a computational model was proposed based on an optimized recoding scheme, sequence information and machine learning methods. First, over 600 recoding schemes were collected to build a scheme set. Then, the original sequence was recoded as a reduced expression whose g-gap dipeptides (g = 0, 1, 2) were used as the features of proteins. Next, a random forest (RF) method was used to evaluate the classification ability of the obtained dipeptide features. After all schemes had been evaluated, the scheme with the best predictive performance was chosen as the optimized reduction scheme. For the RF method, a grid search strategy was then used to select a better parameter combination for identifying antioxidant proteins. In the experiment, the present method correctly recognized 90.13-99.87% of the antioxidant samples. Other experimental results also showed that the present method was effective at identifying antioxidant proteins. Finally, we developed a web server that is freely accessible to researchers.
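The recoding and g-gap dipeptide steps described above can be sketched as follows. The four-group reduction scheme `EXAMPLE_SCHEME` is a hypothetical stand-in (the abstract does not list its 600+ schemes), and the function names are illustrative only.

```python
from itertools import product

# Hypothetical reduction scheme: 20 amino acids collapsed into 4 groups.
EXAMPLE_SCHEME = {
    "h": "AVLIMCFWY",  # hydrophobic
    "p": "STNQGP",     # polar / small
    "b": "KRH",        # basic
    "a": "DE",         # acidic
}


def recode(sequence, scheme):
    """Replace each residue by its group label; unknown residues are skipped."""
    lookup = {aa: label for label, members in scheme.items() for aa in members}
    return "".join(lookup[aa] for aa in sequence if aa in lookup)


def g_gap_dipeptide_freqs(reduced, alphabet, g):
    """Frequencies of residue pairs separated by g positions (g = 0 is adjacent)."""
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(reduced) - g - 1
    for i in range(max(total, 0)):
        counts[reduced[i] + reduced[i + g + 1]] += 1
    return [counts[p] / total if total > 0 else 0.0 for p in pairs]


alphabet = sorted(EXAMPLE_SCHEME)          # ['a', 'b', 'h', 'p']
reduced = recode("MKDESA", EXAMPLE_SCHEME)  # reduced-alphabet string
# Concatenate the g = 0, 1, 2 blocks into one feature vector (3 * 16 values).
features = [f for g in (0, 1, 2) for f in g_gap_dipeptide_freqs(reduced, alphabet, g)]
```

With a reduced alphabet of size n, each value of g contributes n² features, so a coarser scheme gives a shorter vector; that trade-off is what a search over schemes would explore.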
Affiliation(s)
- Changli Feng
- Department of Information Science and Technology, Taishan University, Taian, China
- Haiyan Wei
- Department of Teachers and Education, Taishan University, Taian, China
- Deyun Yang
- Department of Information Science and Technology, Taishan University, Taian, China
- Bin Feng
- Department of Information Science and Technology, Taishan University, Taian, China
- Zhaogui Ma
- Department of Information Science and Technology, Taishan University, Taian, China
- Shuguang Han
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Hainan Key Laboratory for Computational Science and Application, Hainan Normal University, Haikou, China
- Hua Shi
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China
|
3
|
Wu L, Li M. Applying a Probabilistic Network Method to Solve Business-Related Few-Shot Classification Problems. COMPLEXITY 2021; 2021:1-12. [DOI: 10.1155/2021/6633906] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]
Abstract
Business-related few-shot classification problems are challenging for learning algorithms. In this paper, we therefore evaluate few-shot classification in the commercial field. To accurately identify the categories in few-shot learning problems, we propose a probabilistic network (PN) method for few-shot and one-shot learning problems. The original data were first enhanced, and the PN method was then developed on the basis of feature extraction, category comparison, and loss function analysis. The effectiveness of the method was validated using two examples (absenteeism at work and Las Vegas Strip hotels). Experimental results demonstrate that the PN method can effectively identify the categories in commercial few-shot learning problems. The proposed method can therefore be applied to business-related few-shot classification problems.
Affiliation(s)
- Lang Wu
- School of Applied Science, Beijing Information Science and Technology University, Beijing, China
- Menggang Li
- School of Economics and Management, Beijing Jiaotong University, Beijing, China
|
4
|
Hou R, Wang L, Wu YJ. Predicting ATP-Binding Cassette Transporters Using the Random Forest Method. Front Genet 2020; 11:156. [PMID: 32269586 PMCID: PMC7109328 DOI: 10.3389/fgene.2020.00156] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/26/2019] [Accepted: 02/11/2020] [Indexed: 12/21/2022]
Abstract
ATP-binding cassette (ABC) proteins play important roles in a wide variety of species. These proteins are involved in absorbing nutrients, exporting toxic substances, and regulating potassium channels, and they contribute to drug resistance in cancer cells. Therefore, the identification of ABC transporters is an urgent task. The present study used 188D as the feature extraction method, which is based on sequence information and physicochemical properties. We also visualized the extracted features using t-Distributed Stochastic Neighbor Embedding (t-SNE); the visualization suggests that samples represented by 188D features may be separable. Further, random forest (RF) is an efficient classifier for identifying proteins. In 10-fold cross-validation on the training set, the proposed model achieved an average accuracy of 89.54% across the 10 folds. We obtained values of 0.87 for specificity, 0.92 for sensitivity, and 0.79 for MCC. On the testing set, the accuracy achieved was 89%. These results suggest that the model combining 188D with RF is an optimal tool to identify ABC transporters.
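For readers unfamiliar with the reported indicators, the sketch below computes sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) from confusion-matrix counts using their standard definitions. The counts in the example are hypothetical, chosen only to illustrate the arithmetic; they are not the study's confusion matrix.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification indicators from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for a balanced set of 100 positives and 100 negatives.
metrics = binary_metrics(tp=92, fp=13, tn=87, fn=8)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ in size.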
Affiliation(s)
- Ruiyan Hou
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; College of Life Science, University of Chinese Academy of Sciences, Beijing, China
- Lida Wang
- Department of Scientific Research, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Yi-Jun Wu
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
|
5
|
Identification of Hot Spots in Protein Structures Using Gaussian Network Model and Gaussian Naive Bayes. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4354901. [PMID: 27882325 PMCID: PMC5110947 DOI: 10.1155/2016/4354901] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/21/2016] [Revised: 10/02/2016] [Accepted: 10/11/2016] [Indexed: 01/21/2023]
Abstract
Residue fluctuations in protein structures have been shown to be highly associated with various protein functions. Gaussian network model (GNM), a simple representative coarse-grained model, was widely adopted to reveal function-related protein dynamics. We directly utilized the high frequency modes generated by GNM and further performed Gaussian Naive Bayes (GNB) to identify hot spot residues. Two coding schemes about the feature vectors were implemented with varying distance cutoffs for GNM and sliding window sizes for GNB based on tenfold cross validations: one by using only a single high mode and the other by combining multiple modes with the highest frequency. Our proposed methods outperformed the previous work that did not directly utilize the high frequency modes generated by GNM, with regard to overall performance evaluated using F1 measure. Moreover, we found that inclusion of more high frequency modes for a GNB classifier can significantly improve the sensitivity. The present study provided additional valuable insights into the relation between the hot spots and the residue fluctuations.
|
6
|
“Barcode” and Differential Effects of GPCR Phosphorylation by Different GRKs. METHODS IN PHARMACOLOGY AND TOXICOLOGY 2016. [DOI: 10.1007/978-1-4939-3798-1_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/23/2022]
|
7
|
Guruceaga E, Sanchez del Pino MM, Corrales FJ, Segura V. Prediction of a Missing Protein Expression Map in the Context of the Human Proteome Project. J Proteome Res 2015; 14:1350-60. [DOI: 10.1021/pr500850u] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]
|
8
|
Bioinformatics tools for predicting GPCR gene functions. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2014; 796:205-24. [PMID: 24158807 DOI: 10.1007/978-94-007-7423-0_10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/12/2023]
Abstract
The automatic classification of GPCRs by bioinformatics methodology can provide functional information for new GPCRs in the whole 'GPCR proteome', and this information is important for the development of novel drugs. Since the GPCR proteome is classified hierarchically, general approaches to GPCR function prediction are based on hierarchical classification. Various computational tools have been developed to predict GPCR functions; rather than simple sequence searches, those tools use more powerful methods from protein sequence analysis, such as alignment-free methods, statistical model methods, and machine learning methods, all based on learning datasets. The first stage of hierarchical function prediction involves the discrimination of GPCRs from non-GPCRs, and the second stage involves the classification of the predicted GPCR candidates into family, subfamily, and sub-subfamily levels. Then, further classification is performed according to their protein-protein interaction type: binding G-protein type, oligomerized partner type, etc. Those methods have achieved predictive accuracies of around 90%. Finally, future research directions for bioinformatics techniques for the functional prediction of GPCRs are described.
|
9
|
Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes. PLoS One 2014; 9:e86703. [PMID: 24475169 PMCID: PMC3901691 DOI: 10.1371/journal.pone.0086703] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Received: 09/29/2013] [Accepted: 12/10/2013] [Indexed: 11/22/2022]
Abstract
Because DNA-binding proteins play vital roles in gene regulation, an efficient method for identifying them is highly desirable and would advance our understanding of protein functions. In this study, we proposed a new method for the prediction of DNA-binding proteins that ranks features using random forest and then applies wrapper-based feature selection with a forward best-first search strategy. The features comprise information from the primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers: decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. The proposed DBPPred yielded the highest average accuracy of 0.791 and average MCC of 0.583 in five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 were performed with the proposed model trained on the entire PDB594 dataset and with five existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. Independent tests of DBPPred on a large non-DNA binding protein dataset and two RNA binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that for the majority of the features selected by the proposed method, the mean feature values differ statistically significantly between DNA-binding and non-DNA-binding proteins. All of the experimental results indicate that the proposed DBPPred can serve as an alternative predictor for large-scale determination of DNA-binding proteins.
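As a reminder of how the underlying classifier works, here is a minimal Gaussian naïve Bayes sketch in plain Python. It is not the DBPPred implementation: the class name, the variance-smoothing constant and the toy data are assumptions for illustration, and the feature ranking and wrapper selection steps are omitted.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Per-class, per-feature Gaussian likelihoods combined with class priors."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # Small constant keeps the variance strictly positive (an assumption).
            variances = [
                sum((v - m) ** 2 for v in col) / n + 1e-9
                for col, m in zip(columns, means)
            ]
            self.params[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_posterior(self, label, row):
        log_prior, means, variances = self.params[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    def predict(self, X):
        return [
            max(self.params, key=lambda label: self._log_posterior(label, row))
            for row in X
        ]


# Toy two-feature data: label 1 = binding, label 0 = non-binding (hypothetical).
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.6]]
y_train = [1, 1, 1, 0, 0, 0]
model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[1.1, 2.0], [3.1, 0.5]])
```

Because each feature is modeled independently given the class, training reduces to computing per-class means and variances, which is one reason the classifier pairs well with an explicit feature selection stage.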
|
10
|
Feng PM, Lin H, Chen W. Identification of antioxidants from sequence information using naïve Bayes. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:567529. [PMID: 24062796 PMCID: PMC3766563 DOI: 10.1155/2013/567529] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Received: 05/03/2013] [Revised: 07/20/2013] [Accepted: 07/22/2013] [Indexed: 12/22/2022]
Abstract
Antioxidant proteins are substances that protect cells from the damage caused by free radicals. Accurate identification of new antioxidant proteins is important in understanding their roles in delaying aging. Therefore, it is highly desirable to develop computational methods to identify antioxidant proteins. In this study, a Naïve Bayes-based method was proposed to predict antioxidant proteins using amino acid compositions and dipeptide compositions. In order to remove redundant information, a novel feature selection technique was employed to single out optimized features. In the jackknife test, the proposed method achieved an accuracy of 66.88% for the discrimination between antioxidant and nonantioxidant proteins, which is superior to that of other state-of-the-art classifiers. These results suggest that the proposed method could be an effective and promising high-throughput method for antioxidant protein identification.
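The jackknife test mentioned above is leave-one-out validation: each sample is held out in turn, the model is trained on the rest, and the held-out sample is predicted. The sketch below shows that loop with a nearest-centroid classifier as a simple stand-in; it is not the study's Naïve Bayes model, and the one-dimensional toy features stand in for composition vectors.

```python
def jackknife_accuracy(samples, labels, train, predict):
    """Leave-one-out accuracy for any (train, predict) pair of functions."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def train_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, model[label])),
    )


# Hypothetical toy features; real inputs would be composition vectors.
toy_samples = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
toy_labels = ["non", "non", "non", "anti", "anti", "anti"]
loo_accuracy = jackknife_accuracy(toy_samples, toy_labels, train_centroids, predict_centroid)
```

The loop retrains the model once per sample, so the cost grows with the dataset size; the benefit is that the result does not depend on how the data happen to be split into folds.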
Affiliation(s)
- Peng-Mian Feng
- School of Public Health, Hebei United University, Tangshan 063000, China
- Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
|
11
|
Feng PM, Ding H, Chen W, Lin H. Naïve Bayes classifier with feature selection to identify phage virion proteins. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:530696. [PMID: 23762187 PMCID: PMC3671239 DOI: 10.1155/2013/530696] [Citation(s) in RCA: 107] [Impact Index Per Article: 9.7] [Received: 03/10/2013] [Revised: 04/16/2013] [Accepted: 04/28/2013] [Indexed: 12/31/2022]
Abstract
Knowledge about the protein composition of phage virions is a key step toward understanding the functions of phage virion proteins. However, identifying virion proteins experimentally is time consuming and expensive. Thus, it is highly desirable to develop novel computational methods for phage virion protein identification. In this study, a Naïve Bayes-based method was proposed to predict phage virion proteins using amino acid composition and dipeptide composition. In order to remove redundant information, a novel feature selection technique was employed to single out optimized features. In the jackknife test, the proposed method achieved an accuracy of 79.15% for the classification of phage virion and nonvirion proteins, which is superior to that of other state-of-the-art classifiers. These results indicate that the proposed method could be an effective and promising high-throughput method in phage proteomics research.
Affiliation(s)
- Peng-Mian Feng
- School of Public Health, Hebei United University, Tangshan 063000, China
- Hui Ding
- Key Laboratory for Neuroinformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
- Hao Lin
- Key Laboratory for Neuroinformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
12
|
Classification of G proteins and prediction of GPCRs-G proteins coupling specificity using continuous wavelet transform and information theory. Amino Acids 2011; 43:793-804. [PMID: 22086210 DOI: 10.1007/s00726-011-1133-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/24/2011] [Accepted: 10/20/2011] [Indexed: 10/15/2022]
Abstract
The coupling between G protein-coupled receptors (GPCRs) and guanine nucleotide-binding proteins (G proteins) regulates various signal transductions from the extracellular space into the cell. However, the coupling mechanism between GPCRs and G proteins is still unknown, and experimental determination of their coupling specificity and function is both expensive and time consuming. Therefore, it is important to develop a theoretical method to predict the coupling specificity between GPCRs and G proteins, as well as their function, from their primary sequences. In this study, a novel four-layer predictor (GPCRsG_CWTIT) based on support vector machine (SVM), continuous wavelet transform (CWT) and information theory (IT) is developed to classify G proteins and predict the coupling specificity between GPCRs and G proteins. SVM is used to construct the models. CWT and IT are used to characterize the primary structure of the proteins. The performance of GPCRsG_CWTIT is evaluated with cross-validation tests on various working datasets. The overall accuracy for G proteins at the class and family levels is 98.23 and 85.42%, respectively. The accuracy of the coupling specificity prediction varies from 74.60 to 94.30%. These results indicate that the proposed predictor is an effective and feasible tool to predict the coupling specificity between GPCRs and G proteins, as well as their functions, using only the full protein sequence. The establishment of such an accurate prediction method will facilitate drug discovery by improving the ability to identify and predict protein-protein interactions. GPCRsG_CWTIT and the dataset can be obtained freely on request from the authors.
|
13
|
DeMars G, Fanelli F, Puett D. The extreme C-terminal region of Gαs differentially couples to the luteinizing hormone and beta2-adrenergic receptors. Mol Endocrinol 2011; 25:1416-30. [PMID: 21622536 DOI: 10.1210/me.2011-0009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]
Abstract
The mechanisms of G protein coupling to G protein-coupled receptors (GPCR) share general characteristics but may exhibit specific interactions unique for each GPCR/G protein partnership. The extreme C terminus (CT) of G protein α-subunits has been shown to be important for association with GPCR. Hypothesizing that the extreme CT of Gα(s) is an essential component of the molecular landscape of the GPCR, human LH receptor (LHR), and β(2)-adrenergic receptor (β(2)-AR), a model cell system was created for the expression and manipulation of Gα(s) subunits in LHR(+) s49 ck cells that lack endogenous Gα(s). On the basis of studies involving truncations, mutations, and chain extensions of Gα(s), the CT was found to be necessary for LHR and β(2)-AR signaling. Some general similarities were found for the responses of the two receptors, but significant differences were also noted. Computational modeling was performed with a combination of comparative modeling, molecular dynamics simulations, and rigid body docking. The resulting models, focused on the Gα(s) CT, are supported by the experimental observations and are characterized by the interaction of the four extreme CT amino acid residues of Gα(s) with residues in LHR and β(2)-AR helix 3, (including R of the DRY motif), helix 6, and intracellular loop 2. This portion of Gα(s) recognizes the same regions of the two GPCR, although with differences in the details of selected interactions. The predicted longer cytosolic extensions of helices 5 and 6 of β(2)-AR are expected to contribute significantly to differences in Gα(s) recognition by the two receptors.
Affiliation(s)
- Geneva DeMars
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602-7229, USA
|
14
|
ur-Rehman Z, Khan A. G-protein-coupled receptor prediction using pseudo-amino-acid composition and multiscale energy representation of different physiochemical properties. Anal Biochem 2011; 412:173-82. [DOI: 10.1016/j.ab.2011.01.040] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 10/31/2010] [Revised: 01/26/2011] [Accepted: 01/27/2011] [Indexed: 11/28/2022]
|
15
|
Suwa M, Ono Y. Computational overview of GPCR gene universe to support reverse chemical genomics study. Methods Mol Biol 2010; 577:41-54. [PMID: 19718507 DOI: 10.1007/978-1-60761-232-2_4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 02/20/2023]
Abstract
In order to support high-throughput screening for ligands of G-protein coupled receptors (GPCRs) by using bioinformatics technology, we introduce a database (SEVENS) with genome-scale annotation and software (GRIFFIN) that can simulate GPCR function. SEVENS ( http://sevens.cbrc.jp/ ) is an integrated database that includes GPCR genes that are identified with high accuracy (99.4% sensitivity and 96.6% specificity) from various types of genomes, by a pipeline that integrates such software as a gene finder, a sequence alignment tool, a motif and domain assignment tool, and a transmembrane helix (TMH) predictor. SEVENS provides the user a genome-scale overview of the "GPCR universe" with detailed information of chromosomal mapping, phylogenetic tree, protein sequence and structure, and experimental evidence, all of which are accessible via a user-friendly interface. GRIFFIN ( http://griffin.cbrc.jp/ ) can predict GPCR and G-protein coupling selectivity induced by ligand binding with high sensitivity and specificity (more than 87% on average), based on the support vector machine (SVM) and hidden Markov Model (HMM). SEVENS and GRIFFIN are expected to contribute to revealing the function of orphan and unknown GPCRs.
Affiliation(s)
- Makiko Suwa
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
|
16
|
|
17
|
Abstract
Biological and medical data have been growing exponentially over the past several years [1, 2]. In particular, proteomics has seen automation dramatically change the rate at which data are generated [3]. Analysis that systemically incorporates prior information is becoming essential to making inferences about the myriad, complex data [4-6]. A Bayesian approach can help capture such information and incorporate it seamlessly through a rigorous, probabilistic framework. This paper starts with a review of the background mathematics behind the Bayesian methodology: from parameter estimation to Bayesian networks. The article then goes on to discuss how emerging Bayesian approaches have already been successfully applied to research across proteomics, a field for which Bayesian methods are particularly well suited [7-9]. After reviewing the literature on the subject of Bayesian methods in biological contexts, the article discusses some of the recent applications in proteomics and emerging directions in the field.
Affiliation(s)
- Gil Alterovitz
- Division of Health Sciences and Technology, Harvard University and Massachusetts Institute of Technology, Boston, MA, USA.
|
18
|
Jiang Z, Guan C, Zhou Y. Computational prediction of the coupling specificity of g protein-coupled receptors. Appl Biochem Biotechnol 2007; 141:109-18. [PMID: 17625269 DOI: 10.1007/s12010-007-9213-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2006] [Revised: 04/17/2006] [Accepted: 05/16/2006] [Indexed: 10/23/2022]
Abstract
G protein-coupled receptors (GPCRs) represent one of the most important categories of membrane proteins that play important roles in signaling pathways. GPCRs transduce the extracellular stimuli into intracellular second messengers via their coupling to specific class of heterotrimeric GTP-binding proteins (G proteins) and the subsequent regulation of a diverse variety of effectors. Understanding the coupling specificity of GPCRs is critical for further comprehending their function, and is of tremendous clinical significance because GPCRs are the most successful drug targets. This minireview addresses the computational approaches that have been created for the prediction of coupling specificity of GPCRs and highlights the perspective of bioinformatics strategies that may be used to tackle this important task. In addition, some of the important resources of this field are also provided.
Affiliation(s)
- Zhenran Jiang
- Hubei Bioinformatics and Molecular Imaging Key Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
|
19
|
Ono T, Hishigaki H. Prediction of GPCR-G protein coupling specificity using features of sequences and biological functions. GENOMICS PROTEOMICS & BIOINFORMATICS 2007; 4:238-44. [PMID: 17531799 PMCID: PMC5054072 DOI: 10.1016/s1672-0229(07)60004-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/04/2022]
Abstract
Understanding the coupling specificity between G protein-coupled receptors (GPCRs) and specific classes of G proteins is important for further elucidation of receptor functions within a cell. Increasing information on GPCR sequences and the G protein family would facilitate prediction of the coupling properties of GPCRs. In this study, we describe a novel approach for predicting the coupling specificity between GPCRs and G proteins. This method uses not only GPCR sequences but also the functional knowledge generated by natural language processing, and can achieve 92.2% prediction accuracy by using the C4.5 algorithm. Furthermore, rules related to GPCR-G protein coupling are generated. The combination of sequence analysis and text mining improves the prediction accuracy for GPCR-G protein coupling specificity, and also provides clues for understanding GPCR signaling.
Affiliation(s)
- Toshihide Ono
- Laboratory of Bioinformatics, Otsuka Pharmaceutical Co., Ltd., Kawauchi-cho, Tokushima 771-0192, Japan.
|
20
|
Lu F, Li J, Jiang Z. Computational identification and analysis of G protein-coupled receptor targets. Drug Dev Res 2007. [DOI: 10.1002/ddr.20148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
|
21
|
Wang YF, Chen H, Zhou YH. Prediction and classification of human G-protein coupled receptors based on support vector machines. GENOMICS PROTEOMICS & BIOINFORMATICS 2006; 3:242-6. [PMID: 16689693 PMCID: PMC5173243 DOI: 10.1016/s1672-0229(05)03034-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to develop the SVM prediction models consist of statistically significant features selected from single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the length distribution difference between GPCRs and non-GPCRs has also been exploited to improve the prediction performance. The testing results with annotated human protein sequences demonstrate that this system can get good performance for both prediction and classification of human GPCRs.
|
22
|
Guan CP, Jiang ZR, Zhou YH. Predicting the coupling specificity of GPCRs to G-proteins by support vector machines. GENOMICS PROTEOMICS & BIOINFORMATICS 2006; 3:247-51. [PMID: 16689694 PMCID: PMC5173181 DOI: 10.1016/s1672-0229(05)03035-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
G-protein coupled receptors (GPCRs) represent one of the most important classes of drug targets for the pharmaceutical industry and play important roles in cellular signal transduction. Predicting the coupling specificity of GPCRs to G-proteins is vital for further understanding the mechanism of signal transduction and the function of the receptors within a cell, which can provide new clues for pharmaceutical research and development. In this study, the features of amino acid compositions and physicochemical properties of the full-length GPCR sequences have been analyzed and extracted. Based on these features, classifiers have been developed to predict the coupling specificity of GPCRs to G-proteins using support vector machines. The testing results show that this method could obtain better prediction accuracy.
|
23
|
Guo Y, Li M, Lu M, Wen Z, Huang Z. Predicting G-protein coupled receptors-G-protein coupling specificity based on autocross-covariance transform. Proteins 2006; 65:55-60. [PMID: 16865706 DOI: 10.1002/prot.21097] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/08/2022]
Abstract
Determining G-protein coupled receptors (GPCRs) coupling specificity is very important for further understanding the functions of receptors. A successful method in this area will benefit both basic research and drug discovery practice. Previously published methods rely on the transmembrane topology prediction at training step, even at prediction step. However, the transmembrane topology predicted by even the best algorithm is not of high accuracy. In this study, we developed a new method, autocross-covariance (ACC) transform based support vector machine (SVM), to predict coupling specificity between GPCRs and G-proteins. The primary amino acid sequences are translated into vectors based on the principal physicochemical properties of the amino acids and the data are transformed into a uniform matrix by applying ACC transform. SVMs for nonpromiscuous coupled GPCRs and promiscuous coupled GPCRs were trained and validated by jackknife test and the results thus obtained are very promising. All classifiers were also evaluated by the test datasets with good performance. Besides the high prediction accuracy, the most important feature of this method is that it does not require any transmembrane topology prediction at either training or prediction step but only the primary sequences of proteins. The results indicate that this relatively simple method is applicable. Academic users can freely download the prediction program at http://www.scucic.net/group/database/Service.asp.
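The abstract above does not state the transform explicitly. The sketch below uses a commonly cited formulation of auto- and cross-covariance, which is an assumption here: for properties j1 and j2 and lag l, the centered property values at positions i and i + l are multiplied and averaged over the n - l available positions. The function name `acc_transform` and the input layout (one row per residue, one column per physicochemical property) are hypothetical, and no particular property scale is implied.

```python
def acc_transform(matrix, max_lag):
    """Map an n x p per-residue property matrix to a fixed-length vector.

    For each lag in 1..max_lag and each ordered pair of properties
    (j1, j2), compute the mean product of centered values at positions
    i and i + lag. Pairs with j1 == j2 are auto-covariance terms; the
    others are cross-covariance terms. Output length: max_lag * p * p.
    """
    n = len(matrix)
    p = len(matrix[0])
    if not 0 < max_lag < n:
        raise ValueError("max_lag must satisfy 0 < max_lag < sequence length")

    # Center each property column on its mean over the sequence.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]

    features = []
    for lag in range(1, max_lag + 1):
        for j1 in range(p):
            for j2 in range(p):
                total = sum(
                    centered[i][j1] * centered[i + lag][j2]
                    for i in range(n - lag)
                )
                features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    # Two hypothetical sequences of different length, two properties each.
    short = [[0.1, 1.2], [0.4, 0.9], [0.3, 1.1], [0.8, 0.2]]
    longer = short + [[0.5, 0.7], [0.2, 1.0]]
    print(len(acc_transform(short, 2)), len(acc_transform(longer, 2)))
```

Because the output length depends only on the number of properties and the maximum lag, sequences of different lengths map to vectors of equal size, which is what makes the representation usable as SVM input without topology prediction.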
Affiliation(s)
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, People's Republic of China
|
24
|
Bates B, Zhang L, Nawoschik S, Kodangattil S, Tseng E, Kopsco D, Kramer A, Shan Q, Taylor N, Johnson J, Sun Y, Chen HM, Blatcher M, Paulsen JE, Pausch MH. Characterization of Gpr101 expression and G-protein coupling selectivity. Brain Res 2006; 1087:1-14. [PMID: 16647048 DOI: 10.1016/j.brainres.2006.02.123] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 09/09/2005] [Revised: 02/15/2006] [Accepted: 02/26/2006] [Indexed: 11/21/2022]
Abstract
This report describes the identification and characterization of the murine orphan GPCR, Gpr101. Both human and murine genes were localized to chromosome X. Similar to its human ortholog, murine Gpr101 mRNA was detected predominantly in the brain within discrete nuclei. A knowledge-restricted hidden Markov model-based algorithm, capable of accurately predicting G-protein coupling selectivity, indicated that both human and murine GPR101 were likely coupled to Gs. This prediction was supported by the elevation of cyclic AMP levels and the activation of a cyclic AMP response element-luciferase reporter gene in HEK293 cells over-expressing human GPR101. Consistent with this, over-expression of human GPR101 in a yeast-based system yielded an elevated, agonist-independent reporter gene response in the presence of a yeast chimeric Galphas protein. These results indicate that GPR101 participates in a potentially wide range of activities in the CNS via modulation of cAMP levels.
Affiliation(s)
- Brian Bates
- Wyeth Research, Biological Technologies, 87 Cambridge Park Drive, Cambridge, MA 02140, USA.
|
25
|
Duprat E, Lefranc MP, Gascuel O. A simple method to predict protein-binding from aligned sequences--application to MHC superfamily and beta2-microglobulin. Bioinformatics 2005; 22:453-9. [PMID: 16352655 DOI: 10.1093/bioinformatics/bti826] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The MHC superfamily (MhcSF) consists of immune system MHC class I (MHC-I) proteins, along with proteins with an MHC-I-like structure that are involved in a large variety of biological processes. Non-covalent binding of beta2-microglobulin (B2M) to MHC-I proteins is required for their surface expression and function, whereas MHC-I-like proteins may or may not interact with B2M. This study was designed to predict B2M binding (or non-binding) of newly identified MhcSF proteins, in order to decipher their function, understand the molecular recognition mechanisms and identify deleterious mutations. IMGT standardization of MhcSF protein domains provides a unique numbering of the multiple alignment positions, and the conditions needed to develop such a predictive tool. METHOD We combine a simple-Bayes classifier with IMGT unique numbering. Our method involves two steps: (1) selection of discriminant binary features, which associate an alignment position with an amino acid group; and (2) learning of the classifier by estimating the frequencies of selected features, conditional on the B2M binding property. RESULTS Our dataset contains aligned sequences of 806 allelic forms of 47 MhcSF proteins, corresponding to 9 receptor types and 4 mammalian species. Eighteen discriminant features are selected, belonging to B2M contact sites, or stabilizing the molecular structure required for this contact. Three leave-one-out procedures are used to assess classifier performance, which corresponds to B2M binding prediction for: (1) new proteins, (2) species not represented in the dataset and (3) new receptor types. The prediction accuracy is high, i.e. 98, 94 and 70%, respectively. Application of our classifier to lower vertebrate MHC-I proteins indicates that these proteins bind to B2M and should then be expressed on the cellular surface by a process similar to that of mammalian MHC-I proteins.
These results demonstrate the usefulness and accuracy of our (simple) approach, which should apply to other function or interaction prediction problems.
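The second step described in the abstract above (estimating feature frequencies conditional on the binding class) can be sketched as a simple Bayes classifier over binary features. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the Laplace smoothing parameter `alpha`, and the toy data are all hypothetical, and the feature selection step is not shown.

```python
import math


def train_simple_bayes(samples, labels, alpha=1.0):
    """Estimate class priors and P(feature = 1 | class) from binary vectors.

    alpha is a Laplace smoothing constant (an assumption of this sketch)
    that keeps every estimated probability strictly between 0 and 1.
    """
    n_features = len(samples[0])
    model = {}
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        prior = len(rows) / len(samples)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (prior, probs)
    return model


def predict_simple_bayes(model, sample):
    """Return the class with the highest log posterior for one sample."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(sample, probs):
            score += math.log(p if x else 1.0 - p)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


if __name__ == "__main__":
    # Hypothetical data: each column flags whether a given alignment
    # position carries a residue from a given amino acid group.
    X = [[1, 0], [1, 1], [0, 0], [0, 1]]
    y = ["binds", "binds", "no_binding", "no_binding"]
    model = train_simple_bayes(X, y)
    print(predict_simple_bayes(model, [1, 0]))
```

Working in log space avoids numerical underflow when many features are multiplied together.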
Affiliation(s)
- Elodie Duprat
- Laboratoire d'ImmunoGénétique Moléculaire IGH (UPR CNRS 1142), 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France
|
26
|
Sgourakis NG, Bagos PG, Hamodrakas SJ. Prediction of the coupling specificity of GPCRs to four families of G-proteins using hidden Markov models and artificial neural networks. Bioinformatics 2005; 21:4101-6. [PMID: 16174684 DOI: 10.1093/bioinformatics/bti679] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION G-protein coupled receptors are a major class of eukaryotic cell-surface receptors. A very important aspect of their function is the specific interaction (coupling) with members of four G-protein families. A single GPCR may interact with members of more than one G-protein family (promiscuous coupling). To date, all published methods that predict the coupling specificity of GPCRs are restricted to three main coupling groups, G(i/o), G(q/11) and G(s), and do not include G(12/13)-coupled or other promiscuous receptors. RESULTS We present a method that combines hidden Markov models and a feed-forward artificial neural network to overcome these limitations, while producing the most accurate predictions currently available. Using an up-to-date curated dataset, our method yields a 94% correct classification rate in a 5-fold cross-validation test. The method also predicts promiscuous coupling preferences, including coupling to G(12/13), and, unlike other methods, avoids overpredictions (false positives) when non-GPCR sequences are encountered. AVAILABILITY A webserver for academic users is available at http://bioinformatics.biol.uoa.gr/PRED-COUPLE2
Affiliation(s)
- Nikolaos G Sgourakis
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Greece
|
27
|
Yabuki Y, Muramatsu T, Hirokawa T, Mukai H, Suwa M. GRIFFIN: a system for predicting GPCR-G-protein coupling selectivity using a support vector machine and a hidden Markov model. Nucleic Acids Res 2005; 33:W148-53. [PMID: 15980445 PMCID: PMC1160255 DOI: 10.1093/nar/gki495] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
We describe a novel system, GRIFFIN (G-protein and Receptor Interaction Feature Finding INstrument), that predicts G-protein coupled receptor (GPCR) and G-protein coupling selectivity based on a support vector machine (SVM) and a hidden Markov model (HMM) with high sensitivity and specificity. Based on our assumption that whole structural segments of ligands, GPCRs and G-proteins are essential to determine GPCR and G-protein coupling, various quantitative features were selected for ligands, GPCRs and G-protein complex structures, and those parameters that are the most effective in selecting G-protein type were used as feature vectors in the SVM. The main part of GRIFFIN includes a hierarchical SVM classifier using the feature vectors, which is useful for Class A GPCRs, the major family. For the opsin and olfactory subfamilies of Class A and other minor families (Classes B, C, frizzled and smoothened), the binding G-protein is predicted with high accuracy using the HMM. Applying this system to known GPCR sequences, each binding G-protein is predicted with high sensitivity and specificity (>85% on average). GRIFFIN is freely available and allows users to easily execute this reliable prediction of G-proteins.
Affiliation(s)
- Yukimitsu Yabuki
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Information and Mathematical Science Laboratory (IMS) Inc., Meikei Building, 1-5-21 Otsuka, Bunkyo-ku, Tokyo 112-0012, Japan
- Takahiko Muramatsu
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Nara Institute of Science and Technology, Graduate School of Information Science, 8916-5 Takayama-cho, Ikoma-shi, Nara 630-0192, Japan
- Takatsugu Hirokawa
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Hidehito Mukai
- Mitsubishi Kagaku Institute of Life Sciences, 11 Minamiooya, Machida, Tokyo 194-8511, Japan
- Makiko Suwa
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Nara Institute of Science and Technology, Graduate School of Information Science, 8916-5 Takayama-cho, Ikoma-shi, Nara 630-0192, Japan
- To whom correspondence should be addressed. Tel: +81 3 3599 8051; Fax: +81 3 3599 8081
|
28
|
Sgourakis NG, Bagos PG, Papasaikas PK, Hamodrakas SJ. A method for the prediction of GPCRs coupling specificity to G-proteins using refined profile Hidden Markov Models. BMC Bioinformatics 2005; 6:104. [PMID: 15847681 PMCID: PMC1087828 DOI: 10.1186/1471-2105-6-104] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Received: 11/01/2004] [Accepted: 04/22/2005] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND G-protein coupled receptors (GPCRs) comprise the largest group of eukaryotic cell surface receptors and are of great pharmacological interest. A broad range of native ligands interact with and activate GPCRs, leading to signal transduction within cells. Most of these responses are mediated through the interaction of GPCRs with heterotrimeric GTP-binding proteins (G-proteins). Due to the information explosion in biological sequence databases, the development of software algorithms that could predict properties of GPCRs is important. Experimental data reported in the literature suggest that heterotrimeric G-proteins interact with parts of the activated receptor at the transmembrane helix-intracellular loop interface. Utilizing this information and membrane topology information, we have developed an intensive exploratory approach to generate a refined library of statistical models (Hidden Markov Models) that predict the coupling preference of GPCRs to heterotrimeric G-proteins. The method predicts the coupling preferences of GPCRs to Gs, Gi/o and Gq/11, but not G12/13 subfamilies. RESULTS Using a dataset of 282 GPCR sequences of known coupling preference to G-proteins and adopting a five-fold cross-validation procedure, the method yielded an 89.7% correct classification rate. In a validation set comprised of all receptor sequences that are species homologues to GPCRs with known coupling preferences, excluding the sequences used to train the models, our method yields a correct classification rate of 91.0%. Furthermore, promiscuous coupling properties were correctly predicted for 6 of the 24 GPCRs that are known to interact with more than one subfamily of G-proteins. CONCLUSION Our method demonstrates a high correct classification rate. Unlike previously published methods performing the same task, it does not require any transmembrane topology prediction in a preceding step.
A web-server for the prediction of GPCRs coupling specificity to G-proteins available for non-commercial users is located at http://bioinformatics.biol.uoa.gr/PRED-COUPLE.
MESH Headings
- Algorithms
- Amino Acid Sequence
- Animals
- Binding Sites
- Computational Biology/methods
- Databases, Protein
- Humans
- Ligands
- Markov Chains
- Models, Biological
- Models, Chemical
- Models, Statistical
- Molecular Sequence Data
- Pattern Recognition, Automated
- Protein Interaction Mapping
- Receptors, Cell Surface
- Receptors, G-Protein-Coupled/chemistry
- Receptors, G-Protein-Coupled/genetics
- Sensitivity and Specificity
- Sequence Alignment
- Sequence Analysis, Protein
- Sequence Homology, Amino Acid
- Software
Affiliation(s)
- Nikolaos G Sgourakis
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece
- Pantelis G Bagos
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece
- Panagiotis K Papasaikas
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece
- Stavros J Hamodrakas
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece
|
29
|
Elefsinioti AL, Bagos PG, Spyropoulos IC, Hamodrakas SJ. A database for G proteins and their interaction with GPCRs. BMC Bioinformatics 2004; 5:208. [PMID: 15619328 PMCID: PMC544346 DOI: 10.1186/1471-2105-5-208] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 07/30/2004] [Accepted: 12/24/2004] [Indexed: 11/10/2022] Open
Abstract
Background G protein-coupled receptors (GPCRs) transduce signals from extracellular space into the cell, through their interaction with G proteins, which act as switches forming hetero-trimers composed of different subunits (α,β,γ). The α subunit of the G protein is responsible for the recognition of a given GPCR. Whereas specialised resources for GPCRs, and other groups of receptors, are already available, there is currently no publicly available database focusing on G proteins and containing information about their coupling specificity with their respective receptors. Description gpDB is a publicly accessible G proteins/GPCRs relational database. Including species homologs, the database contains detailed information for 418 G protein monomers (272 Gα, 87 Gβ and 59 Gγ) and 2782 GPCR sequences belonging to families with known coupling to G proteins. The GPCRs and the G proteins are classified according to a hierarchy of different classes, families and sub-families, based on extensive literature searches. The main innovation besides the classification of both G proteins and GPCRs is the relational model of the database, describing the known coupling specificity of the GPCRs to their respective α subunit of G proteins, a unique feature not available in any other database. There is full sequence information with cross-references to publicly available databases and references to the literature concerning the coupling specificity and the dimerization of GPCRs, and the user may submit advanced queries for text search. Furthermore, we provide a pattern search tool, an interface for running BLAST against the database and interconnectivity with PRED-TMR, PRED-GPCR and TMRPres2D. Conclusions The database will be very useful, for both experimentalists and bioinformaticians, for the study of G protein/GPCR interactions and for future development of predictive algorithms. It is available for academics, via a web browser at the URL:
Affiliation(s)
- Antigoni L Elefsinioti
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece
- Pantelis G Bagos
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece
- Ioannis C Spyropoulos
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece
- Stavros J Hamodrakas
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece
|
30
|
Mertens I, Vandingenen A, Meeusen T, De Loof A, Schoofs L. Postgenomic characterization of G-protein-coupled receptors. Pharmacogenomics 2004; 5:657-72. [PMID: 15335287 DOI: 10.1517/14622416.5.6.657] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Indexed: 11/05/2022] Open
Abstract
G-protein-coupled receptors (GPCRs) constitute one of the largest families of membrane-spanning proteins. Their importance in drug development has been proven over and over again. Therefore, they remain one of the most significant groups of molecules to be characterized. In the postgenomic era, the methods used for the characterization of GPCRs have dramatically changed: the predicted orphan receptors are now often used to ascertain the ligands (reverse pharmacology), whereas, in the past, the bioactive ligand was used to identify the receptor (classic approach). In this review, we will give an overview of the recent postgenomic functional assays that are frequently used to link the orphan GPCR of both vertebrate and invertebrate organisms with their ligands.
Affiliation(s)
- Inge Mertens
- Laboratory of Developmental Physiology, Genomics and Proteomics, Katholieke Universiteit Leuven, Naamsestraat 59, 3000 Leuven, Belgium.
|
31
|
Qian B, Soyer OS, Neubig RR, Goldstein RA. Depicting a protein's two faces: GPCR classification by phylogenetic tree-based HMMs. FEBS Lett 2003; 554:95-9. [PMID: 14596921 DOI: 10.1016/s0014-5793(03)01112-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Related proteins with similar biological functions generally share common features, allowing us to extract the common sequence features. These common features enable us to build statistical models that can be used to classify proteins, to predict new members, and to study the sequence-function relationship of this protein function group. Although evolution underlies the basis of multiple sequence analysis methods, most methods ignore phylogenetic relationships and the evolutionary process in building these statistical models. Previously we have shown that a phylogenetic tree-based profile hidden Markov model (T-HMM) is superior in generating a profile for a group of similar proteins. In this study we used the method to generate common features of G protein-coupled receptors (GPCRs). The profile generated by T-HMM gives high accuracy in GPCR function classification, both by ligand and by coupled G protein.
Affiliation(s)
- Bin Qian
- Biophysics Research Division, University of Michigan, Ann Arbor, MI 48105, USA
|