1
Rios S, Fernandez MF, Caltabiano G, Campillo M, Pardo L, Gonzalez A. GPCRtm: An amino acid substitution matrix for the transmembrane region of class A G Protein-Coupled Receptors. BMC Bioinformatics 2015; 16:206. [PMID: 26134144 PMCID: PMC4489126 DOI: 10.1186/s12859-015-0639-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/20/2015] [Accepted: 06/06/2015] [Indexed: 01/08/2023]
Abstract
Background: Protein sequence alignment and database search methods use standard scoring matrices calculated from amino acid substitution frequencies in general sets of proteins. These general-purpose matrices are not optimal for accurately aligning sequences with marked compositional biases, such as the hydrophobic transmembrane regions found in membrane proteins. In this work, an amino acid substitution matrix (GPCRtm) is calculated for the membrane-spanning segments of the G protein-coupled receptor (GPCR) rhodopsin family, one of the largest transmembrane protein families in humans and of great importance in health and disease.
Results: The GPCRtm matrix reveals the amino acid compositional bias distinctive of the GPCR rhodopsin family and differs from other standard substitution matrices. As expected, these membrane receptors have a higher content of hydrophobic residues than globular proteins. On the other hand, polar and charged residues are more abundant than in average membrane proteins and are frequently replaced by one another.
Conclusions: Analysis of amino acid frequencies and values obtained from the GPCRtm matrix reveals patterns of residue replacement different from those of other standard substitution matrices. GPCRs prioritize the reactivity properties of the amino acids over their bulkiness in the transmembrane regions. A distinctive feature is that charged and polar residues seem to evolve at different rates than other amino acids. This observation is related to the role of the transmembrane bundle in ligand binding, which in many cases involves electrostatic and hydrogen-bond interactions. This new matrix can be useful in database searches and for constructing more accurate sequence alignments of GPCRs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0639-4) contains supplementary material, which is available to authorized users.
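As a rough illustration of how a family-specific substitution matrix can be derived from substitution frequencies, the sketch below computes BLOSUM-style log-odds scores from a toy alignment. The abstract does not describe the exact GPCRtm procedure, so the function name `log_odds_matrix`, the half-bit scaling, and the absence of sequence clustering or weighting are all assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(alignment, scale=2.0):
    """Toy BLOSUM-style log-odds scores from equal-length aligned sequences.

    Hypothetical helper for illustration; not the published GPCRtm method.
    """
    # Count every unordered residue pair observed within each alignment column.
    pair_counts = Counter()
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gaps
        for a, b in combinations(residues, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    # Observed (target) frequency of each unordered pair.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue implied by the observed pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    # Score = scaled log2 ratio of observed to expected pair frequency.
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(freq / expected))
    return scores


if __name__ == "__main__":
    toy_alignment = ["LLK", "LIK", "LLR"]
    for pair, score in sorted(log_odds_matrix(toy_alignment).items()):
        print(pair, score)
```

Because both the observed and the background frequencies come from the same set of sequences, a matrix built this way reflects the compositional bias of that set, which is the property the abstract attributes to GPCRtm.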
Affiliation(s)
- Santiago Rios
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Marta F Fernandez
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Gianluigi Caltabiano
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Mercedes Campillo
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Leonardo Pardo
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
- Angel Gonzalez
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
2
Esque J, Urbain A, Etchebest C, de Brevern AG. Sequence-structure relationship study in all-α transmembrane proteins using an unsupervised learning approach. Amino Acids 2015; 47:2303-22. [PMID: 26043903 DOI: 10.1007/s00726-015-2010-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/31/2014] [Accepted: 05/15/2015] [Indexed: 01/28/2023]
Abstract
Transmembrane proteins (TMPs) are major drug targets, but knowledge of their precise topology remains highly limited compared with globular proteins. In spite of the difficulties in obtaining their structures, considerable effort has been made in recent years to increase the number of available structures, both experimentally and computationally. In view of this emerging challenge, the development of computational methods to extract knowledge from these data is crucial for better understanding their functions and for improving the quality of structural models. Here, we revisit an efficient unsupervised learning procedure, called the Hybrid Protein Model (HPM), which is applied to the analysis of transmembrane proteins belonging to the all-α structural class. The HPM method is an original classification procedure that efficiently combines sequence and structure learning. The procedure was initially applied to the analysis of globular proteins. In the present case, HPM classifies a set of overlapping protein fragments extracted from a non-redundant databank of TMP 3D structures. After fine-tuning of the learning parameters, the optimal classification results in 65 clusters, which best represent similar relationships between sequence and local structure properties of TMPs. Interestingly, HPM distinguishes among the resulting clusters two helical regions with distinct hydrophobic patterns. This underlines the complexity of the topology of these proteins. The HPM classification highlights unusual relationships between amino acids in TMP fragments, which can be useful for elaborating new amino acid substitution matrices. Finally, two challenging applications are described: the first aims at annotating protein functions (channel or not), and the second intends to assess the quality of structures (X-ray or models) via a new scoring function deduced from the HPM classification.
Affiliation(s)
- Jérémy Esque
- INSERM, U 1134, DSIMB, 75739, Paris, France; Univ. Paris Diderot, Sorbonne Paris Cité UMR-S 1134, 75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), 75739, Paris, France; Laboratoire d'Excellence GR-Ex, 75739, Paris, France; Laboratoire d'Ingénierie des Fonctions Moléculaire (IFM), ISIS, UMR 7006, 67000, Strasbourg, France; Department of Integrative Structural Biology, INSERM U964, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), 67404, Illkirch, France; UMR7104, Centre National de la Recherche Scientifique (CNRS), 67404, Illkirch, France; Université de Strasbourg, 67404, Illkirch, France
- Aurélie Urbain
- Institut Jean-Pierre Bourgin, INRA, UMR 1318, 78026, Versailles, France
- Catherine Etchebest
- INSERM, U 1134, DSIMB, 75739, Paris, France; Univ. Paris Diderot, Sorbonne Paris Cité UMR-S 1134, 75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), 75739, Paris, France; Laboratoire d'Excellence GR-Ex, 75739, Paris, France
- Alexandre G de Brevern
- INSERM, U 1134, DSIMB, 75739, Paris, France; Univ. Paris Diderot, Sorbonne Paris Cité UMR-S 1134, 75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), 75739, Paris, France; Laboratoire d'Excellence GR-Ex, 75739, Paris, France
3
Waldispühl J, O'Donnell CW, Will S, Devadas S, Backofen R, Berger B. Simultaneous alignment and folding of protein sequences. J Comput Biol 2014; 21:477-91. [PMID: 24766258 DOI: 10.1089/cmb.2013.0163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Developing accurate comparative analysis tools for low-homology proteins remains a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise and multiple sequence alignment tools in the most difficult low-sequence-homology case. It also improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families (partiFold-Align is available at http://partifold.csail.mit.edu/).
4
Sadovskaya NS, Sutormin RA, Gelfand MS. Recognition of transmembrane segments in proteins: review and consistency-based benchmarking of internet servers. J Bioinform Comput Biol 2011; 4:1033-56. [PMID: 17099940 DOI: 10.1142/s0219720006002326] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/28/2006] [Revised: 06/21/2006] [Accepted: 06/22/2006] [Indexed: 11/18/2022]
Abstract
Membrane proteins perform a number of crucial functions as transporters, receptors, and components of enzyme complexes. Identification of membrane proteins and prediction of their topology is thus an important part of genome annotation. We present here an overview of transmembrane segments in protein sequences, summarize data from large-scale genome studies, and report results of benchmarking of several popular internet servers.
Affiliation(s)
- Nataliya S Sadovskaya
- Institute for Information Transmission Problems, Russian Academy of Science, Bolshoi Karetny per. 19, Moscow 127994, Russia.
5
Paila U, Kondam R, Ranjan A. Genome bias influences amino acid choices: analysis of amino acid substitution and re-compilation of substitution matrices exclusive to an AT-biased genome. Nucleic Acids Res 2008; 36:6664-75. [PMID: 18948281 PMCID: PMC2588515 DOI: 10.1093/nar/gkn635] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
The genomic era has seen a remarkable increase in the number of genomes being sequenced and annotated. Nonetheless, annotation remains a serious challenge for compositionally biased genomes. For preliminary annotation, popular nucleotide and protein comparison methods such as BLAST are widely employed. These methods use matrices, such as amino acid substitution matrices, to score alignments. Since a nucleotide bias leads to an overall bias in the amino acid composition of proteins, a genome with nucleotide bias may have introduced atypical amino acid substitutions in its proteome. Consequently, standard matrices fail to perform well in sequence analysis of these genomes. To address this issue, we examined amino acid substitutions in the AT-rich genome of Plasmodium falciparum, chosen as a reference, and reconstituted a substitution matrix in the genome's context. The matrix was used to generate protein sequence alignments for the parasite proteins, which improved across the functional regions. We attribute this to the consistency that may have been achieved between the target and background frequencies, both calculated exclusively in our study. This study has important implications for the annotation of proteins that are of experimental interest but give poor sequence alignments with standard matrices.
Affiliation(s)
- Akash Ranjan
- *To whom correspondence should be addressed. Tel: +91 40 27171503; Fax: +91 40 27155610;
6
Bulka B, desJardins M, Freeland SJ. An interactive visualization tool to explore the biophysical properties of amino acids and their contribution to substitution matrices. BMC Bioinformatics 2006; 7:329. [PMID: 16817972 PMCID: PMC1524819 DOI: 10.1186/1471-2105-7-329] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 12/12/2005] [Accepted: 07/03/2006] [Indexed: 11/26/2022]
Abstract
Background: Quantitative descriptions of amino acid similarity, expressed as probabilistic models of evolutionary interchangeability, are central to many mainstream bioinformatic procedures such as sequence alignment, homology searching, and protein structural prediction. Here we present a web-based, user-friendly analysis tool that allows any researcher to quickly and easily visualize relationships between these bioinformatic metrics and to explore their relationships to underlying indices of amino acid molecular descriptors.
Results: We demonstrate the three fundamental types of question that our software can address by taking as a specific example the connections between 49 measures of amino acid biophysical properties (e.g., size, charge and hydrophobicity), a generalized model of amino acid substitution (as represented by the PAM74-100 matrix), and the mutational distance that separates amino acids within the standard genetic code (i.e., the number of point mutations required for interconversion during protein evolution). We show that our software allows a user to recapture the insights from several key publications on these topics in just a few minutes.
Conclusion: Our software facilitates rapid, interactive exploration of three interconnected topics: (i) the multidimensional molecular descriptors of the twenty proteinaceous amino acids, (ii) the correlation of these biophysical measurements with observed patterns of amino acid substitution, and (iii) the causal basis for differences between any two observed patterns of amino acid substitution. This software acts as an intuitive bioinformatic exploration tool that can guide more comprehensive statistical analyses relating to a diverse array of specific research questions.
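The mutational distance mentioned in this abstract can be made concrete with a small sketch. The code below builds the standard genetic code and returns the minimum number of point mutations separating any codon of one amino acid from any codon of another. Taking the minimum over codon pairs is an assumption about how the distance is defined (the abstract does not say), and the names `CODON_TABLE`, `codons_for`, and `mutational_distance` are illustrative.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(aa):
    """Return all codons that encode the given one-letter amino acid."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def mutational_distance(aa1, aa2):
    """Minimum number of single-nucleotide changes between codons of aa1 and aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


if __name__ == "__main__":
    for pair in [("F", "L"), ("D", "E"), ("M", "W"), ("M", "Y")]:
        print(pair, mutational_distance(*pair))
```

A distance of 1 means the two amino acids can interconvert through a single point mutation, which is one reason such pairs tend to be exchanged more readily than pairs at distance 2 or 3.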
Affiliation(s)
- Blazej Bulka
- Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Marie desJardins
- Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Stephen J Freeland
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
7
Sutormin RA, Mironov AA. Membrane profile-based probabilistic method for predicting transmembrane segments via multiple protein sequence alignment. Mol Biol 2006. [DOI: 10.1134/s0026893306030150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
8
Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 2004; 13:443-56. [PMID: 14739328 PMCID: PMC2286703 DOI: 10.1110/ps.03191704] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.6] [Indexed: 12/15/2022]
Abstract
The increasing volume of genomic data opens new possibilities for analysis of protein function. We introduce a method for automated selection of residues that determine the functional specificity of proteins with a common general function (the specificity-determining positions [SDP] prediction method). Such residues are assumed to be conserved within groups of orthologs (that may be assumed to have the same specificity) and to vary between paralogs. Thus, considering a multiple sequence alignment of a protein family divided into orthologous groups, one can select positions where the distribution of amino acids correlates with this division. Unlike previously published techniques, the introduced method directly takes into account nonuniformity of amino acid substitution frequencies. In addition, it does not require setting arbitrary thresholds. Instead, a formal procedure for threshold selection using the Bernoulli estimator is implemented. We tested the SDP prediction method on the LacI family of bacterial transcription factors and a sample of bacterial water and glycerol transporters belonging to the major intrinsic protein (MIP) family. In both cases, the comparison with available experimental and structural data strongly supported our predictions.
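The core idea in this abstract, selecting alignment positions where the amino acid distribution correlates with the division into orthologous groups, can be sketched with plain mutual information per column. This is only an assumed simplification: it ignores the correction for nonuniform substitution frequencies and the Bernoulli-estimator threshold selection that the abstract describes, and the function name `column_mutual_information` is hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(column, groups):
    """Mutual information (in bits) between residues in one alignment column
    and the group label of each sequence.

    Simplified illustration; not the published SDP scoring procedure.
    """
    n = len(column)
    joint = Counter(zip(column, groups))  # counts of (residue, group) pairs
    residue_counts = Counter(column)
    group_counts = Counter(groups)
    return sum(
        (count / n)
        * math.log2(count * n / (residue_counts[res] * group_counts[grp]))
        for (res, grp), count in joint.items()
    )


if __name__ == "__main__":
    groups = [1, 1, 2, 2]
    for column in ["AAGG", "AAAA", "AGAG"]:
        print(column, column_mutual_information(column, groups))
```

A column that is conserved within each group but differs between groups scores highest, while a fully conserved column, or one that varies independently of the grouping, scores zero.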
Affiliation(s)
- Olga V Kalinina
- State Scientific Center GosNIIGenetika, 1st Dorozhny pr., 1, Moscow 113545, Russia
9
Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucleic Acids Res 2004; 32:W424-8. [PMID: 15215423 PMCID: PMC441529 DOI: 10.1093/nar/gkh391] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.0] [Indexed: 01/07/2023]
Abstract
SDPpred (Specificity Determining Position prediction) is a tool for prediction of residues in protein sequences that determine the proteins' functional specificity. It is designed for analysis of protein families whose members have biochemically similar but not identical interaction partners (e.g. different substrates for a family of transporters). SDPpred predicts residues that could be responsible for the proteins' choice of their correct interaction partners. The input of SDPpred is a multiple alignment of a protein family divided into a number of specificity groups, within which the interaction partner is believed to be the same. SDPpred does not require information about the secondary or three-dimensional structure of proteins. It produces a set of the alignment positions (specificity determining positions) that determine differences in functional specificity. SDPpred is available at http://math.genebee.msu.ru/~psn/.
Affiliation(s)
- Olga V Kalinina
- Department of Bioengineering and Bioinformatics, Moscow State University, Vorob'evy gory, 1-73, Moscow, 119992, Russia