1. Postovskaya A, Vercauteren K, Meysman P, Laukens K. tcrBLOSUM: an amino acid substitution matrix for sensitive alignment of distant epitope-specific TCRs. Brief Bioinform 2024; 26:bbae602. [PMID: 39576224 PMCID: PMC11583439 DOI: 10.1093/bib/bbae602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 10/07/2024] [Accepted: 11/05/2024] [Indexed: 11/24/2024] Open
Abstract
Deciphering the specificity of T-cell receptor (TCR) repertoires is crucial for monitoring adaptive immune responses and developing targeted immunotherapies and vaccines. To elucidate the specificity of previously unseen TCRs, many methods employ the BLOSUM62 matrix to find TCRs with similar amino acid (AA) sequences. However, while BLOSUM62 reflects the AA substitutions within conserved regions of proteins with similar functions, the remarkable diversity of TCRs means that both TCRs with similar and dissimilar sequences can bind the same epitope. Therefore, reliance on BLOSUM62 may bias detection towards epitope-specific TCRs with similar biochemical properties, overlooking those with more diverse AA compositions. In this study, we introduce tcrBLOSUMa and tcrBLOSUMb, specialized AA substitution matrices for CDR3 alpha and CDR3 beta TCR chains, respectively. The matrices reflect AA frequencies and variations occurring within TCRs that bind the same epitope, revealing that both CDR3 alpha and CDR3 beta display tolerance to a wide range of AA substitutions and differ noticeably from the standard BLOSUM62. By accurately aligning distant TCRs employing tcrBLOSUMb, we were able to improve clustering performance and capture a large number of epitope-specific TCRs with diverse AA compositions and physicochemical profiles overlooked by BLOSUM62. Utilizing both the general BLOSUM62 and specialized tcrBLOSUM matrices in existing computational tools will broaden the range of TCRs that can be associated with their cognate epitopes, thereby enhancing TCR repertoire analysis.
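As a rough illustration of how a substitution matrix can be estimated from sequences grouped by the epitope they bind, the sketch below computes BLOSUM-style log-odds scores from pair counts. It is not the authors' pipeline: the function name `log_odds_matrix`, the toy input, and the simplifications (equal-length, pre-aligned sequences; no sequence clustering or weighting; no pseudocounts; scores left in unrounded bits) are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def log_odds_matrix(groups):
    """Estimate BLOSUM-style log-odds scores (hypothetical helper).

    groups maps a label (here, an epitope) to a list of sequences that are
    assumed to be pre-aligned and of equal length within each group.
    Returns a dict keyed by alphabetically sorted residue pairs.
    """
    # Count unordered residue pairs column by column within each group.
    pair_counts = Counter()
    for seqs in groups.values():
        for s, t in combinations(seqs, 2):
            for a, b in zip(s, t):
                pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    # Observed frequency q_ab of each unordered pair.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Marginal residue frequency p_a implied by the pair frequencies.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    # Log-odds: observed pair frequency over the frequency expected by chance.
    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = log2(f / expected)
    return scores


# Hypothetical toy input: made-up 3-residue strings, not real CDR3 sequences.
toy_groups = {
    "epitope_A": ["CAS", "CAS", "CAT"],
    "epitope_B": ["CSS", "CST"],
}
toy_scores = log_odds_matrix(toy_groups)
```

Pairs that never co-occur in the input are simply absent from the returned dictionary; a practical implementation would need pseudocounts or a floor score for them.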
MESH Headings
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/chemistry
- Amino Acid Substitution
- Humans
- Amino Acid Sequence
- Epitopes, T-Lymphocyte/immunology
- Epitopes, T-Lymphocyte/chemistry
- Sequence Alignment
- Complementarity Determining Regions/genetics
- Complementarity Determining Regions/immunology
- Complementarity Determining Regions/chemistry
- Computational Biology/methods
- Epitopes/immunology
- Epitopes/chemistry
- Algorithms
- Receptors, Antigen, T-Cell, alpha-beta/genetics
- Receptors, Antigen, T-Cell, alpha-beta/immunology
- Receptors, Antigen, T-Cell, alpha-beta/chemistry
Affiliation(s)
- Anna Postovskaya
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium
  - Clinical Virology Unit, Department of Clinical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
- Koen Vercauteren
  - Clinical Virology Unit, Department of Clinical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
- Pieter Meysman
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Network Antwerp (BIOMINA), University of Antwerp, Antwerp, Belgium
2. Pandey M, Shah SK, Gromiha MM. Computational approaches for identifying disease-causing mutations in proteins. Advances in Protein Chemistry and Structural Biology 2023; 139:141-171. [PMID: 38448134 DOI: 10.1016/bs.apcsb.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
Advancements in genome sequencing have expanded the scope of investigating mutations in proteins across different diseases. Amino acid mutations in a protein alter its structure, stability and function and some of them lead to diseases. Identification of disease-causing mutations is a challenging task and it will be helpful for designing therapeutic strategies. Hence, mutation data available in the literature have been curated and stored in several databases, which have been effectively utilized for developing computational methods to identify deleterious mutations (drivers), using sequence and structure-based properties of proteins. In this chapter, we describe the contents of specific databases that have information on disease-causing and neutral mutations followed by sequence and structure-based properties. Further, characteristic features of disease-causing mutations will be discussed along with computational methods for identifying cancer hotspot residues and disease-causing mutations in proteins.
Affiliation(s)
- Medha Pandey
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Suraj Kumar Shah
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
  - International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama, Japan
3. Suzuki S, Ota S, Yamagishi T, Tuji A, Yamaguchi H, Kawachi M. Rapid transcriptomic and physiological changes in the freshwater pennate diatom Mayamaea pseudoterrestris in response to copper exposure. DNA Res 2022; 29:dsac037. [PMID: 36197113 PMCID: PMC9724779 DOI: 10.1093/dnares/dsac037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 09/27/2022] [Accepted: 10/03/2022] [Indexed: 12/12/2022] Open
Abstract
Diatoms function as major primary producers, accumulating large amounts of biomass in most aquatic environments. Given their rapid responses to changes in environmental conditions, diatoms are used for the biological monitoring of water quality and for performing ecotoxicological tests in aquatic ecosystems. However, the molecular basis for their toxicity to chemical compounds remains largely unknown. Here, we sequenced the genome of a freshwater diatom, Mayamaea pseudoterrestris NIES-4280, which has been proposed as an alternative strain of Navicula pelliculosa UTEX 664 for performing the Organisation for Economic Co-operation and Development ecotoxicological test. This study shows that M. pseudoterrestris has a small genome and carries the lowest number of genes among freshwater diatoms. The gene content of M. pseudoterrestris is similar to that of the model marine diatom, Phaeodactylum tricornutum. Genes related to cell motility, polysaccharide metabolism, oxidative stress alleviation, intracellular calcium signalling, and reactive compound detoxification showed rapid changes in their expression patterns in response to copper exposure. Active gliding motility was observed in response to copper addition, and copper exposure decreased intracellular calcium concentration. These findings enhance our understanding of the environmental adaptation of diatoms, and elucidate the molecular basis of toxicity of chemical compounds in algae.
Affiliation(s)
- Shigekatsu Suzuki
  - Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Japan
- Shuhei Ota
  - Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Japan
- Takahiro Yamagishi
  - Health and Environmental Risk Division, National Institute for Environmental Studies, Tsukuba, Japan
- Akihiro Tuji
  - Department of Botany, National Museum of Nature and Science, Tsukuba, Japan
- Haruyo Yamaguchi
  - Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Japan
- Masanobu Kawachi
  - Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Japan
4. Del Amparo R, Arenas M. Consequences of Substitution Model Selection on Protein Ancestral Sequence Reconstruction. Mol Biol Evol 2022; 39:6628884. [PMID: 35789388 PMCID: PMC9254009 DOI: 10.1093/molbev/msac144] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 01/04/2023] Open
Abstract
The selection of the best-fitting substitution model of molecular evolution is a traditional step for phylogenetic inferences, including ancestral sequence reconstruction (ASR). However, a few recent studies suggested that applying this procedure does not affect the accuracy of phylogenetic tree reconstruction. Here, we revisited this debate topic by analyzing the influence of selection among substitution models of protein evolution, with focus on exchangeability matrices, on the accuracy of ASR using simulated and real data. We found that the selected best-fitting substitution model produces the most accurate ancestral sequences, especially if the data present large genetic diversity. Indeed, ancestral sequences reconstructed under substitution models with similar exchangeability matrices were similar, suggesting that if the selected best-fitting model cannot be used for the reconstruction, applying a model similar to the selected one is preferred. We conclude that selecting among substitution models of protein evolution is recommended for reconstructing accurate ancestral sequences.
Affiliation(s)
- Roberto Del Amparo
  - CINBIO, Universidade de Vigo, Vigo, Spain
  - Departamento de Bioquímica, Xenética e Immunoloxía, Universidade de Vigo, Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, Vigo, Spain
  - Departamento de Bioquímica, Xenética e Immunoloxía, Universidade de Vigo, Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain
5. Chao J, Tang F, Xu L. Developments in Algorithms for Sequence Alignment: A Review. Biomolecules 2022; 12:biom12040546. [PMID: 35454135 PMCID: PMC9024764 DOI: 10.3390/biom12040546] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/11/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 01/27/2023] Open
Abstract
The continuous development of sequencing technologies has enabled researchers to obtain large amounts of biological sequence data, and this has resulted in increasing demands for software that can perform sequence alignment quickly and accurately. A number of algorithms and tools for sequence alignment have been designed to meet the various needs of biologists. Here, the ideas that prevail in the research of sequence alignment and some quality estimation methods for multiple sequence alignment tools are summarized.
Affiliation(s)
- Jiannan Chao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Furong Tang
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
6. Jabeen A, Vijayram R, Ranganathan S. A two-stage computational approach to predict novel ligands for a chemosensory receptor. Curr Res Struct Biol 2021; 2:213-221. [PMID: 34235481 PMCID: PMC8244491 DOI: 10.1016/j.crstbi.2020.10.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/17/2020] [Revised: 09/29/2020] [Accepted: 10/03/2020] [Indexed: 11/01/2022] Open
Abstract
Olfactory receptor (OR) 1A2 is a member of the largest superfamily of G protein-coupled receptors (GPCRs). OR1A2 is an ectopically expressed receptor with only 13 known ligands, implicated in reducing hepatocellular carcinoma progression, with enormous therapeutic potential. We have developed a two-stage screening approach to identify novel putative ligands of OR1A2. We first used a pharmacophore model based on atomic property field (APF) to virtually screen a library of 5942 human metabolites. We then carried out structure-based virtual screening (SBVS) for predicting the potential agonists, based on a 3D homology model of OR1A2. This model was developed using a biophysical approach for template selection, based on multiple parameters including hydrophobicity correspondence, applied to the complete set of available GPCR structures to pick the most appropriate template. Finally, the membrane-embedded 3D model was refined by molecular dynamics (MD) simulations in both the apo and holo forms. The refined model in the apo form was selected for SBVS. Four novel small molecules were identified as strong binders to this olfactory receptor on the basis of computed binding energies.
Key Words
- APF, Atomic property field
- Amber, Assisted model Building with Energy Refinement
- Atomic property field
- Binding free energy calculation
- CSF, Cerebrospinal fluid
- ECL, Extracellular loop
- GPCR, G protein coupled receptor
- HCMV, Human cytomegalovirus
- HMDB, Human metabolome database
- Hydrophobicity correspondence
- LBVS, Ligand based virtual screening
- LC, Lung carcinoids
- MD, Molecular dynamics
- MMGBSA, Molecular mechanics generalized born surface area
- MMPBSA, Molecular mechanics Poisson–Boltzmann surface area
- Molecular dynamics
- NAFLD, Non-alcoholic fatty liver disease
- NASH, Nonalcoholic steatohepatitis
- OR, olfactory receptor
- OR1A2
- Olfactory receptor
- PMEMD, Particle-Mesh Ewald Molecular Dynamics
- POPC, 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphatidylcholine
- RMSD, Root mean square deviation
- RMSF, Root mean square fluctuation
- SBVS, Structure based virtual screening
- SSD, Sum of squared difference
- TM, Transmembrane
- Virtual ligand screening
Affiliation(s)
- Amara Jabeen
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Ramya Vijayram
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamilnadu, India
- Shoba Ranganathan
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
7. de Felice A, Aureli S, Limongelli V. Drug Repurposing on G Protein-Coupled Receptors Using a Computational Profiling Approach. Front Mol Biosci 2021; 8:673053. [PMID: 34026848 PMCID: PMC8138314 DOI: 10.3389/fmolb.2021.673053] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/26/2021] [Accepted: 04/13/2021] [Indexed: 11/23/2022] Open
Abstract
G protein-coupled receptors (GPCRs) are the largest human membrane receptor family regulating a wide range of cell signaling. For this reason, GPCRs are highly desirable drug targets, with approximately 40% of prescribed medicines targeting a member of this receptor family. The structural homology of GPCRs and the broad spectrum of applications of GPCR-acting drugs suggest an investigation of the cross-activity of a drug toward different GPCR receptors with the aim of rationalizing drug side effects, designing more selective and less toxic compounds, and possibly proposing off-label therapeutic applications. Herein, we present an original in silico approach named “Computational Profiling for GPCRs” (CPG), which is able to represent, in a one-dimensional (1D) string, the physico-chemical properties of a ligand–GPCR binding interaction and, through a tailored alignment algorithm, repurpose the ligand for a different GPCR. We show three case studies where docking calculations and pharmacological data confirm the drug repurposing findings obtained through CPG on 5-hydroxytryptamine receptor 2B, beta-2 adrenergic receptor, and M2 muscarinic acetylcholine receptor. The CPG code is released as a user-friendly graphical user interface with numerous options that make CPG a powerful tool to assist the drug design of GPCR ligands.
Affiliation(s)
- Alessandra de Felice
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), Lugano, Switzerland
- Simone Aureli
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), Lugano, Switzerland
- Vittorio Limongelli
  - Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), Lugano, Switzerland
  - Department of Pharmacy, University of Naples "Federico II", Naples, Italy
8. Jabeen A, Vijayram R, Ranganathan S. BIO-GATS: A Tool for Automated GPCR Template Selection Through a Biophysical Approach for Homology Modeling. Front Mol Biosci 2021; 8:617176. [PMID: 33898512 PMCID: PMC8059640 DOI: 10.3389/fmolb.2021.617176] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/14/2020] [Accepted: 02/24/2021] [Indexed: 11/13/2022] Open
Abstract
G protein-coupled receptors (GPCRs) are the largest family of membrane proteins with more than 800 members. GPCRs are involved in numerous physiological functions within the human body and are the target of more than 30% of the United States Food and Drug Administration (FDA) approved drugs. At present, over 400 experimental GPCR structures are available in the Protein Data Bank (PDB) representing 76 unique receptors. The absence of an experimental structure for the majority of GPCRs demands homology models for structure-based drug discovery workflows. The generation of good homology models requires appropriate templates. The commonly used methods for template selection are based on sequence identity. However, sequence identity among the GPCRs is low. Sequences with similar patterns of hydrophobic residues are often structural homologs, even with low sequence identity. Extending this, we propose a biophysical approach for template selection based principally on hydrophobicity correspondence between the target and the template. Our approach takes into consideration other relevant parameters, including resolution, similarity within the orthosteric binding pocket of GPCRs, and structure completeness, for template selection. The proposed method was implemented in the form of a free tool called Bio-GATS, to provide the user with easy selection of the appropriate template for a query GPCR sequence. Bio-GATS was successfully validated with recently published benchmarking datasets. An application to an olfactory receptor to select an appropriate template has also been provided as a case study.
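The abstract does not say how hydrophobicity correspondence is measured, so the following is only one plausible sketch of the idea: smooth each sequence into a hydropathy profile and compare the two profiles by a sum of squared differences (lower means closer correspondence). The Kyte-Doolittle scale, the window size, the function names, and the requirement that both sequences are already aligned to equal length without gaps are all assumptions of this example, not details of Bio-GATS.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=5):
    """Return a sliding-window mean hydropathy value for each position.

    Windows are truncated at the sequence ends rather than padded.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        profile.append(sum(chunk) / len(chunk))
    return profile


def hydrophobicity_ssd(target, template, window=5):
    """Sum of squared differences between two hydropathy profiles.

    Both sequences are assumed to be pre-aligned, gap-free, and equal in length.
    """
    if len(target) != len(template):
        raise ValueError("sequences must be pre-aligned to equal length")
    a = hydropathy_profile(target, window)
    b = hydropathy_profile(template, window)
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Under this measure a template made of residues with similar hydropathy to the target scores close to zero even when no residue is identical, which is the property that motivates looking beyond sequence identity.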
Affiliation(s)
- Amara Jabeen
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
- Ramya Vijayram
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Shoba Ranganathan
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
9. Jimenez RC, Casajuana-Martin N, García-Recio A, Alcántara L, Pardo L, Campillo M, Gonzalez A. The mutational landscape of human olfactory G protein-coupled receptors. BMC Biol 2021; 19:21. [PMID: 33546694 PMCID: PMC7866472 DOI: 10.1186/s12915-021-00962-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/01/2020] [Accepted: 01/15/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: Olfactory receptors (ORs) constitute a large family of sensory proteins that enable us to recognize a wide range of chemical volatiles in the environment. In contrast to the extensive information about human olfactory thresholds for thousands of odorants, studies of the genetic influence on olfaction are limited to a few examples. To annotate on a broad scale the impact of mutations at the structural level, here we analyzed a compendium of 119,069 natural variants in human ORs collected from the public domain. RESULTS: OR mutations were categorized depending on their genomic and protein contexts, as well as their frequency of occurrence in several human populations. Functional interpretation of the natural changes was estimated from the increasing knowledge of the structure and function of the G protein-coupled receptor (GPCR) family, to which ORs belong. Our analysis reveals an extraordinary diversity of natural variations in the olfactory gene repertoire between individuals and populations, with a significant number of changes occurring at the structurally conserved regions. Particular attention is paid to mutations in positions linked to the conserved GPCR activation mechanism that could imply phenotypic variation in olfactory perception. An interactive web application (hORMdb, Human Olfactory Receptor Mutation Database) was developed for the management and visualization of this mutational dataset. CONCLUSION: We performed topological annotations and population analysis of natural variants of human olfactory receptors and provide an interactive application to explore human OR mutation data. We envisage that the utility of this information will increase as the amount of available pharmacological data for these receptors grows. This effort, together with ongoing research in the study of genetic changes in other sensory receptors, could shape an emerging sensegenomics field of knowledge, which should be considered by food and cosmetic consumer product manufacturers for the benefit of the general population.
Affiliation(s)
- Ramón Cierco Jimenez
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193, Bellaterra, Spain
  - Present Address: International Agency for Research on Cancer, Evidence Synthesis and Classification Section, WHO Classification of Tumours Group, 150 Cours Albert Thomas, 69008, Lyon, France
- Nil Casajuana-Martin
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193, Bellaterra, Spain
- Adrián García-Recio
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193, Bellaterra, Spain
- Lidia Alcántara
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193, Bellaterra, Spain
- Leonardo Pardo
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193, Bellaterra, Spain
- Mercedes Campillo
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193, Bellaterra, Spain
- Angel Gonzalez
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193, Bellaterra, Spain
10. Trivedi R, Nagarajaram HA. Substitution scoring matrices for proteins - An overview. Protein Sci 2020; 29:2150-2163. [PMID: 32954566 DOI: 10.1002/pro.3954] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/17/2020] [Revised: 09/17/2020] [Accepted: 09/18/2020] [Indexed: 01/17/2023]
Abstract
Sequence analysis is the primary and simplest approach to discover structural, functional and evolutionary details of related proteins. All the alignment-based approaches of sequence analysis make use of amino acid substitution matrices, and the accuracy of the results largely depends on the type of scoring matrices used to perform alignment tasks. An amino acid substitution matrix is a 20 × 20 matrix in which the individual elements encapsulate the rates at which each of the 20 amino acid residues in proteins are substituted by other amino acid residues over time. In contrast to most globular/ordered proteins, whose amino acid composition is considered standard, there are several classes of proteins (e.g., transmembrane proteins) in which certain types of amino acid (e.g., hydrophobic residues) are enriched. These compositional differences among various classes of proteins are manifested in their underlying residue substitution frequencies. Therefore, each compositionally distinct class of proteins or protein segments should be studied using specific scoring matrices that reflect their distinct residue substitution pattern. In this review, we describe the development and application of various substitution scoring matrices peculiar to proteins with standard and biased compositions. Along with the most commonly used standard matrices (PAM, BLOSUM, MD and VTML) that act as default parameters in various homolog search and alignment tools, different substitution scoring matrices specific to compositionally distinct classes of proteins are discussed in detail.
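To make concrete how the choice of matrix changes an alignment score, here is a minimal sketch that scores one ungapped pairing under two different matrices. The function `score_ungapped`, both toy matrices, and their values are invented for illustration; they are not entries from PAM, BLOSUM, or any published matrix.

```python
def score_ungapped(seq_a, seq_b, matrix, default=-1):
    """Sum substitution scores over an ungapped, equal-length pairing.

    matrix is a dict keyed by residue pairs; it is treated as symmetric,
    so only one orientation of each pair needs to be stored. Pairs missing
    from the matrix receive the default score.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        total += matrix.get((a, b), matrix.get((b, a), default))
    return total


# Hypothetical toy matrices (values chosen only for illustration).
TOY_GENERAL = {
    ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5,
    ("L", "I"): 2, ("L", "K"): -2, ("I", "K"): -3,
}
TOY_IDENTITY = {("L", "L"): 1, ("I", "I"): 1, ("K", "K"): 1}

general_score = score_ungapped("LIK", "LLK", TOY_GENERAL)
identity_score = score_ungapped("LIK", "LLK", TOY_IDENTITY)
```

The same pairing is rewarded for its conservative I/L mismatch under the first matrix and penalized for it under the identity-style matrix, which is why a matrix tuned to the protein class at hand can change which alignments look significant.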
Affiliation(s)
- Rakesh Trivedi
  - Laboratory of Computational Biology, Centre for DNA Fingerprinting and Diagnostics, Uppal, Hyderabad, Telangana, India
  - Graduate School, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Hampapathalu Adimurthy Nagarajaram
  - Laboratory of Computational Biology, Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
  - Centre for Modelling, Simulation and Design, University of Hyderabad, Hyderabad, Telangana, India
11. Molecular evolution of a collage of cholesterol interaction motifs in transmembrane helix V of the serotonin 1A receptor. Chem Phys Lipids 2020; 232:104955. [PMID: 32846149 DOI: 10.1016/j.chemphyslip.2020.104955] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/11/2020] [Revised: 08/08/2020] [Accepted: 08/16/2020] [Indexed: 12/20/2022]
Abstract
The human serotonin1A receptor is a representative member of the superfamily of G protein-coupled receptors (GPCRs) and an important drug target for neurological disorders. Using a combination of biochemical, biophysical and molecular dynamics simulation approaches, we and others have shown that membrane cholesterol modulates the organization, dynamics and function of vertebrate serotonin1A receptors. Previous studies have shown that the cytoplasmic portion of transmembrane helix V (TM V) and the extramembraneous intracellular loop 3 are critical for G-protein coupling, phosphorylation and desensitization of the receptor. We have recently resolved a collage of putative cholesterol interaction motifs from the amino acid sequence overlapping this region. In this paper, we explore the sequence plasticity of this fragment that may have adapted to altered membrane lipidome, after vertebrates evolved from primordial invertebrates. Since invertebrates have lower levels of membrane cholesterol relative to vertebrates, we compared TM V sequence fragments from invertebrate serotonin1 receptors with vertebrate orthologs to infer the sequence plasticity in TM V. We report that the average number of cholesterol interaction motifs in TM V for diverse phyla represents an increasing trend that could mirror vertebrate evolution from primordial invertebrates. By statistical modeling, we propose that the collage of cholesterol interaction motifs in TM V of the human serotonin1A receptor may have evolved from rudimentary collages, reminiscent of primordial invertebrate orthologs. Taken together, we propose that a repertoire of cholesterol-philic nonsynonymous substitutions may have enhanced collage complexity in TM V during vertebrate evolution.
12. Perron U, Kozlov AM, Stamatakis A, Goldman N, Moal IH. Modeling Structural Constraints on Protein Evolution via Side-Chain Conformational States. Mol Biol Evol 2020; 36:2086-2103. [PMID: 31114882 PMCID: PMC6736381 DOI: 10.1093/molbev/msz122] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/16/2022] Open
Abstract
Few models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and increasing availability. We present a structurally aware empirical substitution model for amino acid sequence evolution in which proteins are expressed using an expanded alphabet that relays both amino acid identity and structural information. Each character specifies an amino acid as well as information about the rotamer configuration of its side-chain: the discrete geometric pattern of permitted side-chain atomic positions, as defined by the dihedral angles between covalently linked atoms. By assigning rotamer states in 251,194 protein structures and identifying 4,508,390 substitutions between closely related sequences, we generate a 55-state “Dayhoff-like” model that shows that the evolutionary properties of amino acids depend strongly upon side-chain geometry. The model performs as well as or better than traditional 20-state models for divergence time estimation, tree inference, and ancestral state reconstruction. We conclude that not only is rotamer configuration a valuable source of information for phylogenetic studies, but that modeling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function.
Affiliation(s)
- Umberto Perron
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
- Alexey M Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
- Iain H Moal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom; Computational and Modelling Sciences, GlaxoSmithKline Research and Development, Stevenage, United Kingdom
13
Popov P, Kozlovskii I, Katritch V. Computational design for thermostabilization of GPCRs. Curr Opin Struct Biol 2019; 55:25-33. [PMID: 30909106 DOI: 10.1016/j.sbi.2019.02.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/16/2018] [Accepted: 02/19/2019] [Indexed: 10/27/2022]
Abstract
The GPCR superfamily is the largest clinically relevant family of targets in the human genome; however, the low thermostability and high conformational plasticity of these integral membrane proteins make them notoriously hard to handle in biochemical, biophysical, and structural experiments. Here, we describe the recent advances in computational approaches to design stabilizing mutations for GPCRs that take advantage of the structural and sequence conservation properties of the receptors, and employ machine learning on accumulated mutation data for the superfamily. The fast and effective computational tools can provide a viable alternative to existing experimental mutation screening and are poised for further improvements with expansion of thermostability datasets for training the machine learning models. The rapidly growing practical applications of computational stability design streamline GPCR structure determination and may contribute to more efficient drug discovery.
Affiliation(s)
- Petr Popov
  - Skolkovo Institute of Science and Technology, Moscow, Russia; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Igor Kozlovskii
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Vsevolod Katritch
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia; Departments of Biological Sciences and Chemistry, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA, USA
14
Popov P, Peng Y, Shen L, Stevens RC, Cherezov V, Liu ZJ, Katritch V. Computational design of thermostabilizing point mutations for G protein-coupled receptors. eLife 2018; 7:34729. [PMID: 29927385 PMCID: PMC6013254 DOI: 10.7554/elife.34729] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 12/31/2017] [Accepted: 05/05/2018] [Indexed: 12/02/2022] Open
Abstract
Engineering of GPCR constructs with improved thermostability is key to successful structural and biochemical studies of this transmembrane protein family, targeted by 40% of all therapeutic drugs. Here we introduce a comprehensive computational approach to effective prediction of stabilizing mutations in GPCRs, named CompoMug, which employs sequence-based analysis, structural information, and a derived machine learning predictor. Tested experimentally on the serotonin 5-HT2C receptor target, CompoMug predictions resulted in 10 new stabilizing mutations, with an apparent thermostability gain of ~8.8°C for the best single mutation and ~13°C for a triple mutant. Binding of antagonists confers further stabilization for the triple mutant receptor, with total gains of ~21°C as compared to wild type apo 5-HT2C. The predicted mutations enabled crystallization and structure determination for the 5-HT2C receptor complexes in inactive and active-like states. While CompoMug already shows a high 25% hit rate and utility in GPCR structural studies, further improvements are expected with accumulation of structural and mutation data.
The trillions of cells in the human body rely on receptors that sit in their cell membranes to communicate with each other. Hundreds of different receptors belong to the G protein-coupled receptor superfamily (called GPCRs for short) and play vital roles in all organs and bodily systems. Indeed, GPCRs are the targets for almost 40% of therapeutic drugs. As such, deciphering the shape and activity of GPCRs is key to understanding the normal workings of human biology and could help scientists discover new treatments for various diseases, from depression to high blood pressure to cancer. These receptors, however, are notoriously flimsy and unstable, making them difficult to work with in the laboratory.
Different approaches have been developed to make GPCRs more stable, usually by swapping one or a few of the amino acid building blocks in the protein for other amino acids. Currently, this requires a costly and slow trial-and-error approach in which each amino acid out of 300-400 in the protein is mutated and tested experimentally. To speed up and reduce the cost of the process, Popov et al. asked if a computer could predict which mutations in the protein would stabilize it, meaning that fewer proteins would actually need to be tested. Four computer algorithms based on four different principles were developed and verified. The first one compares the target GPCR to other closely related receptors, trying to detect variations that cause the instability. The second tries to build in specific stabilizing interactions, or “bridges”, between different parts of the receptor. The third algorithm searches the known structures of other GPCRs for useful mutations. Finally, the fourth one uses accumulated data on the stability of hundreds of mutations in different GPCRs to train a machine learning predictor to recognize stabilizing mutations. All four algorithms produced useful predictions in a real-life project. Indeed, when combined in one computational tool, named CompoMug, the algorithms made it possible to detect optimal mutations in a human GPCR called 5-HT2C. This made the protein much easier to work with in the laboratory, and ultimately helped to solve its three-dimensional structure (which was reported in a separate study, published earlier in 2018). The 5-HT2C receptor is involved in regulating, among other things, mood and appetite. Details of its structure might therefore help researchers to design new antidepressants and obesity treatments. Moreover, CompoMug is already helping structural biologists to solve the structures of other GPCRs, which will further facilitate many aspects of GPCR drug discovery.
Affiliation(s)
- Petr Popov
  - Department of Biological Sciences, University of Southern California, Los Angeles, Los Angeles, United States; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Yao Peng
  - iHuman Institute, ShanghaiTech University, Shanghai, China
- Ling Shen
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Raymond C Stevens
  - Department of Biological Sciences, University of Southern California, Los Angeles, Los Angeles, United States; iHuman Institute, ShanghaiTech University, Shanghai, China; Department of Chemistry, University of Southern California, Los Angeles, Los Angeles, United States; Bridge Institute, University of Southern California, Los Angeles, Los Angeles, United States
- Vadim Cherezov
  - Department of Biological Sciences, University of Southern California, Los Angeles, Los Angeles, United States; Moscow Institute of Physics and Technology, Dolgoprudny, Russia; Department of Chemistry, University of Southern California, Los Angeles, Los Angeles, United States; Bridge Institute, University of Southern California, Los Angeles, Los Angeles, United States
- Zhi-Jie Liu
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China; Institute of Molecular and Clinical Medicine, Kunming Medical University, Kunming, China
- Vsevolod Katritch
  - Department of Biological Sciences, University of Southern California, Los Angeles, Los Angeles, United States; Moscow Institute of Physics and Technology, Dolgoprudny, Russia; Department of Chemistry, University of Southern California, Los Angeles, Los Angeles, United States; Bridge Institute, University of Southern California, Los Angeles, Los Angeles, United States
15
Izquierdo C, Gómez-Tamayo JC, Nebel JC, Pardo L, Gonzalez A. Identifying human diamine sensors for death related putrescine and cadaverine molecules. PLoS Comput Biol 2018; 14:e1005945. [PMID: 29324768 PMCID: PMC5783396 DOI: 10.1371/journal.pcbi.1005945] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 08/25/2017] [Revised: 01/24/2018] [Accepted: 12/28/2017] [Indexed: 12/21/2022] Open
Abstract
Pungent chemical compounds originating from decaying tissue are strong drivers of animal behavior. Two of the best-characterized death smell components are putrescine (PUT) and cadaverine (CAD), foul-smelling molecules produced by decarboxylation of amino acids during decomposition. These volatile polyamines act as ‘necromones’, triggering avoidance or attractive responses, which are fundamental for the survival of a wide range of species. The few studies that have attempted to identify the cognate receptors for these molecules have suggested the involvement of the seven-helix trace amine-associated receptors (TAARs), localized in the olfactory epithelium. However, very little is known about the precise chemosensory receptors that sense these compounds in the majority of organisms and the molecular basis of their interactions. In this work, we have used computational strategies to characterize the binding between PUT and CAD with the TAAR6 and TAAR8 human receptors. Sequence analysis, homology modeling, docking and molecular dynamics studies suggest a tandem of negatively charged aspartates in the binding pocket of these receptors which are likely to be involved in the recognition of these small biogenic diamines.
The distinctive dead smell comes largely from molecules like cadaverine and putrescine that are produced during decomposition of organic tissues. These volatile compounds act as powerful chemical signals important for the survival of a wide range of species. Previous studies have identified the trace amine-associated receptor 13c (or TAAR13c) in zebrafish as the cognate receptor of cadaverine in bony fishes. In this work, we employed computational strategies to disclose the human TAAR6 and TAAR8 receptors as sensors of the putrescine and cadaverine molecules. Our results indicate that several negatively charged residues in the ligand binding pocket of these receptors constitute the molecular basis for recognition of these necromones in humans.
Affiliation(s)
- Cristina Izquierdo
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193 Bellaterra, Spain
- José C. Gómez-Tamayo
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193 Bellaterra, Spain
- Jean-Christophe Nebel
  - Faculty of Science, Engineering and Computing, Kingston University, London, United Kingdom
- Leonardo Pardo
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193 Bellaterra, Spain
- Angel Gonzalez
  - Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, E-08193 Bellaterra, Spain
16
Barlowe S, Coan HB, Youker RT. SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment. PeerJ 2017; 5:e3492. [PMID: 28674656 PMCID: PMC5490468 DOI: 10.7717/peerj.3492] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/16/2017] [Accepted: 05/27/2017] [Indexed: 01/13/2023] Open
Abstract
Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know if a given matrix family or any individual matrix within a family is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed with the JavaScript language which provides interactive visualizations for revealing both high-level and low-level alignment information.
Affiliation(s)
- Scott Barlowe
  - Department of Mathematics and Computer Science, Western Carolina University, Cullowhee, NC, United States of America
- Heather B Coan
  - Department of Biology, Western Carolina University, Cullowhee, NC, United States of America
- Robert T Youker
  - Department of Biology, Western Carolina University, Cullowhee, NC, United States of America
17
Interaction of G protein coupled receptors and cholesterol. Chem Phys Lipids 2016; 199:61-73. [PMID: 27108066 DOI: 10.1016/j.chemphyslip.2016.04.006] [Citation(s) in RCA: 152] [Impact Index Per Article: 16.9] [Received: 01/27/2016] [Revised: 03/30/2016] [Accepted: 04/19/2016] [Indexed: 12/20/2022]
Abstract
G protein coupled receptors (GPCRs) form the largest receptor superfamily in eukaryotic cells. Owing to their seven transmembrane helices, large parts of these proteins are embedded in the cholesterol-rich plasma membrane bilayer. Thus, GPCRs are always in proximity to cholesterol. Some of them are functionally dependent on the specific presence of cholesterol. Over recent years, enormous progress on receptor structures has been achieved. While lipophilic ligands other than cholesterol have been shown to bind either inside the helix bundle or at the receptor-lipid interface, the binding site of cholesterol was either a single transmembrane helix or a groove between two or more transmembrane helices. A clear preference for one of the two membrane leaflets has not been observed. Not surprisingly, many hydrophobic residues (primarily leucine and isoleucine) were found to be involved in cholesterol binding. In most cases, the rough β-face of cholesterol contacted the transmembrane helix bundle rather than the surrounding lipid matrix. The polar hydroxy group of cholesterol was localized near the water-membrane interface with potential hydrogen bonding to residues in receptor loop regions. Although a canonical motif, designated as the CCM site, was detected as a specific cholesterol binding site in the case of the β2AR, this site was not found to be occupied by cholesterol in other GPCRs possessing the same motif. Cholesterol-receptor interactions can increase the compactness of the receptor structure and are able to enhance the conformational stability towards active or inactive receptor states. Overall, all current data suggest a high plasticity of cholesterol interaction sites in GPCRs.
18
Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications. PLoS Comput Biol 2016; 12:e1004805. [PMID: 27028541 PMCID: PMC4814114 DOI: 10.1371/journal.pcbi.1004805] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 09/10/2015] [Accepted: 02/11/2016] [Indexed: 11/23/2022] Open
Abstract
The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs. These residues can be used to make testable hypotheses about the structural basis of receptor function and about the molecular basis of disease-associated single nucleotide polymorphisms.
G-protein coupled receptors (GPCRs) are a large superfamily of integral membrane proteins that share a characteristic 7 transmembrane helix fold. They detect various molecules outside of the cell and signal their presence to the inside of the cell. At least half of the 800 human GPCRs are potential drug targets, so understanding their structure and function is critical. Experimental structures are now available for at least one receptor from each GPCR class. The structure of the 7 helix fold is highly conserved even for receptors with very low sequence similarity. We analyze the available experimental structures and compare the common inter-helical contacts. Our analysis leads to a unified sequence-structure alignment of the GPCR superfamily that can then be used as the starting point for structure prediction of all other GPCRs. A key result of our analysis is a list of conserved contact residues and activation “hot-spot” residues that are critical for GPCR folding and function. We propose that mutations and natural variants of amino acids at these locations in the GPCRs can dramatically influence their activation state and alter intracellular signaling. This provides hypotheses for the molecular mechanisms underlying disease-causing mutants for any GPCR.