1
Barteri F, Valenzuela A, Farré X, de Juan D, Muntané G, Esteve-Altava B, Navarro A. CAAStools: a toolbox to identify and test Convergent Amino Acid Substitutions. Bioinformatics 2023; 39:btad623. [PMID: 37846039 PMCID: PMC10598582 DOI: 10.1093/bioinformatics/btad623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 08/04/2023] [Accepted: 10/13/2023] [Indexed: 10/18/2023]
Abstract
MOTIVATION: The coincidence of Convergent Amino Acid Substitutions (CAAS) with phenotypic convergences allows pinpointing genes, and even individual mutations, that are likely to be associated with trait variation within their phylogenetic context. Such findings can provide useful insights into the genetic architecture of complex phenotypes. RESULTS: Here we introduce CAAStools, a set of bioinformatics tools to identify and validate CAAS in orthologous protein alignments for predefined groups of species representing the phenotypic values targeted by the user. AVAILABILITY AND IMPLEMENTATION: CAAStools source code is available at http://github.com/linudz/caastools, along with documentation and examples.
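As a rough illustration of what "identifying CAAS for predefined groups of species" involves, the sketch below flags alignment columns where the residues of a foreground group share no amino acid with those of a background group. The function name, the input layout (a dict of equal-length aligned sequences) and the rule of ignoring gaps and `X` are assumptions made for this example; it is not the CAAStools interface and not necessarily its exact CAAS definition.

```python
from typing import Dict, List


def find_caas_candidates(
    alignment: Dict[str, str],
    foreground: List[str],
    background: List[str],
) -> List[int]:
    """Return 0-based alignment columns where the two species groups share no residue.

    Simplified, hypothetical criterion (not CAAStools' exact rule): gaps ('-') and
    unknown residues ('X') are ignored, each group needs at least one informative
    residue, and the foreground and background residue sets must be disjoint.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    ignored = {"-", "X"}
    hits = []
    for col in range(length):
        fg = {alignment[sp][col] for sp in foreground} - ignored
        bg = {alignment[sp][col] for sp in background} - ignored
        if fg and bg and fg.isdisjoint(bg):
            hits.append(col)
    return hits


if __name__ == "__main__":
    # Toy alignment: sp1/sp2 form the (hypothetical) foreground, sp3/sp4 the background.
    toy = {"sp1": "MKTA", "sp2": "MKTA", "sp3": "MRSA", "sp4": "MRTA"}
    print(find_caas_candidates(toy, ["sp1", "sp2"], ["sp3", "sp4"]))  # [1]
```

Column 1 is reported because K occurs only in the foreground and R only in the background; column 2 is not, because T appears in both groups. The statistical validation step mentioned in the abstract is not shown.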
Affiliation(s)
- Fabio Barteri
  - IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. C. Doctor Aiguader 88, Barcelona 08003, Spain
  - BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, C/ Wellington 30, Barcelona 08006, Spain
- Alejandro Valenzuela
  - IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. C. Doctor Aiguader 88, Barcelona 08003, Spain
  - BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, C/ Wellington 30, Barcelona 08006, Spain
- Xavier Farré
  - Genomes for Life-GCAT Lab, Germans Trias i Pujol Research Institute (IGTP), Camí de les Escoles, s/n, Badalona 08916, Spain
- David de Juan
  - IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. C. Doctor Aiguader 88, Barcelona 08003, Spain
- Gerard Muntané
  - IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. C. Doctor Aiguader 88, Barcelona 08003, Spain
  - Institut d’Investigació Sanitària Pere Virgili (IISPV), Hospital Universitari Institut Pere Mata, Universitat Rovira i Virgili. Avda. Josep Laporte, 2 – Planta 0 – E2 color taronja, Reus 43204, Spain
  - Centro de Investigación Biomédica en Red en Salud Mental (CIBERSAM), Av. Monforte de Lemos, 3-5. Pabellón 11. Planta 0. Madrid 28029, Spain
- Borja Esteve-Altava
  - European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg 69117, Germany
- Arcadi Navarro
  - IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra. C. Doctor Aiguader 88, Barcelona 08003, Spain
  - BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, C/ Wellington 30, Barcelona 08006, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA) and Universitat Pompeu Fabra, Pg. Lluís Companys 23, Barcelona 08010, Spain
  - Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, C. Doctor Aiguader N88, Barcelona 08003, Spain
2
Fam BSO, Reales G, Vargas-Pinilla P, Paré P, Viscardi LH, Sortica VA, Felkl AB, de O Franco Á, Lucion AB, Costa-Neto CM, Pissinatti A, Salzano FM, Paixão-Côrtes VR, Bortolini MC. AVPR1b variation and the emergence of adaptive phenotypes in Platyrrhini primates. Am J Primatol 2019; 81:e23028. [PMID: 31318063 DOI: 10.1002/ajp.23028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/11/2019] [Revised: 05/31/2019] [Accepted: 06/16/2019] [Indexed: 12/14/2022]
Abstract
Platyrrhini (New World monkeys, NWm) are a group of primates characterized by behavioral and reproductive traits that are otherwise uncommon among primates, including social monogamy, direct paternal care, and twin births. Consequently, Platyrrhine primates are an invaluable model for discovering the genetic repertoire underlying these taxon-specific traits. High conservation of the vasopressin (AVP) sequence, in contrast with high variability of oxytocin (OXT), has recently been described in NWm. AVP and OXT exert their functions through interaction with their receptors, AVPR1a, AVPR1b, AVPR2, and OXTR, and variability in this system is associated with the traits mentioned above. Understanding the variability of the receptors is thus fundamental to understanding the function and evolution of the system as a whole. Here we describe the variability of the AVPR1b coding region in 20 NWm species; AVPR1b is well known to influence behavioral traits such as aggression, anxiety, and stress control in placental mammals. Our results indicate that 4% of AVPR1b sites may be under positive selection and that a significant number of sites are under relaxed selective constraint. Considering the known role of AVPR1b, we suggest that some of the changes described here for the Platyrrhini may be part of the genetic repertoire connected with the complex network of neuroendocrine mechanisms through which the AVP-OXT system modulates the HPA axis. Thus, these changes may have promoted the emergence of social behaviors such as direct paternal care in socially monogamous species that are also characterized by small body size and twin births.
Affiliation(s)
- Bibiana S O Fam
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Guillermo Reales
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
  - INAGEMP - Instituto de Genética Médica e Populacional, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Pedro Vargas-Pinilla
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Pamela Paré
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Lucas H Viscardi
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Vinicius A Sortica
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Aline B Felkl
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Álvaro de O Franco
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Aldo B Lucion
  - Departamento de Fisiologia, Instituto de Ciências Básicas da Saúde, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Claudio M Costa-Neto
  - Departamento de Bioquímica e Imunologia, Faculdade de Medicina, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Francisco M Salzano
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Vanessa R Paixão-Côrtes
  - Departamento de Biologia Geral, Instituto de Biologia, Universidade Federal da Bahia, Salvador, Brazil
- Maria Cátira Bortolini
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
3
Correlated Selection on Amino Acid Deletion and Replacement in Mammalian Protein Sequences. J Mol Evol 2018; 86:365-378. [PMID: 29955898 DOI: 10.1007/s00239-018-9853-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/20/2017] [Accepted: 06/21/2018] [Indexed: 10/28/2022]
Abstract
A low ratio of nonsynonymous to synonymous substitution rates (dN/dS) at a codon is an indicator of functional constraint caused by purifying selection. Intuitively, the functional constraint would also be expected to prevent such a codon from being deleted. However, to the best of our knowledge, the correlation between the rates of deletion and substitution has never actually been estimated. Here, we use 8595 protein-coding region sequences from nine mammalian species to examine the relationship between deletion rate and dN/dS. We find significant positive correlations at the levels of both sites and genes. We compared our data against controls consisting of simulated coding sequences evolving along identical phylogenetic trees, where deletions occur independently of substitutions. A much weaker correlation was found in the corresponding simulated sequences, probably caused by alignment errors. In the real data, the correlations cannot be explained by alignment errors. Separate investigations of nonsynonymous (dN) and synonymous (dS) substitution rates indicate that the correlation is most likely due to a similarity in patterns of selection rather than in mutation rates.
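The abstract does not name the correlation statistic. Assuming a rank-based one such as Spearman's rho (an assumption), the site-level comparison amounts to computing the correlation between per-site deletion rates and per-site dN/dS once for the real alignments and once for the simulated controls. The sketch below shows only that correlation step; estimating the per-site rates is out of scope, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)
```

With hypothetical per-site vectors for deletion rate and dN/dS, the real-versus-simulated contrast would compare `spearman(...)` on the real data with the same statistic on the simulated controls; significance testing is omitted.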
4
Levy Karin E, Rabin A, Ashkenazy H, Shkedy D, Avram O, Cartwright RA, Pupko T. Inferring Indel Parameters using a Simulation-based Approach. Genome Biol Evol 2015; 7:3226-38. [PMID: 26537226 PMCID: PMC4700945 DOI: 10.1093/gbe/evv212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
Abstract
In this study, we present a novel methodology to infer indel parameters from multiple sequence alignments (MSAs) based on simulations. Our algorithm searches for the set of evolutionary parameters describing indel dynamics that best fits a given input MSA. In each step of the search, we use parametric bootstraps and the Mahalanobis distance to estimate how well a proposed set of parameters fits the input data. Using simulations, we demonstrate that our methodology can accurately infer the indel parameters for a large variety of plausible settings. Moreover, using our methodology, we show that indel parameters vary substantially among three genomic data sets: mammals, bacteria, and retroviruses. Finally, we demonstrate how our methodology can be used to simulate MSAs based on indel parameters inferred from real data sets.
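The fit criterion described above can be sketched as follows: summary statistics computed on MSAs simulated under one proposed parameter set define a mean vector and covariance matrix, and the observed MSA's statistics are scored by their Mahalanobis distance from that cloud. Which summary statistics are used, the function name, and the use of NumPy are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def mahalanobis_fit(observed, simulated) -> float:
    """Mahalanobis distance of observed summary statistics from simulated replicates.

    observed:  shape (k,)   statistics of the input MSA (hypothetical choice,
               e.g. number of gap blocks, mean gap length).
    simulated: shape (n, k) the same statistics on n MSAs simulated under one
               proposed set of indel parameters (parametric bootstrap).
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    if simulated.ndim != 2 or simulated.shape[1] != observed.shape[0]:
        raise ValueError("simulated must have shape (n, k) matching observed (k,)")
    mean = simulated.mean(axis=0)
    # atleast_2d keeps the single-statistic case (k == 1) as a 1x1 matrix.
    cov = np.atleast_2d(np.cov(simulated, rowvar=False))
    diff = observed - mean
    # Solve cov @ x = diff rather than inverting cov explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

In a search, each candidate parameter set would be scored this way and the one with the smallest distance retained; the simulator and search strategy are not shown.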
Affiliation(s)
- Eli Levy Karin
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Avigayel Rabin
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Haim Ashkenazy
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Dafna Shkedy
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Oren Avram
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
  - The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Reed A Cartwright
  - The Biodesign Institute, Arizona State University, Tempe
  - School of Life Sciences, Arizona State University, Tempe
- Tal Pupko
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
5
Protein sequence conservation and stable molecular evolution reveals influenza virus nucleoprotein as a universal druggable target. Infect Genet Evol 2015; 34:200-10. [PMID: 26140959 DOI: 10.1016/j.meegid.2015.06.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/20/2014] [Revised: 06/16/2015] [Accepted: 06/29/2015] [Indexed: 01/05/2023]
Abstract
The high mutation rate of the influenza virus genome and the emergence of drug resistance call for a constant effort to identify alternative drug targets and develop new antiviral strategies. The internal proteins of the virus can be exploited as potential targets for therapeutic interventions. Among these, the nucleoprotein (NP) is the most abundant protein and provides structural and functional support to the viral replication machinery. The current study analyzes protein sequence polymorphism patterns, degree of molecular evolution, and sequence conservation as a function of the potential druggability of the nucleoprotein. We analyzed a universal set of amino acid sequences (n = 22,000) and, in order to identify and correlate the functionally conserved, druggable regions across different parameters, classified them on the basis of host organism, strain type, and continental region of sample isolation. The results indicated that around 95% of the sequence length was conserved, with at least 7 regions conserved across the protein among the various classes. Moreover, the highly variable regions, though very limited in number, were found to be positively selected, thereby indicating a high degree of protein stability across various hosts and spatio-temporal references. Furthermore, mapping the conserved regions onto the protein revealed 7 drug-binding pockets in functionally important regions of the protein. Collectively, the results indicate that the nucleoprotein is a highly conserved and stable viral protein that can potentially be exploited to develop broadly effective antiviral strategies.
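The abstract reports that about 95% of the sequence length was conserved but does not state the per-position criterion. One simple way to express such a figure, sketched here under assumptions, is to score each aligned position by the frequency of its most common residue and report the fraction of positions at or above a threshold. The 0.95 default threshold, the treatment of gaps as ordinary characters, and the function names are all hypothetical.

```python
from collections import Counter
from typing import List


def column_conservation(sequences: List[str]) -> List[float]:
    """Frequency of the most common character at each aligned position.

    Gaps are counted like residues here (a simplifying assumption).
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    n = len(sequences)
    return [Counter(s[i] for s in sequences).most_common(1)[0][1] / n for i in range(length)]


def fraction_conserved(sequences: List[str], threshold: float = 0.95) -> float:
    """Share of positions whose majority-residue frequency reaches the (hypothetical) threshold."""
    scores = column_conservation(sequences)
    return sum(score >= threshold for score in scores) / len(scores)
```

Grouping the input by host, strain type, or continent, as the study did, would mean calling the same functions once per group.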
6
Babar MM, Zaidi NUSS, Tahir M. Global geno-proteomic analysis reveals cross-continental sequence conservation and druggable sites among influenza virus polymerases. Antiviral Res 2014; 112:120-31. [DOI: 10.1016/j.antiviral.2014.10.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/30/2014] [Revised: 10/23/2014] [Accepted: 10/24/2014] [Indexed: 12/23/2022]
7
Wang HC, Susko E, Roger AJ. An amino acid substitution-selection model adjusts residue fitness to improve phylogenetic estimation. Mol Biol Evol 2014; 31:779-92. [PMID: 24441033 DOI: 10.1093/molbev/msu044] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/14/2022]
Abstract
Standard protein phylogenetic models use fixed rate matrices of amino acid interchange derived from analyses of large databases. Differences between the stationary amino acid frequencies of these rate matrices and those of a data set of interest are typically adjusted for by matrix multiplication that converts the empirical rate matrix to an exchangeability matrix, which is then postmultiplied by the amino acid frequencies in the alignment. The result is a time-reversible rate matrix with stationary amino acid frequencies equal to the data set frequencies. On the basis of population genetics principles, we develop an amino acid substitution-selection model that parameterizes the fitness of an amino acid as the logarithm of the ratio of the frequency of the amino acid to the frequency of the same amino acid under no selection. The model gives rise to a different sequence of matrix multiplications to convert an empirical rate matrix to one that has stationary amino acid frequencies equal to the data set frequencies. We incorporated the substitution-selection model into an improved amino acid class frequency mixture (cF) model to partially take into account site-specific amino acid frequencies in the phylogenetic models. We show that 1) the selection models fit data significantly better than corresponding models without selection for most of the 21 test data sets; 2) both cF and cF selection models favored the phylogenetic trees that were inferred under current sophisticated models and methods for three difficult phylogenetic problems (the positions of microsporidia and breviates in eukaryote phylogeny and the position of the root of the angiosperm tree); and 3) for data simulated under site-specific residue frequencies, the cF selection models estimated trees closer to the generating trees than a standard Γ model or cF without selection.
We also explored several ways of estimating amino acid frequencies under neutral evolution that are required for these selection models. By better modeling the amino acid substitution process, the cF selection models will be valuable for phylogenetic inference and evolutionary studies.
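The fitness definition above, f_a = log(π_a / π0_a) with π the data set frequencies and π0 the frequencies under no selection, can be made concrete with the Halpern-Bruno fixation factor h(S) = S / (1 - e^(-S)), a standard population-genetics choice; whether the paper uses exactly this form and sequence of matrix operations is an assumption here. If the neutral matrix is time-reversible with stationary distribution π0, scaling each off-diagonal rate a→b by h(f_b - f_a) yields a reversible matrix whose stationary distribution is π, which is the property the abstract describes. Function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def fitness(pi: Sequence[float], pi_neutral: Sequence[float]) -> List[float]:
    """f_a = log(pi_a / pi0_a): fitness of residue a relative to no-selection frequencies."""
    return [math.log(p / p0) for p, p0 in zip(pi, pi_neutral)]


def fixation_factor(s: float) -> float:
    """Halpern-Bruno factor s / (1 - exp(-s)); its limit at s == 0 is 1."""
    if abs(s) < 1e-12:
        return 1.0
    return s / (-math.expm1(-s))


def selection_rate_matrix(neutral: Matrix, pi: Sequence[float], pi_neutral: Sequence[float]) -> Matrix:
    """Scale each off-diagonal neutral rate a->b by fixation_factor(f_b - f_a).

    `neutral` is assumed to be time-reversible with stationary distribution
    `pi_neutral`; its diagonal is ignored and recomputed so rows sum to zero.
    """
    f = fitness(pi, pi_neutral)
    n = len(f)
    q = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                q[a][b] = neutral[a][b] * fixation_factor(f[b] - f[a])
        q[a][a] = -sum(q[a][b] for b in range(n) if b != a)
    return q
```

When π equals π0 every fitness is zero, every factor is 1, and the neutral rates are returned unchanged, which is a quick sanity check on the construction.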
Affiliation(s)
- Huai-Chun Wang
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada