1
Vaglietti S, Boggio Bozzo S, Ghirardi M, Fiumara F. Divergent evolution of low-complexity regions in the vertebrate CPEB protein family. Front Bioinform 2025; 5:1491735. [PMID: 40182702 PMCID: PMC11965684 DOI: 10.3389/fbinf.2025.1491735] [Received: 09/05/2024] [Accepted: 01/28/2025] [Indexed: 04/05/2025]
Abstract
The cytoplasmic polyadenylation element-binding proteins (CPEBs) are a family of translational regulators involved in multiple biological processes, including memory-related synaptic plasticity. In vertebrates, four paralogous genes (CPEB1-4) encode proteins with phylogenetically conserved C-terminal RNA-binding domains and variable N-terminal regions (NTRs). The CPEB NTRs are characterized by low-complexity regions (LCRs), including homopolymeric amino acid repeats (AARs), and have been identified as mediators of liquid-liquid phase separation (LLPS) and prion-like aggregation. After their appearance following gene duplication, the four paralogous CPEB proteins functionally diverged in terms of activation mechanisms and modes of mRNA binding. The paralog-specific NTRs may have contributed substantially to such functional diversification but their evolutionary history remains largely unexplored. Here, we traced the evolution of vertebrate CPEBs and their LCRs/AARs focusing on primary sequence composition, complexity, repetitiveness, and their possible functional impact on LLPS propensity and prion-likeness. We initially defined these composition- and function-related quantitative parameters for the four human CPEB paralogs and then systematically analyzed their evolutionary variation across more than 500 species belonging to nine major clades of different stem age, from Chondrichthyes to Euarchontoglires, along the vertebrate lineage. We found that the four CPEB proteins display highly divergent, paralog-specific evolutionary trends in composition- and function-related parameters, primarily driven by variation in their LCRs/AARs and largely related to clade stem ages. These findings shed new light on the molecular and functional evolution of LCRs in the CPEB protein family, in both quantitative and qualitative terms, highlighting the emergence of CPEB2 as a proline-rich prion-like protein in younger vertebrate clades, including Primates.
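The abstract treats sequence complexity as a quantitative parameter. As an illustration only, one common way to score complexity is the Shannon entropy of residue composition within a sliding window, similar in spirit to SEG-style low-complexity masking. The sketch below assumes that measure; it is not presented as the method of Vaglietti et al., and the function names, the 12-residue window, and the 2.2-bit cutoff are hypothetical choices made for this example.

```python
import math
from collections import Counter
from typing import Iterator, Tuple


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of `seq`."""
    if not seq:
        return 0.0
    n = len(seq)
    # Sum p * log2(1/p) over residue types present; 0 for a pure homopolymer.
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())


def low_complexity_windows(
    seq: str, window: int = 12, max_entropy: float = 2.2
) -> Iterator[Tuple[int, str, float]]:
    """Yield (start, window_seq, entropy) for windows at or below the cutoff.

    The default window and cutoff are hypothetical, chosen for illustration.
    """
    for start in range(len(seq) - window + 1):
        sub = seq[start:start + window]
        h = shannon_entropy(sub)
        if h <= max_entropy:
            yield start, sub, h
```

For example, a run of twelve glutamines scores 0 bits, whereas twelve distinct residues score log2(12), about 3.58 bits, so only windows dominated by a few residue types fall below the cutoff.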
Affiliation(s)
- Ferdinando Fiumara
- “Rita Levi-Montalcini” Department of Neuroscience, University of Turin, Turin, Italy
2
Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024; 41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024]
Abstract
Insertions and deletions (indels) constitute the second most important source of natural genomic variation. Indels account for up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of indel variation in species and populations. Yet, despite their importance, evolutionary studies have traditionally ignored or mishandled indels due to a lack of comprehensive methodologies and statistical models of indel dynamics. Here, we discuss methods for describing indel variation and modeling indels over evolutionary time. We provide practical advice for tackling indels in genomic sequences and illustrate our discussion with examples of indel-induced effects in human and other natural populations and their contribution to evolutionary processes. We outline promising directions for future developments in statistical methodologies that would allow researchers to analyze indel variation and its effects in large genomic data sets and to incorporate indels in evolutionary inference.
Affiliation(s)
- Ian Holmes
- Department of Bioengineering, University of California, Berkeley, CA 94720, USA
- Calico Life Sciences LLC, South San Francisco, CA 94080, USA
- Gerton Lunter
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9713 GZ, The Netherlands
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
3
Desingu PA, Rubeni TP, Nagarajan K, Sundaresan NR. Molecular evolution of 2022 multi-country outbreak-causing monkeypox virus Clade IIb. iScience 2024; 27:108601. [PMID: 38188513 PMCID: PMC10770499 DOI: 10.1016/j.isci.2023.108601] [Received: 04/30/2023] [Revised: 09/16/2023] [Accepted: 11/28/2023] [Indexed: 01/09/2024]
Abstract
The monkeypox virus (Mpoxv) Clade IIb viruses that caused an outbreak in Nigeria in 2017-18, together with genetically related viruses, have been detected in many countries and caused a multi-country outbreak in 2022. Because the outbreak-causing Mpoxv Clade IIb viruses are closely related to Clade IIa viruses, which mostly cause endemic infections, Clade IIb Mpoxv may carry specific genetic variations that remain largely unknown. Here, we systematically analyzed genetic alterations in different clades of Mpox viruses. The results suggest that Mpoxv Clade IIb carries genetic variations in terms of genomic gaps, frameshift mutations, in-frame nonsense mutations, amino acid tandem repeats, and APOBEC3 mutations. Further, we observed specific genetic variations in multiple genes specific to Clade I and Clade IIb, and exclusive genetic variations for Clade IIa and Clade IIb. Collectively, these findings shed light on the evolution and genetic variation of the Mpoxv Clade IIb viruses that caused the 2022 outbreak.
Affiliation(s)
- Perumal Arumugam Desingu
- Department of Microbiology and Cell Biology, Indian Institute of Science, Bengaluru 560012, India
- K. Nagarajan
- Department of Veterinary Pathology, Madras Veterinary College, Tamil Nadu Veterinary and Animal Sciences University (TANUVAS), Vepery, Chennai 600007, Tamil Nadu
4
Bader AS, Bushell M. iMUT-seq: high-resolution DSB-induced mutation profiling reveals prevalent homologous-recombination dependent mutagenesis. Nat Commun 2023; 14:8419. [PMID: 38110444 PMCID: PMC10728174 DOI: 10.1038/s41467-023-44167-1] [Received: 04/01/2022] [Accepted: 12/04/2023] [Indexed: 12/20/2023]
Abstract
DNA double-strand breaks (DSBs) are the most mutagenic form of DNA damage, and play a significant role in cancer biology, neurodegeneration and aging. However, studying DSB-induced mutagenesis is limited by our current approaches. Here, we describe iMUT-seq, a technique that profiles DSB-induced mutations at high-sensitivity and single-nucleotide resolution around endogenous DSBs. By depleting or inhibiting 20 DSB-repair factors we define their mutational signatures in detail, revealing insights into the mechanisms of DSB-induced mutagenesis. Notably, we find that homologous-recombination (HR) is more mutagenic than previously thought, inducing prevalent base substitutions and mononucleotide deletions at distance from the break due to DNA-polymerase errors. Simultaneously, HR reduces translocations, suggesting a primary role of HR is specifically the prevention of genomic rearrangements. The results presented here offer fundamental insights into DSB-induced mutagenesis and have significant implications for our understanding of cancer biology and the development of DNA damage response (DDR)-targeting chemotherapeutics.
Affiliation(s)
- Aldo S Bader
- Cancer Research UK Beatson Institute, Glasgow, G61 1BD, UK.
- Cancer Research UK/CI, University of Cambridge, Li Ka Shing Centre, Cambridge, CB2 0RE, UK.
- The Gurdon Institute, University of Cambridge, Biochemistry, Cambridge, UK.
- Martin Bushell
- Cancer Research UK Beatson Institute, Glasgow, G61 1BD, UK.
- Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1QH, UK.
5
Lynch VJ, Wagner GP. Cooption of polyalanine tract into a repressor domain in the mammalian transcription factor HoxA11. J Exp Zool B Mol Dev Evol 2023; 340:486-495. [PMID: 34125492 DOI: 10.1002/jez.b.23063] [Received: 02/19/2020] [Revised: 04/21/2021] [Accepted: 04/26/2021] [Indexed: 06/12/2023]
Abstract
An enduring problem in biology is explaining how novel functions of genes originated and how those functions diverge between species. Despite detailed studies on the functional evolution of a few proteins, the molecular mechanisms by which protein functions have evolved are almost entirely unknown. Here, we show that a polyalanine tract in the homeodomain transcription factor HoxA11 arose in the stem-lineage of mammals and functions as an autonomous repressor module by physically interacting with the PAH domains of SIN3 proteins. These results suggest that long polyalanine tracts, which are common in transcription factors and often associated with disease, may tend to function as repressor domains and can contribute to the diversification of transcription factor functions despite the deleterious consequences of polyalanine tract expansion.
Affiliation(s)
- Vincent J Lynch
- Department of Biological Sciences, University at Buffalo, Buffalo, New York, USA
- Gunter P Wagner
- Department of Ecology and Evolutionary Biology and Yale Systems Biology Institute, Yale University, New Haven, Connecticut, USA
6
White LJ, Russell AJ, Pizzey AR, Dasmahapatra KK, Pownall ME. The Presence of Two MyoD Genes in a Subset of Acanthopterygii Fish Is Associated with a Polyserine Insert in MyoD1. J Dev Biol 2023; 11:jdb11020019. [PMID: 37218813 DOI: 10.3390/jdb11020019] [Received: 03/28/2023] [Revised: 04/20/2023] [Accepted: 04/26/2023] [Indexed: 05/24/2023]
Abstract
The MyoD gene was duplicated during the teleost whole genome duplication and, while a second MyoD gene (MyoD2) was subsequently lost from the genomes of some lineages (including zebrafish), many fish lineages (including Alcolapia species) have retained both MyoD paralogues. Here we reveal the expression patterns of the two MyoD genes in Oreochromis (Alcolapia) alcalica using in situ hybridisation. We report our analysis of MyoD1 and MyoD2 protein sequences from 54 teleost species, and show that O. alcalica, along with some other teleosts, include a polyserine repeat between the amino terminal transactivation domains (TAD) and the cysteine-histidine rich region (H/C) in MyoD1. The evolutionary history of MyoD1 and MyoD2 is compared to the presence of this polyserine region using phylogenetics, and its functional relevance is tested using overexpression in a heterologous system to investigate subcellular localisation, stability, and activity of MyoD proteins that include and do not include the polyserine region.
Affiliation(s)
- Lewis J White
- Biology Department, University of York, York YO10 5DD, UK
- Mary E Pownall
- Biology Department, University of York, York YO10 5DD, UK
7
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
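To make the STR definition above concrete (units of 1-6 bp repeated in tandem), here is a minimal sketch of a naive left-to-right scan for perfect repeats. It assumes a minimum of three uninterrupted copies and ignores imperfect (interrupted) repeats, which dedicated STR-detection tools typically also consider; `find_strs`, `TandemRepeat`, and the thresholds are hypothetical names and values chosen for illustration.

```python
from typing import List, NamedTuple


class TandemRepeat(NamedTuple):
    start: int   # 0-based start of the repeat tract
    unit: str    # repeated motif, 1-6 bp
    copies: int  # number of complete, uninterrupted copies


def find_strs(dna: str, min_copies: int = 3, max_unit: int = 6) -> List[TandemRepeat]:
    """Naive scan for perfect short tandem repeats in a DNA string."""
    dna = dna.upper()
    found: List[TandemRepeat] = []
    i = 0
    while i < len(dna):
        best = None  # (span, unit_len, copies) of the best tract starting at i
        for k in range(1, max_unit + 1):
            unit = dna[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit this long
            copies = 1
            while dna[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            # Prefer the longest tract; on ties keep the shorter unit (e.g. "A" over "AA").
            if copies >= min_copies and (best is None or span > best[0]):
                best = (span, k, copies)
        if best:
            span, k, copies = best
            found.append(TandemRepeat(i, dna[i:i + k], copies))
            i += span  # skip past the reported tract
        else:
            i += 1
    return found
```

For example, `find_strs("GGCACACACATT")` reports a single tract, `TandemRepeat(start=2, unit='CA', copies=4)`; the flanking `GG` and `TT` have too few copies to qualify.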
Affiliation(s)
- Max Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Mikhail Maksimov
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Ye Jin
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Melissa Gymrek
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Tugce Bilgin Sonay
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
8
Cascarina SM, Ross ED. Expansion and functional analysis of the SR-related protein family across the domains of life. RNA 2022; 28:1298-1314. [PMID: 35863866 PMCID: PMC9479744 DOI: 10.1261/rna.079170.122] [Received: 03/22/2022] [Accepted: 06/29/2022] [Indexed: 06/15/2023]
Abstract
Serine/arginine-rich (SR) proteins comprise a family of proteins that is predominantly found in eukaryotes and plays a prominent role in RNA splicing. A characteristic feature of SR proteins is the presence of an S/R-rich low-complexity domain (RS domain), often in conjunction with spatially distinct RNA recognition motifs (RRMs). To date, 52 human proteins have been classified as SR or SR-related proteins. Here, using an unbiased series of composition criteria together with enrichment for known RNA binding activity, we identified >100 putative SR-related proteins in the human proteome. This method recovers known SR and SR-related proteins with high sensitivity (∼94%), yet identifies a number of additional proteins with many of the hallmark features of true SR-related proteins. Newly identified SR-related proteins display slightly different amino acid compositions yet similar levels of post-translational modification, suggesting that these new SR-related candidates are regulated in vivo and functionally important. Furthermore, candidate SR-related proteins with known RNA-binding activity (but not currently recognized as SR-related proteins) are nevertheless strongly associated with a variety of functions related to mRNA splicing and nuclear speckles. Finally, we applied our SR search method to all available reference proteomes, and provide maps of RS domains and Pfam annotations for all putative SR-related proteins as a resource. Together, these results expand the set of SR-related proteins in humans, and identify the most common functions associated with SR-related proteins across all domains of life.
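The abstract does not spell out the composition criteria used by Cascarina and Ross. Purely as an illustration of composition-based screening, the sketch below flags sliding windows whose combined serine + arginine fraction meets a cutoff and merges overlapping hits into regions. The function name `sr_rich_regions`, the 20-residue window, and the 0.4 fraction are assumptions for this example, not the published method.

```python
from typing import List, Tuple


def sr_rich_regions(
    protein: str, window: int = 20, min_fraction: float = 0.4
) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) intervals of S/R-rich windows."""
    seq = protein.upper()
    hits: List[Tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        sub = seq[start:start + window]
        frac = (sub.count("S") + sub.count("R")) / window
        if frac >= min_fraction:
            if hits and start <= hits[-1][1]:
                # Window overlaps the previous region: extend that region.
                hits[-1] = (hits[-1][0], start + window)
            else:
                hits.append((start, start + window))
    return hits
```

Because whole windows are merged, a reported region can extend into the flanks of the actual S/R-rich stretch; a real screen would likely trim region boundaries or add further criteria.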
Affiliation(s)
- Sean M Cascarina
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523, USA
- Eric D Ross
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523, USA
9
Becerra A, Muñoz-Velasco I, Aguilar-Cámara A, Cottom-Salas W, Cruz-González A, Vázquez-Salazar A, Hernández-Morales R, Jácome R, Campillo-Balderas JA, Lazcano A. Two short low complexity regions (LCRs) are hallmark sequences of the Delta SARS-CoV-2 variant spike protein. Sci Rep 2022; 12:936. [PMID: 35042962 PMCID: PMC8766472 DOI: 10.1038/s41598-022-04976-8] [Received: 08/05/2021] [Accepted: 01/04/2022] [Indexed: 11/24/2022]
Abstract
Low complexity regions (LCRs) are protein sequences formed by a set of compositionally biased residues. LCRs are extremely abundant in cellular proteins and have also been reported in viruses, where they may take part in evasion of the host immune system. Analyses of 28,231 SARS-CoV-2 whole proteomes and of 261,051 spike protein sequences revealed the presence of four extremely conserved LCRs in the spike protein of several SARS-CoV-2 variants. With the exception of Iota, where it is absent, the Spike LCR-1 is present in the signal peptide of 80.57% of the Delta variant sequences, and in other variants of concern and interest. The Spike LCR-2 is highly prevalent (79.87%) in Iota. Two distinctive LCRs are present in the Delta spike protein. The Delta Spike LCR-3 is present in 99.19% of the analyzed sequences, and the Delta Spike LCR-4 in 98.3% of the same set of proteins. These two LCRs are located in the furin cleavage site and HR1 domain, respectively, and may be considered hallmark traits of the Delta variant. The presence of the medically important point mutations P681R and D950N in these LCRs, combined with the ubiquity of these regions in the highly contagious Delta variant, opens the possibility that they may play a role in its rapid spread.
Affiliation(s)
- Arturo Becerra
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Israel Muñoz-Velasco
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Wolfgang Cottom-Salas
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Escuela Nacional Preparatoria, Plantel 8 Miguel E. Schulz, Universidad Nacional Autónoma de México, 01600, Mexico City, Mexico
- Adrián Cruz-González
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Alberto Vázquez-Salazar
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA, 90095, USA
- Rodrigo Jácome
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Antonio Lazcano
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- El Colegio Nacional, 06470, Mexico City, Mexico
10
Pelassa I, Cibelli M, Villeri V, Lilliu E, Vaglietti S, Olocco F, Ghirardi M, Montarolo PG, Corà D, Fiumara F. Compound Dynamics and Combinatorial Patterns of Amino Acid Repeats Encode a System of Evolutionary and Developmental Markers. Genome Biol Evol 2020; 11:3159-3178. [PMID: 31589292 PMCID: PMC6839033 DOI: 10.1093/gbe/evz216] [Accepted: 09/27/2019] [Indexed: 01/05/2023]
Abstract
Homopolymeric amino acid repeats (AARs) like polyalanine (polyA) and polyglutamine (polyQ) in some developmental proteins (DPs) regulate certain aspects of organismal morphology and behavior, suggesting an evolutionary role for AARs as developmental "tuning knobs." It is still unclear, however, whether these are occasional protein-specific phenomena or hints at the existence of a whole AAR-based regulatory system in DPs. Using novel approaches to trace their functional and evolutionary history, we find quantitative evidence supporting a generalized, combinatorial role of AARs in developmental processes with evolutionary implications. We observe nonrandom AAR distributions and combinations in HOX and other DPs, as well as in their interactomes, defining elements of a proteome-wide combinatorial functional code whereby different AARs and their combinations appear preferentially in proteins involved in the development of specific organs/systems. Such functional associations can be either static or display detectable evolutionary dynamics. These findings suggest that progressive changes in AAR occurrence/combination, by altering embryonic development, may have contributed to taxonomic divergence, leaving detectable traces in the evolutionary history of proteomes. Consistent with this hypothesis, we find that the evolutionary trajectories of the 20 AARs in eukaryotic proteomes are highly interrelated and their individual or compound dynamics can sharply mark taxonomic boundaries, or display clock-like trends, carrying overall a strong phylogenetic signal. These findings provide quantitative evidence and an interpretive framework outlining a combinatorial system of AARs whose compound dynamics mark at the same time DP functions and evolutionary transitions.
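A homopolymeric AAR is simply a run of one amino acid, so occurrences and per-protein combinations can be tallied with a regular expression. The sketch below is a minimal illustration of that idea, not the pipeline of Pelassa et al.; the minimum run length of 4 residues and the names `find_aars` and `aar_combination` are assumptions made for this example.

```python
import re
from collections import Counter
from typing import List, Tuple


def find_aars(protein: str, min_len: int = 4) -> List[Tuple[int, str, int]]:
    """Return (start, residue, length) for runs of one residue >= min_len."""
    # A letter followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(protein.upper())
    ]


def aar_combination(protein: str, min_len: int = 4) -> Counter:
    """Count how many AARs of each residue type one protein carries."""
    return Counter(res for _, res, _ in find_aars(protein, min_len))
```

For example, `find_aars("MAAAAGQQQQQPSS")` returns `[(1, 'A', 4), (6, 'Q', 5)]`, so that toy protein carries a polyA + polyQ combination; the `SS` pair is too short to count under the assumed threshold.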
Affiliation(s)
- Ilaria Pelassa
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
- Marica Cibelli
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
- Veronica Villeri
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
- Elena Lilliu
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
- Serena Vaglietti
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
- Federica Olocco
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
- Mirella Ghirardi
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy; National Institute of Neuroscience (INN), Torino, Italy
- Pier Giorgio Montarolo
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy; National Institute of Neuroscience (INN), Torino, Italy
- Davide Corà
- Department of Translational Medicine, Piemonte Orientale University, Novara, Italy; Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Novara, Italy
- Ferdinando Fiumara
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy; National Institute of Neuroscience (INN), Torino, Italy
11
Chaudhry SR, Lwin N, Phelan D, Escalante AA, Battistuzzi FU. Comparative analysis of low complexity regions in Plasmodia. Sci Rep 2018; 8:335. [PMID: 29321589 PMCID: PMC5762703 DOI: 10.1038/s41598-017-18695-y] [Received: 08/23/2017] [Accepted: 12/14/2017] [Indexed: 12/20/2022]
Abstract
Low complexity regions (LCRs) are a common feature shared by many genomes, but their evolutionary and functional significance remains mostly unknown. At the core of the uncertainty is a poor understanding of the mechanisms that regulate their retention in genomes, whether driven by natural selection or neutral evolution. Applying a comparative approach to LCRs across multiple strains and species is a powerful way to identify patterns of conservation in these regions. Using this method, we investigate the evolutionary history of LCRs in the genus Plasmodium based on orthologous protein coding genes shared by 11 species and strains from primate and rodent-infecting pathogens. We find multiple lines of evidence in support of natural selection as a major evolutionary force shaping the composition and conservation of LCRs through time, as well as signatures that their evolutionary paths are species specific. Our findings add a comparative analysis perspective to the debate on the evolution of LCRs and harness the power of sequence comparisons to identify potential functionally important LCR candidates.
Affiliation(s)
- S R Chaudhry
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- N Lwin
- Department of Biological Sciences, Oakland University, Rochester, MI, USA
- D Phelan
- Department of Biological Sciences, Oakland University, Rochester, MI, USA
- A A Escalante
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- F U Battistuzzi
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Data Science and Big Data Analytics, Oakland University, Rochester, MI, USA
12
Ritzman TB, Banovich N, Buss KP, Guida J, Rubel MA, Pinney J, Khang B, Ravosa MJ, Stone AC. Facing the facts: The Runx2 gene is associated with variation in facial morphology in primates. J Hum Evol 2017; 111:139-151. [PMID: 28874267 DOI: 10.1016/j.jhevol.2017.06.014] [Received: 02/18/2016] [Revised: 06/22/2017] [Accepted: 06/28/2017] [Indexed: 12/31/2022]
Abstract
The phylogenetic and adaptive factors that cause variation in primate facial form (including differences among the major primate clades and variation related to feeding and/or social behavior) are relatively well understood. However, comparatively little is known about the genetic mechanisms that underlie diversity in facial form in primates. Because it is essential for osteoblastic differentiation and skeletal development, the runt-related transcription factor 2 (Runx2) is one gene that may play a role in these genetic mechanisms. Specifically, polymorphisms in the QA ratio (the ratio of the number of glutamines in the polyglutamine tract to the number of alanines in the polyalanine tract in one functional domain of Runx2) have been shown to be correlated with variation in facial length and orientation in other mammal groups. However, to date, the relationship between variation in this gene and variation in facial form in primates has not been explicitly tested. To test the hypothesis that the QA ratio is correlated with facial form in primates, the current study quantified the QA ratio, facial length, and facial angle in a sample of 33 primate species and tested for correlation using phylogenetic generalized least squares. The results indicate that the QA ratio of the Runx2 gene is positively correlated with variation in relative facial length in anthropoid primates. However, no correlation was found in strepsirrhines, and there was no correlation between facial angle and the QA ratio in any group. These results suggest that, in primates, the QA ratio of the Runx2 gene may play a role in modulating facial size, but not facial orientation. This study therefore provides important clues about the genetic and developmental mechanisms that may underlie variation in facial form in primates.
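Once the two tracts are delimited, the QA ratio described above is a single division. This minimal sketch assumes each tract is the longest uninterrupted run of Q or A in the supplied domain sequence, which may not match how Ritzman et al. delimited interrupted tracts; `longest_run`, `qa_ratio`, and the example sequence are hypothetical and the sequence is not a real Runx2 sequence.

```python
import re


def longest_run(seq: str, residue: str) -> int:
    """Length of the longest uninterrupted run of `residue` in `seq`."""
    runs = re.findall(f"{re.escape(residue)}+", seq)
    return max((len(r) for r in runs), default=0)


def qa_ratio(domain_seq: str) -> float:
    """Glutamine-tract length divided by alanine-tract length.

    Assumes each tract is the longest pure run of Q or A in the domain;
    interrupted tracts would need a different delimiting rule.
    """
    seq = domain_seq.upper()
    q = longest_run(seq, "Q")
    a = longest_run(seq, "A")
    if a == 0:
        raise ValueError("no alanine tract found")
    return q / a
```

For the made-up domain `"MQQQQQQPAAAAGG"`, the Q tract has 6 residues and the A tract 4, giving a QA ratio of 1.5.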
Affiliation(s)
- Terrence B Ritzman
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, USA; Department of Archaeology, University of Cape Town, Cape Town, South Africa; Human Evolution Research Institute, University of Cape Town, Cape Town, South Africa; School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA.
- Nicholas Banovich
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Kaitlin P Buss
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
- Jennifer Guida
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA; School of Public Health, University of Maryland, College Park, MD, USA
- Meagan A Rubel
- Department of Anthropology, University of Pennsylvania, Philadelphia, PA, USA
- Jennifer Pinney
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
- Bao Khang
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
- Matthew J Ravosa
- Department of Biological Sciences, University of Notre Dame, South Bend, IN, USA; Department of Aerospace and Mechanical Engineering, University of Notre Dame, South Bend, IN, USA; Department of Anthropology, University of Notre Dame, South Bend, IN, USA
- Anne C Stone
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA; Center for Bioarchaeological Research, ASU, Tempe, AZ, USA; Institute of Human Origins, ASU, Tempe, AZ, USA
13
Lynch M, Ackerman MS, Gout JF, Long H, Sung W, Thomas WK, Foster PL. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet 2017; 17:704-714. [PMID: 27739533 DOI: 10.1038/nrg.2016.104] [Indexed: 01/23/2023]
Abstract
As one of the few cellular traits that can be quantified across the tree of life, DNA-replication fidelity provides an excellent platform for understanding fundamental evolutionary processes. Furthermore, because mutation is the ultimate source of all genetic variation, clarifying why mutation rates vary is crucial for understanding all areas of biology. A potentially revealing hypothesis for mutation-rate evolution is that natural selection primarily operates to improve replication fidelity, with the ultimate limits to what can be achieved set by the power of random genetic drift. This drift-barrier hypothesis is consistent with comparative measures of mutation rates, provides a simple explanation for the existence of error-prone polymerases and yields a formal counter-argument to the view that selection fine-tunes gene-specific mutation rates.
Affiliation(s)
- Michael Lynch
- Department of Biology, Indiana University, Bloomington, Indiana 47401, USA
- Matthew S Ackerman
- Department of Biology, Indiana University, Bloomington, Indiana 47401, USA
- Jean-Francois Gout
- Department of Biology, Indiana University, Bloomington, Indiana 47401, USA
- Hongan Long
- Department of Biology, Indiana University, Bloomington, Indiana 47401, USA
- Way Sung
- Department of Biology, Indiana University, Bloomington, Indiana 47401, USA
- W Kelley Thomas
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire 03824, USA
- Patricia L Foster
- Department of Biology, Indiana University, Bloomington, Indiana 47401, USA

14
Persi E, Wolf YI, Koonin EV. Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins. Nat Commun 2016; 7:13570. [PMID: 27857066 PMCID: PMC5120217 DOI: 10.1038/ncomms13570] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 06/30/2016] [Accepted: 10/17/2016] [Indexed: 01/21/2023]
Abstract
Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.
Affiliation(s)
- Erez Persi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA

15
Battistuzzi FU, Schneider KA, Spencer MK, Fisher D, Chaudhry S, Escalante AA. Profiles of low complexity regions in Apicomplexa. BMC Evol Biol 2016; 16:47. [PMID: 26923229 PMCID: PMC4770516 DOI: 10.1186/s12862-016-0625-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/12/2015] [Accepted: 02/17/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Low complexity regions (LCRs) are a ubiquitous feature in genomes and yet their evolutionary history and functional roles are unclear. Previous studies have shown contrasting evidence in favor of both neutral and selective mechanisms of evolution for different sets of LCRs, suggesting that modes of identification of these regions may play a role in our ability to discern their evolutionary history. To further investigate this issue, we used a multiple threshold approach to identify species-specific profiles of proteome complexity and, by comparing properties of these sets, determine the influence that starting parameters have on evolutionary inferences. RESULTS We find that, although qualitatively similar, quantitatively each species has a unique LCR profile which represents the frequency of these regions within each genome. Inferences based on these profiles are more accurate in comparative analyses of genome complexity as they allow one to determine the relative complexity of multiple genomes as well as the type of repetitiveness that is most common in each. Based on the multiple threshold LCR sets obtained, we identified predominant evolutionary mechanisms at different complexity levels, which show neutral mechanisms acting on highly repetitive LCRs (e.g., homopolymers) and selective forces becoming more important as heterogeneity of the LCRs increases. CONCLUSIONS Our results show how inferences based on LCRs are influenced by the parameters used to identify these regions. Sets of LCRs are heterogeneous aggregates of regions that include homo- and heteropolymers and, as such, evolve according to different mechanisms. LCR profiles provide a new way to investigate genome complexity across species and to determine the driving mechanism of their evolution.
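The multiple-threshold idea can be illustrated with a minimal Python sketch. It assumes a SEG-like Shannon-entropy criterion over a sliding window; the window size of 12, the entropy cut-offs, and the function names are hypothetical choices, not the parameters or tools used by Battistuzzi et al. Lower thresholds retain only the most repetitive windows, higher ones also admit heteropolymeric stretches, so the resulting profile shows how low-complexity content changes with the threshold.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a sequence window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lcr_profile(seq: str, window: int = 12,
                thresholds: tuple = (1.0, 1.5, 2.0, 2.5)) -> dict:
    """Fraction of sliding windows whose entropy is at or below each threshold.

    Hypothetical stand-in for a multiple-threshold LCR profile: the lowest
    threshold keeps only windows dominated by one or two residues, higher
    ones also admit heteropolymeric low-complexity stretches.
    """
    if len(seq) < window:
        return {t: 0.0 for t in thresholds}
    entropies = [shannon_entropy(seq[i:i + window])
                 for i in range(len(seq) - window + 1)]
    return {t: sum(e <= t for e in entropies) / len(entropies) for t in thresholds}


# Made-up sequence: a serine run followed by a more complex stretch.
print(lcr_profile("SSSSSSSSSSSSMKTAYIAKQRQISFVKSHFSRQ"))
```

Comparing such profiles between species, instead of relying on a single cut-off, mirrors the abstract's point that evolutionary inferences depend on the identification parameters.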
Affiliation(s)
- Kristan A Schneider
- Department of MNI, University of Applied Sciences Mittweida, Mittweida, Germany.
- Matthew K Spencer
- Department of Geology and Physics, Lake Superior State University, Sault Ste. Marie, MI, USA.
- David Fisher
- David Eccles School of Business, University of Utah, Salt Lake City, UT, USA.
- Sophia Chaudhry
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA.
- Ananias A Escalante
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.

16
Al-Mamun HA, Kwan P, Clark SA, Ferdosi MH, Tellam R, Gondro C. Genome-wide association study of body weight in Australian Merino sheep reveals an orthologous region on OAR6 to human and bovine genomic regions affecting height and weight. Genet Sel Evol 2015; 47:66. [PMID: 26272623 PMCID: PMC4536601 DOI: 10.1186/s12711-015-0142-4] [Citation(s) in RCA: 96] [Impact Index Per Article: 9.6] [Received: 07/05/2014] [Accepted: 07/23/2015] [Indexed: 12/27/2022]
Abstract
Background Body weight (BW) is an important trait for meat production in sheep. Although over the past few years, numerous quantitative trait loci (QTL) have been detected for production traits in cattle, few QTL studies have been reported for sheep, with even fewer on meat production traits. Our objective was to perform a genome-wide association study (GWAS) with the medium-density Illumina Ovine SNP50 BeadChip to identify genomic regions and corresponding haplotypes associated with BW in Australian Merino sheep. Methods A total of 1781 Australian Merino sheep were genotyped using the medium-density Illumina Ovine SNP50 BeadChip. Among the 53 862 single nucleotide polymorphisms (SNPs) on this array, 48 640 were used to perform a GWAS using a linear mixed model approach. Genotypes were phased with hsphase; to estimate SNP haplotype effects, linkage disequilibrium blocks were identified in the detected QTL region. Results Thirty-nine SNPs were associated with BW at a Bonferroni-corrected genome-wide significance threshold of 1 %. One region on sheep (Ovis aries) chromosome 6 (OAR6) between 36.15 and 38.56 Mb included 13 significant SNPs that were associated with BW; the most significant SNP was OAR6_41936490.1 (P = 2.37 × 10−16) at 37.69 Mb with an allele substitution effect of 2.12 kg, which corresponds to 0.248 phenotypic standard deviations for BW. The region that surrounds this association signal on OAR6 contains three genes: leucine aminopeptidase 3 (LAP3), which is involved in the processing of the oxytocin precursor; NCAPG non-SMC condensin I complex, subunit G (NCAPG), which is associated with foetal growth and carcass size in cattle; and ligand dependent nuclear receptor corepressor-like (LCORL), which is associated with height in humans and cattle. Conclusions The GWAS analysis detected 39 SNPs associated with BW in sheep and a major QTL region was identified on OAR6. 
In several other mammalian species, regions that are syntenic with this region have been found to be associated with body size traits, which may reflect that the underlying biological mechanisms share a common ancestry. These findings should facilitate the discovery of causative variants for BW and contribute to marker-assisted selection. Electronic supplementary material The online version of this article (doi:10.1186/s12711-015-0142-4) contains supplementary material, which is available to authorized users.
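The significance threshold and effect size quoted above can be checked with a few lines of arithmetic. This sketch assumes the Bonferroni correction was applied across the 48 640 SNPs reported as used in the GWAS; the abstract does not state the exact number of tests, and the helper name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: correction over the 48,640 SNPs reported as used in the GWAS.
threshold = bonferroni_threshold(0.01, 48_640)
top_snp_p = 2.37e-16  # OAR6_41936490.1, as reported in the abstract

print(f"per-SNP threshold: {threshold:.2e}")  # per-SNP threshold: 2.06e-07
print(top_snp_p < threshold)                  # True

# An effect of 2.12 kg stated as 0.248 phenotypic SD implies an SD for BW of about:
print(round(2.12 / 0.248, 2))                 # 8.55
```

The implied phenotypic standard deviation of roughly 8.55 kg is a derived figure, not one reported by the authors.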
Affiliation(s)
- Hawlader A Al-Mamun
- School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia; School of Science and Technology, University of New England, Armidale, NSW, 2351, Australia.
- Paul Kwan
- School of Science and Technology, University of New England, Armidale, NSW, 2351, Australia.
- Samuel A Clark
- School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia.
- Mohammad H Ferdosi
- School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia.
- Ross Tellam
- CSIRO Animal, Food and Health Sciences, Queensland Bioscience Precinct, St. Lucia, QLD, 4067, Australia.
- Cedric Gondro
- School of Environmental and Rural Science, University of New England, Armidale, NSW, 2351, Australia.

17
Abstract
Amino acid repeats (AARs) are abundant in protein sequences. They have particular roles in protein function and evolution. Simple repeat patterns generated by DNA slippage tend to introduce length variations and point mutations in repeat regions. Loss of normal and gain of abnormal function owing to their variable length are potential risks leading to diseases. Repeats with complex patterns mostly refer to the functional domain repeats, such as the well-known leucine-rich repeat and WD repeat, which are frequently involved in protein–protein interaction. They are mainly derived from internal gene duplication events and stabilized by ‘gate-keeper’ residues, which play crucial roles in preventing inter-domain aggregation. AARs are widely distributed in different proteomes across a variety of taxonomic ranges, and especially abundant in eukaryotic proteins. However, their specific evolutionary and functional scenarios are still poorly understood. Identifying AARs in protein sequences is the first step for the further investigation of their biological function and evolutionary mechanism. In principle, this is an NP-hard problem, as most of the repeat fragments are shaped by a series of sophisticated evolutionary events and become latent periodical patterns. It is not possible to define a uniform criterion for detecting and verifying various repeat patterns. Instead, different algorithms based on different strategies have been developed to cope with different repeat patterns. In this review, we attempt to describe the amino acid repeat-detection algorithms currently available and compare their strategies based on an in-depth analysis of the biological significance of protein repeats.
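For the simplest class of AARs, homopolymeric runs, detection reduces to finding stretches of one repeated residue, which a regular expression with a backreference handles directly. The sketch below covers only that special case; the function name and the minimum run length of 5 are arbitrary illustrative choices, and degenerate or domain-level repeats (such as leucine-rich or WD repeats) need the dedicated algorithms the review compares.

```python
import re


def find_homopolymers(seq: str, min_len: int = 5) -> list:
    """Return (residue, start, length) for every single-residue run of at least min_len.

    Only the simplest AAR class is handled; degenerate or domain-level repeats
    need dedicated repeat-detection algorithms.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # (.) captures one residue; \1{k,} requires it to repeat at least k more times.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0))) for m in pattern.finditer(seq)]


print(find_homopolymers("MQQQQQPAAAAAAG"))  # [('Q', 1, 5), ('A', 7, 6)]
```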

18
Fu M, Huang Z, Mao Y, Tao S. Neighbor preferences of amino acids and context-dependent effects of amino acid substitutions in human, mouse, and dog. Int J Mol Sci 2014; 15:15963-80. [PMID: 25210846 PMCID: PMC4200849 DOI: 10.3390/ijms150915963] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/28/2014] [Revised: 08/27/2014] [Accepted: 09/02/2014] [Indexed: 12/23/2022]
Abstract
Amino acids show apparent propensities toward their neighbors. In addition to preferences of amino acids for their neighborhood context, amino acid substitutions are also considered to be context-dependent. However, context-dependence patterns of amino acid substitutions still remain poorly understood. Using relative entropy, we investigated the neighbor preferences of 20 amino acids and the context-dependent effects of amino acid substitutions with protein sequences in human, mouse, and dog. For 20 amino acids, the highest relative entropy was mostly observed at the nearest adjacent site on either the N- or C-terminal side, except for C and G. C showed the highest relative entropy at the third flanking site, and a periodic pattern was detected at G flanking sites. Furthermore, neighbor preference patterns of amino acids varied greatly in different secondary structures. We then comprehensively investigated the context-dependent effects of amino acid substitutions. Our results showed that nearly half of the 380 substitution types were evidently context dependent, and the context-dependent patterns relied on protein secondary structures. Among 20 amino acids, P elicited the greatest effect on amino acid substitutions. The underlying mechanisms of context-dependent effects of amino acid substitutions were possibly mutation bias at a DNA level and natural selection. Our findings may improve secondary structure prediction algorithms and protein design; moreover, this study provides useful information for developing empirical models of protein evolution that consider dependence between residues.
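The relative-entropy measure can be sketched as the Kullback-Leibler divergence, D = sum over a of p(a) log2(p(a)/q(a)), between the residues found at a fixed offset from a target amino acid (p) and the overall background composition (q). This is a minimal illustration under assumptions: the function name is hypothetical, and the paper's exact background model, pseudocounts, and per-structure stratification are not given in the abstract. Offset +1 denotes the C-terminal neighbour and -1 the N-terminal one.

```python
import math
from collections import Counter


def neighbor_relative_entropy(seqs, target: str, offset: int) -> float:
    """Relative entropy (bits) of residues `offset` positions from `target`
    versus the background residue composition of `seqs`."""
    background = Counter()
    neighbours = Counter()
    for s in seqs:
        background.update(s)
        for i, aa in enumerate(s):
            j = i + offset
            if aa == target and 0 <= j < len(s):
                neighbours[s[j]] += 1
    n_bg = sum(background.values())
    n_nb = sum(neighbours.values())
    if n_nb == 0:
        return 0.0  # target never observed with a neighbour at this offset
    # Every neighbour residue also occurs in the background, so q(a) > 0.
    return sum(
        (c / n_nb) * math.log2((c / n_nb) / (background[aa] / n_bg))
        for aa, c in neighbours.items()
    )


print(neighbor_relative_entropy(["ACACAC"], "A", 1))  # 1.0
```

In the toy input every A is followed by C, while C makes up half the background, so the divergence is log2(1 / 0.5) = 1 bit.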
Affiliation(s)
- Mingchuan Fu
- College of Life Sciences and State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling 712100, China.
- Zhuoran Huang
- College of Life Sciences and State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling 712100, China.
- Yuanhui Mao
- College of Life Sciences and State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling 712100, China.
- Shiheng Tao
- College of Life Sciences and State Key Laboratory of Crop Stress Biology in Arid Areas, Northwest A&F University, Yangling 712100, China.

19
Persi E, Horn D. Systematic analysis of compositional order of proteins reveals new characteristics of biological functions and a universal correlate of macroevolution. PLoS Comput Biol 2013; 9:e1003346. [PMID: 24278003 PMCID: PMC3836704 DOI: 10.1371/journal.pcbi.1003346] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/21/2013] [Accepted: 10/03/2013] [Indexed: 01/01/2023]
Abstract
We present a novel analysis of compositional order (CO) based on the occurrence of Frequent amino-acid Triplets (FTs) that appear much more often than expected by chance in protein sequences. The method captures all types of proteomic compositional order including single amino-acid runs, tandem repeats, periodic structure of motifs and otherwise low complexity amino-acid regions. We introduce new order measures, distinguishing between ‘regularity’, ‘periodicity’ and ‘vocabulary’, to quantify these phenomena and to facilitate the identification of evolutionary effects. Detailed analysis of representative species across the tree-of-life demonstrates that CO proteins exhibit numerous functional enrichments, including a wide repertoire of particular patterns of dependencies on regularity and periodicity. Comparison between human and mouse proteomes further reveals the interplay of CO with evolutionary trends, such as faster substitution rate in mouse leading to decrease of periodicity, while innovation along the human lineage leads to larger regularity. Large-scale analysis of 94 proteomes leads to systematic ordering of all major taxonomic groups according to FT-vocabulary size. This is measured by the count of Different Frequent Triplets (DFT) in proteomes. The latter provides a clear hierarchical delineation of vertebrates, invertebrates, plants, fungi and prokaryotes, with thermophiles showing the lowest level of FT-vocabulary. Among eukaryotes, this ordering correlates with phylogenetic proximity. Interestingly, in all kingdoms CO accumulation in the proteome has universal characteristics. We suggest that CO is a genomic-information correlate of both macroevolution and various protein functions. The results indicate a mechanism of genomic ‘innovation’ at the peptide level, involved in protein elongation, shaped in a universal manner by mutational and selective forces. 
Variations in compositionally ordered (CO) sections of proteins, such as amino acid runs, tandem repeats and low complexity regions, are often considered as a third type of genomic variation along with SNP and CNV. At the microevolutionary scale, they are involved in the rapid evolution of numerous biological functions and the development of novel phenotypic complex traits, including disease in human, in particular neurodegeneration and cancer. At the macroevolutionary scale, the best discriminating proteomic factor between super-kingdoms is the prevalence of CO proteins in eukaryotes. The analysis of CO structures has so far been quite eclectic. Here we introduce a novel unifying methodology, accounting for all types of low-complexity regions and repetitive phenomena, including the existence of large periodic structures in protein sequences. We define new CO measures providing insights into the correlation of CO with protein function and with evolution. In particular, a large-scale analysis of 94 proteomes shows that the CO vocabulary of frequently appearing amino acid triplets serves as a measure of taxonomic ordering separating major clades from each other. It unravels a missing genomic correlate of macroevolution and serves as a novel phylogenetic tool. This suggests that major CO generation occurs during the creation of a completely new species, i.e. during macroevolutionary events.
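The notion of a frequent triplet, one that occurs far more often than its residue composition predicts, can be sketched as follows. Assumptions: expected counts come from a simple independent-residue model, and the five-fold enrichment, the minimum count of 3, and the function name are hypothetical, not the statistic defined by Persi and Horn. Counting the distinct triplets returned across a whole proteome would give only a rough analogue of the DFT measure.

```python
from collections import Counter


def frequent_triplets(seq: str, fold: float = 5.0, min_count: int = 3) -> dict:
    """Overlapping triplets seen at least `fold` times more often than expected
    from single-residue composition, and at least `min_count` times overall."""
    n = len(seq)
    if n < 3:
        return {}
    freq = {aa: c / n for aa, c in Counter(seq).items()}
    observed = Counter(seq[i:i + 3] for i in range(n - 2))
    result = {}
    for trip, obs in observed.items():
        # Expected count under independent residues: (n - 2) * p(a) * p(b) * p(c).
        expected = (n - 2) * freq[trip[0]] * freq[trip[1]] * freq[trip[2]]
        if obs >= min_count and obs >= fold * expected:
            result[trip] = obs
    return result


# Made-up sequence: a PST tandem repeat followed by one copy of other residues.
ft = frequent_triplets("PSTPSTPSTPST" + "ACDEFGHIKLMNRVWY")
print(ft, len(ft))  # {'PST': 4, 'STP': 3, 'TPS': 3} 3
```

A tandem repeat of period 3 yields its three cyclic frames as frequent triplets, which is why periodic motifs inflate the triplet vocabulary.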
Affiliation(s)
- Erez Persi
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
- David Horn
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel

20
Tian X, Strassmann JE, Queller DC. A conserved extraordinarily long serine homopolymer in Dictyostelid amoebae. Heredity (Edinb) 2013; 112:215-8. [PMID: 24084645 DOI: 10.1038/hdy.2013.96] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/17/2012] [Revised: 06/12/2012] [Accepted: 08/30/2013] [Indexed: 12/19/2022]
Abstract
Eukaryotic protein sequences often contain amino-acid homopolymers that consist of a single amino acid repeated from several to dozens of times. Some of these are functional but others may persist largely because of high expansion rates due to DNA slippage. However, very long homopolymers with over a hundred repeats are very rare. We report an extraordinarily long homopolymer consisting of 306 tandem serine repeats from the single-celled eukaryote Dictyostelium discoideum, which also has a multicellular stage. The gene has a paralog with 132 repeats and orthologs, also with high serine repeat numbers, in various other Dictyostelid species. The conserved gene structure and protein sequences suggest that the homopolymer is functional. The high codon diversity and very poor alignment of serine codons in this gene between species similarly indicate functionality. This is because the serine homopolymer is conserved despite much DNA sequence change. A survey of other very long amino-acid homopolymers in eukaryotes shows that high codon diversity is the rule, suggesting that these too may be functional.
Affiliation(s)
- X Tian
- Department of Biology, Washington University in St Louis, St Louis, MO, USA
- J E Strassmann
- Department of Biology, Washington University in St Louis, St Louis, MO, USA
- D C Queller
- Department of Biology, Washington University in St Louis, St Louis, MO, USA

21
Chong Z, Zhai W, Li C, Gao M, Gong Q, Ruan J, Li J, Jiang L, Lv X, Hungate E, Wu CI. The evolution of small insertions and deletions in the coding genes of Drosophila melanogaster. Mol Biol Evol 2013; 30:2699-708. [PMID: 24077769 DOI: 10.1093/molbev/mst167] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/24/2023]
Abstract
Studies of protein evolution have focused on amino acid substitutions with much less systematic analysis on insertion and deletions (indels) in protein coding genes. We hence surveyed 7,500 genes between Drosophila melanogaster and D. simulans, using D. yakuba as an outgroup for this purpose. The evolutionary rate of coding indels is indeed low, at only 3% of that of nonsynonymous substitutions. As coding indels follow a geometric distribution in size and tend to fall in low-complexity regions of proteins, it is unclear whether selection or mutation underlies this low rate. To resolve the issue, we collected genomic sequences from an isogenic African line of D. melanogaster (ZS30) at a high coverage of 70× and analyzed indel polymorphism between ZS30 and the reference genome. In comparing polymorphism and divergence, we found that the divergence to polymorphism ratio (i.e., fixation index) for smaller indels (size ≤ 10 bp) is very similar to that for synonymous changes, suggesting that most of the within-species polymorphism and between-species divergence for indels are selectively neutral. Interestingly, deletions of larger sizes (size ≥ 11 bp and ≤ 30 bp) have a much higher fixation index than synonymous mutations and 44.4% of fixed middle-sized deletions are estimated to be adaptive. To our surprise, this pattern is not found for insertions. Protein indel evolution appears to be in a dynamic flux of neutrally driven expansion (insertions) together with adaptively driven contraction (deletions), and these observations provide important insights for understanding the fitness of new mutations as well as the evolutionary driving forces for genomic evolution in Drosophila species.
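The comparison of fixation indices can be made concrete with a McDonald-Kreitman-style calculation, in which a putatively neutral class (here, synonymous changes) calibrates the divergence-to-polymorphism ratio expected without selection. The estimator alpha = 1 - (D_s * P_x) / (D_x * P_s) is a common choice but is an assumption here: the abstract does not state how the 44.4% figure was derived, and the counts below are invented for illustration.

```python
def fixation_index(divergence: int, polymorphism: int) -> float:
    """Divergence-to-polymorphism ratio D/P for one class of changes."""
    return divergence / polymorphism


def fraction_adaptive(d_test: int, p_test: int, d_neutral: int, p_neutral: int) -> float:
    """McDonald-Kreitman-style alpha = 1 - (D_neutral * P_test) / (D_test * P_neutral).

    Equivalent to 1 minus the ratio of the neutral and test fixation indices.
    """
    return 1.0 - fixation_index(d_neutral, p_neutral) / fixation_index(d_test, p_test)


# Hypothetical counts, for illustration only.
print(fraction_adaptive(d_test=20, p_test=5, d_neutral=10, p_neutral=5))  # 0.5
print(fraction_adaptive(d_test=10, p_test=5, d_neutral=10, p_neutral=5))  # 0.0
```

When a class of indels has the same D/P ratio as synonymous changes, alpha is 0, which is the pattern the abstract reports for small indels.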
Affiliation(s)
- Zechen Chong
- Center for Computational Biology and Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China

22
Loire E, Higuet D, Netter P, Achaz G. Evolution of coding microsatellites in primate genomes. Genome Biol Evol 2013; 5:283-95. [PMID: 23315383 PMCID: PMC3590770 DOI: 10.1093/gbe/evt003] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/07/2023]
Abstract
Microsatellites (SSRs) are highly susceptible to expansions and contractions. When located in a coding sequence, the insertion or the deletion of a single unit for a mono-, di-, tetra-, or penta(nucleotide)-SSR creates a frameshift. As a consequence, one would expect to find only very few of these SSRs in coding sequences because of their strong deleterious potential. Unexpectedly, genomes contain many coding SSRs of all types. Here, we report on a study of their evolution in a phylogenetic context using the genomes of four primates: human, chimpanzee, orangutan, and macaque. In a set of 5,015 orthologous genes unambiguously aligned among the four species, we show that, except for tri- and hexa-SSRs, for which insertions and deletions are frequently observed, SSRs in coding regions evolve mainly by substitutions. We show that the rate of substitution in all types of coding SSRs is typically two times higher than in the rest of coding sequences. Additionally, we observe that although numerous coding SSRs are created and lost by substitutions in the lineages, their numbers remain constant. This last observation suggests that the coding SSRs have reached equilibrium. We hypothesize that this equilibrium involves a combination of mutation, drift, and selection. We thus estimated the fitness cost of mono-SSRs and show that it increases with the number of units. We finally show that the cost of coding mono-SSRs greatly varies from function to function, suggesting that the strength of the selection that acts against them can be correlated to gene functions.
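The frameshift argument is modular arithmetic: adding or removing one repeat unit of k nucleotides preserves the reading frame only when k is a multiple of 3. The short check below (function name hypothetical) reproduces the classification in the abstract for unit lengths 1 to 6.

```python
def unit_indel_shifts_frame(unit_length: int) -> bool:
    """True if inserting or deleting one repeat unit of this length shifts the reading frame."""
    return unit_length % 3 != 0


names = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}
for k, name in names.items():
    print(f"{name}-SSR: frameshift={unit_indel_shifts_frame(k)}")
```

Only tri- and hexa-SSRs tolerate single-unit indels without a frameshift, consistent with the abstract's observation that indels are frequent only in those two classes.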
Affiliation(s)
- Etienne Loire
- UMR 7138, Systématique, Adaptation, Evolution (UPMC, CNRS, MNHN, IRD), Paris, France

23
Scala C, Tian X, Mehdiabadi NJ, Smith MH, Saxer G, Stephens K, Buzombo P, Strassmann JE, Queller DC. Amino acid repeats cause extraordinary coding sequence variation in the social amoeba Dictyostelium discoideum. PLoS One 2012; 7:e46150. [PMID: 23029418 PMCID: PMC3460934 DOI: 10.1371/journal.pone.0046150] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 07/16/2012] [Accepted: 08/28/2012] [Indexed: 12/19/2022]
Abstract
Protein sequences are normally the most conserved elements of genomes owing to purifying selection to maintain their functions. We document an extraordinary amount of within-species protein sequence variation in the model eukaryote Dictyostelium discoideum stemming from triplet DNA repeats coding for long strings of single amino acids. D. discoideum has a very large number of such strings, many of which are polyglutamine repeats, the same sequence that causes various neurological disorders in humans, such as Huntington’s disease. We show here that D. discoideum coding repeat loci are highly variable among individuals, making D. discoideum a candidate for the most variable proteome. The coding repeat loci are not significantly less variable than similar non-coding triplet repeats. This pattern is consistent with these amino-acid repeats being largely non-functional sequences evolving primarily by mutation and drift.
Affiliation(s)
- Clea Scala
- Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas, United States of America
- Xiangjun Tian
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Natasha J. Mehdiabadi
- Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas, United States of America
- Margaret H. Smith
- Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas, United States of America
- Gerda Saxer
- Department of Biochemistry and Cell Biology, Rice University, Houston, Texas, United States of America
- Katie Stephens
- Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas, United States of America
- Prince Buzombo
- Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas, United States of America
- Joan E. Strassmann
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, United States of America
- David C. Queller
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, United States of America

24
Radó-Trilla N, Albà M. Dissecting the role of low-complexity regions in the evolution of vertebrate proteins. BMC Evol Biol 2012; 12:155. [PMID: 22920595 PMCID: PMC3523016 DOI: 10.1186/1471-2148-12-155] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 04/17/2012] [Accepted: 07/30/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance, and their capacity to expand in relatively short periods of time through replication slippage, they can greatly contribute to increase protein sequence space and generate novel protein functions. However, little is known about the global impact of LCRs on protein evolution. RESULTS We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H.sapiens, M.musculus, G.gallus, D.rerio and C.intestinalis. Transcriptional factors and other regulatory functions are overrepresented in proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion whereas the loss of LCRs is more often due to accumulation of amino acid substitutions as opposed to deletions. This dichotomy results in net protein sequence gain over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than for other LCR types. LCRs enriched in positively charged amino acids show the contrary pattern, indicating an important effect of purifying selection in their maintenance. CONCLUSION We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.
Affiliation(s)
- Núria Radó-Trilla
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics - IMIM Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona 08003, Spain

25
Pointer MA, Kamilar JM, Warmuth V, Chester SGB, Delsuc F, Mundy NI, Asher RJ, Bradley BJ. RUNX2 tandem repeats and the evolution of facial length in placental mammals. BMC Evol Biol 2012; 12:103. [PMID: 22741925 PMCID: PMC3438065 DOI: 10.1186/1471-2148-12-103] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 01/12/2012] [Accepted: 06/28/2012] [Indexed: 01/21/2023]
Abstract
Background When simple sequence repeats are integrated into functional genes, they can potentially act as evolutionary ‘tuning knobs’, supplying abundant genetic variation with minimal risk of pleiotropic deleterious effects. The genetic basis of variation in facial shape and length represents a possible example of this phenomenon. Runt-related transcription factor 2 (RUNX2), which is involved in osteoblast differentiation, contains a functionally-important tandem repeat of glutamine and alanine amino acids. The ratio of glutamines to alanines (the QA ratio) in this protein seemingly influences the regulation of bone development. Notably, in domestic breeds of dog, and in carnivorans in general, the ratio of glutamines to alanines is strongly correlated with facial length. Results In this study we examine whether this correlation holds true across placental mammals, particularly those mammals for which facial length is highly variable and related to adaptive behavior and lifestyle (e.g., primates, afrotherians, xenarthrans). We obtained relative facial length measurements and RUNX2 sequences for 41 mammalian species representing 12 orders. Using both a phylogenetic generalized least squares model and a recently-developed Bayesian comparative method, we tested for a correlation between genetic and morphometric data while controlling for phylogeny, evolutionary rates, and divergence times. Non-carnivoran taxa generally had substantially lower glutamine-alanine ratios than carnivorans (primates and xenarthrans with means of 1.34 and 1.25, respectively, compared to a mean of 3.1 for carnivorans), and we found no correlation between RUNX2 sequence and face length across placental mammals. Conclusions Results of our diverse comparative phylogenetic analyses indicate that QA ratio does not consistently correlate with face length across the 41 mammalian taxa considered. 
Thus, although RUNX2 might function as a ‘tuning knob’ modifying face length in carnivorans, this relationship is not conserved across mammals in general.
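The QA ratio described above is a simple count ratio over the Gln/Ala repeat tract. A minimal Python sketch, assuming the tract has already been excised from an aligned RUNX2 sequence; the function name `qa_ratio` and the example tract are hypothetical, not taken from the study:

```python
def qa_ratio(repeat_tract: str) -> float:
    """Number of glutamines (Q) divided by number of alanines (A) in a
    Gln/Ala tandem repeat tract (hypothetical helper for illustration)."""
    tract = repeat_tract.upper()
    n_q = tract.count("Q")
    n_a = tract.count("A")
    if n_a == 0:
        raise ValueError("tract contains no alanines; QA ratio is undefined")
    return n_q / n_a


# Hypothetical tract: 18 glutamines followed by 6 alanines (not a real RUNX2 allele).
print(qa_ratio("Q" * 18 + "A" * 6))  # 3.0
```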
Affiliation(s)
- Marie A Pointer
- Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, UK
26
Ryan CP, Crespi BJ. Androgen receptor polyglutamine repeat number: models of selection and disease susceptibility. Evol Appl 2012; 6:180-96. [PMID: 23467468 PMCID: PMC3586616 DOI: 10.1111/j.1752-4571.2012.00275.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 01/18/2012] [Accepted: 05/04/2012] [Indexed: 12/14/2022]
Abstract
Variation in polyglutamine repeat number in the androgen receptor (AR CAGn) is negatively correlated with the transcription of androgen-responsive genes and is associated with susceptibility to an extensive list of human diseases. Only a small portion of the heritability for many of these diseases is explained by conventional SNP-based genome-wide association studies, and the forces shaping AR CAGn among humans remain largely unexplored. Here, we propose evolutionary models for understanding selection at the AR CAG locus, namely balancing selection, sexual conflict, accumulation-selection, and antagonistic pleiotropy. We evaluate these models by examining AR CAGn-linked susceptibility to eight extensively studied diseases representing the diverse physiological roles of androgens, and consider the costs of these diseases by their frequency and fitness effects. Five diseases could contribute to the distribution of AR CAGn observed among contemporary human populations. With support for disease susceptibilities associated with both long and short AR CAGn, balancing selection provides a useful model for studying selection at this locus. Gender-specific differences in AR CAGn health effects also support this locus as a candidate for sexual conflict over repeat number. Accompanied by the accumulation of AR CAGn in humans, these models help explain the distribution of repeat number in contemporary human populations.
Affiliation(s)
- Calen P Ryan
- Department of Biological Sciences, Simon Fraser University Burnaby, BC, Canada
27
Ramazzotti M, Monsellier E, Kamoun C, Degl'Innocenti D, Melki R. Polyglutamine repeats are associated to specific sequence biases that are conserved among eukaryotes. PLoS One 2012; 7:e30824. [PMID: 22312432 PMCID: PMC3270027 DOI: 10.1371/journal.pone.0030824] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 10/06/2011] [Accepted: 12/23/2011] [Indexed: 12/20/2022]
Abstract
Nine human neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxias, are associated with the aggregation of proteins comprising an extended tract of consecutive glutamine residues (polyQs) once it exceeds a certain length threshold. This event is believed to be the consequence of the expansion of polyCAG codons during the replication process. This is in apparent contradiction with the fact that many polyQ-containing proteins remain soluble and are encoded by invariant genes in a number of eukaryotes. The latter suggests that polyQ expansion and/or aggregation might be counter-selected through a genetic and/or protein context. To identify this context, we designed software that scrutinizes entire proteomes in search of imperfect polyQs. The nature of residues flanking the polyQs and that of residues other than Gln within polyQs (insertions) were assessed. We discovered strong amino acid residue biases robustly associated with polyQs in the 15 eukaryotic proteomes we examined, with an over-representation of Pro, Leu and His and an under-representation of Asp, Cys and Gly residues. These biases are conserved among unrelated proteins and are independent of specific functional classes. Our findings suggest that specific residues have been co-selected with polyQs during evolution. We discuss the possible selective pressures responsible for the observed biases.
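The scan described above (imperfect polyQs, their flanks and their non-Gln insertions) can be sketched in Python as below. This is not the authors' software: the thresholds `min_q=8`, `max_gap=1` and `flank=5`, the function names, and the toy sequence are all assumptions chosen for illustration.

```python
from collections import Counter


def find_imperfect_polyq(seq, min_q=8, max_gap=1):
    """Return half-open (start, end) spans that begin and end with Q, contain at
    least min_q glutamines, and never have more than max_gap consecutive non-Q
    residues (the "insertions" inside the tract). Thresholds are hypothetical."""
    spans = []
    i, n = 0, len(seq)
    while i < n:
        if seq[i] != "Q":
            i += 1
            continue
        start = last_q = i
        n_q, gap = 0, 0
        for j in range(i, n):
            if seq[j] == "Q":
                n_q += 1
                last_q = j
                gap = 0
            else:
                gap += 1
                if gap > max_gap:
                    break
        if n_q >= min_q:
            spans.append((start, last_q + 1))
        i = last_q + 1  # resume scanning just after the last Q of this tract
    return spans


def flank_and_insertion_counts(seq, spans, flank=5):
    """Count residues in the windows on either side of each tract and the
    non-Q residues inserted within it."""
    flanks, insertions = Counter(), Counter()
    for start, end in spans:
        flanks.update(seq[max(0, start - flank):start])
        flanks.update(seq[end:end + flank])
        insertions.update(c for c in seq[start:end] if c != "Q")
    return flanks, insertions


# Toy sequence (not from any real proteome): one imperfect tract with a Pro insertion.
seq = "MSTP" + "QQQQPQQQQ" + "LHAD"
spans = find_imperfect_polyq(seq)
flanks, insertions = flank_and_insertion_counts(seq, spans)
print(spans)       # [(4, 13)]
print(insertions)  # Counter({'P': 1})
```

Summing these counters over a whole proteome, and comparing them with background residue frequencies, would give the kind of over- and under-representation the abstract reports.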
Affiliation(s)
- Matteo Ramazzotti
- Dipartimento di Scienze Biochimiche, Università degli Studi di Firenze, Florence, Italy
- Elodie Monsellier
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- Choumouss Kamoun
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- Ronald Melki
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
28
Zhou Y, Liu J, Han L, Li ZG, Zhang Z. Comprehensive analysis of tandem amino acid repeats from ten angiosperm genomes. BMC Genomics 2011; 12:632. [PMID: 22195734 PMCID: PMC3283746 DOI: 10.1186/1471-2164-12-632] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/02/2011] [Accepted: 12/23/2011] [Indexed: 11/30/2022]
Abstract
BACKGROUND The presence of tandem amino acid repeats (AARs) is one of the signatures of eukaryotic proteins. AARs were thought to be frequently involved in bio-molecular interactions. Comprehensive studies that primarily focused on metazoan AARs have suggested that AARs are evolving rapidly and are highly variable among species. However, there is still controversy over causal factors of this inter-species variation. In this work, we attempted to investigate this topic mainly by comparing AARs in orthologous proteins from ten angiosperm genomes. RESULTS Angiosperm AAR content is positively correlated with the GC content of the protein coding sequence. However, based on observations from fungal AARs and insect AARs, we argue that the applicability of this kind of correlation is limited by AAR residue composition and species' life history traits. Angiosperm AARs also tend to be fast evolving and structurally disordered, supporting the results of comprehensive analyses of metazoans. The functions of conserved long AARs are summarized. Finally, we propose that the rapid mRNA decay rate, alternative splicing and tissue specificity are regulatory processes that are associated with angiosperm proteins harboring AARs. CONCLUSIONS Our investigation suggests that GC content is a predictor of AAR content in the protein coding sequence under certain conditions. Although angiosperm AARs lack conservation and 3D structure, a fraction of the proteins that contain AARs may be functionally important and are under extensive regulation in plant cells.
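The two quantities correlated above, GC content of the coding sequence and AAR content of the encoded protein, can be computed per protein as follows. A minimal sketch, assuming an AAR is any single-residue run of at least five residues (a hypothetical threshold, not necessarily the definition used in the study); the per-protein values could then be correlated across orthologs or species.

```python
import re


def gc_content(cds: str) -> float:
    """Fraction of G and C nucleotides in a coding sequence."""
    cds = cds.upper()
    return (cds.count("G") + cds.count("C")) / len(cds) if cds else 0.0


def aar_content(protein: str, min_len: int = 5) -> float:
    """Fraction of residues lying in single-residue runs of at least min_len
    (min_len=5 is an assumed threshold for illustration)."""
    if not protein:
        return 0.0
    run = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    covered = sum(m.end() - m.start() for m in run.finditer(protein))
    return covered / len(protein)


# Toy CDS (not a real angiosperm gene): ATG + five GCG (Ala) codons + TAA stop,
# which translates to "MAAAAA".
print(gc_content("ATGGCGGCGGCGGCGGCGTAA"))  # 16/21
print(aar_content("MAAAAA"))                # 5/6
```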
Affiliation(s)
- Yuan Zhou
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Jing Liu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhi-Gang Li
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
29
Luo H, Lin K, David A, Nijveen H, Leunissen JAM. ProRepeat: an integrated repository for studying amino acid tandem repeats in proteins. Nucleic Acids Res 2011; 40:D394-9. [PMID: 22102581 PMCID: PMC3245022 DOI: 10.1093/nar/gkr1019] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/13/2023]
Abstract
ProRepeat (http://prorepeat.bioinformatics.nl/) is an integrated curated repository and analysis platform for in-depth research on the biological characteristics of amino acid tandem repeats. ProRepeat collects repeats from all proteins included in the UniProt knowledgebase, together with 85 completely sequenced eukaryotic proteomes contained within the RefSeq collection. It contains non-redundant perfect tandem repeats, approximate tandem repeats and simple, low-complexity sequences, covering the majority of the amino acid tandem repeat patterns found in proteins. The ProRepeat web interface allows querying the repeat database using repeat characteristics such as repeat unit and length, number of repetitions of the repeat unit and position of the repeat in the protein. Users can also search for repeats by the characteristics of repeat-containing proteins, such as entry ID, protein description, sequence length, gene name and taxon. ProRepeat offers powerful analysis tools for finding biologically interesting properties of repeats, such as the strong position bias of leucine repeats toward the N-terminus of eukaryotic protein sequences, differences in repeat abundance among proteomes, the functional classification of repeat-containing proteins and GC-content constraints on the codons encoding repeats.
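To illustrate the kind of record such a repository stores (repeat unit, number of repetitions, position in the protein), here is a naive Python scan for perfect tandem repeats. It is a hypothetical sketch, not ProRepeat's detection method or web API; `max_unit`, `min_copies` and the toy sequence are assumptions, and overlaps between hits of different unit lengths are not resolved.

```python
def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repetition of a shorter unit
    (e.g. 'QA' is primitive, 'LL' is not)."""
    return unit not in (unit + unit)[1:-1]


def perfect_tandem_repeats(seq: str, max_unit: int = 3, min_copies: int = 3):
    """Return (start, unit, copies) for perfect tandem repeats, scanning each
    unit length left to right and skipping past each reported repeat."""
    hits = []
    for k in range(1, max_unit + 1):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if not is_primitive(unit):  # e.g. 'LL' is already covered by unit 'L'
                i += 1
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * k
            else:
                i += 1
    return hits


print(perfect_tandem_repeats("MLLLLLPQAQAQAQAS"))
# [(1, 'L', 5), (7, 'QA', 4)]
```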
Affiliation(s)
- Hong Luo
- Laboratory of Bioinformatics, Wageningen University and Research Centre, PO Box 569, 6700 AN Wageningen, Netherlands
30
Haerty W, Golding GB. Increased polymorphism near low-complexity sequences across the genomes of Plasmodium falciparum isolates. Genome Biol Evol 2011; 3:539-50. [PMID: 21602572 PMCID: PMC3140889 DOI: 10.1093/gbe/evr045] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
Low-complexity regions (LCRs) within protein sequences are often considered to evolve neutrally, even though recent studies have reported evidence for selection acting on some of them. Because of their widespread distribution among eukaryotic genomes and the potential deleterious effect of the expansion/contraction of some of them in humans, low-complexity sequences are of major interest, and numerous studies have attempted to describe their dynamics between genomes, identify the factors correlated with their variation and assess their selective value. However, due to the scarcity of individual genomes within a species, most analyses so far have been performed at the species level, with the implicit assumption that variation in both composition and size within species is too small relative to between-species divergence to affect the conclusions. Here we used the available genomes of 14 Plasmodium falciparum isolates to assess the relationship between low-complexity sequence variation and factors such as nucleotide polymorphism across strains, sequence composition, and protein expression. We report that more than half of the 7,711 low-complexity sequences found within aligned coding sequences are variable in size among strains. Across strains, we observed an increasing density of polymorphic sites toward the LCR boundaries. This observation strongly suggests the joint effects of lowered selective constraints on low-complexity sequences and a mutagenic effect of these simple sequences.
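The boundary effect reported above amounts to binning flanking sites by their distance to the nearest LCR and computing the fraction that are polymorphic in each bin. A minimal sketch under assumed inputs (0-based alignment coordinates, half-open LCR intervals, a hypothetical `bin_size`); it is not the authors' pipeline, and the toy data are made up for illustration.

```python
from collections import Counter


def distance_to_boundary(pos, lcrs):
    """Distance from pos to the nearest LCR edge residue; lcrs are half-open
    (start, end) intervals, so a site adjacent to an LCR has distance 1."""
    return min(min(abs(pos - s), abs(pos - (e - 1))) for s, e in lcrs)


def density_by_distance(seq_len, polymorphic, lcrs, bin_size=10):
    """Fraction of polymorphic flanking sites, binned by distance to the nearest LCR."""
    inside = {p for s, e in lcrs for p in range(s, e)}
    polymorphic = set(polymorphic)
    sites, hits = Counter(), Counter()
    for pos in range(seq_len):
        if pos in inside:
            continue  # only flanking (non-LCR) sites are scored
        b = distance_to_boundary(pos, lcrs) // bin_size
        sites[b] += 1
        hits[b] += pos in polymorphic
    return {b: hits[b] / sites[b] for b in sorted(sites)}


# Toy alignment: 30 sites, one LCR at sites 10-19, polymorphisms at sites 2, 9 and 21.
print(density_by_distance(30, [2, 9, 21], [(10, 20)], bin_size=5))
# {0: 0.25, 1: 0.1, 2: 0.0}
```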
Affiliation(s)
- Wilfried Haerty
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
31
Haerty W, Golding GB. Low-complexity sequences and single amino acid repeats: not just "junk" peptide sequences. Genome 2011; 53:753-62. [PMID: 20962881 DOI: 10.1139/g10-063] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 12/21/2022]
Abstract
For decades proteins were thought to interact in a "lock and key" system, which led to the definition of a paradigm linking stable three-dimensional structure to biological function. As a consequence, any non-structured peptide was considered to be nonfunctional and to evolve neutrally. Surprisingly, the most commonly shared peptides between eukaryotic proteomes are low-complexity sequences that in most conditions do not present a stable three-dimensional structure. However, because these sequences evolve rapidly and because the size variation of a few of them can have deleterious effects, low-complexity sequences have been suggested to be the target of selection. Here we review evidence that supports the idea that these simple sequences should not be considered just "junk" peptides and that selection drives the evolution of many of them.
Affiliation(s)
- Wilfried Haerty
- Biology Department, McMaster University, Hamilton, ON, Canada
32
Role of Everlasting Triplet Expansions in Protein Evolution. J Mol Evol 2010; 72:232-9. [DOI: 10.1007/s00239-010-9425-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/09/2010] [Accepted: 12/01/2010] [Indexed: 02/05/2023]
33
Whan V, Hobbs M, McWilliam S, Lynn DJ, Lutzow YS, Khatkar M, Barendse W, Raadsma H, Tellam RL. Bovine proteins containing poly-glutamine repeats are often polymorphic and enriched for components of transcriptional regulatory complexes. BMC Genomics 2010; 11:654. [PMID: 21092319 PMCID: PMC3014979 DOI: 10.1186/1471-2164-11-654] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 08/11/2010] [Accepted: 11/23/2010] [Indexed: 11/12/2022]
Abstract
Background About forty human diseases are caused by repeat instability mutations. A distinct subset of these diseases is the result of extreme expansions of polymorphic trinucleotide repeats; typically CAG repeats encoding poly-glutamine (poly-Q) tracts in proteins. Polymorphic repeat length variation is also apparent in human poly-Q encoding genes from normal individuals. As these coding sequence repeats are subject to selection in mammals, it has been suggested that normal variations in some of these typically highly conserved genes are implicated in morphological differences between species and phenotypic variations within species. At present, poly-Q encoding genes in non-human mammalian species are poorly documented, as are their functions and propensities for polymorphic variation. Results The current investigation identified 178 bovine poly-Q encoding genes (Q ≥ 5) and within this group, 26 genes with orthologs in both human and mouse that did not contain poly-Q repeats. The bovine poly-Q encoding genes typically had ubiquitous expression patterns although there was bias towards expression in epithelia, brain and testes. They were also characterised by unusually large sizes. Analysis of gene ontology terms revealed that the encoded proteins were strongly enriched for functions associated with transcriptional regulation and many contributed to physical interaction networks in the nucleus where they presumably act cooperatively in transcriptional regulatory complexes. In addition, the coding sequence CAG repeats in some bovine genes impacted mRNA splicing thereby generating unusual transcriptional diversity, which in at least one instance was tissue-specific. The poly-Q encoding genes were prioritised using multiple criteria for their likelihood of being polymorphic and then the highest ranking group was experimentally tested for polymorphic variation within a cattle diversity panel. Extensive and meiotically stable variation was identified. 
Conclusions Transcriptional diversity can potentially be generated in poly-Q encoding genes by the impact of CAG repeat tracts on mRNA alternative splicing. This effect, combined with the physical interactions of the encoded proteins in large transcriptional regulatory complexes suggests that polymorphic variations of proteins in these complexes have strong potential to affect phenotype.
Affiliation(s)
- Vicki Whan
- CSIRO Livestock Industries, Queensland Bioscience Precinct, 306 Carmody Rd, St Lucia, Queensland 4067, Australia
34
Birge LM, Pitts ML, Richard BH, Wilkinson GS. Length polymorphism and head shape association among genes with polyglutamine repeats in the stalk-eyed fly, Teleopsis dalmanni. BMC Evol Biol 2010; 10:227. [PMID: 20663190 PMCID: PMC3055267 DOI: 10.1186/1471-2148-10-227] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/08/2010] [Accepted: 07/27/2010] [Indexed: 12/03/2022]
Abstract
BACKGROUND Polymorphisms of single amino acid repeats (SARPs) are a potential source of genetic variation for rapidly evolving morphological traits. Here, we characterize variation in and test for an association between SARPs and head shape, a trait under strong sexual selection, in the stalk-eyed fly, Teleopsis dalmanni. Using an annotated expressed sequence tag database developed from eye-antennal imaginal disc tissues in T. dalmanni, we identified 98 genes containing nine or more consecutive copies of a single amino acid. We then quantified variation in length and allelic diversity for 32 codon and 15 noncodon repeat regions in a large outbred population. We also assessed the frequency with which amino acid repeats are either gained or lost by identifying sequence similarities between T. dalmanni SARP loci and their orthologs in Drosophila melanogaster. Finally, to identify SARP-containing genes that may influence head development we conducted a two-generation association study after assortatively mating for extreme relative eyespan. RESULTS We found that glutamine repeats occur more often than expected by amino acid abundance among 3,400 head development genes in T. dalmanni and D. melanogaster. Furthermore, glutamine repeats occur disproportionately in transcription factors. Loci with glutamine repeats exhibit heterozygosities and allelic diversities that do not differ from noncoding dinucleotide microsatellites, including greater variation among X-linked than autosomal regions. In the majority of cases, repeat tracts did not overlap between T. dalmanni and D. melanogaster indicating that large glutamine repeats are gained or lost frequently during Dipteran evolution. Analysis of covariance reveals a significant effect of parental genotype on mean progeny eyespan, with body length as a covariate, at six SARP loci [CG33692, ptip, band4.1 inhibitor LRP interactor, corto, 3531953:1, and ecdysone-induced protein 75B (Eip75B)]. 
Mixed model analysis of covariance using the eyespan of siblings segregating for repeat length variation confirms that significant genotype-phenotype associations exist for at least one sex at five of these loci and for one gene, CG33692, longer repeats were associated with longer relative eyespan in both sexes. CONCLUSION Among genes expressed during head development in stalk-eyed flies, long codon repeats typically contain glutamine, occur in transcription factors and exhibit high levels of heterozygosity. Furthermore, the presence of significant associations within families between repeat length and head shape indicates that six genes, or genes linked to them, contribute genetic variation to the development of this extremely sexually dimorphic trait.
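The heterozygosity and allelic diversity compared above can be computed from diploid repeat-length genotypes as sketched below. Assumptions: expected heterozygosity is taken as 1 − Σp² without sample-size correction, allelic diversity is taken as the number of distinct alleles, and the genotypes and function names are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """Gene diversity 1 - sum(p_i^2) from allele frequencies (no small-sample correction)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def observed_heterozygosity(genotypes):
    """Fraction of diploid individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def allelic_diversity(genotypes):
    """Number of distinct alleles (here, distinct repeat lengths) in the sample."""
    return len({allele for genotype in genotypes for allele in genotype})


# Hypothetical diploid genotypes scored as glutamine-repeat lengths.
sample = [(12, 12), (12, 14), (14, 15), (12, 15)]
print(expected_heterozygosity(sample))  # 0.625
print(observed_heterozygosity(sample))  # 0.75
print(allelic_diversity(sample))        # 3
```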
Affiliation(s)
- Leanna M Birge
- Department of Biology, University of Maryland, College Park, MD 20742 USA
- University College London, Research Department of Genetics, Evolution and Environment, Wolfson House, 4 Stephenson Way, London, NW1 2HE, UK
- Marie L Pitts
- Department of Biology, The College of William and Mary, Williamsburg, VA 23187 USA
- Baker H Richard
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, 10024 USA
- Gerald S Wilkinson
- Department of Biology, University of Maryland, College Park, MD 20742 USA
35
Mularoni L, Ledda A, Toll-Riera M, Albà MM. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Res 2010; 20:745-54. [PMID: 20335526 DOI: 10.1101/gr.101261.109] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Indexed: 11/25/2022]
Abstract
Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences--and thus expected to be subject to the same mutational forces--but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.
Affiliation(s)
- Loris Mularoni
- Biomedical Informatics Research Programme (GRIB), Fundació Institut Municipal d'Investigació Mèdica, Barcelona 08003, Spain