1
Sun Y, Shui K, Li Q, Liu C, Jin W, Ni JQ, Lu J, Zhang L. Upstream open reading frames dynamically modulate CLOCK protein translation to regulate circadian rhythms and sleep. PLoS Biol 2025; 23:e3003173. [PMID: 40354412 DOI: 10.1371/journal.pbio.3003173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Accepted: 04/18/2025] [Indexed: 05/14/2025] Open
Abstract
The circadian rhythm is an evolutionarily conserved mechanism with translational regulation increasingly recognized as pivotal in its modulation. In this study, we found that upstream open reading frames (uORFs) are enriched in Drosophila circadian rhythm genes, with particularly conserved uORFs present in core circadian clock genes. We demonstrate evidence that the uORFs of the core clock gene, Clock (Clk), rhythmically and substantially attenuate CLK protein translation in Drosophila, with pronounced suppression occurring during daylight hours. Eliminating Clk uORFs leads to increased CLK protein levels during the day and results in a shortened circadian cycle, along with a broad shift in clock gene expression rhythms. Notably, Clk uORF deletion also augments morning sleep by reducing dopaminergic activity. Beyond daily circadian adjustments, Clk uORFs play a role in modulating sleep patterns in response to seasonal daylight variations. Furthermore, the Clk uORFs act as an important regulator to shape the rhythmic expression of a vast array of genes and influence multifaceted physiological outcomes. Collectively, our research sheds light on the intricate ways uORFs dynamically adjust downstream coding sequences to acclimate to environmental shifts.
Affiliation(s)
- Yuanqiang Sun
- State Key Laboratory of Gene Function and Modulation Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
- Ke Shui
- College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Key Laboratory of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Qinyu Li
- Key Laboratory of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Chenlu Liu
- State Key Laboratory of Gene Function and Modulation Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
- Wanting Jin
- State Key Laboratory of Gene Function and Modulation Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
- Jian-Quan Ni
- Gene Regulatory Lab, School of Medicine, Tsinghua University, Beijing, China
- Jian Lu
- State Key Laboratory of Gene Function and Modulation Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing, China
- Beijing Advanced Center of RNA Biology (BEACON), Peking University, Beijing, China
- Luoying Zhang
- Key Laboratory of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Hubei Province Key Laboratory of Oral and Maxillofacial Development and Regeneration, Wuhan, China
2
Xu K, Zhu J, Zhai H, Yang Q, Zhou K, Song Q, Wu J, Liu D, Li Y, Xia Z. A single-nucleotide polymorphism in PvPW1 encoding β-1,3-glucanase 9 is associated with pod width in Phaseolus vulgaris L. J Genet Genomics 2024; 51:1413-1422. [PMID: 39389459 DOI: 10.1016/j.jgg.2024.09.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 09/25/2024] [Accepted: 09/26/2024] [Indexed: 10/12/2024]
Abstract
Pod width influences pod size, shape, yield, and consumer preference in snap beans (Phaseolus vulgaris L.). In this study, we map PvPW1, a quantitative trait locus associated with pod width in snap beans, through genotyping and phenotyping of recombinant plants. We identify Phvul.006G072800, encoding the β-1,3-glucanase 9 protein, as the causal gene for PvPW1. The PvPW1G3555 allele is found to positively regulate pod width, as revealed by an association analysis between pod width phenotype and the PvPW1G3555C genotype across 17 bi-parental F2 populations. In total, 97.7% of the 133 wide pod accessions carry PvPW1G3555, while 82.1% of the 78 narrow pod accessions carry PvPW1C3555, indicating strong selection pressure on PvPW1 during common bean breeding. Re-sequencing data from 59 common bean cultivars identify an 8-bp deletion in the intron linked to PvPW1C3555, leading to the development of the InDel marker of PvM436. Genotyping 317 common bean accessions with PvM436 demonstrated that accessions with PvM436247 and PvM436227 alleles have wider pods compared to those with PvM436219 allele, establishing PvM436 as a reliable marker for molecular breeding in snap beans. These findings highlight PvPW1 as a critical gene regulating pod width and underscore the utility of PvM436 in marker-assisted selection for snap bean breeding.
Affiliation(s)
- Kun Xu
- State Key Laboratory of Black Soils Conservation and Utilization, Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, Heilongjiang 150081, China
- Jinlong Zhu
- State Key Laboratory of Black Soils Conservation and Utilization, Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, Heilongjiang 150081, China
- Hong Zhai
- State Key Laboratory of Black Soils Conservation and Utilization, Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, Heilongjiang 150081, China
- Qiang Yang
- State Key Laboratory of Black Soils Conservation and Utilization, Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, Heilongjiang 150081, China
- Keqin Zhou
- State Key Laboratory of Black Soils Conservation and Utilization, Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, Heilongjiang 150081, China
- Qijian Song
- USDA ARS, Soybean Genome & Improvement Lab, Beltsville 20705, USA
- Jing Wu
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 10081, China.
- Dajun Liu
- Horticulture Department, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin, Heilongjiang 150000, China.
- Yanhua Li
- State Key Laboratory of Black Soils Conservation and Utilization, Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, Heilongjiang 150081, China.
- Zhengjun Xia
- State Key Laboratory of Black Soils Conservation and Utilization, Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, Heilongjiang 150081, China.
3
Teekas L, Sharma S, Vijay N. Terminal regions of a protein are a hotspot for low complexity regions and selection. Open Biol 2024; 14:230439. [PMID: 38862022 DOI: 10.1098/rsob.230439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/13/2024] [Indexed: 06/13/2024] Open
Abstract
Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
Affiliation(s)
- Lokdeep Teekas
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
4
Rich KD, Srivastava S, Muthye VR, Wasmuth JD. Identification of potential molecular mimicry in pathogen-host interactions. PeerJ 2023; 11:e16339. [PMID: 37953771 PMCID: PMC10637249 DOI: 10.7717/peerj.16339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 10/02/2023] [Indexed: 11/14/2023] Open
Abstract
Pathogens have evolved sophisticated strategies to manipulate host signaling pathways, including the phenomenon of molecular mimicry, where pathogen-derived biomolecules imitate host biomolecules. In this study, we resurrected, updated, and optimized a sequence-based bioinformatics pipeline to identify potential molecular mimicry candidates between humans and 32 pathogenic species whose proteomes' 3D structure predictions were available at the start of this study. We observed considerable variation in the number of mimicry candidates across pathogenic species, with pathogenic bacteria exhibiting fewer candidates compared to fungi and protozoans. Further analysis revealed that the candidate mimicry regions were enriched in solvent-accessible regions, highlighting their potential functional relevance. We identified a total of 1,878 mimicked regions in 1,439 human proteins, and clustering analysis indicated diverse target proteins across pathogen species. The human proteins containing mimicked regions revealed significant associations between these proteins and various biological processes, with an emphasis on host extracellular matrix organization and cytoskeletal processes. However, immune-related proteins were underrepresented as targets of mimicry. Our findings provide insights into the broad range of host-pathogen interactions mediated by molecular mimicry and highlight potential targets for further investigation. This comprehensive analysis contributes to our understanding of the complex mechanisms employed by pathogens to subvert host defenses and we provide a resource to assist researchers in the development of novel therapeutic strategies.
Affiliation(s)
- Kaylee D. Rich
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
- Shruti Srivastava
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
- Viraj R. Muthye
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
- James D. Wasmuth
- Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
5
Reinar WB, Tørresen OK, Nederbragt AJ, Matschiner M, Jentoft S, Jakobsen KS. Teleost genomic repeat landscapes in light of diversification rates and ecology. Mob DNA 2023; 14:14. [PMID: 37789366 PMCID: PMC10546739 DOI: 10.1186/s13100-023-00302-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/28/2023] [Accepted: 09/20/2023] [Indexed: 10/05/2023] Open
Abstract
Repetitive DNA makes up a considerable fraction of most eukaryotic genomes. In fish, transposable element (TE) activity has coincided with rapid species diversification. Here, we annotated the repetitive content in 100 genome assemblies, covering the major branches of the diverse lineage of teleost fish. We investigated whether TE content correlates with family-level net diversification rates and found support for a weak negative correlation. Further, we demonstrated that TE proportion correlates with genome size, but not with the proportion of short tandem repeats (STRs), which implies independent evolutionary paths. Marine and freshwater fish had large differences in STR content, with the most extreme propagation detected in the genomes of codfish species and Atlantic herring. Such a high density of STRs is likely to increase the mutational load, which we propose could be counterbalanced by high fecundity as seen in codfishes and herring.
Affiliation(s)
- Ole K Tørresen
- Department of Biosciences, University of Oslo, Oslo, Norway
- Alexander J Nederbragt
- Department of Biosciences, University of Oslo, Oslo, Norway
- Department of Informatics, University of Oslo, Oslo, Norway
- Michael Matschiner
- Department of Biosciences, University of Oslo, Oslo, Norway
- University of Oslo, Natural History Museum, Oslo, Norway
- Sissel Jentoft
- Department of Biosciences, University of Oslo, Oslo, Norway
6
Sousa e Silva R, Sousa AD, Vieira J, Vieira CP. The Josephin domain (JD) containing proteins are predicted to bind to the same interactors: Implications for spinocerebellar ataxia type 3 (SCA3) studies using Drosophila melanogaster mutants. Front Mol Neurosci 2023; 16:1140719. [PMID: 37008788 PMCID: PMC10050893 DOI: 10.3389/fnmol.2023.1140719] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/09/2023] [Accepted: 02/21/2023] [Indexed: 03/17/2023] Open
Abstract
Spinocerebellar ataxia type 3, also known as Machado-Joseph disease (SCA3/MJD), is the most frequent polyglutamine (polyQ) neurodegenerative disorder. It is caused by a pathogenic expansion of the polyQ tract, located at the C-terminal region of the protein encoded by the ATXN3 gene. This gene codes for a deubiquitinating enzyme (DUB) that belongs to a gene family which, in humans, comprises three more genes (ATXN3L, JOSD1, and JOSD2); together these define two gene lineages (the ATXN3 and the Josephins). These proteins have in common the N-terminal catalytic domain (Josephin domain, JD), which in Josephins is the only domain present. In ATXN3 knock-out mouse and nematode models, however, the SCA3 neurodegeneration phenotype is not reproduced, suggesting that the genomes of these species contain other genes that can compensate for the lack of ATXN3. Moreover, in mutant Drosophila melanogaster, where the only JD protein is coded by a Josephin-like gene, expression of the expanded human ATXN3 gene reproduces multiple aspects of the SCA3 phenotype, in contrast with the results of the expression of the wild-type human form. To explain these findings, we perform phylogenetic as well as protein–protein docking inferences. We show multiple losses of JD-containing genes across the animal kingdom, suggesting partial functional redundancy of these genes. Accordingly, we predict that the JD is essential for binding with ataxin-3 and proteins of the Josephin lineages, and that D. melanogaster mutants are a good model of SCA3 despite the absence of a gene from the ATXN3 lineage. The molecular recognition regions of the ataxin-3 binding and those predicted for the Josephins are, however, different. We also report different binding regions between the two ataxin-3 forms (wild-type (wt) and expanded (exp)). The interactors that show an increase in the interaction strength with exp ataxin-3 are enriched in extrinsic components of mitochondrial outer membrane and endoplasmic reticulum membrane. On the other hand, the group of interactors that show a decrease in the interaction strength with exp ataxin-3 is significantly enriched in extrinsic component of cytoplasm.
7
Babišová K, Mentelová L, Geisseová TK, Beňová-Liszeková D, Beňo M, Chase BA, Farkaš R. Apocrine secretion in the salivary glands of Drosophilidae and other dipterans is evolutionarily conserved. Front Cell Dev Biol 2023; 10:1088055. [PMID: 36712974 PMCID: PMC9880899 DOI: 10.3389/fcell.2022.1088055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 12/15/2022] [Indexed: 01/15/2023] Open
Abstract
Apocrine secretion is a transport and secretory mechanism that remains only partially characterized, even though it is evolutionarily conserved among all metazoans, including humans. The excellent genetic model organism Drosophila melanogaster holds promise for elucidating the molecular mechanisms regulating this fundamental metazoan process. Two prerequisites for such investigations are to clearly define an experimental system to investigate apocrine secretion and to understand the evolutionary and functional contexts in which apocrine secretion arose in that system. To this end, we recently demonstrated that, in D. melanogaster, the prepupal salivary glands utilize apocrine secretion prior to pupation to deliver innate immune and defense components to the exuvial fluid that lies between the metamorphosing pupa and its chitinous case. This finding provided a unique opportunity to appraise how this novel non-canonical and non-vesicular transport and secretory mechanism is employed in different developmental and evolutionary contexts. Here we demonstrate that this apocrine secretion, which is mechanistically and temporally separated from the exocytotic mechanism used to produce the massive salivary glue secretion (Sgs), is shared across Drosophilidae and two unrelated dipteran species. Screening more than 30 species of Drosophila from divergent habitats across the globe revealed that apocrine secretion is a widespread and evolutionarily conserved cellular mechanism used to produce exuvial fluid. Species with longer larval and prepupal development than D. melanogaster activate apocrine secretion later, while smaller and more rapidly developing species activate it earlier. In some species, apocrine secretion occurs after the secretory material is first concentrated in cytoplasmic structures of unknown origin that we name "collectors." Strikingly, in contrast to the widespread use of apocrine secretion to provide exuvial fluid, not all species use exocytosis to produce the viscid salivary glue secretion that is seen in D. melanogaster. Thus, apocrine secretion is the conserved mechanism used to realize the major function of the salivary gland in fruitflies and related species: it produces the pupal exuvial fluid that provides an active defense against microbial invasion during pupal metamorphosis.
Affiliation(s)
- Klaudia Babišová
- Laboratory of Developmental Genetics, Institute of Experimental Endocrinology, Biomedical Research Center v.v.i., Slovak Academy of Sciences, Bratislava, Slovakia
- Lucia Mentelová
- Laboratory of Developmental Genetics, Institute of Experimental Endocrinology, Biomedical Research Center v.v.i., Slovak Academy of Sciences, Bratislava, Slovakia
- Department of Genetics, Comenius University, Bratislava, Slovakia
- Terézia Klaudia Geisseová
- Laboratory of Developmental Genetics, Institute of Experimental Endocrinology, Biomedical Research Center v.v.i., Slovak Academy of Sciences, Bratislava, Slovakia
- Denisa Beňová-Liszeková
- Laboratory of Developmental Genetics, Institute of Experimental Endocrinology, Biomedical Research Center v.v.i., Slovak Academy of Sciences, Bratislava, Slovakia
- Milan Beňo
- Laboratory of Developmental Genetics, Institute of Experimental Endocrinology, Biomedical Research Center v.v.i., Slovak Academy of Sciences, Bratislava, Slovakia
- Bruce A. Chase
- Department of Biology, University of Nebraska, Omaha, NE, United States
- Robert Farkaš (corresponding author)
- Laboratory of Developmental Genetics, Institute of Experimental Endocrinology, Biomedical Research Center v.v.i., Slovak Academy of Sciences, Bratislava, Slovakia
8
Lee B, Jaberi-Lashkari N, Calo E. A unified view of low complexity regions (LCRs) across species. eLife 2022; 11:e77058. [PMID: 36098382 PMCID: PMC9470157 DOI: 10.7554/elife.77058] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/14/2022] [Accepted: 08/17/2022] [Indexed: 11/13/2022] Open
Abstract
Low complexity regions (LCRs) play a role in a variety of important biological processes, yet we lack a unified view of their sequences, features, relationships, and functions. Here, we use dotplots and dimensionality reduction to systematically define LCR type/copy relationships and create a map of LCR sequence space capable of integrating LCR features and functions. By defining LCR relationships across the proteome, we provide insight into how LCR type and copy number contribute to higher order assemblies, such as the importance of K-rich LCR copy number for assembly of the nucleolar protein RPA43 in vivo and in vitro. With LCR maps, we reveal the underlying structure of LCR sequence space, and relate differential occupancy in this space to the conservation and emergence of higher order assemblies, including the metazoan extracellular matrix and plant cell wall. Together, LCR relationships and maps uncover and identify scaffold-client relationships among E-rich LCR-containing proteins in the nucleolus, and revealed previously undescribed regions of LCR sequence space with signatures of higher order assemblies, including a teleost-specific T/H-rich sequence space. Thus, this unified view of LCRs enables discovery of how LCRs encode higher order assemblies of organisms.
Affiliation(s)
- Byron Lee
- Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
- Nima Jaberi-Lashkari
- Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
- Eliezer Calo
- Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, United States
9
Gutierrez JI, Brittingham GP, Karadeniz YB, Tran KD, Dutta A, Holehouse AS, Peterson CL, Holt LJ. SWI/SNF senses carbon starvation with a pH-sensitive low complexity sequence. eLife 2022; 11:e70344. [PMID: 35129437 PMCID: PMC8890752 DOI: 10.7554/elife.70344] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/19/2021] [Accepted: 02/06/2022] [Indexed: 11/16/2022] Open
Abstract
It is increasingly appreciated that intracellular pH changes are important biological signals. This motivates the elucidation of molecular mechanisms of pH sensing. We determined that a nucleocytoplasmic pH oscillation was required for the transcriptional response to carbon starvation in Saccharomyces cerevisiae. The SWI/SNF chromatin remodeling complex is a key mediator of this transcriptional response. A glutamine-rich low-complexity domain (QLC) in the SNF5 subunit of this complex, and histidines within this sequence, were required for efficient transcriptional reprogramming. Furthermore, the SNF5 QLC mediated pH-dependent recruitment of SWI/SNF to an acidic transcription factor in a reconstituted nucleosome remodeling assay. Simulations showed that protonation of histidines within the SNF5 QLC leads to conformational expansion, providing a potential biophysical mechanism for regulation of these interactions. Together, our results indicate that pH changes are a second messenger for transcriptional reprogramming during carbon starvation and that the SNF5 QLC acts as a pH sensor.
Affiliation(s)
- Gregory P Brittingham
- Institute for Systems Genetics, New York University Langone Health, New York, United States
- Yonca B Karadeniz
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, United States
- Kathleen D Tran
- Department of Cell and Molecular Biology, University of Rhode Island, South Kingstown, United States
- Arnob Dutta
- Department of Cell and Molecular Biology, University of Rhode Island, South Kingstown, United States
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St Louis, United States
- Craig L Peterson
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, United States
- Liam J Holt
- Institute for Systems Genetics, New York University Langone Health, New York, United States
10
Homopeptide and homocodon levels across fungi are coupled to GC/AT-bias and intrinsic disorder, with unique behaviours for some amino acids. Sci Rep 2021; 11:10025. [PMID: 33976321 PMCID: PMC8113271 DOI: 10.1038/s41598-021-89650-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/30/2020] [Accepted: 04/22/2021] [Indexed: 11/09/2022] Open
Abstract
Homopeptides (runs of one amino-acid type) are evolutionarily important since they are prone to expand/contract during DNA replication, recombination and repair. To gain insight into the genomic/proteomic traits driving their variation, we analyzed how homopeptides and homocodons (which are pure codon repeats) vary across 405 Dikarya, and probed their linkage to genome GC/AT bias and other factors. We find that amino-acid homopeptide frequencies vary diversely between clades, with the AT-rich Saccharomycotina trending distinctly. As organisms evolve, homocodon and homopeptide numbers are majorly coupled to GC/AT-bias, exhibiting a bi-furcated correlation with degree of AT- or GC-bias. Mid-GC/AT genomes tend to have markedly fewer simply because they are mid-GC/AT. Despite these trends, homopeptides tend to be GC-biased relative to other parts of coding sequences, even in AT-rich organisms, indicating they absorb AT bias less or are inherently more GC-rich. The most frequent and most variable homopeptide amino acids favour intrinsic disorder, and there are an opposing correlation and anti-correlation versus homopeptide levels for intrinsic disorder and structured-domain content respectively. Specific homopeptides show unique behaviours that we suggest are linked to inherent slippage probabilities during DNA replication and recombination, such as poly-glutamine, which is an evolutionarily very variable homopeptide with a codon repertoire unbiased for GC/AT, and poly-lysine whose homocodons are overwhelmingly made from the codon AAG.
11
Li Y, Chen X, Wu K, Pan J, Long H, Yan Y. Characterization of Simple Sequence Repeats (SSRs) in Ciliated Protists Inferred by Comparative Genomics. Microorganisms 2020; 8:662. [PMID: 32370063 PMCID: PMC7285179 DOI: 10.3390/microorganisms8050662] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/22/2020] [Revised: 04/24/2020] [Accepted: 04/26/2020] [Indexed: 01/02/2023] Open
Abstract
Simple sequence repeats (SSRs) are prevalent in the genomes of all organisms. They are widely used as genetic markers, and are insertion/deletion mutation hotspots, which directly influence genome evolution. However, little is known about such important genomic components in ciliated protists, a large group of unicellular eukaryotes with extremely long evolutionary history and genome diversity. With recent publications of multiple ciliate genomes, we start to get a chance to explore perfect SSRs with motif size 1-100 bp and at least three motif repeats in nine species of two ciliate classes, Oligohymenophorea and Spirotrichea. We found that homopolymers are the most prevalent SSRs in these A/T-rich species, with AAA (lysine, charged amino acid; also seen as an SSR with one-adenine motif repeated three times) being the codons repeated at the highest frequencies in coding SSR regions, consistent with the widespread alveolin proteins rich in lysine repeats as found in Tetrahymena. Micronuclear SSRs are universally more abundant than the macronuclear ones of the same motif-size, except for the 8-bp-motif SSRs in extensively fragmented chromosomes. Both the abundance and A/T content of SSRs decrease as motif-size increases, while the abundance is positively correlated with the A/T content of the genome. Also, smaller genomes have lower proportions of coding SSRs out of all SSRs in Paramecium species. This genome-wide and cross-species analysis reveals the high diversity of SSRs and reflects the rapid evolution of these simple repetitive elements in ciliate genomes.
12
CAPRI enables comparison of evolutionarily conserved RNA interacting regions. Nat Commun 2019; 10:2682. [PMID: 31213602 PMCID: PMC6581911 DOI: 10.1038/s41467-019-10585-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 06/30/2018] [Accepted: 05/21/2019] [Indexed: 12/21/2022] Open
Abstract
RNA-protein complexes play essential regulatory roles at nearly all levels of gene expression. Using in vivo crosslinking and RNA capture, we report a comprehensive RNA-protein interactome in a metazoan at four levels of resolution: single amino acids, domains, proteins and multisubunit complexes. We devise CAPRI, a method to map RNA-binding domains (RBDs) by simultaneous identification of RNA interacting crosslinked peptides and peptides adjacent to such crosslinked sites. CAPRI identifies more than 3000 RNA proximal peptides in Drosophila and human proteins with more than 45% of them forming new interaction interfaces. The comparison of orthologous proteins enables the identification of evolutionary conserved RBDs in globular domains and intrinsically disordered regions (IDRs). By comparing the sequences of IDRs through evolution, we classify them based on the type of motif, accumulation of tandem repeats, conservation of amino acid composition and high sequence divergence. Comprehensive characterisation of RNA-protein interactions requires different levels of resolution. Here, the authors present an integrated mass spectrometry-based approach that allows them to define the Drosophila RNA-protein interactome from the level of multisubunit complexes down to the RNA-binding amino acid.
|
13
|
Abstract
Understanding phylogenetic relationships among taxa is key to designing and implementing comparative analyses. The genus Drosophila, which contains over 1600 species, is one of the most important model systems in the biological sciences. For over a century, one species in this group, Drosophila melanogaster, has been key to studies of animal development and genetics, genome organization and evolution, and human disease. As whole-genome sequencing becomes more cost-effective, there is increasing interest in other members of this morphologically, ecologically, and behaviorally diverse genus. Phylogenetic relationships within Drosophila are complicated, and the goal of this paper is to provide a review of the recent taxonomic changes and phylogenetic relationships in this genus to aid in further comparative studies.
|
14
|
Press MO, McCoy RC, Hall AN, Akey JM, Queitsch C. Massive variation of short tandem repeats with functional consequences across strains of Arabidopsis thaliana. Genome Res 2018; 28:1169-1178. [PMID: 29970452 PMCID: PMC6071631 DOI: 10.1101/gr.231753.117] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 11/01/2017] [Accepted: 06/26/2018] [Indexed: 11/24/2022]
Abstract
Short tandem repeat (STR) mutations may comprise more than half of the mutations in eukaryotic coding DNA, yet STR variation is rarely examined as a contributor to complex traits. We assessed this contribution across a collection of 96 strains of Arabidopsis thaliana, genotyping 2046 STR loci each, using highly parallel STR sequencing with molecular inversion probes. We found that 95% of examined STRs are polymorphic, with a median of six alleles per STR across these strains. STR expansions (large copy number increases) are found in most strains, several of which have evident functional effects. These include three of six intronic STR expansions we found to be associated with intron retention. Coding STRs were depleted of variation relative to noncoding STRs, and we detected a total of 56 coding STRs (11%) showing low variation consistent with the action of purifying selection. In contrast, some STRs show hypervariable patterns consistent with diversifying selection. Finally, we detected 133 novel STR-phenotype associations under stringent criteria, most of which could not be detected with SNPs alone, and validated some with follow-up experiments. Our results support the conclusion that STRs constitute a large, unascertained reservoir of functionally relevant genomic variation.
Affiliation(s)
- Maximilian O Press
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Rajiv C McCoy
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Ashley N Hall
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
| | - Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
15
|
Li J, Su Y, Wang T. The Repeat Sequences and Elevated Substitution Rates of the Chloroplast accD Gene in Cupressophytes. Front Plant Sci 2018; 9:533. [PMID: 29731764 PMCID: PMC5920036 DOI: 10.3389/fpls.2018.00533] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 02/06/2018] [Accepted: 04/05/2018] [Indexed: 05/23/2023]
Abstract
The plastid accD gene encodes a subunit of the acetyl-CoA carboxylase (ACCase) enzyme. The accD gene is thought to have expanded in length in Cryptomeria japonica, Taiwania cryptomerioides, Cephalotaxus, Taxus chinensis, and Podocarpus lambertii, mainly because of tandemly repeated sequences. However, it is still unknown whether accD has also expanded in other cupressophytes. Here, to investigate how widespread this phenomenon is, accD and its surrounding regions were sequenced and analyzed in 18 cupressophytes. Together with 39 GenBank sequences, our taxon sampling covered all the extant gymnosperm orders. The repetitive elements and substitution rates of accD were analyzed among 57 gymnosperm species. The results show that: (1) the reading frame length of accD has also expanded in the 18 cupressophyte species; (2) many repetitive elements occur in the accD gene of cupressophyte lineages; (3) the synonymous and non-synonymous substitution rates of accD are accelerated in cupressophytes; and (4) accD is located at rearrangement endpoints. These results suggest that repetitive elements may mediate chloroplast genome rearrangement and accelerate substitution rates.
Affiliation(s)
- Jia Li
- Department of Life Sciences, Shaanxi Xueqian Normal University, Xi’an, China
| | - Yingjuan Su
- School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Research Institute of Sun Yat-sen University, Shenzhen, China
| | - Ting Wang
- College of Life Science, South China Agricultural University, Guangzhou, China
|
16
|
Tørresen OK, Brieuc MSO, Solbakken MH, Sørhus E, Nederbragt AJ, Jakobsen KS, Meier S, Edvardsen RB, Jentoft S. Genomic architecture of haddock (Melanogrammus aeglefinus) shows expansions of innate immune genes and short tandem repeats. BMC Genomics 2018; 19:240. [PMID: 29636006 PMCID: PMC5894186 DOI: 10.1186/s12864-018-4616-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/16/2017] [Accepted: 03/22/2018] [Indexed: 02/06/2023]
Abstract
Background: Increased availability of genome assemblies for non-model organisms has resulted in invaluable biological and genomic insight into numerous vertebrates, including teleosts. Sequencing of the Atlantic cod (Gadus morhua) genome and the genomes of many of its relatives (Gadiformes) demonstrated a shared loss of the major histocompatibility complex (MHC) II genes 100 million years ago. An improved version of the Atlantic cod genome assembly shows an extreme density of tandem repeats compared to other vertebrate genome assemblies. Highly contiguous assemblies are therefore needed to further investigate the unusual immune system of the Gadiformes, and whether the high density of tandem repeats found in Atlantic cod is a shared trait in this group. Results: Here, we have sequenced and assembled the genome of haddock (Melanogrammus aeglefinus), a relative of Atlantic cod, using a combination of PacBio and Illumina reads. Comparative analyses reveal that the haddock genome contains an even higher density of tandem repeats outside and within protein coding sequences than Atlantic cod. Further, both species show an elevated number of tandem repeats in genes mainly involved in signal transduction compared to other teleosts. A characterization of the immune gene repertoire demonstrates a substantial expansion of MHC I in Atlantic cod compared to haddock. In contrast, the Toll-like receptors show a similar pattern of gene losses and expansions. For the NOD-like receptors (NLRs), another gene family associated with the innate immune system, we find a large expansion common to all teleosts, with possible lineage-specific expansions in zebrafish, stickleback and the codfishes. Conclusions: The generation of a highly contiguous genome assembly of haddock revealed that the high density of short tandem repeats as well as expanded immune gene families is not unique to Atlantic cod, but possibly a feature common to all, or most, codfishes. A shared expansion of NLR genes in teleosts suggests that the NLRs have a more substantial role in the innate immunity of teleosts than in that of other vertebrates. Moreover, we find that high copy number genes combined with variable genome assembly qualities may impede complete characterization of these genes, i.e. the numbers of NLRs reported for different teleost species might be underestimates.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
| | - Marine S O Brieuc
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Monica H Solbakken
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Elin Sørhus
- Institute of Marine Research, Bergen, Norway
| | - Alexander J Nederbragt
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway; Biomedical Informatics Research Group, Department of Informatics, University of Oslo, Oslo, Norway
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
|
17
|
Franco ME, Bitencourt TA, Marins M, Fachin AL. In silico characterization of tandem repeats in Trichophyton rubrum and related dermatophytes provides new insights into their role in pathogenesis. Database (Oxford) 2018; 2017:3866792. [PMID: 29220431 PMCID: PMC5502367 DOI: 10.1093/database/bax035] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/25/2016] [Accepted: 03/28/2017] [Indexed: 01/01/2023]
Abstract
Trichophyton rubrum, which is able to degrade keratinized tissues, is the most common etiological agent of dermatophytoses worldwide. The sequencing of the genomes of different dermatophyte species has provided a large amount of data, including tandem repeats that may play a role in genetic variability and in the pathogenesis of these fungi. Tandem repeats are adjacent DNA sequences of 2–200 nucleotides in length, which exert regulatory and adaptive functions. These repetitive DNA sequences are found in different classes of fungal proteins, especially those involved in cell adhesion, a determinant factor for the establishment of fungal infection. The objective of this study was to develop a Dermatophyte Tandem Repeat Database (DTRDB) for the storage and identification of tandem repeats in T. rubrum and six other dermatophyte species. The current version of the database contains 35 577 tandem repeats detected in 16 173 coding sequences. The repeats can be searched using entry parameters such as repeat unit length (nt, nucleotides), repeat number, variability score, and repeat sequence motif. These data were used to study the relative frequency and distribution of repeats in the sequences, as well as their possible functions in dermatophytes. A search of the database revealed that these repeats occur in 22–33% of genes transcribed in dermatophytes, where they could be involved in successful adaptation to the host tissue and establishment of infection. The repeats were detected in transcripts that are mainly related to three biological processes: regulation, adhesion, and metabolism. The database enables users to identify and analyse tandem repeat regions in target genes related to pathogenicity and fungal–host interactions in dermatophytes and may contribute to the discovery of new targets for the development of antifungal agents. Database URL: http://comp.mch.ifsuldeminas.edu.br/dtrdb/
Affiliation(s)
- Matheus Eloy Franco
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Federal Institute of Education, Science and Technology of South of Minas Gerais - IFSULDEMINAS, 37750-000, Brazil
| | - Tamires Aparecida Bitencourt
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Departamento de Genetica, 049-900, FMRP-USP, SP, Brazil
| | - Mozart Marins
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Curso de Medicina, Universidade de Ribeirão Preto, SP, Brazil
| | - Ana Lúcia Fachin
- Unidade de Biotecnologia, Universidade de Ribeirão Preto, Av: Costabile Romano 2201, 14096-900, Ribeirao Preto SP, Brazil; Curso de Medicina, Universidade de Ribeirão Preto, SP, Brazil
|
18
|
Chaudhry SR, Lwin N, Phelan D, Escalante AA, Battistuzzi FU. Comparative analysis of low complexity regions in Plasmodia. Sci Rep 2018; 8:335. [PMID: 29321589 PMCID: PMC5762703 DOI: 10.1038/s41598-017-18695-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/23/2017] [Accepted: 12/14/2017] [Indexed: 12/20/2022]
Abstract
Low complexity regions (LCRs) are a common feature shared by many genomes, but their evolutionary and functional significance remains mostly unknown. At the core of the uncertainty is a poor understanding of the mechanisms that regulate their retention in genomes, whether driven by natural selection or neutral evolution. Applying a comparative approach of LCRs to multiple strains and species is a powerful approach to identify patterns of conservation in these regions. Using this method, we investigate the evolutionary history of LCRs in the genus Plasmodium based on orthologous protein coding genes shared by 11 species and strains from primate and rodent-infecting pathogens. We find multiple lines of evidence in support of natural selection as a major evolutionary force shaping the composition and conservation of LCRs through time and signatures that their evolutionary paths are species specific. Our findings add a comparative analysis perspective to the debate on the evolution of LCRs and harness the power of sequence comparisons to identify potential functionally important LCR candidates.
Affiliation(s)
- S R Chaudhry
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
| | - N Lwin
- Department of Biological Sciences, Oakland University, Rochester, MI, USA
| | - D Phelan
- Department of Biological Sciences, Oakland University, Rochester, MI, USA
| | - A A Escalante
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
| | - F U Battistuzzi
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Data Science and Big Data Analytics, Oakland University, Rochester, MI, USA
|
19
|
Constraints and consequences of the emergence of amino acid repeats in eukaryotic proteins. Nat Struct Mol Biol 2017; 24:765-777. [PMID: 28805808 DOI: 10.1038/nsmb.3441] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 02/14/2017] [Accepted: 06/23/2017] [Indexed: 12/21/2022]
Abstract
Proteins with amino acid homorepeats have the potential to be detrimental to cells and are often associated with human diseases. Why, then, are homorepeats prevalent in eukaryotic proteomes? In yeast, homorepeats are enriched in proteins that are essential and pleiotropic and that buffer environmental insults. The presence of homorepeats increases the functional versatility of proteins by mediating protein interactions and facilitating spatial organization in a repeat-dependent manner. During evolution, homorepeats are preferentially retained in proteins with stringent proteostasis, which might minimize repeat-associated detrimental effects such as unregulated phase separation and protein aggregation. Their presence facilitates rapid protein divergence through accumulation of amino acid substitutions, which often affect linear motifs and post-translational-modification sites. These substitutions may result in rewiring protein interaction and signaling networks. Thus, homorepeats are distinct modules that are often retained in stringently regulated proteins. Their presence facilitates rapid exploration of the genotype-phenotype landscape of a population, thereby contributing to adaptation and fitness.
|
20
|
Wang Y, Geng H, Dang X, Xiang H, Li T, Pan G, Zhou Z. Comparative Analysis of the Proteins with Tandem Repeats from 8 Microsporidia and Characterization of a Novel Endospore Wall Protein Colocalizing with Polar Tube from Nosema bombycis. J Eukaryot Microbiol 2017; 64:707-715. [PMID: 28321967 DOI: 10.1111/jeu.12412] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/22/2016] [Revised: 03/09/2017] [Accepted: 03/09/2017] [Indexed: 11/27/2022]
Abstract
As a common feature of eukaryotic proteins, tandem amino acid repeats have been studied extensively in both animal and plant proteins. Here, a comparative analysis focusing on proteins with tandem repeats was conducted in eight microsporidia, including four mammal-infecting microsporidia (Encephalitozoon cuniculi, Encephalitozoon intestinalis, Encephalitozoon hellem and Enterocytozoon bieneusi) and four insect-infecting microsporidia (Nosema apis, Nosema ceranae, Vavraia culicis and Nosema bombycis). We found that proteins with tandem repeats were abundant in these species, and that insect-infecting microsporidia had more of them than mammal-infecting microsporidia. Additionally, hydrophilic residues were overrepresented in the tandem repeats of these eight microsporidian proteomes, and the amino acid residues in these tandem repeat sequences tend to be encoded by GC-rich codons. Tandem repeats were randomly distributed within proteins of insect-infecting microsporidia, whereas in proteins of mammal-infecting microsporidia they were rarer in the N-terminal regions than in the C-terminal and middle regions. Finally, a hypothetical protein, EOB14572, possessing four tandem repeats was characterized as a novel endospore wall protein that colocalizes with the polar tube of N. bombycis. Our study not only provides useful insight into proteins with tandem repeats in N. bombycis, but also extends the known spore wall components of this obligate unicellular eukaryotic parasite.
Affiliation(s)
- Ying Wang
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400716, China
| | - Huixia Geng
- School of Mathematics and Finance, Chongqing University of Arts and Sciences, Chongqing, 402160, China
| | - Xiaoqun Dang
- Laboratory of Animal Biology, Chongqing Normal University, Chongqing, 400047, China
| | - Heng Xiang
- College of Animal Science and Technology, Southwest University, Chongqing, 400716, China
| | - Tian Li
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400716, China
| | - Guoqing Pan
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400716, China
| | - Zeyang Zhou
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400716, China; Laboratory of Animal Biology, Chongqing Normal University, Chongqing, 400047, China
|
21
|
Shimada MK, Sanbonmatsu R, Yamaguchi-Kabata Y, Yamasaki C, Suzuki Y, Chakraborty R, Gojobori T, Imanishi T. Selection pressure on human STR loci and its relevance in repeat expansion disease. Mol Genet Genomics 2016; 291:1851-69. [PMID: 27290643 DOI: 10.1007/s00438-016-1219-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 11/07/2015] [Accepted: 05/21/2016] [Indexed: 12/30/2022]
Abstract
Short Tandem Repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat-number variation according to selection pressure. However, the remaining questions include: Why are STRs causing repeat expansion diseases maintained in the human population; and why are these limited to neurodegenerative diseases? By evaluating the genome-wide selection pressure on STRs using the database we constructed, we identified two different patterns of relationship in repeat-number polymorphisms between DNA and amino-acid sequences, although both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, a mixture of degenerate codons is represented in poly-proline (poly-P) repeats. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; however, at the DNA level, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, significant enrichments of apoptosis and neurodevelopment were biological processes found specifically in genes encoding poly-Qs with repeat polymorphism. This suggests the existence of a specific molecular function for polymorphic and/or long poly-Q stretches. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risks of long CAG repeats predisposing to pathological hyper-expansions. Molecular pathways in neurodevelopment requiring long and polymorphic poly-Q stretches may provide a clue to understanding why poly-Q expansion diseases are limited to neurodegenerative diseases.
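The contrast drawn above, a long poly-Q stretch at the protein level whose encoding STR is split by synonymous changes at the DNA level, can be made concrete with a small sketch. It relies only on the standard genetic code, in which CAG and CAA both encode glutamine. The function names and the example coding sequence are hypothetical and are not taken from the authors' database.

```python
GLUTAMINE_CODONS = {"CAG", "CAA"}  # both codons encode glutamine (Q)


def longest_run(items, wanted):
    """Length of the longest consecutive run of items that belong to `wanted`."""
    best = current = 0
    for item in items:
        current = current + 1 if item in wanted else 0
        best = max(best, current)
    return best


def polyq_summary(cds):
    """Compare the poly-Q length (protein level) with the pure CAG run (DNA level)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return {
        "longest_polyQ": longest_run(codons, GLUTAMINE_CODONS),
        "longest_pure_CAG": longest_run(codons, {"CAG"}),
    }


# Made-up coding sequence: eight glutamines, with one synonymous CAA interrupting the CAG run.
example_cds = "ATG" + "CAG" * 4 + "CAA" + "CAG" * 3 + "TAA"
print(polyq_summary(example_cds))
```

In this toy case the protein carries an uninterrupted run of eight glutamines, while the longest pure CAG tract is only four codons, which is the kind of pattern the abstract describes for long poly-Q stretches.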
Affiliation(s)
- Makoto K Shimada
- Institute for Comprehensive Medical Science, Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, Aichi, 470-1192, Japan.
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan.
- Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan.
| | - Ryoko Sanbonmatsu
- Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
| | - Yumi Yamaguchi-Kabata
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
- Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573, Japan
| | - Chisato Yamasaki
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
- Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
| | - Yoshiyuki Suzuki
- Graduate School of Natural Sciences, Nagoya City University, 1 Yamanohata, Mizuho-cho, Mizuho-ku, Nagoya, Aichi, 467-8501, Japan
| | - Ranajit Chakraborty
- Health Science Center, University of North Texas, 3500 Camp Bowie Blvd., Fort Worth, TX, 76107, USA
| | - Takashi Gojobori
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Ibn Al-Haytham Building (West), Thuwal, 23955-6900, Kingdom of Saudi Arabia
| | - Tadashi Imanishi
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan
- Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
|
22
|
Battistuzzi FU, Schneider KA, Spencer MK, Fisher D, Chaudhry S, Escalante AA. Profiles of low complexity regions in Apicomplexa. BMC Evol Biol 2016; 16:47. [PMID: 26923229 PMCID: PMC4770516 DOI: 10.1186/s12862-016-0625-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/12/2015] [Accepted: 02/17/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Low complexity regions (LCRs) are a ubiquitous feature in genomes and yet their evolutionary history and functional roles are unclear. Previous studies have shown contrasting evidence in favor of both neutral and selective mechanisms of evolution for different sets of LCRs, suggesting that modes of identification of these regions may play a role in our ability to discern their evolutionary history. To further investigate this issue, we used a multiple threshold approach to identify species-specific profiles of proteome complexity and, by comparing properties of these sets, determine the influence that starting parameters have on evolutionary inferences. RESULTS: We find that, although qualitatively similar, quantitatively each species has a unique LCR profile which represents the frequency of these regions within each genome. Inferences based on these profiles are more accurate in comparative analyses of genome complexity, as they allow determination of the relative complexity of multiple genomes as well as the type of repetitiveness that is most common in each. Based on the multiple threshold LCR sets obtained, we identified predominant evolutionary mechanisms at different complexity levels, which show neutral mechanisms acting on highly repetitive LCRs (e.g., homopolymers) and selective forces becoming more important as heterogeneity of the LCRs increases. CONCLUSIONS: Our results show how inferences based on LCRs are influenced by the parameters used to identify these regions. Sets of LCRs are heterogeneous aggregates of regions that include homo- and heteropolymers and, as such, evolve according to different mechanisms. LCR profiles provide a new way to investigate genome complexity across species and to determine the driving mechanism of their evolution.
Affiliation(s)
| | - Kristan A Schneider
- Department of MNI, University of Applied Sciences Mittweida, Mittweida, Germany.
| | - Matthew K Spencer
- Department of Geology and Physics, Lake Superior State University, Sault Ste. Marie, MI, USA.
| | - David Fisher
- David Eccles School of Business, University of Utah, Salt Lake City, UT, USA.
| | - Sophia Chaudhry
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
| | - Ananias A Escalante
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.
|
23
|
Martins F, Gonçalves R, Oliveira J, Cruz-Monteagudo M, Nieto-Villar JM, Paz-y-Miño C, Rebelo I, Tejera E. Unravelling the relationship between protein sequence and low-complexity regions entropies: Interactome implications. J Theor Biol 2015; 382:320-7. [PMID: 26164061 DOI: 10.1016/j.jtbi.2015.06.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2015] [Revised: 06/12/2015] [Accepted: 06/28/2015] [Indexed: 10/23/2022]
Abstract
Low-complexity regions are sub-sequences of biased composition in a protein sequence. The influence of these regions on protein evolution, specific functions and high interaction capacity is well known. Although protein sequence entropy has been widely studied, its relationship with low-complexity regions and the subsequent effects on protein function remain unclear. In this work we propose a theoretical and empirical model integrating the sequence entropy with local complexity parameters. Our results indicate that the protein sequence entropy is related to the protein length, the entropies inside and outside the low-complexity regions, as well as their number and average size. We found a small but significant increase in the sequence entropy of hub proteins. In agreement with our theoretical model, this increase is highly dependent on the balance between the increase in protein length and the average size of the low-complexity regions. Finally, our models and protein analyses provide evidence that modifications in the average size are more relevant in hub proteins than changes in the number of low-complexity regions.
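As a point of reference for the sequence entropy discussed above, the following sketch computes the Shannon entropy of a sequence's residue composition, and the same quantity over sliding windows, where low values flag compositionally biased stretches. It is a generic illustration, not the authors' model; the function names, the window size and the toy sequence are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per residue) of the residue composition of `seq`."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(seq, window=12):
    """Entropy of every full-length window; the window size is an arbitrary choice."""
    return [shannon_entropy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


toy = "QQQQACDE"  # made-up sequence: a homopolymeric stretch followed by mixed residues
print(shannon_entropy(toy))
print(windowed_entropy(toy, window=4))
```

A homopolymeric window scores 0 bits, while a window of four distinct residues scores 2 bits, so the windowed profile rises as the scan leaves the repeat.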
Affiliation(s)
- F Martins
- Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
| | - R Gonçalves
- Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
| | - J Oliveira
- Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
| | - M Cruz-Monteagudo
- Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
| | - J M Nieto-Villar
- Dpto. de Química-Física, Fac. de Química, Universidad de La Habana, Cuba. Cátedra de Sistemas Complejos "H. Poincaré", Universidad de La Habana, Cuba
| | - C Paz-y-Miño
- Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
| | - I Rebelo
- Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal; UCIBIO@REQUIMTE, Portugal.
| | - E Tejera
- Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
|
24
|
Radó-Trilla N, Arató K, Pegueroles C, Raya A, de la Luna S, Albà MM. Key Role of Amino Acid Repeat Expansions in the Functional Diversification of Duplicated Transcription Factors. Mol Biol Evol 2015; 32:2263-72. [PMID: 25931513 PMCID: PMC4540963 DOI: 10.1093/molbev/msv103] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Abstract
The high regulatory complexity of vertebrates has been related to two rounds of whole genome duplication (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families that originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice the proportion of single copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.
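The proportions quoted in this abstract can be checked with a few lines of arithmetic. The counts come from the text; the helper name `share` is a hypothetical label used only for this sketch.

```python
def share(with_lcr, total):
    """Fraction of genes in a category that contain at least one LCR."""
    return with_lcr / total


duplicates = share(155, 550)   # gene duplicates with LCRs
singletons = share(15, 115)    # single copy genes with LCRs

print(f"duplicates: {duplicates:.1%}, singletons: {singletons:.1%}, "
      f"ratio: {duplicates / singletons:.2f}")
# prints "duplicates: 28.2%, singletons: 13.0%, ratio: 2.16"
```

The ratio of about 2.2 matches the abstract's statement that LCRs are roughly twice as frequent among duplicated TFs as among single copy genes.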
Affiliation(s)
- Núria Radó-Trilla
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- Krisztina Arató
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain
- Cinta Pegueroles
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain
- Alicia Raya
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain
- Susana de la Luna
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- M Mar Albà
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain; Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
25
|
Abstract
Amino acid repeats (AARs) are abundant in protein sequences. They have particular roles in protein function and evolution. Simple repeat patterns generated by DNA slippage tend to introduce length variations and point mutations in repeat regions. Loss of normal and gain of abnormal function owing to their variable length are potential risks leading to diseases. Repeats with complex patterns mostly refer to the functional domain repeats, such as the well-known leucine-rich repeat and WD repeat, which are frequently involved in protein–protein interaction. They are mainly derived from internal gene duplication events and stabilized by ‘gate-keeper’ residues, which play crucial roles in preventing inter-domain aggregation. AARs are widely distributed in different proteomes across a variety of taxonomic ranges, and especially abundant in eukaryotic proteins. However, their specific evolutionary and functional scenarios are still poorly understood. Identifying AARs in protein sequences is the first step for the further investigation of their biological function and evolutionary mechanism. In principle, this is an NP-hard problem, as most of the repeat fragments are shaped by a series of sophisticated evolutionary events and become latent periodical patterns. It is not possible to define a uniform criterion for detecting and verifying various repeat patterns. Instead, different algorithms based on different strategies have been developed to cope with different repeat patterns. In this review, we attempt to describe the amino acid repeat-detection algorithms currently available and compare their strategies based on an in-depth analysis of the biological significance of protein repeats.
|
26
|
Behura SK, Severson DW. Motif mismatches in microsatellites: insights from genome-wide investigation among 20 insect species. DNA Res 2014; 22:29-38. [PMID: 25378245 PMCID: PMC4379975 DOI: 10.1093/dnares/dsu036] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022]
Abstract
We present a detailed genome-wide comparative study of motif mismatches of microsatellites among 20 insect species representing five taxonomic orders. The results show that varying proportions (∼15-46%) of microsatellites identified in these species are imperfect in motif structure, and that they also vary in chromosomal distribution within genomes. It was observed that the genomic abundance of imperfect repeats is significantly associated with the length and number of motif mismatches of microsatellites. Furthermore, microsatellites with a higher number of mismatches tend to have lower abundance in the genome, suggesting that sequence heterogeneity of repeat motifs is a key determinant of genomic abundance of microsatellites. This relationship seems to be a general feature of microsatellites even in unrelated species such as yeast, roundworm, mouse and human. We provide a mechanistic explanation of the evolutionary link between motif heterogeneity and genomic abundance of microsatellites by examining the patterns of motif mismatches and allele sequences of single-nucleotide polymorphisms identified within microsatellite loci. Using Drosophila Reference Genetic Panel data, we further show that pattern of allelic variation modulates motif heterogeneity of microsatellites, and provide estimates of allele age of specific imperfect microsatellites found within protein-coding genes.
Affiliation(s)
- Susanta K Behura
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- David W Severson
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
|
27
|
Lenz C, Haerty W, Golding GB. Increased substitution rates surrounding low-complexity regions within primate proteins. Genome Biol Evol 2014; 6:655-65. [PMID: 24572016 PMCID: PMC3971593 DOI: 10.1093/gbe/evu042] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]
Abstract
Previous studies have found that DNA-flanking low-complexity regions (LCRs) have an increased substitution rate. Here, the substitution rate was confirmed to increase in the vicinity of LCRs in several primate species, including humans. This effect was also found among human sequences from the 1000 Genomes Project. A strong correlation was found between average substitution rate per site and distance from the LCR, as well as the proportion of genes with gaps in the alignment at each site and distance from the LCR. Along with substitution rates, dN/dS ratios were also determined for each site, and the proportion of sites undergoing negative selection was found to have a negative relationship with distance from the LCR.
Affiliation(s)
- Carolyn Lenz
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
|
28
|
Brittain A, Stroebele E, Erives A. Microsatellite repeat instability fuels evolution of embryonic enhancers in Hawaiian Drosophila. PLoS One 2014; 9:e101177. [PMID: 24978198 PMCID: PMC4076327 DOI: 10.1371/journal.pone.0101177] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/02/2014] [Accepted: 06/03/2014] [Indexed: 12/16/2022]
Abstract
For ∼30 million years, the eggs of Hawaiian Drosophila were laid in ever-changing environments caused by high rates of island formation. The associated diversification of the size and developmental rate of the syncytial fly embryo would have altered morphogenic gradients, thus necessitating frequent evolutionary compensation of transcriptional responses. We investigate the consequences these radiations had on transcriptional enhancers patterning the embryo to see whether their pattern of molecular evolution is different from non-Hawaiian species. We identify and functionally assay in transgenic D. melanogaster the Neurogenic Ectoderm Enhancers from two different Hawaiian Drosophila groups: (i) the picture wing group, and (ii) the modified mouthparts group. We find that the binding sites in this set of well-characterized enhancers are footprinted by diverse microsatellite repeat (MSR) sequences. We further show that Hawaiian embryonic enhancers in general are enriched in MSR relative to both Hawaiian non-embryonic enhancers and non-Hawaiian embryonic enhancers. We propose embryonic enhancers are sensitive to Activator spacing because they often serve as assembly scaffolds for the aggregation of transcription factor activator complexes. Furthermore, as most indels are produced by microsatellite repeat slippage, enhancers from Hawaiian Drosophila lineages, which experience dynamic evolutionary pressures, would become grossly enriched in MSR content.
Affiliation(s)
- Andrew Brittain
- Department of Biology, University of Iowa, Iowa City, Iowa, United States of America
- Elizabeth Stroebele
- Department of Biology, University of Iowa, Iowa City, Iowa, United States of America
- Albert Erives
- Department of Biology, University of Iowa, Iowa City, Iowa, United States of America
|
29
|
Chong Z, Zhai W, Li C, Gao M, Gong Q, Ruan J, Li J, Jiang L, Lv X, Hungate E, Wu CI. The evolution of small insertions and deletions in the coding genes of Drosophila melanogaster. Mol Biol Evol 2013; 30:2699-708. [PMID: 24077769 DOI: 10.1093/molbev/mst167] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/24/2023]
Abstract
Studies of protein evolution have focused on amino acid substitutions with much less systematic analysis on insertions and deletions (indels) in protein coding genes. We hence surveyed 7,500 genes between Drosophila melanogaster and D. simulans, using D. yakuba as an outgroup for this purpose. The evolutionary rate of coding indels is indeed low, at only 3% of that of nonsynonymous substitutions. As coding indels follow a geometric distribution in size and tend to fall in low-complexity regions of proteins, it is unclear whether selection or mutation underlies this low rate. To resolve the issue, we collected genomic sequences from an isogenic African line of D. melanogaster (ZS30) at a high coverage of 70× and analyzed indel polymorphism between ZS30 and the reference genome. In comparing polymorphism and divergence, we found that the divergence to polymorphism ratio (i.e., fixation index) for smaller indels (size ≤ 10 bp) is very similar to that for synonymous changes, suggesting that most of the within-species polymorphism and between-species divergence for indels are selectively neutral. Interestingly, deletions of larger sizes (size ≥ 11 bp and ≤ 30 bp) have a much higher fixation index than synonymous mutations and 44.4% of fixed middle-sized deletions are estimated to be adaptive. To our surprise, this pattern is not found for insertions. Protein indel evolution appears to be in a dynamic flux of neutrally driven expansion (insertions) together with adaptive-driven contraction (deletions), and these observations provide important insights for understanding the fitness of new mutations as well as the evolutionary driving forces for genomic evolution in Drosophila species.
Affiliation(s)
- Zechen Chong
- Center for Computational Biology and Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
|
30
|
Assessing the genome-wide effect of promoter region tandem repeat natural variation on gene expression. G3-GENES GENOMES GENETICS 2012; 2:1643-9. [PMID: 23275886 PMCID: PMC3516485 DOI: 10.1534/g3.112.004663] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/01/2012] [Accepted: 10/24/2012] [Indexed: 12/23/2022]
Abstract
Copy number polymorphisms of nucleotide tandem repeat (TR) regions, such as microsatellites and minisatellites, are mutationally reversible and highly abundant in eukaryotic genomes. Studies linking TR polymorphism to phenotypic variation have led some to suggest that TR variation modulates and majorly contributes to phenotypic variation; however, studies in which the authors assess the genome-wide impact of TR variation on phenotype are lacking. To address this question, we quantified relationships between polymorphism levels in 143 genome-wide promoter region TRs across 16 isolates of the filamentous fungus Aspergillus flavus and its ecotype Aspergillus oryzae with expression levels of their downstream genes. We found that only 4.3% of relationships tested were significant; these findings were consistent with models in which TRs act as “tuning,” “volume,” or “optimality” “knobs” of phenotype but not with “switch” models. Furthermore, the promoter regions of differentially expressed genes between A. oryzae and A. flavus did not show TR enrichment, suggesting that genome-wide differences in molecular phenotype between the two species are not significantly associated with TRs. Although in some cases TR polymorphisms do contribute to transcript abundance variation, these results argue that at least in this case, TRs might not be major modulators of variation in phenotype.
|
31
|
Radó-Trilla N, Albà M. Dissecting the role of low-complexity regions in the evolution of vertebrate proteins. BMC Evol Biol 2012; 12:155. [PMID: 22920595 PMCID: PMC3523016 DOI: 10.1186/1471-2148-12-155] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 04/17/2012] [Accepted: 07/30/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance, and their capacity to expand in relatively short periods of time through replication slippage, they can greatly contribute to increase protein sequence space and generate novel protein functions. However, little is known about the global impact of LCRs on protein evolution. RESULTS We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H.sapiens, M.musculus, G.gallus, D.rerio and C.intestinalis. Transcriptional factors and other regulatory functions are overrepresented in proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion whereas the loss of LCRs is more often due to accumulation of amino acid substitutions as opposed to deletions. This dichotomy results in net protein sequence gain over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than for other LCR types. LCRs enriched in positively charged amino acids show the contrary pattern, indicating an important effect of purifying selection in their maintenance. CONCLUSION We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.
Affiliation(s)
- Núria Radó-Trilla
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics - IMIM Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona 08003, Spain
|
32
|
Homopeptide repeats: implications for protein structure, function and evolution. GENOMICS PROTEOMICS & BIOINFORMATICS 2012; 10:217-25. [PMID: 23084777 PMCID: PMC5054710 DOI: 10.1016/j.gpb.2012.04.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/18/2011] [Revised: 04/03/2012] [Accepted: 04/19/2012] [Indexed: 11/20/2022]
Abstract
Analysis of protein sequences from Mycobacterium tuberculosis H37Rv (Mtb H37Rv) was performed to identify homopeptide repeat-containing proteins (HRCPs). Functional annotation of the HRCPs showed that they are preferentially involved in cellular metabolism. Furthermore, these homopeptide repeats might play some specific roles in protein–protein interaction. Repeat length differences among Bacteria, Archaea and Eukaryotes were calculated in order to identify the conservation of the repeats in these divergent kingdoms. From the results, it was evident that these repeats have a higher degree of conservation in Bacteria and Archaea than in Eukaryotes. In addition, there seems to be a direct correlation between the repeat length difference and the degree of divergence between the species. Our study supports the hypothesis that the presence of homopeptide repeats influences the rate of evolution of the protein sequences in which they are embedded. Thus, homopeptide repeat may have structural, functional and evolutionary implications on proteins.
|
33
|
Behura SK, Severson DW. Genome-wide comparative analysis of simple sequence coding repeats among 25 insect species. Gene 2012; 504:226-32. [PMID: 22633877 DOI: 10.1016/j.gene.2012.05.020] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 02/07/2012] [Revised: 05/11/2012] [Accepted: 05/12/2012] [Indexed: 10/28/2022]
Abstract
We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p < 0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r² = 0.79) among all the species. Results further show that perfect and imperfect repeats have significant association with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show significant association between the numbers of perfect/imperfect hexamers and repeat coding for single amino acid/amino acid pair runs. Our data further suggests that genes containing simple sequence coding repeats may be under negative selection as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.
Affiliation(s)
- Susanta K Behura
- Eck Institute for Global Health, Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA.
|
34
|
Ramazzotti M, Monsellier E, Kamoun C, Degl'Innocenti D, Melki R. Polyglutamine repeats are associated to specific sequence biases that are conserved among eukaryotes. PLoS One 2012; 7:e30824. [PMID: 22312432 PMCID: PMC3270027 DOI: 10.1371/journal.pone.0030824] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 10/06/2011] [Accepted: 12/23/2011] [Indexed: 12/20/2022]
Abstract
Nine human neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxias, are associated with the aggregation of proteins comprising an extended tract of consecutive glutamine residues (polyQs) once it exceeds a certain length threshold. This event is believed to be the consequence of the expansion of polyCAG codons during the replication process. This is in apparent contradiction with the fact that many polyQs-containing proteins remain soluble and are encoded by invariant genes in a number of eukaryotes. The latter suggests that polyQs expansion and/or aggregation might be counter-selected through a genetic and/or protein context. To identify this context, we designed software that scrutinizes entire proteomes in search of imperfect polyQs. The nature of residues flanking the polyQs and that of residues other than Gln within polyQs (insertions) were assessed. We discovered strong amino acid residue biases robustly associated with polyQs in the 15 eukaryotic proteomes we examined, with an over-representation of Pro, Leu and His and an under-representation of Asp, Cys and Gly amino acid residues. These biases are conserved amongst unrelated proteins and are independent of specific functional classes. Our findings suggest that specific residues have been co-selected with polyQs during evolution. We discuss the possible selective pressures responsible for the observed biases.
Affiliation(s)
- Matteo Ramazzotti
- Dipartimento di Scienze Biochimiche, Università degli Studi di Firenze, Florence, Italy
- Elodie Monsellier
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- Choumouss Kamoun
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- Ronald Melki
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
|
35
|
King DG. Evolution of simple sequence repeats as mutable sites. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2012; 769:10-25. [PMID: 23560302 DOI: 10.1007/978-1-4614-5434-2_2] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
Abstract
Because natural selection is commonly presumed to minimize mutation rates, the discovery of mutationally unstable simple sequence repeats (SSRs) in many functional genomic locations came as a surprise to many biologists. Whether such SSRs persist in spite of or because of their intrinsic mutability (whether they constitute a genetic burden or an evolutionary boon) remains uncertain. Two contrasting evolutionary explanations can be offered for SSR abundance. First, suppressing the inherent mutability of repetitive sequences might simply lie beyond the reach of natural selection. Alternatively, natural selection might indirectly favor SSRs at sites where particular repeat-number variants have provided positive contributions to fitness. Indirect selection could thereby shape SSRs into "tuning knobs" that facilitate evolutionary adaptation by implementing an implicit protocol of incremental adjustability. The latter possibility is consistent with deep evolutionary conservation of some SSRs, including several in genes with neurological and neurodevelopmental function.
Affiliation(s)
- David G King
- Department of Anatomy, Southern Illinois University Carbondale, Carbondale, Illinois, USA.
|
36
|
Zhou Y, Liu J, Han L, Li ZG, Zhang Z. Comprehensive analysis of tandem amino acid repeats from ten angiosperm genomes. BMC Genomics 2011; 12:632. [PMID: 22195734 PMCID: PMC3283746 DOI: 10.1186/1471-2164-12-632] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/02/2011] [Accepted: 12/23/2011] [Indexed: 11/30/2022]
Abstract
BACKGROUND The presence of tandem amino acid repeats (AARs) is one of the signatures of eukaryotic proteins. AARs were thought to be frequently involved in bio-molecular interactions. Comprehensive studies that primarily focused on metazoan AARs have suggested that AARs are evolving rapidly and are highly variable among species. However, there is still controversy over causal factors of this inter-species variation. In this work, we attempted to investigate this topic mainly by comparing AARs in orthologous proteins from ten angiosperm genomes. RESULTS Angiosperm AAR content is positively correlated with the GC content of the protein coding sequence. However, based on observations from fungal AARs and insect AARs, we argue that the applicability of this kind of correlation is limited by AAR residue composition and species' life history traits. Angiosperm AARs also tend to be fast evolving and structurally disordered, supporting the results of comprehensive analyses of metazoans. The functions of conserved long AARs are summarized. Finally, we propose that the rapid mRNA decay rate, alternative splicing and tissue specificity are regulatory processes that are associated with angiosperm proteins harboring AARs. CONCLUSIONS Our investigation suggests that GC content is a predictor of AAR content in the protein coding sequence under certain conditions. Although angiosperm AARs lack conservation and 3D structure, a fraction of the proteins that contain AARs may be functionally important and are under extensive regulation in plant cells.
Affiliation(s)
- Yuan Zhou
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Jing Liu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhi-Gang Li
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
|
37
|
Luo H, Lin K, David A, Nijveen H, Leunissen JAM. ProRepeat: an integrated repository for studying amino acid tandem repeats in proteins. Nucleic Acids Res 2011; 40:D394-9. [PMID: 22102581 PMCID: PMC3245022 DOI: 10.1093/nar/gkr1019] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/13/2023]
Abstract
ProRepeat (http://prorepeat.bioinformatics.nl/) is an integrated curated repository and analysis platform for in-depth research on the biological characteristics of amino acid tandem repeats. ProRepeat collects repeats from all proteins included in the UniProt knowledgebase, together with 85 completely sequenced eukaryotic proteomes contained within the RefSeq collection. It contains non-redundant perfect tandem repeats, approximate tandem repeats and simple, low-complexity sequences, covering the majority of the amino acid tandem repeat patterns found in proteins. The ProRepeat web interface allows querying the repeat database using repeat characteristics like repeat unit and length, number of repetitions of the repeat unit and position of the repeat in the protein. Users can also search for repeats by the characteristics of repeat-containing proteins, such as entry ID, protein description, sequence length, gene name and taxon. ProRepeat offers powerful analysis tools for finding biologically interesting properties of repeats, such as the strong position bias of leucine repeats in the N-terminus of eukaryotic protein sequences, the differences of repeat abundance among proteomes, the functional classification of repeat-containing proteins and GC content constraints of repeats’ corresponding codons.
Affiliation(s)
- Hong Luo
- Laboratory of Bioinformatics, Wageningen University and Research Centre, PO Box 569, 6700 AN Wageningen, Netherlands
|
38
|
Kurosaki T, Gojobori J, Ueda S. Comparative Genetics of the Poly-Q Tract of Ataxin-1 and Its Binding Protein PQBP-1. Biochem Genet 2011; 50:309-17. [DOI: 10.1007/s10528-011-9473-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/26/2010] [Accepted: 06/14/2011] [Indexed: 11/28/2022]
|
39
|
Behura SK, Haugen M, Flannery E, Sarro J, Tessier CR, Severson DW, Duman-Scheel M. Comparative genomic analysis of Drosophila melanogaster and vector mosquito developmental genes. PLoS One 2011; 6:e21504. [PMID: 21754989 PMCID: PMC3130749 DOI: 10.1371/journal.pone.0021504] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 04/01/2011] [Accepted: 05/30/2011] [Indexed: 11/18/2022]
Abstract
Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that 1) are components of developmental signaling pathways, 2) regulate fundamental developmental processes, 3) are critical for the development of tissues of vector importance, 4) function in developmental processes known to have diverged within insects, and 5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments.
Affiliation(s)
- Susanta K. Behura
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Morgan Haugen
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, South Bend, Indiana, United States of America
- Ellen Flannery
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Joseph Sarro
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Charles R. Tessier
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, South Bend, Indiana, United States of America
- David W. Severson
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, South Bend, Indiana, United States of America
- Molly Duman-Scheel
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, South Bend, Indiana, United States of America
|
40
|
Haerty W, Golding GB. Increased polymorphism near low-complexity sequences across the genomes of Plasmodium falciparum isolates. Genome Biol Evol 2011; 3:539-50. [PMID: 21602572 PMCID: PMC3140889 DOI: 10.1093/gbe/evr045] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022] Open
Abstract
Low-complexity regions (LCRs) within protein sequences are often considered to evolve neutrally even though recent studies reported evidence for selection acting on some of them. Because of their widespread distribution among eukaryotic genomes and the potential deleterious effect of expansion/contraction of some of them in humans, low-complexity sequences are of major interest, and numerous studies have attempted to describe their dynamics between genomes as well as the factors correlated with their variation and to assess their selective value. However, due to the scarcity of individual genomes within a species, most of the analyses so far have been performed at the species level with the implicit assumption that the variation both in composition and size within species is too small relative to the between-species divergence to affect the conclusions of the analysis. Here we used the available genomes of 14 Plasmodium falciparum isolates to assess the relationship between low-complexity sequence variation and factors such as nucleotide polymorphism across strains, sequence composition, and protein expression. We report that more than half of the 7,711 low-complexity sequences found within aligned coding sequences are variable in size among strains. Across strains, we observed an increasing density of polymorphic sites toward the LCR boundaries. This observation strongly suggests the joint effects of lowered selective constraints on low-complexity sequences and a mutagenic effect of these simple sequences.
Affiliation(s)
- Wilfried Haerty
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
|
41
|
Łabaj PP, Sykacek P, Kreil DP. An analysis of single amino acid repeats as use case for application specific background models. BMC Bioinformatics 2011; 12:173. [PMID: 21595908 PMCID: PMC3124433 DOI: 10.1186/1471-2105-12-173] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/30/2010] [Accepted: 05/19/2011] [Indexed: 11/30/2022] Open
Abstract
Background: Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions.
Results: Traditional Markov-chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multi-variate trends well, despite their low dimensionality and in contrast even to higher-order Markov-predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine-repeats in signal-peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis.
Conclusions: Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.
Affiliation(s)
- Paweł P Łabaj
- Chair of Bioinformatics, Boku University Vienna, Muthgasse 18, 1190 Vienna, Austria.
|
42
|
Markova-Raina P, Petrov D. High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes. Genome Res 2011; 21:863-74. [PMID: 21393387 DOI: 10.1101/gr.115949.110] [Citation(s) in RCA: 110] [Impact Index Per Article: 7.9] [Indexed: 12/25/2022]
Abstract
We investigate the effect of aligner choice on inferences of positive selection using site-specific models of molecular evolution. We find that independently of the choice of aligner, the rate of false positives is unacceptably high. Our study is a whole-genome analysis of all protein-coding genes in 12 Drosophila genomes annotated in either all 12 species (~6690 genes) or in the six melanogaster group species. We compare six popular aligners: PRANK, T-Coffee, ClustalW, ProbCons, AMAP, and MUSCLE, and find that the aligner choice strongly influences the estimates of positive selection. Differences persist when we use (1) different stringency cutoffs, (2) different selection inference models, (3) alignments with or without gaps, and/or additional masking, (4) per-site versus per-gene statistics, (5) closely related melanogaster group species versus more distant 12 Drosophila genomes. Furthermore, we find that these differences are consequential for downstream analyses such as determination of over/under-represented GO terms associated with positive selection. Visual analysis indicates that most sites inferred as positively selected are, in fact, misaligned at the codon level, resulting in false-positive rates of 48%-82%. PRANK, which has been reported to outperform other aligners in simulations, performed best in our empirical study as well. Unfortunately, PRANK still had a high false-positive rate of 50%-55%, which is unacceptable for most applications. We identify misannotations and indels, many of which appear to be located in disordered protein regions, as primary culprits for the high misalignment-related error levels and discuss possible workaround approaches to this apparently pervasive problem in genome-wide evolutionary analyses.
Affiliation(s)
- Penka Markova-Raina
- Department of Biology, Stanford University, Stanford, California 94305, USA.
|
43
|
Haerty W, Golding GB. Low-complexity sequences and single amino acid repeats: not just "junk" peptide sequences. Genome 2011; 53:753-62. [PMID: 20962881 DOI: 10.1139/g10-063] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 12/21/2022]
Abstract
For decades proteins were thought to interact in a "lock and key" system, which led to the definition of a paradigm linking stable three-dimensional structure to biological function. As a consequence, any non-structured peptide was considered to be nonfunctional and to evolve neutrally. Surprisingly, the most commonly shared peptides between eukaryotic proteomes are low-complexity sequences that in most conditions do not present a stable three-dimensional structure. However, because these sequences evolve rapidly and because the size variation of a few of them can have deleterious effects, low-complexity sequences have been suggested to be the target of selection. Here we review evidence that supports the idea that these simple sequences should not be considered just "junk" peptides and that selection drives the evolution of many of them.
Affiliation(s)
- Wilfried Haerty
- Biology Department, McMaster University, Hamilton, ON, Canada
|
44
|
Role of Everlasting Triplet Expansions in Protein Evolution. J Mol Evol 2010; 72:232-9. [DOI: 10.1007/s00239-010-9425-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/09/2010] [Accepted: 12/01/2010] [Indexed: 02/05/2023]
|
45
|
Gojobori J, Ueda S. Elevated evolutionary rate in genes with homopolymeric amino acid repeats constituting nondisordered structure. Mol Biol Evol 2010; 28:543-50. [PMID: 20798138 DOI: 10.1093/molbev/msq225] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022] Open
Abstract
Homopolymeric amino acid repeats are tandem repeats of single amino acids. About 650 genes in the human genome are known to have repeats of this kind comprising seven residues or more. According to their evolutionary conservation, we classified the repeats into three categories: those whose length is conserved among mammals (CM), those whose length differs among nonprimate mammals but is conserved among primates (CP), and those whose length differs among primates (VP). The frequency of each repeat, especially Ala, Leu, Pro, and Glu repeats, varies greatly in each category. The 3D structure of homopolymeric amino acid repeats is considered to be intrinsically disordered. As expected, a large proportion of the repeats had a disordered structure, and nearly half of the repeats were predicted as completely disordered. However, a number of the repeats were predicted to have nondisordered structure: 13% and 25% of the repeats for categories CM and VP, respectively. Comparison of the substitution rates showed a higher Ka/Ks ratio for the genes with nondisordered repeats than for the genes with disordered repeats. These results indicate that amino acid substitution rates have been elevated in the genes with nondisordered repeats.
Affiliation(s)
- Jun Gojobori
- School of Advanced Studies, Graduate University for Advanced Studies, Hayama, Kanagawa, Japan
|
46
|
Tan JC, Tan A, Checkley L, Honsa CM, Ferdig MT. Variable numbers of tandem repeats in Plasmodium falciparum genes. J Mol Evol 2010; 71:268-78. [PMID: 20730584 DOI: 10.1007/s00239-010-9381-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 12/10/2009] [Accepted: 08/09/2010] [Indexed: 11/29/2022]
Abstract
Genome variation studies in Plasmodium falciparum have focused on SNPs and, more recently, large-scale copy number polymorphisms and ectopic rearrangements. Here, we examine another source of variation: variable number tandem repeats (VNTRs). Interspersed low complexity features, including the well-studied P. falciparum microsatellite sequences, are commonly classified as VNTRs; however, this study is focused on longer coding VNTR polymorphisms, a small class of copy number variations. Selection against frameshift mutation is a main constraint on tandem repeats (TRs) in coding regions, while limited propagation of TRs longer than 975 nt total length is a minor restriction in coding regions. Comparative analysis of three P. falciparum genomes reveals that more than 9% of all P. falciparum ORFs harbor VNTRs, much more than has been reported for any other species. Moreover, genotyping of VNTR loci in a drug-selected line, progeny of a genetic cross, and 334 field isolates demonstrates broad variability in these sequences. Functional enrichment analysis of ORFs harboring VNTRs identifies stress and DNA damage responses along with chromatin modification activities, suggesting an influence on genome mutability and functional variation. Analysis of the repeat units and their flanking regions in both P. falciparum and Plasmodium reichenowi sequences implicates a replication slippage mechanism in the generation of TRs from an initially unrepeated sequence. VNTRs can contribute to rapid adaptation by localized sequence duplication. They also can confound SNP-typing microarrays or mapping short-sequence reads and therefore must be accounted for in such analyses.
Affiliation(s)
- John C Tan
- The Eck Institute for Global Health, University of Notre Dame, 100 Galvin Life Sciences, Notre Dame, IN, 46556, USA.
|
47
|
Łabaj PP, Leparc GG, Bardet AF, Kreil G, Kreil DP. Single amino acid repeats in signal peptides. FEBS J 2010; 277:3147-57. [DOI: 10.1111/j.1742-4658.2010.07720.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
|
48
|
Mularoni L, Ledda A, Toll-Riera M, Albà MM. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Res 2010; 20:745-54. [PMID: 20335526 DOI: 10.1101/gr.101261.109] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Indexed: 11/25/2022]
Abstract
Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences--and thus expected to be subject to the same mutational forces--but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.
Affiliation(s)
- Loris Mularoni
- Biomedical Informatics Research Programme (GRIB), Fundació Institut Municipal d'Investigació Mèdica, Barcelona 08003, Spain
|
49
|
Haerty W, Golding GB. Genome-wide evidence for selection acting on single amino acid repeats. Genome Res 2010; 20:755-60. [PMID: 20056893 DOI: 10.1101/gr.101246.109] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Abstract
Low complexity and homopolymer sequences within coding regions are known to evolve rapidly. While their expansion may be deleterious, there is increasing evidence for a functional role associated with these amino acid sequences. Homopolymer sequences are thought to evolve mostly through replication slippage and, therefore, they may be expected to be longer in regions with relaxed selective constraint. Within the coding sequences of eukaryotes, alternatively spliced exons are known to evolve under relaxed constraints in comparison to those exons that are constitutively spliced because they are not included in all of the mature mRNA of a gene. This relaxed exposure to selection leads to faster rates of evolution for alternatively spliced exons in comparison to constitutively spliced exons. Here, we have tested the effect of splicing on the structure (composition, length) of homopolymer sequences in relation to the splicing pattern in which they are found. We observed a significant relationship between alternative splicing and homopolymer sequences with alternatively spliced genes being enriched in number and length of homopolymer sequences. We also observed lower codon diversity and longer homocodons, suggesting a balance between slippage and point mutations linked to the constraints imposed by selection.
Affiliation(s)
- Wilfried Haerty
- Biology Department, McMaster University, Hamilton, Ontario L8S4L8, Canada
|
50
|
Cruz F, Roux J, Robinson-Rechavi M. The expansion of amino-acid repeats is not associated to adaptive evolution in mammalian genes. BMC Genomics 2009; 10:619. [PMID: 20021652 PMCID: PMC2806350 DOI: 10.1186/1471-2164-10-619] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/01/2009] [Accepted: 12/18/2009] [Indexed: 01/22/2023] Open
Abstract
Background: The expansion of amino acid repeats is determined by a high mutation rate and can be increased or limited by selection. It has been suggested that recent expansions could be associated with the potential of adaptation to new environments. In this work, we quantify the strength of this association, as well as the contribution of potential confounding factors.
Results: Mammalian positively selected genes have accumulated more recent amino acid repeats than other mammalian genes. However, we found little support for an accelerated evolutionary rate as the main driver for the expansion of amino acid repeats. The most significant predictors of amino acid repeats are gene function and GC content. There is no correlation with expression level.
Conclusions: Our analyses show that amino acid repeat expansions are causally independent from protein adaptive evolution in mammalian genomes. Relaxed purifying selection or positive selection do not associate with more or more recent amino acid repeats. Their occurrence is slightly favoured by the sequence context but mainly determined by the molecular function of the gene.
Affiliation(s)
- Fernando Cruz
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland.
|