1
Mier P, Andrade-Navarro MA, Morett E. Homorepeat variability within the human population. NAR Genom Bioinform 2024; 6:lqae053. [PMID: 38774515 PMCID: PMC11106027 DOI: 10.1093/nargab/lqae053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/12/2024] [Accepted: 05/08/2024] [Indexed: 05/24/2024]
Abstract
Genetic variation within populations plays a crucial role in driving evolution. Unlike the average protein sequence, the evolution of homorepeats can be influenced by DNA replication slippage, when DNA polymerases either add or skip repeats of nucleotides. While there are some diseases known to be caused by abnormal changes in the length of amino acid homorepeats, naturally occurring variations in homorepeat length remain relatively unexplored. In our study, we examined the variation in amino acid homorepeat length of human individuals by analyzing 125 748 exomes, as well as 15 708 whole genomes. Our analyses revealed significant variability in homorepeat length across the human population, indicating that these motifs are prone to mutations at higher rates than non-repeat sequences. We focused our study on glutamine homorepeats, also known as polyQ sequences, and found that shorter polyQ sequences tend to exhibit greater length variation, while longer ones primarily undergo deletions. Notably, polyQ sequences that are more conserved across primates tend to show less variation within the human population, indicating stronger selective pressure to maintain their length. Overall, our results demonstrate that there is large natural variation in the length of homorepeats within the human population, with no apparent impact on observable traits.
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Enrique Morett
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México (UNAM), Av. Universidad 2001, Cuernavaca, Morelos 62210, Mexico
2
Gudkov M, Thibaut L, Giannoulatou E. Quantifying negative selection on synonymous variants. HGG ADVANCES 2024; 5:100262. [PMID: 38192100 PMCID: PMC10835449 DOI: 10.1016/j.xhgg.2024.100262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 01/01/2024] [Accepted: 01/01/2024] [Indexed: 01/10/2024]
Abstract
Widespread adoption of DNA sequencing has resulted in large numbers of genetic variants, whose contribution to disease is not easily determined. Although many types of variation are known to disrupt cellular processes in predictable ways, for some categories of variants, the effects may not be directly detectable. A particular example is synonymous variants, that is, those single-nucleotide variants that create a codon substitution, such that the produced amino acid sequence is unaffected. Contrary to the original theory suggesting that synonymous variants are benign, there is a growing volume of research showing that, despite their "silent" mechanism of action, some synonymous variation may be deleterious. Here, we studied the extent of the negative selective pressure acting on different classes of synonymous variants by analyzing the relative enrichment of synonymous singleton variants in the human exomes provided by gnomAD. Using a modification of the mutability-adjusted proportion of singletons (MAPS) metric as a measure of purifying selection, we found that some classes of synonymous variants are subject to stronger negative selection than others. For instance, variants that reduce codon optimality undergo stronger selection than optimality-increasing variants. Besides, selection affects synonymous variants implicated in splice-site-loss or splice-site-gain events. To understand what drives this negative selection, we tested a number of predictors in the aim to explain the variability in the selection scores. Our findings provide insights into the effects of synonymous variants at the population level, highlighting the specifics of the role that these variants play in health and disease.
Affiliation(s)
- Mikhail Gudkov
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St Vincent's Clinical School, UNSW Sydney, Sydney, NSW 2052, Australia
- Loïc Thibaut
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; School of Mathematics and Statistics, UNSW Sydney, Sydney, NSW 2052, Australia
- Eleni Giannoulatou
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St Vincent's Clinical School, UNSW Sydney, Sydney, NSW 2052, Australia
3
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of its salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on Tandem Repeat Proteins (TRPs) by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Affiliation(s)
- Alexander Miguel Monzon
- Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Paula Nazarena Arrías
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Arne Elofsson
- Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Clementel
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
4
Mier P, Andrade-Navarro MA. The nucleotide landscape of polyXY regions. Comput Struct Biotechnol J 2023; 21:5408-5412. [PMID: 38022702 PMCID: PMC10652141 DOI: 10.1016/j.csbj.2023.10.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 10/30/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023]
Abstract
PolyXY regions are compositionally biased regions composed of two different amino acids. They are classified according to the arrangement of the two amino acid types 'X' and 'Y' into direpeats (composed of alternating amino acids, e.g. 'XYXYXY'), joined (composed of two consecutive stretches of each amino acid, e.g. 'XXXYYY') and shuffled (other arrangements, e.g., 'XYXXYY'). They have been characterized at the amino acid level in all domains of life, and are described as often found within intrinsically disordered regions. Since DNA replication slippage has been proposed as a driver of repeat variation, and given that some polyXY have a repetitive nature, we hypothesized that characterizing the nucleotide coding of various types of polyXY could give hints about their origin and evolution. To test this, we obtained all polyXY regions in the human transcriptome, categorized them, and studied their coding nucleotide sequences. We observed that polyXY exacerbates the codon biases, and that the similarity between the X and Y codons is higher than in the background proteome. Our results support a general mechanism of emergence and evolution of polyXY from single-codon polyX. PolyXY are revealed as hotspots for replication slippage, particularly those composed of repeats: joined and direpeat polyXY. Inter-conversion to shuffled polyXY disrupts nucleotide repeats and restricts further evolution by replication slippage, a mechanism that we previously observed in polyX. Our results shed light on polyXY composition and should simplify the determination of their functions.
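The three arrangements defined in this abstract can be illustrated with a minimal Python sketch. The function name `classify_polyxy` and the rule of counting residue switches are assumptions made here for illustration, not the authors' implementation. The input is assumed to be an already-extracted region made of exactly two residue types.

```python
def classify_polyxy(region: str) -> str:
    """Classify a two-residue region as 'direpeat', 'joined' or 'shuffled'.

    Hypothetical helper: the switch-counting rule is an illustrative
    reading of the definitions in the abstract, not the published method.
    """
    if len(set(region)) != 2:
        raise ValueError("a polyXY region must contain exactly two residue types")

    # Count adjacent positions where the residue type changes.
    switches = sum(a != b for a, b in zip(region, region[1:]))

    if switches == len(region) - 1:
        return "direpeat"  # strict alternation, e.g. XYXYXY
    if switches == 1:
        return "joined"    # one stretch of each residue, e.g. XXXYYY
    return "shuffled"      # any other arrangement, e.g. XYXXYY


for example in ("QAQAQA", "QQQAAA", "QAQQAA"):
    print(example, classify_polyxy(example))
```

A two-residue region such as "XY" satisfies both the direpeat and the joined definition; this sketch reports it as a direpeat, which is an arbitrary choice.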
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
5
Barbosa Pereira PJ, Manso JA, Macedo-Ribeiro S. The structural plasticity of polyglutamine repeats. Curr Opin Struct Biol 2023; 80:102607. [PMID: 37178477 DOI: 10.1016/j.sbi.2023.102607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/11/2023] [Accepted: 04/12/2023] [Indexed: 05/15/2023]
Abstract
From yeast to humans, polyglutamine (polyQ) repeat tracts are found frequently in the proteome and are particularly prominent in the activation domains of transcription factors. PolyQ is a polymorphic motif that modulates functional protein-protein interactions and aberrant self-assembly. Expansion of the polyQ repeated sequences beyond critical physiological repeat length thresholds triggers self-assembly and is linked to severe pathological implications. This review provides an overview of the current knowledge on the structures of polyQ tracts in the soluble and aggregated states and discusses the influence of neighboring regions on polyQ secondary structure, aggregation, and fibril morphologies. The influence of the genetic context of the polyQ-encoding trinucleotides is briefly discussed as a challenge for future endeavors in this field.
Affiliation(s)
- Pedro José Barbosa Pereira
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal.
- José A Manso
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
- Sandra Macedo-Ribeiro
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
6
Tan D, Wei C, Chen Z, Huang Y, Deng J, Li J, Liu Y, Bao X, Xu J, Hu Z, Wang S, Fan Y, Jiang Y, Wu Y, Wu Y, Wang S, Liu P, Zhang Y, Yang Z, Jiang Y, Zhang H, Hong D, Zhong N, Jiang H, Xiong H. CAG Repeat Expansion in THAP11 Is Associated with a Novel Spinocerebellar Ataxia. Mov Disord 2023. [PMID: 37148549 DOI: 10.1002/mds.29412] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 02/15/2023] [Revised: 03/22/2023] [Accepted: 04/05/2023] [Indexed: 05/08/2023]
Abstract
BACKGROUND More than 50 loci are associated with spinocerebellar ataxia (SCA), and the most frequent subtypes share nucleotide repeats expansion, especially CAG expansion. OBJECTIVE The objective of this study was to confirm a novel SCA subtype caused by CAG expansion. METHODS We performed long-read whole-genome sequencing combined with linkage analysis in a five-generation Chinese family, and the finding was validated in another pedigree. The three-dimensional structure and function of THAP11 mutant protein were predicted. Polyglutamine (polyQ) toxicity of THAP11 gene with CAG expansion was assessed in skin fibroblasts of patients, human embryonic kidney 293 and Neuro-2a cells. RESULTS We identified THAP11 as the novel causative SCA gene with CAG repeats ranging from 45 to 100 in patients with ataxia and from 20 to 38 in healthy control subjects. Among the patients, the number of CAA interruptions within CAG repeats was decreased to 3 (up to 5-6 in controls), whereas the number of 3' pure CAG repeats was up to 32 to 87 (4-16 in controls), suggesting that the toxicity of polyQ protein was length dependent on the pure CAG repeats. Intracellular aggregates were observed in cultured skin fibroblasts from patients. THAP11 polyQ protein was more intensely distributed in the cytoplasm of cultured skin fibroblasts from patients, which was replicated with in vitro cultured neuro-2a transfected with 54 or 100 CAG repeats. CONCLUSIONS This study identified a novel SCA subtype caused by intragenic CAG repeat expansion in THAP11 with intracellular aggregation of THAP11 polyQ protein. Our findings extended the spectrum of polyQ diseases and offered a new perspective in understanding polyQ-mediated toxic aggregation. © 2023 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
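The quantities reported in this abstract (total repeat number, CAA interruptions, and the length of the 3' pure CAG run) can be made concrete with a small Python sketch. The function name `summarize_polyq_tract` and its return keys are hypothetical, and the input is assumed to be an in-frame tract made only of CAG and CAA codons. The sequence in the usage lines is a toy example, not data from the study.

```python
def summarize_polyq_tract(tract: str) -> dict:
    """Summarize an in-frame CAG/CAA tract (hypothetical helper).

    Returns the total number of glutamine codons, the number of CAA
    interruptions, and the length of the uninterrupted CAG run at the
    3' end of the tract.
    """
    tract = tract.upper()
    if len(tract) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")

    codons = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    if any(codon not in ("CAG", "CAA") for codon in codons):
        raise ValueError("tract must contain only CAG and CAA codons")

    # Walk backwards from the 3' end until the first non-CAG codon.
    three_prime_pure = 0
    for codon in reversed(codons):
        if codon != "CAG":
            break
        three_prime_pure += 1

    return {
        "repeats": len(codons),
        "caa_interruptions": codons.count("CAA"),
        "three_prime_pure_cag": three_prime_pure,
    }


# Toy tract: 3 CAG, one CAA interruption, then 5 pure CAG at the 3' end.
toy = "CAG" * 3 + "CAA" + "CAG" * 5
print(summarize_polyq_tract(toy))
```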
Affiliation(s)
- Dandan Tan
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Cuijie Wei
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Zhao Chen
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, P.R. China
- Yu Huang
- Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, P.R. China
- Jianwen Deng
- Department of Neurology, Peking University First Hospital, Beijing, P.R. China
- Yidan Liu
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Xinhua Bao
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Beijing Key Laboratory of Molecular Diagnosis and Study on Pediatric Genetic Diseases, Beijing, P.R. China
- Jin Xu
- Center of Ultrastructural Pathology, Lab of Electron Microscopy, Peking University First Hospital, Beijing, P.R. China
- Zhengmao Hu
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, P.R. China
- Suxia Wang
- Center of Ultrastructural Pathology, Lab of Electron Microscopy, Peking University First Hospital, Beijing, P.R. China
- Yanbin Fan
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Yizheng Jiang
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, P.R. China
- Ye Wu
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Beijing Key Laboratory of Molecular Diagnosis and Study on Pediatric Genetic Diseases, Beijing, P.R. China
- Yuan Wu
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Shuang Wang
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Panyan Liu
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, P.R. China
- Yuehua Zhang
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Beijing Key Laboratory of Molecular Diagnosis and Study on Pediatric Genetic Diseases, Beijing, P.R. China
- Zhixian Yang
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Beijing Key Laboratory of Molecular Diagnosis and Study on Pediatric Genetic Diseases, Beijing, P.R. China
- Yuwu Jiang
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Beijing Key Laboratory of Molecular Diagnosis and Study on Pediatric Genetic Diseases, Beijing, P.R. China
- Hong Zhang
- Institute of Cardiovascular Sciences and Key Laboratory of Molecular Cardiovascular Sciences, Peking University Health Science Center, Beijing, P.R. China
- Daojun Hong
- Department of Neurology, The First Affiliated Hospital of Nanchang University, Nanchang, P.R. China
- Nanbert Zhong
- New York State Institute for Basic Research in Developmental Disabilities, Staten Island, New York, USA
- Hong Jiang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, P.R. China
- Key Laboratory of Hunan Province in Neurodegenerative Disorders, Central South University, Changsha, P.R. China
- National Clinical Research Center for Geriatric Diseases, Central South University, Changsha, P.R. China
- National International Collaborative Research Center for Medical Metabolomics, Central South University, Changsha, P.R. China
- Department of Neurology, The Third Xiangya Hospital, Central South University, Changsha, P.R. China
- Hui Xiong
- Department of Pediatrics, Peking University First Hospital, Beijing, P.R. China
- Beijing Key Laboratory of Molecular Diagnosis and Study on Pediatric Genetic Diseases, Beijing, P.R. China
7
Mier P, Elena-Real CA, Cortés J, Bernadó P, Andrade-Navarro MA. The sequence context in poly-alanine regions: structure, function and conservation. Bioinformatics 2022; 38:4851-4858. [PMID: 36106994 PMCID: PMC9620824 DOI: 10.1093/bioinformatics/btac610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 07/07/2022] [Accepted: 09/05/2022] [Indexed: 11/24/2022]
Abstract
Motivation Poly-alanine (polyA) regions are protein stretches mostly composed of alanines. Despite their abundance in eukaryotic proteomes and their association to nine inherited human diseases, the structural and functional roles exerted by polyA stretches remain poorly understood. In this work we study how the amino acid context in which polyA regions are settled in proteins influences their structure and function. Results We identified glycine and proline as the most abundant amino acids within polyA and in the flanking regions of polyA tracts, in human proteins as well as in 17 additional eukaryotic species. Our analyses indicate that the non-structuring nature of these two amino acids influences the α-helical conformations predicted for polyA, suggesting a relevant role in reducing the inherent aggregation propensity of long polyA. Then, we show how polyA position in protein N-termini relates with their function as transit peptides. PolyA placed just after the initial methionine is often predicted as part of mitochondrial transit peptides, whereas when placed in downstream positions, polyA are part of signal peptides. A few examples from known structures suggest that short polyA can emerge by alanine substitutions in α-helices; but evolution by insertion is observed for longer polyA. Our results showcase the importance of studying the sequence context of homorepeats as a mechanism to shape their structure–function relationships. Availability and implementation The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Carlos A Elena-Real
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
- Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Pau Bernadó
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
- Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
8
Mier P, Andrade-Navarro MA. Between Interactions and Aggregates: The PolyQ Balance. Genome Biol Evol 2021; 13:evab246. [PMID: 34791220 PMCID: PMC8763233 DOI: 10.1093/gbe/evab246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 10/27/2021] [Indexed: 11/17/2022]
Abstract
Polyglutamine (polyQ) regions are highly abundant consecutive runs of glutamine residues. They have been generally studied in relation to the so-called polyQ-associated diseases, characterized by protein aggregation caused by the expansion of the polyQ tract via a CAG-slippage mechanism. However, more than 4,800 human proteins contain a polyQ, and only nine of these regions are known to be associated with disease. Computational sequence studies and experimental structure determinations are completing a more interesting picture in which polyQ emerge as a motif for modulation of protein-protein interactions. But long polyQ regions may lead to an excess of interactions, and produce aggregates. Within this mechanistic perspective of polyQ function and malfunction, we discuss polyQ definition and properties such as variable codon usage, sequence and context structure imposition, functional relevance, evolutionary patterns in species-centered analyses, and open resources.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
9
McGrath C. Synonymous but Not Equal: A Special Section and Virtual Issue on Phenotypic Effects of Synonymous Mutations. Genome Biol Evol 2021. [PMCID: PMC8410135 DOI: 10.1093/gbe/evab186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
10
Moving beyond disease to function: Physiological roles for polyglutamine-rich sequences in cell decisions. Curr Opin Cell Biol 2021; 69:120-126. [PMID: 33610098 DOI: 10.1016/j.ceb.2021.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Revised: 12/18/2020] [Accepted: 01/12/2021] [Indexed: 12/17/2022]
Abstract
Glutamine-rich tracts, also known as polyQ domains, have received a great deal of attention for their role in multiple neurodegenerative diseases, including Huntington's disease (HD), spinocerebellar ataxia (SCA), and others [22], [27]. Expansions in the normal polyQ tracts are thus commonly linked to disease, but polyQ domains themselves play multiple important functional roles in cells that are being increasingly appreciated. The biochemical nature of these domains allows them to adopt a number of different structures and form large assemblies that enable environmental responsiveness, localized signaling, and cellular memory. In many cases, these involve the formation of condensates that have varied material states. In this review, we highlight known and emerging functional roles for polyQ tracts in normal cell physiology.
11
Mier P, Andrade-Navarro MA. The features of polyglutamine regions depend on their evolutionary stability. BMC Evol Biol 2020; 20:59. [PMID: 32448113 PMCID: PMC7247214 DOI: 10.1186/s12862-020-01626-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/13/2020] [Accepted: 05/13/2020] [Indexed: 11/29/2022]
Abstract
Background Polyglutamine regions (polyQ) are one of the most studied and prevalent homorepeats in eukaryotes. They have a particular length-dependent codon usage, which relates to a characteristic CAG-slippage mechanism. Pathologically expanded tracts of polyQ are known to form aggregates and are involved in the development of several human neurodegenerative diseases. The non-pathogenic function of polyQ is to mediate protein-protein interactions via a coiled-coil pairing with an interactor. They are usually located in a helical context. Results Here we study the stability of polyQ regions in evolution, using a set of 60 proteomes from four distinct taxonomic groups (Insecta, Teleostei, Sauria and Mammalia). The polyQ regions can be distinctly grouped in three categories based on their evolutionary stability: stable, unstable by length variation (inserted), and unstable by mutations (mutated). PolyQ regions in these categories can be significantly distinguished by their glutamine codon usage, and we show that the CAG-slippage mechanism is predominant in inserted polyQ of Sauria and Mammalia. The polyQ amino acid context is also influenced by the polyQ stability, with a higher proportion of proline residues around inserted polyQ. By studying the secondary structure of the sequences surrounding polyQ regions, we found that regarding the structural conformation around a polyQ, its stability category is more relevant than its taxonomic information. The protein-protein interaction capacity of a polyQ is also affected by its stability, as stable polyQ have more interactors than unstable polyQ. Conclusions Our results show that apart from the sequence of a polyQ, information about its orthologous sequences is needed to assess its function. Codon usage, amino acid context, structural conformation and the protein-protein interaction capacity of polyQ from all studied taxa critically depend on the region stability. 
There are however some taxa-specific polyQ features that override this importance. We conclude that a taxa-driven evolutionary analysis is of the highest importance for the comprehensive study of any feature of polyglutamine regions.
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany.
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
12
Mier P, Elena-Real C, Urbanek A, Bernadó P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J 2020; 18:306-313. [PMID: 32071707 PMCID: PMC7016039 DOI: 10.1016/j.csbj.2020.01.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/23/2019] [Revised: 12/13/2019] [Accepted: 01/30/2020] [Indexed: 12/18/2022]
Abstract
Polyglutamine (polyQ) regions are one of the most prevalent homorepeats in eukaryotes. It is however difficult to evaluate their prevalence because various studies claim different results. The reason is the lack of a consensus to define what is indeed a polyQ region. We have tackled this issue by studying how the use of different thresholds (i.e., minimum number of glutamines required in a protein region of a given size), to detect polyQ regions in the human proteome influences not only their prevalence but also their general features and sequence context. Threshold definition shapes the length distribution of the polyQ dataset, and changes the observed number and position of impurities (amino acids other than glutamine) within polyQ regions. Irrespective of the chosen threshold, leucine and proline residues are enriched both within and around polyQ. While leucine is enriched at the N-terminus of polyQ and specially at position -1 (amino acid preceding the polyQ), proline is prevalent in the C-terminus (positions +1 to +5, that is, the first five amino acids after the polyQ). We also checked the suitability of these thresholds for other species, and compared their polyQ features with those found in humans. As the sequence context and features of polyQ regions are threshold-dependent, we propose a method to quickly scan the polyQ landscape of a proteome. We complement our results with a summarized overview about which biases are to be expected per threshold when studying polyQ regions.
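The threshold idea described in this abstract (a minimum number of glutamines within a window of a given size) can be sketched in Python as follows. The function name `find_polyq`, the default values `min_q=8` and `window=10`, the merging of overlapping windows, and the trimming of non-glutamine ends are all assumptions chosen for illustration; they are not taken from the paper's implementation.

```python
def find_polyq(seq: str, min_q: int = 8, window: int = 10) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of candidate polyQ regions.

    Hypothetical helper: a window qualifies when it holds at least
    `min_q` glutamines; overlapping or adjacent qualifying windows are
    merged, and each merged span is trimmed to start and end on a Q.
    """
    if not 1 <= min_q <= window:
        raise ValueError("require 1 <= min_q <= window")

    spans: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        if seq[i:i + window].count("Q") >= min_q:
            if spans and i <= spans[-1][1]:
                # Extend the previous span instead of opening a new one.
                spans[-1] = (spans[-1][0], i + window)
            else:
                spans.append((i, i + window))

    trimmed = []
    for start, end in spans:
        # Every span contains at least min_q >= 1 glutamines, so both loops stop.
        while seq[start] != "Q":
            start += 1
        while seq[end - 1] != "Q":
            end -= 1
        trimmed.append((start, end))
    return trimmed


protein = "MA" + "Q" * 10 + "LP"
print(find_polyq(protein))             # permissive threshold
print(find_polyq(protein, min_q=10))   # pure run of ten required
```

Changing `min_q` or `window` changes which regions are reported and whether impurities are tolerated inside them, which is the threshold dependence the abstract describes.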
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Carlos Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| |
|
13
|
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 153] [Impact Index Per Article: 30.6] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Patryk Jarnot
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institut de Biologie Computationnelle, 34095 Montpellier, France
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
| | - Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| |
|
14
|
Sorek M, Cohen LRZ, Meshorer E. Open chromatin structure in PolyQ disease-related genes: a potential mechanism for CAG repeat expansion in the normal human population. NAR Genom Bioinform 2019; 1:e3. [PMID: 33575550 PMCID: PMC7671342 DOI: 10.1093/nargab/lqz003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/17/2019] [Revised: 07/07/2019] [Accepted: 07/16/2019] [Indexed: 02/05/2023] Open
Abstract
The human genome contains dozens of genes that encode proteins containing long poly-glutamine repeats (polyQ, usually encoded by CAG codons) of 10Qs or more. However, only nine of these genes have been reported to expand beyond the healthy variation and cause diseases. To address whether these nine disease-associated genes are unique in any way, we compared genetic and epigenetic features relative to other types of genes, especially repeat-containing genes that do not cause diseases. Our analyses show that in pluripotent cells, the nine polyQ disease-related genes are characterized by an open chromatin profile, enriched for active chromatin marks and depleted for suppressive chromatin marks. By contrast, genes that encode polyQ-containing proteins that are not associated with diseases, and other repeat-containing genes, possess a suppressive chromatin environment. We propose that the active epigenetic landscape supports decreased genomic stability and higher susceptibility to expansion mutations.
Affiliation(s)
- Matan Sorek
- Edmond and Lily Safra Center for Brain Sciences, Edmond J. Safra Campus, Jerusalem, Hebrew University of Jerusalem, 9190401, Israel.,Department of Genetics, The Alexander Silberman Institute of Life Sciences, Edmond J. Safra Campus, Jerusalem, Hebrew University of Jerusalem, 9190401, Israel
| | - Lea R Z Cohen
- Edmond and Lily Safra Center for Brain Sciences, Edmond J. Safra Campus, Jerusalem, Hebrew University of Jerusalem, 9190401, Israel.,Department of Genetics, The Alexander Silberman Institute of Life Sciences, Edmond J. Safra Campus, Jerusalem, Hebrew University of Jerusalem, 9190401, Israel
| | - Eran Meshorer
- Edmond and Lily Safra Center for Brain Sciences, Edmond J. Safra Campus, Jerusalem, Hebrew University of Jerusalem, 9190401, Israel.,Department of Genetics, The Alexander Silberman Institute of Life Sciences, Edmond J. Safra Campus, Jerusalem, Hebrew University of Jerusalem, 9190401, Israel
| |
|
15
|
Galzitskaya OV, Novikov GS, Dovidchenko NV, Lobanov MY. Is there codon usage bias for poly-Q stretches in the human proteome? J Bioinform Comput Biol 2019; 17:1950010. [PMID: 30866735 DOI: 10.1142/s0219720019500100] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
We have analyzed codon usage for poly-Q stretches of different lengths for the human proteome. First, we have obtained that all long poly-Q stretches in Protein Data Bank (PDB) belong to the disordered regions. Second, we have found the bias for codon usage for glutamine homo-repeats in the human proteome. In the cases when the same codon is used for poly-Q stretches only CAG triplets are found. Similar results are obtained for human proteins with glutamine homo-repeats associated with diseases. Moreover, for proteins associated with diseases (from the HraDis database), the fraction of proteins for which the same codon is used for glutamine homo-repeats is less (22%) than for proteins from the human proteome (26%). We have demonstrated for poly-Q stretches in the human proteome that in some cases (28) the splicing sites correspond to the homo-repeats and in 11 cases, these sites appear at the C-terminal part of the homo-repeats with statistical significance 10⁻⁸.
Affiliation(s)
- Oxana V Galzitskaya
- * Institute of Protein Research, Russian Academy of Sciences, Institutskaya Str., 4, Pushchino, Moscow Region 142290, Russia
| | - Georgii S Novikov
- † St. Petersburg Academic University, Nanotechnology Research and Education Centre of the Russian Academy of Sciences, St. Petersburg, Khlopina Str., 8/3, 194021, Russia
| | - Nikita V Dovidchenko
- * Institute of Protein Research, Russian Academy of Sciences, Institutskaya Str., 4, Pushchino, Moscow Region 142290, Russia
| | - Mikhail Yu Lobanov
- * Institute of Protein Research, Russian Academy of Sciences, Institutskaya Str., 4, Pushchino, Moscow Region 142290, Russia
| |
|