1
Persi E, Wolf YI, Karamycheva S, Makarova KS, Koonin EV. Compensatory relationship between low-complexity regions and gene paralogy in the evolution of prokaryotes. Proc Natl Acad Sci U S A 2023; 120:e2300154120. [PMID: 37036997 PMCID: PMC10120016 DOI: 10.1073/pnas.2300154120] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/04/2023] [Accepted: 03/17/2023] [Indexed: 04/12/2023]
Abstract
The evolution of genomes in all life forms involves two distinct, dynamic types of genomic changes: gene duplication (and loss) that shape families of paralogous genes and extension (and contraction) of low-complexity regions (LCR), which occurs through dynamics of short repeats in protein-coding genes. Although the roles of each of these types of events in genome evolution have been studied, their co-evolutionary dynamics is not thoroughly understood. Here, by analyzing a wide range of genomes from diverse bacteria and archaea, we show that LCR and paralogy represent two distinct routes of evolution that are inversely correlated. The emergence of LCR is a prominent evolutionary mechanism in fast evolving, young protein families, whereas paralogy dominates the comparatively slow evolution of old protein families. The analysis of multiple prokaryotic genomes shows that the formation of LCR is likely a widespread, transient evolutionary mechanism that temporally and locally affects also ancestral functions, but apparently, fades away with time, under mutational and selective pressures, yielding to gene paralogy. We propose that compensatory relationships between short-term and longer-term evolutionary mechanisms are universal in the evolution of life.
Affiliation(s)
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
- Yuri I. Wolf
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
- Svetlana Karamycheva
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
- Kira S. Makarova
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
2
Teekas L, Sharma S, Vijay N. Lineage-specific protein repeat expansions and contractions reveal malleable regions of immune genes. Genes Immun 2022; 23:218-234. [PMID: 36203090 DOI: 10.1038/s41435-022-00186-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/21/2022] [Accepted: 09/22/2022] [Indexed: 01/07/2023]
Abstract
Functional diversification, a higher evolutionary rate, and intense positive selection help a limited number of immune genes interact with many pathogens. Repeats in protein-coding regions are a well-known source of functional diversification, adaptive variation, and evolutionary novelty in a short time. Repeats play a crucial role in biochemical functions like functional diversification of transcription regulation, protein kinases, cell adhesion, signaling pathways, morphogenesis, DNA repair, recombination, and RNA processing. Repeat length variation can change the associated protein's interaction, efficacy, and overall protein network. Repeats have an intrinsic unstable nature and can potentially evolve rapidly and expedite the acquisition of complex phenotypic traits and functions. Because of their ability to generate rapid, adaptive variations over short evolutionary distances, repeats are considered "tuning knobs." Repeat length variation in specific genes, like RUNX2 and ALX4, is associated with morphological and physiological changes across vertebrates. Here we study repeat length variation as a potent source of species-specific immune diversification across several clades of tetrapods. Moreover, we provide a clade-wise comprehensive list of immune genes with repeat types for future studies of morphological/evolutionary changes within species groups. We observe significant repeat length variation of FASLG and C1QC in Rodentia and Primates' contrasting species groups, respectively.
Affiliation(s)
- Lokdeep Teekas
  - Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
  - Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
  - Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
3
Karamycheva S, Wolf YI, Persi E, Koonin EV, Makarova KS. Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions. Biol Direct 2022; 17:22. [PMID: 36042479 PMCID: PMC9425974 DOI: 10.1186/s13062-022-00337-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 08/13/2022] [Indexed: 12/24/2022]
Abstract
Background: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy to estimate protein family level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs).
Results: We report genome-wide estimates of variability for 8 diverse groups of bacteria and archaea and investigate the connection between variability and various genomic and biological features. The variability estimates are based on homogeneity distributions across amino acid sequence alignments and can be obtained for multiple groups of genomes at minimal computational expense. About half of the variance in variability values can be explained by the analyzed features, with the greatest contribution coming from the extent of gene paralogy in the given csCOG. The correlation between variability and paralogy appears to originate, primarily, not from gene duplication, but from acquisition of distant paralogs and xenologs, introducing sequence variants that are more divergent than those that could have evolved in situ during the lifetime of the given group of organisms. Both high-variability and low-variability csCOGs were identified in all functional categories, but as expected, proteins encoded by integrated mobile elements as well as proteins involved in defense functions and cell motility are, on average, more variable than proteins with housekeeping functions. Additionally, using linear discriminant analysis, we found that variability and fraction of genomes carrying a given gene are the two variables that provide the best prediction of gene essentiality as compared to the results of transposon mutagenesis in Sulfolobus islandicus.
Conclusions: Variability, a measure of sequence diversity within an alignment relative to the overall diversity within a group of organisms, offers a convenient proxy for evolutionary rate estimates and is informative with respect to prediction of functional properties of proteins. In particular, variability is a strong predictor of gene essentiality for the respective organisms and indicative of sub- or neofunctionalization of paralogs.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13062-022-00337-7.
Affiliation(s)
- Svetlana Karamycheva
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
4
Persi E, Wolf YI, Horn D, Ruppin E, Demichelis F, Gatenby RA, Gillies RJ, Koonin EV. Mutation-selection balance and compensatory mechanisms in tumour evolution. Nat Rev Genet 2020; 22:251-262. [PMID: 33257848 DOI: 10.1038/s41576-020-00299-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Accepted: 10/16/2020] [Indexed: 12/11/2022]
Abstract
Intratumour heterogeneity and phenotypic plasticity, sustained by a range of somatic aberrations, as well as epigenetic and metabolic adaptations, are the principal mechanisms that enable cancers to resist treatment and survive under environmental stress. A comprehensive picture of the interplay between different somatic aberrations, from point mutations to whole-genome duplications, in tumour initiation and progression is lacking. We posit that different genomic aberrations generally exhibit a temporal order, shaped by a balance between the levels of mutations and selective pressures. Repeat instability emerges first, followed by larger aberrations, with compensatory effects leading to robust tumour fitness maintained throughout the tumour progression. A better understanding of the interplay between genetic aberrations, the microenvironment, and epigenetic and metabolic cellular states is essential for early detection and prevention of cancer as well as development of efficient therapeutic strategies.
Affiliation(s)
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- David Horn
  - School of Physics and Astronomy, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Eytan Ruppin
  - Cancer Data Science Lab, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Francesca Demichelis
  - Department for Cellular, Computational and Integrative Biology, University of Trento, Trento, Italy; Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital, Weill Cornell Medicine, New York, NY, USA
- Robert A Gatenby
  - Integrated Mathematical Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Robert J Gillies
  - Department of Cancer Physiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
5
Proteomic and genomic signatures of repeat instability in cancer and adjacent normal tissues. Proc Natl Acad Sci U S A 2019; 116:16987-16996. [PMID: 31387980 DOI: 10.1073/pnas.1908790116] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
Repetitive sequences are hotspots of evolution at multiple levels. However, due to difficulties involved in their assembly and analysis, the role of repeats in tumor evolution is poorly understood. We developed a rigorous motif-based methodology to quantify variations in the repeat content, beyond microsatellites, in proteomes and genomes directly from proteomic and genomic raw data. This method was applied to a wide range of tumors and normal tissues. We identify high similarity between repeat instability patterns in tumors and their patient-matched adjacent normal tissues. Nonetheless, tumor-specific signatures both in protein expression and in the genome strongly correlate with cancer progression and robustly predict the tumorigenic state. In a patient, the hierarchy of genomic repeat instability signatures accurately reconstructs tumor evolution, with primary tumors differentiated from metastases. We observe an inverse relationship between repeat instability and point mutation load within and across patients independent of other somatic aberrations. Thus, repeat instability is a distinct, transient, and compensatory adaptive mechanism in tumor evolution and a potential signal for early detection.
6
Persi E, Wolf YI, Koonin EV. Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins. Nat Commun 2016; 7:13570. [PMID: 27857066 PMCID: PMC5120217 DOI: 10.1038/ncomms13570] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 06/30/2016] [Accepted: 10/17/2016] [Indexed: 01/21/2023]
Abstract
Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.
Affiliation(s)
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
7
Martins F, Gonçalves R, Oliveira J, Cruz-Monteagudo M, Nieto-Villar JM, Paz-y-Miño C, Rebelo I, Tejera E. Unravelling the relationship between protein sequence and low-complexity regions entropies: Interactome implications. J Theor Biol 2015; 382:320-7. [PMID: 26164061 DOI: 10.1016/j.jtbi.2015.06.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2015] [Revised: 06/12/2015] [Accepted: 06/28/2015] [Indexed: 10/23/2022]
Abstract
Low-complexity regions are sub-sequences of biased composition in a protein sequence. The influence of these regions on protein evolution, specific functions and highly interactive capacities is well known. Although protein sequence entropy has been extensively studied, its relationship with low-complexity regions and the subsequent effects on protein function remain unclear. In this work we propose a theoretical and empirical model integrating the sequence entropy with local complexity parameters. Our results indicate that the protein sequence entropy is related to the protein length, the entropies inside and outside the low-complexity regions, as well as their number and average size. We found a small but significant increment in the sequence entropy of hub proteins. In agreement with our theoretical model, this increment is highly dependent on the balance between the increment of protein length and the average size of the low-complexity regions. Finally, our models and protein analysis provide evidence that modifications in the average size are more relevant in hub proteins than changes in the number of low-complexity regions.
Affiliation(s)
- F Martins
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- R Gonçalves
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- J Oliveira
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- M Cruz-Monteagudo
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
- J M Nieto-Villar
  - Dpto. de Química-Física, Fac. de Química, Universidad de La Habana, Cuba; Cátedra de Sistemas Complejos "H. Poincaré", Universidad de La Habana, Cuba
- C Paz-y-Miño
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
- I Rebelo
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal; UCIBIO@REQUIMTE, Portugal
- E Tejera
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
8
Press MO, Carlson KD, Queitsch C. The overdue promise of short tandem repeat variation for heritability. Trends Genet 2014; 30:504-12. [PMID: 25182195 DOI: 10.1016/j.tig.2014.07.008] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Received: 06/12/2014] [Revised: 07/23/2014] [Accepted: 07/24/2014] [Indexed: 12/11/2022]
Abstract
Short tandem repeat (STR) variation has been proposed as a major explanatory factor in the heritability of complex traits in humans and model organisms. However, we still struggle to incorporate STR variation into genotype-phenotype maps. We review here the promise of STRs in contributing to complex trait heritability and highlight the challenges that STRs pose due to their repetitive nature. We argue that STR variants are more likely than single-nucleotide variants to have epistatic interactions, reiterate the need for targeted assays to genotype STRs accurately, and call for more appropriate statistical methods in detecting STR-phenotype associations. Lastly, we suggest that somatic STR variation within individuals may serve as a read-out of disease susceptibility, and is thus potentially a valuable covariate for future association studies.
Affiliation(s)
- Maximilian O Press
  - Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Keisha D Carlson
  - Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Christine Queitsch
  - Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA