1
Jackson EK, Bellott DW, Skaletsky H, Page DC. GC-biased gene conversion in X-chromosome palindromes conserved in human, chimpanzee, and rhesus macaque. G3 GENES|GENOMES|GENETICS 2021; 11:6317831. [PMID: 34849781 PMCID: PMC8981503 DOI: 10.1093/g3journal/jkab224] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/21/2021] [Accepted: 06/28/2021] [Indexed: 12/03/2022]
Abstract
Gene conversion is GC-biased across a wide range of taxa. Large palindromes on mammalian
sex chromosomes undergo frequent gene conversion that maintains arm-to-arm sequence
identity greater than 99%, which may increase their susceptibility to the effects of
GC-biased gene conversion. Here, we demonstrate a striking history of GC-biased gene
conversion in 12 palindromes conserved on the X chromosomes of human, chimpanzee, and
rhesus macaque. Primate X-chromosome palindrome arms have significantly higher GC content
than flanking single-copy sequences. Nucleotide replacements that occurred in human and
chimpanzee palindrome arms over the past 7 million years are one-and-a-half times as
GC-rich as the ancestral bases they replaced. Using simulations, we show that our observed
pattern of nucleotide replacements is consistent with GC-biased gene conversion with a
magnitude of 70%, similar to previously reported values based on analyses of human
meioses. However, GC-biased gene conversion since the divergence of human and rhesus
macaque explains only a fraction of the observed difference in GC content between
palindrome arms and flanking sequence, suggesting that palindromes are older than 29
million years and/or had elevated GC content at the time of their formation. This work
supports a greater than 2:1 preference for GC bases over AT bases during gene conversion
and demonstrates that the evolution and composition of mammalian sex chromosome
palindromes is strongly influenced by GC-biased gene conversion.
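The step from a conversion bias of 70% to a "greater than 2:1 preference" is plain arithmetic. The sketch below assumes that "70%" means the fraction of conversion events resolved in favor of the GC allele; the helper name `bias_to_odds` is hypothetical and does not come from the paper.

```python
def bias_to_odds(gc_fraction: float) -> float:
    """Convert the fraction of conversion events favoring GC into GC:AT odds.

    Assumption (not stated in the abstract): gc_fraction is the share of
    gene conversion events at AT/GC mismatches that resolve toward GC.
    """
    if not 0.0 < gc_fraction < 1.0:
        raise ValueError("gc_fraction must lie strictly between 0 and 1")
    return gc_fraction / (1.0 - gc_fraction)


odds = bias_to_odds(0.70)
print(f"GC:AT odds = {odds:.2f}:1")  # 0.70 / 0.30, i.e. a little over 2:1
```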
Affiliation(s)
- Emily K Jackson
  - Whitehead Institute, Cambridge, MA 02142, USA
  - Howard Hughes Medical Institute, Whitehead Institute, Cambridge, MA 02142, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Helen Skaletsky
  - Whitehead Institute, Cambridge, MA 02142, USA
  - Howard Hughes Medical Institute, Whitehead Institute, Cambridge, MA 02142, USA
- David C Page
  - Whitehead Institute, Cambridge, MA 02142, USA
  - Howard Hughes Medical Institute, Whitehead Institute, Cambridge, MA 02142, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2
Lindsay SJ, Rahbari R, Kaplanis J, Keane T, Hurles ME. Similarities and differences in patterns of germline mutation between mice and humans. Nat Commun 2019; 10:4053. [PMID: 31492841 PMCID: PMC6731245 DOI: 10.1038/s41467-019-12023-w] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 08/28/2018] [Accepted: 08/02/2019] [Indexed: 01/26/2023]
Abstract
Whole genome sequencing (WGS) studies have estimated the human germline mutation rate per basepair per generation (~1.2 × 10⁻⁸) to be higher than in mice (3.5–5.4 × 10⁻⁹). In humans, most germline mutations are paternal in origin and numbers of mutations per offspring increase with paternal and maternal age. Here we estimate germline mutation rates and spectra in six multi-sibling mouse pedigrees and compare to three multi-sibling human pedigrees. In both species we observe a paternal mutation bias, a parental age effect, and a highly mutagenic first cell division contributing to the embryo. We also observe differences between species in mutation spectra, in mutation rates per cell division, and in the parental bias of mutations in early embryogenesis. These differences between species likely result from both species-specific differences in cellular genealogies of the germline, as well as biological differences within the same stage of embryogenesis or gametogenesis. Estimates of mutation rates differ between species. Here, Lindsay et al. perform side-by-side analyses of germline mutation rates using multi-sibling mouse and human pedigrees and find different mutation rates between species, also stratified by sex and temporal stage of mutation acquisition.
Affiliation(s)
- Thomas Keane
  - Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
3
Mugal CF, Weber CC, Ellegren H. GC-biased gene conversion links the recombination landscape and demography to genomic base composition. Bioessays 2015; 37:1317-26. [DOI: 10.1002/bies.201500058] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 12/22/2022]
Affiliation(s)
- Carina F. Mugal
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Claudia C. Weber
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
  - Department of Biology; Center for Computational Genetics and Genomics; Temple University; Philadelphia PA USA
- Hans Ellegren
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
4
Begum T, Ghosh TC. Elucidating the genotype-phenotype relationships and network perturbations of human shared and specific disease genes from an evolutionary perspective. Genome Biol Evol 2014; 6:2741-53. [PMID: 25287147 PMCID: PMC4224346 DOI: 10.1093/gbe/evu220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
To date, numerous studies have been attempted to determine the extent of variation in evolutionary rates between human disease and nondisease (ND) genes. In our present study, we have considered human autosomal monogenic (Mendelian) disease genes, which were classified into two groups according to the number of phenotypic defects, that is, specific disease (SPD) gene (one gene: one defect) and shared disease (SHD) gene (one gene: multiple defects). Here, we have compared the evolutionary rates of these two groups of genes, that is, SPD genes and SHD genes with respect to ND genes. We observed that the average evolutionary rates are slow in SHD group, intermediate in SPD group, and fast in ND group. Group-to-group evolutionary rate differences remain statistically significant regardless of their gene expression levels and number of defects. We demonstrated that disease genes are under strong selective constraint if they emerge through edgetic perturbation or drug-induced perturbation of the interactome network, show tissue-restricted expression, and are involved in transmembrane transport. Among all the factors, our regression analyses interestingly suggest the independent effects of 1) drug-induced perturbation and 2) the interaction term of expression breadth and transmembrane transport on protein evolutionary rates. We reasoned that the drug-induced network disruption is a combination of several edgetic perturbations and, thus, has more severe effect on gene phenotypes.
Affiliation(s)
- Tina Begum
  - Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
5
Provata A, Nicolis C, Nicolis G. Complexity measures for the evolutionary categorization of organisms. Comput Biol Chem 2014; 53 Pt A:5-14. [PMID: 25216557 DOI: 10.1016/j.compbiolchem.2014.08.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Accepted: 07/11/2014] [Indexed: 01/17/2023]
Abstract
Complexity measures are used to compare the genomic characteristics of five organisms belonging to distinct classes spanning the evolutionary tree: higher eukaryotes, amoebae, unicellular eukaryotes and bacteria. The comparisons are undertaken using the full four-letter alphabet and the coarse grained two-letter alphabets AG-CT and AT-CG. We show that the conditional probability matrix for the four-letter and AT-CG alphabet is markedly asymmetric in eukaryotes while it is nearly symmetric in bacterial genomes. Spatial asymmetry is revealed in the four-letter alphabet, signifying that the probability fluxes are nonvanishing and thus the reading sense of a sequence is irreversible for all organisms. Calculations of the block entropy and excess entropy demonstrate that the human genome accommodates better all possible block configurations, especially for long blocks. With respect to point-to-point details and to spatial arrangement of blocks the exit distance distributions from a particular letter demonstrate long distance characteristics in the eukaryotic sequences for all three alphabets, while the bacterial (prokaryotic) genomes deviate indicating short range characteristics. Overall, the conditional probability, the fluxes, the block entropy content and the exit distance distributions can be used as markers, discriminating between eukaryotic and prokaryotic DNA, allowing in many cases to discern details related to finer classes. In all cases the reduction from four letters to two masks some important statistical and spatial properties, with the AT-CG alphabet having higher ability of discrimination than the AG-CT one. In particular, the AT-CG alphabet reduction accentuates the CpG related properties (conditional probabilities w32, long ranged exit distance distribution for A and T nucleotides), but masks sequence asymmetry and irreversibility in all examined organisms.
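The coarse-graining and entropy measures named in this abstract can be illustrated with a small sketch. It assumes overlapping blocks, base-2 logarithms, and first-order (adjacent-letter) conditional probabilities; the paper's exact definitions are not given in the abstract, and all function names here are hypothetical.

```python
from collections import Counter
from math import log2

# Two-letter reductions mentioned in the abstract:
# AG-CT (purine R / pyrimidine Y) and AT-CG (weak W / strong S).
AG_CT = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
AT_CG = {"A": "W", "T": "W", "C": "S", "G": "S"}


def reduce_alphabet(seq: str, mapping: dict) -> str:
    """Rewrite a four-letter DNA sequence in a two-letter alphabet."""
    return "".join(mapping[base] for base in seq.upper())


def block_entropy(seq: str, n: int) -> float:
    """Shannon entropy in bits of the overlapping length-n blocks of seq."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_probabilities(seq: str) -> dict:
    """Estimate P(next letter | current letter) from adjacent pairs."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}


toy = "ACGTTGCAACGT"  # toy sequence, invented for illustration
weak_strong = reduce_alphabet(toy, AT_CG)
print(weak_strong)
print(block_entropy(toy, 2), block_entropy(weak_strong, 2))
print(conditional_probabilities(weak_strong))
```

Comparing the entropy of the four-letter sequence with that of its two-letter reduction gives a feel for how much block-level information the reduction discards.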
Affiliation(s)
- A Provata
  - Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15310 Athens, Greece.
- C Nicolis
  - Institut Royal Météorologique de Belgique, 3 Avenue Circulaire, 1180 Bruxelles, Belgium.
- G Nicolis
  - Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, C.P. 231, 1050 Bruxelles, Belgium.
6
Weber CC, Boussau B, Romiguier J, Jarvis ED, Ellegren H. Evidence for GC-biased gene conversion as a driver of between-lineage differences in avian base composition. Genome Biol 2014; 15:549. [PMID: 25496599 PMCID: PMC4290106 DOI: 10.1186/s13059-014-0549-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 04/08/2014] [Accepted: 11/19/2014] [Indexed: 11/23/2022]
Abstract
BACKGROUND: While effective population size (Ne) and life history traits such as generation time are known to impact substitution rates, their potential effects on base composition evolution are less well understood. GC content increases with decreasing body mass in mammals, consistent with recombination-associated GC biased gene conversion (gBGC) more strongly impacting these lineages. However, shifts in chromosomal architecture and recombination landscapes between species may complicate the interpretation of these results. In birds, interchromosomal rearrangements are rare and the recombination landscape is conserved, suggesting that this group is well suited to assess the impact of life history on base composition.
RESULTS: Employing data from 45 newly and 3 previously sequenced avian genomes covering a broad range of taxa, we found that lineages with large populations and short generations exhibit higher GC content. The effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage. Consistent with recombination driving base composition, GC content and heterogeneity were positively correlated with the rate of recombination. Moreover, we observed ongoing increases in GC in the majority of lineages.
CONCLUSIONS: Our results provide evidence that gBGC may drive patterns of nucleotide composition in avian genomes and are consistent with more effective gBGC in large populations and a greater number of meioses per unit time; that is, a shorter generation time. Thus, in accord with theoretical predictions, base composition evolution is substantially modulated by species life history.
Affiliation(s)
- Claudia C Weber
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
- Bastien Boussau
  - Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558 Villeurbanne, France
- Erich D Jarvis
  - Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC USA
- Hans Ellegren
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
7
Comparative analysis of context-dependent mutagenesis using human and mouse models. BIOMED RESEARCH INTERNATIONAL 2013; 2013:989410. [PMID: 24058920 PMCID: PMC3766559 DOI: 10.1155/2013/989410] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/18/2013] [Accepted: 07/19/2013] [Indexed: 11/17/2022]
Abstract
Substitution rates strongly depend on their nucleotide context. One of the most studied examples is the excess of C > T mutations in the CG context in various groups of organisms, including vertebrates. Studies on the molecular mechanisms underlying this mutation regularity have provided insights into evolution, mutagenesis, and cancer development. Recently several other hypermutable motifs were identified in the human genome. There is an increased frequency of T > C mutations in the second position of the words ATTG and ATAG and an increased frequency of A > C mutations in the first position of the word ACAA. For a better understanding of evolution, it is of interest whether these mutation regularities are human specific or present in other vertebrates, as their presence might affect the validity of currently used substitution models and molecular clocks. A comprehensive analysis of mutagenesis in 4 bp mutation contexts requires a vast amount of mutation data. Such data may be derived from the comparisons of individual genomes or from single nucleotide polymorphism (SNP) databases. Using this approach, we performed a systematical comparison of mutation regularities within 2-4 bp contexts in Mus musculus and Homo sapiens and uncovered that even closely related organisms may have notable differences in context-dependent mutation regularities.
8
Clément Y, Arndt PF. Meiotic recombination strongly influences GC-content evolution in short regions in the mouse genome. Mol Biol Evol 2013; 30:2612-8. [PMID: 24030552 DOI: 10.1093/molbev/mst154] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 01/15/2023]
Abstract
Meiotic recombination is known to influence GC-content evolution in large regions of mammalian genomes by favoring the fixation of G and C alleles and increasing the rate of A/T to G/C substitutions. This process is known as GC-biased gene conversion (gBGC). Until recently, genome-wide measures of fine-scale recombination activity were unavailable in mice. Additionally, comparative studies focusing on mouse were limited as the closest organism with its genome fully sequenced was rat. Here, we make use of the recent mapping of double strand breaks (DSBs), the first step of meiotic recombination, in the mouse genome and of the sequencing of mouse closely related subspecies to analyze the fine-scale evolutionary signature of meiotic recombination on GC-content evolution in recombination hotspots, short regions that undergo extreme rates of recombination. We measure substitution rates around DSB hotspots and observe that gBGC is affecting a very short region (≈ 1 kbp) in length around these hotspots. Furthermore, we can infer that the locations of hotspots evolved rapidly during mouse evolution.
Affiliation(s)
- Yves Clément
  - Montpellier SupAgro, Unité Mixte de Recherche 1334, Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, Montpellier, France
9
Lartillot N. Interaction between selection and biased gene conversion in mammalian protein-coding sequence evolution revealed by a phylogenetic covariance analysis. Mol Biol Evol 2012; 30:356-68. [PMID: 23024185 DOI: 10.1093/molbev/mss231] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 12/21/2022]
Abstract
According to the nearly-neutral model, variation in long-term effective population size among species should result in correlated variation in the ratio of nonsynonymous over synonymous substitution rates (dN/dS). Previous empirical investigations in mammals have been consistent with this prediction, suggesting an important role for nearly-neutral effects on protein-coding sequence evolution. GC-biased gene conversion (gBGC), on the other hand, is increasingly recognized as a major evolutionary force shaping genome nucleotide composition. When sufficiently strong compared with random drift, gBGC may significantly interfere with a nearly-neutral regime and impact dN/dS in a complex manner. Here, we investigate the phylogenetic correlations between dN/dS, the equilibrium GC composition (GC*), and several life-history and karyotypic traits in placental mammals. We show that the equilibrium GC composition decreases with body mass and increases with the number of chromosomes, suggesting a modulation of the strength of biased gene conversion due to changes in effective population size and genome-wide recombination rate. The variation in dN/dS is complex and only partially fits the prediction of the nearly-neutral theory. However, specifically restricting estimation of the dN/dS ratio on GC-conservative transversions, which are immune from gBGC, results in correlations that are more compatible with a nearly-neutral interpretation. Our investigation indicates the presence of complex interactions between selection and biased gene conversion and suggests that further mechanistic development is warranted, to tease out mutation, selection, drift, and conversion.
Affiliation(s)
- Nicolas Lartillot
  - Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
10
Cooper DN, Bacolla A, Férec C, Vasquez KM, Kehrer-Sawatzki H, Chen JM. On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Hum Mutat 2011; 32:1075-99. [PMID: 21853507 PMCID: PMC3177966 DOI: 10.1002/humu.21557] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.2] [Received: 03/27/2011] [Accepted: 06/17/2011] [Indexed: 12/21/2022]
Abstract
Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher order features of the genomic architecture. The human genome is now recognized to contain "pervasive architectural flaws" in that certain DNA sequences are inherently mutation prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here, we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of noncanonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair and may serve to increase mutation frequencies in generalized fashion (i.e., both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease.
Affiliation(s)
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom.
11
Late replicating domains are highly recombining in females but have low male recombination rates: implications for isochore evolution. PLoS One 2011; 6:e24480. [PMID: 21949720 PMCID: PMC3176772 DOI: 10.1371/journal.pone.0024480] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2011] [Accepted: 08/11/2011] [Indexed: 01/01/2023]
Abstract
In mammals, sequences that are either late replicating or highly recombining have high rates of evolution at putatively neutral sites. As early replicating domains and highly recombining domains both tend to be GC rich, we a priori expect these two variables to covary. If so, the relative contribution of either of these variables to the local neutral substitution rate might have been wrongly estimated owing to covariance with the other. Against our expectations, we find that sex-averaged recombination rates show little or no correlation with replication timing, suggesting that they are independent determinants of substitution rates. However, this result masks significant sex-specific complexity: late replicating domains tend to have high recombination rates in females but low recombination rates in males. That these trends are antagonistic explains why sex-averaged recombination is not correlated with replication timing. This unexpected result has several important implications. First, although both male and female recombination rates covary significantly with intronic substitution rates, the magnitude of this correlation is moderately underestimated for male recombination and slightly overestimated for female recombination, owing to covariance with replication timing. Second, the result could explain why male recombination is strongly correlated with GC content but female recombination is not. If to explain the correlation between GC content and replication timing we suppose that late replication forces reduced GC content, then GC promotion by biased gene conversion during female recombination is partly countered by the antagonistic effect of later replicating sequence tending to increase AT content. Indeed, the strength of the correlation between female recombination rate and local GC content is more than doubled by control for replication timing.
Our results underpin the need to consider sex-specific recombination rates and potential covariates in analysis of GC content and rates of evolution.
12
Capra JA, Pollard KS. Substitution patterns are GC-biased in divergent sequences across the metazoans. Genome Biol Evol 2011; 3:516-27. [PMID: 21670083 PMCID: PMC3138425 DOI: 10.1093/gbe/evr051] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/19/2022]
Abstract
The fastest-evolving regions in the human and chimpanzee genomes show a remarkable excess of weak (A,T) to strong (G,C) nucleotide substitutions since divergence from their common ancestor. We investigated the phylogenetic extent and possible causes of this weak to strong (W→S) bias in divergent sequences (BDS) using recently sequenced genomes and recombination maps from eight trios of eukaryotic species. To quantify evidence for BDS, we inferred substitution histories using an efficient maximum likelihood approach with a context-dependent evolutionary model. We then annotated all lineage-specific substitutions in terms of W→S bias and density on the chromosomes. Finally, we used the inferred substitutions to calculate a BDS score—a log odds ratio between substitution type and density—and assessed its statistical significance with Fisher's exact test. Applying this approach, we found significant BDS in the coding and noncoding sequence of human, mouse, dog, stickleback, fruit fly, and worm. We also observed a significant lack of W→S BDS in chicken and yeast. The BDS score varies between species and across the chromosomes within each species. It is most strongly correlated with different genomic features in different species, but a strong correlation with recombination rates is found in several species. Our results demonstrate that a W→S substitution bias in fast-evolving sequences is a widespread phenomenon. The patterns of BDS observed suggest that a recombination-associated process, such as GC-biased gene conversion, is involved in the production of the bias in many species, but the strength of the BDS likely depends on many factors, including genome stability, variability in recombination rate over time and across the genome, the frequency of meiosis, and the amount of outcrossing in each species.
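A score of the kind described here, a log odds ratio with a Fisher's exact test, can be sketched for a single 2×2 table. The layout assumed below (rows: W→S versus other substitutions; columns: substitution-dense versus sparse regions) is an illustrative guess at the structure, not the authors' published definition, and the function names are hypothetical.

```python
from math import comb, log

WEAK = {"A", "T"}
STRONG = {"G", "C"}


def is_weak_to_strong(ancestral: str, derived: str) -> bool:
    """True for a W->S substitution (A or T replaced by G or C)."""
    return ancestral.upper() in WEAK and derived.upper() in STRONG


def log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Natural-log odds ratio of the table [[a, b], [c, d]]."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    return log((a * d) / (b * c))


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value: P(top-left cell >= a) with margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    # Hypergeometric tail: sum the probabilities of tables at least as extreme.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, k_max + 1))
    return tail / comb(n, col1)


# Toy counts, invented purely for illustration (not data from the paper):
# a = W->S in dense regions, b = W->S in sparse regions,
# c = other in dense regions, d = other in sparse regions.
a, b, c, d = 3, 1, 1, 3
print(round(log_odds_ratio(a, b, c, d), 3))        # ln(9)
print(round(fisher_exact_greater(a, b, c, d), 3))  # 17/70
```

A positive log odds ratio under this layout would indicate that W→S substitutions are over-represented where substitutions cluster; the plain odds ratio fails on empty cells, so a real analysis would need a rule for handling them.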
Affiliation(s)
- John A. Capra
  - Gladstone Institutes, University of California, San Francisco
- Katherine S. Pollard
  - Gladstone Institutes, University of California, San Francisco
  - Division of Biostatistics & Institute for Human Genetics, University of California, San Francisco
  - Corresponding author: E-mail: