1
Li X, Scanlon MJ, Yu J. Evolutionary patterns of DNA base composition and correlation to polymorphisms in DNA repair systems. Nucleic Acids Res 2015; 43:3614-25. [PMID: 25765652 PMCID: PMC4402523 DOI: 10.1093/nar/gkv197] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/16/2014] [Accepted: 02/24/2015] [Indexed: 11/15/2022] Open
Abstract
DNA base composition is a fundamental genome feature. However, the evolutionary pattern of base composition and its potential causes have not been well understood. Here, we report findings from comparative analysis of base composition at the whole-genome level across 2210 species, the polymorphic-site level across eight population comparison sets, and the mutation-site level in 12 mutation-tracking experiments. We first demonstrate that base composition follows the individual-strand base equality rule at the genome, chromosome and polymorphic-site levels. More intriguingly, clear separation of base-composition values calculated across polymorphic sites was consistently observed between basal and derived groups, suggesting common underlying mechanisms. Individuals in the derived groups show an A&T-increase/G&C-decrease pattern compared with the basal groups. Spontaneous and induced mutation experiments indicated these patterns of base composition change can emerge across mutation sites. With base-composition across polymorphic sites as a genome phenotype, genome scans with human 1000 Genomes and HapMap3 data identified a set of significant genomic regions enriched with Gene Ontology terms for DNA repair. For three DNA repair genes (BRIP1, PMS2P3 and TTDN), ENCODE data provided evidence for interaction between genomic regions containing these genes and regions containing the significant SNPs. Our findings provide insights into the mechanisms of genome evolution.
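The individual-strand base equality rule mentioned above (on a single strand, A is roughly as frequent as T, and G as frequent as C) can be checked with a few lines of Python. This is a minimal sketch, not the authors' pipeline; the function name `strand_base_skews` and the skew definitions are assumptions chosen for illustration.

```python
from collections import Counter


def strand_base_skews(seq: str) -> dict:
    """Summarize base composition of ONE strand (hypothetical helper).

    Skews near 0 are consistent with the individual-strand base
    equality rule (A ~ T and G ~ C on the same strand).
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    # Normalized differences; defined as 0.0 when a base pair is absent.
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {
        "gc_content": (g + c) / total,
        "at_skew": at_skew,
        "gc_skew": gc_skew,
    }
```

Ambiguous symbols such as N are ignored rather than counted, which is one reasonable choice among several.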
Affiliation(s)
- Xianran Li
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Michael J Scanlon
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
2
Lartillot N. Phylogenetic patterns of GC-biased gene conversion in placental mammals and the evolutionary dynamics of recombination landscapes. Mol Biol Evol 2012; 30:489-502. [PMID: 23079417 DOI: 10.1093/molbev/mss239] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Indexed: 12/13/2022] Open
Abstract
GC-biased gene conversion (gBGC) is a major evolutionary force shaping genomic nucleotide landscapes, distorting the estimation of the strength of selection, and having potentially deleterious effects on genome-wide fitness. Yet, a global quantitative picture, at large evolutionary scale, of the relative strength of gBGC compared with selection and random drift is still lacking. Furthermore, owing to its dependence on the local recombination rate, gBGC results in modulations of the substitution patterns along genomes and across time which, if correctly interpreted, may yield quantitative insights into the long-term evolutionary dynamics of recombination landscapes. Deriving a model of the substitution process at putatively neutral nucleotide positions from population-genetics arguments, and accounting for among-lineage and among-gene effects, we propose a reconstruction of the variation in gBGC intensity at the scale of placental mammals, and of its scaling with body-size and karyotypic traits. Our results are compatible with a simple population genetics model relating gBGC to effective population size and recombination rate. In addition, among-gene variation and phylogenetic patterns of exon-specific levels of gBGC reveal the presence of rugged recombination landscapes, and suggest that short-lived recombination hot-spots are a general feature of placentals. Across placental mammals, variation in gBGC strength spans two orders of magnitude, at its lowest in apes, strongest in lagomorphs, microbats or tenrecs, and near or above the nearly neutral threshold in most other lineages. Combined with among-gene variation, such high levels of biased gene conversion are likely to significantly impact mildly selected positions, and to represent a substantial mutation load. Altogether, our analysis suggests a more important role of gBGC in placental genome evolution, compared with what could have been anticipated from studies conducted in anthropoid primates.
Affiliation(s)
- Nicolas Lartillot
- Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
3
Lawrie DS, Petrov DA, Messer PW. Faster than neutral evolution of constrained sequences: the complex interplay of mutational biases and weak selection. Genome Biol Evol 2011; 3:383-95. [PMID: 21498884 PMCID: PMC3101017 DOI: 10.1093/gbe/evr032] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 12/27/2022] Open
Abstract
Comparative genomics has become widely accepted as the major framework for the ascertainment of functionally important regions in genomes. The underlying paradigm of this approach is that most of the functional regions are assumed to be under selective constraint, which in turn reduces the rate of evolution relative to neutrality. This assumption allows detection of functional regions through sequence conservation. However, constraint does not always lead to sequence conservation. When purifying selection is weak and mutation is biased, constrained regions can even evolve faster than neutral sequences and thus can appear to be under positive selection. Moreover, conservation estimates depend also on the orientation of selection relative to mutational biases and can vary over time. In the light of recent data of the ubiquity of mutational biases and weak selective forces, these effects should reduce the power of conservation analyses to define functional regions using comparative genomics data. We argue that the estimation of true mutational biases and the use of explicit evolutionary models are essential to improve methods inferring the action of natural selection and functionality in genome sequences.
4
Relative mutation rates of each nucleotide for another estimated from allele frequency spectra at human gene loci. Genet Res (Camb) 2009; 91:293-303. [PMID: 19640324 DOI: 10.1017/s0016672309990164] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022] Open
Abstract
This study aims to comprehensively examine the mutation rates of one base for another in human gene loci. In contrast to most previous efforts based on divergence data from untranscribed regions, the present study employs the basic theory of the reversible recurrent mutation model using large-scale, high-quality re-sequencing data from public databases of gene loci. Population mutation parameters (4Nnu and 4Nmu) are obtained for each pair of base substitutions. The estimated parameters show good strand reversal symmetry, supporting the existence of mutation-drift equilibrium. Analysis of specific gene regions including mRNA, coding sequence (CDS), 5'-untranslated region (5'-UTR), 3'-UTR and intron shows that there are clear differences in the mutation rates of each base for another depending on the location of the base in question. Results from analyses that take the adjacent bases into account exhibit excellent strand reversal symmetry, confirming that the identity of an adjacent base influences mutation rates. The CpG to TpG (or CpG to CpA) substitution is found at a rate approximately seven-fold higher than the reverse transition in intron regions due to cytosine deamination, but the effect is strongly reduced in mRNA regions and almost entirely lost in 5'-UTRs. However, from the overall increased transitions in sites other than CpGs and the proportion of CpGs in the total sequence, CpG methylation is not the main factor responsible for the increased rate of transitions as compared with transversions. In this report, after adjusting average mutation rates to the sequence compositions, no substitution bias is found between A+T and C+G, indicating base composition equilibrium in human gene loci. Population differences are also identified between groups of people of African and European descent, presumably due to past population histories. By applying the basic theory of population genetics to re-sequenced data, this study contributes new, detailed information regarding mutations in human gene regions.
5
Duret L, Arndt PF. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet 2008; 4:e1000071. [PMID: 18464896 PMCID: PMC2346554 DOI: 10.1371/journal.pgen.1000071] [Citation(s) in RCA: 254] [Impact Index Per Article: 15.9] [Received: 11/05/2007] [Accepted: 04/11/2008] [Indexed: 01/19/2023] Open
Abstract
Unraveling the evolutionary forces responsible for variations of neutral substitution patterns among taxa or along genomes is a major issue for detecting selection within sequences. Mammalian genomes show large-scale regional variations of GC-content (the isochores), but the substitution processes at the origin of this structure are poorly understood. We analyzed the pattern of neutral substitutions in 1 Gb of primate non-coding regions. We show that the GC-content toward which sequences are evolving is strongly negatively correlated to the distance to telomeres and positively correlated to the rate of crossovers (R² = 47%). This demonstrates that recombination has a major impact on substitution patterns in human, driving the evolution of GC-content. The evolution of GC-content correlates much more strongly with male than with female crossover rate, which rules out selectionist models for the evolution of isochores. This effect of recombination is most probably a consequence of the neutral process of biased gene conversion (BGC) occurring within recombination hotspots. We show that the predictions of this model fit very well with the observed substitution patterns in the human genome. This model notably explains the positive correlation between substitution rate and recombination rate. Theoretical calculations indicate that variations in population size or density in recombination hotspots can have a very strong impact on the evolution of base composition. Furthermore, recombination hotspots can create strong substitution hotspots. This molecular drive affects both coding and non-coding regions. We therefore conclude that along with mutation, selection and drift, BGC is one of the major factors driving genome evolution. Our results also shed light on variations in the rate of crossover relative to non-crossover events, along chromosomes and according to sex, and also on the conservation of hotspot density between human and chimp.
Mammalian genomes show a very strong heterogeneity of base composition along chromosomes (the so-called isochores). The functional significance of these peculiar genomic landscapes is highly debated: do isochores confer some selective advantage, or are they simply the by-product of neutral evolutionary processes? To resolve this issue, we analyzed the pattern of substitution in the human genome by comparison with chimpanzee and macaque. We show that the evolution of base composition (GC-content) is essentially determined by the rate of recombination. This effect appears to be much stronger in male than in female germline, which rules out selective explanations for the evolution of isochores. We show that this impact of recombination is most probably a consequence of the process of biased gene conversion (BGC). This neutral process mimics the action of selection and can induce strong substitution hotspots within recombination hotspots, sometimes leading to the fixation of deleterious mutations. BGC appears to be one of the major factors driving genome evolution. It is therefore essential to take this process into account if we want to be able to interpret genome sequences.
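The "GC-content toward which sequences are evolving" can be illustrated with the simplest two-state picture, in which the stationary GC fraction depends only on the ratio of the AT→GC and GC→AT substitution rates. The sketch below is an illustrative assumption, not the estimation procedure of Duret and Arndt; the function names `equilibrium_gc` and `evolve_gc` and the per-step rates are hypothetical.

```python
def equilibrium_gc(rate_at_to_gc: float, rate_gc_to_at: float) -> float:
    """Stationary GC fraction of a two-state (AT vs GC) substitution model."""
    total = rate_at_to_gc + rate_gc_to_at
    if rate_at_to_gc < 0 or rate_gc_to_at < 0 or total == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return rate_at_to_gc / total


def evolve_gc(gc0: float, rate_at_to_gc: float, rate_gc_to_at: float,
              steps: int) -> float:
    """Iterate the expected GC fraction forward in discrete time steps."""
    gc = gc0
    for _ in range(steps):
        # AT sites gained as GC, minus GC sites lost to AT, per step.
        gc += rate_at_to_gc * (1.0 - gc) - rate_gc_to_at * gc
    return gc
```

Starting from any initial GC fraction, `evolve_gc` approaches the value returned by `equilibrium_gc`, so a process that raises the AT→GC rate relative to GC→AT (as biased gene conversion is proposed to do) shifts the target GC-content upward in this toy model.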
Affiliation(s)
- Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Villeurbanne, France
- Peter F. Arndt
- Department for Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
6
Lunter G, Rocco A, Mimouni N, Heger A, Caldeira A, Hein J. Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Res 2007; 18:298-309. [PMID: 18073381 DOI: 10.1101/gr.6725608] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.7] [Indexed: 11/24/2022]
Abstract
Sequence alignment underpins all of comparative genomics, yet it remains an incompletely solved problem. In particular, the statistical uncertainty within inferred alignments is often disregarded, while parametric or phylogenetic inferences are considered meaningless without confidence estimates. Here, we report on a theoretical and simulation study of pairwise alignments of genomic DNA at human-mouse divergence. We find that >15% of aligned bases are incorrect in existing whole-genome alignments, and we identify three types of alignment error, each leading to systematic biases in all algorithms considered. Careful modeling of the evolutionary process improves alignment quality; however, these improvements are modest compared with the remaining alignment errors, even with exact knowledge of the evolutionary model, emphasizing the need for statistical approaches to account for uncertainty. We develop a new algorithm, Marginalized Posterior Decoding (MPD), which explicitly accounts for uncertainties, is less biased and more accurate than other algorithms we consider, and reduces the proportion of misaligned bases by a third compared with the best existing algorithm. To our knowledge, this is the first nonheuristic algorithm for DNA sequence alignment to show robust improvements over the classic Needleman-Wunsch algorithm. Despite this, considerable uncertainty remains even in the improved alignments. We conclude that a probabilistic treatment is essential, both to improve alignment quality and to quantify the remaining uncertainty. This is becoming increasingly relevant with the growing appreciation of the importance of noncoding DNA, whose study relies heavily on alignments. Alignment errors are inevitable, and should be considered when drawing conclusions from alignments. Software and alignments to assist researchers in doing this are provided at http://genserv.anat.ox.ac.uk/grape/.
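For readers unfamiliar with the classic Needleman-Wunsch baseline named in this abstract, its global-alignment score can be computed with a short dynamic program. This is a sketch of Needleman-Wunsch only, not of the Marginalized Posterior Decoding algorithm; the scoring values (match +1, mismatch -1, linear gap -1) are assumptions for illustration.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score with a linear gap penalty."""
    # prev[j] holds the best score aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

The function returns a single best score and says nothing about how confident that alignment is, which is the limitation the abstract argues a probabilistic treatment should address.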
Affiliation(s)
- Gerton Lunter
- MRC Functional Genetics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom.
7
Dreszer TR, Wall GD, Haussler D, Pollard KS. Biased clustered substitutions in the human genome: the footprints of male-driven biased gene conversion. Genome Res 2007; 17:1420-30. [PMID: 17785536 PMCID: PMC1987345 DOI: 10.1101/gr.6395807] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Indexed: 11/25/2022]
Abstract
We examined fixed substitutions in the human lineage since divergence from the common ancestor with the chimpanzee, and determined what fraction are AT to GC (weak-to-strong). Substitutions that are densely clustered on the chromosomes show a remarkable excess of weak-to-strong "biased" substitutions. These unexpected biased clustered substitutions (UBCS) are common near the telomeres of all autosomes but not the sex chromosomes. Regions of extreme bias are enriched for genes. Human and chimp orthologous regions show a striking similarity in the shape and magnitude of their respective UBCS maps, suggesting a relatively stable force leads to clustered bias. The strong and stable signal near telomeres may have participated in the evolution of isochores. One exception to the UBCS pattern found in all autosomes is chromosome 2, which shows a UBCS peak midchromosome, mapping to the fusion site of two ancestral chromosomes. This provides evidence that the fusion occurred as recently as 740,000 years ago and no more than approximately 3 million years ago. No biased clustering was found in SNPs, suggesting that clusters of biased substitutions are selected from mutations. UBCS is strongly correlated with male (and not female) recombination rates, which explains the lack of UBCS signal on chromosome X. These observations support the hypothesis that biased gene conversion (BGC), specifically in the male germline, played a significant role in the evolution of the human genome.
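The weak-to-strong fraction described in this abstract is the share of substitutions that go from A or T (weak) to G or C (strong). A minimal sketch follows, assuming substitutions are supplied as (ancestral, derived) base pairs; the function name `weak_to_strong_fraction` and the input format are hypothetical and do not reflect the authors' actual data handling.

```python
WEAK = frozenset("AT")    # bases forming two hydrogen bonds
STRONG = frozenset("GC")  # bases forming three hydrogen bonds


def weak_to_strong_fraction(substitutions) -> float:
    """Fraction of substitutions that are AT -> GC (weak-to-strong).

    `substitutions` is an iterable of (ancestral, derived) base pairs;
    pairs with identical bases are not substitutions and are skipped.
    """
    subs = [(a.upper(), d.upper()) for a, d in substitutions]
    subs = [(a, d) for a, d in subs if a != d]
    if not subs:
        raise ValueError("no substitutions supplied")
    w2s = sum(1 for a, d in subs if a in WEAK and d in STRONG)
    return w2s / len(subs)
```

Computing this fraction separately for densely clustered and isolated substitutions would give the kind of comparison the abstract reports, under the assumptions stated above.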
MESH Headings
- Animals
- Chromosomes, Human, Pair 2/genetics
- Chromosomes, Human, X/genetics
- Chromosomes, Human, Y/genetics
- Evolution, Molecular
- Female
- Gene Conversion
- Gene Fusion
- Genome, Human
- Humans
- Male
- Models, Genetic
- Pan troglodytes/genetics
- Polymorphism, Single Nucleotide
- Recombination, Genetic
- Sex Characteristics
- Species Specificity
- Telomere/genetics
- Time Factors
Affiliation(s)
- Timothy R. Dreszer
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Gregory D. Wall
- Department of Statistics, University of California, Davis, California 95616, USA
- David Haussler
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
- Corresponding authors. Fax (831) 459-1809; fax (530) 754-9658
- Katherine S. Pollard
- Department of Statistics, University of California, Davis, California 95616, USA
- UC Davis Genome Center, University of California, Davis, California 95616, USA
- Corresponding authors. Fax (831) 459-1809; fax (530) 754-9658
8
Abstract
The vertebrate genome is a mosaic of GC-poor and GC-rich isochores, megabase-sized DNA regions of fairly homogeneous base composition that differ in relative amount, gene density, gene expression, replication timing, and recombination frequency. At the emergence of warm-blooded vertebrates, the gene-rich, moderately GC-rich isochores of the cold-blooded ancestors underwent a GC increase. This increase was similar in mammals and birds and was maintained during the evolution of mammalian and avian orders. Neither the GC increase nor its conservation can be accounted for by the random fixation of neutral or nearly neutral single-nucleotide changes (i.e., the vast majority of nucleotide substitutions) or by a biased gene conversion process occurring at random genome locations. Both phenomena can be explained, however, by the neoselectionist theory of genome evolution that is presented here. This theory fully accepts Ohta's nearly neutral view of point mutations but proposes in addition (i) that the AT-biased mutational input present in vertebrates pushes some DNA regions below a certain GC threshold; (ii) that these lower GC levels cause regional changes in chromatin structure that lead to deleterious effects on replication and transcription; and (iii) that the carriers of these changes undergo negative (purifying) selection, the final result being a compositional conservation of the original isochore pattern in the surviving population. Negative selection may also largely explain the GC increase accompanying the emergence of warm-blooded vertebrates. In conclusion, the neoselectionist theory not only provides a solution to the neutralist/selectionist debate but also introduces an epigenomic component in genome evolution.
Affiliation(s)
- Giorgio Bernardi
- Molecular Evolution Laboratory, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy.
9
Directionality of point mutation and 5-methylcytosine deamination rates in the chimpanzee genome. BMC Genomics 2006; 7:316. [PMID: 17166280 PMCID: PMC1764022 DOI: 10.1186/1471-2164-7-316] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 09/04/2006] [Accepted: 12/13/2006] [Indexed: 12/02/2022] Open
Abstract
Background: The pattern of point mutation is important for studying mutational mechanisms, genome evolution, and diseases. Previous studies of mutation direction were largely based on substitution data from a limited number of loci. To date, there is no genome-wide analysis of mutation direction or methylation-dependent transition rates in the chimpanzee or its categorized genomic regions. Results: In this study, we performed a detailed examination of mutation direction in the chimpanzee genome and its categorized genomic regions using 588,918 SNPs whose ancestral alleles could be inferred by mapping them to human genome sequences. The C→T (G→A) changes occurred most frequently in the chimpanzee genome. Each type of transition occurred approximately four times more frequently than each type of transversion. Notably, the frequency of C→T (G→A) was the highest in exons among the genomic categories regardless of whether we calculated directly, normalized with the nucleotide content, or removed the SNPs involved in the CpG effect. Moreover, the directionality of the point mutation in exons and CpG islands were opposite relative to their corresponding intergenic regions, indicating that different forces govern the nucleotide changes. Our analysis suggests that the GC content is not in equilibrium in the chimpanzee genome. Further quantitative analysis revealed that the 5-methylcytosine deamination rates at CpG sites were highly dependent on the local GC content and the lengths of SNP flanking sequences and varied among categorized genomic regions. Conclusion: We present the first mutational spectrum, estimated by three different approaches, in the chimpanzee genome. Our results provide detailed information on recent nucleotide changes and methylation-dependent transition rates in the chimpanzee genome after its split from the human. These results have important implications for understanding genome composition evolution, mechanisms of point mutation, and other genetic factors such as selection, biased codon usage, biased gene conversion, and recombination.
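The notation "C→T (G→A)" in this abstract pairs each change with its reverse-complement equivalent, since the two are indistinguishable without knowing which strand mutated. A minimal sketch of that folding, together with a pooled transition/transversion ratio, is given below. The six-class labels and all function names are assumptions for illustration and are not taken from the paper; note also that the abstract compares rates per type of change, whereas this ratio pools all transitions against all transversions.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
TRANSITIONS = frozenset({"C>T", "T>C"})  # folded purine<->purine, pyr<->pyr


def fold_change(ancestral: str, derived: str) -> str:
    """Fold a directed change onto the strand where the ancestral base is C or T."""
    anc, der = ancestral.upper(), derived.upper()
    if anc not in COMPLEMENT or der not in COMPLEMENT or anc == der:
        raise ValueError(f"not a valid point change: {ancestral}>{derived}")
    if anc in "AG":
        # Report the equivalent change on the complementary strand.
        anc, der = COMPLEMENT[anc], COMPLEMENT[der]
    return f"{anc}>{der}"


def mutation_spectrum(changes) -> Counter:
    """Count folded change classes from (ancestral, derived) pairs."""
    return Counter(fold_change(a, d) for a, d in changes)


def transition_transversion_ratio(spectrum: Counter) -> float:
    """Pooled ratio of transitions to transversions in a folded spectrum."""
    ts = sum(n for label, n in spectrum.items() if label in TRANSITIONS)
    tv = sum(n for label, n in spectrum.items() if label not in TRANSITIONS)
    if tv == 0:
        raise ValueError("no transversions in spectrum")
    return ts / tv
```

Folding leaves six classes (C>A, C>G, C>T, T>A, T>C, T>G), two of which are transitions.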
10
Singh ND, Arndt PF, Petrov DA. Minor shift in background substitutional patterns in the Drosophila saltans and willistoni lineages is insufficient to explain GC content of coding sequences. BMC Biol 2006; 4:37. [PMID: 17049096 PMCID: PMC1626080 DOI: 10.1186/1741-7007-4-37] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 05/08/2006] [Accepted: 10/18/2006] [Indexed: 11/10/2022] Open
Abstract
Background: Several lines of evidence suggest that codon usage in the Drosophila saltans and D. willistoni lineages has shifted towards a less frequent use of GC-ending codons. Introns in these lineages show a parallel shift toward a lower GC content. These patterns have been alternatively ascribed to either a shift in mutational patterns or changes in the definition of preferred and unpreferred codons in these lineages. Results and discussion: To gain additional insight into this question, we quantified background substitutional patterns in the saltans/willistoni group using inactive copies of a novel, Q-like retrotransposable element. We demonstrate that the pattern of background substitutions in the saltans/willistoni lineage has shifted to a significant degree, primarily due to changes in mutational biases. These differences predict a lower equilibrium GC content in the genomes of the saltans/willistoni species compared with that in the D. melanogaster species group. The magnitude of the difference can readily account for changes in intronic GC content, but it appears insufficient to explain changes in codon usage within the saltans/willistoni lineage. Conclusion: We suggest that the observed changes in codon usage in the saltans/willistoni clade reflect either lineage-specific changes in the definitions of preferred and unpreferred codons, or a weaker selective pressure on codon bias in this lineage.
Affiliation(s)
- Nadia D Singh
- Department of Biological Sciences, Stanford University, 371 Serra Mall, Stanford, CA 94305, USA
- Peter F Arndt
- Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Dmitri A Petrov
- Department of Biological Sciences, Stanford University, 371 Serra Mall, Stanford, CA 94305, USA