1. Seplyarskiy V, Koch EM, Lee DJ, Lichtman JS, Luan HH, Sunyaev SR. A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat Genet 2023; 55:2235-2242. [PMID: 38036792 PMCID: PMC11348951 DOI: 10.1038/s41588-023-01562-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/20/2022] [Accepted: 10/06/2023] [Indexed: 12/02/2023]
Abstract
De novo mutations occur at substantially different rates depending on genomic location, sequence context and DNA strand. The success of methods to estimate selection intensity, infer demographic history and map rare disease genes depends strongly on assumptions about the local mutation rate. Here we present Roulette, a genome-wide mutation rate model at basepair resolution that incorporates known determinants of local mutation rate. Roulette is shown to be more accurate than existing models. We use Roulette to refine the estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a tenfold increase in mutation rate in nearly all genes transcribed by polymerase III (Pol III), suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively used in testis and residing in promoters.
Affiliation(s)
- Vladimir Seplyarskiy
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Evan M Koch
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Daniel J Lee
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Joshua S Lichtman
  - NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
  - Soleil Labs, South San Francisco, CA, USA
- Harding H Luan
  - NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
  - Soleil Labs, South San Francisco, CA, USA
- Shamil R Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
2. Bertram H, Wilhelmi S, Rajavel A, Boelhauve M, Wittmann M, Ramzan F, Schmitt AO, Gültas M. Comparative Investigation of Coincident Single Nucleotide Polymorphisms Underlying Avian Influenza Viruses in Chickens and Ducks. Biology 2023; 12:969. [PMID: 37508399 PMCID: PMC10375970 DOI: 10.3390/biology12070969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 06/26/2023] [Accepted: 07/04/2023] [Indexed: 07/30/2023]
Abstract
Avian influenza is a severe viral infection that has the potential to cause human pandemics. In particular, chickens are susceptible to many highly pathogenic strains of the virus, resulting in significant losses. In contrast, ducks have been reported to exhibit rapid and effective innate immune responses to most avian influenza virus (AIV) infections. To explore the distinct genetic programs that potentially distinguish the susceptibility/resistance of both species to AIV, the investigation of coincident SNPs (coSNPs) and their differing causal effects on gene functions in both species is important to gain novel insight into the varying immune-related responses of chickens and ducks. By conducting a pairwise genome alignment between these species, we identified coSNPs and their respective effect on AIV-related differentially expressed genes (DEGs) in this study. The examination of these genes (e.g., CD74, RUBCN, and SHTN1 for chickens and ABCA3, MAP2K6, and VIPR2 for ducks) reveals their high relevance to AIV. Further analysis of these genes provides promising effector molecules (such as IκBα, STAT1/STAT3, GSK-3β, or p53) and related key signaling pathways (such as NF-κB, JAK/STAT, or Wnt) to elucidate the complex mechanisms of immune responses to AIV infections in both chickens and ducks.
Affiliation(s)
- Hendrik Bertram
  - Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Selina Wilhelmi
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
- Abirami Rajavel
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
- Marc Boelhauve
  - Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
- Margareta Wittmann
  - Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
- Faisal Ramzan
  - Institute of Animal and Dairy Sciences, University of Agriculture, Faisalabad 38000, Pakistan
- Armin Otto Schmitt
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
- Mehmet Gültas
  - Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
  - Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
3. Johri P, Eyre-Walker A, Gutenkunst RN, Lohmueller KE, Jensen JD. On the prospect of achieving accurate joint estimation of selection with population history. Genome Biol Evol 2022; 14:evac088. [PMID: 35675379 PMCID: PMC9254643 DOI: 10.1093/gbe/evac088] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Accepted: 06/02/2022] [Indexed: 11/15/2022]
Abstract
As both natural selection and population history can affect genome-wide patterns of variation, disentangling the contributions of each has remained a major challenge in population genetics. We here discuss historical and recent progress towards this goal, highlighting theoretical and computational challenges that remain to be addressed as well as inherent difficulties in dealing with model complexity and model violations, and offer thoughts on potentially fruitful next steps.
Affiliation(s)
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Ryan N Gutenkunst
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
- Kirk E Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
4. Seplyarskiy VB, Sunyaev S. The origin of human mutation in light of genomic data. Nat Rev Genet 2021; 22:672-686. [PMID: 34163020 DOI: 10.1038/s41576-021-00376-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Accepted: 05/06/2021] [Indexed: 02/05/2023]
Abstract
Despite years of active research into the role of DNA repair and replication in mutagenesis, surprisingly little is known about the origin of spontaneous human mutation in the germ line. With the advent of high-throughput sequencing, genome-scale data have revealed statistical properties of mutagenesis in humans. These properties include variation of the mutation rate and spectrum along the genome at different scales in relation to epigenomic features and dependency on parental age. Moreover, mutations originating in mothers are less frequent than mutations originating in fathers and have a distinct genomic distribution. Statistical analyses that interpret these patterns in the context of known biochemistry can provide mechanistic models of mutagenesis in humans.
Affiliation(s)
- Vladimir B Seplyarskiy
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Shamil Sunyaev
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
5. Goldberg ME, Harris K. Mutational signatures of replication timing and epigenetic modification persist through the global divergence of mutation spectra across the great ape phylogeny. Genome Biol Evol 2021; 14:6275268. [PMID: 33983415 PMCID: PMC8743035 DOI: 10.1093/gbe/evab104] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Accepted: 05/07/2021] [Indexed: 11/17/2022]
Abstract
Great ape clades exhibit variation in the relative mutation rates of different three-base-pair genomic motifs, with closely related species having more similar mutation spectra than distantly related species. This pattern cannot be explained by classical demographic or selective forces, but implies that DNA replication fidelity has been perturbed in different ways on each branch of the great ape phylogeny. Here, we use whole-genome variation from 88 great apes to investigate whether these species' mutation spectra are broadly differentiated across the entire genome, or whether mutation spectrum differences are driven by DNA compartments that have particular functional features or chromatin states. We perform principal component analysis (PCA) and mutational signature deconvolution on mutation spectra ascertained from compartments defined by features including replication timing and ancient repeat content, finding evidence for consistent species-specific mutational signatures that do not depend on which functional compartments the spectra are ascertained from. At the same time, we find that many compartments have their own characteristic mutational signatures that appear stable across the great ape phylogeny. For example, in a mutation spectrum PCA compartmentalized by replication timing, the second principal component explaining 21.2% of variation separates all species' late-replicating regions from their early-replicating regions. Our results suggest that great ape mutation spectrum evolution is not driven by epigenetic changes that modify mutation rates in specific genomic regions, but instead by trans-acting mutational modifiers that affect mutagenesis across the whole genome fairly uniformly.
Affiliation(s)
- Michael E Goldberg
  - University of Washington Department of Genome Sciences, 3720 15th Ave NE, Seattle WA 98105, United States of America
- Kelley Harris
  - University of Washington Department of Genome Sciences, 3720 15th Ave NE, Seattle WA 98105, United States of America
  - Fred Hutchinson Cancer Center Computational Biology Division, 1100 Fairview Ave N, Seattle, WA 98109, United States of America
6. Lai YP, Ioerger TR. A statistical method to identify recombination in bacterial genomes based on SNP incompatibility. BMC Bioinformatics 2018; 19:450. [PMID: 30466385 PMCID: PMC6251179 DOI: 10.1186/s12859-018-2456-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/02/2018] [Accepted: 10/31/2018] [Indexed: 01/14/2023]
Abstract
BACKGROUND: Phylogeny estimation for bacteria is likely to reflect their true evolutionary histories only if they are highly clonal. However, recombination events could occur during evolution for some species. The reconstruction of phylogenetic trees from an alignment without considering recombination could be misleading, since the relationships among strains in some parts of the genome might be different than in others. Using a single, global tree can create the appearance of homoplasy in recombined regions. Hence, the identification of recombination breakpoints is essential to better understand the evolutionary relationships of isolates among a bacterial population.
RESULTS: Previously, we have developed a method (called ACR) to detect potential breakpoints in an alignment by evaluating compatibility of polymorphic sites in a sliding window. To assess the statistical significance of candidate breakpoints, we propose an extension of the algorithm (ptACR) that applies a permutation test to generate a null distribution for comparing the average local compatibility. The performance of ptACR is evaluated on both simulated and empirical datasets. ptACR is shown to have similar sensitivity (true positive rate) but a lower false positive rate and higher F1 score compared to basic ACR. When used to analyze a collection of clinical isolates of Staphylococcus aureus, ptACR finds clear evidence of recombination events in this bacterial pathogen, and is able to identify statistically significant boundaries of chromosomal regions with distinct phylogenies.
CONCLUSIONS: ptACR is an accurate and efficient method for identifying genomic regions affected by recombination in bacterial genomes.
Affiliation(s)
- Yi-Pin Lai
  - Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA
- Thomas R Ioerger
  - Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA
7. Seplyarskiy VB, Andrianova MA, Bazykin GA. APOBEC3A/B-induced mutagenesis is responsible for 20% of heritable mutations in the TpCpW context. Genome Res 2016; 27:175-184. [PMID: 27940951 PMCID: PMC5287224 DOI: 10.1101/gr.210336.116] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 05/24/2016] [Accepted: 12/01/2016] [Indexed: 12/18/2022]
Abstract
APOBEC3A/B cytidine deaminase is responsible for the majority of cancerous mutations in a large fraction of cancer samples. However, its role in heritable mutagenesis remains very poorly understood. Recent studies have demonstrated that both in yeast and in human cancerous cells, most APOBEC3A/B-induced mutations occur on the lagging strand during replication and on the nontemplate strand of transcribed regions. Here, we use data on rare human polymorphisms, interspecies divergence, and de novo mutations to study germline mutagenesis and to analyze mutations at nucleotide contexts prone to attack by APOBEC3A/B. We show that such mutations occur preferentially on the lagging strand and on nontemplate strands of transcribed regions. Moreover, we demonstrate that APOBEC3A/B-like mutations tend to produce strand-coordinated clusters, which are also biased toward the lagging strand. Finally, we show that the mutation rate is increased 3' of C→G mutations to a greater extent than 3' of C→T mutations, suggesting pervasive trans-lesion bypass of the APOBEC3A/B-induced damage. Our study demonstrates that 20% of C→T and C→G mutations in the TpCpW context (where W denotes A or T) segregating as polymorphisms in the human population, or 1.4% of all heritable mutations, are attributable to APOBEC3A/B activity.
Affiliation(s)
- Vladimir B Seplyarskiy
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia
  - Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Maria A Andrianova
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia
  - Pirogov Russian National Research Medical University, Moscow 117997, Russia
  - Lomonosov Moscow State University, Moscow 119234, Russia
- Georgii A Bazykin
  - Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia
  - Pirogov Russian National Research Medical University, Moscow 117997, Russia
  - Lomonosov Moscow State University, Moscow 119234, Russia
  - Skolkovo Institute of Science and Technology, Skolkovo 143026, Russia
8. Harpak A, Bhaskar A, Pritchard JK. Mutation Rate Variation is a Primary Determinant of the Distribution of Allele Frequencies in Humans. PLoS Genet 2016; 12:e1006489. [PMID: 27977673 PMCID: PMC5157949 DOI: 10.1371/journal.pgen.1006489] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 04/12/2016] [Accepted: 11/16/2016] [Indexed: 01/06/2023]
Abstract
The site frequency spectrum (SFS) has long been used to study demographic history and natural selection. Here, we extend this summary by examining the SFS conditional on the alleles found at the same site in other species. We refer to this extension as the "phylogenetically-conditioned SFS" or cSFS. Using recent large-sample data from the Exome Aggregation Consortium (ExAC), combined with primate genome sequences, we find that human variants that occurred independently in closely related primate lineages are at higher frequencies in humans than variants with parallel substitutions in more distant primates. We show that this effect is largely due to sites with elevated mutation rates causing significant departures from the widely-used infinite sites mutation model. Our analysis also suggests substantial variation in mutation rates even among mutations involving the same nucleotide changes. In summary, we show that variable mutation rates are key determinants of the SFS in humans.
Affiliation(s)
- Arbel Harpak
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Anand Bhaskar
  - Department of Genetics, Stanford University, Stanford, California, United States of America
  - Howard Hughes Medical Institute, Stanford University, Stanford, California, United States of America
- Jonathan K. Pritchard
  - Department of Biology, Stanford University, Stanford, California, United States of America
  - Department of Genetics, Stanford University, Stanford, California, United States of America
  - Howard Hughes Medical Institute, Stanford University, Stanford, California, United States of America
9. Smith TCA, Carr AM, Eyre-Walker AC. Are sites with multiple single nucleotide variants in cancer genomes a consequence of drivers, hypermutable sites or sequencing errors? PeerJ 2016; 4:e2391. [PMID: 27688957 PMCID: PMC5036107 DOI: 10.7717/peerj.2391] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/29/2016] [Accepted: 08/01/2016] [Indexed: 11/26/2022]
Abstract
Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either (i) drivers of cancer that are positively selected during oncogenesis, (ii) due to mutation rate variation, or (iii) due to sequencing and assembly errors. We have investigated the cause of recurrently hit sites in a dataset of >3 million SNVs from 507 complete cancer genome sequences. We find evidence that many sites have been hit significantly more often than one would expect by chance, even taking into account the effect of the adjacent nucleotides on the rate of mutation. We find that the density of these recurrently hit sites is higher in non-coding than coding DNA and hence conclude that most of them are unlikely to be drivers. We also find that most of them are found in parts of the genome that are not uniquely mappable and hence are likely to be due to mapping errors. In support of the error hypothesis, we find that recurrently hit sites are not randomly distributed across sequences from different laboratories. We fit a model to the data in which the rate of mutation is constant across sites but the rate of error varies. This model suggests that ∼4% of all SNVs are errors in this dataset, but that the rate of error varies by thousands-of-fold between sites.
Affiliation(s)
- Thomas C A Smith
  - School of Life Sciences, University of Sussex, Brighton, East Sussex, United Kingdom
- Antony M Carr
  - Genome Damage and Stability Centre, University of Sussex, Brighton, East Sussex, United Kingdom
- Adam C Eyre-Walker
  - School of Life Sciences, University of Sussex, Brighton, East Sussex, United Kingdom
10. Purifying selection shapes the coincident SNP distribution of primate coding sequences. Sci Rep 2016; 6:27272. [PMID: 27255481 PMCID: PMC4891680 DOI: 10.1038/srep27272] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/04/2016] [Accepted: 05/17/2016] [Indexed: 12/13/2022]
Abstract
Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; such a difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, or sequencing techniques, and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, and recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These results suggest that coSNPs may represent a "signature" during primate protein evolution.
11. Smith T, Ho G, Christodoulou J, Price EA, Onadim Z, Gauthier-Villars M, Dehainault C, Houdayer C, Parfait B, van Minkelen R, Lohman D, Eyre-Walker A. Extensive Variation in the Mutation Rate Between and Within Human Genes Associated with Mendelian Disease. Hum Mutat 2016; 37:488-94. [PMID: 26857394 DOI: 10.1002/humu.22967] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 10/07/2015] [Accepted: 01/25/2016] [Indexed: 01/05/2023]
Abstract
We have investigated whether the mutation rate varies between genes and sites using de novo mutations (DNMs) from three genes associated with Mendelian diseases (RB1, NF1, and MECP2). We show that the frequency of mutations at CpG dinucleotides relative to non-CpG sites varies between genes and relative to the genomic average. In particular, we show that the rate of transition mutation at CpG sites relative to the rate of non-CpG transversion is substantially higher in our disease genes than amongst DNMs in general; the rate of CpG transition can be several hundred-fold greater than the rate of non-CpG transversion. We also show that the mutation rate varies significantly between sites of a particular mutational type, such as non-CpG transversion, within a gene. We estimate that for all categories of sites, except CpG transitions, there is at least a 30-fold difference in the mutation rate between the 10% of sites with the highest and lowest mutation rates. However, our best estimate is that the mutation rate varies by several hundred-fold. We suggest that the presence of hypermutable sites may be one reason certain genes are associated with disease.
Affiliation(s)
- Thomas Smith
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Gladys Ho
  - NSW Centre for Rett Syndrome Research, Western Sydney Genetics Program, Children's Hospital at Westmead, Sydney, Australia
- John Christodoulou
  - NSW Centre for Rett Syndrome Research, Western Sydney Genetics Program, Children's Hospital at Westmead, Sydney, Australia
  - Disciplines of Paediatrics and Child Health and Genetic Medicine, Sydney Medical School, University of Sydney, Sydney, Australia
- Elizabeth Ann Price
  - Retinoblastoma Genetic Screening Unit, Barts Health NHS Trust, The Royal London Hospital, 80 Newark Street, London, United Kingdom
- Zerrin Onadim
  - Retinoblastoma Genetic Screening Unit, Barts Health NHS Trust, The Royal London Hospital, 80 Newark Street, London, United Kingdom
- Claude Houdayer
  - Service de Génétique, Institut Curie, Paris, France
  - INSERM U830, centre de recherche de l'Institut Curie, Paris, France
  - Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Beatrice Parfait
  - EA7331, Faculté de Pharmacie de Paris, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
  - Service de Biochimie et de Génétique Moléculaire, Hôpital Cochin, AP-HP, Paris, France
- Rick van Minkelen
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Dietmar Lohman
  - Institut für Humangenetik, Universitätsklinikum Essen, Universität Duisburg-Essen, Essen, Germany
- Adam Eyre-Walker
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
12. Seplyarskiy VB, Bazykin GA, Soldatov RA. Polymerase ζ Activity Is Linked to Replication Timing in Humans: Evidence from Mutational Signatures. Mol Biol Evol 2015; 32:3158-72. [PMID: 26376651 DOI: 10.1093/molbev/msv184] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/15/2022]
Abstract
Replication timing is an important determinant of germline mutation patterns, with a higher rate of point mutations in late-replicating regions. Mechanisms underlying this association remain elusive. One of the suggested explanations is the activity of error-prone DNA polymerases in late-replicating regions. Polymerase zeta (pol ζ), an essential error-prone polymerase biased toward transversions, also has a tendency to produce dinucleotide mutations (DNMs), complex mutational events that simultaneously affect two adjacent nucleotides. Experimental studies have shown that pol ζ is strongly biased toward GC→AA/TT DNMs. Using primate divergence data, we show that the GC→AA/TT pol ζ mutational signature is the most frequent among DNMs, and its rate exceeds the mean rate of other DNM types by a factor of approximately 10. Unlike the overall rate of DNMs, the pol ζ signature drastically increases with the replication time in the human genome. Finally, the pol ζ signature is enriched in transcribed regions, and there is a strong prevalence of GC→TT over GC→AA DNMs on the nontemplate strand, indicating association with transcription. A recurrently occurring GC→TT DNM in the HRAS and SOD1 genes causes Costello syndrome and amyotrophic lateral sclerosis, respectively; we observe an approximately 1 kb long mutation hotspot enriched in transversions near these DNMs in both cases, suggesting a link between these diseases and pol ζ activity. This study uncovers the genomic preferences of pol ζ, shedding light on a novel cause of mutational heterogeneity along the genome.
Affiliation(s)
- Vladimir B Seplyarskiy
  - Institute of Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russia
  - Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
  - Pirogov Russian National Research Medical University, Moscow, Russia
- Georgii A Bazykin
  - Institute of Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russia
  - Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
  - Pirogov Russian National Research Medical University, Moscow, Russia
- Ruslan A Soldatov
  - Institute of Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russia
  - Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
13. Teixeira JC, de Filippo C, Weihmann A, Meneu JR, Racimo F, Dannemann M, Nickel B, Fischer A, Halbwax M, Andre C, Atencia R, Meyer M, Parra G, Pääbo S, Andrés AM. Long-Term Balancing Selection in LAD1 Maintains a Missense Trans-Species Polymorphism in Humans, Chimpanzees, and Bonobos. Mol Biol Evol 2015; 32:1186-96. [PMID: 25605789 DOI: 10.1093/molbev/msv007] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Indexed: 12/22/2022]
Abstract
Balancing selection maintains advantageous genetic and phenotypic diversity in populations. When selection acts for long evolutionary periods, selected polymorphisms may survive species splits and segregate in present-day populations of different species. Here, we investigate the role of long-term balancing selection in the evolution of protein-coding sequences in the Homo-Pan clade. We sequenced the exome of 20 humans, 20 chimpanzees, and 20 bonobos and detected eight coding trans-species polymorphisms (trSNPs) that are shared among the three species and have segregated for approximately 14 My of independent evolution. Although the majority of these trSNPs were found in three genes of the major histocompatibility locus cluster, we also uncovered one coding trSNP (rs12088790) in the gene LAD1. All these trSNPs show clustering of sequences by allele rather than by species and also exhibit other signatures of long-term balancing selection, such as segregating at intermediate frequency and lying in a locus with high genetic diversity. Here, we focus on the trSNP in LAD1, a gene that encodes Ladinin-1, a collagenous anchoring filament protein of basement membrane that is responsible for maintaining cohesion at the dermal-epidermal junction; the gene is also an autoantigen responsible for linear IgA disease. This trSNP results in a missense change (Leucine257Proline) and, besides altering the protein sequence, is associated with changes in gene expression of LAD1.
Affiliation(s)
- João C Teixeira
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Cesare de Filippo
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Antje Weihmann
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Juan R Meneu
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Fernando Racimo
- Department of Integrative Biology, University of California, Berkeley
| | - Michael Dannemann
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Birgit Nickel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Anne Fischer
- International Center for Insect Physiology and Ecology, Nairobi, Kenya
| | - Michel Halbwax
- Clinique vétérinaire du Dr. Jacquemin, Maisons-Alfort, France
| | - Claudine Andre
- Lola Ya Bonobo sanctuary, Kinshasa, Democratic Republic of the Congo
| | - Rebeca Atencia
- Réserve Naturelle Sanctuaire à Chimpanzés de Tchimpounga, Jane Goodall Institute, Pointe-Noire, Republic of Congo
| | - Matthias Meyer
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Genís Parra
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Svante Pääbo
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Aida M Andrés
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
|
14
|
Bank C, Ewing GB, Ferrer-Admettla A, Foll M, Jensen JD. Thinking too positive? Revisiting current methods of population genetic selection inference. Trends Genet 2014; 30:540-6. [PMID: 25438719 DOI: 10.1016/j.tig.2014.09.010] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.1] [Received: 08/08/2014] [Revised: 09/19/2014] [Accepted: 09/23/2014] [Indexed: 02/03/2023]
Abstract
In the age of next-generation sequencing, the availability of increasing amounts and improved quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. However, alternative forces, such as demography and background selection (BGS), obscure the footprints of positive selection that we would like to identify. In this review, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (i) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (ii) that genomic information from multiple time points will enhance the power of inference, and (iii) that results from experimental evolution should be utilized to better inform population genomic studies.
Affiliation(s)
- Claudia Bank
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland.
| | - Gregory B Ewing
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland
| | - Anna Ferrer-Admettla
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland; Department of Biology and Biochemistry, University of Fribourg, 1700 Fribourg, Switzerland
| | - Matthieu Foll
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland
| | - Jeffrey D Jensen
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland
|
15
|
How much of the variation in the mutation rate along the human genome can be explained? G3-GENES GENOMES GENETICS 2014; 4:1667-70. [PMID: 24996580 PMCID: PMC4169158 DOI: 10.1534/g3.114.012849] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
Abstract
It has been claimed recently that it may be possible to predict the rate of de novo mutation of each site in the human genome with a high degree of accuracy [Michaelson et al. (2012), Cell 151: 1431−1442]. We show that this claim is unwarranted. By considering the correlation between the rate of de novo mutation and the predictions from the model of Michaelson et al., we show there could be substantial unexplained variance in the mutation rate. We investigate whether the model of Michaelson et al. captures variation at the single nucleotide level that is not due to simple context. We show that the model captures a substantial fraction of this variation at CpG dinucleotides but fails to explain much of the variation at non-CpG sites.
|
16
|
Livnat A. Interaction-based evolution: how natural selection and nonrandom mutation work together. Biol Direct 2013; 8:24. [PMID: 24139515 PMCID: PMC4231362 DOI: 10.1186/1745-6150-8-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 04/25/2013] [Accepted: 09/26/2013] [Indexed: 12/30/2022]
Abstract
BACKGROUND: The modern evolutionary synthesis leaves unresolved some of the most fundamental, long-standing questions in evolutionary biology: What is the role of sex in evolution? How does complex adaptation evolve? How can selection operate effectively on genetic interactions? More recently, the molecular biology and genomics revolutions have raised a host of critical new questions, through empirical findings that the modern synthesis fails to explain: for example, the discovery of de novo genes; the immense constructive role of transposable elements in evolution; genetic variance and biochemical activity that go far beyond what traditional natural selection can maintain; perplexing cases of molecular parallelism; and more. PRESENTATION OF THE HYPOTHESIS: Here I address these questions from a unified perspective, by means of a new mechanistic view of evolution that offers a novel connection between selection on the phenotype and genetic evolutionary change (while relying, like the traditional theory, on natural selection as the only source of feedback on the fit between an organism and its environment). I hypothesize that the mutation that is of relevance for the evolution of complex adaptation, while not Lamarckian or "directed" to increase fitness, is not random, but is instead the outcome of a complex and continually evolving biological process that combines information from multiple loci into one. This allows selection on a fleeting combination of interacting alleles at different loci to have a hereditary effect according to the combination's fitness. TESTING AND IMPLICATIONS OF THE HYPOTHESIS: This proposed mechanism addresses the problem of how beneficial genetic interactions can evolve under selection, and also offers an intuitive explanation for the role of sex in evolution, which focuses on sex as the generator of genetic combinations. Importantly, it also implies that genetic variation that has appeared neutral through the lens of traditional theory can actually experience selection on interactions and thus has a much greater adaptive potential than previously considered. Empirical evidence for the proposed mechanism from both molecular evolution and evolution at the organismal level is discussed, and multiple predictions are offered by which it may be tested. REVIEWERS: This article was reviewed by Nigel Goldenfeld (nominated by Eugene V. Koonin), Jürgen Brosius and W. Ford Doolittle.
Affiliation(s)
- Adi Livnat
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
|
17
|
Terekhanova NV, Bazykin GA, Neverov A, Kondrashov AS, Seplyarskiy VB. Prevalence of multinucleotide replacements in evolution of primates and Drosophila. Mol Biol Evol 2013; 30:1315-25. [PMID: 23447710 PMCID: PMC3649671 DOI: 10.1093/molbev/mst036] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
Abstract
Evolution of sequences mostly involves independent changes at different sites. However, substitutions at neighboring sites may co-occur as multinucleotide replacement events (MNRs). Here, we compare noncoding sequences of several species of primates, and of three species of Drosophila fruit flies, in a phylogenetic analysis of the replacements that occurred between species at nearby nucleotide sites. Both in primates and in Drosophila, the frequency of single-nucleotide replacements is substantially elevated within 10 nucleotides from other replacements that occurred on the same lineage but not on another lineage. The data imply that dinucleotide replacements (DNRs) affecting sites at distances of up to 10 nucleotides from each other are responsible for 2.3% of single-nucleotide replacements in primate genomes and for 5.6% in Drosophila genomes. Among these DNRs, 26% and 69%, respectively, are in fact parts of replacements of three or more nucleotides (trinucleotide replacements, TNRs). The plurality of MNRs affect nearby nucleotides, so that at least six times as many DNRs affect two adjacent nucleotide sites as affect sites 10 nucleotides apart. Still, approximately 60% of DNRs, and approximately 90% of TNRs, span distances of more than two (or three) nucleotides. MNRs make a major contribution to the observed clustering of substitutions: In the human–chimpanzee comparison, DNRs are responsible for 50% of cases when two nearby replacements are observed on the human lineage, and TNRs are responsible for 83% of cases when three replacements at three immediately adjacent sites are observed on the human lineage. The prevalence of MNRs matches that observed in data on de novo mutations and is also observed in the regions with the lowest sequence conservation, suggesting that MNRs mainly have mutational origin; however, epistatic selection and/or gene conversion may also play a role.
Affiliation(s)
- Nadezhda V Terekhanova
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
|
18
|
Lartillot N. Phylogenetic patterns of GC-biased gene conversion in placental mammals and the evolutionary dynamics of recombination landscapes. Mol Biol Evol 2012; 30:489-502. [PMID: 23079417 DOI: 10.1093/molbev/mss239] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022]
Abstract
GC-biased gene conversion (gBGC) is a major evolutionary force shaping genomic nucleotide landscapes, distorting the estimation of the strength of selection, and having potentially deleterious effects on genome-wide fitness. Yet, a global quantitative picture, at large evolutionary scale, of the relative strength of gBGC compared with selection and random drift is still lacking. Furthermore, owing to its dependence on the local recombination rate, gBGC results in modulations of the substitution patterns along genomes and across time which, if correctly interpreted, may yield quantitative insights into the long-term evolutionary dynamics of recombination landscapes. Deriving a model of the substitution process at putatively neutral nucleotide positions from population-genetics arguments, and accounting for among-lineage and among-gene effects, we propose a reconstruction of the variation in gBGC intensity at the scale of placental mammals, and of its scaling with body-size and karyotypic traits. Our results are compatible with a simple population genetics model relating gBGC to effective population size and recombination rate. In addition, among-gene variation and phylogenetic patterns of exon-specific levels of gBGC reveal the presence of rugged recombination landscapes, and suggest that short-lived recombination hot-spots are a general feature of placentals. Across placental mammals, variation in gBGC strength spans two orders of magnitude, at its lowest in apes, strongest in lagomorphs, microbats or tenrecs, and near or above the nearly neutral threshold in most other lineages. Combined with among-gene variation, such high levels of biased gene conversion are likely to significantly impact mildly selected positions, and to represent a substantial mutation load. Altogether, our analysis suggests a more important role of gBGC in placental genome evolution, compared with what could have been anticipated from studies conducted in anthropoid primates.
Affiliation(s)
- Nicolas Lartillot
- Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
|
19
|
Seplyarskiy VB, Kharchenko P, Kondrashov AS, Bazykin GA. Heterogeneity of the transition/transversion ratio in Drosophila and Hominidae genomes. Mol Biol Evol 2012; 29:1943-55. [PMID: 22337862 DOI: 10.1093/molbev/mss071] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022]
Abstract
Mutation rate varies between sites in the genome. Part of this variation can be explained by well-recognized short nucleotide contexts, but a large component of this variation remains cryptic. We used data on interspecies divergence and intraspecies polymorphism in Drosophila and Hominidae to analyze variation of the average rate of the 12 possible kinds of single-nucleotide mutations and in the transition/transversion ratio κ at single-nucleotide resolution. Both the average mutation rate and κ vary by a factor of ~3 between nucleotide sites. The characteristic scale of variation in κ is up to at least ~30 nucleotides in Drosophila and ~5 nucleotides in Hominidae. Genome segments with locally elevated mutation rates possess lower values of κ; however, a substantial fraction of variation in κ cannot be directly explained by the local mutation rates.
Affiliation(s)
- Vladimir B Seplyarskiy
- Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia.
|
20
|
Abstract
It has been known for many years that the mutation rate varies across the genome. However, only with the advent of large genomic data sets is the full extent of this variation becoming apparent. The mutation rate varies over many different scales, from adjacent sites to whole chromosomes, with the strongest variation seen at the smallest scales. Some of these patterns have clear mechanistic bases, but much of the rate variation remains unexplained, and some of it is deeply perplexing. Variation in the mutation rate has important implications in evolutionary biology and underexplored implications for our understanding of hereditary disease and cancer.
|