1
Seplyarskiy V, Koch EM, Lee DJ, Lichtman JS, Luan HH, Sunyaev SR. A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat Genet 2023; 55:2235-2242. [PMID: 38036792 PMCID: PMC11348951 DOI: 10.1038/s41588-023-01562-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/20/2022] [Accepted: 10/06/2023] [Indexed: 12/02/2023]
Abstract
De novo mutations occur at substantially different rates depending on genomic location, sequence context and DNA strand. The success of methods to estimate selection intensity, infer demographic history and map rare disease genes depends strongly on assumptions about the local mutation rate. Here we present Roulette, a genome-wide mutation rate model at basepair resolution that incorporates known determinants of local mutation rate. Roulette is shown to be more accurate than existing models. We use Roulette to refine the estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a tenfold increase in mutation rate in nearly all genes transcribed by polymerase III (Pol III), suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively used in testis and residing in promoters.
Affiliation(s)
- Vladimir Seplyarskiy
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Evan M Koch
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Daniel J Lee
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
- Joshua S Lichtman
  - NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
  - Soleil Labs, South San Francisco, CA, USA
- Harding H Luan
  - NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
  - Soleil Labs, South San Francisco, CA, USA
- Shamil R Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
2
Hart M, Conrad J, Barrett E, Legg K, Ivey G, Lee PHU, Yung YC, Shim JW. X-linked hydrocephalus genes: Their proximity to telomeres and high A + T content compared to Parkinson's disease. Exp Neurol 2023; 366:114433. [PMID: 37156332 PMCID: PMC10330542 DOI: 10.1016/j.expneurol.2023.114433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 04/15/2023] [Accepted: 05/05/2023] [Indexed: 05/10/2023]
Abstract
Proximity to telomeres (i) and high adenine and thymine (A + T) content (ii) are two factors associated with high mutation rates in human chromosomes. We have previously shown that >100 human genes that cause congenital hydrocephalus (CH) when mutated meet either factor (i) or (ii) at a 91% match rate, whereas the two factors are poorly satisfied, at 59%, in human genes associated with familial Parkinson's disease (fPD). Using the sets of mouse, rat, and human chromosomes, we found that 7 genes associated with CH were located on the X chromosome of mice, rats, and humans. However, genes associated with fPD were on different autosomes depending on species. While the contribution of proximity to telomeres in autosomes was comparable in CH and fPD, high A + T content made a greater contribution in X-linked CH (43% in all three species) than in fPD (6% in rodents or 13% in humans). The low A + T content found in fPD cases suggests that PARK family genes have roughly 3 times higher chances of methylation at CpG sites or of epigenetic changes than X-linked genes.
Affiliation(s)
- Madeline Hart
  - Department of Biomedical Engineering, Marshall University, Huntington, WV, United States
- Joshua Conrad
  - Department of Biomedical Engineering, Marshall University, Huntington, WV, United States
- Emma Barrett
  - Department of Biomedical Engineering, Marshall University, Huntington, WV, United States
- Kaitlyn Legg
  - Department of Biomedical Engineering, Marshall University, Huntington, WV, United States
- Gabrielle Ivey
  - Department of Biomedical Engineering, Marshall University, Huntington, WV, United States
- Peter H U Lee
  - Department of Cardiothoracic Surgery, Southcoast Health, Fall River, MA, United States
  - Department of Pathology and Laboratory Medicine, Brown University, Providence, RI, United States
- Yun C Yung
  - Department of Neuroscience, The Scintillon Research Institute, San Diego, CA, United States
- Joon W Shim
  - Department of Biomedical Engineering, Marshall University, Huntington, WV, United States
3
Hara Y, Kuraku S. The impact of local genomic properties on the evolutionary fate of genes. eLife 2023; 12:82290. [PMID: 37223962 DOI: 10.7554/elife.82290] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Accepted: 04/25/2023] [Indexed: 05/25/2023] Open
Abstract
Functionally indispensable genes are likely to be retained and otherwise to be lost during evolution. This evolutionary fate of a gene can also be affected by factors independent of gene dispensability, including the mutability of genomic positions, but such features have not been examined well. To uncover the genomic features associated with gene loss, we investigated the characteristics of genomic regions where genes have been independently lost in multiple lineages. With a comprehensive scan of gene phylogenies of vertebrates with a careful inspection of evolutionary gene losses, we identified 813 human genes whose orthologs were lost in multiple mammalian lineages: designated 'elusive genes.' These elusive genes were located in genomic regions with rapid nucleotide substitution, high GC content, and high gene density. A comparison of the orthologous regions of such elusive genes across vertebrates revealed that these features had been established before the radiation of the extant vertebrates approximately 500 million years ago. The association of human elusive genes with transcriptomic and epigenomic characteristics illuminated that the genomic regions containing such genes were subject to repressive transcriptional regulation. Thus, the heterogeneous genomic features driving gene fates toward loss have been in place and may sometimes have relaxed the functional indispensability of such genes. This study sheds light on the complex interplay between gene function and local genomic properties in shaping gene evolution that has persisted since the vertebrate ancestor.
Affiliation(s)
- Yuichiro Hara
  - Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Shigehiro Kuraku
  - Molecular Life History Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan
  - Department of Genetics, Sokendai (Graduate University for Advanced Studies), Mishima, Japan
  - RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
4
Vihinen M. Individual Genetic Heterogeneity. Genes (Basel) 2022; 13:1626. [PMID: 36140794 PMCID: PMC9498725 DOI: 10.3390/genes13091626] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/12/2022] [Revised: 08/25/2022] [Accepted: 09/08/2022] [Indexed: 11/28/2022] Open
Abstract
Genetic variation has been widely covered in the literature, but not from the perspective of an individual in any species. Here, a synthesis of genetic concepts and variations relevant for individual genetic constitution is provided. All the different levels of genetic information and variation are covered, ranging from whether an organism is unmixed or hybrid, has variations in genome, chromosomes, and more locally in DNA regions, to epigenetic variants or alterations in selfish genetic elements. Genetic constitution and heterogeneity of microbiota are highly relevant for health and wellbeing of an individual. Mutation rates vary widely for variation types, e.g., due to the sequence context. Genetic information guides numerous aspects in organisms. Types of inheritance, whether Mendelian or non-Mendelian, zygosity, sexual reproduction, and sex determination are covered. Functions of DNA and functional effects of variations are introduced, along with mechanisms that reduce and modulate functional effects, including TARAR countermeasures and intraindividual genetic conflict. TARAR countermeasures for tolerance, avoidance, repair, attenuation, and resistance are essential for life, integrity of genetic information, and gene expression. The genetic composition, effects of variations, and their expression are considered also in diseases and personalized medicine. The text synthesizes knowledge and insight on individual genetic heterogeneity and organizes and systematizes the central concepts.
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
5
Lucena-Perez M, Kleinman-Ruiz D, Marmesat E, Saveljev AP, Schmidt K, Godoy JA. Bottleneck-associated changes in the genomic landscape of genetic diversity in wild lynx populations. Evol Appl 2021; 14:2664-2679. [PMID: 34815746 PMCID: PMC8591332 DOI: 10.1111/eva.13302] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/14/2021] [Revised: 08/17/2021] [Accepted: 09/08/2021] [Indexed: 01/06/2023] Open
Abstract
Demographic bottlenecks generally reduce genetic diversity through more intense genetic drift, but their net effect may vary along the genome due to the random nature of genetic drift and to local effects of recombination, mutation, and selection. Here, we analyzed the changes in genetic diversity following a bottleneck by comparing whole-genome diversity patterns in populations with and without severe recent documented declines of Iberian (Lynx pardinus, n = 31) and Eurasian lynx (Lynx lynx, n = 29). As expected, overall genomic diversity correlated negatively with bottleneck intensity and/or duration. Correlations of genetic diversity with divergence, chromosome size, gene or functional site content, GC content, or recombination were observed in nonbottlenecked populations, but were weaker in bottlenecked populations. Also, functional features under intense purifying selection and the X chromosome showed an increase in the observed density of variants, even resulting in higher θ W diversity than in nonbottlenecked populations. Increased diversity seems to be related to both a higher mutational input in those regions creating a large collection of low-frequency variants, a few of which increase in frequency during the bottleneck to the point they become detectable with our limited sample, and the reduced efficacy of purifying selection, which affects not only protein structure and function but also the regulation of gene expression. The results of this study alert to the possible reduction of fitness and adaptive potential associated with the genomic erosion in regulatory elements. Further, the detection of a gain of diversity in ultra-conserved elements can be used as a sensitive and easy-to-apply signature of genetic erosion in wild populations.
Affiliation(s)
- Maria Lucena-Perez
  - Departamento de Ecología Integrativa, Estación Biológica de Doñana (CSIC), Sevilla, Spain
- Daniel Kleinman-Ruiz
  - Departamento de Ecología Integrativa, Estación Biológica de Doñana (CSIC), Sevilla, Spain
  - Departamento de Genética, Facultad de Biología, Universidad Complutense, Madrid, Spain
- Elena Marmesat
  - Departamento de Ecología Integrativa, Estación Biológica de Doñana (CSIC), Sevilla, Spain
- Alexander P Saveljev
  - Department of Animal Ecology, Russian Research Institute of Game Management and Fur Farming, Kirov, Russia
- Krzysztof Schmidt
  - Mammal Research Institute, Polish Academy of Sciences, Białowieża, Poland
- José A Godoy
  - Departamento de Ecología Integrativa, Estación Biológica de Doñana (CSIC), Sevilla, Spain
6
Seplyarskiy VB, Sunyaev S. The origin of human mutation in light of genomic data. Nat Rev Genet 2021; 22:672-686. [PMID: 34163020 DOI: 10.1038/s41576-021-00376-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Accepted: 05/06/2021] [Indexed: 02/05/2023]
Abstract
Despite years of active research into the role of DNA repair and replication in mutagenesis, surprisingly little is known about the origin of spontaneous human mutation in the germ line. With the advent of high-throughput sequencing, genome-scale data have revealed statistical properties of mutagenesis in humans. These properties include variation of the mutation rate and spectrum along the genome at different scales in relation to epigenomic features and dependency on parental age. Moreover, mutations originated in mothers are less frequent than mutations originated in fathers and have a distinct genomic distribution. Statistical analyses that interpret these patterns in the context of known biochemistry can provide mechanistic models of mutagenesis in humans.
Affiliation(s)
- Vladimir B Seplyarskiy
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Shamil Sunyaev
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
7
Fitzgerald DM, Rosenberg SM. Biology before the SOS Response-DNA Damage Mechanisms at Chromosome Fragile Sites. Cells 2021; 10:2275. [PMID: 34571923 PMCID: PMC8465572 DOI: 10.3390/cells10092275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/30/2021] [Revised: 08/13/2021] [Accepted: 08/13/2021] [Indexed: 01/03/2023] Open
Abstract
The Escherichia coli SOS response to DNA damage, discovered and conceptualized by Evelyn Witkin and Miroslav Radman, is the prototypic DNA-damage stress response that upregulates proteins of DNA protection and repair, a radical idea when formulated in the late 1960s and early 1970s. SOS-like responses are now described across the tree of life, and similar mechanisms of DNA-damage tolerance and repair underlie the genome instability that drives human cancer and aging. The DNA damage that precedes damage responses constitutes upstream threats to genome integrity and arises mostly from endogenous biology. Radman's vision and work on SOS, mismatch repair, and their regulation of genome and species evolution, were extrapolated directly from bacteria to humans, at a conceptual level, by Radman, then many others. We follow his lead in exploring bacterial molecular genomic mechanisms to illuminate universal biology, including in human disease, and focus here on some events upstream of SOS: the origins of DNA damage, specifically at chromosome fragile sites, and the engineered proteins that allow us to identify mechanisms. Two fragility mechanisms dominate: one at replication barriers and another associated with the decatenation of sister chromosomes following replication. DNA structures in E. coli, additionally, suggest new interpretations of pathways in cancer evolution, and that Holliday junctions may be universal molecular markers of chromosome fragility.
Affiliation(s)
- Devon M. Fitzgerald
  - Departments of Molecular and Human Genetics, Biochemistry and Molecular Biology, Molecular Virology and Microbiology, and Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
- Susan M. Rosenberg
  - Departments of Molecular and Human Genetics, Biochemistry and Molecular Biology, Molecular Virology and Microbiology, and Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
8
Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, Makova KD. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res 2021; 49:1497-1516. [PMID: 33450015 PMCID: PMC7897504 DOI: 10.1093/nar/gkaa1269] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 06/05/2020] [Revised: 12/14/2020] [Accepted: 01/11/2021] [Indexed: 12/12/2022] Open
Abstract
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications to evolution and genetic diseases.
Affiliation(s)
- Wilfried M Guiblet
  - Bioinformatics and Genomics Graduate Program, Penn State University, University Park, PA 16802, USA
- Marzia A Cremona
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
  - Department of Operations and Decision Systems, Université Laval, Canada
  - CHU de Québec – Université Laval Research Center, Canada
- Robert S Harris
  - Department of Biology, Penn State University, University Park, PA 16802, USA
- Di Chen
  - Intercollege Graduate Degree Program in Genetics, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
- Kristin A Eckert
  - Department of Pathology, Penn State University, College of Medicine, Hershey, PA 17033, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Francesca Chiaromonte
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
  - EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Yi-Fei Huang
  - Department of Biology, Penn State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- Kateryna D Makova
  - Department of Biology, Penn State University, University Park, PA 16802, USA
  - Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
9
Berrio A, Haygood R, Wray GA. Identifying branch-specific positive selection throughout the regulatory genome using an appropriate proxy neutral. BMC Genomics 2020; 21:359. [PMID: 32404186 PMCID: PMC7222330 DOI: 10.1186/s12864-020-6752-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/22/2019] [Accepted: 04/21/2020] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Adaptive changes in cis-regulatory elements are an essential component of evolution by natural selection. Identifying adaptive and functional noncoding DNA elements throughout the genome is therefore crucial for understanding the relationship between phenotype and genotype.
RESULTS: We used ENCODE annotations to identify appropriate proxy neutral sequences and demonstrate that the conservativeness of the test can be modulated during the filtration of reference alignments. We applied the method to noncoding Human Accelerated Elements as well as open chromatin elements previously identified in 125 human tissues and cell lines to demonstrate its utility. Then, we evaluated the impact of query region length, proxy neutral sequence length, and branch count on test sensitivity and specificity. We found that the length of the query alignment can vary between 150 bp and 1 kb without affecting the estimation of selection, while for the reference alignment, we found that a length of 3 kb is adequate for proper testing. We also simulated sequence alignments under different classes of evolution and validated our ability to distinguish positive selection from relaxation of constraint and neutral evolution. Finally, we re-confirmed that a quarter of all non-coding Human Accelerated Elements are evolving by positive selection.
CONCLUSION: Here, we introduce a method we called adaptiPhy, which adds significant improvements to our earlier method that tests for branch-specific directional selection in noncoding sequences. The motivation for these improvements is to provide a more sensitive and better targeted characterization of directional selection and neutral evolution across the genome.
Affiliation(s)
- Alejandro Berrio
  - Department of Biology, Duke University, Biological Sciences Building, 124 Science Drive, Durham, NC, 27708, USA
- Ralph Haygood
  - Ronin Institute for Independent Scholarship, 127 Haddon Pl., Montclair, NJ, 07043, USA
- Gregory A Wray
  - Department of Biology, Duke University, Biological Sciences Building, 124 Science Drive, Durham, NC, 27708, USA
10
Li C, Luscombe NM. Nucleosome positioning stability is a modulator of germline mutation rate variation across the human genome. Nat Commun 2020; 11:1363. [PMID: 32170069 PMCID: PMC7070026 DOI: 10.1038/s41467-020-15185-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/18/2019] [Accepted: 02/23/2020] [Indexed: 02/08/2023] Open
Abstract
Nucleosome organization has been suggested to affect local mutation rates in the genome. However, the lack of de novo mutation and high-resolution nucleosome data has limited the investigation of this hypothesis. Additionally, analyses using indirect mutation rate measurements have yielded contradictory and potentially confounding results. Here, we combine data on >300,000 human de novo mutations with high-resolution nucleosome maps and find substantially elevated mutation rates around translationally stable (‘strong’) nucleosomes. We show that the mutational mechanisms affected by strong nucleosomes are low-fidelity replication, insufficient mismatch repair and increased double-strand breaks. Strong nucleosomes preferentially locate within young SINE/LINE transposons, suggesting that when subject to increased mutation rates, transposons are then more rapidly inactivated. Depletion of strong nucleosomes in older transposons suggests frequent positioning changes during evolution. The findings have important implications for human genetics and genome evolution.
Nucleosome organization has been suggested to affect local mutation rates in the genome. Here, the authors analyse data on >300,000 human de novo mutations and high-resolution nucleosome maps and provide evidence that nucleosome positioning stability modulates germline mutation rate variation across the human genome.
Affiliation(s)
- Cai Li
  - The Francis Crick Institute, London, NW1 1AT, UK
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Nicholas M Luscombe
  - The Francis Crick Institute, London, NW1 1AT, UK
  - Okinawa Institute of Science & Technology Graduate University, Okinawa, 904-0495, Japan
  - UCL Genetics Institute, University College London, London, WC1E 6BT, UK
11
Castellano D, Eyre-Walker A, Munch K. Impact of Mutation Rate and Selection at Linked Sites on DNA Variation across the Genomes of Humans and Other Homininae. Genome Biol Evol 2020; 12:3550-3561. [PMID: 31596481 PMCID: PMC6944223 DOI: 10.1093/gbe/evz215] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Accepted: 10/03/2019] [Indexed: 12/23/2022] Open
Abstract
DNA diversity varies across the genome of many species. Variation in diversity across a genome might arise from regional variation in the mutation rate, variation in the intensity and mode of natural selection, and regional variation in the recombination rate. We show that both noncoding and nonsynonymous diversity are positively correlated to a measure of the mutation rate and the recombination rate and negatively correlated to the density of conserved sequences in 50 kb windows across the genomes of humans and nonhuman homininae. Interestingly, we find that although noncoding diversity is equally affected by these three genomic variables, nonsynonymous diversity is mostly dominated by the density of conserved sequences. The positive correlation between diversity and our measure of the mutation rate seems to be largely a direct consequence of regions with higher mutation rates having more diversity. However, the positive correlation with recombination rate and the negative correlation with the density of conserved sequences suggest that selection at linked sites also affects levels of diversity. This is supported by the observation that the ratio of the number of nonsynonymous to noncoding polymorphisms is negatively correlated to a measure of the effective population size across the genome. We show these patterns persist even when we restrict our analysis to GC-conservative mutations, demonstrating that the patterns are not driven by GC-biased gene conversion. In conclusion, our comparative analyses describe how recombination rate, gene density, and mutation rate interact to produce the patterns of DNA diversity that we observe along the hominine genomes.
Affiliation(s)
- David Castellano
  - Bioinformatics Research Centre, Aarhus University, Denmark
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona, Spain
- Adam Eyre-Walker
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Kasper Munch
  - Bioinformatics Research Centre, Aarhus University, Denmark
12
Fitzgerald DM, Rosenberg SM. What is mutation? A chapter in the series: How microbes "jeopardize" the modern synthesis. PLoS Genet 2019; 15:e1007995. [PMID: 30933985 PMCID: PMC6443146 DOI: 10.1371/journal.pgen.1007995] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/28/2022] Open
Abstract
Mutations drive evolution and were assumed to occur by chance: constantly, gradually, roughly uniformly in genomes, and without regard to environmental inputs, but this view is being revised by discoveries of molecular mechanisms of mutation in bacteria, now translated across the tree of life. These mechanisms reveal a picture of highly regulated mutagenesis, up-regulated temporally by stress responses and activated when cells/organisms are maladapted to their environments (when stressed), potentially accelerating adaptation. Mutation is also nonrandom in genomic space, with multiple simultaneous mutations falling in local clusters, which may allow concerted evolution: the multiple changes needed to adapt protein functions and protein machines encoded by linked genes. Molecular mechanisms of stress-inducible mutation change ideas about evolution and suggest different ways to model and address cancer development, infectious disease, and evolution generally.
Affiliation(s)
- Devon M. Fitzgerald
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, United States of America
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America
  - The Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas, United States of America
- Susan M. Rosenberg
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, United States of America
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America
  - The Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas, United States of America
13
Smith TCA, Arndt PF, Eyre-Walker A. Large scale variation in the rate of germ-line de novo mutation, base composition, divergence and diversity in humans. PLoS Genet 2018; 14:e1007254. [PMID: 29590096 PMCID: PMC5891062 DOI: 10.1371/journal.pgen.1007254] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 03/07/2017] [Revised: 04/09/2018] [Accepted: 02/13/2018] [Indexed: 01/17/2023] Open
Abstract
It has long been suspected that the rate of mutation varies across the human genome at a large scale based on the divergence between humans and other species. However, it is now possible to directly investigate this question using the large number of de novo mutations (DNMs) that have been discovered in humans through the sequencing of trios. We investigate a number of questions pertaining to the distribution of mutations using more than 130,000 DNMs from three large datasets. We demonstrate that the amount and pattern of variation differ between datasets at the 1MB and 100KB scales, probably as a consequence of differences in sequencing technology and processing. In particular, datasets show different patterns of correlation to genomic variables such as replication time. Nevertheless, there are many commonalities between datasets, which likely represent true patterns. We show that there is variation in the mutation rate at the 100KB, 1MB and 10MB scales that cannot be explained by variation at smaller scales; however, the level of this variation is modest at large scales: at the 1MB scale we infer that ~90% of regions have a mutation rate within 50% of the mean. Different types of mutation show similar levels of variation and appear to vary in concert, which suggests the pattern of mutation is relatively constant across the genome. We demonstrate that variation in the mutation rate does not generate large-scale variation in GC-content, and hence that mutation bias does not maintain the isochore structure of the human genome. We find that genomic features explain less than 40% of the explainable variance in the rate of DNM. As expected, the rate of divergence between species is correlated to the rate of DNM. However, the correlations are weaker than expected if all the variation in divergence were due to variation in the mutation rate. We provide evidence that this is due to the effect of biased gene conversion on the probability that a mutation will become fixed. In contrast to divergence, we find that most of the variation in diversity can be explained by variation in the mutation rate. Finally, we show that the correlation between divergence and DNM density declines as increasingly divergent species are considered.
Affiliation(s)
| | - Peter F. Arndt
- Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| |
|
14
|
Seplyarskiy VB, Andrianova MA, Bazykin GA. APOBEC3A/B-induced mutagenesis is responsible for 20% of heritable mutations in the TpCpW context. Genome Res 2016; 27:175-184. [PMID: 27940951 PMCID: PMC5287224 DOI: 10.1101/gr.210336.116] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 05/24/2016] [Accepted: 12/01/2016] [Indexed: 12/18/2022]
Abstract
APOBEC3A/B cytidine deaminase is responsible for the majority of cancerous mutations in a large fraction of cancer samples. However, its role in heritable mutagenesis remains very poorly understood. Recent studies have demonstrated that both in yeast and in human cancerous cells, most APOBEC3A/B-induced mutations occur on the lagging strand during replication and on the nontemplate strand of transcribed regions. Here, we use data on rare human polymorphisms, interspecies divergence, and de novo mutations to study germline mutagenesis and to analyze mutations at nucleotide contexts prone to attack by APOBEC3A/B. We show that such mutations occur preferentially on the lagging strand and on nontemplate strands of transcribed regions. Moreover, we demonstrate that APOBEC3A/B-like mutations tend to produce strand-coordinated clusters, which are also biased toward the lagging strand. Finally, we show that the mutation rate is increased 3' of C→G mutations to a greater extent than 3' of C→T mutations, suggesting pervasive trans-lesion bypass of the APOBEC3A/B-induced damage. Our study demonstrates that 20% of C→T and C→G mutations in the TpCpW context (where W denotes A or T) segregating as polymorphisms in the human population, or 1.4% of all heritable mutations, are attributable to APOBEC3A/B activity.
Affiliation(s)
- Vladimir B Seplyarskiy
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia
| | - Maria A Andrianova
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia; Lomonosov Moscow State University, Moscow 119234, Russia
| | - Georgii A Bazykin
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia; Lomonosov Moscow State University, Moscow 119234, Russia; Skolkovo Institute of Science and Technology, Skolkovo 143026, Russia
| |
|