1
Oman M, Ness RW. Comparing the predictors of mutability among healthy human tissues inferred from mutations in single-cell genome data. Genetics 2025; 229:iyae215. [PMID: 39950507 DOI: 10.1093/genetics/iyae215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 12/03/2024] [Indexed: 03/19/2025] Open
Abstract
Studying mutation in healthy somatic tissues is the key for understanding the genesis of cancer and other genetic diseases. Mutation rate varies from site to site in the human genome by up to 100-fold and is influenced by numerous epigenetic and genetic factors including GC content, trinucleotide sequence context, and DNAse accessibility. These factors influence mutation at both local and regional scales and are often interrelated with one another, meaning that predicting mutability or uncovering its drivers requires modelling multiple factors and scales simultaneously. Historically, most investigations have focused either on analyzing the local sequence scale through triplet signatures or on examining the impact of epigenetic processes at larger scales, but not both concurrently. Additionally, sequencing technology limitations have restricted analyses of healthy mutations to coding regions (RNA-seq) or to those that have been influenced by selection (e.g. bulk samples from cancer tissue). Here, we leverage single-cell mutations and present a comprehensive analysis of epigenetic and genetic factors at multiple scales in the germline and 3 healthy somatic tissues. We create models that predict mutability with on average 2% error and find up to 63-fold variation among sites within the same tissue. We observe varying degrees of similarity between tissues: the mutability of genomic positions was 93.4% similar between liver and germline tissues, but sites in germline and skin were only 85.9% similar. We observe both universal and tissue-specific mutagenic processes in healthy tissues, with implications for understanding the maintenance of germline vs soma and the mechanisms underlying early tumorigenesis.
Affiliation(s)
- Madeleine Oman
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 1A1, Canada
- Department of Biology, University of Toronto Mississauga, Mississauga, L5L1C6, Canada
- Rob W Ness
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 1A1, Canada
- Department of Biology, University of Toronto Mississauga, Mississauga, L5L1C6, Canada
- Department of Cell and Systems Biology, University of Toronto, Toronto, M5S 1A1, Canada
2
Roberts MD, Davis O, Josephs EB, Williamson RJ. K-mer-based Approaches to Bridging Pangenomics and Population Genetics. Mol Biol Evol 2025; 42:msaf047. [PMID: 40111256 PMCID: PMC11925024 DOI: 10.1093/molbev/msaf047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 01/10/2025] [Accepted: 02/04/2025] [Indexed: 03/12/2025] Open
Abstract
Many commonly studied species now have more than one chromosome-scale genome assembly, revealing a large amount of genetic diversity previously missed by approaches that map short reads to a single reference. However, many species still lack multiple reference genomes and correctly aligning references to build pangenomes can be challenging for many species, limiting our ability to study this missing genomic variation in population genetics. Here, we argue that k-mers are a very useful but underutilized tool for bridging the reference-focused paradigms of population genetics with the reference-free paradigms of pangenomics. We review current literature on the uses of k-mers for performing three core components of most population genetics analyses: identifying, measuring, and explaining patterns of genetic variation. We also demonstrate how different k-mer-based measures of genetic variation behave in population genetic simulations according to the choice of k, depth of sequencing coverage, and degree of data compression. Overall, we find that k-mer-based measures of genetic diversity scale consistently with pairwise nucleotide diversity (π) up to values of about π=0.025 (R²=0.97) for neutrally evolving populations. For populations with even more variation, using shorter k-mers will maintain the scalability up to at least π=0.1. Furthermore, in our simulated populations, k-mer dissimilarity values can be reliably approximated from counting bloom filters, highlighting a potential avenue to decreasing the memory burden of k-mer-based genomic dissimilarity analyses. For future studies, there is a great opportunity to further develop methods for identifying selected loci using k-mers.
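To make the idea of a k-mer-based dissimilarity concrete, the following is a minimal sketch, not the measure used by the authors. It assumes that dissimilarity is taken as one minus the Jaccard index of the two sequences' k-mer sets; the function names `kmer_set` and `jaccard_dissimilarity` are hypothetical and chosen only for illustration.

```python
def kmer_set(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_dissimilarity(seq_a, seq_b, k):
    """One minus the Jaccard index of the two k-mer sets.

    Assumption: presence/absence of k-mers only; k-mer counts and
    reverse complements are ignored in this sketch.
    """
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k, so there is nothing to compare.
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Two toy sequences that differ at a single position.
    print(jaccard_dissimilarity("ACGTACGT", "ACGTTCGT", k=3))
```

A single substitution changes up to k overlapping k-mers, which is one intuition for why the choice of k matters in highly diverse populations.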
Affiliation(s)
- Miles D Roberts
- Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA
- Olivia Davis
- Department of Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
- Emily B Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA
- Robert J Williamson
- Department of Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
- Department of Biology and Biomedical Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
3
Zhang Y, Leung AK, Kang JJ, Sun Y, Wu G, Li L, Sun J, Cheng L, Qiu T, Zhang J, Wierbowski SD, Gupta S, Booth JG, Yu H. A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology. Nat Commun 2025; 16:975. [PMID: 39856048 PMCID: PMC11760531 DOI: 10.1038/s41467-024-54176-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 11/04/2024] [Indexed: 01/27/2025] Open
Abstract
A major goal of cancer biology is to understand the mechanisms driven by somatically acquired mutations. Two distinct methodologies, one analyzing mutation clustering within protein sequences and 3D structures and the other leveraging protein-protein interaction network topology, offer complementary strengths. We present NetFlow3D, a unified, end-to-end 3D structurally-informed protein interaction network propagation framework that maps the multiscale mechanistic effects of mutations. Built upon the Human Protein Structurome, which incorporates the 3D structures of every protein and the binding interfaces of all known protein interactions, NetFlow3D integrates atomic, residue, protein and network-level information: It clusters mutations on 3D protein structures to identify driver mutations and propagates their impacts anisotropically across the protein interaction network, guided by the involved interaction interfaces, to reveal systems-level impacts. Applied to 33 cancer types, NetFlow3D identifies 2 times more 3D clusters and incorporates 8 times more proteins in significantly interconnected network modules compared to traditional methods.
Affiliation(s)
- Yingying Zhang
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, 14853, NY, USA
- Alden K Leung
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Jin Joo Kang
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Yu Sun
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Guanxi Wu
- College of Agriculture and Life Sciences, Cornell University, Ithaca, 14853, NY, USA
- Le Li
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Jiayang Sun
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Lily Cheng
- Department of Science and Technology Studies, Cornell University, Ithaca, 14853, NY, USA
- Tian Qiu
- School of Electrical and Computer Engineering, Cornell University, Ithaca, 14853, NY, USA
- Junke Zhang
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Shayne D Wierbowski
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Shagun Gupta
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- James G Booth
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Department of Statistics and Data Science, Cornell University, Ithaca, 14853, NY, USA
- Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA.
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA.
4
Hussain S. A modeling of complex trait phenotypic variance determinants. PNAS Nexus 2024; 3:pgae472. [PMID: 39529912 PMCID: PMC11552524 DOI: 10.1093/pnasnexus/pgae472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 10/09/2024] [Indexed: 11/16/2024]
Abstract
Studies have now shown that the heritability of some complex traits, such as human height, can be virtually fully captured via potential use of sufficiently powered approaches that can characterize the associated collective common- and rare-variant additive genetic architecture. However, for other traits, including complex disease traits, full recovery of such narrow sense heritability would still likely fall far short of respective heritability estimates yielded from pedigree-based analyses such as twin studies. Here, it is proposed that such traits could also involve additional types of relevant architecture and underlying genetic mechanism, such that interaction of somatic variants with heritable variants may represent an underappreciated component. The theoretical model suggested predicts that some relevant heritability estimates are systematically inflated by twin studies, and that instead a significant proportion of the phenotypic variances may be explained by specialized types of heritable genotype-by-environment interaction.
Affiliation(s)
- Shobbir Hussain
- Department of Life Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
5
Schraiber JG, Spence JP, Edge MD. Estimation of demography and mutation rates from one million haploid genomes. bioRxiv 2024:2024.09.18.613708. [PMID: 39345369 PMCID: PMC11429810 DOI: 10.1101/2024.09.18.613708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
As genetic sequencing costs have plummeted, datasets with sizes previously unthinkable have begun to appear. Such datasets present new opportunities to learn about evolutionary history, particularly via rare alleles that record the very recent past. However, beyond the computational challenges inherent in the analysis of many large-scale datasets, large population-genetic datasets present theoretical problems. In particular, the majority of population-genetic tools require the assumption that each mutant allele in the sample is the result of a single mutation (the "infinite sites" assumption), which is violated in large samples. Here, we present DR EVIL, a method for estimating mutation rates and recent demographic history from very large samples. DR EVIL avoids the infinite-sites assumption by using a diffusion approximation to a branching-process model with recurrent mutation. The branching-process approach limits the method to rare alleles, but, along with recent results, renders tractable likelihoods with recurrent mutation. We show that DR EVIL performs well in simulations and apply it to rare-variant data from a million haploid samples, identifying a signal of mutation-rate heterogeneity within commonly analyzed classes and predicting that in modern sample sizes, most rare variants at sites with high mutation rates represent the descendants of multiple mutation events.
6
Zhang Y, Leung AK, Kang JJ, Sun Y, Wu G, Li L, Sun J, Cheng L, Qiu T, Zhang J, Wierbowski S, Gupta S, Booth J, Yu H. A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology. bioRxiv 2024:2023.03.06.531441. [PMID: 36945530 PMCID: PMC10028849 DOI: 10.1101/2023.03.06.531441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Abstract
A major goal of cancer biology is to understand the mechanisms underlying tumorigenesis driven by somatically acquired mutations. Two distinct types of computational methodologies have emerged: one focuses on analyzing clustering of mutations within protein sequences and 3D structures, while the other characterizes mutations by leveraging the topology of protein-protein interaction network. Their insights are largely non-overlapping, offering complementary strengths. Here, we established a unified, end-to-end 3D structurally-informed protein interaction network propagation framework, NetFlow3D, that systematically maps the multiscale mechanistic effects of somatic mutations in cancer. The establishment of NetFlow3D hinges upon the Human Protein Structurome, a comprehensive repository we compiled that incorporates the 3D structures of every single protein as well as the binding interfaces of all known protein interactions in humans. NetFlow3D leverages the Structurome to integrate information across atomic, residue, protein and network levels: It conducts 3D clustering of mutations across atomic and residue levels on protein structures to identify potential driver mutations. It then anisotropically propagates their impacts across the protein interaction network, with propagation guided by the specific 3D structural interfaces involved, to identify significantly interconnected network "modules", thereby uncovering key biological processes underlying disease etiology. Applied to 1,038,899 somatic protein-altering mutations in 9,946 TCGA tumors across 33 cancer types, NetFlow3D identified 1,4444 significant 3D clusters throughout the Human Protein Structurome, of which ~55% would not have been found if using only experimentally-determined structures. It then identified 26 significantly interconnected modules that encompass ~8-fold more proteins than applying standard network analyses. NetFlow3D and our pan-cancer results can be accessed from http://netflow3d.yulab.org.
Affiliation(s)
- Yingying Zhang
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Department of Molecular Biology and Genetics, Cornell University; Ithaca, 14853, USA
- Alden K. Leung
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Jin Joo Kang
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Yu Sun
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Guanxi Wu
- College of Agriculture and Life Sciences, Cornell University; Ithaca, 14853, USA
- Le Li
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Jiayang Sun
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Lily Cheng
- Department of Science and Technology Studies, Cornell University; Ithaca, 14853, USA
- Tian Qiu
- School of Electrical and Computer Engineering, Cornell University; Ithaca, 14853, USA
- Junke Zhang
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Shayne Wierbowski
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Shagun Gupta
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- James Booth
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Department of Statistics and Data Science, Cornell University; Ithaca, 14853, USA
- Haiyuan Yu
- Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
7
Bradley D, Garand C, Belda H, Gagnon-Arsenault I, Treeck M, Elowe S, Landry CR. The substrate quality of CK2 target sites has a determinant role on their function and evolution. Cell Syst 2024; 15:544-562.e8. [PMID: 38861992 DOI: 10.1016/j.cels.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 02/29/2024] [Accepted: 05/20/2024] [Indexed: 06/13/2024]
Abstract
Most biological processes are regulated by signaling modules that bind to short linear motifs. For protein kinases, substrates may have full or only partial matches to the kinase recognition motif, a property known as "substrate quality." However, it is not clear whether differences in substrate quality represent neutral variation or if they have functional consequences. We examine this question for the kinase CK2, which has many fundamental functions. We show that optimal CK2 sites are phosphorylated at maximal stoichiometries and found in many conditions, whereas minimal substrates are more weakly phosphorylated and have regulatory functions. Optimal CK2 sites tend to be more conserved, and substrate quality is often tuned by selection. For intermediate sites, increases or decreases in substrate quality may be deleterious, as we demonstrate for a CK2 substrate at the kinetochore. The results together suggest a strong role for substrate quality in phosphosite function and evolution. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- David Bradley
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Centre de Recherche sur les Données Massives (CRDM), Université Laval, Québec City, QC G1V 0A6, Canada; Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada.
- Chantal Garand
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Axe de Reproduction, Santé de la mère et de l'enfant, CHU de Québec, Université Laval, Québec City, QC, Canada
- Hugo Belda
- Signalling in Host-Pathogen Interaction Laboratory, The Francis Crick Institute, London NW11AT, UK
- Isabelle Gagnon-Arsenault
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Centre de Recherche sur les Données Massives (CRDM), Université Laval, Québec City, QC G1V 0A6, Canada; Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada
- Moritz Treeck
- Signalling in Host-Pathogen Interaction Laboratory, The Francis Crick Institute, London NW11AT, UK; Cell Biology of Host-Pathogen Interaction Laboratory, The Gulbenkian Institute of Science, Oeiras 2780-156, Portugal
- Sabine Elowe
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Axe de Reproduction, Santé de la mère et de l'enfant, CHU de Québec, Université Laval, Québec City, QC, Canada; Department of Pediatrics, Faculty of Medicine, Université Laval, Québec City, QC, Canada; Centre de Recherche sur le Cancer, CHU de Québec, Université Laval, Québec City, QC, Canada
- Christian R Landry
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec City, QC G1V 0A6, Canada; Centre de Recherche sur les Données Massives (CRDM), Université Laval, Québec City, QC G1V 0A6, Canada; Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec City, QC G1V 0A6, Canada.
8
Soni V, Jensen JD. Temporal challenges in detecting balancing selection from population genomic data. G3 (Bethesda) 2024; 14:jkae069. [PMID: 38551137 DOI: 10.1093/g3journal/jkae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 12/21/2023] [Accepted: 03/19/2024] [Indexed: 04/28/2024]
Abstract
The role of balancing selection in maintaining genetic variation remains an open question in population genetics. Recent years have seen numerous studies identifying candidate loci potentially experiencing balancing selection, most predominantly in human populations. There are however numerous alternative evolutionary processes that may leave similar patterns of variation, thereby potentially confounding inference, and the expected signatures of balancing selection additionally change in a temporal fashion. Here we use forward-in-time simulations to quantify expected statistical power to detect balancing selection using both site frequency spectrum- and linkage disequilibrium-based methods under a variety of evolutionarily realistic null models. We find that whilst site frequency spectrum-based methods have little power immediately after a balanced mutation begins segregating, power increases with time since the introduction of the balanced allele. Conversely, linkage disequilibrium-based methods have considerable power whilst the allele is young, and power dissipates rapidly as the time since introduction increases. Taken together, this suggests that site frequency spectrum-based methods are most effective at detecting long-term balancing selection (>25N generations since the introduction of the balanced allele) whilst linkage disequilibrium-based methods are effective over much shorter timescales (<1N generations), thereby leaving a large time frame over which current methods have little power to detect the action of balancing selection. Finally, we investigate the extent to which alternative evolutionary processes may mimic these patterns, and demonstrate the need for caution in attempting to distinguish the signatures of balancing selection from those of both neutral processes (e.g. population structure and admixture) as well as of alternative selective processes (e.g. partial selective sweeps).
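For readers unfamiliar with the site frequency spectrum (SFS) that one class of these methods relies on, the sketch below shows how an unfolded SFS can be tallied from a small set of haplotypes. It is an illustration only, not the authors' pipeline; it assumes haplotypes are encoded as equal-length strings of `0` (ancestral) and `1` (derived), and the function name `site_frequency_spectrum` is hypothetical.

```python
def site_frequency_spectrum(haplotypes):
    """Tally the unfolded SFS from equal-length '0'/'1' haplotype strings.

    Entry i of the result (0-based) is the number of segregating sites
    at which exactly i + 1 of the n sampled haplotypes carry the
    derived allele. Fixed and absent sites are skipped.
    """
    n = len(haplotypes)
    if n < 2:
        return []
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    sfs = [0] * (n - 1)
    for site in range(n_sites):
        derived = sum(1 for h in haplotypes if h[site] == "1")
        if 0 < derived < n:
            sfs[derived - 1] += 1
    return sfs


if __name__ == "__main__":
    sample = ["1000", "0100", "0110", "0111"]
    print(site_frequency_spectrum(sample))
```

Long-term balancing selection is generally expected to produce an excess of intermediate-frequency variants near the selected site, which would show up as inflated counts in the middle entries of such a spectrum.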
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
- Jeffrey D Jensen
- School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
9
Liang Y, Hao J, Wang J, Zhang G, Su Y, Liu Z, Wang T. Statistical Genomics Analysis of Simple Sequence Repeats from the Paphiopedilum malipoense Transcriptome Reveals Control Knob Motifs Modulating Gene Expression. Adv Sci (Weinh) 2024; 11:e2304848. [PMID: 38647414 PMCID: PMC11200097 DOI: 10.1002/advs.202304848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 02/26/2024] [Indexed: 04/25/2024]
Abstract
Simple sequence repeats (SSRs) are found in nonrandom distributions in genomes and are thought to impact gene expression. The distribution patterns of 48 295 SSRs of Paphiopedilum malipoense are mined and characterized based on the first full-length transcriptome and comprehensive transcriptome dataset from 12 organs. Statistical genomics analyses are used to investigate how SSRs in transcripts affect gene expression. The results demonstrate the correlations between SSR distributions, characteristics, and expression level. Nine expression-modulating motifs (expMotifs) are identified and a model is proposed to explain the effect of their key features, potency, and gene function on an intra-transcribed region scale. The expMotif-transcribed region combination is the most predominant contributor to the expression-modulating effect of SSRs, and some intra-transcribed regions are critical for this effect. Genes containing the same type of expMotif-SSR elements in the same transcribed region are likely linked in function, regulation, or evolution aspects. This study offers novel evidence to understand how SSRs regulate gene expression and provides potential regulatory elements for plant genetic engineering.
Affiliation(s)
- Yingyi Liang
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Jing Hao
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Jieyu Wang
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guoqiang Zhang
- Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yingjuan Su
- School of Life Sciences, Sun Yat‐sen University, Guangzhou 510275, China
- Research Institute of Sun Yat‐sen University in Shenzhen, Shenzhen 518107, China
- Zhong‐Jian Liu
- Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ting Wang
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
10
Livnat A, Love AC. Mutation and evolution: Conceptual possibilities. Bioessays 2024; 46:e2300025. [PMID: 38254311 DOI: 10.1002/bies.202300025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 11/03/2023] [Accepted: 11/06/2023] [Indexed: 01/24/2024]
Abstract
Although random mutation is central to models of evolutionary change, a lack of clarity remains regarding the conceptual possibilities for thinking about the nature and role of mutation in evolution. We distinguish several claims at the intersection of mutation, evolution, and directionality and then characterize a previously unrecognized category: complex conditioned mutation. Empirical evidence in support of this category suggests that the historically famous fluctuation test should be revisited, and new experiments should be undertaken with emerging experimental techniques to facilitate detecting mutation rates within specific loci at an ultra-high, individual base pair resolution.
Affiliation(s)
- Adi Livnat
- Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Institute of Evolution, University of Haifa, Haifa, Israel
- Alan C Love
- Department of Philosophy and Minnesota Center for Philosophy of Science, University of Minnesota (Twin Cities), Minneapolis, Minnesota, USA
11
Cousins T, Tabin D, Patterson N, Reich D, Durvasula A. Accurate inference of population history in the presence of background selection. bioRxiv 2024:2024.01.18.576291. [PMID: 38313273 PMCID: PMC10838404 DOI: 10.1101/2024.01.18.576291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
All published methods for learning about demographic history make the simplifying assumption that the genome evolves neutrally, and do not seek to account for the effects of natural selection on patterns of variation. This is a major concern, as ample work has demonstrated the pervasive effects of natural selection and in particular background selection (BGS) on patterns of genetic variation in diverse species. Simulations and theoretical work have shown that methods to infer changes in effective population size over time (Ne(t)) become increasingly inaccurate as the strength of linked selection increases. Here, we introduce an extension to the Pairwise Sequentially Markovian Coalescent (PSMC) algorithm, PSMC+, which explicitly co-models demographic history and natural selection. We benchmark our method using forward-in-time simulations with BGS and find that our approach improves the accuracy of effective population size inference. Leveraging a high resolution map of BGS in humans, we infer considerable changes in the magnitude of inferred effective population size relative to previous reports. Finally, we separately infer Ne(t) on the X chromosome and on the autosomes in diverse great apes without making a correction for selection, and find that the inferred ratio fluctuates substantially through time in a way that differs across species, showing that uncorrected selection may be an important driver of signals of genetic difference on the X chromosome and autosomes.
Affiliation(s)
- Trevor Cousins
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Daniel Tabin
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Boston, MA, USA
| | - Arun Durvasula
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| |
|
12
|
Findlay SD, Romo L, Burge CB. Quantifying negative selection in human 3' UTRs uncovers constrained targets of RNA-binding proteins. Nat Commun 2024; 15:85. [PMID: 38168060 PMCID: PMC10762232 DOI: 10.1038/s41467-023-44456-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/13/2022] [Accepted: 12/14/2023] [Indexed: 01/05/2024] Open
Abstract
Many non-coding variants associated with phenotypes occur in 3' untranslated regions (3' UTRs), and may affect interactions with RNA-binding proteins (RBPs) to regulate gene expression post-transcriptionally. However, identifying functional 3' UTR variants has proven difficult. We use allele frequencies from the Genome Aggregation Database (gnomAD) to identify classes of 3' UTR variants under strong negative selection in humans. We develop intergenic mutability-adjusted proportion singleton (iMAPS), a generalized measure related to MAPS, to quantify negative selection in non-coding regions. This approach, in conjunction with in vitro and in vivo binding data, identifies precise RBP binding sites, miRNA target sites, and polyadenylation signals (PASs) under strong selection. For each class of sites, we identify thousands of gnomAD variants under selection comparable to missense coding variants, and find that sites in core 3' UTR regions upstream of the most-used PAS are under strongest selection. Together, this work improves our understanding of selection on human genes and validates approaches for interpreting genetic variants in human 3' UTRs.
Affiliation(s)
- Scott D Findlay
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
| | - Lindsay Romo
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Boston Children's Hospital, Boston, MA, 02115, USA
| | - Christopher B Burge
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
| |
|
13
|
Guo MH, Francioli LC, Stenton SL, Goodrich JK, Watts NA, Singer-Berk M, Groopman E, Darnowsky PW, Solomonson M, Baxter S, Tiao G, Neale BM, Hirschhorn JN, Rehm HL, Daly MJ, O'Donnell-Luria A, Karczewski KJ, MacArthur DG, Samocha KE. Inferring compound heterozygosity from large-scale exome sequencing data. Nat Genet 2024; 56:152-161. [PMID: 38057443 PMCID: PMC10872287 DOI: 10.1038/s41588-023-01608-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/24/2023] [Accepted: 11/08/2023] [Indexed: 12/08/2023]
Abstract
Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. Here we developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in the Genome Aggregation Database (v2, n = 125,748 exomes). Our approach estimates phase with 96% accuracy, both in trio data and in patients with Mendelian conditions and presumed causal compound heterozygous variants. We provide a public resource of phasing estimates for coding variants and counts per gene of rare variants in trans that can aid interpretation of rare co-occurring variants in the context of recessive disease.
Affiliation(s)
- Michael H Guo
- Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Laurent C Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Sarah L Stenton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Julia K Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Nicholas A Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Emily Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Philip W Darnowsky
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Samantha Baxter
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joel N Hirschhorn
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Departments of Genetics and Pediatrics, Harvard Medical School, Boston, MA, USA
- Division of Endocrinology, Boston Children's Hospital, Boston, MA, USA
- Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Kaitlin E Samocha
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| |
|
14
|
A biology-aware mutation rate model for human germline. Nat Genet 2023; 55:2033-2034. [PMID: 38040830 DOI: 10.1038/s41588-023-01564-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2023]
|
15
|
Seplyarskiy V, Koch EM, Lee DJ, Lichtman JS, Luan HH, Sunyaev SR. A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat Genet 2023; 55:2235-2242. [PMID: 38036792 PMCID: PMC11348951 DOI: 10.1038/s41588-023-01562-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/20/2022] [Accepted: 10/06/2023] [Indexed: 12/02/2023]
Abstract
De novo mutations occur at substantially different rates depending on genomic location, sequence context and DNA strand. The success of methods to estimate selection intensity, infer demographic history and map rare disease genes, depends strongly on assumptions about the local mutation rate. Here we present Roulette, a genome-wide mutation rate model at basepair resolution that incorporates known determinants of local mutation rate. Roulette is shown to be more accurate than existing models. We use Roulette to refine the estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a tenfold increase in mutation rate in nearly all genes transcribed by polymerase III (Pol III), suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively used in testis and residing in promoters.
Affiliation(s)
- Vladimir Seplyarskiy
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
| | - Evan M Koch
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
| | - Daniel J Lee
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
| | - Joshua S Lichtman
- NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
- Soleil Labs, South San Francisco, CA, USA
| | - Harding H Luan
- NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
- Soleil Labs, South San Francisco, CA, USA
| | - Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
| |
|
16
|
Li H, Yang Z, Tu F, Deng L, Han Y, Fu X, Wang L, Gu D, Werner B, Huang W. Mutation divergence over space in tumour expansion. J R Soc Interface 2023; 20:20230542. [PMID: 37989227 PMCID: PMC10681009 DOI: 10.1098/rsif.2023.0542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 10/30/2023] [Indexed: 11/23/2023] Open
Abstract
Mutation accumulation in tumour evolution is one major cause of intra-tumour heterogeneity (ITH), which often leads to drug resistance during treatment. Previous studies with multi-region sequencing have shown that mutation divergence among samples within a patient is common, and have highlighted the importance of spatial sampling for obtaining a complete picture in tumour measurements. However, quantitative comparisons of the relationship between mutation heterogeneity and tumour expansion modes, sampling distances, and sampling methods are still few. Here, we investigate how mutations diverge over space by varying the sampling distance and tumour expansion modes using individual-based simulations. We measure ITH by the Jaccard index between samples and quantify how ITH increases with sampling distance, a pattern that holds across various sampling methods and sizes. We also compare the inferred mutation rates based on the distributions of variant allele frequencies under different tumour expansion modes and sampling sizes. In exponentially fast expanding tumours, a mutation rate can always be inferred for any sampling size. However, the accuracy relative to the true value decreases as the sampling size decreases, with small sampling sizes resulting in a high estimate of the mutation rate. In addition, such an inference becomes unreliable when the tumour expansion is slow, such as in surface growth.
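As a rough illustration of the Jaccard index named in this abstract, the following minimal Python sketch compares the mutation sets of two samples. The function name `jaccard_index`, the toy mutation identifiers, and the choice to report divergence as 1 minus the index are assumptions made for illustration; the abstract states only that ITH is measured by the Jaccard index between samples.

```python
def jaccard_index(sample_a, sample_b):
    """Return |A intersect B| / |A union B| for two collections of mutation IDs."""
    a, b = set(sample_a), set(sample_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty samples count as identical.
        return 1.0
    return len(a & b) / len(union)


# Toy, hypothetical mutation identifiers for two spatially separated samples.
near = {"m1", "m2", "m3", "m4"}
far = {"m1", "m2", "m5", "m6"}

similarity = jaccard_index(near, far)  # 2 shared mutations out of 6 distinct
divergence = 1.0 - similarity          # one possible heterogeneity score (assumption)
```

Under this sketch, samples that share fewer mutations yield a lower index, which is the direction one would expect as sampling distance grows.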
Affiliation(s)
- Haiyang Li
- Group of Theoretical Biology, The State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China
- Evolutionary Dynamics Group, Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Zixuan Yang
- Group of Theoretical Biology, The State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China
| | - Fengyu Tu
- Group of Theoretical Biology, The State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China
| | - Lijuan Deng
- Group of Theoretical Biology, The State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China
| | - Yuqing Han
- Group of Theoretical Biology, The State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China
| | - Xing Fu
- Group of Theoretical Biology, The State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China
| | - Long Wang
- Group of Theoretical Biology, The State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China
| | - Di Gu
- The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, People’s Republic of China
| | - Benjamin Werner
- Evolutionary Dynamics Group, Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Weini Huang
- Group of Theoretical Biology, The State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, People’s Republic of China
- School of Mathematical Sciences, Queen Mary University of London, London, UK
| |
|
17
|
Cecil RM, Sugden LA. On convolutional neural networks for selection inference: Revealing the effect of preprocessing on model learning and the capacity to discover novel patterns. PLoS Comput Biol 2023; 19:e1010979. [PMID: 38011281 PMCID: PMC10703409 DOI: 10.1371/journal.pcbi.1010979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 12/07/2023] [Accepted: 10/26/2023] [Indexed: 11/29/2023] Open
Abstract
A central challenge in population genetics is the detection of genomic footprints of selection. As machine learning tools including convolutional neural networks (CNNs) have become more sophisticated and applied more broadly, these provide a logical next step for increasing our power to learn and detect such patterns; indeed, CNNs trained on simulated genome sequences have recently been shown to be highly effective at this task. Unlike previous approaches, which rely upon human-crafted summary statistics, these methods are able to be applied directly to raw genomic data, allowing them to potentially learn new signatures that, if well-understood, could improve the current theory surrounding selective sweeps. Towards this end, we examine a representative CNN from the literature, paring it down to the minimal complexity needed to maintain comparable performance; this low-complexity CNN allows us to directly interpret the learned evolutionary signatures. We then validate these patterns in more complex models using metrics that evaluate feature importance. Our findings reveal that preprocessing steps, which determine how the population genetic data is presented to the model, play a central role in the learned prediction method. This results in models that mimic previously-defined summary statistics; in one case, the summary statistic itself achieves similarly high accuracy. For evolutionary processes that are less well understood than selective sweeps, we hope this provides an initial framework for using CNNs in ways that go beyond simply achieving high classification performance. Instead, we propose that CNNs might be useful as tools for learning novel patterns that can translate to easy-to-implement summary statistics available to a wider community of researchers.
Affiliation(s)
- Ryan M. Cecil
- Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
- Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Lauren A. Sugden
- Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
| |
|
18
|
Beichman AC, Robinson J, Lin M, Moreno-Estrada A, Nigenda-Morales S, Harris K. Evolution of the Mutation Spectrum Across a Mammalian Phylogeny. Mol Biol Evol 2023; 40:msad213. [PMID: 37770035 PMCID: PMC10566577 DOI: 10.1093/molbev/msad213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 08/21/2023] [Accepted: 09/19/2023] [Indexed: 10/03/2023] Open
Abstract
Although evolutionary biologists have long theorized that variation in DNA repair efficacy might explain some of the diversity of lifespan and cancer incidence across species, we have little data on the variability of normal germline mutagenesis outside of humans. Here, we shed light on the spectrum and etiology of mutagenesis across mammals by quantifying mutational sequence context biases using polymorphism data from thirteen species of mice, apes, bears, wolves, and cetaceans. After normalizing the mutation spectrum for reference genome accessibility and k-mer content, we use the Mantel test to deduce that mutation spectrum divergence is highly correlated with genetic divergence between species, whereas life history traits like reproductive age are weaker predictors of mutation spectrum divergence. Potential bioinformatic confounders are only weakly related to a small set of mutation spectrum features. We find that clock-like mutational signatures previously inferred from human cancers cannot explain the phylogenetic signal exhibited by the mammalian mutation spectrum, despite the ability of these signatures to fit each species' 3-mer spectrum with high cosine similarity. In contrast, parental aging signatures inferred from human de novo mutation data appear to explain much of the 1-mer spectrum's phylogenetic signal in combination with a novel mutational signature. We posit that future models purporting to explain the etiology of mammalian mutagenesis need to capture the fact that more closely related species have more similar mutation spectra; a model that fits each marginal spectrum with high cosine similarity is not guaranteed to capture this hierarchy of mutation spectrum variation among species.
Affiliation(s)
- Annabel C Beichman
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Jacqueline Robinson
- Institute for Human Genetics, University of California, San Francisco, CA, USA
| | - Meixi Lin
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
| | - Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity, Advanced Genomics Unit (UGA-LANGEBIO), CINVESTAV, Irapuato, Mexico
| | - Sergio Nigenda-Morales
- Department of Biological Sciences, California State University, San Marcos, San Marcos, CA, USA
| | - Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Herbold Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA
| |
|
19
|
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution 2023; 77:2113-2127. [PMID: 37395482 PMCID: PMC10547124 DOI: 10.1093/evolut/qpad120] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/20/2023] [Revised: 06/15/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modeled by a realistic mutation rate and as part of a realistic distribution of fitness effects, as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modeled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates are in excess of true-positive rates across much of the evaluated parameter space, and selective sweeps are often undetectable unless selection is exceptionally strong.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| | - Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| | - Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| |
|
20
|
Guo MH, Francioli LC, Stenton SL, Goodrich JK, Watts NA, Singer-Berk M, Groopman E, Darnowsky PW, Solomonson M, Baxter S, gnomAD Project Consortium, Tiao G, Neale BM, Hirschhorn JN, Rehm HL, Daly MJ, O’Donnell-Luria A, Karczewski KJ, MacArthur DG, Samocha KE. Inferring compound heterozygosity from large-scale exome sequencing data. bioRxiv 2023:2023.03.19.533370. [PMID: 36993580 PMCID: PMC10055215 DOI: 10.1101/2023.03.19.533370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2023]
Abstract
Recessive diseases arise when both the maternal and the paternal copies of a gene are impacted by a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants in a gene for a given disorder, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n = 125,748). When applied to trio data where phase can be determined by transmission, our approach estimates phase with 95.7% accuracy and remains accurate even for very rare variants (allele frequency < 1×10⁻⁴). We also correctly phase 95.9% of variant pairs in a set of 293 patients with Mendelian conditions carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.
Affiliation(s)
- Michael H. Guo
- Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Laurent C. Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Sarah L. Stenton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA
| | - Julia K. Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Nicholas A. Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Emily Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA
| | - Philip W. Darnowsky
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Samantha Baxter
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Benjamin M. Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joel N. Hirschhorn
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Division of Endocrinology, Boston Children’s Hospital, Boston, MA, USA
- Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, MA, USA
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Mark J. Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
| | - Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Konrad J. Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Daniel G. MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research, UNSW Sydney, Sydney, Australia
- Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
| | - Kaitlin E. Samocha
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| |
|
21
|
Poulsgaard GA, Sørensen SG, Juul RI, Nielsen MM, Pedersen JS. Sequence dependencies and mutation rates of localized mutational processes in cancer. Genome Med 2023; 15:63. [PMID: 37592287 PMCID: PMC10436389 DOI: 10.1186/s13073-023-01217-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 08/02/2023] [Indexed: 08/19/2023] Open
Abstract
BACKGROUND: Cancer mutations accumulate through replication errors and DNA damage coupled with incomplete repair. Individual mutational processes often show nucleotide sequence and functional region preferences. As a result, some sequence contexts mutate at much higher rates than others, with additional variation found between functional regions. Mutational hotspots, with recurrent mutations across cancer samples, represent genomic positions with elevated mutation rates, often caused by highly localized mutational processes.
METHODS: We count the 11-mer genomic sequences across the genome, and using the PCAWG set of 2583 pan-cancer whole genomes, we associate 11-mers with mutational signatures, hotspots of single nucleotide variants, and specific genomic regions. We evaluate the mutation rates of individual and combined sets of 11-mers and derive mutational sequence motifs.
RESULTS: We show that hotspots generally identify highly mutable sequence contexts. Using these, we show that some mutational signatures are enriched in hotspot sequence contexts, corresponding to well-defined sequence preferences for the underlying localized mutational processes. This includes signature 17b (of unknown etiology) and signatures 62 (POLE deficiency), 7a (UV), and 72 (linked to lymphomas). In some cases, the mutation rate and sequence preference increase further when focusing on certain genomic regions, such as signature 62 in transcribed regions, where the mutation rate is increased up to 9-fold over the cancer type and mutational signature average.
CONCLUSIONS: We summarize our findings in a catalog of localized mutational processes, their sequence preferences, and their estimated mutation rates.
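The 11-mer counting step in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it counts overlapping k-mers on a single strand of a toy sequence and divides mutation counts by genomic occurrences. The names `count_kmers`, `kmer_mutation_rate`, and `toy_genome` are hypothetical, and reverse-complement handling and the PCAWG data are left out.

```python
from collections import Counter


def count_kmers(sequence: str, k: int = 11) -> Counter:
    """Count every overlapping k-mer, skipping windows with non-ACGT symbols."""
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_mutation_rate(mutated: Counter, genome_counts: Counter) -> dict:
    """Rate per k-mer: mutations observed in that context / its genomic occurrences.

    `mutated` is assumed to hold, for each k-mer, the number of mutations
    observed at its central base (an assumption made for this sketch).
    """
    return {kmer: mutated[kmer] / n for kmer, n in genome_counts.items() if n > 0}


# Toy sequence standing in for a reference genome (hypothetical data).
toy_genome = "ACGTACGTACGTTTACGTACGTACG"
genome_counts = count_kmers(toy_genome, k=11)
```

A `Counter` returns 0 for k-mers with no observed mutations, so contexts that never mutate get a rate of 0 rather than raising a `KeyError`.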
Affiliation(s)
- Gustav Alexander Poulsgaard
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
| | - Simon Grund Sørensen
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
| | - Randi Istrup Juul
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
| | - Morten Muhlig Nielsen
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
| | - Jakob Skou Pedersen
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark
- Bioinformatics Research Centre (BiRC), Aarhus University, University City 81, Building 1872, 3rd Floor, 8000, Aarhus C, Denmark
| |
|
22
|
Liu Z, Samee M. Structural underpinnings of mutation rate variations in the human genome. Nucleic Acids Res 2023; 51:7184-7197. [PMID: 37395403 PMCID: PMC10415140 DOI: 10.1093/nar/gkad551] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/19/2022] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023] Open
Abstract
Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Importantly, the rates vary substantially across the genome and the principles underlying such variations remain poorly understood. A recent model explained much of this variation by considering higher-order nucleotide interactions in the 7-mer sequence context around mutated nucleotides. This model's success implicates a connection between DNA shape and mutation rates. DNA shape, i.e. structural properties like helical twist and tilt, is known to capture interactions between nucleotides within a local context. Thus, we hypothesized that changes in DNA shape features at and around mutated positions can explain mutation rate variations in the human genome. Indeed, DNA shape-based models of mutation rates showed similar or improved performance over current nucleotide sequence-based models. These models accurately characterized mutation hotspots in the human genome and revealed the shape features whose interactions underlie mutation rate variations. DNA shape also impacts mutation rates within putative functional regions like transcription factor binding sites where we find a strong association between DNA shape and position-specific mutation rates. This work demonstrates the structural underpinnings of nucleotide mutations in the human genome and lays the groundwork for future models of genetic variations to incorporate DNA shape.
Affiliation(s)
- Zian Liu
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
| | - Md Abul Hassan Samee
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
23
|
Lin Y, Darolti I, van der Bijl W, Morris J, Mank JE. Extensive variation in germline de novo mutations in Poecilia reticulata. Genome Res 2023; 33:1317-1324. [PMID: 37442578 PMCID: PMC10547258 DOI: 10.1101/gr.277936.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 07/07/2023] [Indexed: 07/15/2023]
Abstract
The rate of germline mutation is fundamental to evolutionary processes, as it generates the variation upon which selection acts. The guppy, Poecilia reticulata, is a model of rapid adaptation; however, the relative contribution of standing genetic variation versus de novo mutation (DNM) to evolution in this species remains unclear. Here, we use pedigree-based approaches to quantify and characterize germline DNMs in three large guppy families. Our results suggest that the germline mutation rate in the guppy varies substantially across individuals and families. Most DNMs are shared across multiple siblings, suggesting they arose during early embryonic development. DNMs are randomly distributed throughout the genome, and male-biased mutation rate is low, as would be expected from the short guppy generation time. Overall, our study shows remarkable variation in germline mutation rate and provides insights into rapid evolution of guppies.
Affiliation(s)
- Yuying Lin
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Iulia Darolti
- Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
| | - Wouter van der Bijl
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Jake Morris
- School of Biological Science, University of Bristol, Bristol BS8 1TQ, United Kingdom
| | - Judith E Mank
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| |
|
24
|
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. bioRxiv 2023:2023.06.15.545166. [PMID: 37398347 PMCID: PMC10312679 DOI: 10.1101/2023.06.15.545166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modelled by a realistic mutation rate and as part of a realistic distribution of fitness effects (DFE), as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modelled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false positive rates are in excess of true positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Teaser Text: Outlier-based genomic scans have proven a popular approach for identifying loci that have potentially experienced recent positive selection. However, it has previously been shown that an evolutionarily appropriate baseline model that incorporates non-equilibrium population histories, purifying and background selection, and variation in mutation and recombination rates is necessary to reduce often extreme false positive rates when performing genomic scans. Here we evaluate the power to detect recurrent selective sweeps using common SFS-based and haplotype-based methods under these increasingly realistic models. We find that while these appropriate evolutionary baselines are essential to reduce false positive rates, the power to accurately detect recurrent selective sweeps is generally low across much of the biologically relevant parameter space.
Affiliation(s)
- Vivak Soni
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Parul Johri
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Present address: Department of Biology, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | | |
|
25
|
Livnat A, Melamed D. Evolutionary honing in and mutational replacement: how long-term directed mutational responses to specific environmental pressures are possible. Theory Biosci 2023; 142:87-105. [PMID: 36899155 PMCID: PMC10209271 DOI: 10.1007/s12064-023-00387-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/15/2022] [Accepted: 01/13/2023] [Indexed: 03/12/2023]
Abstract
Recent results have shown that the human malaria-resistant hemoglobin S mutation originates de novo more frequently in the gene and in the population where it is of adaptive significance, namely, in the hemoglobin subunit beta gene compared to the nonresistant but otherwise identical 20A→T mutation in the hemoglobin subunit delta gene, and in sub-Saharan Africans, who have been subject to intense malarial pressure for many generations, compared to northern Europeans, who have not. This finding raises a fundamental challenge to the traditional notion of accidental mutation. Here, we address this finding with the replacement hypothesis, according to which preexisting genetic interactions can lead directly and mechanistically to mutations that simplify and replace them. Thus, an evolutionary process under selection can gradually hone in on interactions of importance for the currently evolving adaptations, from which large-effect mutations follow that are relevant to these adaptations. We exemplify this hypothesis using multiple types of mutation, including gene fusion mutations, gene duplication mutations, A→G mutations in RNA-edited sites and transcription-associated mutations, and place it in the broader context of a system-level view of mutation origination called interaction-based evolution. Potential consequences include that similarity of mutation pressures may contribute to parallel evolution in genetically related species, that the evolution of genome organization may be driven by mutational mechanisms, that transposable element movements may also be explained by replacement, and that long-term directed mutational responses to specific environmental pressures are possible. Such mutational phenomena need to be further tested by future studies in natural and artificial settings.
Affiliation(s)
- Adi Livnat
- Department of Evolutionary and Environmental Biology, University of Haifa, 3498838, Haifa, Israel
- Institute of Evolution, University of Haifa, 3498838, Haifa, Israel
| | - Daniel Melamed
- Department of Evolutionary and Environmental Biology, University of Haifa, 3498838, Haifa, Israel
- Institute of Evolution, University of Haifa, 3498838, Haifa, Israel
| |
|
26
|
Beichman AC, Robinson J, Lin M, Moreno-Estrada A, Nigenda-Morales S, Harris K. Evolution of the mutation spectrum across a mammalian phylogeny. bioRxiv 2023:2023.05.31.543114. [PMID: 37398383 PMCID: PMC10312511 DOI: 10.1101/2023.05.31.543114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Little is known about how the spectrum and etiology of germline mutagenesis might vary among mammalian species. To shed light on this mystery, we quantify variation in mutational sequence context biases using polymorphism data from thirteen species of mice, apes, bears, wolves, and cetaceans. After normalizing the mutation spectrum for reference genome accessibility and k-mer content, we use the Mantel test to deduce that mutation spectrum divergence is highly correlated with genetic divergence between species, whereas life history traits like reproductive age are weaker predictors of mutation spectrum divergence. Potential bioinformatic confounders are only weakly related to a small set of mutation spectrum features. We find that clocklike mutational signatures previously inferred from human cancers cannot explain the phylogenetic signal exhibited by the mammalian mutation spectrum, despite the ability of these clocklike signatures to fit each species' 3-mer spectrum with high cosine similarity. In contrast, parental aging signatures inferred from human de novo mutation data appear to explain much of the mutation spectrum's phylogenetic signal when fit to non-context-dependent mutation spectrum data in combination with a novel mutational signature. We posit that future models purporting to explain the etiology of mammalian mutagenesis need to capture the fact that more closely related species have more similar mutation spectra; a model that fits each marginal spectrum with high cosine similarity is not guaranteed to capture this hierarchy of mutation spectrum variation among species.
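The cosine similarity used in this abstract to judge how well a signature fits a spectrum can be sketched as follows. This is only an illustration under simple assumptions: spectra are dictionaries of mutation-type proportions, the counts are invented toy values, and the function names `normalise` and `cosine_similarity` are hypothetical rather than taken from the authors' code.

```python
import math


def normalise(counts):
    """Convert raw mutation-type counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("spectrum contains no mutations")
    return {mutation_type: n / total for mutation_type, n in counts.items()}


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine of the angle between two spectra; missing types count as 0."""
    types = set(spectrum_a) | set(spectrum_b)
    dot = sum(spectrum_a.get(t, 0.0) * spectrum_b.get(t, 0.0) for t in types)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an all-zero spectrum")
    return dot / (norm_a * norm_b)


# Invented toy counts for two species (not data from the study).
species_a = normalise({"C>T": 40, "C>A": 10, "T>C": 30, "T>A": 20})
species_b = normalise({"C>T": 35, "C>A": 15, "T>C": 30, "T>A": 20})
similarity = cosine_similarity(species_a, species_b)
```

Because every spectrum is non-negative and dominated by the same common mutation types, cosine similarity tends to be high for any pair of spectra, which is consistent with the abstract's caution that a high per-species fit does not by itself capture between-species structure.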
Affiliation(s)
| | - Jacqueline Robinson
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA
| | - Meixi Lin
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA
| | - Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity, Advanced Genomics Unit (UGA-LANGEBIO), CINVESTAV, Irapuato, Mexico
| | - Sergio Nigenda-Morales
- Department of Biological Sciences, California State University, San Marcos, San Marcos CA
| | - Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle WA
| |
|
27
|
Liao K, Carlson J, Zöllner S. The effect of mutation subtypes on the allele frequency spectrum and population genetics inference. G3 (Bethesda) 2023; 13:jkad035. [PMID: 36759699 PMCID: PMC10085755 DOI: 10.1093/g3journal/jkad035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 02/11/2023]
Abstract
Population genetics has adapted as technological advances in next-generation sequencing have resulted in an exponential increase of genetic data. A common approach to efficiently analyze genetic variation present in large sequencing data is through the allele frequency spectrum, defined as the distribution of allele frequencies in a sample. While the frequency spectrum serves to summarize patterns of genetic variation, it implicitly assumes mutation types (A→C vs C→T) as interchangeable. However, mutations of different types arise and spread due to spatial and temporal variation in forces such as mutation rate and biased gene conversion that result in heterogeneity in the distribution of allele frequencies across sites. In this work, we explore the impact of this simplification on multiple aspects of population genetic modeling. As a site's mutation rate is strongly affected by flanking nucleotides, we defined a mutation subtype by the base pair change and adjacent nucleotides (e.g. AAA→ATA) and systematically assessed the heterogeneity in the frequency spectrum across 96 distinct 3-mer mutation subtypes using n = 3556 whole-genome sequenced individuals of European ancestry. We observed substantial variation across the subtype-specific frequency spectra, with some of the variation being influenced by molecular factors previously identified for single base mutation types. Estimates of model parameters from demographic inference performed for each mutation subtype's AFS individually varied drastically across the 96 subtypes. In local patterns of variation, a combination of regional subtype composition and local genomic factors shaped the regional frequency spectrum across genomic regions. Our results illustrate how treating variants in large sequencing samples as interchangeable may confound population genetic frameworks and encourages us to consider the unique evolutionary mechanisms of analyzed polymorphisms.
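The figure of 96 3-mer subtypes follows from 6 strand-collapsed base changes times 16 flanking-base pairs. The sketch below enumerates them and folds a mutation onto a canonical strand. It assumes the reference base is reported as A or C, which matches the abstract's example (AAA→ATA) but is otherwise a guess at the convention; the function names are hypothetical and `>` stands in for the arrow.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def enumerate_subtypes():
    """List all 3-mer subtypes with reference base A or C (6 changes x 16 flanks)."""
    subtypes = []
    for ref in "AC":
        for alt in "ACGT":
            if alt == ref:
                continue
            for five, three in product("ACGT", repeat=2):
                subtypes.append(f"{five}{ref}{three}>{five}{alt}{three}")
    return subtypes


def canonical_subtype(ref_context, alt):
    """Fold a mutation onto the strand whose central reference base is A or C."""
    ref_context, alt = ref_context.upper(), alt.upper()
    if ref_context[1] in "GT":
        # Describe the same event on the opposite strand.
        ref_context = reverse_complement(ref_context)
        alt = alt.translate(COMPLEMENT)
    return f"{ref_context}>{ref_context[0]}{alt}{ref_context[2]}"


all_subtypes = enumerate_subtypes()
```

For example, a T-to-A change in a TTT context is the same event as an A-to-T change in AAA on the other strand, so both are tallied under one subtype.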
Affiliation(s)
- Kevin Liao
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jedidiah Carlson
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712, USA
- Department of Population Health, University of Texas at Austin, Austin, TX 78712, USA
| | - Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
28
|
Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol 2023; 30:417-424. [PMID: 36914796 DOI: 10.1038/s41594-023-00936-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/23/2021] [Accepted: 02/03/2023] [Indexed: 03/16/2023]
Abstract
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
|
29
|
Gao Z, Zhang Y, Cramer N, Przeworski M, Moorjani P. Limited role of generation time changes in driving the evolution of the mutation spectrum in humans. eLife 2023; 12:e81188. [PMID: 36779395 PMCID: PMC10014080 DOI: 10.7554/elife.81188] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/18/2022] [Accepted: 02/02/2023] [Indexed: 02/14/2023] Open
Abstract
Recent studies have suggested that the human germline mutation rate and spectrum evolve rapidly. Variation in generation time has been linked to these changes, though its contribution remains unclear. We develop a framework to characterize temporal changes in polymorphisms within and between populations, while controlling for the effects of natural selection and biased gene conversion. Application to the 1000 Genomes Project dataset reveals multiple independent changes that arose after the split of continental groups, including a previously reported, transient elevation in TCC>TTC mutations in Europeans and novel signals of divergence in C>G and T>A mutation rates among population samples. We also find a significant difference between groups sampled in and outside of Africa in old T>C polymorphisms that predate the out-of-Africa migration. This surprising signal is driven by TpG>CpG mutations and stems in part from mis-polarized CpG transitions, which are more likely to undergo recurrent mutations. Finally, by relating the mutation spectrum of polymorphisms to parental age effects on de novo mutations, we show that plausible changes in the generation time cannot explain the patterns observed for different mutation types jointly. Thus, other factors - genetic modifiers or environmental exposures - must have had a non-negligible impact on the human mutation landscape.
Affiliation(s)
- Ziyue Gao
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, United States
| | - Yulin Zhang
- Center for Computational Biology, University of California, Berkeley, Berkeley, United States
| | - Nathan Cramer
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
| | - Priya Moorjani
- Center for Computational Biology, University of California, Berkeley, Berkeley, United States
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
| |
|
30
|
Bethune J, Kleppe A, Besenbacher S. A method to build extended sequence context models of point mutations and indels. Nat Commun 2022; 13:7884. [PMID: 36550134 PMCID: PMC9780256 DOI: 10.1038/s41467-022-35596-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/06/2021] [Accepted: 12/13/2022] [Indexed: 12/24/2022] Open
Abstract
The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k since the data becomes too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies. As the first method of its kind, it does not only predict rates for point mutations but also insertions and deletions. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong selective constraint.
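The idea of grouping similar k-mers so that sparse counts can be pooled can be sketched in Python as below. This is not the kmerPaPa API or its algorithm: the partition here is supplied by hand as a list of IUPAC patterns, whereas the method described in the abstract selects the partition from data. The names `matches` and `pooled_rates` and all counts are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, kmer):
    """True if every base of the k-mer is allowed by the pattern symbol at that position."""
    return len(pattern) == len(kmer) and all(
        base in IUPAC[symbol] for symbol, base in zip(pattern, kmer)
    )


def pooled_rates(patterns, mutated, total):
    """Estimate one rate per pattern by pooling counts over its member k-mers.

    `patterns` is assumed to partition the k-mers of interest; each k-mer is
    assigned to the first pattern it matches.
    """
    pooled = {pattern: [0, 0] for pattern in patterns}
    for kmer, n_sites in total.items():
        for pattern in patterns:
            if matches(pattern, kmer):
                pooled[pattern][0] += mutated.get(kmer, 0)
                pooled[pattern][1] += n_sites
                break
    return {p: (m / n if n else 0.0) for p, (m, n) in pooled.items()}


# Invented toy counts: CpG contexts (NCG) versus other C contexts (NCH).
total_sites = {"ACG": 100, "TCG": 100, "ACA": 200, "ACT": 200}
mutated_sites = {"ACG": 10, "TCG": 6, "ACA": 2}
rates = pooled_rates(["NCG", "NCH"], mutated_sites, total_sites)
```

Pooling trades resolution for stability: each pattern gets a rate estimated from more sites than any single k-mer would provide, which is the motivation the abstract gives for moving beyond per-k-mer estimates at large k.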
Affiliation(s)
- Jörn Bethune
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
| | - April Kleppe
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
| | - Søren Besenbacher
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| |
|
31
|
Fang Y, Deng S, Li C. A generalizable deep learning framework for inferring fine-scale germline mutation rate maps. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00574-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
|
32
|
Ng JK, Vats P, Fritz-Waters E, Sarkar S, Sams EI, Padhi EM, Payne ZL, Leonard S, West MA, Prince C, Trani L, Jansen M, Vacek G, Samadi M, Harkins TT, Pohl C, Turner TN. De novo variant calling identifies cancer mutation signatures in the 1000 Genomes Project. Hum Mutat 2022; 43:1979-1993. [PMID: 36054329 PMCID: PMC9771978 DOI: 10.1002/humu.24455] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/30/2021] [Revised: 07/22/2022] [Accepted: 08/29/2022] [Indexed: 01/25/2023]
Abstract
Detection of de novo variants (DNVs) is critical for studies of disease-related variation and mutation rates. To accelerate DNV calling, we developed a graphics processing unit-based workflow. We applied our workflow to whole-genome sequencing data from three parent-child sequenced cohorts including the Simons Simplex Collection (SSC), Simons Foundation Powering Autism Research (SPARK), and the 1000 Genomes Project (1000G) that were sequenced using DNA from blood, saliva, and lymphoblastoid cell lines (LCLs), respectively. The SSC and SPARK DNV callsets were within expectations for number of DNVs, percent at CpG sites, phasing to the paternal chromosome of origin, and average allele balance. However, the 1000G DNV callset was not within expectations and contained excessive DNVs that are likely cell line artifacts. Mutation signature analysis revealed 30% of 1000G DNV signatures matched B-cell lymphoma. Furthermore, we found variants in DNA repair genes and at ClinVar pathogenic or likely-pathogenic sites and a significant excess of protein-coding DNVs in IGLL5, a gene known to be involved in B-cell lymphomas. Our study provides a new rapid DNV caller for the field and elucidates important implications of using sequencing data from LCLs for reference building and disease-related projects.
Affiliation(s)
- Jeffrey K. Ng
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Pankaj Vats
- NVIDIA Corporation, Santa Clara, California, USA
| | - Elyn Fritz-Waters
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Stephanie Sarkar
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Eleanor I. Sams
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Evin M. Padhi
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Zachary L. Payne
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Shawn Leonard
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Marc A. West
- NVIDIA Corporation, Santa Clara, California, USA
| | - Chandler Prince
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Lee Trani
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Marshall Jansen
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
| | - George Vacek
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | - Craig Pohl
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Tychele N. Turner
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
| |
|
33
|
Coursimault J, Cassinari K, Lecoquierre F, Quenez O, Coutant S, Derambure C, Vezain M, Drouot N, Vera G, Schaefer E, Philippe A, Doray B, Lambert L, Ghoumid J, Smol T, Rama M, Legendre M, Lacombe D, Fergelot P, Olaso R, Boland A, Deleuze JF, Goldenberg A, Saugier-Veber P, Nicolas G. Deep intronic NIPBL de novo mutations and differential diagnoses revealed by whole genome and RNA sequencing in Cornelia de Lange syndrome patients. Hum Mutat 2022; 43:1882-1897. [PMID: 35842780 DOI: 10.1002/humu.24438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Revised: 05/23/2022] [Accepted: 07/09/2022] [Indexed: 01/25/2023]
Abstract
Cornelia de Lange syndrome (CdLS; MIM# 122470) is a rare developmental disorder. Pathogenic variants in 5 genes explain approximately 50% of cases, leaving the other 50% unsolved. We performed whole genome sequencing (WGS) ± RNA sequencing (RNA-seq) in 5 unsolved trios fulfilling the following criteria: (i) clinical diagnosis of classic CdLS, (ii) negative gene panel sequencing from blood and saliva-isolated DNA, (iii) unaffected parents' DNA samples available and (iv) proband's blood-isolated RNA available. A pathogenic de novo mutation (DNM) was observed in a CdLS differential diagnosis gene in 3/5 patients, namely POU3F3, SPEN, and TAF1. In the other two, we identified two distinct deep intronic DNMs in NIPBL predicted to create a novel splice site. RT-PCRs and RNA-seq showed aberrant transcripts leading to the creation of a novel frameshift exon. Our findings suggest the relevance of WGS in unsolved suspected CdLS cases and that deep intronic variants may account for a proportion of them.
Affiliation(s)
- Juliette Coursimault
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Kévin Cassinari
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- François Lecoquierre
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Olivier Quenez
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Sophie Coutant
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Céline Derambure
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Myriam Vezain
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Nathalie Drouot
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Gabriella Vera
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Elise Schaefer
- Service de Génétique Médicale, Institut de Génétique Médicale d'Alsace (IGMA), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Anaïs Philippe
- Service de Génétique Médicale, Institut de Génétique Médicale d'Alsace (IGMA), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Bérénice Doray
- Service de Génétique Médicale, Centre Hospitalier Universitaire Félix Guyon, Bellepierre Saint Denis, France
- Laëtitia Lambert
- Service de Génétique Clinique, CHRU NANCY, F-54000 France, UMR INSERM U 1256 N-GERE, F-54000, Nancy, France
- Jamal Ghoumid
- Université de Lille, ULR7364 RADEME, CHU Lille, Clinique de Génétique « Guy Fontaine », and FHU-G4 Génomique, F-59000, Lille, France
- Thomas Smol
- Université de Lille, ULR7364 RADEME, CHU Lille, Institut de Génétique Médicale, and FHU-G4 Génomique, F-59000, Lille, France
- Mélanie Rama
- Institut de Génétique Médicale, CHU de Lille, France
- Marine Legendre
- Service de Génétique Médicale, CHU de Bordeaux, Bordeaux, France
- Didier Lacombe
- INSERM U1211, Université de Bordeaux; Génétique Médicale, CHU de Bordeaux, Bordeaux, France
- Patricia Fergelot
- INSERM U1211, Université de Bordeaux; Génétique Médicale, CHU de Bordeaux, Bordeaux, France
- Robert Olaso
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
- Anne Boland
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
- Jean-François Deleuze
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
- Alice Goldenberg
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Pascale Saugier-Veber
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
- Gaël Nicolas
- Normandie Univ, UNIROUEN, Inserm U1245 and CHU Rouen, Department of Genetics and reference center for developmental disorders, FHU-G4 Génomique, F-76000, Rouen, France
34
Unni P, Friend J, Weinberg J, Okur V, Hochscherf J, Dominguez I. Predictive functional, statistical and structural analysis of CSNK2A1 and CSNK2B variants linked to neurodevelopmental diseases. Front Mol Biosci 2022; 9:851547. [PMID: 36310603 PMCID: PMC9608649 DOI: 10.3389/fmolb.2022.851547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 06/29/2022] [Indexed: 12/02/2022] Open
Abstract
Okur-Chung Neurodevelopmental Syndrome (OCNDS) and Poirier-Bienvenu Neurodevelopmental Syndrome (POBINDS) were recently identified as rare neurodevelopmental disorders. OCNDS and POBINDS are associated with heterozygous mutations in the CSNK2A1 and CSNK2B genes, which encode CK2α, a serine/threonine protein kinase, and CK2β, a regulatory protein, respectively; together these can form a tetrameric enzyme called protein kinase CK2. A challenge in OCNDS and POBINDS is to understand the genetic basis of these diseases and the effect of the various CK2α and CK2β mutations. In this study we have collected all variants available to date in CSNK2A1 and CSNK2B, and identified hotspots. We have investigated CK2α and CK2β missense mutations through prediction programs which consider the evolutionary conservation, functionality and structure of these two proteins, compared these results with published experimental data on CK2α and CK2β mutants, and suggested prediction programs that could help predict changes in functionality of CK2α mutants. We also investigated the potential effect of CK2α and CK2β mutations on the 3D structure of the proteins and on their binding to each other. These results indicate that there are functional and structural consequences of mutation of CK2α and CK2β, and provide a rationale for further study of OCNDS and POBINDS-associated mutations. These data contribute to understanding the genetic and functional basis of these diseases, which is needed to identify their underlying mechanisms.
Affiliation(s)
- Prasida Unni
- Department of Medicine, Boston University School of Medicine and Boston Medical Center, Boston University, Boston, MA, United States
- Jack Friend
- Department of Medicine, Boston University School of Medicine and Boston Medical Center, Boston University, Boston, MA, United States
- Janice Weinberg
- Department of Biostatistics, Boston University School of Public Health, Boston University, Boston, MA, United States
- Volkan Okur
- New York Genome Center, New York, NY, United States
- Jennifer Hochscherf
- Department of Chemistry, Institute of Biochemistry, University of Cologne, Cologne, Germany
- Isabel Dominguez
- Department of Medicine, Boston University School of Medicine and Boston Medical Center, Boston University, Boston, MA, United States
- *Correspondence: Isabel Dominguez,
35
Zhong G, Shen Y. Statistical models of the genetic etiology of congenital heart disease. Curr Opin Genet Dev 2022; 76:101967. [PMID: 35939966 PMCID: PMC10586490 DOI: 10.1016/j.gde.2022.101967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 06/29/2022] [Accepted: 07/08/2022] [Indexed: 11/03/2022]
Abstract
Congenital heart disease (CHD) is a collection of anatomically and clinically heterogeneous structural anomalies of the heart present at birth. Finding genetic causes of CHD can not only shed light on the developmental biology of the heart, but also provide a basis for improving clinical care and interventions. The optimal study design and analytical approaches to identify genetic causes depend on the underlying genetic architecture. A few well-known syndromes with CHD as a core condition, such as Noonan and CHARGE, have known monogenic causes. The genetic causes of CHD in most patients, however, are unknown and likely to be complex. In this review, we highlight recent studies that assume a complex genetic architecture of CHD, following two main approaches. One is genomic sequencing studies aiming to identify rare or de novo risk variants with large genetic effect. The other is genome-wide association studies optimized for common variants with moderate genetic effect.
Affiliation(s)
- Guojie Zhong
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA; Integrated Program in Cellular, Molecular, and Biological Studies, Columbia University Irving Medical Center, New York, NY, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA; Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; JP Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
36
Vihinen M. Individual Genetic Heterogeneity. Genes (Basel) 2022; 13:1626. [PMID: 36140794 PMCID: PMC9498725 DOI: 10.3390/genes13091626] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/12/2022] [Revised: 08/25/2022] [Accepted: 09/08/2022] [Indexed: 11/28/2022] Open
Abstract
Genetic variation has been widely covered in the literature, but not from the perspective of an individual in any species. Here, a synthesis of genetic concepts and variations relevant for individual genetic constitution is provided. All the different levels of genetic information and variation are covered, ranging from whether an organism is unmixed or hybrid, has variations in genome, chromosomes, and more locally in DNA regions, to epigenetic variants or alterations in selfish genetic elements. Genetic constitution and heterogeneity of microbiota are highly relevant for the health and wellbeing of an individual. Mutation rates vary widely for variation types, e.g., due to the sequence context. Genetic information guides numerous aspects in organisms. Types of inheritance, whether Mendelian or non-Mendelian, zygosity, sexual reproduction, and sex determination are covered. Functions of DNA and functional effects of variations are introduced, along with mechanisms that reduce and modulate functional effects, including TARAR countermeasures and intraindividual genetic conflict. TARAR countermeasures for tolerance, avoidance, repair, attenuation, and resistance are essential for life, integrity of genetic information, and gene expression. The genetic composition, effects of variations, and their expression are also considered in diseases and personalized medicine. The text synthesizes knowledge and insight on individual genetic heterogeneity and organizes and systematizes the central concepts.
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
37
Zhou X, Feliciano P, Shu C, Wang T, Astrovskaya I, Hall JB, Obiajulu JU, Wright JR, Murali SC, Xu SX, Brueggeman L, Thomas TR, Marchenko O, Fleisch C, Barns SD, Snyder LG, Han B, Chang TS, Turner TN, Harvey WT, Nishida A, O'Roak BJ, Geschwind DH, Michaelson JJ, Volfovsky N, Eichler EE, Shen Y, Chung WK. Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat Genet 2022; 54:1305-1319. [PMID: 35982159 PMCID: PMC9470534 DOI: 10.1038/s41588-022-01148-2] [Citation(s) in RCA: 211] [Impact Index Per Article: 70.3] [Received: 09/26/2021] [Accepted: 06/28/2022] [Indexed: 12/16/2022]
Abstract
To capture the full spectrum of genetic risk for autism, we performed a two-stage analysis of rare de novo and inherited coding variants in 42,607 autism cases, including 35,130 new cases recruited online by SPARK. We identified 60 genes with exome-wide significance (P < 2.5 × 10⁻⁶), including five new risk genes (NAV3, ITSN1, MARK2, SCAF1 and HNRNPUL2). The association of NAV3 with autism risk is primarily driven by rare inherited loss-of-function (LoF) variants, with an estimated relative risk of 4, consistent with moderate effect. Autistic individuals with LoF variants in the four moderate-risk genes (NAV3, ITSN1, SCAF1 and HNRNPUL2; n = 95) have less cognitive impairment than 129 autistic individuals with LoF variants in highly penetrant genes (CHD8, SCN2A, ADNP, FOXP1 and SHANK3) (59% vs 88%, P = 1.9 × 10⁻⁶). Power calculations suggest that much larger numbers of autism cases are needed to identify additional moderate-risk genes.
Affiliation(s)
- Xueya Zhou
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Chang Shu
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Tianyun Wang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Medical Genetics, Center for Medical Genetics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Neuroscience Research Institute, Department of Neurobiology, School of Basic Medical Sciences, Peking University Health Science Center; Key Laboratory for Neuroscience, Ministry of Education of China & National Health Commission of China, Beijing, China
- Joseph U Obiajulu
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Shwetha C Murali
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Leo Brueggeman
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Taylor R Thomas
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Bing Han
- Simons Foundation, New York, NY, USA
- Timothy S Chang
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Tychele N Turner
- Department of Genetics, Washington University, St. Louis, MO, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Andrew Nishida
- Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Brian J O'Roak
- Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Daniel H Geschwind
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jacob J Michaelson
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
- Wendy K Chung
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Simons Foundation, New York, NY, USA
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
38
Rashed WM, Marcotte EL, Spector LG. Germline De Novo Mutations as a Cause of Childhood Cancer. JCO Precis Oncol 2022; 6:e2100505. [PMID: 35820085 DOI: 10.1200/po.21.00505] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022] Open
Abstract
Germline de novo mutations (DNMs) represent one of the important topics that need extensive attention from epidemiologists, geneticists, and other relevant stakeholders. Advances in next-generation sequencing technologies allowed examination of parent-offspring trios to ascertain the frequency of germline DNMs. Many epidemiological risk factors for childhood cancer are indicative of DNMs as a mechanism. The aim of this review was to give an overview of germline DNMs, their causes in general, and to discuss their relation to childhood cancer risk. In addition, we highlighted existing gaps in knowledge in many topics of germline DNMs in childhood cancer that need exploration and collaborative efforts.
Affiliation(s)
- Wafaa M Rashed
- Research Department, Children's Cancer Hospital-Egypt 57357 (CCHE-57357), Cairo, Egypt
- Erin L Marcotte
- Division of Epidemiology/Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN; Masonic Cancer Center, University of Minnesota, Minneapolis, MN
- Logan G Spector
- Division of Epidemiology/Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN; Masonic Cancer Center, University of Minnesota, Minneapolis, MN
39
Gorlova OY, Kimmel M, Tsavachidis S, Amos CI, Gorlov IP. Identification of lung cancer drivers by comparison of the observed and the expected numbers of missense and nonsense mutations in individual human genes. Oncotarget 2022; 13:756-767. [PMID: 35634240 PMCID: PMC9132259 DOI: 10.18632/oncotarget.28231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2022] [Accepted: 05/03/2022] [Indexed: 01/25/2023] Open
Abstract
Largely, cancer development is driven by acquisition and positive selection of somatic mutations that increase proliferation and survival of tumor cells. As a result, genes related to cancer development tend to have an excess of somatic mutations in them. An excess of missense and/or nonsense mutations in a gene is an indicator of its cancer relevance. To identify genes with an excess of potentially functional missense or nonsense mutations one needs to compare the observed and expected numbers of mutations in the gene. We estimated the expected numbers of missense and nonsense mutations in individual human genes using (i) the number of potential sites for missense and nonsense mutations in individual transcripts and (ii) histology-specific nucleotide context-dependent mutation rates. To estimate mutation rates defined as the number of mutations per site per tumor we used silent mutations reported in the Catalog Of Somatic Mutations In Cancer (COSMIC). The estimates were nucleotide context dependent. We have identified 26 genes with an excess of missense and/or nonsense mutations for lung adenocarcinoma, 18 genes for small cell lung cancer, and 26 genes for squamous cell carcinoma of the lung. These genes include known genes and novel lung cancer gene candidates.
Affiliation(s)
- Olga Y. Gorlova
- Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA. Correspondence to: Olga Y. Gorlova, email:
- Marek Kimmel
- Department of Statistics, Rice University, Houston, TX 77005, USA
- Ivan P. Gorlov
- Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
40
Li W, Almirantis Y, Provata A. Revisiting the neutral dynamics derived limiting guanine-cytosine content using human de novo point mutation data. Meta Gene 2022. [DOI: 10.1016/j.mgene.2021.100994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
41
Context dependency of nucleotide probabilities and variants in human DNA. BMC Genomics 2022; 23:87. [PMID: 35100973 PMCID: PMC8802520 DOI: 10.1186/s12864-021-08246-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/31/2021] [Accepted: 12/10/2021] [Indexed: 12/20/2022] Open
Abstract
Background: Genomic DNA has been shaped by mutational processes through evolution. The cellular machinery for error correction and repair has left its marks in the nucleotide composition along with structural and functional constraints. Therefore, the probability of observing a base in a certain position in the human genome is highly context-dependent. Results: Here we develop context-dependent nucleotide models. We first investigate models of nucleotides conditioned on sequence context. We develop a bidirectional Markov model that uses an average of the probability from a Markov model applied to both strands of the sequence and thus depends on up to 14 bases to each side of the nucleotide. We show how the genome predictability varies across different types of genomic regions. Surprisingly, this model can predict a base from its context with an average of more than 50% accuracy. For somatic variants we show a tendency towards higher probability for the variant base than for the reference base. Inspired by DNA substitution models, we develop a model of mutability that estimates a mutation matrix (called the alpha matrix) on top of the nucleotide distribution. The alpha matrix can be estimated from a much smaller context than the nucleotide model, but the final model will still depend on the full context of the nucleotide model. With the bidirectional Markov model of order 14 and an alpha matrix dependent on just one base to each side, we obtain a model that compares well with a model of mutability that estimates mutation probabilities directly conditioned on three nucleotides to each side. For somatic variants in particular, our model fits better than the simpler model. Interestingly, the model is not very sensitive to the size of the context for the alpha matrix. Conclusions: Our study found strong context dependencies of nucleotides in the human genome. The best model uses a context of 14 nucleotides to each side. Based on these models, a substitution model was constructed that separates into the context model and a matrix dependent on a small context. The model fit somatic variants particularly well. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-08246-1.
42
Cost-Efficiency Optimization Serves as a Conserved Mechanism that Promotes Osteosarcoma in Mammals. J Mol Evol 2022; 90:139-148. [DOI: 10.1007/s00239-022-10047-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/04/2021] [Accepted: 01/06/2022] [Indexed: 10/19/2022]
43
Melamed D, Nov Y, Malik A, Yakass MB, Bolotin E, Shemer R, Hiadzi EK, Skorecki KL, Livnat A. De novo mutation rates at the single-mutation resolution in a human HBB gene-region associated with adaptation and genetic disease. Genome Res 2022; 32:488-498. [PMID: 35031571 PMCID: PMC8896469 DOI: 10.1101/gr.276103.121] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/17/2021] [Accepted: 01/10/2022] [Indexed: 11/25/2022]
Abstract
Although it is known that the mutation rate varies across the genome, previous estimates were based on averaging across various numbers of positions. Here, we describe a method to measure the origination rates of target mutations at target base positions and apply it to a 6-bp region in the human hemoglobin subunit beta (HBB) gene and to the identical, paralogous hemoglobin subunit delta (HBD) region in sperm cells from both African and European donors. The HBB region of interest (ROI) includes the site of the hemoglobin S (HbS) mutation, which protects against malaria, is common in Africa, and has served as a classic example of adaptation by random mutation and natural selection. We found a significant correspondence between de novo mutation rates and past observations of alleles in carriers, showing that mutation rates vary substantially in a mutation-specific manner that contributes to the site frequency spectrum. We also found that the overall point mutation rate is significantly higher in Africans than in Europeans in the HBB region studied. Finally, the rate of the 20A→T mutation, called the “HbS mutation” when it appears in HBB, is significantly higher than expected from the genome-wide average for this mutation type. Nine instances were observed in the African HBB ROI, where it is of adaptive significance, representing at least three independent originations; no instances were observed elsewhere. Further studies will be needed to examine mutation rates at the single-mutation resolution across these and other loci and organisms and to uncover the molecular mechanisms responsible.
44
Rashid I, Campos M, Collier T, Crepeau M, Weakley A, Gripkey H, Lee Y, Schmidt H, Lanzaro GC. Spontaneous mutation rate estimates for the principal malaria vectors Anopheles coluzzii and Anopheles stephensi. Sci Rep 2022; 12:226. [PMID: 34996998 PMCID: PMC8742016 DOI: 10.1038/s41598-021-03943-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Accepted: 12/07/2021] [Indexed: 11/17/2022] Open
Abstract
Using high-depth whole genome sequencing of F0 mating pairs and multiple individual F1 offspring, we estimated the nuclear mutation rate per generation in the malaria vectors Anopheles coluzzii and Anopheles stephensi by detecting de novo genetic mutations. A purpose-built computer program was employed to filter actual mutations from a deep background of superficially similar artifacts resulting from read misalignment. Performance of filtering parameters was determined using software-simulated mutations, and the resulting estimate of false negative rate was used to correct final mutation rate estimates. Spontaneous mutation rates by base substitution were estimated at 1.00 × 10⁻⁹ (95% confidence interval, 2.06 × 10⁻¹⁰ to 2.91 × 10⁻⁹) and 1.36 × 10⁻⁹ (95% confidence interval, 4.42 × 10⁻¹⁰ to 3.18 × 10⁻⁹) per site per generation in A. coluzzii and A. stephensi, respectively. Although similar studies have been performed on other insect species including dipterans, this is the first study to empirically measure mutation rates in the important genus Anopheles, and thus provides an estimate of µ that will be of utility for comparative evolutionary genomics, as well as for population genetic analysis of malaria vector mosquito species.
Affiliation(s)
- Iliyas Rashid
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA; Section of Cell and Developmental Biology, University of California, San Diego, La Jolla, CA, USA; Tata Institute for Genetics and Society, Center at inStem, Bangalore, Karnataka, 560065, India
- Melina Campos
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Travis Collier
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Marc Crepeau
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Allison Weakley
- Department of ChEM-H Operations, Stanford University, 450 Serra Mall, Stanford, CA, 94305, USA
- Hans Gripkey
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Yoosook Lee
- Florida Medical Entomology Laboratory, University of Florida, 200 9th St SE, Vero Beach, FL, 32962, USA
- Hanno Schmidt
- Anthropology, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University of Mainz, Saarstraße 21, 55122, Mainz, Germany
- Gregory C Lanzaro
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
45
Qi M, Stenson PD, Ball EV, Tainer JA, Bacolla A, Kehrer-Sawatzki H, Cooper DN, Zhao H. Distinct sequence features underlie microdeletions and gross deletions in the human genome. Hum Mutat 2021; 43:328-346. [PMID: 34918412 PMCID: PMC9069542 DOI: 10.1002/humu.24314] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/10/2021] [Revised: 11/02/2021] [Accepted: 12/14/2021] [Indexed: 11/18/2022]
Abstract
Microdeletions and gross deletions are important causes (~20%) of human inherited disease and their genomic locations are strongly influenced by the local DNA sequence environment. This notwithstanding, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1 to 28,394,429 bp. We analyzed the DNA sequence within 1 kb of the breakpoint junctions and found that the frequencies of non‐B DNA‐forming repeats, GC‐content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length for deletions of length ≤30 bp. Further, we found that the presence of DR, GQ, and STR repeats is important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp), while significantly (χ², p < 2E−16) more microhomologies were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. Finally, we propose that a deletion length cut‐off of 25–30 bp may serve as an objective means to functionally distinguish microdeletions from gross deletions.
Affiliation(s)
- Mengling Qi
- Department of Medical Research Center, Sun Yat-sen Memorial Hospital; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Guangzhou, China
- Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Edward V Ball
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- John A Tainer
- Departments of Cancer Biology and of Molecular and Cellular Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Albino Bacolla
- Departments of Cancer Biology and of Molecular and Cellular Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Huiying Zhao
- Department of Medical Research Center, Sun Yat-sen Memorial Hospital; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Guangzhou, China
|
46
|
Kaiser VB, Talmane L, Kumar Y, Semple F, MacLennan M, FitzPatrick DR, Taylor MS, Semple CA. Mutational bias in spermatogonia impacts the anatomy of regulatory sites in the human genome. Genome Res 2021; 31:1994-2007. [PMID: 34417209 PMCID: PMC8559717 DOI: 10.1101/gr.275407.121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/22/2021] [Accepted: 08/19/2021] [Indexed: 12/03/2022]
Abstract
Mutation in the germline is the ultimate source of genetic variation, but little is known about the influence of germline chromatin structure on mutational processes. Using ATAC-seq, we profile the open chromatin landscape of human spermatogonia, the most proliferative cell type of the germline, identifying transcription factor binding sites (TFBSs) and PRDM9 binding sites, a subset of which will initiate meiotic recombination. We observe an increase in rare structural variant (SV) breakpoints at PRDM9-bound sites, implicating meiotic recombination in the generation of structural variation. Many germline TFBSs, such as NRF1, are also associated with increased rates of SV breakpoints, apparently independent of recombination. Singleton short insertions (≥5 bp) are highly enriched at TFBSs, particularly at sites bound by testis active TFs, and their rates correlate with those of structural variant breakpoints. Short insertions often duplicate the TFBS motif, leading to clustering of motif sites near regulatory regions in this male-driven evolutionary process. Increased mutation loads at germline TFBSs disproportionately affect neural enhancers with activity in spermatogonia, potentially altering neurodevelopmental regulatory architecture. Local chromatin structure in spermatogonia is thus pervasive in shaping both evolution and disease.
Affiliation(s)
- Vera B Kaiser
- MRC Human Genetics Unit, MRC Institute of Genetics and Cancer, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
| | - Lana Talmane
- MRC Human Genetics Unit, MRC Institute of Genetics and Cancer, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
| | - Yatendra Kumar
- MRC Human Genetics Unit, MRC Institute of Genetics and Cancer, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
| | - Fiona Semple
- MRC Human Genetics Unit, MRC Institute of Genetics and Cancer, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
| | - Marie MacLennan
- MRC Human Genetics Unit, MRC Institute of Genetics and Cancer, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
| | - David R FitzPatrick
- MRC Human Genetics Unit, MRC Institute of Genetics and Cancer, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
| | - Martin S Taylor
- MRC Human Genetics Unit, MRC Institute of Genetics and Cancer, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
| | - Colin A Semple
- MRC Human Genetics Unit, MRC Institute of Genetics and Cancer, The University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, United Kingdom
| |
|
47
|
Seplyarskiy VB, Sunyaev S. The origin of human mutation in light of genomic data. Nat Rev Genet 2021; 22:672-686. [PMID: 34163020 DOI: 10.1038/s41576-021-00376-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Accepted: 05/06/2021] [Indexed: 02/05/2023]
Abstract
Despite years of active research into the role of DNA repair and replication in mutagenesis, surprisingly little is known about the origin of spontaneous human mutation in the germ line. With the advent of high-throughput sequencing, genome-scale data have revealed statistical properties of mutagenesis in humans. These properties include variation of the mutation rate and spectrum along the genome at different scales in relation to epigenomic features and dependency on parental age. Moreover, mutations originated in mothers are less frequent than mutations originated in fathers and have a distinct genomic distribution. Statistical analyses that interpret these patterns in the context of known biochemistry can provide mechanistic models of mutagenesis in humans.
Affiliation(s)
- Vladimir B Seplyarskiy
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| |
|
48
|
Waldvogel AM, Pfenninger M. Temperature dependence of spontaneous mutation rates. Genome Res 2021; 31:1582-1589. [PMID: 34301628 PMCID: PMC8415371 DOI: 10.1101/gr.275168.120] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 12/21/2020] [Accepted: 07/21/2021] [Indexed: 11/29/2022]
Abstract
Mutation is the source of genetic variation and the fundament of evolution. Temperature has long been suggested to have a direct impact on realized spontaneous mutation rates. If mutation rates vary in response to environmental conditions, such as the variation of the ambient temperature through space and time, they should no longer be described as species-specific constants. By combining mutation accumulation with whole-genome sequencing in a multicellular organism, we provide empirical support to reject the null hypothesis of a constant, temperature-independent mutation rate. Instead, mutation rates depended on temperature in a U-shaped manner with increasing rates toward both temperature extremes. This relation has important implications for mutation-dependent processes in molecular evolution, processes shaping the evolution of mutation rates, and even the evolution of biodiversity as such.
Affiliation(s)
- Ann-Marie Waldvogel
- Senckenberg Biodiversity and Climate Research Centre, 60325 Frankfurt am Main, Germany
- Institute of Zoology, University of Cologne, 50674 Cologne, Germany
| | - Markus Pfenninger
- Senckenberg Biodiversity and Climate Research Centre, 60325 Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Biodiversity and Climate Research Centre, 60325 Frankfurt am Main, Germany
- Institute for Organismic and Molecular Evolution, Johannes Gutenberg University, 55128 Mainz, Germany
| |
|
49
|
Jia X, Goes FS, Locke AE, Palmer D, Wang W, Cohen-Woods S, Genovese G, Jackson AU, Jiang C, Kvale M, Mullins N, Nguyen H, Pirooznia M, Rivera M, Ruderfer DM, Shen L, Thai K, Zawistowski M, Zhuang Y, Abecasis G, Akil H, Bergen S, Burmeister M, Chapman S, DelaBastide M, Juréus A, Kang HM, Kwok PY, Li JZ, Levy SE, Monson ET, Moran J, Sobell J, Watson S, Willour V, Zöllner S, Adolfsson R, Blackwood D, Boehnke M, Breen G, Corvin A, Craddock N, DiFlorio A, Hultman CM, Landen M, Lewis C, McCarroll SA, Richard McCombie W, McGuffin P, McIntosh A, McQuillin A, Morris D, Myers RM, O'Donovan M, Ophoff R, Boks M, Kahn R, Ouwehand W, Owen M, Pato C, Pato M, Posthuma D, Potash JB, Reif A, Sklar P, Smoller J, Sullivan PF, Vincent J, Walters J, Neale B, Purcell S, Risch N, Schaefer C, Stahl EA, Zandi PP, Scott LJ. Investigating rare pathogenic/likely pathogenic exonic variation in bipolar disorder. Mol Psychiatry 2021; 26:5239-5250. [PMID: 33483695 PMCID: PMC8295400 DOI: 10.1038/s41380-020-01006-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/02/2019] [Revised: 12/14/2020] [Accepted: 12/16/2020] [Indexed: 01/30/2023]
Abstract
Bipolar disorder (BD) is a serious mental illness with substantial common variant heritability. However, the role of rare coding variation in BD is not well established. We examined the protein-coding (exonic) sequences of 3,987 unrelated individuals with BD and 5,322 controls of predominantly European ancestry across four cohorts from the Bipolar Sequencing Consortium (BSC). We assessed the burden of rare, protein-altering, single nucleotide variants classified as pathogenic or likely pathogenic (P-LP) both exome-wide and within several groups of genes with phenotypic or biologic plausibility in BD. While we observed an increased burden of rare coding P-LP variants within 165 genes identified as BD GWAS regions in 3,987 BD cases (meta-analysis OR = 1.9, 95% CI = 1.3-2.8, one-sided p = 6.0 × 10⁻⁴), this enrichment did not replicate in an additional 9,929 BD cases and 14,018 controls (OR = 0.9, one-sided p = 0.70). Although BD shares common variant heritability with schizophrenia (SCZ), in the BSC sample we did not observe a significant enrichment of P-LP variants in SCZ GWAS genes, in two classes of neuronal synaptic genes (RBFOX2 and FMRP) associated with SCZ, or in loss-of-function intolerant genes. In this study, the largest analysis of exonic variation in BD, individuals with BD do not carry a replicable enrichment of rare P-LP variants across the exome or in any of several groups of genes with biologic plausibility. Moreover, despite a strong shared susceptibility between BD and SCZ through common genetic variation, we do not observe an association between BD risk and rare P-LP coding variants in genes known to modulate risk for SCZ.
Affiliation(s)
- Xiaoming Jia
- Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Fernando S Goes
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21287, USA
| | - Adam E Locke
- Division of Genomics & Bioinformatics, Department of Medicine and McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, 63108, USA
| | - Duncan Palmer
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Weiqing Wang
- Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Sarah Cohen-Woods
- Discipline of Psychology and Flinders Centre for Innovation in Cancer, Flinders University, Adelaide, SA, Australia
- Medical Research Council Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Giulio Genovese
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Anne U Jackson
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Chen Jiang
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, 94611, USA
| | - Mark Kvale
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, 94143, USA
| | - Niamh Mullins
- Pamela Sklar Division of Psychiatric Genomics, Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Hoang Nguyen
- Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Mehdi Pirooznia
- Bioinformatics and Computational Core, National Heart, Lung, and Blood Institute, Bethesda, MD, 20892, USA
| | - Margarita Rivera
- Medical Research Council Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Department of Biochemistry and Molecular Biology II, Institute of Neurosciences, Center for Biomedical Research, University of Granada, Granada, Spain
| | - Douglas M Ruderfer
- Departments of Medicine, Psychiatry, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Ling Shen
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, 94611, USA
| | - Khanh Thai
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, 94611, USA
| | - Matthew Zawistowski
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Yongwen Zhuang
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Gonçalo Abecasis
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Huda Akil
- Molecular & Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Sarah Bergen
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Margit Burmeister
- Molecular & Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Sinéad Chapman
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Melissa DelaBastide
- Division of Research, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11797, USA
| | - Anders Juréus
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Hyun Min Kang
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Pui-Yan Kwok
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, 94143, USA
| | - Jun Z Li
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Shawn E Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - Eric T Monson
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, 52242, USA
| | - Jennifer Moran
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Janet Sobell
- Department of Psychiatry and Behavioral Sciences, University of Southern California, Los Angeles, CA, 90033, USA
| | - Stanley Watson
- Molecular & Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Virginia Willour
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, 52242, USA
| | - Sebastian Zöllner
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Rolf Adolfsson
- Departments of Clinical Sciences and Psychiatry, Umea University, Umea, Sweden
| | | | - Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Gerome Breen
- Medical Research Council Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- NIHR BRC for Mental Health, King's College London, London, UK
| | - Aiden Corvin
- Department of Psychiatry and Trinity Translational Medicine Institute, Trinity College Dublin, Dublin, Ireland
| | - Nick Craddock
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University School of Medicine, Cardiff, UK
| | - Arianna DiFlorio
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University School of Medicine, Cardiff, UK
| | - Christina M Hultman
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Mikael Landen
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
| | - Cathryn Lewis
- Medical Research Council Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Department of Medical & Molecular Genetics, King's College London, London, UK
| | | | - W Richard McCombie
- Division of Research, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11797, USA
| | - Peter McGuffin
- Medical Research Council Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Andrew McIntosh
- Division of Psychiatry, University of Edinburgh, Edinburgh, UK
- Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK
| | | | - Derek Morris
- Department of Psychiatry and Trinity Translational Medicine Institute, Trinity College Dublin, Dublin, Ireland
- Discipline of Biochemistry, Neuroimaging and Cognitive Genomics (NICOG) Centre, National University of Ireland Galway, Galway, Ireland
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA
| | - Michael O'Donovan
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University School of Medicine, Cardiff, UK
| | - Roel Ophoff
- Center for Neurobehavioral Genetics, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Department of Psychiatry, UMC Utrecht Brain Center Rudolf Magnus, Utrecht, the Netherlands
| | - Marco Boks
- Department of Psychiatry, UMC Utrecht Brain Center Rudolf Magnus, Utrecht, the Netherlands
| | - Rene Kahn
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Willem Ouwehand
- Department of Haematology, School of Clinical Medicine, University of Cambridge, Cambridge, UK
| | - Michael Owen
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University School of Medicine, Cardiff, UK
| | - Carlos Pato
- Department of Psychiatry and Behavioral Sciences, University of Southern California, Los Angeles, CA, 90033, USA
- SUNY Downstate Medical Center, Brooklyn, NY, 11203, USA
| | - Michele Pato
- Department of Psychiatry and Behavioral Sciences, University of Southern California, Los Angeles, CA, 90033, USA
- Department of Psychiatry, SUNY Downstate Medical Center, Brooklyn, NY, 11203, USA
| | - Danielle Posthuma
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Department of Clinical Genetics, Amsterdam Neuroscience, Vrije Universiteit Medical Center, Amsterdam, the Netherlands
| | - James B Potash
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21287, USA
| | - Andreas Reif
- Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital Frankfurt, Frankfurt am Main, Germany
| | - Pamela Sklar
- Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Jordan Smoller
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Patrick F Sullivan
- Departments of Genetics and Psychiatry, University of North Carolina, Chapel Hill, NC, USA
| | - John Vincent
- Molecular Neuropsychiatry and Development Laboratory, Campbell Family Mental Health Research Institute, Center for Addiction & Mental Health, Toronto, ON, Canada
- Department of Psychiatry and Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
| | - James Walters
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University School of Medicine, Cardiff, UK
| | - Benjamin Neale
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Shaun Purcell
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Neil Risch
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, 94143, USA
| | - Catherine Schaefer
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, 94611, USA
| | - Eli A Stahl
- Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Peter P Zandi
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21287, USA
| | - Laura J Scott
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| |
|
50
|
Rajaei M, Saxena AS, Johnson LM, Snyder MC, Crombie TA, Tanny RE, Andersen EC, Joyner-Matos J, Baer CF. Mutability of mononucleotide repeats, not oxidative stress, explains the discrepancy between laboratory-accumulated mutations and the natural allele-frequency spectrum in C. elegans. Genome Res 2021; 31:1602-1613. [PMID: 34404692 PMCID: PMC8415377 DOI: 10.1101/gr.275372.121] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/09/2021] [Accepted: 07/12/2021] [Indexed: 11/24/2022]
Abstract
Important clues about natural selection can be gleaned from discrepancies between the properties of segregating genetic variants and of mutations accumulated experimentally under minimal selection, provided the mutational process is the same in the laboratory as in nature. The base-substitution spectrum differs between C. elegans laboratory mutation accumulation (MA) experiments and the standing site-frequency spectrum, which has been argued to be in part owing to increased oxidative stress in the laboratory environment. Using genome sequence data from C. elegans MA lines carrying a mutation (mev-1) that increases the cellular titer of reactive oxygen species (ROS), leading to increased oxidative stress, we find the base-substitution spectrum is similar between mev-1, its wild-type progenitor (N2), and another set of MA lines derived from a different wild strain (PB306). Conversely, the rate of short insertions is greater in mev-1, consistent with studies in other organisms in which environmental stress increased the rate of insertion–deletion mutations. Further, the mutational properties of mononucleotide repeats in all strains are different from those of nonmononucleotide sequence, both for indels and base-substitutions, and whereas the nonmononucleotide spectra are fairly similar between MA lines and wild isolates, the mononucleotide spectra are very different, with a greater frequency of A:T → T:A transversions and an increased proportion of ±1-bp indels. The discrepancy in mutational spectra between laboratory MA experiments and natural variation is likely owing to a consistent (but unknown) effect of the laboratory environment that manifests itself via different modes of mutability and/or repair at mononucleotide loci.
Affiliation(s)
- Moein Rajaei
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA
| | | | - Lindsay M Johnson
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA
| | - Michael C Snyder
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA
| | - Timothy A Crombie
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA; Department of Molecular Biosciences, Northwestern University, Evanston, Illinois 60208, USA
| | - Robyn E Tanny
- Department of Molecular Biosciences, Northwestern University, Evanston, Illinois 60208, USA
| | - Erik C Andersen
- Department of Molecular Biosciences, Northwestern University, Evanston, Illinois 60208, USA
| | - Joanna Joyner-Matos
- Department of Biology, Eastern Washington University, Cheney, Washington 99004, USA
| | - Charles F Baer
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA; University of Florida Genetics Institute, Gainesville, Florida 32608, USA
| |
|