1
Shao DD, Kriz AJ, Snellings DA, Zhou Z, Zhao Y, Enyenihi L, Walsh C. Advances in single-cell DNA sequencing enable insights into human somatic mosaicism. Nat Rev Genet 2025:10.1038/s41576-025-00832-3. [PMID: 40281095 DOI: 10.1038/s41576-025-00832-3] [Accepted: 03/05/2025] [Indexed: 04/29/2025]
Abstract
DNA sequencing from bulk or clonal human tissues has shown that genetic mosaicism is common and contributes to both cancer and non-cancerous disorders. However, single-cell resolution is required to understand the full genetic heterogeneity that exists within a tissue and the mechanisms that lead to somatic mosaicism. Single-cell DNA-sequencing technologies have traditionally trailed behind those of single-cell transcriptomics and epigenomics, largely because most applications require whole-genome amplification before costly whole-genome sequencing. Now, recent technological and computational advances are enabling the use of single-cell DNA sequencing to tackle previously intractable problems, such as delineating the genetic landscape of tissues with complex clonal patterns, of samples where cellular material is scarce and of non-cycling, postmitotic cells. Single-cell genomes are also revealing the mutational patterns that arise from biological processes or disease states, and have made it possible to track cell lineage in human tissues. These advances in our understanding of tissue biology and our ability to identify disease mechanisms will ultimately transform how disease is diagnosed and monitored.
Affiliation(s)
- Diane D Shao
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
  - Division of Genetics and Genomics, Department of Paediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Andrea J Kriz
  - Division of Genetics and Genomics, Department of Paediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Daniel A Snellings
  - Division of Genetics and Genomics, Department of Paediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Zinan Zhou
  - Division of Genetics and Genomics, Department of Paediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Yifan Zhao
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Liz Enyenihi
  - Division of Genetics and Genomics, Department of Paediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
  - Biological and Biomedical Sciences Graduate Program, Harvard Medical School, Boston, MA, USA
- Christopher Walsh
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
  - Division of Genetics and Genomics, Department of Paediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
2
Rochus CM, Steensma MJ, Bink MCAM, Huisman AE, Harlizius B, Derks MFL, Crooijmans RPMA, Ducro BJ, Bijma P, Groenen MAM, Mulder HA. Estimating mutation rate and characterising single nucleotide de novo mutations in pigs. Genet Sel Evol 2025; 57:21. [PMID: 40229661 PMCID: PMC11995543 DOI: 10.1186/s12711-025-00967-1] [Received: 06/03/2024] [Accepted: 04/03/2025] [Indexed: 04/16/2025] Open
Abstract
BACKGROUND: Direct estimates of mutation rates in humans have changed our understanding of evolutionary timing and de novo mutations (DNM) have been associated with several developmental disorders in humans. Livestock species, including pigs, can contribute to the study of DNM because of their ideal population structure and routine phenotype collection. In principle, there is the potential for livestock populations to quickly accumulate new genetic variants because of short generation intervals and high selection intensity. However, the impact of DNM on the fitness of individuals is not known and with current genomic selection programs they cannot contribute to estimated breeding values. The aims of our project were to detect and validate single nucleotide DNM in two commercial pig breeding lines, estimate the single nucleotide mutation rate, and characterise DNM. RESULTS: We sequenced (150 bp paired end reads, 30X coverage) 46 pig trios from two commercial lines. Single nucleotide DNM were detected using a trio-aware method. We defined candidate DNM as single nucleotide variants (SNVs) found in heterozygous state in trio-offspring with both trio-parents homozygous for the reference allele. In this study, we estimate a lower threshold of the DNM rate in pigs of 6.3 × 10⁻⁹ per site per gamete. Our findings are consistent with those from other mammals and those published for a small number of livestock species. Most DNM we detected were in introns (47%) and intergenic regions (49%). The mutational spectrum in pigs differs from that in humans and we found several DNM predicted to have an effect on an animal's fitness based on the base pair change and their location in the genome. CONCLUSIONS: With this study, we have generated fundamental knowledge on mutation rate in a non-primate species and identified DNM that could have an impact on the fitness of individuals.
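The candidate definition and the "per site per gamete" unit in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: the genotype coding (counts of the alternate allele), the function names, and the use of a single callable-site count per trio are assumptions made for illustration, and the numbers in the usage note are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioGenotypes:
    """Genotypes at one site, coded as alternate-allele counts (0, 1 or 2).

    This coding is an assumption for the sketch, not taken from the paper.
    """
    sire: int
    dam: int
    offspring: int


def is_candidate_dnm(site: TrioGenotypes) -> bool:
    """Offspring heterozygous, both parents homozygous for the reference allele."""
    return site.sire == 0 and site.dam == 0 and site.offspring == 1


def dnm_rate_per_site_per_gamete(n_dnm: int, n_trios: int, callable_sites: int) -> float:
    """Point estimate: DNM count divided by the number of gamete-sites surveyed.

    Each offspring carries two gametes, hence the factor of 2.
    `callable_sites` is a hypothetical per-trio count of sites where a DNM
    could have been detected.
    """
    if n_trios <= 0 or callable_sites <= 0:
        raise ValueError("n_trios and callable_sites must be positive")
    return n_dnm / (2 * n_trios * callable_sites)
```

For example, `dnm_rate_per_site_per_gamete(10, 5, 1_000_000)` divides 10 mutations by 10 million gamete-sites. Real estimates also correct for false positives and false negatives, which this sketch omits.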
Affiliation(s)
- Christina M Rochus
  - Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
  - The University of Edinburgh, The Roslin Institute Easter Bush Campus, Midlothian, EH25 9RG, Scotland
- Marije J Steensma
  - Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Marco C A M Bink
  - Hendrix Genetics, P.O. Box 114, 5830 AC, Boxmeer, The Netherlands
- Abe E Huisman
  - Hendrix Genetics, P.O. Box 114, 5830 AC, Boxmeer, The Netherlands
- Barbara Harlizius
  - Topigs Norsvin Research Center, Meerendonkweg 25, 5216 TZ, Den Bosch, The Netherlands
- Martijn F L Derks
  - Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
  - Topigs Norsvin Research Center, Meerendonkweg 25, 5216 TZ, Den Bosch, The Netherlands
- Richard P M A Crooijmans
  - Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Bart J Ducro
  - Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Piter Bijma
  - Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Martien A M Groenen
  - Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
- Han A Mulder
  - Wageningen University & Research Animal Breeding and Genomics, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
3
Dissanayake UC, Roy A, Maghsoud Y, Polara S, Debnath T, Cisneros GA. Computational studies on the functional and structural impact of pathogenic mutations in enzymes. Protein Sci 2025; 34:e70081. [PMID: 40116283 PMCID: PMC11926659 DOI: 10.1002/pro.70081] [Received: 11/08/2024] [Revised: 01/23/2025] [Accepted: 02/12/2025] [Indexed: 03/23/2025]
Abstract
Enzymes are critical biological catalysts involved in maintaining the intricate balance of metabolic processes within living organisms. Mutations in enzymes can result in disruptions to their functionality that may lead to a range of diseases. This review focuses on computational studies that investigate the effects of disease-associated mutations in various enzymes. Through molecular dynamics simulations, multiscale calculations, and machine learning approaches, computational studies provide detailed insights into how mutations impact enzyme structure, dynamics, and catalytic activity. This review emphasizes the increasing impact of computational simulations in understanding molecular mechanisms behind enzyme (dis)function by highlighting the application of key computational methodologies to selected enzyme examples, aiding in the prediction of mutation effects and the development of therapeutic strategies.
Affiliation(s)
- Upeksha C. Dissanayake
  - Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas, USA
- Arkanil Roy
  - Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas, USA
- Yazdan Maghsoud
  - Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas, USA
  - Present address: Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, Texas, USA
- Sarthi Polara
  - Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas, USA
- Tanay Debnath
  - Department of Physics, The University of Texas at Dallas, Richardson, Texas, USA
  - Present address: Department of Pathology and Molecular Medicine, Queen's University, Kingston, Ontario, Canada
- G. Andrés Cisneros
  - Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas, USA
  - Department of Physics, The University of Texas at Dallas, Richardson, Texas, USA
4
Harris A, Burnham K, Pradhyumnan R, Jaishankar A, Häkkinen L, Góngora-Rosero RE, Piazza Y, Andl CD, Andl T. Human-Specific Organization of Proliferation and Stemness in Squamous Epithelia: A Comparative Study to Elucidate Differences in Stem Cell Organization. Int J Mol Sci 2025; 26:3144. [PMID: 40243939 PMCID: PMC11989042 DOI: 10.3390/ijms26073144] [Received: 02/17/2025] [Revised: 03/19/2025] [Accepted: 03/26/2025] [Indexed: 04/18/2025] Open
Abstract
The mechanisms that influence human longevity are complex and operate on cellular, tissue, and organismal levels. To better understand the tissue-level mechanisms, we compared the organization of cell proliferation, differentiation, and cytoprotective protein expression in the squamous epithelium of the esophagus between mammals with varying lifespans. Humans are the only species with a quiescent basal stem cell layer that is distinctly physically separated from parabasal transit-amplifying cells. In addition to these stark differences in the organization of proliferation, human squamous epithelial stem cells express DNA repair-related markers, such as MECP2 and XPC, which are absent or low in mouse basal cells. Furthermore, we investigated whether the transition from basal to suprabasal is different between species. In humans, the parabasal cells seem to originate from cells detaching from the basement membrane, and these can already begin to proliferate while delaminating. In most other species, delaminating cells have been rare or their proliferation rate is different from that of their human counterparts, indicating an alternative mode of how stem cells maintain the tissue. In humans, the combination of an elevated cytoprotective signature and novel tissue organization may enhance resistance to aging and prevent cancer. Our results point to enhanced cellular cytoprotection and a tissue architecture which separates stemness and proliferation. These are both potential factors contributing to the increased fitness of human squamous epithelia to support longevity by suppressing tumorigenesis. However, the organization of canine oral mucosa shows some similarities to that of human tissue and may provide a useful model to understand the relationship between tissue architecture, gene expression regulation, tumor suppression, and longevity.
Affiliation(s)
- Ashlee Harris
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32826, USA
- Kaylee Burnham
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32826, USA
- Ram Pradhyumnan
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32826, USA
- Arthi Jaishankar
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32826, USA
- Lari Häkkinen
  - Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, BC V6T 1Z1, Canada
- Rafael E. Góngora-Rosero
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32826, USA
- Yelena Piazza
  - College of Medicine, University of Central Florida, Orlando, FL 32827, USA
- Claudia D. Andl
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32826, USA
- Thomas Andl
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32826, USA
5
Yang Y, Badura ML, O'Leary PC, Delavan HM, Robinson TM, Egusa EA, Zhong X, Swinderman JT, Li H, Zhang M, Kim M, Ashworth A, Feng FY, Chou J, Yang L. Transcription and DNA replication collisions lead to large tandem duplications and expose targetable therapeutic vulnerabilities in cancer. Nat Cancer 2024; 5:1885-1901. [PMID: 39558146 PMCID: PMC11671220 DOI: 10.1038/s43018-024-00848-4] [Received: 01/24/2023] [Accepted: 10/04/2024] [Indexed: 11/20/2024]
Abstract
Despite the abundance of somatic structural variations (SVs) in cancer, the underlying molecular mechanisms of their formation remain unclear. In the present study, we used 6,193 whole-genome sequenced tumors to study the contributions of transcription and DNA replication collisions to genome instability. After deconvoluting robust SV signatures in three independent pan-cancer cohorts, we detected transcription-dependent, replicated-strand bias, the expected footprint of transcription-replication collision (TRC), in large tandem duplications (TDs). Large TDs are abundant in female-enriched, upper gastrointestinal tract and prostate cancers. They are associated with poor patient survival and mutations in TP53, CDK12 and SPOP. Upon inactivating CDK12, cells display significantly more TRCs, R-loops and large TDs. Inhibition of WEE1, CHK1 and ATR selectively inhibits the growth of cells deficient in CDK12. Our data suggest that large TDs in cancer form as a result of TRCs and their presence can be used as a biomarker for prognosis and treatment.
Affiliation(s)
- Yang Yang
  - Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
- Michelle L Badura
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Departments of Radiation Oncology and Urology, University of California, San Francisco, San Francisco, CA, USA
- Patrick C O'Leary
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Henry M Delavan
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Troy M Robinson
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Departments of Radiation Oncology and Urology, University of California, San Francisco, San Francisco, CA, USA
- Emily A Egusa
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Departments of Radiation Oncology and Urology, University of California, San Francisco, San Francisco, CA, USA
- Xiaoming Zhong
  - Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
- Jason T Swinderman
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Departments of Radiation Oncology and Urology, University of California, San Francisco, San Francisco, CA, USA
- Haolong Li
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Departments of Radiation Oncology and Urology, University of California, San Francisco, San Francisco, CA, USA
- Meng Zhang
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Departments of Radiation Oncology and Urology, University of California, San Francisco, San Francisco, CA, USA
- Minkyu Kim
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA
- Alan Ashworth
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Felix Y Feng
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Departments of Radiation Oncology and Urology, University of California, San Francisco, San Francisco, CA, USA
  - Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Jonathan Chou
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
  - Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Lixing Yang
  - Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - University of Chicago Comprehensive Cancer Center, Chicago, IL, USA
6
Shojaeisaadi H, Schoenrock A, Meier MJ, Williams A, Norris JM, Palmer ND, Yauk CL, Marchetti F. Mutational signature analyses in multi-child families reveal sources of age-related increases in human germline mutations. Commun Biol 2024; 7:1451. [PMID: 39506086 PMCID: PMC11541588 DOI: 10.1038/s42003-024-07140-2] [Received: 01/10/2024] [Accepted: 10/24/2024] [Indexed: 11/08/2024] Open
Abstract
Whole-genome sequencing studies of parent-offspring trios have provided valuable insights into the potential impact of de novo mutations (DNMs) on human health and disease. However, the molecular mechanisms that drive DNMs are unclear. Studies with multi-child families can provide important insight into the causes of inter-family variability in DNM rates but they are highly limited. We characterized 2479 de novo single nucleotide variants (SNVs) in 13 multi-child families of Mexican-American ethnicity. We observed a strong paternal age effect on validated de novo SNVs with extensive inter-family variability in the yearly rate of increase. Children of older fathers showed more C>T transitions at CpG sites than children from younger fathers. Validated SNVs were examined against one cancer (COSMIC) and two non-cancer (human germline and CRISPR-Cas9 knockout of human DNA repair genes) mutational signature databases. These analyses suggest that inaccurate DNA mismatch repair during repair initiation and excision processes, along with DNA damage and replication errors, are major sources of human germline de novo SNVs. Our findings provide important information for understanding the potential sources of human germline de novo SNVs and the critical role of DNA mismatch repair in their genesis.
Affiliation(s)
- Andrew Schoenrock
  - Environmental Health Science and Research Bureau, Health Canada, Ottawa, ON, Canada
  - Research Computing Services, Carleton University, Ottawa, ON, Canada
- Matthew J Meier
  - Environmental Health Science and Research Bureau, Health Canada, Ottawa, ON, Canada
- Andrew Williams
  - Environmental Health Science and Research Bureau, Health Canada, Ottawa, ON, Canada
- Jill M Norris
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Nicholette D Palmer
  - Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Carole L Yauk
  - Department of Biology, University of Ottawa, Ottawa, ON, Canada
- Francesco Marchetti
  - Environmental Health Science and Research Bureau, Health Canada, Ottawa, ON, Canada
7
Maury EA, Jones A, Seplyarskiy V, Nguyen TTL, Rosenbluh C, Bae T, Wang Y, Abyzov A, Khoshkhoo S, Chahine Y, Zhao S, Venkatesh S, Root E, Voloudakis G, Roussos P, Park PJ, Akbarian S, Brennand K, Reilly S, Lee EA, Sunyaev SR, Walsh CA, Chess A. Somatic mosaicism in schizophrenia brains reveals prenatal mutational processes. Science 2024; 386:217-224. [PMID: 39388546 PMCID: PMC11490355 DOI: 10.1126/science.adq1456] [Received: 05/01/2024] [Accepted: 08/16/2024] [Indexed: 10/12/2024]
Abstract
Germline mutations modulate the risk of developing schizophrenia (SCZ). Much less is known about the role of mosaic somatic mutations in the context of SCZ. Deep (239×) whole-genome sequencing (WGS) of brain neurons from 61 SCZ cases and 25 controls postmortem identified mutations occurring during prenatal neurogenesis. SCZ cases showed increased somatic variants in open chromatin, with increased mosaic CpG transversions (CpG>GpG) and T>G mutations at transcription factor binding sites (TFBSs) overlapping open chromatin, a result not seen in controls. Some of these variants alter gene expression, including SCZ risk genes and genes involved in neurodevelopment. Although these mutational processes can reflect a difference in factors indirectly involved in disease, increased somatic mutations at developmental TFBSs could also potentially contribute to SCZ.
Affiliation(s)
- Eduardo A. Maury
  - Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children’s Hospital, Boston, MA 02115, USA
  - Bioinformatics & Integrative Genomics Program and Harvard/MIT MD-PHD Program, Harvard Medical School, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Attila Jones
  - Department of Cell, Developmental & Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Vladimir Seplyarskiy
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
  - Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Thanh Thanh L. Nguyen
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
  - Department of Psychiatry, Yale School of Medicine, New Haven, CT 06520, USA
- Chaggai Rosenbluh
  - Department of Cell, Developmental & Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Taejong Bae
  - Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Yifan Wang
  - Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Alexej Abyzov
  - Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Sattar Khoshkhoo
  - Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children’s Hospital, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Neurology, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Yasmine Chahine
  - Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children’s Hospital, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sijing Zhao
  - Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children’s Hospital, Boston, MA 02115, USA
- Sanan Venkatesh
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Elise Root
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
- Georgios Voloudakis
  - Center for Disease Neurogenomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Panagiotis Roussos
  - Center for Disease Neurogenomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Peter J. Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Schahram Akbarian
  - Department of Psychiatry and Neuroscience, Friedman Brain Institute, Mount Sinai, New York, NY 10029, USA
  - Department of Neuroscience, Friedman Brain Institute, Mount Sinai, New York, NY 10029, USA
- Kristen Brennand
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
  - Department of Psychiatry, Yale School of Medicine, New Haven, CT 06520, USA
- Steven Reilly
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
- Eunjung A. Lee
  - Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children’s Hospital, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Shamil R. Sunyaev
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
  - Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Christopher A. Walsh
  - Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children’s Hospital, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Departments of Pediatrics and Neurology, Harvard Medical School, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Boston Children’s Hospital, Boston, MA 02115, USA
- Andrew Chess
  - Department of Cell, Developmental & Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Department of Neuroscience, Friedman Brain Institute, Mount Sinai, New York, NY 10029, USA
8
Liang X, Yang S, Wang D, Knief U. Characterization and distribution of de novo mutations in the zebra finch. Commun Biol 2024; 7:1243. [PMID: 39358581 PMCID: PMC11447093 DOI: 10.1038/s42003-024-06945-5] [Received: 04/22/2024] [Accepted: 09/24/2024] [Indexed: 10/04/2024] Open
Abstract
Germline de novo mutations (DNMs) provide the raw material for evolution. The DNM rate varies considerably between species, sexes and chromosomes. Here, we identify DNMs in the zebra finch (Taeniopygia guttata) across 16 parent-offspring trios using two genome assemblies of different quality. Using an independent genotyping assay, we validate 82% of the 150 candidate DNMs. DNM rates are consistent between both assemblies, with estimates of 6.14 × 10⁻⁹ and 6.36 × 10⁻⁹ per site per generation. We observe a strong paternal bias in DNM rates (male-to-female ratio α ≈ 4), but this bias is in transition mutations only, leading to a transition-to-transversion ratio of 3.18 and 3.57. Finally, we find that DNMs tend to be randomly distributed across chromosomes, not associated with recombination hotspots or genic regions. However, the sex chromosome chrZ shows a roughly fourfold increased DNM rate compared to autosomes, which is more than the expected increase due to chrZ spending two-thirds of its time in males. Overall, our results further enhance our understanding of DNMs in passerine songbirds.
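The "expected increase" mentioned at the end of this abstract can be worked out from the two numbers it gives: a male-to-female mutation ratio α and the fact that a Z chromosome spends two-thirds of its time in males, while an autosome spends half. The sketch below is an illustration under the assumption that the per-generation rate is a simple time-weighted average of the male and female rates; the function name is hypothetical and this is not the authors' calculation.

```python
def expected_z_to_autosome_ratio(alpha: float) -> float:
    """Expected chrZ/autosome mutation-rate ratio under male bias `alpha`.

    Assumes the female rate is 1 and the male rate is `alpha`, and that the
    rate on a chromosome is the average over the time it spends in each sex:
    chrZ is in males 2/3 of the time, an autosome 1/2 of the time.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    z_rate = (2.0 / 3.0) * alpha + (1.0 / 3.0)
    autosome_rate = 0.5 * alpha + 0.5
    return z_rate / autosome_rate


if __name__ == "__main__":
    # With alpha = 4 the weighted averages are 3.0 (chrZ) and 2.5 (autosomes).
    print(expected_z_to_autosome_ratio(4.0))
```

Under these assumptions α = 4 gives an expected ratio of about 1.2, well short of the roughly fourfold increase the abstract reports, which is why the authors describe the chrZ rate as higher than expected.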
Affiliation(s)
- Xixi Liang
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Shuai Yang
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Daiping Wang
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Ulrich Knief
  - Evolutionary Biology & Ecology, Faculty of Biology, University of Freiburg, Freiburg, Germany
9
Schraiber JG, Spence JP, Edge MD. Estimation of demography and mutation rates from one million haploid genomes. bioRxiv 2024:2024.09.18.613708. [PMID: 39345369 PMCID: PMC11429810 DOI: 10.1101/2024.09.18.613708] [Indexed: 10/01/2024]
Abstract
As genetic sequencing costs have plummeted, datasets with sizes previously unthinkable have begun to appear. Such datasets present new opportunities to learn about evolutionary history, particularly via rare alleles that record the very recent past. However, beyond the computational challenges inherent in the analysis of many large-scale datasets, large population-genetic datasets present theoretical problems. In particular, the majority of population-genetic tools require the assumption that each mutant allele in the sample is the result of a single mutation (the "infinite sites" assumption), which is violated in large samples. Here, we present DR EVIL, a method for estimating mutation rates and recent demographic history from very large samples. DR EVIL avoids the infinite-sites assumption by using a diffusion approximation to a branching-process model with recurrent mutation. The branching-process approach limits the method to rare alleles, but, along with recent results, renders tractable likelihoods with recurrent mutation. We show that DR EVIL performs well in simulations and apply it to rare-variant data from a million haploid samples, identifying a signal of mutation-rate heterogeneity within commonly analyzed classes and predicting that in modern sample sizes, most rare variants at sites with high mutation rates represent the descendants of multiple mutation events.
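Why the infinite-sites assumption breaks down in large samples can be seen with a toy calculation. The sketch below is not DR EVIL and does not use its branching-process model; it assumes, purely for illustration, that the number of mutation events at a site on the sample genealogy is Poisson with mean `lam` (per-site mutation rate times total branch length), and asks what fraction of variable sites carry more than one event. The function name and the Poisson assumption are ours.

```python
import math


def fraction_recurrent(lam: float) -> float:
    """P(at least 2 mutation events | at least 1 event) under a Poisson(lam) count.

    `lam` is a hypothetical expected number of mutation events at one site
    across the whole sample genealogy.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    p_at_least_one = -math.expm1(-lam)               # 1 - exp(-lam)
    p_exactly_one = lam * math.exp(-lam)
    return (p_at_least_one - p_exactly_one) / p_at_least_one


if __name__ == "__main__":
    for lam in (0.01, 0.1, 1.0, 3.0):
        print(lam, round(fraction_recurrent(lam), 4))
```

For small `lam` the fraction is close to `lam / 2`, so recurrent mutation is negligible. As the sample grows, the genealogy's total branch length grows with it, `lam` rises, and the fraction approaches 1 at sites with high mutation rates.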
|
10
|
Fan WTL, Wakeley J. Latent mutations in the ancestries of alleles under selection. Theor Popul Biol 2024; 158:1-20. [PMID: 38697365 DOI: 10.1016/j.tpb.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 04/23/2024] [Accepted: 04/29/2024] [Indexed: 05/05/2024]
Abstract
We consider a single genetic locus with two alleles A1 and A2 in a large haploid population. The locus is subject to selection and two-way, or recurrent, mutation. Assuming the allele frequencies follow a Wright-Fisher diffusion and have reached stationarity, we describe the asymptotic behaviors of the conditional gene genealogy and the latent mutations of a sample with known allele counts, when the count n1 of allele A1 is fixed, and when either or both the sample size n and the selection strength |α| tend to infinity. Our study extends previous work under neutrality to the case of non-neutral rare alleles, asserting that when selection is not too strong relative to the sample size, even if it is strongly positive or strongly negative in the usual sense (α→-∞ or α→+∞), the number of latent mutations of the n1 copies of allele A1 follows the same distribution as the number of alleles in the Ewens sampling formula. On the other hand, very strong positive selection relative to the sample size leads to neutral gene genealogies with a single ancient latent mutation. We also demonstrate robustness of our asymptotic results against changing population sizes, when one of |α| or n is large.
Affiliation(s)
- Wai-Tong Louis Fan
- Department of Mathematics, Indiana University, 831 East 3rd St, Bloomington, 47405, IN, USA; Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Ave, Cambridge, 02138, MA, USA.
| | - John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Ave, Cambridge, 02138, MA, USA.
| |
|
11
|
Efimenko B, Popadin K, Gunbin K. NeMu: a comprehensive pipeline for accurate reconstruction of neutral mutation spectra from evolutionary data. Nucleic Acids Res 2024; 52:W108-W115. [PMID: 38795067 PMCID: PMC11223800 DOI: 10.1093/nar/gkae438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 04/23/2024] [Accepted: 05/09/2024] [Indexed: 05/27/2024] Open
Abstract
The recognized importance of mutational spectra in molecular evolution is yet to be fully exploited beyond human cancer studies and model organisms. The wealth of intraspecific polymorphism data in the GenBank repository, covering a broad spectrum of genes and species, presents an untapped opportunity for detailed mutational spectrum analysis. Existing methods fall short by ignoring intermediate substitutions on the inner branches of phylogenetic trees and lacking the capability for cross-species mutational comparisons. To address these challenges, we present the NeMu pipeline, available at https://nemu-pipeline.com, a tool grounded in phylogenetic principles designed to provide comprehensive and scalable analysis of mutational spectra. Utilizing extensive sequence data from numerous available genome projects, NeMu rapidly and accurately reconstructs the neutral mutational spectrum. This tool, facilitating the reconstruction of gene- and species-specific mutational spectra, contributes to a deeper understanding of evolutionary mechanisms across the broad spectrum of known species.
Affiliation(s)
- Bogdan Efimenko
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
| | - Konstantin Popadin
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Konstantin Gunbin
- Center for Mitochondrial Functional Genomics, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
- Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, Russia
| |
|
12
|
Spisak N, de Manuel M, Milligan W, Sella G, Przeworski M. The clock-like accumulation of germline and somatic mutations can arise from the interplay of DNA damage and repair. PLoS Biol 2024; 22:e3002678. [PMID: 38885262 PMCID: PMC11213356 DOI: 10.1371/journal.pbio.3002678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 06/28/2024] [Accepted: 05/14/2024] [Indexed: 06/20/2024] Open
Abstract
The rates at which mutations accumulate across human cell types vary. To identify causes of this variation, mutations are often decomposed into a combination of the single-base substitution (SBS) "signatures" observed in germline, soma, and tumors, with the idea that each signature corresponds to one or a small number of underlying mutagenic processes. Two such signatures turn out to be ubiquitous across cell types: SBS signature 1, which consists primarily of transitions at methylated CpG sites thought to be caused by spontaneous deamination, and the more diffuse SBS signature 5, which is of unknown etiology. In cancers, the number of mutations attributed to these 2 signatures accumulates linearly with age of diagnosis, and thus the signatures have been termed "clock-like." To better understand this clock-like behavior, we develop a mathematical model that includes DNA replication errors, unrepaired damage, and damage repaired incorrectly. We show that mutational signatures can exhibit clock-like behavior because cell divisions occur at a constant rate and/or because damage rates remain constant over time, and that these distinct sources can be teased apart by comparing cell lineages that divide at different rates. With this goal in mind, we analyze the rate of accumulation of mutations in multiple cell types, including soma as well as male and female germline. We find no detectable increase in SBS signature 1 mutations in neurons and only a very weak increase in mutations assigned to the female germline, but a significant increase with time in rapidly dividing cells, suggesting that SBS signature 1 is driven by rounds of DNA replication occurring at a relatively fixed rate. In contrast, SBS signature 5 increases with time in all cell types, including postmitotic ones, indicating that it accumulates independently of cell divisions; this observation points to errors in DNA repair as the key underlying mechanism. 
Thus, the two "clock-like" signatures observed across cell types likely have distinct origins, one set by rates of cell division, the other by damage rates.
Affiliation(s)
- Natanael Spisak
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
| | - Marc de Manuel
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
| | - William Milligan
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
| | - Guy Sella
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Program for Mathematical Genomics, Columbia University, New York, New York, United States of America
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
| |
|
13
|
Zhang Y, Ahsan MU, Wang K. Noncoding de novo mutations in SCN2A are associated with autism spectrum disorders. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.05.24306908. [PMID: 38766206 PMCID: PMC11100849 DOI: 10.1101/2024.05.05.24306908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Coding de novo mutations (DNMs) contribute to the risk for autism spectrum disorders (ASD), but the contribution of noncoding DNMs remains relatively unexplored. Here we use whole genome sequencing (WGS) data of 12,411 individuals (including 3,508 probands and 2,218 unaffected siblings) from 3,357 families collected in the Simons Foundation Powering Autism Research for Knowledge (SPARK) cohort to detect DNMs associated with ASD, while examining the Simons Simplex Collection (SSC), with 6,383 individuals from 2,274 families, to replicate the results. For coding DNMs, SCN2A reached exome-wide significance (p=2.06×10⁻¹¹) in SPARK. The 618 known dominant ASD genes as a group are strongly enriched for coding DNMs in cases relative to sibling controls (fold change=1.51, p=1.13×10⁻⁵ for SPARK; fold change=1.86, p=2.06×10⁻⁹ for SSC). For noncoding DNMs, we used two methods to assess statistical significance: a point-based test that analyzes sites with a Combined Annotation Dependent Depletion (CADD) score ≥15, and a segment-based test that analyzes 1 kb genomic segments with segment-specific background mutation rates (inferred from expected rare mutations in Gnocchi genome constraint scores). The point-based test identified SCN2A as marginally significant (p=6.12×10⁻⁴) in SPARK, yet the segment-based test identified CSMD1, RBFOX1 and CHD13 as exome-wide significant. We did not identify significant enrichment of noncoding DNMs (in all 1 kb segments or those with Gnocchi>4) in the 618 known ASD genes as a group in cases relative to sibling controls. When combining evidence from both coding and noncoding DNMs, we found that SCN2A with 11 coding and 5 noncoding DNMs exhibited the strongest significance (p=4.15×10⁻¹³). In summary, we identified both coding and noncoding DNMs in SCN2A associated with ASD, while nominating additional candidates for further examination in future studies.
Affiliation(s)
- Yuan Zhang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
14
|
Yang Y, Badura ML, O’Leary PC, Delavan HM, Robinson TM, Egusa EA, Zhong X, Swinderman JT, Li H, Zhang M, Kim M, Ashworth A, Feng FY, Chou J, Yang L. Large tandem duplications in cancer result from transcription and DNA replication collisions. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.05.17.23290140. [PMID: 38260434 PMCID: PMC10802642 DOI: 10.1101/2023.05.17.23290140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Despite the abundance of somatic structural variations (SVs) in cancer, the underlying molecular mechanisms of their formation remain unclear. Here, we use 6,193 whole-genome sequenced tumors to study the contributions of transcription and DNA replication collisions to genome instability. After deconvoluting robust SV signatures in three independent pan-cancer cohorts, we detect transcription-dependent replicated-strand bias, the expected footprint of transcription-replication collision (TRC), in large tandem duplications (TDs). Large TDs are abundant in female-enriched, upper gastrointestinal tract and prostate cancers. They are associated with poor patient survival and mutations in TP53, CDK12, and SPOP. Upon inactivating CDK12, cells display significantly more TRCs, R-loops, and large TDs. Inhibition of G2/M checkpoint proteins, such as WEE1, CHK1, and ATR, selectively inhibits the growth of cells deficient in CDK12. Our data suggest that large TDs in cancer form due to TRCs, and their presence can be used as a biomarker for prognosis and treatment.
Affiliation(s)
- Yang Yang
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
| | - Michelle L. Badura
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Departments of Radiation Oncology and Urology, University of California, San Francisco, CA, USA
| | - Patrick C. O’Leary
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
| | - Henry M. Delavan
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, CA, USA
| | - Troy M. Robinson
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Departments of Radiation Oncology and Urology, University of California, San Francisco, CA, USA
| | - Emily A. Egusa
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Departments of Radiation Oncology and Urology, University of California, San Francisco, CA, USA
| | - Xiaoming Zhong
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
| | - Jason T. Swinderman
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Departments of Radiation Oncology and Urology, University of California, San Francisco, CA, USA
| | - Haolong Li
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Departments of Radiation Oncology and Urology, University of California, San Francisco, CA, USA
| | - Meng Zhang
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Departments of Radiation Oncology and Urology, University of California, San Francisco, CA, USA
| | - Minkyu Kim
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Department of Cellular Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
| | - Alan Ashworth
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, CA, USA
| | - Felix Y. Feng
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Departments of Radiation Oncology and Urology, University of California, San Francisco, CA, USA
- Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, CA, USA
| | - Jonathan Chou
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA
- Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, CA, USA
| | - Lixing Yang
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- University of Chicago Comprehensive Cancer Center, Chicago, IL, USA
| |
|
15
|
Findlay SD, Romo L, Burge CB. Quantifying negative selection in human 3' UTRs uncovers constrained targets of RNA-binding proteins. Nat Commun 2024; 15:85. [PMID: 38168060 PMCID: PMC10762232 DOI: 10.1038/s41467-023-44456-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/13/2022] [Accepted: 12/14/2023] [Indexed: 01/05/2024] Open
Abstract
Many non-coding variants associated with phenotypes occur in 3' untranslated regions (3' UTRs), and may affect interactions with RNA-binding proteins (RBPs) to regulate gene expression post-transcriptionally. However, identifying functional 3' UTR variants has proven difficult. We use allele frequencies from the Genome Aggregation Database (gnomAD) to identify classes of 3' UTR variants under strong negative selection in humans. We develop intergenic mutability-adjusted proportion singleton (iMAPS), a generalized measure related to MAPS, to quantify negative selection in non-coding regions. This approach, in conjunction with in vitro and in vivo binding data, identifies precise RBP binding sites, miRNA target sites, and polyadenylation signals (PASs) under strong selection. For each class of sites, we identify thousands of gnomAD variants under selection comparable to missense coding variants, and find that sites in core 3' UTR regions upstream of the most-used PAS are under strongest selection. Together, this work improves our understanding of selection on human genes and validates approaches for interpreting genetic variants in human 3' UTRs.
Affiliation(s)
- Scott D Findlay
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
| | - Lindsay Romo
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Boston Children's Hospital, Boston, MA, 02115, USA
| | - Christopher B Burge
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA.
| |
|
16
|
Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD, Poterba T, Wilson MW, Tarasova Y, Phu W, Grant R, Yohannes MT, Koenig Z, Farjoun Y, Banks E, Donnelly S, Gabriel S, Gupta N, Ferriera S, Tolonen C, Novod S, Bergelson L, Roazen D, Ruano-Rubio V, Covarrubias M, Llanwarne C, Petrillo N, Wade G, Jeandet T, Munshi R, Tibbetts K, O'Donnell-Luria A, Solomonson M, Seed C, Martin AR, Talkowski ME, Rehm HL, Daly MJ, Tiao G, Neale BM, MacArthur DG, Karczewski KJ. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 2024; 625:92-100. [PMID: 38057664 PMCID: PMC11629659 DOI: 10.1038/s41586-023-06045-0] [Citation(s) in RCA: 430] [Impact Index Per Article: 430.0] [Received: 03/16/2022] [Accepted: 04/03/2023] [Indexed: 12/08/2023]
Abstract
The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders, but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
Affiliation(s)
- Siwei Chen
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
| | - Laurent C Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Julia K Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ryan L Collins
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
| | - Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Qingbo Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Jessica Alföldi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Nicholas A Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Christopher Vittal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Laura D Gauthier
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Timothy Poterba
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Michael W Wilson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Yekaterina Tarasova
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - William Phu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Riley Grant
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Mary T Yohannes
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Zan Koenig
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yossi Farjoun
- Richards Lab, Lady Davis Institute, Montreal, Quebec, Canada
| | - Eric Banks
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Stacey Gabriel
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Namrata Gupta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Steven Ferriera
- Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Charlotte Tolonen
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Sam Novod
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Louis Bergelson
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - David Roazen
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Miguel Covarrubias
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Nikelle Petrillo
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gordon Wade
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Thibault Jeandet
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ruchi Munshi
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Kathleen Tibbetts
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Cotton Seed
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Alicia R Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
17
|
Volpe E, Corda L, Tommaso ED, Pelliccia F, Ottalevi R, Licastro D, Guarracino A, Capulli M, Formenti G, Tassone E, Giunta S. The complete diploid reference genome of RPE-1 identifies human phased epigenetic landscapes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.01.565049. [PMID: 38168337 PMCID: PMC10760208 DOI: 10.1101/2023.11.01.565049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Comparative analysis of recent human genome assemblies highlights profound sequence divergence that peaks within polymorphic loci such as centromeres. This raises the question of whether human reference genomes are adequate for accurately analyzing sequencing data derived from experimental cell lines. Here, we generated the complete diploid genome assembly for the human retinal epithelial cells (RPE-1), a widely used non-cancer laboratory cell line with a stable karyotype, to use as a matched reference for multi-omics sequencing data analysis. Our RPE1v1.0 assembly presents completely phased haplotypes and chromosome-level scaffolds that span centromeres with ultra-high base accuracy (>QV60). We mapped the haplotype-specific genomic variation specific to this cell line, including t(Xq;10q), a stable 73.18 Mb duplication of chromosome 10 translocated onto the microdeleted chromosome X telomere. Polymorphisms between haplotypes of the same genome reveal genetic and epigenetic variation for all chromosomes, especially at centromeres. The RPE-1 assembly as a matched reference genome improves mapping quality of multi-omics reads originating from RPE-1 cells, with a drastic reduction in alignment mismatches compared to using the most complete human reference to date (CHM13). Leveraging the accuracy achieved using a matched reference, we were able to identify the kinetochore sites at base pair resolution and show unprecedented variation between haplotypes. This work showcases the use of matched reference genomes for multiomics analyses and serves as the foundation for a call to comprehensively assemble experimentally relevant cell lines for widespread application.
Affiliation(s)
- Emilia Volpe
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
| | - Luca Corda
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
| | - Elena Di Tommaso
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
| | - Franca Pelliccia
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
| | - Riccardo Ottalevi
- Department of Bioinformatic, Dante Genomics Corp Inc., 667 Madison Avenue, New York, NY 10065 USA and S.s.17, 67100, L’Aquila, Italy
| | | | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Mattia Capulli
- Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy
| | - Giulio Formenti
- The Rockefeller University, 1230 York Avenue, 10065 New York, USA
| | - Evelyne Tassone
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
| | - Simona Giunta
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
|
18
|
A biology-aware mutation rate model for human germline. Nat Genet 2023; 55:2033-2034. [PMID: 38040830 DOI: 10.1038/s41588-023-01564-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2023]
|
19
|
Seplyarskiy V, Koch EM, Lee DJ, Lichtman JS, Luan HH, Sunyaev SR. A mutation rate model at the basepair resolution identifies the mutagenic effect of polymerase III transcription. Nat Genet 2023; 55:2235-2242. [PMID: 38036792 PMCID: PMC11348951 DOI: 10.1038/s41588-023-01562-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/20/2022] [Accepted: 10/06/2023] [Indexed: 12/02/2023]
Abstract
De novo mutations occur at substantially different rates depending on genomic location, sequence context and DNA strand. The success of methods to estimate selection intensity, infer demographic history and map rare disease genes, depends strongly on assumptions about the local mutation rate. Here we present Roulette, a genome-wide mutation rate model at basepair resolution that incorporates known determinants of local mutation rate. Roulette is shown to be more accurate than existing models. We use Roulette to refine the estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a tenfold increase in mutation rate in nearly all genes transcribed by polymerase III (Pol III), suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively used in testis and residing in promoters.
Affiliation(s)
- Vladimir Seplyarskiy
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
| | - Evan M Koch
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
| | - Daniel J Lee
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA
| | - Joshua S Lichtman
- NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
- Soleil Labs, South San Francisco, CA, USA
| | - Harding H Luan
- NGM Biopharmaceuticals Inc., South San Francisco, CA, USA
- Soleil Labs, South San Francisco, CA, USA
| | - Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Brigham and Women's Hospital, Division of Genetics, Harvard Medical School, Boston, MA, USA.
|
20
|
Spisak N, de Manuel M, Milligan W, Sella G, Przeworski M. Disentangling sources of clock-like mutations in germline and soma. bioRxiv 2023:2023.09.07.556720. [PMID: 37745549 PMCID: PMC10515775 DOI: 10.1101/2023.09.07.556720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
The rates of mutations vary across cell types. To identify causes of this variation, mutations are often decomposed into a combination of the single base substitution (SBS) "signatures" observed in germline, soma and tumors, with the idea that each signature corresponds to one or a small number of underlying mutagenic processes. Two such signatures turn out to be ubiquitous across cell types: SBS signature 1, which consists primarily of transitions at methylated CpG sites caused by spontaneous deamination, and the more diffuse SBS signature 5, which is of unknown etiology. In cancers, the number of mutations attributed to these two signatures accumulates linearly with age of diagnosis, and thus the signatures have been termed "clock-like." To better understand this clock-like behavior, we develop a mathematical model that includes DNA replication errors, unrepaired damage, and damage repaired incorrectly. We show that mutational signatures can exhibit clock-like behavior because cell divisions occur at a constant rate and/or because damage rates remain constant over time, and that these distinct sources can be teased apart by comparing cell lineages that divide at different rates. With this goal in mind, we analyze the rate of accumulation of mutations in multiple cell types, including soma as well as male and female germline. We find no detectable increase in SBS signature 1 mutations in neurons and only a very weak increase in mutations assigned to the female germline, but a significant increase with time in rapidly-dividing cells, suggesting that SBS signature 1 is driven by rounds of DNA replication occurring at a relatively fixed rate. In contrast, SBS signature 5 increases with time in all cell types, including post-mitotic ones, indicating that it accumulates independently of cell divisions; this observation points to errors in DNA repair as the key underlying mechanism. 
Thus, the two "clock-like" signatures observed across cell types likely have distinct origins, one set by rates of cell division, the other by damage rates.
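The distinction drawn in this abstract can be illustrated with a deliberately simplified toy calculation. It is not the authors' model: it assumes mutations come from only two additive sources, replication errors (proportional to the number of cell divisions) and damage-derived mutations (proportional to elapsed time), and every parameter name and value below is hypothetical.

```python
def slope_per_year(divisions_per_year: float,
                   errors_per_division: float,
                   damage_mutations_per_year: float) -> float:
    """Yearly mutation accumulation rate under the two-source toy model."""
    return divisions_per_year * errors_per_division + damage_mutations_per_year


def expected_mutations(age_years: float,
                       divisions_per_year: float,
                       errors_per_division: float,
                       damage_mutations_per_year: float) -> float:
    """Expected mutation count at a given age; linear in age when rates are constant."""
    return age_years * slope_per_year(
        divisions_per_year, errors_per_division, damage_mutations_per_year
    )


if __name__ == "__main__":
    # Hypothetical lineages that differ only in how often they divide.
    for name, divisions in [("post-mitotic", 0.0), ("slow", 5.0), ("fast", 50.0)]:
        print(name, slope_per_year(divisions, 0.1, 2.0))
```

In this sketch every lineage accumulates mutations linearly with age, so all of them look "clock-like". Only the comparison across lineages with different division rates separates the replication term from the damage term, which is the logic the abstract describes.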
Affiliation(s)
- Natanael Spisak
- Department of Biological Sciences, Columbia University, New York, United States
| | - Marc de Manuel
- Department of Biological Sciences, Columbia University, New York, United States
| | - William Milligan
- Department of Biological Sciences, Columbia University, New York, United States
| | - Guy Sella
- Department of Biological Sciences, Columbia University, New York, United States
- Program for Mathematical Genomics, Columbia University, New York, United States
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
|
21
|
Lee YL, Bouwman AC, Harland C, Bosse M, Costa Monteiro Moreira G, Veerkamp RF, Mullaart E, Cambisano N, Groenen MAM, Karim L, Coppieters W, Georges M, Charlier C. The rate of de novo structural variation is increased in in vitro-produced offspring and preferentially affects the paternal genome. Genome Res 2023; 33:1455-1464. [PMID: 37793781 PMCID: PMC10620045 DOI: 10.1101/gr.277884.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 08/08/2023] [Indexed: 10/06/2023]
Abstract
Assisted reproductive technologies (ARTs), including in vitro maturation and fertilization (IVF), are increasingly used in human and animal reproduction. Whether these technologies directly affect the rate of de novo mutation (DNM), and to what extent, has been a matter of debate. Here we take advantage of domestic cattle, characterized by complex pedigrees that are ideally suited to detect DNMs and by the systematic use of ART, to study the rate of de novo structural variation (dnSV) in this species and how it is impacted by IVF. By exploiting features of associated de novo point mutations (dnPMs) and dnSVs in clustered DNMs, we provide strong evidence that (1) IVF increases the rate of dnSV approximately fivefold, and (2) the corresponding mutations occur during the very early stages of embryonic development (one- and two-cell stage), yet primarily affect the paternal genome.
Affiliation(s)
- Young-Lim Lee
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium;
- Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
| | - Aniek C Bouwman
- Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
| | - Chad Harland
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
- Livestock Improvement Corporation, Hamilton 3240, New Zealand
| | - Mirte Bosse
- Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
| | | | - Roel F Veerkamp
- Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
| | | | - Nadine Cambisano
- GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
| | - Martien A M Groenen
- Wageningen University and Research, Animal Breeding, and Genomics, 6708 WG Wageningen, The Netherlands
| | - Latifa Karim
- GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
| | - Wouter Coppieters
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium
- GIGA Genomics Platform, GIGA Institute, University of Liège, B-4000 Liège, Belgium
| | - Michel Georges
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium;
| | - Carole Charlier
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, B-4000 Liège, Belgium;
|
22
|
Lin Y, Darolti I, van der Bijl W, Morris J, Mank JE. Extensive variation in germline de novo mutations in Poecilia reticulata. Genome Res 2023; 33:1317-1324. [PMID: 37442578 PMCID: PMC10547258 DOI: 10.1101/gr.277936.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 07/07/2023] [Indexed: 07/15/2023]
Abstract
The rate of germline mutation is fundamental to evolutionary processes, as it generates the variation upon which selection acts. The guppy, Poecilia reticulata, is a model of rapid adaptation; however, the relative contribution of standing genetic variation versus de novo mutation (DNM) to evolution in this species remains unclear. Here, we use pedigree-based approaches to quantify and characterize germline DNMs in three large guppy families. Our results suggest that the germline mutation rate in the guppy varies substantially across individuals and families. Most DNMs are shared across multiple siblings, suggesting they arose during early embryonic development. DNMs are randomly distributed throughout the genome, and the male-biased mutation rate is low, as would be expected from the short guppy generation time. Overall, our study shows remarkable variation in germline mutation rate and provides insights into rapid evolution of guppies.
Affiliation(s)
- Yuying Lin
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada;
| | - Iulia Darolti
- Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
| | - Wouter van der Bijl
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Jake Morris
- School of Biological Science, University of Bristol, Bristol BS8 1TQ, United Kingdom
| | - Judith E Mank
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
|
23
|
Wakeley J, Fan WT(L), Koch E, Sunyaev S. Recurrent mutation in the ancestry of a rare variant. Genetics 2023; 224:iyad049. [PMID: 36967220 PMCID: PMC10324944 DOI: 10.1093/genetics/iyad049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 01/30/2023] [Accepted: 03/08/2023] [Indexed: 03/28/2023] Open
Abstract
Recurrent mutation produces multiple copies of the same allele which may be co-segregating in a population. Yet, most analyses of allele-frequency or site-frequency spectra assume that all observed copies of an allele trace back to a single mutation. We develop a sampling theory for the number of latent mutations in the ancestry of a rare variant, specifically a variant observed in relatively small count in a large sample. Our results follow from the statistical independence of low-count mutations, which we show to hold for the standard neutral coalescent or diffusion model of population genetics as well as for more general coalescent trees. For populations of constant size, these counts are distributed like the number of alleles in the Ewens sampling formula. We develop a Poisson sampling model for populations of varying size and illustrate it using new results for site-frequency spectra in an exponentially growing population. We apply our model to a large data set of human SNPs and use it to explain dramatic differences in site-frequency spectra across the range of mutation rates in the human genome.
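For orientation, two standard quantities of the Ewens sampling formula mentioned above can be computed directly: the expected number of distinct alleles in a sample of size n, and the probability that the whole sample carries a single allele. This is a sketch under stated assumptions, not the paper's method. The function names are hypothetical, `theta` is the usual scaled mutation parameter, and how the paper maps "latent mutations" and variant counts onto n and theta is not specified in the abstract.

```python
def expected_num_alleles(n: int, theta: float) -> float:
    """E[K_n] under the Ewens sampling formula: sum over i = 0..n-1 of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))


def prob_single_allele(n: int, theta: float) -> float:
    """P(K_n = 1) under the Ewens sampling formula: product over i = 1..n-1 of i / (theta + i)."""
    p = 1.0
    for i in range(1, n):
        p *= i / (theta + i)
    return p


if __name__ == "__main__":
    # Hypothetical values: a variant seen in 5 copies, over a range of theta.
    for theta in (0.001, 0.1, 1.0):
        print(theta, expected_num_alleles(5, theta), prob_single_allele(5, theta))
```

As theta grows, the probability that all copies trace back to one mutation falls. That is the situation in which the single-origin assumption criticised in the abstract becomes unreliable.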
Affiliation(s)
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Wai-Tong (Louis) Fan
- Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
- Center of Mathematical Sciences and Applications, Harvard University, Cambridge, MA 02138, USA
| | - Evan Koch
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Shamil Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
|
24
|
Ruf WP, Boros M, Freischmidt A, Brenner D, Grozdanov V, de Meirelles J, Meyer T, Grehl T, Petri S, Grosskreutz J, Weyen U, Guenther R, Regensburger M, Hagenacker T, Koch JC, Emmer A, Roediger A, Steinbach R, Wolf J, Weishaupt JH, Lingor P, Deschauer M, Cordts I, Klopstock T, Reilich P, Schoeberl F, Schrank B, Zeller D, Hermann A, Knehr A, Günther K, Dorst J, Schuster J, Siebert R, Ludolph AC, Müller K. Spectrum and frequency of genetic variants in sporadic amyotrophic lateral sclerosis. Brain Commun 2023; 5:fcad152. [PMID: 37223130 PMCID: PMC10202555 DOI: 10.1093/braincomms/fcad152] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/20/2022] [Revised: 02/24/2023] [Accepted: 05/05/2023] [Indexed: 05/25/2023] Open
Abstract
Therapy of motoneuron diseases entered a new phase with the use of intrathecal antisense oligonucleotide therapies treating patients with specific gene mutations predominantly in the context of familial amyotrophic lateral sclerosis. With the majority of cases being sporadic, we conducted a cohort study to describe the mutational landscape of sporadic amyotrophic lateral sclerosis. We analysed genetic variants in amyotrophic lateral sclerosis-associated genes to assess and potentially increase the number of patients eligible for gene-specific therapies. We screened 2340 sporadic amyotrophic lateral sclerosis patients from the German Network for motor neuron diseases for variants in 36 amyotrophic lateral sclerosis-associated genes using targeted next-generation sequencing and for the C9orf72 hexanucleotide repeat expansion. The genetic analysis could be completed on 2267 patients. Clinical data included age at onset, disease progression rate and survival. In this study, we found 79 likely pathogenic Class 4 variants and 10 pathogenic Class 5 variants (without the C9orf72 hexanucleotide repeat expansion) according to the American College of Medical Genetics and Genomics guidelines, of which 31 variants are novel. Thus, including C9orf72 hexanucleotide repeat expansion, Class 4, and Class 5 variants, 296 patients, corresponding to ∼13% of our cohort, could be genetically resolved. We detected 437 variants of unknown significance of which 103 are novel. Corroborating the theory of oligogenic causation in amyotrophic lateral sclerosis, we found a co-occurrence of pathogenic variants in 10 patients (0.4%) with 7 being C9orf72 hexanucleotide repeat expansion carriers. 
In a gene-wise survival analysis, we found a higher hazard ratio of 1.47 (95% confidence interval 1.02-2.1) for death from any cause for patients with the C9orf72 hexanucleotide repeat expansion and a lower hazard ratio of 0.33 (95% confidence interval 0.12-0.9) for patients with pathogenic SOD1 variants than for patients without a causal gene mutation. In summary, the high yield of 296 patients (∼13%) harbouring a pathogenic variant and oncoming gene-specific therapies for SOD1/FUS/C9orf72, which would apply to 227 patients (∼10%) in this cohort, corroborates that genetic testing should be made available to all sporadic amyotrophic lateral sclerosis patients after respective counselling.
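The rounded percentages in this abstract can be checked against the 2267 patients with completed genetic analysis. The short calculation below uses only counts stated in the abstract; the variable and function names are hypothetical.

```python
COMPLETED_ANALYSES = 2267  # patients with completed genetic analysis

# Patient counts as reported in the abstract.
reported_counts = {
    "genetically resolved": 296,
    "eligible for SOD1/FUS/C9orf72 therapies": 227,
    "co-occurring pathogenic variants": 10,
}


def percent_of_cohort(count: int, total: int = COMPLETED_ANALYSES) -> float:
    """Share of the analysed cohort, in percent."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in reported_counts.items():
        print(f"{label}: {count}/{COMPLETED_ANALYSES} = {percent_of_cohort(count):.1f}%")
```

The results are consistent with the approximate figures given in the text (about 13%, about 10% and 0.4%).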
Affiliation(s)
- Wolfgang P Ruf
- Correspondence to: Dr Wolfgang P. Ruf, Department of Neurology, Medical Faculty, Ulm University, Albert-Einstein-Allee 23, Ulm 89081, Germany
| | - Matej Boros
- Institute of Human Genetics, Ulm University & Ulm University Medical Center, Ulm 89081, Germany
| | - Axel Freischmidt
- Department of Neurology, Ulm University, Ulm 89081, Germany
- Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), German Center for Neurodegenerative Diseases, Ulm 89081, Germany
| | - David Brenner
- Department of Neurology, Ulm University, Ulm 89081, Germany
| | | | - Joao de Meirelles
- Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), German Center for Neurodegenerative Diseases, Ulm 89081, Germany
| | - Thomas Meyer
- Department of Neurology, Center for ALS and other Motor Neuron Disorders, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin 13353, Germany
| | - Torsten Grehl
- Department of Neurology, Alfried Krupp Hospital, Essen 45131, Germany
| | - Susanne Petri
- Department of Neurology, Medizinische Hochschule Hannover, Hannover 30625, Germany
| | | | - Ute Weyen
- Department of Neurology, University Hospital Bochum, Bochum 44789, Germany
| | - Rene Guenther
- Department of Neurology, Technische Universität Dresden, Dresden 01307, Germany
| | - Martin Regensburger
- Department of Neurology, University Hospital Erlangen, Erlangen 91054, Germany
| | - Tim Hagenacker
- Department of Neurology Center for Translational Neuro- and Behavioral Sciences (C-TNBS), University Medicine Essen, Essen 45147, Germany
| | - Jan C Koch
- Department of Neurology, University Medical Center Goettingen, Goettingen 37075, Germany
| | - Alexander Emmer
- University Clinic and Polyclinic for Neurology, University Hospital Halle, Halle 06120, Germany
| | | | - Robert Steinbach
- Department of Neurology, University Hospital Jena, Jena 07747, Germany
| | - Joachim Wolf
- Department of Neurology, Diako Mannheim, Mannheim 68163, Germany
| | - Jochen H Weishaupt
- Department of Neurology, University Hospital Mannheim, Mannheim 68167, Germany
| | - Paul Lingor
- Department of Neurology, Technical University Munich, Munich 80333, Germany
| | - Marcus Deschauer
- Department of Neurology, Technical University Munich, Munich 80333, Germany
| | - Isabell Cordts
- Department of Neurology, Technical University Munich, Munich 80333, Germany
| | - Thomas Klopstock
- Department of Neurology with Friedrich-Baur-Institute, University Hospital of Ludwig-Maximilians-University, München 80336, Germany
- Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), German Center for Neurodegenerative Diseases, Munich 81377, Germany
| | - Peter Reilich
- Department of Neurology with Friedrich-Baur-Institute, University Hospital of Ludwig-Maximilians-University, München 80336, Germany
| | - Florian Schoeberl
- Department of Neurology with Friedrich-Baur-Institute, University Hospital of Ludwig-Maximilians-University, München 80336, Germany
| | - Berthold Schrank
- Department of Neurology, DKD Helios Clinics, Wiesbaden 65191, Germany
| | - Daniel Zeller
- Department of Neurology, University Hospital Wuerzburg, Wuerzburg 97080, Germany
| | - Andreas Hermann
- Translational Neurodegeneration Section ‘Albrecht Kossel’, University Medical Center Rostock, Rostock 18146, Germany
- Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), German Center for Neurodegenerative Diseases, Rostock/Greifswald 17489, Germany
| | - Antje Knehr
- Department of Neurology, Ulm University, Ulm 89081, Germany
| | | | - Johannes Dorst
- Department of Neurology, Ulm University, Ulm 89081, Germany
- Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), German Center for Neurodegenerative Diseases, Ulm 89081, Germany
| | - Joachim Schuster
- Department of Neurology, Ulm University, Ulm 89081, Germany
- Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), German Center for Neurodegenerative Diseases, Ulm 89081, Germany
| | - Reiner Siebert
- Institute of Human Genetics, Ulm University & Ulm University Medical Center, Ulm 89081, Germany
| | - Albert C Ludolph
- Department of Neurology, Ulm University, Ulm 89081, Germany
- Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), German Center for Neurodegenerative Diseases, Ulm 89081, Germany
| | - Kathrin Müller
- Department of Neurology, Ulm University, Ulm 89081, Germany
- Institute of Human Genetics, Ulm University & Ulm University Medical Center, Ulm 89081, Germany
|
25
|
Liao K, Carlson J, Zöllner S. The effect of mutation subtypes on the allele frequency spectrum and population genetics inference. G3 (Bethesda) 2023; 13:jkad035. [PMID: 36759699 PMCID: PMC10085755 DOI: 10.1093/g3journal/jkad035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 02/11/2023]
Abstract
Population genetics has adapted as technological advances in next-generation sequencing have resulted in an exponential increase of genetic data. A common approach to efficiently analyze genetic variation present in large sequencing data is through the allele frequency spectrum (AFS), defined as the distribution of allele frequencies in a sample. While the frequency spectrum serves to summarize patterns of genetic variation, it implicitly treats mutation types (A→C vs C→T) as interchangeable. However, mutations of different types arise and spread due to spatial and temporal variation in forces such as mutation rate and biased gene conversion that result in heterogeneity in the distribution of allele frequencies across sites. In this work, we explore the impact of this simplification on multiple aspects of population genetic modeling. As a site's mutation rate is strongly affected by flanking nucleotides, we defined a mutation subtype by the base pair change and adjacent nucleotides (e.g. AAA→ATA) and systematically assessed the heterogeneity in the frequency spectrum across 96 distinct 3-mer mutation subtypes using n = 3556 whole-genome sequenced individuals of European ancestry. We observed substantial variation across the subtype-specific frequency spectra, with some of the variation being influenced by molecular factors previously identified for single base mutation types. Estimates of model parameters from demographic inference performed for each mutation subtype's AFS individually varied drastically across the 96 subtypes. In local patterns of variation, a combination of regional subtype composition and local genomic factors shaped the regional frequency spectrum across genomic regions. Our results illustrate how treating variants in large sequencing samples as interchangeable may confound population genetic frameworks and encourage us to consider the unique evolutionary mechanisms of analyzed polymorphisms.
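The count of 96 subtypes follows from simple enumeration: 6 single-base changes, once reverse-complementary changes are merged, times 4 possible 5' neighbours and 4 possible 3' neighbours. The sketch below assumes the common convention of reporting each change with a pyrimidine (C or T) as the reference base; the abstract does not state which folding convention the authors used, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def enumerate_subtypes() -> list:
    """List all (reference 3-mer, mutated 3-mer) pairs with a pyrimidine centre."""
    subtypes = []
    for ref in "CT":
        for alt in BASES:
            if alt == ref:
                continue
            for five_prime, three_prime in product(BASES, repeat=2):
                subtypes.append((five_prime + ref + three_prime,
                                 five_prime + alt + three_prime))
    return subtypes


def canonical(ref3: str, alt3: str) -> tuple:
    """Fold a 3-mer change onto the strand where the reference base is C or T."""
    if ref3[1] in "CT":
        return ref3, alt3
    return reverse_complement(ref3), reverse_complement(alt3)


if __name__ == "__main__":
    print(len(enumerate_subtypes()))
    print(canonical("AAA", "ATA"))
```

Under this assumed convention, the abstract's example AAA→ATA is counted together with its reverse complement TTT→TAT.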
Affiliation(s)
- Kevin Liao
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jedidiah Carlson
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712, USA
- Department of Population Health, University of Texas at Austin, Austin, TX 78712, USA
| | - Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
|
26
|
Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol 2023; 30:417-424. [PMID: 36914796 DOI: 10.1038/s41594-023-00936-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/23/2021] [Accepted: 02/03/2023] [Indexed: 03/16/2023]
Abstract
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
|
27
|
Moeckel C, Zaravinos A, Georgakopoulos-Soares I. Strand asymmetries across genomic processes. Comput Struct Biotechnol J 2023; 21:2036-2047. [PMID: 36968020 PMCID: PMC10030826 DOI: 10.1016/j.csbj.2023.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 03/08/2023] [Accepted: 03/08/2023] [Indexed: 03/12/2023] Open
Abstract
Across biological systems, a number of genomic processes, including transcription, replication, DNA repair, and transcription factor binding, display intrinsic directionalities. These directionalities are reflected in the asymmetric distribution of nucleotides, motifs, genes, transposon integration sites, and other functional elements across the two complementary strands. Strand asymmetries, including GC skews and mutational biases, have shaped the nucleotide composition of diverse organisms. The investigation of strand asymmetries often serves as a method to understand underlying biological mechanisms, including protein binding preferences, transcription factor interactions, retrotransposition, DNA damage and repair preferences, transcription-replication collisions, and mutagenesis mechanisms. Research into this subject also enables the identification of functional genomic sites, such as replication origins and transcription start sites. Improvements in our ability to detect and quantify DNA strand asymmetries will provide insights into diverse functionalities of the genome, the contribution of different mutational mechanisms in germline and somatic mutagenesis, and our knowledge of genome instability and evolution, which all have significant clinical implications in human disease, including cancer. In this review, we describe key developments that have been made across the field of genomic strand asymmetries, as well as the discovery of associated mechanisms.
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Apostolos Zaravinos
- Department of Life Sciences, European University Cyprus, Diogenis Str., 6, Nicosia 2404, Cyprus
- Cancer Genetics, Genomics and Systems Biology laboratory, Basic and Translational Cancer Research Center (BTCRC), Nicosia 1516, Cyprus
| | - Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
|
28
|
Guan Z, Begg CB, Shen R. Predicting Cancer Risk from Germline Whole-exome Sequencing Data Using a Novel Context-based Variant Aggregation Approach. Cancer Res Commun 2023; 3:483-488. [PMID: 36969913 PMCID: PMC10032232 DOI: 10.1158/2767-9764.crc-22-0355] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/05/2022] [Revised: 01/24/2023] [Accepted: 02/21/2023] [Indexed: 06/18/2023]
Abstract
Many studies have shown that the distributions of the genomic, nucleotide, and epigenetic contexts of somatic variants in tumors are informative of cancer etiology. Recently, a new direction of research has focused on extracting signals from the contexts of germline variants and evidence has emerged that patterns defined by these factors are associated with oncogenic pathways, histologic subtypes, and prognosis. It remains an open question whether aggregating germline variants using meta-features capturing their genomic, nucleotide, and epigenetic contexts can improve cancer risk prediction. This aggregation approach can potentially increase statistical power for detecting signals from rare variants, which have been hypothesized to be a major source of the missing heritability of cancer. Using germline whole-exome sequencing data from the UK Biobank, we developed risk models for 10 cancer types using known risk variants (cancer-associated SNPs and pathogenic variants in known cancer predisposition genes) as well as models that additionally include the meta-features. The meta-features did not improve the prediction accuracy of models based on known risk variants. It is possible that expanding the approach to whole-genome sequencing can lead to gains in prediction accuracy. Significance: There is evidence that cancer is partly caused by rare genetic variants that have not yet been identified. We investigate this issue using novel statistical methods and data from the UK Biobank.
Affiliation(s)
- Zoe Guan
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
| | - Colin B. Begg
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
| | - Ronglai Shen
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
|
29
|
Gao Z, Zhang Y, Cramer N, Przeworski M, Moorjani P. Limited role of generation time changes in driving the evolution of the mutation spectrum in humans. eLife 2023; 12:e81188. [PMID: 36779395 PMCID: PMC10014080 DOI: 10.7554/elife.81188] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/18/2022] [Accepted: 02/02/2023] [Indexed: 02/14/2023] Open
Abstract
Recent studies have suggested that the human germline mutation rate and spectrum evolve rapidly. Variation in generation time has been linked to these changes, though its contribution remains unclear. We develop a framework to characterize temporal changes in polymorphisms within and between populations, while controlling for the effects of natural selection and biased gene conversion. Application to the 1000 Genomes Project dataset reveals multiple independent changes that arose after the split of continental groups, including a previously reported, transient elevation in TCC>TTC mutations in Europeans and novel signals of divergence in C>G and T>A mutation rates among population samples. We also find a significant difference between groups sampled in and outside of Africa in old T>C polymorphisms that predate the out-of-Africa migration. This surprising signal is driven by TpG>CpG mutations and stems in part from mis-polarized CpG transitions, which are more likely to undergo recurrent mutations. Finally, by relating the mutation spectrum of polymorphisms to parental age effects on de novo mutations, we show that plausible changes in the generation time cannot explain the patterns observed for different mutation types jointly. Thus, other factors - genetic modifiers or environmental exposures - must have had a non-negligible impact on the human mutation landscape.
Affiliation(s)
- Ziyue Gao
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, United States
| | - Yulin Zhang
- Center for Computational Biology, University of California, Berkeley, Berkeley, United States
| | - Nathan Cramer
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
| | - Priya Moorjani
- Center for Computational Biology, University of California, Berkeley, Berkeley, United States
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
| |
|
30
|
Murat P, Perez C, Crisp A, van Eijk P, Reed SH, Guilbaud G, Sale JE. DNA replication initiation shapes the mutational landscape and expression of the human genome. Sci Adv 2022; 8:eadd3686. [PMID: 36351018 PMCID: PMC9645720 DOI: 10.1126/sciadv.add3686] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/07/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
The interplay between active biological processes and DNA repair is central to mutagenesis. Here, we show that the ubiquitous process of replication initiation is mutagenic, leaving a specific mutational footprint at thousands of early and efficient replication origins. The observed mutational pattern is consistent with two distinct mechanisms, reflecting the two-step process of origin activation, triggering the formation of DNA breaks at the center of origins and local error-prone DNA synthesis in their immediate vicinity. We demonstrate that these replication initiation-dependent mutational processes exert an influence on phenotypic diversity in humans that is disproportionate to the origins' genomic size: By increasing mutational loads at gene promoters and splice junctions, the presence of an origin significantly influences both gene expression and mRNA isoform usage. Last, we show that mutagenesis at origins not only drives the evolution of origin sequences but also contributes to sculpting regulatory domains of the human genome.
Affiliation(s)
- Pierre Murat
- Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
| | - Consuelo Perez
- Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
| | - Alastair Crisp
- Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
| | - Patrick van Eijk
- Broken String Biosciences Ltd., BioData Innovation Centre, Unit AB3-03, Level 3, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK
- Division of Cancer & Genetics School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - Simon H. Reed
- Broken String Biosciences Ltd., BioData Innovation Centre, Unit AB3-03, Level 3, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK
- Division of Cancer & Genetics School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - Guillaume Guilbaud
- Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
| | - Julian E. Sale
- Division of Protein & Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK
| |
|
31
|
Moldovan MA, Gaydukova SA. Unusual Dependence between Gene Expression and Negative Selection in Euplotes. Mol Biol 2022. [DOI: 10.1134/s0026893323010090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
32
|
Löytynoja A. Thousands of human mutation clusters are explained by short-range template switching. Genome Res 2022; 32:1437-1447. [PMID: 35760560 PMCID: PMC9435742 DOI: 10.1101/gr.276478.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 06/21/2022] [Indexed: 02/03/2023]
Abstract
Variation within human genomes is unevenly distributed, and variants show spatial clustering. DNA replication-related template switching is a poorly known mutational mechanism capable of causing major chromosomal rearrangements as well as creating short inverted sequence copies that appear as local mutation clusters in sequence comparisons. In this study, haplotype-resolved genome assemblies representing 25 human populations and multinucleotide variants aggregated from 140,000 human sequencing experiments were reanalyzed. Local template switching could explain thousands of complex mutation clusters across the human genome, the loci segregating within and between populations. During the study, computational tools were developed for identification of template switch events using both short-read sequencing data and genotype data, and for genotyping candidate loci using short-read data. The characteristics of template-switch mutations complicate their detection, and widely used analysis pipelines for short-read sequencing data, normally capable of identifying single nucleotide changes, were found to miss template-switch mutations of tens of base pairs, potentially invalidating medical genetic studies searching for a causative allele behind genetic diseases. Combined with the massive sequencing data now available for humans, the novel tools described here enable building catalogs of affected loci and studying the cellular mechanisms behind template switching in both healthy organisms and disease.
Affiliation(s)
- Ari Löytynoja
- Institute of Biotechnology, University of Helsinki, FI-00014 Helsinki, Finland
| |
|
33
|
de Manuel M, Wu FL, Przeworski M. A paternal bias in germline mutation is widespread in amniotes and can arise independently of cell division numbers. eLife 2022; 11:e80008. [PMID: 35916372 PMCID: PMC9439683 DOI: 10.7554/elife.80008] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/05/2022] [Accepted: 08/01/2022] [Indexed: 11/13/2022] Open
Abstract
In humans and other mammals, germline mutations are more likely to arise in fathers than in mothers. Although this sex bias has long been attributed to DNA replication errors in spermatogenesis, recent evidence from humans points to the importance of mutagenic processes that do not depend on cell division, calling into question our understanding of this basic phenomenon. Here, we infer the ratio of paternal-to-maternal mutations, α, in 42 species of amniotes, from putatively neutral substitution rates of sex chromosomes and autosomes. Despite marked differences in gametogenesis, physiologies and environments across species, fathers consistently contribute more mutations than mothers in all the species examined, including mammals, birds, and reptiles. In mammals, α is as high as 4 and correlates with generation times; in birds and snakes, α appears more stable around 2. These observations are consistent with a simple model, in which mutations accrue at equal rates in both sexes during early development and at a higher rate in the male germline after sexual differentiation, with a conserved paternal-to-maternal ratio across species. Thus, α may reflect the relative contributions of two or more developmental phases to total germline mutations, and is expected to depend on generation time even if mutations do not track cell divisions.
Affiliation(s)
- Marc de Manuel
- Department of Biological Sciences, Columbia University, New York, United States
| | - Felix L Wu
- Department of Biological Sciences, Columbia University, New York, United States
| | - Molly Przeworski
- Department of Systems Biology, Columbia University, New York, United States
| |
|
34
|
Matsen FA, Ralph PL. Enabling Inference for Context-Dependent Models of Mutation by Bounding the Propagation of Dependency. J Comput Biol 2022; 29:802-824. [PMID: 35776513 PMCID: PMC9419934 DOI: 10.1089/cmb.2021.0644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Although the rates at which positions in the genome mutate are known to depend not only on the nucleotide to be mutated, but also on neighboring nucleotides, it remains challenging to do phylogenetic inference using models of context-dependent mutation. In these models, the effects of one mutation may in principle propagate to faraway locations, making it difficult to compute exact likelihoods. This article shows how to use bounds on the propagation of dependency to compute likelihoods of mutation of a given segment of genome by marginalizing over sufficiently long flanking sequence. This can be used for maximum likelihood or Bayesian inference. Protocols examining residuals and iterative model refinement are also discussed. Tools for efficiently working with these models are provided in an R package, which could be used in other applications. The method is used to examine context dependence of mutations since the common ancestor of humans and chimpanzee.
Affiliation(s)
- Frederick A. Matsen
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Department of Statistics, University of Washington, Seattle, Washington, USA
- Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
| | - Peter L. Ralph
- Departments of Biology and Mathematics, Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, USA
| |
|
35
|
Halldorsson BV, Eggertsson HP, Moore KHS, Hauswedell H, Eiriksson O, Ulfarsson MO, Palsson G, Hardarson MT, Oddsson A, Jensson BO, Kristmundsdottir S, Sigurpalsdottir BD, Stefansson OA, Beyter D, Holley G, Tragante V, Gylfason A, Olason PI, Zink F, Asgeirsdottir M, Sverrisson ST, Sigurdsson B, Gudjonsson SA, Sigurdsson GT, Halldorsson GH, Sveinbjornsson G, Norland K, Styrkarsdottir U, Magnusdottir DN, Snorradottir S, Kristinsson K, Sobech E, Jonsson H, Geirsson AJ, Olafsson I, Jonsson P, Pedersen OB, Erikstrup C, Brunak S, Ostrowski SR, Thorleifsson G, Jonsson F, Melsted P, Jonsdottir I, Rafnar T, Holm H, Stefansson H, Saemundsdottir J, Gudbjartsson DF, Magnusson OT, Masson G, Thorsteinsdottir U, Helgason A, Jonsson H, Sulem P, Stefansson K. The sequences of 150,119 genomes in the UK Biobank. Nature 2022; 607:732-740. [PMID: 35859178 PMCID: PMC9329122 DOI: 10.1038/s41586-022-04965-x] [Citation(s) in RCA: 238] [Impact Index Per Article: 79.3] [Received: 11/05/2021] [Accepted: 06/10/2022] [Indexed: 12/25/2022]
Abstract
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data (refs. 1, 2). Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank (ref. 3). This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.
Affiliation(s)
- Bjarni V Halldorsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; School of Technology, Reykjavik University, Reykjavik, Iceland
| | - Magnus O Ulfarsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | - Marteinn T Hardarson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; School of Technology, Reykjavik University, Reykjavik, Iceland
| | - Snaedis Kristmundsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; School of Technology, Reykjavik University, Reykjavik, Iceland
| | - Brynja D Sigurpalsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; School of Technology, Reykjavik University, Reykjavik, Iceland
| | - Helgi Jonsson
- Landspitali-University Hospital, Reykjavik, Iceland; Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Palmi Jonsson
- Landspitali-University Hospital, Reykjavik, Iceland; Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Ole Birger Pedersen
- Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
| | - Christian Erikstrup
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark
| | - Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Sisse Rye Ostrowski
- Department of Clinical Immunology, Copenhagen University Hospital (Rigshospitalet), Copenhagen, Denmark; Department of Clinical Medicine, Faculty of Health and Clinical Sciences, Copenhagen University, Copenhagen, Denmark
| | - Pall Melsted
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | - Ingileif Jonsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Hilma Holm
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
| | - Daniel F Gudbjartsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | - Unnur Thorsteinsdottir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Agnar Helgason
- deCODE genetics/Amgen Inc., Reykjavik, Iceland; Department of Anthropology, University of Iceland, Reykjavik, Iceland
| |
|
36
|
Germline predisposition to pediatric Ewing sarcoma is characterized by inherited pathogenic variants in DNA damage repair genes. Am J Hum Genet 2022; 109:1026-1037. [PMID: 35512711 PMCID: PMC9247831 DOI: 10.1016/j.ajhg.2022.04.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/04/2022] [Accepted: 04/11/2022] [Indexed: 12/12/2022] Open
Abstract
More knowledge is needed regarding germline predisposition to Ewing sarcoma to inform biological investigation and clinical practice. Here, we evaluated the enrichment of pathogenic germline variants in Ewing sarcoma relative to other pediatric sarcoma subtypes, as well as patterns of inheritance of these variants. We carried out European-focused and pan-ancestry case-control analyses to screen for enrichment of pathogenic germline variants in 141 established cancer predisposition genes in 1,147 individuals with pediatric sarcoma diagnoses (226 Ewing sarcoma, 438 osteosarcoma, 180 rhabdomyosarcoma, and 303 other sarcoma) relative to identically processed cancer-free control individuals. Findings in Ewing sarcoma were validated with an additional cohort of 430 individuals, and a subset of 301 Ewing sarcoma parent-proband trios was analyzed for inheritance patterns of identified pathogenic variants. A distinct pattern of pathogenic germline variants was seen in Ewing sarcoma relative to other sarcoma subtypes. FANCC was the only gene with an enrichment signal for heterozygous pathogenic variants in the European Ewing sarcoma discovery cohort (three individuals, OR 12.6, 95% CI 3.0–43.2, p = 0.003, FDR = 0.40). This enrichment in FANCC heterozygous pathogenic variants was again observed in the European Ewing sarcoma validation cohort (three individuals, OR 7.0, 95% CI 1.7–23.6, p = 0.014), representing a broader importance of genes involved in DNA damage repair, which were also nominally enriched in individuals with Ewing sarcoma. Pathogenic variants in DNA damage repair genes were acquired through autosomal inheritance. Our study provides new insight into germline risk factors contributing to Ewing sarcoma pathogenesis.
|
37
|
Karolak A, Levatić J, Supek F. A framework for mutational signature analysis based on DNA shape parameters. PLoS One 2022; 17:e0262495. [PMID: 35015788 PMCID: PMC8752002 DOI: 10.1371/journal.pone.0262495] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/24/2021] [Accepted: 12/27/2021] [Indexed: 11/18/2022] Open
Abstract
The mutation risk of a DNA locus depends on its oligonucleotide context. In turn, mutability of oligonucleotides varies across individuals, due to exposure to mutagenic agents or due to variable efficiency and/or accuracy of DNA repair. Such variability is captured by mutational signatures, a mathematical construct obtained by a deconvolution of mutation frequency spectra across individuals. There is a need to enhance methods for inferring mutational signatures to make better use of sparse mutation data (e.g., resulting from exome sequencing of cancers), to facilitate insight into underlying biological mechanisms, and to provide more accurate mutation rate baselines for inferring positive and negative selection. We propose a conceptualization of mutational signatures that represents oligonucleotides via descriptors of DNA conformation: base pair, base pair step, and minor groove width parameters. We demonstrate how such DNA structural parameters can accurately predict mutation occurrence due to DNA repair failures or due to exposure to diverse mutagens such as radiation, chemical exposure, and the APOBEC cytosine deaminase enzymes. Furthermore, the mutation frequency of DNA oligomers classed by structural features can accurately capture systematic variability in mutagenesis of >1,000 tumors originating from diverse human tissues. A nonnegative matrix factorization was applied to mutation spectra stratified by DNA structural features, thereby extracting novel mutational signatures. Moreover, many of the known trinucleotide signatures were associated with an additional spectrum in the DNA structural descriptor space, which may aid interpretation and provide mechanistic insight. Overall, we suggest that the power of DNA sequence motif-based mutational signature analysis can be enhanced by drawing on DNA shape features.
Affiliation(s)
- Aleksandra Karolak
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Population Sciences and Department of Computational and Quantitative Medicine, Division of Mathematical Oncology, Beckman Research Institute, City of Hope, Duarte, CA, United States of America
| | - Jurica Levatić
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Fran Supek
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
| |
|
38
|
Marijuán PC, Navarro J. The biological information flow: From cell theory to a new evolutionary synthesis. Biosystems 2022; 213:104631. [DOI: 10.1016/j.biosystems.2022.104631] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/17/2021] [Revised: 01/19/2022] [Accepted: 01/23/2022] [Indexed: 12/19/2022]
|