1
Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024; 41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024] Open
Abstract
Insertions and deletions constitute the second most important source of natural genomic variation. Insertions and deletions make up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertions and deletion variation in species and populations. Yet, despite their importance, evolutionary studies have traditionally ignored or mishandled insertions and deletions due to a lack of comprehensive methodologies and statistical models of insertions and deletion dynamics. Here, we discuss methods for describing insertions and deletion variation and modeling insertions and deletions over evolutionary time. We provide practical advice for tackling insertions and deletions in genomic sequences and illustrate our discussion with examples of insertions and deletion-induced effects in human and other natural populations and their contribution to evolutionary processes. We outline promising directions for future developments in statistical methodologies that would allow researchers to analyze insertions and deletion variation and their effects in large genomic data sets and to incorporate insertions and deletions in evolutionary inference.
Affiliation(s)
- Ian Holmes
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA
  - Calico Life Sciences LLC, South San Francisco, CA 94080, USA
- Gerton Lunter
  - Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9713 GZ, The Netherlands
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Maria Anisimova
  - Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
2
Sugawara N, Towne MJ, Lovett ST, Haber JE. Spontaneous and double-strand break repair-associated quasipalindrome and frameshift mutagenesis in budding yeast: role of mismatch repair. Genetics 2024; 227:iyae068. [PMID: 38691577 DOI: 10.1093/genetics/iyae068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 08/09/2023] [Accepted: 03/06/2024] [Indexed: 05/03/2024] Open
Abstract
Although gene conversion (GC) in Saccharomyces cerevisiae is the most error-free way to repair double-strand breaks (DSBs), the mutation rate during homologous recombination is 1,000 times greater than during replication. Many mutations involve dissociating a partially copied strand from its repair template and re-aligning with the same or another template, leading to -1 frameshifts in homonucleotide runs, quasipalindrome (QP)-associated mutations and microhomology-mediated interchromosomal template switches. We studied GC induced by HO endonuclease cleavage at MATα, repaired by an HMR::KI-URA3 donor. We inserted into HMR::KI-URA3 an 18-bp inverted repeat where one arm had a 4-bp insertion. Most GCs yield MAT::KI-ura3::QP + 4 (Ura-) outcomes, but template-switching produces Ura+ colonies, losing the 4-bp insertion. If the QP arm without the insertion is first encountered by repair DNA polymerase and is then (mis)used as a template, the palindrome is perfected. When the QP + 4 arm is encountered first, Ura+ derivatives only occur after second-end capture and second-strand synthesis. QP + 4 mutations are suppressed by mismatch repair (MMR) proteins Msh2, Msh3, and Mlh1, but not Msh6. Deleting Rdh54 significantly reduces QP mutations only when events creating Ura+ occur in the context of a D-loop but not during second-strand synthesis. A similar bias is found with a proofreading-defective DNA polymerase mutation (pol3-01). DSB-induced mutations differed in several genetic requirements from spontaneous events. We also created a +1 frameshift in the donor, expanding a run of 4 Cs to 5 Cs. Again, Ura+ recombinants markedly increased by disabling MMR, suggesting that MMR acts during GC but favors the unbroken, template strand.
Affiliation(s)
- Neal Sugawara
  - Department of Biology and Rosenstiel Basic Medical Sciences Research Center MS029, Brandeis University, Waltham, MA 02454-9110, USA
- Mason J Towne
  - Department of Biology and Rosenstiel Basic Medical Sciences Research Center MS029, Brandeis University, Waltham, MA 02454-9110, USA
- Susan T Lovett
  - Department of Biology and Rosenstiel Basic Medical Sciences Research Center MS029, Brandeis University, Waltham, MA 02454-9110, USA
- James E Haber
  - Department of Biology and Rosenstiel Basic Medical Sciences Research Center MS029, Brandeis University, Waltham, MA 02454-9110, USA
3
Mönttinen HAM, Frilander MJ, Löytynoja A. Generation of de novo miRNAs from template switching during DNA replication. Proc Natl Acad Sci U S A 2023; 120:e2310752120. [PMID: 38019864 PMCID: PMC10710096 DOI: 10.1073/pnas.2310752120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/01/2023] [Indexed: 12/01/2023] Open
Abstract
The mechanisms generating novel genes and genetic information are poorly known, even for microRNA (miRNA) genes with an extremely constrained design. All miRNA primary transcripts need to fold into a stem-loop structure to yield short gene products (~22 nt) that bind and repress their mRNA targets. While a substantial number of miRNA genes are ancient and highly conserved, short secondary structures coding for entirely novel miRNA genes have been shown to emerge in a lineage-specific manner. Template switching is a DNA-replication-related mutation mechanism that can introduce complex changes and generate perfect base pairing for entire hairpin structures in a single event. Here, we show that the template-switching mutations (TSMs) have participated in the emergence of over 6,000 suitable hairpin structures in the primate lineage to yield at least 18 new human miRNA genes, that is 26% of the miRNAs inferred to have arisen since the origin of primates. While the mechanism appears random, the TSM-generated miRNAs are enriched in introns where they can be expressed with their host genes. The high frequency of TSM events provides raw material for evolution. Being orders of magnitude faster than other mechanisms proposed for de novo creation of genes, TSM-generated miRNAs enable near-instant rewiring of genetic information and rapid adaptation to changing environments.
Affiliation(s)
- Heli A. M. Mönttinen
  - Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, Helsinki FI-000, Finland
- Mikko J. Frilander
  - Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, Helsinki FI-000, Finland
- Ari Löytynoja
  - Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, Helsinki FI-000, Finland
4
Horton JS, Taylor TB. Mutation bias and adaptation in bacteria. Microbiology (Reading, England) 2023; 169:001404. [PMID: 37943288 PMCID: PMC10710837 DOI: 10.1099/mic.0.001404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 10/11/2023] [Indexed: 11/10/2023]
Abstract
Genetic mutation, which provides the raw material for evolutionary adaptation, is largely a stochastic force. However, there is ample evidence showing that mutations can also exhibit strong biases, with some mutation types and certain genomic positions mutating more often than others. It is becoming increasingly clear that mutational bias can play a role in determining adaptive outcomes in bacteria in both the laboratory and the clinic. As such, understanding the causes and consequences of mutation bias can help microbiologists to anticipate and predict adaptive outcomes. In this review, we provide an overview of the mechanisms and features of the bacterial genome that cause mutational biases to occur. We then describe the environmental triggers that drive these mechanisms to be more potent and outline the adaptive scenarios where mutation bias can synergize with natural selection to define evolutionary outcomes. We conclude by describing how understanding mutagenic genomic features can help microbiologists predict areas sensitive to mutational bias, and finish by outlining future work that will help us achieve more accurate evolutionary forecasts.
Affiliation(s)
- James S. Horton
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, BA2 7AY, UK
- Tiffany B. Taylor
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, BA2 7AY, UK
5
Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol 2023; 30:417-424. [PMID: 36914796 DOI: 10.1038/s41594-023-00936-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/23/2021] [Accepted: 02/03/2023] [Indexed: 03/16/2023]
Abstract
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.
6
Frith MC, Shaw J, Spouge JL. How to optimally sample a sequence for rapid analysis. Bioinformatics 2023; 39:btad057. [PMID: 36702468 PMCID: PMC9907223 DOI: 10.1093/bioinformatics/btad057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Accepted: 01/24/2023] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION We face an increasing flood of genetic sequence data, from diverse sources, requiring rapid computational analysis. Rapid analysis can be achieved by sampling a subset of positions in each sequence. Previous sequence-sampling methods, such as minimizers, syncmers and minimally overlapping words, were developed by heuristic intuition, and are not optimal. RESULTS We present a sequence-sampling approach that provably optimizes sensitivity for a whole class of sequence comparison methods, for randomly evolving sequences. It is likely near-optimal for a wide range of alignment-based and alignment-free analyses. For real biological DNA, it increases specificity by avoiding simple repeats. Our approach generalizes universal hitting sets (which guarantee to sample a sequence at least once) and polar sets (which guarantee to sample a sequence at most once). This helps us understand how to do rapid sequence analysis as accurately as possible. AVAILABILITY AND IMPLEMENTATION Source code is freely available at https://gitlab.com/mcfrith/noverlap. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
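As a point of reference for the sampling idea in this abstract, the following minimal Python sketch implements window minimizers, one of the earlier heuristic schemes the abstract names. It is not the optimized method presented in the paper. The function name `minimizers`, the parameters `k` and `w`, and the use of lexicographic k-mer order are all illustrative assumptions.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs sampled as window minimizers.

    In every window of w consecutive k-mers, the lexicographically smallest
    k-mer is sampled (leftmost on ties). Lexicographic order is an
    illustrative assumption; practical tools often order k-mers by a hash.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() over indices returns the leftmost index on ties.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Adjacent windows often share a minimizer; record each position once.
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTACGTAC", k=3, w=3):
        print(pos, kmer)
```

Every window of `w` k-mers contributes at least one sampled position, so the sampled subset still covers the whole sequence while being much smaller than the full k-mer list.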
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8568, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo 169-8555, Japan
- Jim Shaw
  - Department of Mathematics, University of Toronto, Toronto, ON M5S 2E4, Canada
- John L Spouge
  - National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
7
Frith MC, Mitsuhashi S. Finding Rearrangements in Nanopore DNA Reads with LAST and dnarrange. Methods Mol Biol 2023; 2632:161-175. [PMID: 36781728 DOI: 10.1007/978-1-0716-2996-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
Long-read DNA sequencing techniques such as nanopore are especially useful for characterizing complex sequence rearrangements, which occur in some genetic diseases and also during evolution. Analyzing the sequence data to understand such rearrangements is not trivial, due to sequencing error, rearrangement intricacy, and abundance of repeated similar sequences in genomes. The LAST and dnarrange software packages can resolve complex relationships between DNA sequences and characterize changes such as gene conversion, processed pseudogene insertion, and chromosome shattering. They can filter out numerous rearrangements shared by controls, e.g., healthy humans versus a patient, to focus on rearrangements unique to the patient. One useful ingredient is last-train, which learns the rates (probabilities) of deletions, insertions, and each kind of base match and mismatch. These probabilities are then used to find the most likely sequence relationships/alignments, which is especially useful for DNA with unusual rates, such as DNA from Plasmodium falciparum (malaria) with ∼80% A+T. This is also useful for less-studied species that lack reference genomes, so the DNA reads are compared to a different species' genome. We also point out that a reference genome with ancestral alleles would be ideal.
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
- Satomi Mitsuhashi
  - Department of Genomic Function and Diversity, Tokyo Medical and Dental University, Tokyo, Japan
  - Division of Neurology, Department of Internal Medicine, St. Marianna University School of Medicine, Kawasaki, Japan
8
Hrq1/RECQL4 regulation is critical for preventing aberrant recombination during DNA intrastrand crosslink repair and is upregulated in breast cancer. PLoS Genet 2022; 18:e1010122. [PMID: 36126066 PMCID: PMC9488787 DOI: 10.1371/journal.pgen.1010122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 08/18/2022] [Indexed: 11/19/2022] Open
Abstract
Human RECQL4 is a member of the RecQ family of DNA helicases and functions during DNA replication and repair. RECQL4 mutations are associated with developmental defects and cancer. Although RECQL4 mutations lead to disease, RECQL4 overexpression is also observed in cancer, including breast and prostate. Thus, tight regulation of RECQL4 protein levels is crucial for genome stability. Because mammalian RECQL4 is essential, how cells regulate RECQL4 protein levels is largely unknown. Utilizing budding yeast, we investigated the RECQL4 homolog, HRQ1, during DNA crosslink repair. We find that Hrq1 functions in the error-free template switching pathway to mediate DNA intrastrand crosslink repair. Although Hrq1 mediates repair of cisplatin-induced lesions, it is paradoxically degraded by the proteasome following cisplatin treatment. By identifying the targeted lysine residues, we show that preventing Hrq1 degradation results in increased recombination and mutagenesis. Like yeast, human RECQL4 is similarly degraded upon exposure to crosslinking agents. Furthermore, over-expression of RECQL4 results in increased RAD51 foci, which is dependent on its helicase activity. Using bioinformatic analysis, we observe that RECQL4 overexpression correlates with increased recombination and mutations. Overall, our study uncovers a role for Hrq1/RECQL4 in DNA intrastrand crosslink repair and provides further insight how misregulation of RECQL4 can promote genomic instability, a cancer hallmark. RECQL4 is a DNA helicase and functions during DNA replication and repair. While loss-of-function RECQL4 mutations are found in diseases characterized by developmental defects and cancer, such as Rothmund-Thomson syndrome, over-expression of RECQL4 is also observed in cancer, such as breast cancer. Therefore, RECQL4 protein expression must be tightly regulated. 
Here we used the budding yeast homolog of RECQL4, Hrq1, and discovered that overexpression of Hrq1 protein results in increased recombination and mutations, both cancer hallmarks. We find that Hrq1 functions to mediate repair of a specific type of DNA damage, intrastrand crosslinks, which occur when DNA nucleotides on the same strand are chemically linked together. These findings are also conserved in humans, suggesting a common mechanism between yeast Hrq1 and human RECQL4. Overall, our study identifies a conserved role for RECQL4 in DNA intrastrand crosslink repair and provides insights into how its misregulation could promote cancer development.
9
Löytynoja A. Thousands of human mutation clusters are explained by short-range template switching. Genome Res 2022; 32:1437-1447. [PMID: 35760560 PMCID: PMC9435742 DOI: 10.1101/gr.276478.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 06/21/2022] [Indexed: 02/03/2023]
Abstract
Variation within human genomes is unevenly distributed, and variants show spatial clustering. DNA replication-related template switching is a poorly known mutational mechanism capable of causing major chromosomal rearrangements as well as creating short inverted sequence copies that appear as local mutation clusters in sequence comparisons. In this study, haplotype-resolved genome assemblies representing 25 human populations and multinucleotide variants aggregated from 140,000 human sequencing experiments were reanalyzed. Local template switching could explain thousands of complex mutation clusters across the human genome, the loci segregating within and between populations. During the study, computational tools were developed for identification of template switch events using both short-read sequencing data and genotype data, and for genotyping candidate loci using short-read data. The characteristics of template-switch mutations complicate their detection, and widely used analysis pipelines for short-read sequencing data, normally capable of identifying single nucleotide changes, were found to miss template-switch mutations of tens of base pairs, potentially invalidating medical genetic studies searching for a causative allele behind genetic diseases. Combined with the massive sequencing data now available for humans, the novel tools described here enable building catalogs of affected loci and studying the cellular mechanisms behind template switching in both healthy organisms and disease.
Affiliation(s)
- Ari Löytynoja
  - Institute of Biotechnology, University of Helsinki, FI-00014 Helsinki, Finland
10
Template switching in DNA replication can create and maintain RNA hairpins. Proc Natl Acad Sci U S A 2022; 119:2107005119. [PMID: 35046021 PMCID: PMC8794818 DOI: 10.1073/pnas.2107005119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/14/2021] [Indexed: 11/18/2022] Open
Abstract
The evolutionary origin of RNA stem structures and the preservation of their base pairing under a spontaneous and random mutation process have puzzled theoretical evolutionary biologists. DNA replication-related template switching is a mutation mechanism that creates reverse-complement copies of sequence regions within a genome by replicating briefly along either the complementary or nascent DNA strand. Depending on the relative positions and context of the four switch points, this process may produce a reverse-complement repeat capable of forming the stem of a perfect DNA hairpin or fix the base pairing of an existing stem. Template switching is typically thought to trigger large structural changes, and its possible role in the origin and evolution of RNA genes has not been studied. Here, we show that the reconstructed ancestral histories of RNA genes contain mutation patterns consistent with DNA replication-related template switching. In addition to multibase compensatory mutations, the mechanism can explain complex sequence changes, although mutations breaking the structure rarely get fixed in evolution. Our results suggest a solution for the long-standing dilemma of RNA gene evolution and demonstrate how template switching can both create perfect stems with a single mutation event and help maintain the stem structure over time. Interestingly, template switching also provides an elegant explanation for the asymmetric base pair frequencies within RNA stems.
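The core effect described in this abstract, a reverse-complement copy that completes a hairpin stem in one event, can be illustrated with a deliberately simplified Python sketch. It does not model the four switch points. It only overwrites a target region with the reverse complement of a nearby source region. The function names, coordinates, and example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def template_switch(seq: str, source_start: int, target_start: int, length: int) -> str:
    """Overwrite the target region with the reverse complement of the source.

    Simplifying assumption: the event is a single equal-length replacement,
    so the sequence length does not change.
    """
    source = seq[source_start:source_start + length]
    copy = reverse_complement(source)
    return seq[:target_start] + copy + seq[target_start + length:]


def forms_perfect_stem(seq: str, left_start: int, right_start: int, length: int) -> bool:
    """True if the two arms can pair perfectly, i.e. one is the reverse complement of the other."""
    left = seq[left_start:left_start + length]
    right = seq[right_start:right_start + length]
    return left == reverse_complement(right)


if __name__ == "__main__":
    # Hypothetical sequence: flank, left arm, 4-nt loop, unrelated region, flank.
    before = "CC" + "GATTACA" + "TTTT" + "CCCCCCC" + "GG"
    after = template_switch(before, source_start=2, target_start=13, length=7)
    print(before, forms_perfect_stem(before, 2, 13, 7))
    print(after, forms_perfect_stem(after, 2, 13, 7))
```

In a pairwise comparison of `before` and `after`, the single event shows up as a cluster of adjacent differences, which is why such events can be mistaken for several independent mutations.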
11
Potapova NA, Kondrashov AS, Mirkin SM. Characteristics and possible mechanisms of formation of microinversions distinguishing human and chimpanzee genomes. Sci Rep 2022; 12:591. [PMID: 35022450 PMCID: PMC8755829 DOI: 10.1038/s41598-021-04621-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 12/28/2021] [Indexed: 12/02/2022] Open
Abstract
Genomic inversions come in various sizes. While long inversions are relatively easy to identify by aligning high-quality genome sequences, unambiguous identification of microinversions is more problematic. Here, using a set of extra stringent criteria to distinguish microinversions from other mutational events, we describe microinversions that occurred after the divergence of humans and chimpanzees. In total, we found 59 definite microinversions that range from 17 to 33 nucleotides in length. In the majority of them, human genome sequences matched exactly the reverse-complemented chimpanzee genome sequences, implying that the inverted DNA segment was copied precisely. All these microinversions were flanked by perfect or nearly perfect inverted repeats, pointing to their key role in microinversion formation. Template switching at inverted repeats during DNA replication was previously discussed as a possible mechanism for microinversion formation. However, many of the definite microinversions found by us cannot be easily explained via template switching owing to the combination of the short length and imperfect nature of their flanking inverted repeats. We propose a novel, alternative mechanism that involves repair of a double-stranded break within the inverting segment via microhomology-mediated break-induced replication, which can consistently explain all definite microinversion events.
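The exact-match criterion mentioned in this abstract (the human segment equals the reverse complement of the chimpanzee segment) can be sketched in a few lines of Python. This is only an illustration of that single check, not the authors' stringent pipeline. The function name, the `min_len` threshold, and the toy sequences are assumptions, and the inputs are assumed to be gap-free and already aligned at equal length.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_microinversion(human: str, chimp: str, min_len: int = 10) -> Optional[tuple[int, int]]:
    """Return (start, end) of a candidate microinversion, or None.

    The candidate span runs from the first to the last mismatching position.
    It is reported only if it is at least min_len long and the human span
    equals the reverse complement of the chimpanzee span. min_len guards
    against trivial hits such as a single A/T substitution.
    """
    if len(human) != len(chimp):
        raise ValueError("sequences must be aligned to equal length")
    diffs = [i for i, (h, c) in enumerate(zip(human, chimp)) if h != c]
    if not diffs:
        return None
    start, end = diffs[0], diffs[-1] + 1
    if end - start < min_len:
        return None
    if human[start:end] == reverse_complement(chimp[start:end]):
        return (start, end)
    return None


if __name__ == "__main__":
    # Toy example: a 7-nt segment inverted between flanking inverted repeats.
    chimp = "AAGGC" + "ACGGTCA" + "GCCTT"
    human = "AAGGC" + "TGACCGT" + "GCCTT"
    print(find_microinversion(human, chimp, min_len=5))
```

Because the span is bounded by mismatches, bases at the edge of a true inversion that happen to match are left out, so the reported interval can be slightly shorter than the inverted segment.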
Affiliation(s)
- Nadezhda A Potapova
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia, 127051
- Alexey S Kondrashov
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Sergei M Mirkin
  - Department of Biology, Tufts University, Medford, MA, 02155, USA
12
Protein innovation through template switching in the Saccharomyces cerevisiae lineage. Sci Rep 2021; 11:22558. [PMID: 34799587 PMCID: PMC8604942 DOI: 10.1038/s41598-021-01736-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/20/2021] [Accepted: 10/27/2021] [Indexed: 11/08/2022] Open
Abstract
DNA polymerase template switching between short, non-identical inverted repeats (IRs) is a genetic mechanism that leads to the homogenization of IR arms and to IR spacer inversion, which cause multinucleotide mutations (MNMs). It is unknown if and how template switching affects gene evolution. In this study, we performed a phylogenetic analysis to determine the effect of template switching between IR arms on coding DNA of Saccharomyces cerevisiae. To achieve this, perfect IRs that co-occurred with MNMs between a strain and its parental node were identified in S. cerevisiae strains. We determined that template switching introduced MNMs into 39 protein-coding genes through S. cerevisiae evolution, resulting in both arm homogenization and inversion of the IR spacer. These events in turn resulted in nonsynonymous substitutions and up to five neighboring amino acid replacements in a single gene. The study demonstrates that template switching is a powerful generator of multiple substitutions within codons. Additionally, some template switching events occurred more than once during S. cerevisiae evolution. Our findings suggest that template switching constitutes a general mutagenic mechanism that results in both nonsynonymous substitutions and parallel evolution, which are traditionally considered as evidence for positive selection, without the need for adaptive explanations.
13
Seplyarskiy VB, Sunyaev S. The origin of human mutation in light of genomic data. Nat Rev Genet 2021; 22:672-686. [PMID: 34163020 DOI: 10.1038/s41576-021-00376-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Accepted: 05/06/2021] [Indexed: 02/05/2023]
Abstract
Despite years of active research into the role of DNA repair and replication in mutagenesis, surprisingly little is known about the origin of spontaneous human mutation in the germ line. With the advent of high-throughput sequencing, genome-scale data have revealed statistical properties of mutagenesis in humans. These properties include variation of the mutation rate and spectrum along the genome at different scales in relation to epigenomic features and dependency on parental age. Moreover, mutations originated in mothers are less frequent than mutations originated in fathers and have a distinct genomic distribution. Statistical analyses that interpret these patterns in the context of known biochemistry can provide mechanistic models of mutagenesis in humans.
Affiliation(s)
- Vladimir B Seplyarskiy
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Shamil Sunyaev
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
14
Walker CR, Scally A, De Maio N, Goldman N. Short-range template switching in great ape genomes explored using pair hidden Markov models. PLoS Genet 2021; 17:e1009221. [PMID: 33651813 PMCID: PMC7954356 DOI: 10.1371/journal.pgen.1009221] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/12/2020] [Revised: 03/12/2021] [Accepted: 02/10/2021] [Indexed: 12/14/2022] Open
Abstract
Many complex genomic rearrangements arise through template switch errors, which occur in DNA replication when there is a transient polymerase switch to an alternate template nearby in three-dimensional space. While typically investigated at kilobase-to-megabase scales, the genomic and evolutionary consequences of this mutational process are not well characterised at smaller scales, where they are often interpreted as clusters of independent substitutions, insertions and deletions. Here we present an improved statistical approach using pair hidden Markov models, and use it to detect and describe short-range template switches underlying clusters of mutations in the multi-way alignment of hominid genomes. Using robust statistics derived from evolutionary genomic simulations, we show that template switch events have been widespread in the evolution of the great apes’ genomes and provide a parsimonious explanation for the presence of many complex mutation clusters in their phylogenetic context. Larger-scale mechanisms of genome rearrangement are typically associated with structural features around breakpoints, and accordingly we show that atypical patterns of secondary structure formation and DNA bending are present at the initial template switch loci. Our methods improve on previous non-probabilistic approaches for computational detection of template switch mutations, allowing the statistical significance of events to be assessed. By specifying realistic evolutionary parameters based on the genomes and taxa involved, our methods can be readily adapted to other intra- or inter-species comparisons. DNA replication is an imperfect process which causes the mutations that give rise to genetic diversity during the evolution of genomes. While many mutations are independent, single-nucleotide substitutions or small insertions and deletions, some mutations arise as nonindependent clusters of substitutions and larger scale chromosomal rearrangements. 
Large-scale rearrangements (also called structural variants) in particular can have a profound impact on genome evolution and contribute to both germline and somatic disease in humans. The replication-based mechanisms underlying structural variation typically involve a polymerase switch event in which a large segment of DNA is copied using a template from an alternate location in the genome. Methods for identifying these template switch mutations lack the power to detect smaller scale rearrangements which can arise through the same replication-based pathways. Here we outline a model which can detect and assess the statistical significance of such small-scale template switches within their evolutionary context. We show that these events are widespread in the evolution of great apes and that the genomic features associated with these small-scale rearrangements are similar to those of large-scale structural variants.
Affiliation(s)
- Conor R. Walker
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Aylwyn Scally
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| |
|
15
|
Warthi G, Fournier PE, Seligmann H. Systematic Nucleotide Exchange Analysis of ESTs From the Human Cancer Genome Project Report: Origins of 347 Unknown ESTs Indicate Putative Transcription of Non-Coding Genomic Regions. Front Genet 2020; 11:42. [PMID: 32117454 PMCID: PMC7027195 DOI: 10.3389/fgene.2020.00042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/28/2019] [Accepted: 01/15/2020] [Indexed: 12/16/2022]
Abstract
Expressed sequence tags (ESTs) provide an imprint of cellular RNA diversity irrespective of sequence homology with template genomes. NCBI databases include many unknown RNAs from various normal and cancer cells. These are usually ignored, on the assumption that they are sequencing artefacts or contamination, because they lack sequence homology with template DNA. Here, we report genomic origins of 347 ESTs previously assumed to be artefacts or unknown, from the FAPESP/LICR Human Cancer Genome Project. EST template detection uses systematic nucleotide exchange analyses called swinger transformations, which systematically replace particular nucleotides with different nucleotides. Among the 347 unknown ESTs, 51 match mitogenome transcription, and 17 and 2 are from nuclear chromosome non-coding regions and uncharacterized nuclear genes, respectively. Identified ESTs mapped on 205 protein-coding genes; 10 genes had swinger RNAs in several biosamples. Whole cell transcriptome searches for the 17 ESTs mapping on non-coding regions confirmed their transcription. The 10 swinger-transcribed genes identified more than once are associated with cancer induction and progression, suggesting that swinger transformation occurs mainly in highly transcribed genes. Swinger transformation is a unique method to identify noncanonical RNAs obtained from NGS, and it identifies putative ncRNA transcribed regions. Results suggest that swinger transcription occurs in highly active genes in normal and genetically unstable cancer cells.
Affiliation(s)
- Ganesh Warthi
- Aix Marseille Univ, IRD, APHM, SSA, VITROME, IHU-Méditerranée Infection, Marseille, France.,IHU-Méditerranée Infection, Marseille, France
| | - Pierre-Edouard Fournier
- Aix Marseille Univ, IRD, APHM, SSA, VITROME, IHU-Méditerranée Infection, Marseille, France.,IHU-Méditerranée Infection, Marseille, France
| | - Hervé Seligmann
- The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel.,Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, La Tronche, France
| |
|
16
|
Pilzecker B, Buoninfante OA, Jacobs H. DNA damage tolerance in stem cells, ageing, mutagenesis, disease and cancer therapy. Nucleic Acids Res 2019; 47:7163-7181. [PMID: 31251805 PMCID: PMC6698745 DOI: 10.1093/nar/gkz531] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 10/31/2018] [Revised: 05/22/2019] [Accepted: 06/26/2019] [Indexed: 12/12/2022]
Abstract
The DNA damage response network guards the stability of the genome from a plethora of exogenous and endogenous insults. An essential feature of the DNA damage response network is its capacity to tolerate DNA damage and structural impediments during DNA synthesis. This capacity, referred to as DNA damage tolerance (DDT), contributes to replication fork progression and stability in the presence of blocking structures or DNA lesions. Defective DDT can lead to a prolonged fork arrest and eventually culminate in a fork collapse that involves the formation of DNA double strand breaks. Four principal modes of DDT have been distinguished: translesion synthesis, fork reversal, template switching and repriming. All DDT modes warrant continuation of replication through bypassing the fork stalling impediment or repriming downstream of the impediment in combination with filling of the single-stranded DNA gaps. In this way, DDT prevents secondary DNA damage and critically contributes to genome stability and cellular fitness. DDT plays a key role in mutagenesis, stem cell maintenance, ageing and the prevention of cancer. This review provides an overview of the role of DDT in these aspects.
Affiliation(s)
- Bas Pilzecker
- Division of Tumor Biology and Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
| | - Olimpia Alessandra Buoninfante
- Division of Tumor Biology and Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
| | - Heinz Jacobs
- Division of Tumor Biology and Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
| |
|
17
|
Warthi G, Fournier PE, Seligmann H. Identification of Noncanonical Transcripts Produced by Systematic Nucleotide Exchanges in HIV-Associated Centroblastic Lymphoma. DNA Cell Biol 2019; 39:1444-1448. [PMID: 31750730 DOI: 10.1089/dna.2019.5066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
Abstract
Noncanonical transcriptions include transcriptions that systematically exchange nucleotides, also called bijective transformations or swinger transformations. Swinger transformation A↔T+C↔G recovers identities of 8 among 9 unknown RNAs differentially expressed in centroblastic lymphoma, a human immunodeficiency virus (HIV)-associated non-Hodgkin's lymphoma. The identified RNAs align with human genes with known anti-HIV1 or oncogenic activities. Function disruption through swinger-transformed transcription potentially enables avoiding antiviral responses and contributes to cancer induction.
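As an illustration of what a systematic nucleotide exchange does to a sequence, the following is a minimal sketch of the A↔T+C↔G exchange named in the abstract. It is not the authors' pipeline: the function name `swinger`, the dictionary `AT_CG`, and the use of a DNA alphabet are assumptions made for this example only.

```python
# Hypothetical helper names; a sketch of a symmetric nucleotide exchange,
# not the published detection method.

# A<->T + C<->G: every A becomes T, every T becomes A, and likewise for C and G.
AT_CG = {"A": "T", "T": "A", "C": "G", "G": "C"}


def swinger(seq: str, exchange: dict) -> str:
    """Apply a position-wise nucleotide exchange to an uppercase DNA string."""
    return seq.translate(str.maketrans(exchange))


transformed = swinger("ATGC", AT_CG)
print(transformed)                  # TACG
print(swinger(transformed, AT_CG))  # ATGC (a symmetric exchange undoes itself)
```

Because the exchange is position-wise and does not reverse the sequence, the transformed string can be searched against a reference with ordinary alignment tools, which is the general idea behind matching an otherwise unidentifiable RNA to a template.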
Affiliation(s)
- Ganesh Warthi
- IRD, APHM, Aix Marseille Univ, SSA, VITROME, IHU-Méditerranée Infection, Marseille, France.,IHU-Méditerranée Infection, Marseille, France
| | - Pierre-Edouard Fournier
- IRD, APHM, Aix Marseille Univ, SSA, VITROME, IHU-Méditerranée Infection, Marseille, France.,IHU-Méditerranée Infection, Marseille, France
| | - Hervé Seligmann
- The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel
| |
|
18
|
Truncating SLC12A6 variants cause different clinical phenotypes in humans and dogs. Eur J Hum Genet 2019; 27:1561-1568. [PMID: 31160700 DOI: 10.1038/s41431-019-0432-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/28/2019] [Revised: 05/06/2019] [Accepted: 05/14/2019] [Indexed: 12/13/2022]
Abstract
Clinical, pathological, and genetic findings of a primary hereditary ataxia found in a Malinois dog family are described and compared with its human counterpart. Based on the family history and the phenotype/genotype relationships already described in humans and dogs, a causal variant was expected to be found in KCNJ10. Rather surprisingly, whole-exome sequencing identified the SLC12A6 NC_006612.3(XM_014109414.2): c.178_181delinsCATCTCACTCAT (p.(Met60Hisfs*14)) truncating variant. This loss-of-function variant perfectly segregated within the affected Malinois family in an autosomal recessive way and was not found in 562 additional reference dogs from 18 different breeds, including Malinois. In humans, SLC12A6 variants cause "agenesis of the corpus callosum with peripheral neuropathy" (ACCPN, alias Andermann syndrome), owing to a dysfunction of this K+-Cl- cotransporter. However, depending on the variant (including truncating variants), different clinical features are observed within ACCPN. The variant in dogs encodes the shortest isoform described so far and its resultant phenotype is quite different from humans, as no signs of peripheral neuropathy, agenesis of the corpus callosum nor obvious mental retardation have been observed in dogs. On the other hand, progressive spinocerebellar ataxia, which is the most important feature of the canine phenotype, hindlimb paresis, and myokymia-like muscle contractions have not been described in humans with ACCPN so far. As this is the first report of a naturally occurring disease-causing SLC12A6 variant in a non-human species, the canine model will be highly valuable to better understand the complex molecular pathophysiology of SLC12A6-related neurological disorders and to evaluate novel treatment strategies.
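The arithmetic behind the variant description c.178_181delinsCATCTCACTCAT (p.(Met60Hisfs*14)) can be checked with a short sketch. The helper names below are hypothetical and the code only does length and codon-position arithmetic on the coordinates quoted in the abstract; it does not consult any transcript sequence.

```python
# Hypothetical helpers: length and codon arithmetic for a delins variant.

def delins_net_change(start: int, end: int, inserted: str):
    """Return (deleted length, net length change, whether the frame shifts)."""
    deleted_len = end - start + 1
    net = len(inserted) - deleted_len
    return deleted_len, net, net % 3 != 0


def codon_number(cds_pos: int) -> int:
    """1-based codon index containing a 1-based coding-sequence position."""
    return (cds_pos - 1) // 3 + 1


print(delins_net_change(178, 181, "CATCTCACTCAT"))  # (4, 8, True)
print(codon_number(178))                            # 60
```

Four bases are removed and twelve are inserted, a net gain of eight bases, which is not a multiple of three and therefore shifts the reading frame. Position 178 is the first base of codon 60, and the inserted sequence begins with CAT, a histidine codon, which is consistent with the protein-level description Met60Hisfs.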
|
19
|
Boel A, De Saffel H, Steyaert W, Callewaert B, De Paepe A, Coucke PJ, Willaert A. CRISPR/Cas9-mediated homology-directed repair by ssODNs in zebrafish induces complex mutational patterns resulting from genomic integration of repair-template fragments. Dis Model Mech 2018; 11:11/10/dmm035352. [PMID: 30355591 PMCID: PMC6215429 DOI: 10.1242/dmm.035352] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 04/27/2018] [Accepted: 08/31/2018] [Indexed: 12/30/2022]
Abstract
Targeted genome editing by CRISPR/Cas9 is extremely well fitted to generate gene disruptions, although precise sequence replacement by CRISPR/Cas9-mediated homology-directed repair (HDR) suffers from low efficiency, impeding its use for high-throughput knock-in disease modeling. In this study, we used next-generation sequencing (NGS) analysis to determine the efficiency and reliability of CRISPR/Cas9-mediated HDR using several types of single-stranded oligodeoxynucleotide (ssODN) repair templates for the introduction of disease-relevant point mutations in the zebrafish genome. Our results suggest that HDR rates are strongly determined by repair-template composition, with the most influential factor being homology-arm length. However, we found that repair using ssODNs does not only lead to precise sequence replacement but also induces integration of repair-template fragments at the Cas9 cut site. We observed that error-free repair occurs at a relatively constant rate of 1-4% when using different repair templates, which was sufficient for transmission of point mutations to the F1 generation. On the other hand, erroneous repair mainly accounts for the variability in repair rate between the different repair templates. To further improve error-free HDR rates, elucidating the mechanism behind this erroneous repair is essential. We show that the error-prone nature of ssODN-mediated repair, believed to act via synthesis-dependent strand annealing (SDSA), is most likely due to DNA synthesis errors. In conclusion, caution is warranted when using ssODNs for the generation of knock-in models or for therapeutic applications. We recommend the application of in-depth NGS analysis to examine both the efficiency and error-free nature of HDR events. This article has an associated First Person interview with the first author of the paper. 
Summary: NGS-based analysis reveals that CRISPR/Cas9-induced double-strand-break repair using single-stranded repair templates is error prone in zebrafish, resulting in complex patterns of integrated repair-template fragments.
Affiliation(s)
- Annekatrien Boel
- Center for Medical Genetics, Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| | - Hanna De Saffel
- Center for Medical Genetics, Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| | - Wouter Steyaert
- Center for Medical Genetics, Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| | - Bert Callewaert
- Center for Medical Genetics, Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| | - Anne De Paepe
- Center for Medical Genetics, Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| | - Paul J Coucke
- Center for Medical Genetics, Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| | - Andy Willaert
- Center for Medical Genetics, Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| |
|
20
|
Lavi B, Levy Karin E, Pupko T, Hazkani-Covo E. The Prevalence and Evolutionary Conservation of Inverted Repeats in Proteobacteria. Genome Biol Evol 2018; 10:918-927. [PMID: 29608719 PMCID: PMC5941160 DOI: 10.1093/gbe/evy044] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Accepted: 02/21/2018] [Indexed: 12/11/2022]
Abstract
Perfect short inverted repeats (IRs) are known to be enriched in a variety of bacterial and eukaryotic genomes. Currently, it is unclear whether perfect IRs are conserved over evolutionary time scales. In this study, we aimed to characterize the prevalence and evolutionary conservation of IRs across 20 proteobacterial strains. We first identified IRs in Escherichia coli K-12 substr MG1655 and showed that they are overabundant. We next aimed to test whether this overabundance is reflected in the conservation of IRs over evolutionary time scales. To this end, for each perfect IR identified in E. coli MG1655, we collected orthologous sequences from related proteobacterial genomes. We next quantified the evolutionary conservation of these IRs, that is, the presence of the exact same IR across orthologous regions. We observed high conservation of perfect IRs: out of the 234 examined orthologous regions, 145 were more conserved than expected, which is statistically significant even after correcting for multiple testing. Our results together with previous experimental findings support a model in which imperfect IRs are corrected to perfect IRs in a preferential manner via a template switching mechanism.
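To make the notion of a perfect inverted repeat concrete, here is a minimal sketch that scans a DNA string for an arm followed, after a short spacer, by its exact reverse complement. The function names, the default arm length and spacer limit, and the toy sequence are assumptions chosen for illustration; the study's actual detection criteria are not given in the abstract.

```python
# Hypothetical scanner for perfect inverted repeats (IRs); parameters are
# illustrative and not taken from the study.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_perfect_irs(seq: str, arm_len: int = 4, max_spacer: int = 3):
    """Return (start, spacer length, left arm) for each perfect IR found."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * arm_len + 1):
        left = seq[i:i + arm_len]
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            j = i + arm_len + spacer  # start of the candidate right arm
            if j + arm_len > n:
                break
            if seq[j:j + arm_len] == target:
                hits.append((i, spacer, left))
    return hits


# Toy sequence: arm GACC, spacer AAA, then GGTC (reverse complement of GACC).
print(find_perfect_irs("TTGACCAAAGGTCTT"))  # [(2, 3, 'GACC')]
```

An imperfect IR would be one in which the right arm differs from the reverse complement of the left arm at one or more positions; this sketch reports only exact matches.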
Affiliation(s)
- Bar Lavi
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Department of Natural and Life Sciences, The Open University of Israel, Ra'anana, Israel
| | - Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
| | - Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
| | - Einat Hazkani-Covo
- Department of Natural and Life Sciences, The Open University of Israel, Ra'anana, Israel
| |
|