1
Loh CA, Shields DA, Schwing A, Evrony GD. High-fidelity, large-scale targeted profiling of microsatellites. Genome Res 2024; 34:1008-1026. [PMID: 39013593 PMCID: PMC11368184 DOI: 10.1101/gr.278785.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 07/11/2024] [Indexed: 07/18/2024]
Abstract
Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depend on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant "stutter" artifacts that interfere with accurate genotyping, and sequencing costs preclude whole-genome microsatellite profiling of a large number of samples. We developed a novel method for accurate and cost-effective targeted profiling of a panel of more than 150,000 microsatellites per sample, along with a computational tool for designing large-scale microsatellite panels. Our method addresses the greatest challenge for microsatellite profiling, "stutter" artifacts, with a low-temperature hybridization capture that significantly reduces these artifacts. We also developed a computational tool for accurate genotyping of the resulting microsatellite sequencing data that uses an ensemble approach integrating three microsatellite genotyping tools, which we optimize by analysis of de novo microsatellite mutations in human trios. Altogether, our suite of experimental and computational tools enables high-fidelity, large-scale profiling of microsatellites, which may find utility in diverse applications such as lineage tracing, population genetics, ecology, and forensics.
Affiliation(s)
- Caitlin A Loh
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Danielle A Shields
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Adam Schwing
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Gilad D Evrony
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
2
Balzano E, Pelliccia F, Giunta S. Genome (in)stability at tandem repeats. Semin Cell Dev Biol 2020; 113:97-112. [PMID: 33109442 DOI: 10.1016/j.semcdb.2020.10.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 04/06/2020] [Revised: 09/26/2020] [Accepted: 10/10/2020] [Indexed: 12/12/2022]
Abstract
Repeat sequences account for over half of the human genome and represent a significant source of variation that underlies physiological and pathological states. Yet, their study has been hindered by limitations in short-read sequencing technology and difficulties in assembly. An important category of repetitive DNA in the human genome is comprised of tandem repeats (TRs), where repetitive units are arranged in a head-to-tail pattern. Compared to other regions of the genome, TRs carry a 10- to 10,000-fold higher mutation rate. There are several mutagenic mechanisms that can give rise to this propensity toward instability, but their precise contribution remains speculative. Given the high degree of homology between these sequences and their arrangement in tandem, once damaged, TRs have an intrinsic propensity to undergo aberrant recombination with non-allelic exchange and generate harmful rearrangements that may undermine the stability of the entire genome. The dynamic mutagenesis at TRs has been found to underlie individual polymorphism associated with neurodegenerative and neuromuscular disorders, as well as complex genetic diseases like cancer and diabetes. Here, we review our current understanding of the surveillance and repair mechanisms operating within these regions, and we describe how alterations in these protective processes can readily trigger mutational signatures found at TRs, ultimately resulting in the pathological correlation between TR instability and human diseases. Finally, we provide a viewpoint to counter the detrimental effects that TRs pose in light of their selection and conservation, as important drivers of human evolution.
Affiliation(s)
- Elisa Balzano
  - Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
- Franca Pelliccia
  - Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
- Simona Giunta
  - The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA; Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
3
Pugacheva V, Korotkov A, Korotkov E. Search of latent periodicity in amino acid sequences by means of genetic algorithm and dynamic programming. Stat Appl Genet Mol Biol 2017; 15:381-400. [PMID: 27337743 DOI: 10.1515/sagmb-2015-0079] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/15/2022]
Abstract
The aim of this study was to show that amino acid sequences have a latent periodicity with insertions and deletions of amino acids in unknown positions of the analyzed sequence. A genetic algorithm, dynamic programming and random weight matrices were used to develop a new mathematical algorithm for latent periodicity search. A multiple alignment of periods was calculated with the help of direct optimization of the position-weight matrix, without using pairwise alignments. The developed algorithm was applied to analyze amino acid sequences of a small number of proteins. This study showed the presence of latent periodicity with insertions and deletions in the amino acid sequences of proteins for which latent periodicity was not previously known. The origin of latent periodicity with insertions and deletions is discussed.
4
Rezaei M, Nguyen NMP, Foroughinia L, Dash P, Ahmadpour F, Verma IC, Slim R, Fardaei M. Two novel mutations in the KHDC3L gene in Asian patients with recurrent hydatidiform mole. Hum Genome Var 2016; 3:16027. [PMID: 27621838 PMCID: PMC5007383 DOI: 10.1038/hgv.2016.27] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/17/2015] [Revised: 05/02/2016] [Accepted: 06/28/2016] [Indexed: 02/04/2023]
Abstract
Recurrent hydatidiform mole (RHM) is defined by the occurrence of repeated molar pregnancies in affected women. Two genes, NLRP7 and KHDC3L, play a causal role in RHM and are responsible for 48-80% and 5% of cases, respectively. Here, we report the results of screening these two genes for mutations in one Iranian and one Indian patient with RHM. No mutations in NLRP7 were identified in the two patients. KHDC3L sequencing identified two novel protein-truncating mutations in a homozygous state, a 4-bp deletion, c.17_20delGGTT (p.Arg6Leufs*7), in the Iranian patient and a splice mutation, c.349+1G>A, that affects the invariant donor site at the junction of exon 2 and intron 2 in the Indian patient. To date, only four mutations in KHDC3L have been reported. The identification of two additional mutations provides further evidence for the important role of KHDC3L in the pathophysiology of RHM and increases the diversity of mutations described in Asian populations.
Affiliation(s)
- Maryam Rezaei
  - Department of Medical Genetics, Shiraz University of Medical Sciences, Shiraz, Iran
- Ngoc Minh Phuong Nguyen
  - Department of Human Genetics, McGill University Health Centre Research Institute, Montreal, Quebec, Canada
  - Department of Obstetrics and Gynecology, McGill University Health Centre Research Institute, Montreal, Quebec, Canada
- Leila Foroughinia
  - Department of Obstetrics and Gynecology, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Pratima Dash
  - Center of Medical Genetics, Sir Ganga Ram Hospital, Delhi, India
- Fatemeh Ahmadpour
  - Department of Obstetrics and Gynecology, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Rima Slim
  - Department of Human Genetics, McGill University Health Centre Research Institute, Montreal, Quebec, Canada
  - Department of Obstetrics and Gynecology, McGill University Health Centre Research Institute, Montreal, Quebec, Canada
- Majid Fardaei
  - Department of Medical Genetics, Shiraz University of Medical Sciences, Shiraz, Iran
5
Gingerich TJ, Stumpo DJ, Lai WS, Randall TA, Steppan SJ, Blackshear PJ. Emergence and evolution of Zfp36l3. Mol Phylogenet Evol 2015; 94:518-530. [PMID: 26493225 DOI: 10.1016/j.ympev.2015.10.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 06/26/2015] [Revised: 10/06/2015] [Accepted: 10/13/2015] [Indexed: 11/19/2022]
Abstract
In most mammals, the Zfp36 gene family consists of three conserved members, with a fourth member, Zfp36l3, present only in rodents. The ZFP36 proteins regulate post-transcriptional gene expression at the level of mRNA stability in organisms from humans to yeasts, and appear to be expressed in all major groups of eukaryotes. In Mus musculus, Zfp36l3 expression is limited to the placenta and yolk sac, and is important for overall fecundity. We sequenced the Zfp36l3 gene from more than 20 representative species, from members of the Muridae, Cricetidae and Nesomyidae families. Zfp36l3 was not present in Dipodidae, or any families that branched earlier, indicating that this gene is exclusive to the Muroidea superfamily. We provide evidence that Zfp36l3 arose by retrotransposition of an mRNA encoded by a related gene, Zfp36l2, into an ancestral rodent X chromosome. Zfp36l3 has evolved rapidly since its origin, and numerous modifications have developed, including variations in start codon utilization, de novo intron formation by mechanisms including a nested retrotransposition, and the insertion of distinct repetitive regions. One of these repeat regions, a long alanine-rich sequence, is responsible for the full-time cytoplasmic localization of Mus musculus ZFP36L3. In contrast, this repeat sequence is lacking in Peromyscus maniculatus ZFP36L3, and this protein contains a novel nuclear export sequence that controls shuttling between the nucleus and cytosol. Zfp36l3 is an example of a recently acquired, rapidly evolving gene, and its various orthologues illustrate several different mechanisms by which new genes emerge and evolve.
Affiliation(s)
- Timothy J Gingerich
  - Laboratory of Signal Transduction, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Deborah J Stumpo
  - Laboratory of Signal Transduction, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Wi S Lai
  - Laboratory of Signal Transduction, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Thomas A Randall
  - Integrative Bioinformatics, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Scott J Steppan
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306, USA
- Perry J Blackshear
  - Laboratory of Signal Transduction, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA; Department of Medicine, Duke University Medical Center, Durham, NC 27710, USA; Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA
6
Lozano JC, Vergé V, Schatt P, Juengel JL, Peaucellier G. Evolution of cyclin B3 shows an abrupt three-fold size increase, due to the extension of a single exon in placental mammals, allowing for new protein-protein interactions. Mol Biol Evol 2012; 29:3855-71. [PMID: 22826462 DOI: 10.1093/molbev/mss189] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/12/2023]
Abstract
Cyclin B3 evolution has the unique peculiarity of an abrupt 3-fold increase of the protein size in the mammalian lineage due to the extension of a single exon. We have analyzed the evolution of the gene to define the modalities of this event and the possible consequences on the function of the protein. Database searches can trace the appearance of the gene to the origin of metazoans. Most introns were already present in early metazoans, and the intron-exon structure as well as the protein size were fairly conserved in invertebrates and nonmammalian vertebrates. Although intron gains are considered as rare events, we identified two cases, one at the prochordate-chordate transition and one in murids, resulting from different mechanisms. At the emergence of mammals, the gene was relocated from chromosome 6 of platypus to the X chromosome in marsupials, but the exon extension occurred only in placental mammals. A repetitive structure of 18 amino acids, of uncertain origin, is detectable in the 3,000-nt mammalian exon-encoded sequence, suggesting an extension by multiple internal duplications, some of which are still detectable in the primate lineage. Structure prediction programs suggest that the repetitive structure has no associated three-dimensional structure but rather a tendency for disorder. Splice variant isoforms were detected in several mammalian species but without conserved pattern, notably excluding the constant coexistence of premammalian-like transcripts, without the extension. The yeast two-hybrid method revealed that, in human, the extension allowed new interactions with ten unrelated proteins, most of them with specific three-dimensional structures involved in protein-protein interactions, and some highly expressed in testis, as is cyclin B3. The interactions with activator of cAMP-responsive element modulator in testis (ACT), germ cell-less homolog 1, and chromosome 1 open reading frame 14 remain to be verified in vivo since they may not be expressed at the same stages of spermatogenesis as cyclin B3.
7
Molecular mining of alleles in water buffalo Bubalus bubalis and characterization of the TSPY1 and COL6A1 genes. PLoS One 2011; 6:e24958. [PMID: 21949806 PMCID: PMC3174239 DOI: 10.1371/journal.pone.0024958] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/18/2011] [Accepted: 08/24/2011] [Indexed: 12/21/2022]
Abstract
Background: Minisatellites are an integral part of eukaryotic genomes and show variation in the complexity of their organization. Besides their presence in non-coding regions, a small fraction of them are part of the transcriptome, possibly participating in gene regulation, expression and silencing. We studied the minisatellite (TGG)n tagged transcriptome in the water buffalo Bubalus bubalis across various tissues and the spermatozoa, and characterized the genes TSPY1 and COL6A1 discovered in the process. Results: Minisatellite associated sequence amplification (MASA) conducted using cDNA and oligonucleotide primer (TGG)5 uncovered 38 different mRNA transcripts from somatic tissues and gonads and 15 from spermatozoa. These mRNA transcripts corresponded to several known and novel genes. The majority of the transcripts showed the highest level of expression either in the testes or spermatozoa, with the exception of a few showing higher expression levels in the lungs and liver. Transcript SR1, which is expressed in all the somatic tissues and gonads, was found to be similar to the Bos taurus collagen type VI alpha 1 gene (COL6A1). Similarly, SR29, a testis-specific transcript, was found to be similar to the Bos taurus testis-specific Y-encoded protein-1 representing cancer/testis antigen 78 (CT78). Subsequently, full length coding sequences (cds) of these two transcripts were obtained. Quantitative PCR (q-PCR) revealed 182-202 copies of the TSPY1 gene in water buffalo, which localized to the Y chromosome. Conclusions: The MASA approach enabled us to identify several genes, including two of clinical significance, without screening an entire cDNA library. Genes identified with TGG repeats are not part of a specific family of proteins and instead are distributed randomly throughout the genome. Genes showing elevated expression in the testes and spermatozoa may prove to be potential candidates for in-depth characterization. Furthermore, their possible involvement in fertility or lack thereof would augment animal biotechnology.
8
Marchetti F, Rowan-Carroll A, Williams A, Polyzos A, Berndt-Weis ML, Yauk CL. Sidestream tobacco smoke is a male germ cell mutagen. Proc Natl Acad Sci U S A 2011; 108:12811-4. [PMID: 21768363 PMCID: PMC3150936 DOI: 10.1073/pnas.1106896108] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Indexed: 02/03/2023]
Abstract
Active cigarette smoking increases oxidative damage, DNA adducts, DNA strand breaks, chromosomal aberrations, and heritable mutations in sperm. However, little is known regarding the effects of second-hand smoke on the male germ line. We show here that short-term exposure to mainstream tobacco smoke or sidestream tobacco smoke (STS), the main component of second-hand smoke, induces mutations at an expanded simple tandem repeat locus (Ms6-hm) in mouse sperm. We further show that the response to STS is not linear and that, for both mainstream tobacco smoke and STS, doses that induced significant increases in expanded simple tandem repeat mutations in sperm did not increase the frequencies of micronucleated reticulocytes and erythrocytes in the bone marrow and blood of exposed mice. These data show that passive exposure to cigarette smoke can cause tandem repeat mutations in sperm under conditions that may not induce genetic damage in somatic cells. Although the relationship between noncoding tandem repeat instability and mutations in functional regions of the genome is unclear, our data suggest that paternal exposure to second-hand smoke may have reproductive consequences that go beyond the passive smoker.
Affiliation(s)
- Francesco Marchetti
  - Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Andrea Rowan-Carroll
  - Environmental Health Sciences and Research Bureau, Health Canada, Ottawa, ON, Canada K1A 0K9
- Andrew Williams
  - Environmental Health Sciences and Research Bureau, Health Canada, Ottawa, ON, Canada K1A 0K9
- Aris Polyzos
  - Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- M. Lynn Berndt-Weis
  - Environmental Health Sciences and Research Bureau, Health Canada, Ottawa, ON, Canada K1A 0K9
- Carole L. Yauk
  - Environmental Health Sciences and Research Bureau, Health Canada, Ottawa, ON, Canada K1A 0K9
9
Singer TM, Yauk CL. Germ cell mutagens: risk assessment challenges in the 21st century. Environ Mol Mutagen 2010; 51:919-928. [PMID: 20740630 DOI: 10.1002/em.20613] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 05/29/2023]
Abstract
Heritable mutations may result in a wide variety of detrimental outcomes, from embryonic lethality to genetic disease in the offspring. Despite this, today's commonly used test batteries do not include assays for germ cell mutation. Current challenges include a lack of practical assays and concrete evidence for human germline mutagens, and large data gaps that often impede risk assessment. Moreover, most regulatory assessments are based on the assumption that somatic cell mutation assays also protect the germline by default, which has not been adequately confirmed. The field is also faced with new challenges aimed at dramatically reducing animal testing, and attempts to rapidly classify thousands of chemicals using high throughput in vitro assays. These approaches may not adequately capture effects that may be particular to gametes, since many aspects of the germline are unique. In light of these challenges, an urgent need exists to develop new approaches to evaluate the potential of toxicants to cause germline mutation. The application of new technologies will greatly enhance our understanding of mutation in humans exposed to environmental mutagens. However, we must be poised to collect and interpret these data, and facilitate risk translation to regulators and the public. Genetic toxicologists must also become actively involved in the development of high-throughput tools to study germline mutation. Appropriate attention to these areas will result in the development of policies that prioritize the protection of the germline and future generations from DNA sequence mutations.
Affiliation(s)
- Timothy M Singer
  - Mechanistic Studies Division, Environmental Health Science and Research Bureau, Health Canada, Ottawa, Ontario, Canada
10
Keren H, Lev-Maor G, Ast G. Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet 2010; 11:345-55. [DOI: 10.1038/nrg2776] [Citation(s) in RCA: 756] [Impact Index Per Article: 50.4] [Indexed: 12/14/2022]