1. Wang Z, Moffitt AB, Andrews P, Wigler M, Levy D. Accurate measurement of microsatellite length by disrupting its tandem repeat structure. Nucleic Acids Res 2022; 50:e116. [PMID: 36095132 PMCID: PMC9723644 DOI: 10.1093/nar/gkac723] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/26/2022] [Revised: 08/03/2022] [Accepted: 08/15/2022] [Indexed: 12/24/2022]
Abstract
Tandem repeats of simple sequence motifs, also known as microsatellites, are abundant in the genome. Because their repeat structure makes replication error-prone, variant microsatellite lengths are often generated during germline and other somatic expansions. As such, microsatellite length variations can serve as markers for cancer. However, accurate error-free measurement of microsatellite lengths is difficult with current methods precisely because of this high error rate during amplification. We have solved this problem by using partial mutagenesis to disrupt enough of the repeat structure of initial templates so that their sequence lengths replicate faithfully. In this work, we use bisulfite mutagenesis to convert a C to a U, later read as T. Compared to untreated templates, we achieve three orders of magnitude reduction in the error rate per round of replication. By requiring agreement from two independent first copies of an initial template, we reach error rates below one in a million. We apply this method to a thousand microsatellite loci from the human genome, revealing microsatellite length distributions not observable without mutagenesis.
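The step from a per-copy error rate to a figure below one in a million can be made explicit with a short bound. This is a sketch under stated assumptions, not a derivation taken from the paper: $\varepsilon$ is a hypothetical probability that a single first copy reports a wrong length, and the two first copies are assumed to err independently.

```latex
\Pr(\text{both copies wrong and in agreement})
  \;\le\; \Pr(\text{copy 1 wrong}) \cdot \Pr(\text{copy 2 wrong})
  \;=\; \varepsilon^{2},
\qquad \text{e.g. } \varepsilon = 10^{-3} \;\Rightarrow\; \varepsilon^{2} = 10^{-6}.
```

The value $10^{-3}$ is illustrative only and is not reported in the abstract. Under the independence assumption, any $\varepsilon$ below $10^{-3}$ would put the bound below one in a million.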
Affiliation(s)
- Peter Andrews
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Dan Levy
- To whom correspondence should be addressed. Tel: +1 516 367 5039; Fax: +1 516 367 8381;
2. Saini S, Mitra I, Mousavi N, Fotsing SF, Gymrek M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat Commun 2018; 9:4397. [PMID: 30353011 PMCID: PMC6199332 DOI: 10.1038/s41467-018-06694-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 03/09/2018] [Accepted: 09/18/2018] [Indexed: 12/14/2022]
Abstract
Short tandem repeats (STRs) are involved in dozens of Mendelian disorders and have been implicated in complex traits. However, genotyping arrays used in genome-wide association studies focus on single nucleotide polymorphisms (SNPs) and do not readily allow identification of STR associations. We leverage next-generation sequencing (NGS) from 479 families to create a SNP + STR reference haplotype panel. Our panel enables imputing STR genotypes into SNP array data when NGS is not available for directly genotyping STRs. Imputed genotypes achieve mean concordance of 97% with observed genotypes in an external dataset compared to 71% expected under a naive model. Performance varies widely across STRs, with near perfect concordance at bi-allelic STRs vs. 70% at highly polymorphic repeats. Imputation increases power over individual SNPs to detect STR associations with gene expression. Imputing STRs into existing SNP datasets will enable the first large-scale STR association studies across a range of complex traits.
Affiliation(s)
- Shubham Saini
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Ileena Mitra
- Bioinformatics and Systems Biology Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Stephanie Feupe Fotsing
- Bioinformatics and Systems Biology Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
3. Mejía N, Soto B, Guerrero M, Casanueva X, Houel C, de los Ángeles Miccono M, Ramos R, Le Cunff L, Boursiquot JM, Hinrichsen P, Adam-Blondon AF. Molecular, genetic and transcriptional evidence for a role of VvAGL11 in stenospermocarpic seedlessness in grapevine. BMC Plant Biology 2011; 11:57. [PMID: 21447172 PMCID: PMC3076230 DOI: 10.1186/1471-2229-11-57] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 07/23/2010] [Accepted: 03/29/2011] [Indexed: 05/19/2023]
Abstract
BACKGROUND Stenospermocarpy is a mechanism through which certain genotypes of Vitis vinifera L. such as Sultanina produce berries with seeds reduced in size. Stenospermocarpy has not yet been characterized at the molecular level. RESULTS Genetic and physical maps were integrated with the public genomic sequence of Vitis vinifera L. to improve QTL analysis for seedlessness and berry size in experimental progeny derived from a cross of two seedless genotypes. Major QTLs co-positioning for both traits on chromosome 18 defined a 92-kb confidence interval. Functional information from model species including Vitis suggested that VvAGL11, included in this confidence interval, might be the main positional candidate gene responsible for seed and berry development. Characterization of VvAGL11 at the sequence level in the experimental progeny identified several SNPs and INDELs in both regulatory and coding regions. In association analyses performed over three seasons, these SNPs and INDELs explained up to 78% and 44% of the phenotypic variation in seed and berry weight, respectively. Moreover, genetic experiments indicated that the regulatory region has a larger effect on the phenotype than the coding region. Transcriptional analysis lent additional support to the putative role of VvAGL11's regulatory region, as its expression is abolished in seedless genotypes at key stages of seed development. These results transform VvAGL11 into a functional candidate gene for further analyses based on genetic transformation. For breeding purposes, intragenic markers were tested individually for marker assisted selection, and the best markers were those closest to the transcription start site. CONCLUSION We propose that VvAGL11 is the major functional candidate gene for seedlessness, and we provide experimental evidence suggesting that the seedless phenotype might be caused by variations in its promoter region.
Current knowledge of the function of its orthologous genes, its expression profile in Vitis varieties and the strong association between its sequence variation and the degree of seedlessness together indicate that the D-lineage MADS-box gene VvAGL11 corresponds to the Seed Development Inhibitor locus described earlier as a major locus for seedlessness. These results provide new hypotheses for further investigations of the molecular mechanisms involved in seed and berry development.
Affiliation(s)
- Nilo Mejía
- Biotechnology Unit, La Platina Experimental Station, INIA, Av. Santa Rosa 11610, 8831314, Santiago, Chile
- Braulio Soto
- Biotechnology Unit, La Platina Experimental Station, INIA, Av. Santa Rosa 11610, 8831314, Santiago, Chile
- Marcos Guerrero
- Biotechnology Unit, La Platina Experimental Station, INIA, Av. Santa Rosa 11610, 8831314, Santiago, Chile
- Ximena Casanueva
- Biotechnology Unit, La Platina Experimental Station, INIA, Av. Santa Rosa 11610, 8831314, Santiago, Chile
- Cléa Houel
- UMR INRA CNRS University of Evry on Plant Genomics, 2 rue Gaston Crémieux, BP 5708, 91057, Evry, France
- Rodrigo Ramos
- Biotechnology Unit, La Platina Experimental Station, INIA, Av. Santa Rosa 11610, 8831314, Santiago, Chile
- Loïc Le Cunff
- INRA - Montpellier SupAgro, UMR 1097, Equipe Diversité Génétique et Génomique Vigne, 2 place P. Viala, F-34060 Montpellier Cedex 1, France
- Jean-Michel Boursiquot
- INRA - Montpellier SupAgro, UMR 1097, Equipe Diversité Génétique et Génomique Vigne, 2 place P. Viala, F-34060 Montpellier Cedex 1, France
- Patricio Hinrichsen
- Biotechnology Unit, La Platina Experimental Station, INIA, Av. Santa Rosa 11610, 8831314, Santiago, Chile
4. Krawitz P, Rödelsperger C, Jäger M, Jostins L, Bauer S, Robinson PN. Microindel detection in short-read sequence data. Bioinformatics 2010; 26:722-9. [PMID: 20144947 DOI: 10.1093/bioinformatics/btq027] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge. RESULTS We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region eir, which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contains indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platform indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels. CONTACT peter.krawitz@googlemail.com; peter.robinson@charite.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peter Krawitz
- Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin.
5. Gibbons JG, Rokas A. Comparative and functional characterization of intragenic tandem repeats in 10 Aspergillus genomes. Mol Biol Evol 2008; 26:591-602. [PMID: 19056904 DOI: 10.1093/molbev/msn277] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Intragenic tandem repeats (ITRs) are consecutive repeats of three or more nucleotides found in coding regions. ITRs are the underlying cause of several human genetic diseases and have been associated with phenotypic variation, including pathogenesis, in several clades of the tree of life. We have examined the evolution and functional role of ITRs in 10 genomes spanning the fungal genus Aspergillus, a clade of relevance to medicine, agriculture, and industry. We identified several hundred ITRs in each of the species examined. ITR content varied extensively between species, with an average 79% of ITRs unique to a given species. For the fraction of conserved ITR regions, sequence comparisons within species and between close relatives revealed that they were highly variable. ITR-containing proteins were evolutionarily less conserved, compositionally distinct, and overrepresented for domains associated with cell-surface localization and function relative to the rest of the proteome. Furthermore, ITRs were preferentially found in proteins involved in transcription, cellular communication, and cell-type differentiation but were underrepresented in proteins involved in metabolism and energy. Importantly, although ITRs were evolutionarily labile, their functional associations appeared to be remarkably conserved across eukaryotes. Fungal ITRs likely participate in a variety of developmental processes and cell-surface-associated functions, suggesting that their contribution to fungal lifestyle and evolution may be more general than previously assumed.
Affiliation(s)
- John G Gibbons
- Department of Biological Sciences, Vanderbilt University, Nashville, USA
6. Gross E, Hölzl G, Arnold N, Hauenstein E, Jacobsen A, Schulze K, Ramser J, Meindl A, Kiechle M, Oefner PJ. Allelic loss analysis by denaturing high-performance liquid chromatography and electrospray ionization mass spectrometry. Hum Mutat 2007; 28:303-11. [PMID: 17109391 DOI: 10.1002/humu.20439] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/25/2023]
Abstract
Analysis of allelic imbalance is of great importance for understanding tumorigenesis and the clinical management of malignant disease. Fluorescent-based capillary electrophoresis (CE) of highly polymorphic short tandem repeats (STRs) has become the main method used to detect the loss/gain of alleles. However, there is continued interest in the development of techniques that require no fluorescence and allow the rapid analysis of individual samples. One promising alternative is ion-pair reversed-phase high-performance liquid chromatography (IP-RP-HPLC), which is widely available because of its use in denaturing HPLC. Its applicability in combination with ultraviolet (UV) absorbance detection to the efficient separation of di- and tetranucleotide repeats on the short arm of chromosome 11 was tested using 25 matched pairs of normal and ovarian cancer tissues. Loss of heterozygosity (LOH) could be readily identified for all 13 loci tested, based on changes in the ratios between either the alleles or homo- and heteroduplex signals. However, discrimination between noninformative homo- or hemizygous and heterozygous samples was difficult or impossible when HPLC failed to resolve the alleles. Hyphenation of HPLC with electrospray ionization (ESI) quadrupole ion trap (IT) mass spectrometry (MS) not only allowed the identification of coeluting alleles, but also the reliable detection of a 40% reduction of one allele. The size range of DNA fragments amenable to mass spectrometric analysis was effectively tripled to >300 bp by the use of a linear IT and a Taq DNA polymerase cocktail lacking detergents that otherwise adversely affect ESI.
Affiliation(s)
- Eva Gross
- Department of Obstetrics and Gynaecology, Technical University, Munich, Germany
7. Olejniczak M, Krzyzosiak WJ. Genotyping of simple sequence repeats--factors implicated in shadow band generation revisited. Electrophoresis 2007; 27:3724-34. [PMID: 16960838 DOI: 10.1002/elps.200600136] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
PCR amplification of microsatellite sequences generates, besides the main product corresponding to allele size, also additional, undesired products usually shorter by multiples of the repeated unit. These extra products known as shadow bands or stutter products may complicate genotyping. The mechanism by which these artifacts are formed is not well understood and so no effective remedy has been found to cope with these spurious products. In this study, using the DNA templates containing the CAG/CTG repeats flanked by gene-specific sequences and universal priming sites, we analyzed the effects of many PCR variables on the shadow band generation. The most important result was that at the decreased temperature of the denaturation step during PCR cycling the shadow bands were either not formed or were strongly suppressed. Several possible sources of this effect are discussed.
Affiliation(s)
- Marta Olejniczak
- Laboratory of Cancer Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
8. Veytsman B, Akhmadeyeva L. Simple mathematical model of pathologic microsatellite expansions: When self-reparation does not work. J Theor Biol 2006; 242:401-8. [DOI: 10.1016/j.jtbi.2006.03.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/13/2006] [Accepted: 03/15/2006] [Indexed: 10/24/2022]
9. Lai Y, Sun F. Sampling distribution for microsatellites amplified by PCR: mean field approximation and its applications to genotyping. J Theor Biol 2004; 228:185-94. [PMID: 15094014 DOI: 10.1016/j.jtbi.2003.12.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/08/2003] [Revised: 09/30/2003] [Accepted: 12/23/2003] [Indexed: 11/21/2022]
Abstract
Due to microsatellite mutations during PCR, stutter patterns may appear in the final PCR product, which hinder accurate genotyping of microsatellite markers. The existing methods for microsatellite stutter pattern deconvolution require a large amount of data. A mathematical model for microsatellite mutations during PCR and an estimation method based on mean field approximation for branching processes have recently been developed. In this paper, we study the asymptotic behavior of the mean field approximation when experiments are started from a large number of molecules, and we derive an upper bound for the approximation error when experiments are started from a finite number of molecules. Based on the theories of mean field approximation and Bayesian statistics, we develop a novel method for microsatellite stutter pattern deconvolution.
Affiliation(s)
- Yinglei Lai
- Department of Mathematics, University of Southern California, 1042 West 36th Place, DRB 288 Los Angeles, CA 90089-1113, USA
10. Lai Y, Sun F. Microsatellite mutations during the polymerase chain reaction: mean field approximations and their applications. J Theor Biol 2003; 224:127-37. [PMID: 12900210 DOI: 10.1016/s0022-5193(03)00155-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
We develop a novel mathematical model for microsatellite mutations during polymerase chain reaction (PCR). Based on the model, we study the first- and second-order moments of the number of repeat units in a randomly chosen molecule after n PCR cycles and their corresponding mean field approximations. We give upper bounds for the approximation errors and show that the approximation errors are small when the mutation rate is low. Based on the theoretical results, we develop a moment estimation method to estimate the mutation rate per-repeat-unit per PCR cycle and the probability of expansion when mutations occur. Simulation studies show that the moment estimation method can accurately recover the true mutation rate and probability of expansion. Finally, the method is applied to experimental data from single-molecule PCR experiments.
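The kind of model summarized above can be illustrated with a small simulation. This is a minimal sketch and not the authors' implementation: the function name `simulate_pcr`, its parameters, and the choice to let the per-cycle mutation probability scale linearly with the number of repeat units are all assumptions made for illustration, guided only by the abstract's mention of a mutation rate per repeat unit per PCR cycle and a probability of expansion when a mutation occurs.

```python
import random
from collections import Counter


def simulate_pcr(start_units, n_molecules, n_cycles, mu, p_expand,
                 efficiency=1.0, seed=0):
    """Simulate repeat-unit counts after PCR (hypothetical toy model).

    mu         -- assumed mutation rate per repeat unit per cycle
    p_expand   -- assumed probability that a mutation adds one unit
                  (otherwise one unit is lost)
    efficiency -- assumed probability that a molecule is copied in a cycle
    """
    rng = random.Random(seed)
    pool = [start_units] * n_molecules
    for _ in range(n_cycles):
        copies = []
        for units in pool:
            if rng.random() >= efficiency:
                continue  # this molecule was not copied in this cycle
            new_units = units
            # Assumption: mutation probability grows linearly with length.
            if rng.random() < min(1.0, mu * units):
                new_units += 1 if rng.random() < p_expand else -1
            copies.append(max(new_units, 0))
        pool.extend(copies)  # templates persist alongside their copies
    return Counter(pool)


# Illustrative parameter values only; none are taken from the paper.
counts = simulate_pcr(start_units=20, n_molecules=50, n_cycles=10,
                      mu=0.001, p_expand=0.2)
total = sum(counts.values())
for units in sorted(counts):
    print(units, counts[units] / total)
```

The printed table is the simulated fraction of molecules at each repeat length, which is the kind of stutter distribution that moment-based estimates of the mutation rate and expansion probability would be fitted against.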
Affiliation(s)
- Yinglei Lai
- Department of Mathematics, University of Southern California, 1042 West 36th Place, DRB288, Los Angeles, CA 90089-1113, USA