1
Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024; 25:460-475. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
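As a concrete illustration of the definition above (an STR is a run of repeated copies of a 1-6-bp motif), the following Python sketch scans a sequence for perfect, uninterrupted repeats. The function name `find_perfect_strs`, the three-copy minimum, and the rule of preferring the shortest qualifying motif are illustrative assumptions, not parameters from the review or from any published STR genotyper; real catalogues also handle interrupted (impure) repeats and reference coordinates, which this sketch ignores.

```python
from typing import List, Tuple


def find_perfect_strs(seq: str, min_copies: int = 3,
                      motif_sizes: range = range(1, 7)) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Hypothetical helper: only uninterrupted repeats are reported, and at each
    position the shortest motif (1-6 bp by default) reaching min_copies wins.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in motif_sizes:
            motif = seq[i:i + k]
            if len(motif) < k:  # too close to the end for this motif size
                break
            copies = 1
            # Extend while the next k-bp block is an exact copy of the motif.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies:
                best = (i, motif, copies)
                break  # prefer the shortest qualifying motif
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the whole repeat tract
        else:
            i += 1
    return hits


print(find_perfect_strs("TTCAGCAGCAGCAGTT"))  # [(2, 'CAG', 4)]
```

The reported motif depends on where the scan first enters a repeat tract (the same run can be read as CAG or AGC), which is one reason STR catalogues fix a canonical motif representation.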
Affiliation(s)
- Hope A Tanudisastro
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
- Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
2
Huang Y, Wang M, Liu C, He G. Comprehensive landscape of non-CODIS STRs in global populations provides new insights into challenging DNA profiles. Forensic Sci Int Genet 2024; 70:103010. [PMID: 38271830 DOI: 10.1016/j.fsigen.2024.103010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 01/13/2024] [Accepted: 01/14/2024] [Indexed: 01/27/2024]
Abstract
The worldwide implementation of short tandem repeat (STR) profiles in forensic genetics necessitated establishing and expanding the CODIS core loci set to facilitate efficient data management and exchange. Currently, the mainstay CODIS STRs are adopted in most general-purpose forensic kits. However, relying solely on these loci fails to yield satisfactory results for challenging tasks, such as bio-geographical ancestry inference, complex DNA mixture profile interpretation, and distant kinship analysis. In this context, non-CODIS STRs are potent supplements to enhance the systematic discriminating power, particularly when combined with high-throughput next-generation sequencing (NGS). Nevertheless, comprehensive evaluations of non-CODIS STRs in diverse populations have been scarce, hindering their further application in routine casework. To address this gap, we investigated genetic variation at 178 historically available non-CODIS STRs in ethnolinguistically diverse populations worldwide and studied their characteristics and forensic potential using high-coverage whole-genome sequencing (WGS) data. Initially, we delineated the genomic properties of these non-CODIS markers through sequence searching, repeat structure scanning, and manual inspection. Subsequent population genetic analysis suggested that these non-CODIS STRs had polymorphism levels and forensic utility comparable to those of CODIS STRs. Furthermore, we constructed a theoretical NGS panel comprising 108 STRs (20 CODIS STRs and 88 non-CODIS STRs) and evaluated its performance in inferring bio-geographical ancestry, deconvoluting complex DNA mixtures, and differentiating distant kinship using real and simulated datasets. Our findings demonstrated that incorporating supplementary non-CODIS STRs enabled the extrapolation of multidimensional information from a single STR profile, thereby facilitating the analysis of challenging forensic tasks.
In conclusion, this study presents an extensive genomic landscape of forensic non-CODIS STRs among global populations and emphasizes the need to include additional polymorphic non-CODIS STRs in future NGS-based forensic systems.
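"Discriminating power" refers to a standard forensic calculation. The Python sketch below assumes Hardy-Weinberg equilibrium within loci and independence between loci (assumptions that linked markers or population substructure can violate). The per-locus match probability is the chance that two unrelated people share a genotype, Σ p_i⁴ + Σ_{i<j} (2 p_i p_j)², and the combined power of discrimination is 1 minus the product of the per-locus match probabilities. The function names and toy allele frequencies are hypothetical; this is not the panel evaluation performed by the authors.

```python
from itertools import combinations
from math import prod
from typing import Dict, Sequence


def match_probability(freqs: Dict[str, float]) -> float:
    """Per-locus probability that two unrelated people share a genotype (HWE assumed)."""
    p = list(freqs.values())
    homozygous = sum(pi ** 4 for pi in p)  # both carry AA: (p_i^2)^2
    heterozygous = sum((2 * pi * pj) ** 2 for pi, pj in combinations(p, 2))  # both AB
    return homozygous + heterozygous


def combined_power_of_discrimination(loci: Sequence[Dict[str, float]]) -> float:
    """1 - product of per-locus match probabilities, assuming independent loci."""
    return 1.0 - prod(match_probability(f) for f in loci)


toy_locus = {"11": 0.5, "12": 0.5}  # hypothetical allele frequencies
print(match_probability(toy_locus))                              # 0.375
print(combined_power_of_discrimination([toy_locus, toy_locus]))  # 0.859375
```

Each added independent locus multiplies in another factor below 1, which is the arithmetic behind supplementing CODIS loci with further polymorphic STRs, provided the independence assumption holds.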
Affiliation(s)
- Yuguo Huang
- Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610041, China.
- Mengge Wang
- Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610041, China
- Chao Liu
- Anti-Drug Technology Center of Guangdong Province, Guangzhou 510230, China; Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Guanglin He
- Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610041, China; Center for Archaeological Science, Sichuan University, Chengdu 610000, China
3
Bodner M, Ballard D, Borsuk LA, King JL, Parson W, Phillips C, Gettings KB. Harmonizing the forensic nomenclature for STR loci D6S474 and DYS612. Forensic Sci Int Genet 2024; 70:103012. [PMID: 38295652 DOI: 10.1016/j.fsigen.2024.103012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 01/10/2024] [Accepted: 01/15/2024] [Indexed: 04/01/2024]
Abstract
The autosomal STR D6S474 and the Y-chromosomal STR DYS612 have been reported in multiple ways in the forensic literature, with differences in both the bracketed repeat structures and counting of numerical length-based capillary electrophoresis (CE) alleles. These issues often come to light when STR loci are introduced in commercial assays and results compared with historical publications of allele frequency data, or multiple assays are characterized with reference materials. We review the forensic literature and other relevant information, and provide suggestions for the future treatment of each STR.
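The distinction drawn above between bracketed repeat structures and numerical length-based CE alleles can be illustrated with a small, hypothetical Python sketch. It run-length encodes a repeat region in a fixed k-bp reading frame and derives a naive length-based designation as full repeat units plus leftover bases (the familiar "x.y" form for partial repeats). The helper names and the example sequences are invented for illustration and are not the D6S474 or DYS612 sequences or the nomenclature recommended by the authors; locus-specific conventions such as where counting starts, which flanking bases are included, and strand orientation are exactly what such papers settle, and they are ignored here.

```python
from itertools import groupby


def bracket_notation(region: str, k: int) -> str:
    """Collapse a repeat region into bracketed notation, e.g. '[TCTA]3 [TCTG]2'.

    Hypothetical helper: the region is cut into k-bp blocks from position 0,
    identical consecutive blocks are merged, and single or partial blocks are
    written out literally.
    """
    blocks = [region[i:i + k] for i in range(0, len(region), k)]
    parts = []
    for motif, run in groupby(blocks):
        n = sum(1 for _ in run)
        parts.append(motif if n == 1 else f"[{motif}]{n}")
    return " ".join(parts)


def length_based_allele(region: str, k: int) -> str:
    """Naive CE-style designation: full repeat units, plus '.y' for y leftover bases."""
    full, rest = divmod(len(region), k)
    return f"{full}.{rest}" if rest else str(full)


a = "TCTA" * 3 + "TCTG" * 2 + "TCTA" * 4
b = "TCTA" * 4 + "TCTG" + "TCTA" * 4
print(bracket_notation(a, 4), length_based_allele(a, 4))  # [TCTA]3 [TCTG]2 [TCTA]4 9
print(bracket_notation(b, 4), length_based_allele(b, 4))  # [TCTA]4 TCTG [TCTA]4 9
```

Both toy alleles receive the same length-based designation ("9") but different bracketed structures, which is why counting and bracketing conventions must be harmonized before sequence-based and CE-based data can be compared.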
Affiliation(s)
- Martin Bodner
- Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria
- David Ballard
- King's Forensics, King's College London, Franklin-Wilkins Building, London, UK
- Lisa A Borsuk
- National Institute of Standards and Technology, Biomolecular Measurement Division, Gaithersburg, MD, USA
- Jonathan L King
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, USA
- Walther Parson
- Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria; Forensic Science Program, The Pennsylvania State University, University Park, PA, USA
- Christopher Phillips
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Santiago de Compostela, Spain
- Katherine Butler Gettings
- National Institute of Standards and Technology, Biomolecular Measurement Division, Gaithersburg, MD, USA
4
Gettings KB, Bodner M, Borsuk LA, King JL, Ballard D, Parson W, Benschop CCG, Børsting C, Budowle B, Butler JM, van der Gaag KJ, Gill P, Gusmão L, Hares DR, Hoogenboom J, Irwin J, Prieto L, Schneider PM, Vennemann M, Phillips C. Recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) on short tandem repeat sequence nomenclature. Forensic Sci Int Genet 2024; 68:102946. [PMID: 39090852 DOI: 10.1016/j.fsigen.2023.102946] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/27/2023] [Accepted: 10/14/2023] [Indexed: 08/04/2024]
Abstract
The DNA Commission of the International Society for Forensic Genetics (ISFG) has developed a set of nomenclature recommendations for short tandem repeat (STR) sequences. These recommendations follow the 2016 considerations of the DNA Commission of the ISFG, incorporating the knowledge gained through research and population studies in the intervening years. While maintaining a focus on backward compatibility with the CE data that currently populate national DNA databases, this report also looks to the future with the establishment of recommended minimum sequence reporting ranges to facilitate interlaboratory comparisons, automated solutions for sequence-based allele designations, a suite of resources to support bioinformatic development, guidance for characterizing new STR loci, and considerations for incorporating STR sequences and other new markers into investigative databases.
Affiliation(s)
- Martin Bodner
- Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria
- Lisa A Borsuk
- National Institute of Standards and Technology, Gaithersburg, MD, USA
- Jonathan L King
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, USA
- David Ballard
- King's Forensics, Department of Analytical, Environmental and Forensic Sciences, King's College London, London, United Kingdom
- Walther Parson
- Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria; Forensic Science Program, The Pennsylvania State University, University Park, PA, USA
- Corina C G Benschop
- Division of Biological Traces, Netherlands Forensic Institute, The Hague, the Netherlands
- Claus Børsting
- Section of Forensic Genetics, Department of Forensic Medicine, University of Copenhagen, Denmark
- Bruce Budowle
- Department of Forensic Medicine, University of Helsinki, Helsinki, Finland; Radford University Forensic Science Institute, Radford University, Radford, VA, USA
- John M Butler
- National Institute of Standards and Technology, Gaithersburg, MD, USA
- Peter Gill
- Forensic Genetics Research Group, Oslo University Hospital, Oslo, Norway
- Leonor Gusmão
- DNA Diagnostic Laboratory, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Jerry Hoogenboom
- Division of Biological Traces, Netherlands Forensic Institute, The Hague, the Netherlands
- Lourdes Prieto
- Forensic Sciences Institute Luis Concheiro, University of Santiago de Compostela, Santiago de Compostela, Spain; Comisaría General de Policía Científica, Madrid, Spain
- Peter M Schneider
- Institute of Legal Medicine, University of Cologne, Cologne, Germany
- Christopher Phillips
- Forensic Sciences Institute Luis Concheiro, University of Santiago de Compostela, Santiago de Compostela, Spain
5
Mirchev MB, Boeva I, Peshevska-Sekulovska M, Stoitsov V, Peruhova M. Synchronous manifestation of colorectal cancer and intraductal papillary mucinous neoplasms. World J Clin Cases 2023; 11:3408-3417. [PMID: 37383909 PMCID: PMC10294181 DOI: 10.12998/wjcc.v11.i15.3408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 02/26/2023] [Accepted: 04/17/2023] [Indexed: 05/25/2023]
Abstract
High rates of extrapancreatic malignancies, in particular colorectal cancer (CRC), have been detected in patients with intraductal papillary mucinous neoplasm (IPMN). So far, the literature offers no clear explanation for the development of secondary or synchronous malignancies in patients with IPMN. In the past few years, some data on genetic alterations shared by IPMN and other associated cancers have been published. This review examines the association between IPMN and CRC, shedding light on the most relevant genetic alterations that may explain the possible relationship between these entities. In keeping with our findings, we suggest that once IPMN is diagnosed, special consideration should be given to CRC. Presently, there are no specific guidelines regarding colorectal screening programs for patients with IPMN. We recommend that patients with IPMN be regarded as at high risk for CRC and that a more rigorous colorectal surveillance program be implemented.
Affiliation(s)
- Irina Boeva
- Department of Gastroenterology, Heart and Brain Hospital, Burgas 8000, Bulgaria
- Veselin Stoitsov
- Department of Gastroenterology, Heart and Brain Hospital, Burgas 8000, Bulgaria
- Milena Peruhova
- Department of Gastroenterology, Heart and Brain Hospital, Burgas 8000, Bulgaria
6
Ochoa-Zavala M, Osorio-Olvera L, Cerón-Souza I, Rivera-Ocasio E, Jiménez-Lobato V, Núñez-Farfán J. Reduction of Genetic Variation When Far From the Niche Centroid: Prediction for Mangrove Species. Frontiers in Conservation Science 2022. [DOI: 10.3389/fcosc.2021.795365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
The niche-centroid hypothesis states that populations distributed near the centroid of the species' ecological niche will have higher fitness-related attributes, such as population abundance and genetic diversity, than populations near the edges of the niche. Empirical evidence based on abundance and, more recently, genetic diversity data supports this hypothesis. However, few studies have tested this hypothesis in coastal species, such as mangroves. Here, we focused on the black mangrove Avicennia germinans. We combined ecological, heterozygosity, and allelic richness information from 1,419 individuals distributed in 40 populations with three main goals: (1) test the relationship between distance to the niche centroid and genetic diversity, (2) determine the set of environmental variables that best explain heterozygosity and allelic richness, and (3) predict the spatial variation in genetic diversity throughout most of the species' natural geographic range. We found a strong correlation between the distance to the niche centroid and both observed heterozygosity (Ho; ρ² = 0.67, P < 0.05) and expected heterozygosity (He; ρ² = 0.65, P < 0.05). The niche variables that best explained geographic variation in genetic diversity were soil type and precipitation seasonality. This suggests that these environmental variables influence mangrove growth and establishment, indirectly affecting standing genetic variation. We also predicted the spatial heterozygosity of A. germinans across its natural geographic range in the Americas using regression model coefficients. The predictions showed significant power to explain the observed data (R² = 0.65 for Ho; R² = 0.60 for He), even when we considered independent data sets (R² = 0.28 for Ho; R² = 0.25 for He). Using this approach, several genetic diversity estimates can be implemented, and population genomics may be leveraged to improve genetic diversity predictions. We conclude that the level of genetic diversity in A. germinans agrees with the expectations of the niche-centroid hypothesis, namely that heterozygosity and allelic richness (the basic genetic units for adaptation) are highest at locations of high environmental suitability. This shows that the approach is a potentially powerful tool for the conservation and management of this species, including for modelling changes in the face of climate change.
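The abstract relies on observed (Ho) and expected (He) heterozygosity. As a reminder of the textbook definitions, here is a short Python sketch for a single diploid locus: Ho is the fraction of sampled individuals carrying two different alleles, and He is the Hardy-Weinberg expectation 1 - Σ p_i², optionally multiplied by Nei's small-sample factor 2n/(2n - 1). The abstract does not state which estimator or software the authors used, so the function names and the toy genotypes are assumptions for illustration only.

```python
from collections import Counter
from typing import Sequence, Tuple

Genotype = Tuple[int, int]  # two allele labels (e.g. repeat counts) for one diploid individual


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Ho: fraction of individuals whose two alleles differ."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes: Sequence[Genotype], unbiased: bool = False) -> float:
    """He = 1 - sum(p_i^2) from sample allele frequencies.

    With unbiased=True, multiply by 2n / (2n - 1) (Nei's small-sample correction),
    where 2n is the number of allele copies sampled.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    he = 1.0 - sum((count / total) ** 2 for count in Counter(alleles).values())
    return he * total / (total - 1) if unbiased else he


# Toy genotypes at one hypothetical locus (not data from the study).
genotypes = [(12, 14), (12, 12), (14, 15), (12, 15)]
print(observed_heterozygosity(genotypes))                           # 0.75
print(expected_heterozygosity(genotypes))                           # 0.625
print(round(expected_heterozygosity(genotypes, unbiased=True), 3))  # 0.714
```

Population-level values such as those correlated with niche-centroid distance would typically be averages of these per-locus quantities across all genotyped loci.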
7
Novroski NMM. Exploring new short tandem repeat markers for DNA mixture deconvolution. ACTA ACUST UNITED AC 2020. [DOI: 10.1002/wfs2.1390] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Affiliation(s)
- Nicole M. M. Novroski
- Forensic Science Program, Department of Anthropology, University of Toronto Mississauga, Ontario, Canada
- Center for Human Identification, Graduate School of Biomedical Sciences, University of North Texas Health Science Center, Fort Worth, Texas, USA
8
Genovese LM, Mosca MM, Pellegrini M, Geraci F. Dot2dot: accurate whole-genome tandem repeats discovery. Bioinformatics 2019; 35:914-922. [PMID: 30165507 PMCID: PMC6419916 DOI: 10.1093/bioinformatics/bty747] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/30/2018] [Revised: 08/03/2018] [Accepted: 08/24/2018] [Indexed: 01/18/2023]
Abstract
MOTIVATION Large-scale sequencing projects have confirmed the hypothesis that eukaryotic DNA is rich in repetitions whose functional role remains to be elucidated. In particular, tandem repeats (TRs) (i.e. short, almost identical sequences that lie adjacent to each other) have been associated with many cellular processes and are also involved in several genetic disorders. The need for comprehensive lists of TRs for association studies and the absence of a computational model able to capture their variability have revived research on discovery algorithms. RESULTS Building upon the idea that sequence similarities can be easily displayed using graphical methods, we formalized the structure that TRs induce in dot-plot matrices where a sequence is compared with itself. Leveraging the observation that a compact representation of these matrices can be built and searched in linear time, we developed Dot2dot: an accurate algorithm fast enough to be suitable for whole-genome discovery of TRs. Experiments on five manually curated collections of TRs have shown that Dot2dot is more accurate than other established methods, and it completes the analysis of the biggest known reference genome in about one day on a standard PC. AVAILABILITY AND IMPLEMENTATION Source code and datasets are freely available upon paper acceptance at the URL: https://github.com/Gege7177/Dot2dot. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
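The abstract does not describe Dot2dot's compact, linear-time representation in enough detail to reproduce, so the Python sketch below illustrates only the underlying observation: in a naive O(n²) self-comparison dot plot, a tandem repeat with period p shows up as a run of matches along the diagonal offset by p from the main diagonal. The function names are hypothetical and this is not the Dot2dot algorithm.

```python
from typing import Dict, List


def self_dot_plot(seq: str) -> List[List[bool]]:
    """Naive O(n^2) dot plot of seq against itself: cell (i, j) is True if seq[i] == seq[j]."""
    return [[a == b for b in seq] for a in seq]


def diagonal_runs(plot: List[List[bool]], max_offset: int) -> Dict[int, int]:
    """For each offset p, the longest run of consecutive matches on the diagonal j = i + p.

    A perfect tandem repeat with period p and c copies yields a run of (c - 1) * p
    on the diagonal at offset p (and shorter runs at multiples of p).
    """
    n = len(plot)
    runs = {}
    for offset in range(1, min(max_offset, n - 1) + 1):
        best = run = 0
        for i in range(n - offset):
            run = run + 1 if plot[i][i + offset] else 0
            best = max(best, run)
        runs[offset] = best
    return runs


runs = diagonal_runs(self_dot_plot("TTCAGCAGCAGCAGTT"), max_offset=6)
print(runs)                     # {1: 1, 2: 0, 3: 9, 4: 0, 5: 0, 6: 6}
print(max(runs, key=runs.get))  # 3, the period of the CAG repeat
```

Because the full matrix grows quadratically with sequence length, this naive version is only practical for short sequences; avoiding that cost on whole genomes is what the compact representation described in the abstract is for.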
Affiliation(s)
- Marco M Mosca
- Department of Computer Science, University of Liverpool, Liverpool, UK
- Marco Pellegrini
- Institute for Informatics and Telematics, CNR, Pisa, Italy; Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, Pisa, Italy
- Filippo Geraci
- Institute for Informatics and Telematics, CNR, Pisa, Italy
9
Gettings KB, Borsuk LA, Zook J, Vallone PM. Unleashing novel STRs via characterization of Genome in a Bottle reference samples. Forensic Science International: Genetics Supplement Series 2019. [DOI: 10.1016/j.fsigss.2019.09.084] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/26/2023]
10
Demongeot J, Norris V. Emergence of a "Cyclosome" in a Primitive Network Capable of Building "Infinite" Proteins. Life (Basel) 2019; 9:E51. [PMID: 31216720 PMCID: PMC6617141 DOI: 10.3390/life9020051] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/01/2019] [Revised: 06/08/2019] [Accepted: 06/13/2019] [Indexed: 01/02/2023]
Abstract
We argue for the existence of an RNA sequence, called the AL (for ALpha) sequence, which may have played a role at the origin of life; this role entailed the AL sequence helping generate the first peptide assemblies via a primitive network. These peptide assemblies included "infinite" proteins. The AL sequence was constructed on an economy principle as the smallest RNA ring having one representative of each codon's synonymy class and capable of adopting a non-functional but nevertheless evolutionarily stable hairpin form that resisted denaturation due to environmental changes in pH, hydration, temperature, etc. Long subsequences from the AL ring resemble sequences from tRNAs and 5S rRNAs of numerous species like the proteobacterium, Rhodobacter sphaeroides. Pentameric subsequences from the AL are present more frequently than expected in current genomes, in particular, in genes encoding some of the proteins associated with ribosomes like tRNA synthetases. Such relics may help explain the existence of universal sequences like exon/intron frontier regions, Shine-Dalgarno sequence (present in bacterial and archaeal mRNAs), CRISPR and mitochondrial loop sequences.
Affiliation(s)
- Jacques Demongeot
- Faculty of Medicine, Université Grenoble Alpes, AGEIS EA 7407 Tools for e-Gnosis Medical, 38700 La Tronche, France.
- Vic Norris
- Laboratory of Microbiology Signals and Microenvironment, Université de Rouen, 76821 Mont-Saint-Aignan CEDEX, France.
11
Fan W, Xu L, Cheng H, Li M, Liu H, Jiang Y, Guo Y, Zhou Z, Hou S. Characterization of Duck (Anas platyrhynchos) Short Tandem Repeat Variation by Population-Scale Genome Resequencing. Front Genet 2018; 9:520. [PMID: 30425731 PMCID: PMC6218588 DOI: 10.3389/fgene.2018.00520] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/04/2018] [Accepted: 10/15/2018] [Indexed: 12/30/2022]
Abstract
Short tandem repeats (STRs) are usually associated with genetic diseases and gene regulatory functions, and are also important genetic markers for evolutionary, genetic diversity and forensic analyses. However, for the majority of STRs in the duck genome, population genetic properties and functional impacts remain poorly defined. The recent advent of next-generation sequencing (NGS) has offered an opportunity to profile large numbers of polymorphic STRs. Here, we report a population-scale analysis of STR variation using genome resequencing in mallard and Pekin duck. Our analysis provides the first genome-wide duck STR reference, including 198,022 STR loci with motif sizes of 2–6 base pairs. We observed a relatively uneven distribution of STRs across genomic regions, which indicates that the occurrence of STRs in the duck genome is not random but is subject to directional selection pressure. Using genome resequencing data from 23 mallard and 26 Pekin ducks, we identified 89,891 polymorphic STR loci. Intensive analysis of this dataset suggested that shorter repeat motifs, longer reference tract length, higher purity, and location outside coding regions are all associated with increased STR variability. STR genotypes were used for population genetic analysis, and the results showed that population structure and divergence patterns among population groups can be efficiently captured. In addition, comparison between Pekin duck and mallard identified 3,122 STRs with extremely divergent allele frequencies, which overlapped with a set of genes related to the nervous system, energy metabolism and behavior. The evolutionary analysis revealed that the genes containing divergent STRs may play important roles in phenotypic changes during duck domestication. This population-scale analysis of STR variation provides a valuable resource for future studies of genetic diversity and genome evolution in duck.
Affiliation(s)
- Wenlei Fan
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China; State Key Laboratory of Animal Nutrition, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Lingyang Xu
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Hong Cheng
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Ming Li
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Hehe Liu
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Yong Jiang
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Yuming Guo
- State Key Laboratory of Animal Nutrition, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Zhengkui Zhou
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Shuisheng Hou
- Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
12
Saini S, Mitra I, Mousavi N, Fotsing SF, Gymrek M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat Commun 2018; 9:4397. [PMID: 30353011 PMCID: PMC6199332 DOI: 10.1038/s41467-018-06694-0] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 03/09/2018] [Accepted: 09/18/2018] [Indexed: 12/14/2022]
Abstract
Short tandem repeats (STRs) are involved in dozens of Mendelian disorders and have been implicated in complex traits. However, genotyping arrays used in genome-wide association studies focus on single nucleotide polymorphisms (SNPs) and do not readily allow identification of STR associations. We leverage next-generation sequencing (NGS) from 479 families to create a SNP + STR reference haplotype panel. Our panel enables imputing STR genotypes into SNP array data when NGS is not available for directly genotyping STRs. Imputed genotypes achieve mean concordance of 97% with observed genotypes in an external dataset compared to 71% expected under a naive model. Performance varies widely across STRs, with near perfect concordance at bi-allelic STRs vs. 70% at highly polymorphic repeats. Imputation increases power over individual SNPs to detect STR associations with gene expression. Imputing STRs into existing SNP datasets will enable the first large-scale STR association studies across a range of complex traits.
Affiliation(s)
- Shubham Saini
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Ileena Mitra
- Bioinformatics and Systems Biology Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Nima Mousavi
- Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Stephanie Feupe Fotsing
- Bioinformatics and Systems Biology Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Melissa Gymrek
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
13
Kanitz R, Guillot EG, Antoniazza S, Neuenschwander S, Goudet J. Complex genetic patterns in human arise from a simple range-expansion model over continental landmasses. PLoS One 2018; 13:e0192460. [PMID: 29466398 PMCID: PMC5821356 DOI: 10.1371/journal.pone.0192460] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/18/2017] [Accepted: 01/23/2018] [Indexed: 12/21/2022]
Abstract
Although it is generally accepted that geography is a major factor shaping human genetic differentiation, it is still disputed how much of this differentiation results from a simple process of isolation by distance, and whether there are factors generating distinct clusters of genetic similarity. We address this question using a geographically explicit simulation framework coupled with an Approximate Bayesian Computation approach. Based on only six simple summary statistics, we estimated the most probable demographic parameters that shaped modern human evolution under an isolation-by-distance scenario, and found the following: an initial population in East Africa spread and grew from 4000 individuals to 5.7 million in about 132 000 years. Subsequent simulations with these estimates followed by cluster analyses produced results nearly identical to those obtained with real data. Thus, a simple diffusion model from East Africa explains a large portion of the genetic diversity patterns observed in modern humans. We argue that a model of isolation by distance along the continental landmasses might be the relevant null model to use when investigating selective effects in humans and probably many other species.
Affiliation(s)
- Ricardo Kanitz
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Elsa G. Guillot
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Samuel Neuenschwander
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Vital-IT, Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
| |
14
Phillips C. A genomic audit of newly-adopted autosomal STRs for forensic identification. Forensic Sci Int Genet 2017; 29:193-204. [DOI: 10.1016/j.fsigen.2017.04.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 01/22/2017] [Revised: 04/03/2017] [Accepted: 04/14/2017] [Indexed: 10/19/2022]
15
Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets. Proc Natl Acad Sci U S A 2017; 114:5671-5676. [PMID: 28507140 PMCID: PMC5465933 DOI: 10.1073/pnas.1619944114] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 01/02/2023]
Abstract
Combining genotypes across datasets is central to facilitating advances in genetics. Data aggregation efforts often face the challenge of record matching, the identification of dataset entries that represent the same individual. We show that records can be matched across genotype datasets that have no shared markers, based on linkage disequilibrium between loci appearing in different datasets. Using two datasets for the same 872 people, one with 642,563 genome-wide SNPs and the other with 13 short tandem repeats (STRs) used in forensic applications, we find that 90-98% of forensic STR records can be connected to corresponding SNP records and vice versa. Accuracy increases to 99-100% when ∼30 STRs are used. Our method expands the potential of data aggregation, but it also suggests privacy risks intrinsic in the maintenance of databases containing even small numbers of markers, including databases of forensic significance.
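Once each STR record has a match score against each SNP record, matching becomes an assignment problem. The paper derives its scores from linkage disequilibrium, which is not reproduced here: the sketch assumes a hypothetical precomputed score matrix and contrasts picking each STR record's best-scoring SNP record with a one-to-one assignment found by brute force.

```python
from itertools import permutations

def best_one_to_one_match(scores):
    """Return the one-to-one STR -> SNP assignment (list of SNP indices) with the highest total score.

    scores[i][j] is a hypothetical precomputed score for STR record i versus SNP record j.
    Brute force over all permutations, so only usable for a handful of records.
    """
    n = len(scores)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm)

def matching_accuracy(assigned, truth):
    """Fraction of STR records assigned to their true SNP record."""
    return sum(a == t for a, t in zip(assigned, truth)) / len(truth)

# Hypothetical scores for three people; the true match is the diagonal.
scores = [
    [2.0, 0.5, 0.1],
    [1.9, 1.6, 0.2],
    [0.0, 0.3, 1.5],
]
greedy = [max(range(3), key=lambda j: row[j]) for row in scores]  # per-record best, may reuse a SNP record
optimal = best_one_to_one_match(scores)
print(greedy, matching_accuracy(greedy, [0, 1, 2]))
print(optimal, matching_accuracy(optimal, [0, 1, 2]))
```

Here the per-record choice sends two STR records to the same SNP record, while the one-to-one constraint recovers the true pairing. Brute force grows factorially; for realistic numbers of records a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` is the usual choice.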
16
de la Puente M, Phillips C, Fondevila M, Gelabert-Besada M, Carracedo Á, Lareu MV. A forensic multiplex of nine novel pentameric-repeat STRs. Forensic Sci Int Genet 2017; 29:154-164. [PMID: 28445836 DOI: 10.1016/j.fsigen.2017.04.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/16/2016] [Revised: 03/29/2017] [Accepted: 04/11/2017] [Indexed: 12/22/2022]
Abstract
Pentameric-repeat short tandem repeats (STRs), consisting of loci with repeat units of five base pairs, have the advantage of reduced stutter products compared with their tetrameric-repeat STR counterparts. This characteristic potentially helps the interpretation of mixed DNA profiles when minor-component alleles may coincide with stutter peaks from the major components. To develop a simple but informative forensic multiplex with the capability to aid mixture interpretation, we designed an 11-plex assay of nine pentameric STRs new to forensic analysis plus two male-specific markers: DYS391 and the Y-Indel rs2032678 used in GlobalFiler™ (Life Technologies). East Asian-specific variation in the recently adopted Y-Indel rs2032678 is reported in this study for the first time in its forensic use as a sex marker. We estimated the levels of variation observed in the nine pentameric STRs in three of the major population groups sampled in the HGDP-CEPH human genome diversity panel: African, European and East Asian (combining individual populations, as their sample sizes were too small for STR allele frequency estimation); and we include genotype data from a population sample from Northwest Spain. From these data, forensic informativeness metrics were estimated for applying the nine novel STRs in identification or kinship analyses. The assay was assessed for forensic sensitivity and for its ability to successfully genotype highly degraded DNA. In the profiles from the 11-plex assay, we observed an average stutter ratio of 2.15% across all the pentameric loci, compared with 7.32% across equivalently sized tetrameric STRs in the Promega PowerPlex® ESX-17 kit.
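The stutter ratio compared above is commonly computed as the height (or area) of a stutter peak divided by that of its parent allele peak, then averaged over loci. A minimal sketch, assuming one stutter peak per allele and hypothetical peak heights in RFU rather than the study's data:

```python
from statistics import fmean

def stutter_ratio(stutter_height, allele_height):
    """Stutter peak signal divided by the parent allele peak signal."""
    if allele_height <= 0:
        raise ValueError("allele peak height must be positive")
    return stutter_height / allele_height

def mean_stutter_percent(peaks):
    """Mean stutter ratio, in percent, over (stutter_height, allele_height) pairs."""
    return 100.0 * fmean(stutter_ratio(s, a) for s, a in peaks)

# Hypothetical peak heights in RFU, one (stutter, allele) pair per locus.
pentameric = [(40, 2000), (55, 2500), (30, 1500)]
tetrameric = [(150, 2000), (180, 2500), (110, 1500)]
print(round(mean_stutter_percent(pentameric), 2), round(mean_stutter_percent(tetrameric), 2))
```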
Affiliation(s)
- M de la Puente
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
- C Phillips
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
- M Fondevila
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
- M Gelabert-Besada
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
- Á Carracedo
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain; Grupo de Medicina Xenómica (GMX), Faculty of Medicine, University of Santiago de Compostela, Spain; Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- M V Lareu
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
17
Shin G, Grimes SM, Lee H, Lau BT, Xia LC, Ji HP. CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis. Nat Commun 2017; 8:14291. [PMID: 28169275 PMCID: PMC5309709 DOI: 10.1038/ncomms14291] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 06/25/2016] [Accepted: 12/15/2016] [Indexed: 11/09/2022]
Abstract
Microsatellites are multi-allelic and composed of short tandem repeats (STRs), with individual motifs ranging from mononucleotides and dinucleotides up to hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel and provides accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR-Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single-molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput and improved accuracy, and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.
Affiliation(s)
- GiWon Shin
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
- Susan M Grimes
- Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
- Billy T Lau
- Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
- Li C Xia
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
- Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA; Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
18
Yashima AS, Innan H. varver: a database of microsatellite variation in vertebrates. Mol Ecol Resour 2016; 17:824-833. [PMID: 27796069 DOI: 10.1111/1755-0998.12625] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/19/2015] [Revised: 09/24/2016] [Accepted: 10/04/2016] [Indexed: 01/16/2023]
Abstract
Understanding how genetic variation is maintained within a species is important in ecology, evolution, conservation and population genetics. Tremendous efforts have been made to evaluate the patterns of genetic variation in natural populations of various species. For this purpose, microsatellites have played a major role since the 1990s. Here we describe a comprehensive database, varver (Variation in Vertebrates), that provides complete information regarding microsatellite variation in natural populations of vertebrates. For each species, varver includes basic information on the species, a list of publications reporting the microsatellite variation, and tables of genetic variation within and between populations (heterozygosity and FST). The geographic location and rough sampling range are also shown for each sampled population. The database should be useful for researchers interested not only in specific species but also in comparing multiple species. We discuss the utility of microsatellite data, particularly for meta-analyses that involve multiple microsatellite loci from various species. We show that in such analyses, it is extremely important to correct for biases caused by differences in mutation rate, which arise mainly from differences in repeat unit and repeat number.
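Heterozygosity and FST, the two statistics varver tabulates, can both be computed from allele frequencies. The sketch below uses expected heterozygosity (Nei's gene diversity) and Nei's G_ST, one common FST analogue for multi-allelic loci; which estimator varver actually reports is not stated in the abstract, and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Nei's gene diversity: 1 - sum of squared allele frequencies (freqs maps allele -> frequency)."""
    return 1.0 - sum(p * p for p in freqs.values())

def nei_gst(populations):
    """Nei's G_ST = (H_T - H_S) / H_T, weighting populations equally.

    H_S is the mean within-population heterozygosity; H_T is the heterozygosity
    of the pooled (averaged) allele frequencies.
    """
    k = len(populations)
    h_s = sum(expected_heterozygosity(pop) for pop in populations) / k
    alleles = set().union(*populations)  # every allele seen in any population
    pooled = {a: sum(pop.get(a, 0.0) for pop in populations) / k for a in alleles}
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0

# Hypothetical allele frequencies (keys are repeat counts) in two populations.
pop1 = {10: 0.5, 11: 0.5}
pop2 = {11: 0.5, 12: 0.5}
print(expected_heterozygosity(pop1), nei_gst([pop1, pop2]))
```

Because H_T cannot exceed 1, G_ST cannot exceed 1 - H_S, so highly heterozygous (often fast-mutating) microsatellites yield low G_ST values even between well-separated populations; this is one concrete way mutation-rate differences can bias cross-locus comparisons.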
Affiliation(s)
- Akiko Sato Yashima
- Department of Evolutionary Studies of Biosystems, Graduate University for Advanced Studies (SOKENDAI), Hayama, Kanagawa, 240-0193, Japan; Department of Mathematical Engineering, Musashino University, 3-3-3 Ariake, Koto-ku, Tokyo, 135-8181, Japan
- Hideki Innan
- Department of Evolutionary Studies of Biosystems, Graduate University for Advanced Studies (SOKENDAI), Hayama, Kanagawa, 240-0193, Japan
19
Algee-Hewitt B, Edge M, Kim J, Li J, Rosenberg N. Individual Identifiability Predicts Population Identifiability in Forensic Microsatellite Markers. Curr Biol 2016; 26:935-42. [DOI: 10.1016/j.cub.2016.01.065] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 08/28/2015] [Revised: 12/10/2015] [Accepted: 01/26/2016] [Indexed: 10/22/2022]
20
Phillips C, Parson W, Amigo J, King JL, Coble MD, Steffen CR, Vallone PM, Gettings KB, Butler JM, Budowle B. D5S2500 is an ambiguously characterized STR: Identification and description of forensic microsatellites in the genomics age. Forensic Sci Int Genet 2016; 23:19-24. [PMID: 26974236 DOI: 10.1016/j.fsigen.2016.03.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 01/30/2016] [Revised: 03/04/2016] [Accepted: 03/06/2016] [Indexed: 12/18/2022]
Abstract
In the process of establishing short tandem repeat (STR) sequence variant nomenclature guidelines in anticipation of expanded forensic multiplexes for massively parallel sequencing (MPS), it was discovered that the STR D5S2500 has multiple positions and genomic characteristics reported. This ambiguity is because the marker named D5S2500 consists of two different microsatellites forming separate components in the capillary electrophoresis multiplexes of Qiagen's HDplex (Hilden, Germany) and AGCU ScienTech's non-CODIS STR 21plex (Wuxi, Jiangsu, China). This study outlines the genomic details used to identify each microsatellite and reveals the D5S2500 marker in HDplex has the correctly assigned STR name, while the D5S2500 marker in the AGCU 21plex, closely positioned a further 1643 nucleotides in the human reference sequence, is an unnamed microsatellite. The fact that the D5S2500 marker has existed as two distinct STR loci undetected for almost ten years, even with reported discordant genotypes for the standard control DNA, underlines the need for careful scrutiny of the genomic properties of forensic STRs, as they become adapted for sequence analysis with MPS systems. We make the recommendation that precise chromosome location data must be reported for any forensic marker under development but not in common use, so that the genomic characteristics of the locus are validated to the same level of accuracy as its allelic variation and forensic performance. To clearly differentiate each microsatellite, we propose the name D5S2800 be used to identify the Chromosome-5 STR in the AGCU 21plex.
Affiliation(s)
- C Phillips
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Santiago de Compostela, Spain
- W Parson
- Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria; Forensic Science Program, The Pennsylvania State University, PA, USA
- J Amigo
- Galician Public Foundation in Genomics Medicine (FPGMX), Santiago de Compostela, Spain
- J L King
- Institute of Applied Genetics, Department of Molecular and Medical Genetics, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
- M D Coble
- U.S. National Institute of Standards and Technology, Applied Genetics Group, Biomolecular Measurement Division, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- C R Steffen
- U.S. National Institute of Standards and Technology, Applied Genetics Group, Biomolecular Measurement Division, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- P M Vallone
- U.S. National Institute of Standards and Technology, Applied Genetics Group, Biomolecular Measurement Division, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- K B Gettings
- U.S. National Institute of Standards and Technology, Applied Genetics Group, Biomolecular Measurement Division, 100 Bureau Drive, Gaithersburg, MD 20899, USA
- J M Butler
- U.S. National Institute of Standards and Technology, Applied Genetics Group, Biomolecular Measurement Division, 100 Bureau Drive, Gaithersburg, MD 20899, USA; U.S. National Institute of Standards and Technology, Special Programs Office, 100 Bureau Drive, Mail Stop 4701, Gaithersburg, MD 20899, USA
- B Budowle
- Institute of Applied Genetics, Department of Molecular and Medical Genetics, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
21
Chatterjee A, Basu A, Chowdhury A, Das K, Sarkar-Roy N, Majumder PP, Basu P. Comparative analyses of genetic risk prediction methods reveal extreme diversity of genetic predisposition to nonalcoholic fatty liver disease (NAFLD) among ethnic populations of India. J Genet 2016; 94:105-13. [PMID: 25846882 DOI: 10.1007/s12041-015-0494-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
Nonalcoholic fatty liver disease (NAFLD) is a distinct pathologic condition characterized by a disease spectrum ranging from simple steatosis to steatohepatitis, cirrhosis and hepatocellular carcinoma. Prevalence of NAFLD varies across ethnic groups, ranging from 12% in Chinese to 45% in Hispanics. Among Indian populations, the diversity in prevalence is high, ranging from 9% in rural populations to 32% in urban populations, with geographic differences as well. Here, we wished to find out whether this difference is reflected in their genetic makeup. To date, several candidate-gene studies and a few genome-wide association studies (GWAS) have been carried out, and many associations between single nucleotide polymorphisms (SNPs) and NAFLD have been observed. In this study, the risk allele frequencies (RAFs) of NAFLD-associated SNPs in 20 Indian ethnic populations (376 individuals) were analysed. We used two different measures for calculating genetic risk scores and compared their performance. The correlation of additive risk scores of NAFLD for three HapMap populations with their weighted mean prevalence was found to be high (R² = 0.93). We then used this method to compare NAFLD risk among ethnic Indian populations. Based on our observations, Indian caste populations have high risk scores compared with Caucasians, who are often used as a surrogate for, and considered similar to, Indian caste populations in disease-gene association studies; their risk scores are also significantly higher than those of Indian tribal populations.
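The abstract compares two genetic risk score measures without giving their formulas. As an assumption, the sketch below shows two common population-level forms, an unweighted score (expected risk-allele count, 2 × RAF summed over SNPs) and a score weighted by each SNP's log odds ratio, together with the squared Pearson correlation used to relate scores to prevalence. All inputs are hypothetical.

```python
import math

def additive_risk_score(rafs):
    """Unweighted score: expected risk-allele count per diploid genome, 2 * RAF summed over SNPs."""
    return sum(2.0 * p for p in rafs)

def weighted_risk_score(rafs, odds_ratios):
    """Weighted score: each SNP's expected risk-allele count scaled by ln(odds ratio)."""
    if len(rafs) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP")
    return sum(2.0 * p * math.log(o) for p, o in zip(rafs, odds_ratios))

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences (e.g. scores vs prevalence)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Hypothetical inputs: risk allele frequencies of three SNPs in one population and their odds ratios.
rafs = [0.2, 0.5, 0.1]
odds = [1.5, 1.2, 2.0]
print(additive_risk_score(rafs), weighted_risk_score(rafs, odds))
```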
Affiliation(s)
- Ankita Chatterjee
- National Institute of Biomedical Genomics, Netaji Subhas Sanatorium (T. B. Hospital), Kalyani 741 251, India.
22
Alves I, Arenas M, Currat M, Sramkova Hanulova A, Sousa VC, Ray N, Excoffier L. Long-Distance Dispersal Shaped Patterns of Human Genetic Diversity in Eurasia. Mol Biol Evol 2015; 33:946-58. [PMID: 26637555 PMCID: PMC4776706 DOI: 10.1093/molbev/msv332] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 12/11/2022]
Abstract
Most previous attempts at reconstructing the history of human populations either did not explicitly take geography into account or considered very simple scenarios of migration and ignored environmental information. However, it is likely that the last glacial maximum (LGM) affected the demography and the range of many species, including our own. Moreover, long-distance dispersal (LDD) may have been an important component of human migrations, allowing fast colonization of new territories and preserving high levels of genetic diversity. Here, we use a high-quality microsatellite data set genotyped in 22 populations to estimate the posterior probabilities of several scenarios for the settlement of the Old World by modern humans. We considered models ranging from a simple spatial expansion to others including LDD and an LGM-induced range contraction, as well as Neolithic demographic expansions. We find that scenarios with LDD are much better supported by the data than models without LDD. Nevertheless, we show evidence that LDD events into empty habitats were largely prevented during the settlement of Eurasia. This unexpected absence of LDD ahead of the colonization wave front could have been caused by an Allee effect, due either to intrinsic causes, such as inbreeding depression built up during the expansion, or to extrinsic causes, such as direct competition with archaic humans. Overall, our results suggest only a relatively limited effect of the LGM contraction on current patterns of human diversity. This is in clear contrast with the major role of LDD migrations, which have potentially contributed to the intermingled genetic structure of Eurasian populations.
Affiliation(s)
- Isabel Alves
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Population and Conservation Genetics Group, Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Miguel Arenas
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Mathias Currat
- Anthropology, Genetics and Peopling History Lab, Department of Genetics & Evolution-Anthropology Unit, University of Geneva, Geneva, Switzerland
- Anna Sramkova Hanulova
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Vitor C Sousa
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nicolas Ray
- EnviroSPACE Lab, Institute for Environmental Sciences, University of Geneva, Geneva, Switzerland
- Laurent Excoffier
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
23
Zauza-Carrasco M, Rosenfeld-Mann F, Estrada-Juárez H. Analysis of variation in the number of repeats of 5 ancestral markers in recurrent donors in Mexico [in Spanish]. Perinatología y Reproducción Humana 2015. [DOI: 10.1016/j.rprh.2015.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
25
Bilgin Sonay T, Carvalho T, Robinson MD, Greminger MP, Krützen M, Comas D, Highnam G, Mittelman D, Sharp A, Marques-Bonet T, Wagner A. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res 2015; 25:1591-9. [PMID: 26290536 PMCID: PMC4617956 DOI: 10.1101/gr.190868.115] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 02/16/2015] [Accepted: 08/14/2015] [Indexed: 12/20/2022]
Abstract
Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly. They are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics. It has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation in humans and closely related species have been scarce due to technical difficulties arising from short-read technology. Here we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, in a total of six different species, and studied their impact on gene expression evolution. We found that population diversity patterns can be efficiently captured with short TRs (repeat unit length, 1–5 bp). We examined the potential evolutionary role of TRs in gene expression differences between humans and nonhuman primates by using 30,275 larger TRs (repeat unit length, 2–50 bp). Genes that contained TRs in their promoters, 3′ untranslated regions, introns or exons had higher expression divergence than genes without repeats in these regions. Genes with polymorphic small repeats (1–5 bp) in their promoters also had higher expression divergence than genes with fixed or no TRs in their promoters. Our findings highlight the potential contribution of TRs to human evolution through gene regulation.
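To make "short TRs (repeat unit length, 1–5 bp)" concrete, the sketch below scans a sequence for perfect tandem repeats with units of 1 to 5 bp and at least three consecutive copies. It is an illustration only, not the method used in the study; real TR callers also tolerate impure repeats, and the `max_unit` and `min_copies` thresholds here are arbitrary assumptions.

```python
from typing import NamedTuple

class TandemRepeat(NamedTuple):
    start: int   # 0-based position of the first base
    motif: str   # repeat unit
    copies: int  # number of complete, consecutive copies

def find_perfect_repeats(seq, max_unit=5, min_copies=3):
    """Greedy left-to-right scan for perfect tandem repeats of unit length 1..max_unit."""
    seq = seq.upper()
    repeats = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(1, max_unit + 1):
            motif = seq[i:i + k]
            if len(motif) < k:  # not enough sequence left for this unit length
                break
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies:
                found = TandemRepeat(i, motif, copies)
                break  # shortest unit wins, so reported motifs are primitive
        if found is not None:
            repeats.append(found)
            i += found.copies * len(found.motif)  # skip past the repeat
        else:
            i += 1
    return repeats

print(find_perfect_repeats("ggcacacacatt"))
```

Trying the shortest unit first means a run such as ATATAT is reported as (AT)×3 rather than under a longer, non-primitive unit.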
Affiliation(s)
- Tugce Bilgin Sonay
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tiago Carvalho
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Mark D Robinson
- The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Maja P Greminger
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
- Michael Krützen
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
- David Comas
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Gareth Highnam
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
- David Mittelman
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
- Andrew Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai School, New York, New York 10029, USA
- Tomàs Marques-Bonet
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain; Centro Nacional de Análisis Genómico (CNAG), PCB, Barcelona, 08028 Catalonia, Spain; Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; The Santa Fe Institute, Santa Fe, New Mexico 87501, USA
26
Vongpaisarnsin K, Listman JB, Malison RT, Gelernter J. Ancestry informative markers for distinguishing between Thai populations based on genome-wide association datasets. Leg Med (Tokyo) 2015; 17:245-50. [PMID: 25759192 DOI: 10.1016/j.legalmed.2015.02.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/16/2014] [Revised: 02/16/2015] [Accepted: 02/19/2015] [Indexed: 11/25/2022]
Abstract
The main purpose of this work was to identify a set of ancestry informative markers (AIMs) that stratify the genetic structure and diversity of the Thai population, using data from a high-throughput autosomal genome-wide association study. In this study, more than one million SNPs from the international HapMap database and the Thai depression genome-wide association study were examined to identify AIMs that distinguish between Thai populations. An efficient strategy is proposed to identify and characterize such SNPs and to test high-resolution SNP data from international HapMap populations. The best AIMs were identified to stratify the population and to infer genetic ancestry structure. A total of 124 AIMs showed clear geographic clustering across the continent, whereas only 89 AIMs stratified the Thai population from East Asian populations. Finally, a set of 273 AIMs was able to distinguish northern from southern Thai subpopulations. These markers will be of particular value in identifying ethnic origins in regions where matching by self-report is unavailable or unreliable, as usually occurs in real forensic cases.
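A common way to screen SNPs as AIMs is to rank them by the absolute allele-frequency difference (often called delta) between the populations to be separated. The abstract does not say which criterion the authors' strategy used, so treat this as a generic sketch; the frequencies and SNP identifiers (`snp_a` and so on) are hypothetical placeholders, not real rsIDs.

```python
def delta(p1, p2):
    """Absolute allele-frequency difference between two populations at one SNP."""
    return abs(p1 - p2)

def top_aims(freqs_a, freqs_b, n):
    """Rank SNPs typed in both populations by delta (largest first) and keep the top n.

    freqs_a and freqs_b map a SNP identifier to the frequency of the same allele in each population.
    Ties are broken by identifier so the output is deterministic.
    """
    shared = freqs_a.keys() & freqs_b.keys()  # only SNPs typed in both populations
    ranked = sorted(shared, key=lambda snp: (-delta(freqs_a[snp], freqs_b[snp]), snp))
    return ranked[:n]

# Hypothetical allele frequencies in two subpopulations.
north = {"snp_a": 0.90, "snp_b": 0.50, "snp_c": 0.20, "snp_d": 0.70}
south = {"snp_a": 0.30, "snp_b": 0.45, "snp_c": 0.60}
print(top_aims(north, south, 2))
```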
Affiliation(s)
- Kornkiat Vongpaisarnsin
- Department of Forensic Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Robert T Malison
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven Campus, West Haven, CT, USA
- Joel Gelernter
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven Campus, West Haven, CT, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT, USA; Department of Neurobiology, Yale University School of Medicine, New Haven, CT, USA
27
Kwong M, Pemberton TJ. Sequence differences at orthologous microsatellites inflate estimates of human-chimpanzee differentiation. BMC Genomics 2014; 15:990. [PMID: 25407736 PMCID: PMC4253012 DOI: 10.1186/1471-2164-15-990] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/02/2014] [Accepted: 10/30/2014] [Indexed: 02/06/2023]
Abstract
Background Microsatellites---contiguous arrays of 2–6 base-pair motifs---have formed the cornerstone of population-genetic studies for over two decades. Their genotype data typically takes the form of PCR fragment lengths obtained using locus-specific primer pairs to amplify the genomic region encompassing the microsatellite. Recently, we reported a dataset of 5,795 human and 84 chimpanzee individuals with genotypes at 246 human-derived autosomal microsatellites as a resource to facilitate interspecies comparisons. A major assumption underlying this dataset is that PCR amplicons at orthologous microsatellites are commensurable between species. Results We find this assumption to be frequently incorrect owing to discordance in microsatellite organization and variability, as well as nontrivial length imbalances caused by small species-specific indels in microsatellite flanking sequences. Converting PCR fragment lengths into the repeat numbers they represent at 138 microsatellites whose organization and variability was found to be highly similar in both species, we show that interspecies incommensurability among PCR amplicons can inflate FST and DPS estimates by up to 10.6%. Separate investigations of determinants of microsatellite variability in humans and chimpanzees uncover similar patterns with mean and maximum numbers of repeats, as well as numbers and ranges of distinct alleles, all important factors in predicting heterozygosity. In contrast, across microsatellites, numbers of repeats were significantly smaller in chimpanzees than in humans, while numbers and ranges of distinct alleles were instead larger. Conclusions Our findings have fundamental implications for interspecies comparisons using microsatellites and offer new opportunities for more accurate comparisons of patterns of human and chimpanzee genetic variation in numerous areas of application. 
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-990) contains supplementary material, which is available to authorized users.
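The abstract's central point, that a PCR fragment length maps to a repeat number only once the flanking sequence is known, can be illustrated with a minimal Python sketch. It assumes a perfect repeat and a known combined flank length; the function name and the 100-bp and 96-bp flank lengths are hypothetical values, not data from the study.

```python
def repeats_from_amplicon(fragment_len_bp: int, flank_len_bp: int, motif_len_bp: int) -> float:
    """Convert a PCR fragment length into the repeat count it implies.

    Hypothetical helper. Assumes amplicon length = combined flanking length
    (primers plus sequence up to the repeat) + a perfect repeat tract.
    """
    if motif_len_bp <= 0:
        raise ValueError("motif length must be positive")
    tract_bp = fragment_len_bp - flank_len_bp
    if tract_bp < 0:
        raise ValueError("fragment is shorter than the combined flanks")
    return tract_bp / motif_len_bp


# The same 12-repeat tetranucleotide tract in both species. The flank is assumed
# to be 100 bp in one species; a 4-bp flank deletion makes it 96 bp in the other.
human = repeats_from_amplicon(148, 100, 4)
chimp_true = repeats_from_amplicon(144, 96, 4)
chimp_naive = repeats_from_amplicon(144, 100, 4)  # wrongly reuses the human flank length
print(human, chimp_true, chimp_naive)  # 12.0 12.0 11.0
```

A flank indel whose length is a multiple of the motif length yields a plausible but wrong repeat count, so the error cannot be detected from fragment lengths alone.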
Affiliation(s)
- Trevor J Pemberton
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, Canada.
28
Putman AI, Carbone I. Challenges in analysis and interpretation of microsatellite data for population genetic studies. Ecol Evol 2014; 4:4399-428. [PMID: 25540699 PMCID: PMC4267876 DOI: 10.1002/ece3.1305] [Citation(s) in RCA: 204] [Impact Index Per Article: 18.5] [Received: 05/02/2014] [Revised: 10/02/2014] [Accepted: 10/03/2014] [Indexed: 12/14/2022]
Abstract
Advancing technologies have facilitated the ever-widening application of genetic markers such as microsatellites into new systems and research questions in biology. In light of the data and experience accumulated from several years of using microsatellites, we present here a literature review that synthesizes the limitations of microsatellites in population genetic studies. With a focus on population structure, we review the widely used fixation (FST) statistics and Bayesian clustering algorithms and find that the former can be confusing and problematic for microsatellites and that the latter may be confounded by complex population models and lack power in certain cases. Clustering, multivariate analyses, and diversity-based statistics are increasingly being applied to infer population structure, but in some instances these methods lack formalization with microsatellites. Migration-specific methods perform well only under narrow constraints. We also examine the use of microsatellites for inferring effective population size, changes in population size, and deeper demographic history, and find that these methods are untested and/or highly context-dependent. Overall, each method possesses important weaknesses for use with microsatellites, and there are significant constraints on inferences commonly made using microsatellite markers in the areas of population structure, admixture, and effective population size. To ameliorate and better understand these constraints, researchers are encouraged to analyze simulated datasets both prior to and following data collection and analysis, the latter of which is formalized within the approximate Bayesian computation framework. We also examine trends in the literature and show that microsatellites continue to be widely used, especially in non-human subject areas.
This review assists with study design and molecular marker selection, facilitates sound interpretation of microsatellite data while fostering respect for their practical limitations, and identifies lessons that could be applied toward emerging markers and high-throughput technologies in population genetics.
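To make the FST concern concrete, the sketch below computes Nei's GST = (HT - HS)/HT for a single locus, weighting subpopulations equally. This is a textbook FST analogue chosen for illustration. It is not claimed to be an estimator the review recommends, and the function names are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity 1 - sum(p_i^2) for one population at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs: Sequence[Sequence[float]]) -> float:
    """Nei's G_ST for one locus; each row holds one population's allele frequencies.

    All rows must list the same alleles in the same order; populations are
    weighted equally (a simplifying assumption).
    """
    k = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    pooled = [sum(f[a] for f in pop_freqs) / k for a in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    if h_t == 0.0:
        return 0.0  # monomorphic locus: nothing to differentiate
    return (h_t - h_s) / h_t


# Two populations that share no alleles at all, at a 2-allele and a 4-allele locus
biallelic = nei_gst([[1.0, 0.0], [0.0, 1.0]])
multiallelic = nei_gst([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
print(biallelic, multiallelic)
```

Both pairs of populations are completely differentiated, yet the four-allele locus gives GST of about 0.33, because high within-population heterozygosity (HS) limits how large the statistic can get. This is one reason FST-type values are hard to interpret for highly polymorphic microsatellites.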
Affiliation(s)
- Alexander I Putman
- Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina, 27695-7616
- Ignazio Carbone
- Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina, 27695-7616
29
Willems T, Gymrek M, Highnam G, Mittelman D, Erlich Y. The landscape of human STR variation. Genome Res 2014; 24:1894-904. [PMID: 25135957 PMCID: PMC4216929 DOI: 10.1101/gr.177774.114] [Citation(s) in RCA: 187] [Impact Index Per Article: 17.0] [Received: 04/30/2014] [Accepted: 08/15/2014] [Indexed: 02/06/2023]
Abstract
Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.
Affiliation(s)
- Thomas Willems
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Computational and Systems Biology Program, MIT, Cambridge, Massachusetts 02139, USA
- Melissa Gymrek
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Gareth Highnam
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
- David Mittelman
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA; Gene by Gene, Ltd., Houston, Texas 77008, USA
- Yaniv Erlich
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA;
30
Pemberton TJ, Rosenberg NA. Population-genetic influences on genomic estimates of the inbreeding coefficient: a global perspective. Hum Hered 2014; 77:37-48. [PMID: 25060268 DOI: 10.1159/000362878] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 01/28/2023]
Abstract
BACKGROUND/AIMS: Culturally driven marital practices provide a key instance of an interaction between social and genetic processes in shaping patterns of human genetic variation, producing, for example, increased identity by descent through consanguineous marriage. A commonly used measure to quantify identity by descent in an individual is the inbreeding coefficient, a quantity that reflects not only consanguinity, but also other aspects of kinship in the population to which the individual belongs. Here, in populations worldwide, we examine the relationship between genomic estimates of the inbreeding coefficient and population patterns in genetic variation. METHODS: Using genotypes at 645 microsatellites, we compare inbreeding coefficients from 5,043 individuals representing 237 populations worldwide to demographic consanguinity frequency estimates available for 26 populations as well as to other quantities that can illuminate population-genetic influences on inbreeding coefficients. RESULTS: We observe higher inbreeding coefficient estimates in populations and geographic regions with known high levels of consanguinity or genetic isolation and in populations with an increased effect of genetic drift and decreased genetic diversity with increasing distance from Africa. For the small number of populations with specific consanguinity estimates, we find a correlation between inbreeding coefficients and consanguinity frequency (r = 0.349, p = 0.040). CONCLUSIONS: The results emphasize the importance of both consanguinity and population-genetic factors in influencing variation in inbreeding coefficients, and they provide insight into factors useful for assessing the effect of consanguinity on genomic patterns in different populations.
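As a rough illustration of a genomic inbreeding coefficient, the sketch below uses a method-of-moments estimator, F = (O - E)/(L - E), where O is the number of loci at which an individual is homozygous, E is the number expected from population allele frequencies, and L is the number of loci typed. This estimator is similar in form to the one reported by PLINK's --het option. It is an assumption for illustration, not necessarily the estimator used in the study, and the function name and allele frequencies are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def moment_inbreeding_f(
    genotypes: Sequence[Tuple[int, int]],
    locus_freqs: Sequence[Dict[int, float]],
) -> float:
    """Method-of-moments inbreeding coefficient for one individual.

    genotypes[i] holds the two alleles (e.g. repeat counts) at locus i;
    locus_freqs[i] maps each allele at locus i to its population frequency.
    """
    if len(genotypes) != len(locus_freqs):
        raise ValueError("one frequency table is needed per locus")
    n_loci = len(genotypes)
    observed_hom = sum(1 for a, b in genotypes if a == b)
    expected_hom = sum(sum(p * p for p in f.values()) for f in locus_freqs)
    return (observed_hom - expected_hom) / (n_loci - expected_hom)


# Four hypothetical microsatellites, each with four equally common alleles,
# so each locus contributes 0.25 expected homozygosity (E = 1.0 in total).
freqs = [{10: 0.25, 11: 0.25, 12: 0.25, 13: 0.25}] * 4
outbred = [(10, 11), (12, 13), (11, 11), (10, 12)]  # homozygous at 1 locus
inbred = [(10, 10), (12, 12), (11, 11), (10, 12)]   # homozygous at 3 loci
print(moment_inbreeding_f(outbred, freqs), moment_inbreeding_f(inbred, freqs))
```

Because E depends on the reference allele frequencies, the same genotypes can give different F values depending on the population the frequencies are drawn from. This is one route by which population-genetic factors, and not only consanguinity, enter such estimates.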
Affiliation(s)
- Trevor J Pemberton
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Man., Canada
31
Haasl RJ, Payseur BA. Remarkable selective constraints on exonic dinucleotide repeats. Evolution 2014; 68:2737-44. [PMID: 24899386 DOI: 10.1111/evo.12460] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/07/2014] [Accepted: 05/14/2014] [Indexed: 01/07/2023]
Abstract
Long dinucleotide repeats found in exons present a substantial mutational hazard: mutations at these loci occur often and generate frameshifts. Here, we provide clear and compelling evidence that exonic dinucleotides experience strong selective constraint. In humans, only 18 exonic dinucleotides have repeat lengths greater than six, which contrasts sharply with the genome-wide distribution of dinucleotides. We genotyped each of these dinucleotides in 200 humans from eight 1000 Genomes Project populations and found a near-absence of polymorphism. More remarkably, divergence data demonstrate that repeat lengths have been conserved across the primate phylogeny in spite of what is likely considerable mutational pressure. Coalescent simulations show that even a very low mutation rate at these loci fails to explain the anomalous patterns of polymorphism and divergence. Our data support two related selective constraints on the evolution of exonic dinucleotides: a short-term intolerance for any change to repeat length and a long-term prevention of increases to repeat length. In general, our results implicate purifying selection as the force that eliminates new, deleterious mutants at exonic dinucleotides. We briefly discuss the evolution of the longest exonic dinucleotide in the human genome, a 10 × CA repeat in fibroblast growth factor receptor-like 1 (FGFRL1), which should possess a considerably greater mutation rate than any other exonic dinucleotide and therefore generate a large number of deleterious variants.
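The screen described above, finding exonic dinucleotides with repeat lengths greater than six, reduces to measuring the longest perfect run of a motif in a sequence. The Python sketch below does this in every phase of the motif. It assumes perfect repeats only; the function name, the toy exon and the threshold check are illustrative and do not reproduce the authors' pipeline or the real FGFRL1 sequence.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Length, in motif copies, of the longest perfect tandem run of `motif` in `seq`."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best = 0
    for phase in range(k):  # a run can start at any offset modulo k
        run = 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an imperfect copy breaks the run
    return best


# Hypothetical exon fragment with a 10 x CA tract (not the real FGFRL1 sequence)
exon = "GGT" + "CA" * 10 + "TTG"
n = longest_tandem_run(exon, "CA")
print(n, n > 6)  # 10 True
```

Scanning each phase separately keeps the search linear in sequence length while still catching runs that start at odd or even positions.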
Affiliation(s)
- Ryan J Haasl
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, 53706.
32
Haberstick BC, Smolen A, Stetler GL, Tabor JW, Roy T, Rick Casey H, Pardo A, Roy F, Ryals LA, Hewitt C, Whitsel EA, Halpern CT, Killeya-Jones LA, Lessem JM, Hewitt JK, Harris KM. Simple sequence repeats in the national longitudinal study of adolescent health: an ethnically diverse resource for genetic analysis of health and behavior. Behav Genet 2014; 44:487-97. [PMID: 24890516 DOI: 10.1007/s10519-014-9662-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/09/2013] [Accepted: 05/08/2014] [Indexed: 12/16/2022]
Abstract
Simple sequence repeats (SSRs) are among the earliest forms of genetic variation available for analysis and have been used in studies of neurological, behavioral, and health phenotypes. Although findings from these studies have been suggestive, their interpretation has been complicated by a variety of factors including, among others, limited power due to small sample sizes. The current report details the availability, diversity, and allele and genotype frequencies of six commonly examined SSRs in the ethnically diverse, population-based National Longitudinal Study of Adolescent Health. A total of 106,743 genotypes were generated across 15,140 participants, covering four microsatellites and two dinucleotide repeats in three dopamine genes (DAT1, DRD4, DRD5), the serotonin transporter, and monoamine oxidase A. Allele and genotype frequencies showed a complex pattern and differed significantly between populations. For both dinucleotide repeats we observed a greater allelic diversity than previously reported. The availability of these six SSRs in a large, ethnically diverse sample with extensive environmental measures assessed longitudinally offers a unique resource for researchers interested in health and behavior.
Affiliation(s)
- Brett C Haberstick
- Institute for Behavioral Genetics, University of Colorado Boulder, Campus Box 447, Boulder, CO, 80309-0447, USA
33
Corona E, Chen R, Sikora M, Morgan AA, Patel CJ, Ramesh A, Bustamante CD, Butte AJ. Analysis of the genetic basis of disease in the context of worldwide human relationships and migration. PLoS Genet 2013; 9:e1003447. [PMID: 23717210 PMCID: PMC3662561 DOI: 10.1371/journal.pgen.1003447] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Received: 04/24/2012] [Accepted: 02/28/2013] [Indexed: 12/21/2022]
Abstract
Genetic diversity across different human populations can enhance understanding of the genetic basis of disease. We calculated the genetic risk of 102 diseases in 1,043 unrelated individuals across 51 populations of the Human Genome Diversity Panel. We found that genetic risk for type 2 diabetes and pancreatic cancer decreased as humans migrated toward East Asia. In addition, biliary liver cirrhosis, alopecia areata, bladder cancer, inflammatory bowel disease, membranous nephropathy, systemic lupus erythematosus, systemic sclerosis, ulcerative colitis, and vitiligo have undergone genetic risk differentiation. This analysis represents a large-scale attempt to characterize genetic risk differentiation in the context of migration. We anticipate that our findings will enable detailed analysis pertaining to the driving forces behind genetic risk differentiation. The environment humans inhabit has changed many times in the last 100,000 years. Migration and dynamic local environments can lead to genetic adaptations favoring beneficial traits. Many genes responsible for these adaptations can alter disease susceptibility. Genes can also affect disease susceptibility by varying randomly across different populations. We have studied genetic variants that are known to modify disease susceptibility in the context of worldwide migration. We found that variants associated with 11 diseases have been affected to an extent that is not explained by random variation. We also found that the genetic risk of type 2 diabetes has steadily decreased along the worldwide human migration trajectory from Africa to America.
Affiliation(s)
- Erik Corona
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Rong Chen
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Martin Sikora
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Alexander A. Morgan
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America
- Chirag J. Patel
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Aditya Ramesh
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Carlos D. Bustamante
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Atul J. Butte
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
34
Population structure in a comprehensive genomic data set on human microsatellite variation. G3-GENES GENOMES GENETICS 2013; 3:891-907. [PMID: 23550135 PMCID: PMC3656735 DOI: 10.1534/g3.113.005728] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Indexed: 01/09/2023]
Abstract
Over the past two decades, microsatellite genotypes have provided the data for landmark studies of human population-genetic variation. However, the various microsatellite data sets have been prepared with different procedures and sets of markers, so that it has been difficult to synthesize available data for a comprehensive analysis. Here, we combine eight human population-genetic data sets at the 645 microsatellite loci they share in common, accounting for procedural differences in the production of the different data sets, to assemble a single data set containing 5795 individuals from 267 worldwide populations. We perform a systematic analysis of genetic relatedness, detecting 240 intra-population and 92 inter-population pairs of previously unidentified close relatives and proposing standardized subsets of unrelated individuals for use in future studies. We then augment the human data with a data set of 84 chimpanzees at the 246 loci they share in common with the human samples. Multidimensional scaling and neighbor-joining analyses of these data sets offer new insights into the structure of human populations and enable a comparison of genetic variation patterns in chimpanzees with those in humans. Our combined data sets are the largest of their kind reported to date and provide a resource for use in human population-genetic studies.
35
Abstract
The forensic genetics field is generating extensive population data on polymorphism of short tandem repeat (STR) markers in globally distributed samples. In this study we explored and quantified the informative power of these datasets to address issues related to human evolution and diversity, using two online resources: an allele frequency dataset representing 141 populations and almost 26 thousand individuals, and a genotype dataset of 42 populations and more than 11 thousand individuals. We show that the genetic relationships between populations based on forensic STRs are best explained by geography, as observed when analysing other worldwide datasets generated specifically to study human diversity. However, the global level of genetic differentiation between populations (as measured by a fixation index) is about half the value estimated with those other datasets, which contain many more markers but far fewer individuals. We suggest that the main factor explaining this difference is an ascertainment bias in forensic data resulting from the choice of markers for individual identification. We show that this choice results in a low average variance of heterozygosity across world regions, and hence in low differentiation among populations. Thus, the forensic genetic markers currently produced for individual assignment and identification allow the detection of the patterns of neutral genetic structure that characterize the human population, but they underestimate the levels of this genetic structure compared with datasets of STRs (or other kinds of markers) generated specifically to study the diversity of human populations.
Affiliation(s)
- Nuno M. Silva
- IPATIMUP (Instituto de Patologia e Imunologia Molecular da Universidade do Porto), Universidade do Porto, Porto, Portugal
- Luísa Pereira
- IPATIMUP (Instituto de Patologia e Imunologia Molecular da Universidade do Porto), Universidade do Porto, Porto, Portugal
- Faculdade de Medicina, Universidade do Porto, Porto, Portugal
- Estella S. Poloni
- Laboratory of Anthropology, Genetics and Peopling History, Department of Genetics and Evolution - Anthropology Unit, University of Geneva, Geneva, Switzerland
- Mathias Currat
- Laboratory of Anthropology, Genetics and Peopling History, Department of Genetics and Evolution - Anthropology Unit, University of Geneva, Geneva, Switzerland
36
Abstract
Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in personal genomes. lobSTR harnesses concepts from signal processing and statistical learning to avoid gapped alignment and to address the specific noise patterns in STR calling. The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling. We validated lobSTR's accuracy by measuring its consistency in calling STRs from whole-genome sequencing of two biological replicates from the same individual, by tracing Mendelian inheritance patterns in STR alleles in whole-genome sequencing of a HapMap trio, and by comparing lobSTR results to traditional molecular techniques. Encouraged by the speed and accuracy of lobSTR, we used the algorithm to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome. We traced the mutation dynamics of close to 100,000 STR loci and observed more than 50,000 STR variations in a single genome. lobSTR's implementation is an end-to-end solution. The package accepts raw sequencing reads and provides the user with the genotyping results. It is written in C/C++, includes multi-threading capabilities, and is compatible with the BAM format.
Affiliation(s)
- Melissa Gymrek
- Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
37
Detecting and Removing Ascertainment Bias in Microsatellites from the HGDP-CEPH Panel. G3-GENES GENOMES GENETICS 2011; 1:479-88. [PMID: 22384358 PMCID: PMC3276161 DOI: 10.1534/g3.111.001016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/29/2011] [Accepted: 09/20/2011] [Indexed: 12/13/2022]
Abstract
Although ascertainment bias in single nucleotide polymorphisms is a well-known problem, it is generally accepted that microsatellites have mutation rates too high for bias to be a concern. Here, we analyze in detail the large set of microsatellites typed for the Human Genetic Diversity Panel (HGDP)-CEPH panel. We develop a novel framework based on rarefaction to compare heterozygosity across markers with different mutation rates. We find that, whereas di- and tri-nucleotides show similar patterns of within- and between-population heterozygosity, tetra-nucleotides are inconsistent with the other two motifs. In addition, di- and tri-nucleotides are consistent with 16 unbiased tetra-nucleotide markers, whereas the HGDP-CEPH tetra-nucleotides are significantly different. This discrepancy is due to the HGDP-CEPH tetra-nucleotides being too homogeneous across Eurasia, even after their slower mutation rate is taken into account by rarefying the other markers. The most likely explanation for this pattern is ascertainment bias. We strongly advocate the exclusion of tetra-nucleotides from future population genetics analysis of this dataset, and we argue that other microsatellite datasets should be investigated for the presence of bias using the approach outlined in this article.
38
Gardner MG, Fitch AJ, Bertozzi T, Lowe AJ. Rise of the machines: recommendations for ecologists when using next generation sequencing for microsatellite development. Mol Ecol Resour 2011; 11:1093-101. [PMID: 21679314 DOI: 10.1111/j.1755-0998.2011.03037.x] [Citation(s) in RCA: 180] [Impact Index Per Article: 12.9] [Indexed: 11/29/2022]
Abstract
Next generation sequencing is revolutionizing molecular ecology by simplifying the development of molecular genetic markers, including microsatellites. Here, we summarize the results of the large-scale development of microsatellites for 54 nonmodel species using next generation sequencing and show that there are clear differences amongst plants, invertebrates and vertebrates for the number and proportion of motif types recovered that are able to be utilized as markers. We highlight that the heterogeneity within each group is very large. Despite this variation, we provide an indication of what number of sequences and consequent proportion of a 454 run are required for the development of 40 designable, unique microsatellite loci for a typical molecular ecological study. Finally, to address the challenges of choosing loci from the vast array of microsatellite loci typically available from partial genome runs (average for this study, 2341 loci), we provide a microsatellite development flowchart as a procedural guide for application once the results of a partial genome run are obtained.
Affiliation(s)
- Michael G Gardner
- School of Biological Sciences, Flinders University, GPO Box 2100, Adelaide, SA 5001, Australia.
39
Jennings TN, Knaus BJ, Mullins TD, Haig SM, Cronn RC. Multiplexed microsatellite recovery using massively parallel sequencing. Mol Ecol Resour 2011; 11:1060-7. [PMID: 21676207 DOI: 10.1111/j.1755-0998.2011.03033.x] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 11/27/2022]
Abstract
Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).
Affiliation(s)
- T N Jennings
- Pacific Northwest Research Station, USDA Forest Service, 3200 SW Jefferson Way, Corvallis, OR 97331, USA
40
Population-specific links between heterozygosity and the rate of human microsatellite evolution. J Mol Evol 2010; 72:215-21. [PMID: 21161201 DOI: 10.1007/s00239-010-9423-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 06/03/2010] [Accepted: 11/29/2010] [Indexed: 10/18/2022]
Abstract
Microsatellites form an abundant class of DNA sequences used widely as genetic markers. Surprisingly, the length of human microsatellites varies highly predictably with distance from Africa, apparently following the linear decline in variability that arose as we colonised the world. Such patterns have been used to argue that heterozygosity modulates the rate of microsatellite evolution. Here I test the ensuing prediction that variation in demographic history will cause individual populations predictably either to lead or to lag any given trend in length. I find that they do: larger populations with locally higher heterozygosity have microsatellites that are longer when a locus is expanding and shorter when a locus is contracting. These patterns remain even after controlling for the stepwise way in which heterozygosity and allele lengths decline across the world. This analysis provides support for a strongly discontinuous model for how human genetic variability is distributed and shows how individual populations differ in the average rate at which their microsatellites are evolving. Such patterns have the potential to provide a new window onto historical demography.
41
Ballantyne KN, Goedbloed M, Fang R, Schaap O, Lao O, Wollstein A, Choi Y, van Duijn K, Vermeulen M, Brauer S, Decorte R, Poetsch M, von Wurmb-Schwark N, de Knijff P, Labuda D, Vézina H, Knoblauch H, Lessig R, Roewer L, Ploski R, Dobosz T, Henke L, Henke J, Furtado MR, Kayser M. Mutability of Y-chromosomal microsatellites: rates, characteristics, molecular bases, and forensic implications. Am J Hum Genet 2010; 87:341-53. [PMID: 20817138 DOI: 10.1016/j.ajhg.2010.08.006] [Citation(s) in RCA: 282] [Impact Index Per Article: 18.8] [Received: 07/05/2010] [Revised: 08/02/2010] [Accepted: 08/13/2010] [Indexed: 11/24/2022]
Abstract
Nonrecombining Y-chromosomal microsatellites (Y-STRs) are widely used to infer population histories, discover genealogical relationships, and identify males for criminal justice purposes. Although a key requirement for their application is reliable mutability knowledge, empirical data are only available for a small number of Y-STRs thus far. To rectify this, we analyzed 186 Y-STR markers in nearly 2000 DNA-confirmed father-son pairs, covering a total of 352,999 meiotic transfers. Following confirmation by DNA sequence analysis, the retrieved mutation data were modeled via a Bayesian approach, resulting in mutation rates from 3.78 × 10^-4 (95% credible interval [CI], 1.38 × 10^-5 to 2.02 × 10^-3) to 7.44 × 10^-2 (95% CI, 6.51 × 10^-2 to 9.09 × 10^-2) per marker per generation. With the 924 mutations at 120 Y-STR markers, a nonsignificant excess of repeat losses versus gains (1.16:1), as well as a strong and significant excess of single-repeat versus multirepeat changes (25.23:1), was observed. Although the total repeat number influenced Y-STR locus mutability most strongly, repeat complexity, the length in base pairs of the repeated motif, and the father's age also contributed to Y-STR mutability. To exemplify how to use this knowledge in practice, we analyzed the 13 most mutable Y-STRs in an independent sample set and empirically demonstrated their suitability for distinguishing closely and distantly related males. This finding is expected to revolutionize Y-chromosomal applications in forensic biology, from previous male lineage differentiation toward future male individual identification.
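The per-marker rates above can be combined to ask how often a father-son pair differs at one or more markers in a panel. If mutations at different markers are assumed to be independent, that probability is 1 - prod(1 - mu_i) over the markers. The sketch below applies this to the lowest and highest point estimates quoted in the abstract. The independence assumption and the function names are illustrative and are not the authors' Bayesian model.

```python
import math
from typing import Sequence


def prob_any_mutation(rates: Sequence[float]) -> float:
    """P(at least one marker mutates in one father-son transmission).

    Assumes mutations at different markers are independent; `rates` are
    per-marker, per-generation mutation rates.
    """
    return 1.0 - math.prod(1.0 - mu for mu in rates)


def expected_mutations(rates: Sequence[float]) -> float:
    """Expected number of mutated markers per transmission (the sum of the rates)."""
    return math.fsum(rates)


# Lowest and highest point estimates reported in the abstract
extremes = [3.78e-4, 7.44e-2]
print(prob_any_mutation(extremes), expected_mutations(extremes))
```

Under this simple model, the chance of observing at least one difference grows quickly as more highly mutable markers are added to a panel, which is the property that makes rapidly mutating Y-STRs useful for separating close male relatives.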
42
Payseur BA, Jing P, Haasl RJ. A genomic portrait of human microsatellite variation. Mol Biol Evol 2010; 28:303-12. [PMID: 20675409 DOI: 10.1093/molbev/msq198] [Indexed: 12/14/2022]
Abstract
Rapid advances in DNA sequencing and genotyping technologies are beginning to reveal the scope and pattern of human genomic variation. Although single nucleotide polymorphisms (SNPs) have been intensively studied, the extent and form of other types of molecular variation remain poorly understood. Polymorphism at microsatellites, the most variable loci in the human genome, has rarely been examined on a genomic scale without the ascertainment biases that accompany typical genotyping studies. We conducted a genomic survey of variation at microsatellites with at least three perfect repeats by comparing two complete genome sequences: the Human Genome Reference sequence and the genome sequence of J. Craig Venter. The genome-wide proportion of polymorphic loci was 2.7%, much higher than the rate of SNP variation, with marked heterogeneity among classes of loci. The proportion of variable loci increased substantially with repeat number. Loci with different repeat lengths differed in levels of variation; at the same repeat number, loci with longer repeat lengths generally showed higher polymorphism. Microsatellite variation was weakly correlated with regional SNP number, indicating modest effects of shared genealogical history. Reductions in variation were detected at microsatellites located in introns, in untranslated regions, in coding exons, and just upstream of transcription start sites, suggesting the presence of selective constraints. Our results provide new insights into microsatellite mutational processes and preview the patterns of variation that genomic surveys of larger numbers of individuals will reveal.
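The inclusion criterion above, microsatellites with at least three perfect repeats, can be made concrete with a small scanner. The sketch below is a hypothetical illustration, not the detection pipeline used by Payseur et al.: it assumes a motif length of 1-6 bp, treats "perfect" as uninterrupted consecutive copies of the unit, and scans greedily left to right, so repeat tracts that start inside an already-reported tract are not reported separately. The names `find_perfect_repeats` and `PerfectRepeat` are invented for this example.

```python
from typing import List, NamedTuple, Optional


class PerfectRepeat(NamedTuple):
    start: int   # 0-based start position in the sequence
    motif: str   # repeat unit (1-6 bp in this sketch)
    copies: int  # number of consecutive, uninterrupted copies


def _repeat_at(seq: str, i: int, min_copies: int,
               max_motif: int) -> Optional[PerfectRepeat]:
    """Return the shortest-motif perfect repeat starting at i, if any.

    Trying short motifs first means a non-primitive unit such as "ATAT"
    is never reported: if "ATAT" had >= 3 copies, "AT" would have >= 6
    copies at the same start and would qualify first.
    """
    for k in range(1, max_motif + 1):
        motif = seq[i:i + k]
        if len(motif) < k:
            return None  # not enough sequence left for this motif length
        end = i + k
        # Extend while the next k bases exactly match the motif (no interruptions).
        while seq[end:end + k] == motif:
            end += k
        copies = (end - i) // k
        if copies >= min_copies:
            return PerfectRepeat(i, motif, copies)
    return None


def find_perfect_repeats(seq: str, min_copies: int = 3,
                         max_motif: int = 6) -> List[PerfectRepeat]:
    """Greedy left-to-right scan for perfect tandem repeats (hypothetical helper)."""
    seq = seq.upper()
    repeats: List[PerfectRepeat] = []
    i = 0
    while i < len(seq):
        hit = _repeat_at(seq, i, min_copies, max_motif)
        if hit is None:
            i += 1
        else:
            repeats.append(hit)
            # Resume scanning immediately after the reported tract.
            i = hit.start + len(hit.motif) * hit.copies
    return repeats
```

For example, `find_perfect_repeats("GCATATATGC")` returns `[PerfectRepeat(start=2, motif='AT', copies=3)]`, while `find_perfect_repeats("ATAT")` returns an empty list because two copies fall below the three-repeat threshold.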