Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 18 (from Reference Citation Analysis)

Article PDFs (6)

Cited by > 0 (15)

Searched Name: Genetics, Population/standards

Xu ZM, Rüeger S, Zwyer M, Brites D, Hiza H, Reinhard M, Rutaihwa L, Borrell S, Isihaka F, Temba H, Maroa T, Naftari R, Hella J, Sasamalo M, Reither K, Portevin D, Gagneux S, Fellay J. Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations. PLoS Comput Biol 2022;18:e1009628. [PMID: 35025869 PMCID: PMC8791479 DOI: 10.1371/journal.pcbi.1009628] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/10/2021] [Revised: 01/26/2022] [Accepted: 11/10/2021] [Indexed: 12/13/2022] Open

Abstract

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.

Genome-wide association studies, which study the association between genetic variants and various phenotypes, typically rely on genotyping arrays. Only a small proportion of genetic variants within the genome are typed on genotyping arrays. Untyped variants are statistically inferred through a process known as genotype imputation, where correlations between variants (haplotypes) observed in external reference panels are leveraged to infer untyped variants in the study population. However, for study populations that are underrepresented in existing reference panels, the quality of imputation is often sub-optimal. This is because typed variants incorporated on existing genotyping arrays can be unsuitable for the study population, and haplotype structures can be different between the reference and the study population. Here, we illustrate an approach to select a custom set of population-specific typed variants to improve genotype imputation in such underrepresented populations.
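The abstract does not say how the add-on tag SNPs are chosen. As a rough illustration only, the sketch below uses a generic greedy LD-based tagging heuristic: starting from the SNPs already on a base array, it repeatedly adds the candidate that tags the most still-untagged SNPs at a chosen r² threshold. The function names, the 0.8 threshold, and the toy genotypes are assumptions made for this example, not the authors' method or data.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two equal-length genotype vectors (0/1/2 dosages)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def select_addon_tags(genotypes, base_tags, n_addon, r2_threshold=0.8):
    """Greedily pick up to n_addon SNPs that tag the most SNPs not yet tagged.

    genotypes: dict mapping SNP id -> list of dosages from the sequenced subset (hypothetical input).
    base_tags: set of SNP ids already present on the base array.
    """
    snps = list(genotypes)

    def tagged_by(tag):
        # All SNPs in LD with `tag` at or above the threshold (a SNP always tags itself
        # unless it is monomorphic).
        return {s for s in snps if r_squared(genotypes[tag], genotypes[s]) >= r2_threshold}

    covered = set()
    for tag in base_tags:
        covered |= tagged_by(tag)

    chosen = []
    for _ in range(n_addon):
        best, best_gain = None, 0
        for cand in snps:
            if cand in base_tags or cand in chosen:
                continue
            gain = len(tagged_by(cand) - covered)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:
            break  # nothing left to gain
        chosen.append(best)
        covered |= tagged_by(best)
    return chosen


# Toy data: rs2 is in perfect LD with the base tag rs1; rs3 and rs4 form a second LD block.
toy = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],
    "rs3": [0, 0, 0, 2, 2, 2],
    "rs4": [0, 0, 0, 2, 2, 2],
    "rs5": [0, 2, 0, 0, 2, 0],
}
addon = select_addon_tags(toy, base_tags={"rs1"}, n_addon=2)
```

On real data the pairwise r² computation would normally be restricted to nearby SNPs and delegated to a dedicated tool; the quadratic loop here is only meant to make the selection logic visible.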

Affiliation(s)

Zhi Ming Xu School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
Sina Rüeger School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
Michaela Zwyer Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
Daniela Brites Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
Hellen Hiza Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland; Ifakara Health Institute, Dar es Salaam, Tanzania
Miriam Reinhard Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
Liliana Rutaihwa Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
Sonia Borrell Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
Faima Isihaka Ifakara Health Institute, Dar es Salaam, Tanzania
Hosiana Temba Ifakara Health Institute, Dar es Salaam, Tanzania
Thomas Maroa Ifakara Health Institute, Dar es Salaam, Tanzania
Rastard Naftari Ifakara Health Institute, Dar es Salaam, Tanzania
Jerry Hella Ifakara Health Institute, Dar es Salaam, Tanzania
Mohamed Sasamalo Ifakara Health Institute, Dar es Salaam, Tanzania
Klaus Reither Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
Damien Portevin Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
Sebastien Gagneux Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
Jacques Fellay School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Precision Medicine Unit, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland

Schrider DR. Background Selection Does Not Mimic the Patterns of Genetic Diversity Produced by Selective Sweeps. Genetics 2020;216:499-519. [PMID: 32847814 PMCID: PMC7536861 DOI: 10.1534/genetics.120.303469] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/13/2019] [Accepted: 08/04/2020] [Indexed: 12/28/2022] Open

Abstract

It is increasingly evident that natural selection plays a prominent role in shaping patterns of diversity across the genome. The most commonly studied modes of natural selection are positive selection and negative selection, which refer to directional selection for and against derived mutations, respectively. Positive selection can result in hitchhiking events, in which a beneficial allele rapidly replaces all others in the population, creating a valley of diversity around the selected site along with characteristic skews in allele frequencies and linkage disequilibrium among linked neutral polymorphisms. Similarly, negative selection reduces variation not only at selected sites but also at linked sites, a phenomenon called background selection (BGS). Thus, discriminating between these two forces may be difficult, and one might expect efforts to detect hitchhiking to produce an excess of false positives in regions affected by BGS. Here, we examine the similarity between BGS and hitchhiking models via simulation. First, we show that BGS may somewhat resemble hitchhiking in simplistic scenarios in which a region constrained by negative selection is flanked by large stretches of unconstrained sites, echoing previous results. However, this scenario does not mirror the actual spatial arrangement of selected sites across the genome. By performing forward simulations under more realistic scenarios of BGS, modeling the locations of protein-coding and conserved noncoding DNA in real genomes, we show that the spatial patterns of variation produced by BGS rarely mimic those of hitchhiking events. Indeed, BGS is not substantially more likely than neutrality to produce false signatures of hitchhiking. This holds for simulations modeled after both humans and Drosophila, and for several different demographic histories. These results demonstrate that appropriately designed scans for hitchhiking need not consider BGS's impact on false-positive rates. 
However, we do find evidence that BGS increases the false-negative rate for hitchhiking, an observation that demands further investigation.

Ralph P, Thornton K, Kelleher J. Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes. Genetics 2020;215:779-797. [PMID: 32357960 PMCID: PMC7337078 DOI: 10.1534/genetics.120.303253] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/23/2019] [Accepted: 04/28/2020] [Indexed: 12/11/2022] Open

Abstract

As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates sample weights within the genealogical tree at each position on the genome, which are then combined using a summary function; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite sites model of mutation, and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently defined statistics of genome sequence (making the statistics' relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding branch statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project data set, and discuss ways in which deviations may encode interesting biological signals.
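The "weights plus summary function" decomposition described in this abstract can be illustrated on a plain genotype matrix. The sketch below is a deliberate simplification: it works site by site on a 0/1 matrix (the matrix-based route the abstract compares against) rather than on a succinct tree sequence, and all function names are hypothetical. It only shows how different statistics arise from different choices of weight and summary function.

```python
def site_statistic(genotype_matrix, weights, summary):
    """Sum over sites of summary(x), where x is the total weight of the samples
    carrying the derived allele at that site.

    genotype_matrix[site][sample] is 1 if the sample carries the derived allele, else 0.
    """
    total = 0.0
    for site in genotype_matrix:
        x = sum(w for g, w in zip(site, weights) if g == 1)
        total += summary(x)
    return total


def diversity(genotype_matrix):
    """Mean number of pairwise differences: unit weights, f(x) = x (n - x) / C(n, 2)."""
    n = len(genotype_matrix[0])
    pairs = n * (n - 1) / 2
    return site_statistic(genotype_matrix, [1] * n, lambda x: x * (n - x) / pairs)


def segregating_sites(genotype_matrix):
    """Number of polymorphic sites: unit weights, f(x) = 1 if 0 < x < n else 0."""
    n = len(genotype_matrix[0])
    return site_statistic(genotype_matrix, [1] * n, lambda x: 1.0 if 0 < x < n else 0.0)


# Three sites, four samples; the last site is fixed for the derived allele.
toy_matrix = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
pi = diversity(toy_matrix)          # 3/6 + 4/6 + 0
s = segregating_sites(toy_matrix)   # two polymorphic sites
```

In the tree-based setting the same summary functions would be applied to weights accumulated below each node of the local genealogy, which is where the efficiency gains reported in the abstract come from.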

Soller JM, Ausband DE, Szykman Gunther M. The curse of observer experience: Error in noninvasive genetic sampling. PLoS One 2020;15:e0229762. [PMID: 32168506 PMCID: PMC7069729 DOI: 10.1371/journal.pone.0229762] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/29/2019] [Accepted: 02/13/2020] [Indexed: 11/18/2022] Open

Hao W, Storey JD. Extending Tests of Hardy-Weinberg Equilibrium to Structured Populations. Genetics 2019;213:759-770. [PMID: 31537622 PMCID: PMC6827367 DOI: 10.1534/genetics.119.302370] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/28/2019] [Accepted: 08/21/2019] [Indexed: 12/22/2022] Open

Peterson RE, Kuchenbaecker K, Walters RK, Chen CY, Popejoy AB, Periyasamy S, Lam M, Iyegbe C, Strawbridge RJ, Brick L, Carey CE, Martin AR, Meyers JL, Su J, Chen J, Edwards AC, Kalungi A, Koen N, Majara L, Schwarz E, Smoller JW, Stahl EA, Sullivan PF, Vassos E, Mowry B, Prieto ML, Cuellar-Barboza A, Bigdeli TB, Edenberg HJ, Huang H, Duncan LE. Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations. Cell 2019;179:589-603. [PMID: 31607513 PMCID: PMC6939869 DOI: 10.1016/j.cell.2019.08.051] [Citation(s) in RCA: 345] [Impact Index Per Article: 69.0] [Received: 03/26/2019] [Revised: 07/10/2019] [Accepted: 08/26/2019] [Indexed: 12/19/2022]

Affiliation(s)

Roseann E Peterson Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA.
Karoline Kuchenbaecker Division of Psychiatry and UCL Genetics Institute, University College London, London W1T 7NF, UK
Raymond K Walters Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
Chia-Yen Chen Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
Alice B Popejoy Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA 94305, USA
Sathish Periyasamy Queensland Brain Institute and Queensland Centre for Mental Health Research, The University of Queensland, Brisbane, QLD 4072, Australia
Max Lam Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
Conrad Iyegbe Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London SE5 8AF, UK
Rona J Strawbridge Institute of Health and Wellbeing, University of Glasgow, Glasgow G12 8RZ, UK; Department of Medicine Solna, Karolinska Institute, Stockholm, SE 17176, Sweden
Leslie Brick Department of Psychiatry and Human Behavior, Warren Alpert Medical School, Brown University, Providence, RI 02906, USA
Caitlin E Carey Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
Alicia R Martin Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
Jacquelyn L Meyers Department of Psychiatry, State University of New York Downstate Medical Center, Brooklyn, NY 11203, USA
Jinni Su Department of Psychology, Arizona State University, Tempe, AZ 85281, USA
Junfang Chen Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
Alexis C Edwards Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
Allan Kalungi Mental Health Section of MRC/UVRI and LSHTM Uganda Research Unit, P.O. Box 49, Entebbe, Uganda; Department of Psychiatry, Faculty of Medicine & Health Sciences, University of Stellenbosch, Cape Town, South Africa; Department of Medical Microbiology, College of Health Sciences, Makerere University, Kampala, Uganda; Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA
Nastassja Koen Department of Psychiatry, Faculty of Medicine & Health Sciences, University of Stellenbosch, Cape Town, South Africa; Department of Medical Microbiology, College of Health Sciences, Makerere University, Kampala, Uganda; Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA
Lerato Majara Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA; MRC Human Genetics Research Unit, Division of Human Genetics, Department of Pathology, Institute of Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
Emanuel Schwarz Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
Jordan W Smoller Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
Eli A Stahl Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Patrick F Sullivan Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, SE 17176, Sweden; Genetics and Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Evangelos Vassos Social, Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, SE5 8AF, UK
Bryan Mowry Queensland Brain Institute and Queensland Centre for Mental Health Research, The University of Queensland, Brisbane, QLD 4072, Australia
Miguel L Prieto Department of Psychiatry, Faculty of Medicine, Universidad de los Andes, Santiago 7620001, Chile; Mental Health Service, Clínica Universidad de los Andes, Santiago 7620001, Chile; Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
Alfredo Cuellar-Barboza Department of Psychiatry, University Hospital and School of Medicine, Universidad Autonoma de Nuevo Leon, Monterrey, Mexico; Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
Tim B Bigdeli Department of Psychiatry, State University of New York Downstate Medical Center, Brooklyn, NY 11203, USA
Howard J Edenberg Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
Hailiang Huang Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
Laramie E Duncan Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA

Sakuma K, Ishida R, Kodama T, Takada Y. Reconstructing the population history of the sandy beach amphipod Haustorioides japonicus using the calibration of demographic transition (CDT) approach. PLoS One 2019;14:e0223624. [PMID: 31596891 PMCID: PMC6785125 DOI: 10.1371/journal.pone.0223624] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/11/2019] [Accepted: 09/24/2019] [Indexed: 11/19/2022] Open

El'chinova GI, Ivanov AV, El'kanova LA, Revazova YA, Zinchenko RA. [Acceptability of using Karachay surnames as a quasigenetic marker in population and genetic studies]. Genetika 2014;50:874-877. [PMID: 25720146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]

Al-Meeri A, Non AL, Lajoie TW, Mulligan CJ. Effect of different sampling strategies for a single geographic region in Yemen on standard genetic analyses of mitochondrial DNA sequence data. Mitochondrial DNA 2011;22:66-70. [PMID: 21864032 DOI: 10.3109/19401736.2011.606462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]

Fujimura JH, Rajagopalan R. Different differences: the use of 'genetic ancestry' versus race in biomedical human genetic research. Soc Stud Sci 2011;41:5-30. [PMID: 21553638 PMCID: PMC3124377 DOI: 10.1177/0306312710379170] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Indexed: 05/08/2023]

Abstract

This article presents findings from our ethnographic research on biomedical scientists' studies of human genetic variation and common complex disease. We examine the socio-material work involved in genome-wide association studies (GWAS) and discuss whether, how, and when notions of race and ethnicity are or are not used. We analyze how researchers produce simultaneously different kinds of populations and population differences. Although many geneticists use race in their analyses, we find some who have invented a statistical genetics method and associated software that they use specifically to avoid using categories of race in their genetic analysis. Their method allows them to operationalize their concept of 'genetic ancestry' without resorting to notions of race and ethnicity. We focus on the construction and implementation of the software's algorithms, and discuss the consequences and implications of the software technology for debates and policies around the use of race in genetics research. We also demonstrate that the production and use of their method involves a dynamic and fluid assemblage of actors in various disciplines responding to disciplinary and sociopolitical contexts and concerns. This assemblage also includes particular discourses on human history and geography as they become entangled with research on genetic markers and disease. We introduce the concept of 'genome geography' to analyze how some researchers studying human genetic variation 'locate' stretches of DNA in different places and times. The concept of genetic ancestry and the practice of genome geography rely on old discourses, but they also incorporate new technologies, infrastructures, and political and scientific commitments. Some of these new technologies provide opportunities to change some of our institutional and cultural forms and frames around notions of difference and similarity. Nevertheless, we also highlight the slipperiness of genome geography and the tenacity of race and race concepts.

Yu KD, Di GH, Fan L, Shao ZM. Test of Hardy-Weinberg equilibrium in breast cancer case-control studies: an issue may influence the conclusions. Breast Cancer Res Treat 2009;117:675-7. [PMID: 19242790 DOI: 10.1007/s10549-009-0353-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 02/15/2009] [Accepted: 02/17/2009] [Indexed: 11/28/2022]

Reid L. Networking genetics, populations, and race. Am J Bioeth 2009;9:50-52. [PMID: 19998116 DOI: 10.1080/15265160902893957] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]

Scheet P, Stephens M. Linkage disequilibrium-based quality control for large-scale genetic studies. PLoS Genet 2008;4:e1000147. [PMID: 18670630 PMCID: PMC2475504 DOI: 10.1371/journal.pgen.1000147] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 09/12/2007] [Accepted: 07/01/2008] [Indexed: 11/24/2022] Open

Abstract

Quality control (QC) is a critical step in large-scale studies of genetic variation. While, on average, high-throughput single nucleotide polymorphism (SNP) genotyping assays are now very accurate, the errors that remain tend to cluster into a small percentage of “problem” SNPs, which exhibit unusually high error rates. Because most large-scale studies of genetic variation are searching for phenomena that are rare (e.g., SNPs associated with a phenotype), even this small percentage of problem SNPs can cause important practical problems. Here we describe and illustrate how patterns of linkage disequilibrium (LD) can be used to improve QC in large-scale, population-based studies. This approach has the advantage over existing filters (e.g., HWE or call rate) that it can actually reduce genotyping error rates by automatically correcting some genotyping errors. Applying this LD-based QC procedure to data from The International HapMap Project, we identify over 1,500 SNPs that likely have high error rates in the CHB and JPT samples and estimate corrected genotypes. Our method is implemented in the software package fastPHASE, available from the Stephens Lab website (http://stephenslab.uchicago.edu/software.html).

In large-scale studies of population genetic data, particularly genome-wide association studies, considerable effort may be spent on quality control (QC) to ensure genotype data are accurate. Typically, QC steps are applied independently to individual marker loci, with data from suspicious loci being excluded from subsequent analyses. Here we present a new QC tool, which exploits the fact that correlation of alleles among nearby genetic loci (linkage disequilibrium; LD) provides a certain amount of redundancy in genotype information, and that high rates of genotyping error at a marker may leave their trace in unusual patterns of LD. The method (a) aids in the detection of SNP loci with possibly elevated levels of genotyping error, and (b) in some cases allows for the correction of erroneous genotype calls, thereby salvaging some of the genotype data from the QC filtering process. We confirm on data from real populations that SNPs identified by this approach do show evidence for containing actual genotyping errors, and we also examine genotype intensity plots to confirm that many individual genotypes corrected by the method do appear to be called in error. More generally, these results demonstrate the potential utility of incorporating LD information into algorithms for processing and analyzing population genotype data.

Teo YY. Common statistical issues in genome-wide association studies: a review on power, data quality control, genotype calling and population structure. Curr Opin Lipidol 2008;19:133-43. [PMID: 18388693 DOI: 10.1097/mol.0b013e3282f5dd77] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.6] [Indexed: 01/12/2023]

Bandelt HJ, Kivisild T. Quality assessment of DNA sequence data: autopsy of a mis-sequenced mtDNA population sample. Ann Hum Genet 2006;70:314-26. [PMID: 16674554 DOI: 10.1111/j.1529-8817.2005.00234.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]

Stoneking M, Nasidze I. The Patient is Not Dead Yet: Premature Autopsy of a mtDNA Data Set. Ann Hum Genet 2006;70:327-31. [PMID: 16674555 DOI: 10.1111/j.1469-1809.2005.00235.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]

Genetics of psychiatric disorders. Nat Neurosci 2005;8:693-693. [PMID: 15917827 DOI: 10.1038/nn0605-693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]

Chen ZW, Ouyang ZH, Dong G, Li RS. [Analyzing genetic quality of BALB/c mouse strains in China by microsatellite marking]. Yi Chuan 2004;26:845-8. [PMID: 15762004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]