1
Xu ZM, Rüeger S, Zwyer M, Brites D, Hiza H, Reinhard M, Rutaihwa L, Borrell S, Isihaka F, Temba H, Maroa T, Naftari R, Hella J, Sasamalo M, Reither K, Portevin D, Gagneux S, Fellay J. Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations. PLoS Comput Biol 2022; 18:e1009628. [PMID: 35025869 PMCID: PMC8791479 DOI: 10.1371/journal.pcbi.1009628] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/10/2021] [Revised: 01/26/2022] [Accepted: 11/10/2021] [Indexed: 12/13/2022]
Abstract
Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.
Genome-wide association studies, which study the association between genetic variants and various phenotypes, typically rely on genotyping arrays. Only a small proportion of genetic variants within the genome are typed on genotyping arrays. Untyped variants are statistically inferred through a process known as genotype imputation, where correlations between variants (haplotypes) observed in external reference panels are leveraged to infer untyped variants in the study population. However, for study populations that are underrepresented in existing reference panels, the quality of imputation is often sub-optimal. This is because typed variants incorporated on existing genotyping arrays can be unsuitable for the study population, and haplotype structures can be different between the reference and the study population. Here, we illustrate an approach to select a custom set of population-specific typed variants to improve genotype imputation in such underrepresented populations.
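The idea of choosing add-on tag SNPs can be illustrated with a generic greedy tagging heuristic. The sketch below is not the authors' pipeline: the function name `greedy_add_on_tags`, the pairwise r² lookup table, the 0.8 threshold, and the SNP labels are all hypothetical, and candidates are restricted to SNPs not yet covered by the existing array, which is a simplification.

```python
from typing import Dict, List, Set, Tuple


def greedy_add_on_tags(
    r2: Dict[Tuple[str, str], float],
    snps: List[str],
    array_tags: Set[str],
    threshold: float = 0.8,
    max_add_on: int = 10,
) -> List[str]:
    """Greedily pick add-on tags that cover the most still-untagged SNPs.

    r2 maps unordered SNP pairs to a pairwise LD value (assumed to have
    been estimated from the sequenced subset of the cohort).
    """

    def linked(a: str, b: str) -> bool:
        # A SNP always tags itself; otherwise look the pair up in either order.
        if a == b:
            return True
        return max(r2.get((a, b), 0.0), r2.get((b, a), 0.0)) >= threshold

    # SNPs already captured by tags on the base array.
    covered = {s for s in snps if any(linked(s, t) for t in array_tags)}
    chosen: List[str] = []
    while len(chosen) < max_add_on:
        uncovered = [s for s in snps if s not in covered]
        if not uncovered:
            break
        # Choose the uncovered SNP that tags the most uncovered SNPs.
        best = max(uncovered, key=lambda c: sum(linked(c, s) for s in uncovered))
        chosen.append(best)
        covered.update(s for s in uncovered if linked(best, s))
    return chosen


# Toy example: "a" is on the base array and tags "b"; "c" tags "d" and "e".
toy_r2 = {("a", "b"): 0.9, ("c", "d"): 0.85, ("c", "e"): 0.9, ("d", "e"): 0.5}
print(greedy_add_on_tags(toy_r2, ["a", "b", "c", "d", "e"], {"a"}))
```

In this toy input a single add-on tag is enough, because one SNP is in high LD with both remaining untagged SNPs. A real design would also have to weigh array manufacturing constraints and allele frequencies, which this sketch ignores.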
Affiliation(s)
- Zhi Ming Xu
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sina Rüeger
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michaela Zwyer
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Daniela Brites
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Hellen Hiza
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
  - Ifakara Health Institute, Dar es Salaam, Tanzania
- Miriam Reinhard
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Liliana Rutaihwa
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Sonia Borrell
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Thomas Maroa
  - Ifakara Health Institute, Dar es Salaam, Tanzania
- Jerry Hella
  - Ifakara Health Institute, Dar es Salaam, Tanzania
- Klaus Reither
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Damien Portevin
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Sebastien Gagneux
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Jacques Fellay
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Precision Medicine Unit, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
2
Abstract
It is increasingly evident that natural selection plays a prominent role in shaping patterns of diversity across the genome. The most commonly studied modes of natural selection are positive selection and negative selection, which refer to directional selection for and against derived mutations, respectively. Positive selection can result in hitchhiking events, in which a beneficial allele rapidly replaces all others in the population, creating a valley of diversity around the selected site along with characteristic skews in allele frequencies and linkage disequilibrium among linked neutral polymorphisms. Similarly, negative selection reduces variation not only at selected sites but also at linked sites, a phenomenon called background selection (BGS). Thus, discriminating between these two forces may be difficult, and one might expect efforts to detect hitchhiking to produce an excess of false positives in regions affected by BGS. Here, we examine the similarity between BGS and hitchhiking models via simulation. First, we show that BGS may somewhat resemble hitchhiking in simplistic scenarios in which a region constrained by negative selection is flanked by large stretches of unconstrained sites, echoing previous results. However, this scenario does not mirror the actual spatial arrangement of selected sites across the genome. By performing forward simulations under more realistic scenarios of BGS, modeling the locations of protein-coding and conserved noncoding DNA in real genomes, we show that the spatial patterns of variation produced by BGS rarely mimic those of hitchhiking events. Indeed, BGS is not substantially more likely than neutrality to produce false signatures of hitchhiking. This holds for simulations modeled after both humans and Drosophila, and for several different demographic histories. These results demonstrate that appropriately designed scans for hitchhiking need not consider BGS's impact on false-positive rates. 
However, we do find evidence that BGS increases the false-negative rate for hitchhiking, an observation that demands further investigation.
Affiliation(s)
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514
3
Ralph P, Thornton K, Kelleher J. Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes. Genetics 2020; 215:779-797. [PMID: 32357960 PMCID: PMC7337078 DOI: 10.1534/genetics.120.303253] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/23/2019] [Accepted: 04/28/2020] [Indexed: 12/11/2022]
Abstract
As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates sample weights within the genealogical tree at each position on the genome, which are then combined using a summary function; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite sites model of mutation, and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently defined statistics of genome sequence (making the statistics' relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding branch statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project data set, and discuss ways in which deviations may encode interesting biological signals.
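The "accumulate sample weights, then apply a summary function" recipe can be sketched on a single toy tree. This is only an illustration under assumptions, not the authors' implementation or any library's API: the function name `branch_diversity`, the parent-dictionary tree encoding, and the choice of pairwise diversity as the summary function are all hypothetical choices made here. The weight of a node is the number of samples below it, and each branch contributes its length times the fraction of sample pairs it separates.

```python
from typing import Dict, List


def branch_diversity(
    parent: Dict[int, int],
    branch_length: Dict[int, float],
    samples: List[int],
) -> float:
    """Branch-mode pairwise diversity on one tree.

    parent maps each non-root node to its parent; branch_length maps each
    non-root node to the length of the branch above it.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    # Step 1: accumulate weights (count of samples at or below each node).
    below: Dict[int, int] = {}
    for sample in samples:
        node = sample
        while node is not None:
            below[node] = below.get(node, 0) + 1
            node = parent.get(node)  # None once we step past the root

    # Step 2: summary function f(x) = x (n - x) / C(n, 2), the fraction of
    # sample pairs separated by a branch with x samples below it.
    pairs = n * (n - 1) / 2
    total = 0.0
    for node, length in branch_length.items():
        x = below.get(node, 0)
        total += length * x * (n - x) / pairs
    return total


# Toy tree ((0,1),2): node 3 joins samples 0 and 1, node 4 is the root.
toy_parent = {0: 3, 1: 3, 2: 4, 3: 4}
toy_lengths = {0: 1.0, 1: 1.0, 2: 2.0, 3: 1.0}
print(branch_diversity(toy_parent, toy_lengths, [0, 1, 2]))
```

For the toy tree the result is 10/3, which matches the mean path length between pairs of samples (2, 4, and 4). Swapping in a different summary function over the same accumulated weights would give a different statistic, which is the flexibility the abstract describes.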
Affiliation(s)
- Peter Ralph
  - Institute of Evolution and Ecology, Departments of Mathematics and Biology, University of Oregon, Eugene, Oregon 97405
- Kevin Thornton
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, United Kingdom OX3 7LF
4
Abstract
Noninvasive genetic sampling (NGS) is commonly used to study elusive or rare species where direct observation or capture is difficult. Little attention has been paid to the potential effects of observer bias while collecting noninvasive genetic samples in the field, however. Over a period of 7 years, we examined whether different observers (n = 58) and observer experience influenced detection, amplification rates, and correct species identification of 4,836 gray wolf (Canis lupus) fecal samples collected in Idaho and Yellowstone National Park, USA and southwestern Alberta, Canada (2008-2014). We compared new observers (n = 33) to experienced observers (n = 25) and hypothesized experience level would increase the overall success of using NGS techniques in the wild. In contrast to our hypothesis, we found that new individuals were better than experienced observers at detecting and collecting wolf scats and correctly identifying wolf scats from other sympatric carnivores present in the study areas. While adequate training of new observers is crucial for the successful use of NGS techniques, attention should also be directed to experienced observers. Observer experience could be a curse because of their potential effects on NGS data quality arising from fatigue, boredom or other factors. The ultimate benefit of an observer to a project is a combination of factors (i.e., field savvy, local knowledge), but project investigators should be aware of the potential negative effects of experience on NGS sampling.
Affiliation(s)
- Jillian M. Soller
  - Department of Wildlife, Humboldt State University, Arcata, California, United States of America
- David E. Ausband
  - University of Montana Cooperative Wildlife Research Unit, Missoula, Montana, United States of America
- Micaela Szykman Gunther
  - Department of Wildlife, Humboldt State University, Arcata, California, United States of America
5
Abstract
Testing for Hardy-Weinberg equilibrium (HWE) is an important component in almost all analyses of population genetic data. Genetic markers that violate HWE are often treated as special cases; for example, they may be flagged as possible genotyping errors, or they may be investigated more closely for evolutionary signatures of interest. The presence of population structure is one reason why genetic markers may fail a test of HWE. This is problematic because almost all natural populations studied in the modern setting show some degree of structure. Therefore, it is important to be able to detect deviations from HWE for reasons other than structure. To this end, we extend statistical tests of HWE to allow for population structure, which we call a test of "structural HWE." Additionally, our new test allows one to automatically choose tuning parameters and identify accurate models of structure. We demonstrate our approach on several important studies, provide theoretical justification for the test, and present empirical evidence for its utility. We anticipate the proposed test will be useful in a broad range of analyses of genome-wide population genetic data.
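For context, the classical test that the abstract extends can be sketched as follows. This is the standard chi-square goodness-of-fit test for HWE at a biallelic marker, not the "structural HWE" test proposed by the authors; the function name `hwe_chi_square` and the genotype counts are hypothetical. The second part of the example shows why structure is a problem: two subpopulations that are each in HWE fail the test once their genotypes are pooled.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic (1 degree of freedom) for HWE at one marker."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic markers).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Two hypothetical subpopulations, each at HWE proportions.
pop1 = (81, 18, 1)   # allele A frequency 0.9
pop2 = (1, 18, 81)   # allele A frequency 0.1
pooled = tuple(a + b for a, b in zip(pop1, pop2))  # (82, 36, 82)

print(hwe_chi_square(*pop1))    # essentially zero
print(hwe_chi_square(*pooled))  # far above 3.84, the 5% critical value
```

The pooled sample shows a strong deficit of heterozygotes even though neither subpopulation deviates from HWE, which is the kind of structure-driven failure the abstract refers to.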
Affiliation(s)
- Wei Hao
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, New Jersey 08544
- John D Storey
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, New Jersey 08544
6
Peterson RE, Kuchenbaecker K, Walters RK, Chen CY, Popejoy AB, Periyasamy S, Lam M, Iyegbe C, Strawbridge RJ, Brick L, Carey CE, Martin AR, Meyers JL, Su J, Chen J, Edwards AC, Kalungi A, Koen N, Majara L, Schwarz E, Smoller JW, Stahl EA, Sullivan PF, Vassos E, Mowry B, Prieto ML, Cuellar-Barboza A, Bigdeli TB, Edenberg HJ, Huang H, Duncan LE. Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations. Cell 2019; 179:589-603. [PMID: 31607513 PMCID: PMC6939869 DOI: 10.1016/j.cell.2019.08.051] [Citation(s) in RCA: 345] [Impact Index Per Article: 69.0] [Received: 03/26/2019] [Revised: 07/10/2019] [Accepted: 08/26/2019] [Indexed: 12/19/2022]
Abstract
Genome-wide association studies (GWASs) have focused primarily on populations of European descent, but it is essential that diverse populations become better represented. Increasing diversity among study participants will advance our understanding of genetic architecture in all populations and ensure that genetic research is broadly applicable. To facilitate and promote research in multi-ancestry and admixed cohorts, we outline key methodological considerations and highlight opportunities, challenges, solutions, and areas in need of development. Despite the perception that analyzing genetic data from diverse populations is difficult, it is scientifically and ethically imperative, and there is an expanding analytical toolbox to do it well.
Affiliation(s)
- Roseann E Peterson
  - Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA.
- Karoline Kuchenbaecker
  - Division of Psychiatry and UCL Genetics Institute, University College London, London W1T 7NF, UK
- Raymond K Walters
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Chia-Yen Chen
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Alice B Popejoy
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Sathish Periyasamy
  - Queensland Brain Institute and Queensland Centre for Mental Health Research, The University of Queensland, Brisbane, QLD 4072, Australia
- Max Lam
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Conrad Iyegbe
  - Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London SE5 8AF, UK
- Rona J Strawbridge
  - Institute of Health and Wellbeing, University of Glasgow, Glasgow G12 8RZ, UK; Department of Medicine Solna, Karolinska Institute, Stockholm, SE 17176, Sweden
- Leslie Brick
  - Department of Psychiatry and Human Behavior, Warren Alpert Medical School, Brown University, Providence, RI 02906, USA
- Caitlin E Carey
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Alicia R Martin
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Jacquelyn L Meyers
  - Department of Psychiatry, State University of New York Downstate Medical Center, Brooklyn, NY 11203, USA
- Jinni Su
  - Department of Psychology, Arizona State University, Tempe, AZ 85281, USA
- Junfang Chen
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
- Alexis C Edwards
  - Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Allan Kalungi
  - Mental Health Section of MRC/UVRI and LSHTM Uganda Research Unit, P.O. Box 49, Entebbe, Uganda; Department of Psychiatry, Faculty of Medicine & Health Sciences, University of Stellenbosch, Cape Town, South Africa; Department of Medical Microbiology, College of Health Sciences, Makerere University, Kampala, Uganda; Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA
- Nastassja Koen
  - Department of Psychiatry, Faculty of Medicine & Health Sciences, University of Stellenbosch, Cape Town, South Africa; Department of Medical Microbiology, College of Health Sciences, Makerere University, Kampala, Uganda; Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA
- Lerato Majara
  - Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA; MRC Human Genetics Research Unit, Division of Human Genetics, Department of Pathology, Institute of Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Emanuel Schwarz
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
- Jordan W Smoller
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Eli A Stahl
  - Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Patrick F Sullivan
  - Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, SE 17176, Sweden; Genetics and Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Evangelos Vassos
  - Social, Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, SE5 8AF, UK
- Bryan Mowry
  - Queensland Brain Institute and Queensland Centre for Mental Health Research, The University of Queensland, Brisbane, QLD 4072, Australia
- Miguel L Prieto
  - Department of Psychiatry, Faculty of Medicine, Universidad de los Andes, Santiago 7620001, Chile; Mental Health Service, Clínica Universidad de los Andes, Santiago 7620001, Chile; Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
- Alfredo Cuellar-Barboza
  - Department of Psychiatry, University Hospital and School of Medicine, Universidad Autonoma de Nuevo Leon, Monterrey, Mexico; Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
- Tim B Bigdeli
  - Department of Psychiatry, State University of New York Downstate Medical Center, Brooklyn, NY 11203, USA
- Howard J Edenberg
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Hailiang Huang
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Laramie E Duncan
  - Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
7
Sakuma K, Ishida R, Kodama T, Takada Y. Reconstructing the population history of the sandy beach amphipod Haustorioides japonicus using the calibration of demographic transition (CDT) approach. PLoS One 2019; 14:e0223624. [PMID: 31596891 PMCID: PMC6785125 DOI: 10.1371/journal.pone.0223624] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/11/2019] [Accepted: 09/24/2019] [Indexed: 11/19/2022]
Abstract
Calibration of the molecular rate is one of the major challenges in marine population genetics. Although the use of an appropriate evolutionary rate is crucial in exploring population histories, calibration of the rate is always difficult because fossil records and geological events are rarely applicable for rate calibration. The acceleration of the evolutionary rate for recent coalescent events (or more simply, the time dependency of the molecular clock) is also a problem that can lead to overestimation of population parameters. Calibration of demographic transition (CDT) is a rate calibration technique that assumes a post-glacial demographic expansion, representing one of the most promising approaches for dealing with these potential problems in the rate calibration. Here, we demonstrate the importance of using an appropriate evolutionary rate, and the power of CDT, by using populations of the sandy beach amphipod Haustorioides japonicus along the Japanese coast of the northwestern Pacific Ocean. Analysis of mitochondrial sequences found that the most peripheral population in the Pacific coast of northeastern Honshu Island (Tohoku region) is genetically distinct from the other northwestern Pacific populations. By using the two-epoch demographic model and rate of temperature change, the evolutionary rate was modeled as a log-normal distribution with a median rate of 2.2%/My. The split-time of the Tohoku population was subsequently estimated to be during the previous interglacial period by using the rate distribution, which enables us to infer potential causes of the divergence between local populations along the continuous Pacific coast of Japan.
Affiliation(s)
- Kay Sakuma
  - Japan Sea National Fisheries Research Institute, Fisheries Research and Education Agency, Niigata, Japan
- Risa Ishida
  - Japan Sea National Fisheries Research Institute, Fisheries Research and Education Agency, Niigata, Japan
- Taketoshi Kodama
  - Japan Sea National Fisheries Research Institute, Fisheries Research and Education Agency, Niigata, Japan
- Yoshitake Takada
  - Japan Sea National Fisheries Research Institute, Fisheries Research and Education Agency, Niigata, Japan
8
El'chinova GI, Ivanov AV, El'kanova LA, Revazova YA, Zinchenko RA. [Acceptability of using Karachay surnames as a quasigenetic marker in population and genetic studies]. Genetika 2014; 50:874-877. [PMID: 25720146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Based on a comparison of the data on the frequencies of 1206 surnames registered in the Malokarachayevsky District of Karachay-Cherkessia with a number of other parameters and historical data, it was concluded that Karachay surnames are acceptable for use as a quasigenetic marker in a study of a population-genetic description of the area.
9
Al-Meeri A, Non AL, Lajoie TW, Mulligan CJ. Effect of different sampling strategies for a single geographic region in Yemen on standard genetic analyses of mitochondrial DNA sequence data. Mitochondrial DNA 2011; 22:66-70. [PMID: 21864032 DOI: 10.3109/19401736.2011.606462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
Collection of biological samples is the foundation of genetic studies ranging from estimation of genetic diversity to reconstruction of population history. Sample collections are intended to accurately represent the genetic, biological, ecological, cultural, geographic, and/or linguistic diversity of a particular region or population by providing a small, but representative, set of samples. In this study, we analyze human mitochondrial DNA variation in samples collected using four different sampling strategies to represent the same geographic region. Specifically, samples were collected from a village, a rural area, a regional clinic, and a national university in the governorate of Dhamar in Yemen. All samples were assayed for mitochondrial hypervariable region I DNA sequence variation and data were subjected to standard molecular genetic analyses. Our results suggest that analyses in which individual DNA sequences are explicitly compared or evaluated, e.g. phylogenetic and network analyses, may be more sensitive to sample collection design than analyses in which data are averaged across individuals or are analyzed more indirectly, e.g. summary statistics.
Affiliation(s)
- A Al-Meeri
  - Department of Biochemistry and Molecular Biology, Sana'a University, Yemen
10
Abstract
This article presents findings from our ethnographic research on biomedical scientists' studies of human genetic variation and common complex disease. We examine the socio-material work involved in genome-wide association studies (GWAS) and discuss whether, how, and when notions of race and ethnicity are or are not used. We analyze how researchers produce simultaneously different kinds of populations and population differences. Although many geneticists use race in their analyses, we find some who have invented a statistical genetics method and associated software that they use specifically to avoid using categories of race in their genetic analysis. Their method allows them to operationalize their concept of 'genetic ancestry' without resorting to notions of race and ethnicity. We focus on the construction and implementation of the software's algorithms, and discuss the consequences and implications of the software technology for debates and policies around the use of race in genetics research. We also demonstrate that the production and use of their method involves a dynamic and fluid assemblage of actors in various disciplines responding to disciplinary and sociopolitical contexts and concerns. This assemblage also includes particular discourses on human history and geography as they become entangled with research on genetic markers and disease. We introduce the concept of 'genome geography' to analyze how some researchers studying human genetic variation 'locate' stretches of DNA in different places and times. The concept of genetic ancestry and the practice of genome geography rely on old discourses, but they also incorporate new technologies, infrastructures, and political and scientific commitments. Some of these new technologies provide opportunities to change some of our institutional and cultural forms and frames around notions of difference and similarity.
Nevertheless, we also highlight the slipperiness of genome geography and the tenacity of race and race concepts.
Affiliation(s)
- Joan H Fujimura
  - Department of Sociology, University of Wisconsin-Madison, 8128 Social Science Building, 1180 Observatory Drive, Madison, WI 53706, USA.
11
Yu KD, Di GH, Fan L, Shao ZM. Test of Hardy-Weinberg equilibrium in breast cancer case-control studies: an issue may influence the conclusions. Breast Cancer Res Treat 2009; 117:675-7. [PMID: 19242790 DOI: 10.1007/s10549-009-0353-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 02/15/2009] [Accepted: 02/17/2009] [Indexed: 11/28/2022]
12
Affiliation(s)
- Lynette Reid
  - Department of Bioethics, Dalhousie University, Halifax, Nova Scotia, B3H 2N7, Canada.
13
Abstract
Quality control (QC) is a critical step in large-scale studies of genetic variation. While, on average, high-throughput single nucleotide polymorphism (SNP) genotyping assays are now very accurate, the errors that remain tend to cluster into a small percentage of “problem” SNPs, which exhibit unusually high error rates. Because most large-scale studies of genetic variation are searching for phenomena that are rare (e.g., SNPs associated with a phenotype), even this small percentage of problem SNPs can cause important practical problems. Here we describe and illustrate how patterns of linkage disequilibrium (LD) can be used to improve QC in large-scale, population-based studies. This approach has the advantage over existing filters (e.g., HWE or call rate) that it can actually reduce genotyping error rates by automatically correcting some genotyping errors. Applying this LD-based QC procedure to data from The International HapMap Project, we identify over 1,500 SNPs that likely have high error rates in the CHB and JPT samples and estimate corrected genotypes. Our method is implemented in the software package fastPHASE, available from the Stephens Lab website (http://stephenslab.uchicago.edu/software.html).
In large-scale studies of population genetic data, particularly genome-wide association studies, considerable effort may be spent on quality control (QC) to ensure genotype data are accurate. Typically, QC steps are applied independently to individual marker loci, with data from suspicious loci being excluded from subsequent analyses. Here we present a new QC tool, which exploits the fact that correlation of alleles among nearby genetic loci (linkage disequilibrium; LD) provides a certain amount of redundancy in genotype information, and that high rates of genotyping error at a marker may leave their trace in unusual patterns of LD. The method (a) aids in the detection of SNP loci with possibly elevated levels of genotyping error, and (b) in some cases allows for the correction of erroneous genotype calls, thereby salvaging some of the genotype data from the QC filtering process. We confirm on data from real populations that SNPs identified by this approach do show evidence for containing actual genotyping errors, and we also examine genotype intensity plots to confirm that many individual genotypes corrected by the method do appear to be called in error. More generally, these results demonstrate the potential utility of incorporating LD information into algorithms for processing and analyzing population genotype data.
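For orientation, the conventional per-SNP filters that the abstract contrasts with (call rate and HWE) can be sketched as follows. This is a minimal illustration, not the LD-based fastPHASE procedure: the function names, the 0/1/2 genotype coding with `None` for missing calls, and the thresholds (`min_call_rate=0.95`, `min_hwe_p=1e-6`) are all assumptions chosen for the example.

```python
from math import erfc, sqrt


def call_rate(genotypes):
    """Fraction of non-missing calls (assumed coding: 0/1/2, None = missing)."""
    if not genotypes:
        return 0.0
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def hwe_chi2_pvalue(genotypes):
    """P-value of a 1-df chi-square test against Hardy-Weinberg proportions."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the allele counted as 0
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: the test is uninformative
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return erfc(sqrt(chi2 / 2))


def passes_qc(genotypes, min_call_rate=0.95, min_hwe_p=1e-6):
    """Apply both filters independently to one SNP (thresholds are illustrative)."""
    return (call_rate(genotypes) >= min_call_rate
            and hwe_chi2_pvalue(genotypes) >= min_hwe_p)
```

Filters of this kind can only discard a suspicious SNP. The approach described in the abstract differs in that it uses neighbouring loci to flag and, in some cases, correct individual genotype calls.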
Affiliation(s)
- Paul Scheet
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America.
|
14
|
Abstract
PURPOSE OF REVIEW: Genetic association studies that survey the entire genome have become a common design for uncovering the genetic basis of common diseases, including lipid-related traits. Such studies have identified several novel loci that influence blood lipids. The present review highlights the statistical challenges associated with such large-scale genetic studies and discusses the available methodological strategies for handling these issues.
RECENT FINDINGS: The successful analysis of genome-wide data assayed on commercial genotyping arrays depends on careful exploration of the data. Unaccounted sample failures, genotyping errors and population structure can introduce misleading signals that mimic genuine association. Careful interpretation of useful summary statistics and graphical data displays can minimize the extent of false associations that need to be followed up in replication or fine-mapping experiments.
SUMMARY: Recently published genome-wide studies are beginning to yield valuable insights into the importance of well-designed methodological and statistical techniques for sensible interpretation of the plethora of genetic data generated.
Affiliation(s)
- Yik Y Teo
- Wellcome Trust Centre for Human Genetics, University of Oxford, UK.
|
15
|
Abstract
Published DNA data sets constitute a body of sequencing results resting in silico that are supposed to reflect the variation of (once) living cells. In cases where the DNA variation reported is suspected to be fraught with artefacts, an autopsy of the full body of data is needed to clarify the amount and causes of mis-sequencing. In this paper we elaborate on strategies that allow a clear-cut identification of the problems in severely flawed mtDNA data. This approach is applied, by way of example, to a data set of HVS-I sequences from the Caucasus, published by Nasidze & Stoneking in 2001. These data bear numerous ambiguous nucleotide positions and suffer from an even higher number of phantom mutations, indicating that severe biochemical problems adversely influenced those sequencing results at the time. Furthermore, systematic omission of sequences with a long C-stretch (incurred by a transition at position 16189) must have severely biased the data set. Since no complete correction of these data has appeared to date, this example of mis-sequencing necessitates circumstantial evidence that is bullet-proof.
Affiliation(s)
- H-J Bandelt
- Department of Mathematics, University of Hamburg, 20146 Hamburg, Germany.
|
16
|
Affiliation(s)
- M Stoneking
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.
|
17
|
Genetics of psychiatric disorders. Nat Neurosci 2005; 8:693-693. [PMID: 15917827 DOI: 10.1038/nn0605-693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
18
|
Chen ZW, Ouyang ZH, Dong G, Li RS. [Analyzing genetic quality of BALB/c mouse strains in China by microsatellite marking]. Yi Chuan 2004; 26:845-8. [PMID: 15762004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Eleven BALB/c mouse strains from Beijing, Shanghai, Shenyang, Haerbin, Guangzhou, Chongqing and Changchun were monitored in order to assure the genetic quality of inbred BALB/c mouse strains in China and to assess the reliability of microsatellite markers. Fourteen microsatellite loci on different chromosomes were investigated by PCR analysis. All of these microsatellite loci displayed a single allelic band in the strains from Beijing, Shanghai and Haerbin. In contrast, mice from Shenyang, Guangzhou, Chongqing and Changchun showed polymorphisms or heterozygosity: the Shenyang and Changchun strains showed polymorphisms and heterozygosity at two separate loci, and one of the Guangzhou strains showed polymorphisms or heterozygosity at four loci. The Chongqing strains showed polymorphisms and heterozygosity at seven loci, including the D10Mit180 locus, as compared with the Shanghai strains.
Affiliation(s)
- Zhen-Wen Chen
- Department of Laboratory Animal Science, Capital University of Medical Sciences, Beijing, China.
|