1
|
Ko S, Sobel EM, Zhou H, Lange K. Estimation of genetic admixture proportions via haplotypes. Comput Struct Biotechnol J 2024; 23:4384-4395. [PMID: 39737076 PMCID: PMC11683265 DOI: 10.1016/j.csbj.2024.11.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 11/26/2024] [Accepted: 11/26/2024] [Indexed: 01/01/2025] Open
Abstract
Estimation of ancestral admixture is essential for creating personal genealogies, studying human history, and conducting genome-wide association studies (GWAS). The following three primary methods exist for estimating admixture coefficients. The frequentist approach directly maximizes the binomial loglikelihood. The Bayesian approach adds a reasonable prior and samples the posterior distribution. Finally, the nonparametric approach decomposes the genotype matrix algebraically. Each approach scales successfully to datasets with a million individuals and a million single nucleotide polymorphisms (SNPs). Despite their variety, all current approaches assume independence between SNPs. To achieve independence requires performing LD (linkage disequilibrium) filtering before analysis. Unfortunately, this tactic loses valuable information and usually retains many SNPs still in LD. The present paper explores the option of explicitly incorporating haplotypes in ancestry estimation. Our program, HaploADMIXTURE, operates on adjacent SNP pairs and jointly estimates their haplotype frequencies along with admixture coefficients. This more complex strategy takes advantage of the rich information available in haplotypes and ultimately yields better admixture estimates and better clustering of real populations in curated datasets.
Affiliation(s)
- Seyoon Ko
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Eric M. Sobel
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Hua Zhou
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Kenneth Lange
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| |
|
2
|
Ko S, Chu BB, Peterson D, Okenwa C, Papp JC, Alexander DH, Sobel EM, Zhou H, Lange KL. Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets. Am J Hum Genet 2023; 110:314-325. [PMID: 36610401 PMCID: PMC9943729 DOI: 10.1016/j.ajhg.2022.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 12/12/2022] [Indexed: 01/09/2023] Open
Abstract
Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken and the egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.
Affiliation(s)
- Seyoon Ko
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA,Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Benjamin B. Chu
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA,Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Daniel Peterson
- Department of Mathematics, Brigham Young University, Provo, UT 84602, USA
| | - Chidera Okenwa
- Department of Mathematics, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Jeanette C. Papp
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | | | - Eric M. Sobel
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA,Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA,Corresponding author
| | - Hua Zhou
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA,Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Kenneth L. Lange
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA,Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA,Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| |
|
3
|
Khan SU, Saeed S, Khan MHU, Fan C, Ahmar S, Arriagada O, Shahzad R, Branca F, Mora-Poblete F. Advances and Challenges for QTL Analysis and GWAS in the Plant-Breeding of High-Yielding: A Focus on Rapeseed. Biomolecules 2021; 11:1516. [PMID: 34680149 PMCID: PMC8533950 DOI: 10.3390/biom11101516] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/22/2021] [Revised: 10/07/2021] [Accepted: 10/11/2021] [Indexed: 12/15/2022] Open
Abstract
Yield is one of the most important agronomic traits for the breeding of rapeseed (Brassica napus L), but its genetic dissection for the formation of high yield remains enigmatic, given the rapid population growth. In the present review, we review the discovery of major loci underlying important agronomic traits and the recent advancement in the selection of complex traits. Further, we discuss the benchmark summary of high-throughput techniques for the high-resolution genetic breeding of rapeseed. Biparental linkage analysis and association mapping have become powerful strategies to comprehend the genetic architecture of complex agronomic traits in crops. The generation of improved crop varieties, especially rapeseed, is greatly urged to enhance yield productivity. In this sense, the whole-genome sequencing of rapeseed has become achievable to clone and identify quantitative trait loci (QTLs). Moreover, the generation of high-throughput sequencing and genotyping techniques has significantly enhanced the precision of QTL mapping and genome-wide association study (GWAS) methodologies. Furthermore, this study demonstrates the first attempt to identify novel QTLs of yield-related traits, specifically focusing on ovule number per pod (ON). We also highlight the recent breakthrough concerning single-locus-GWAS (SL-GWAS) and multi-locus GWAS (ML-GWAS), which aim to enhance the potential and robust control of GWAS for improved complex traits.
Affiliation(s)
- Shahid Ullah Khan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; (S.U.K.); (S.S.); (M.H.U.K.)
| | - Sumbul Saeed
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; (S.U.K.); (S.S.); (M.H.U.K.)
| | - Muhammad Hafeez Ullah Khan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; (S.U.K.); (S.S.); (M.H.U.K.)
| | - Chuchuan Fan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; (S.U.K.); (S.S.); (M.H.U.K.)
| | - Sunny Ahmar
- Institute of Biological Sciences, University of Talca, 1 Poniente 1141, Talca 3465548, Chile;
| | - Osvin Arriagada
- Departamento de Ciencias Vegetales, Facultad de Agronomía e Ingeniería Forestal, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile;
| | - Raheel Shahzad
- Department of Biotechnology, Faculty of Science & Technology, Universitas Muhammadiyah Bandung, Bandung 40614, Indonesia;
| | - Ferdinando Branca
- Department of Agriculture, Food and Environment (Di3A), University of Catania, 95123 Catania, Italy;
| | - Freddy Mora-Poblete
- Institute of Biological Sciences, University of Talca, 1 Poniente 1141, Talca 3465548, Chile;
| |
|
4
|
Miles AM, Huson HJ. Graduate Student Literature Review: Understanding the genetic mechanisms underlying mastitis. J Dairy Sci 2020; 104:1183-1191. [PMID: 33162090 DOI: 10.3168/jds.2020-18297] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/31/2020] [Accepted: 08/16/2020] [Indexed: 01/24/2023]
Abstract
Mastitis is the costliest disease facing dairy producers today; consequently, it has been the subject of substantial research focus. Efforts have evolved from an initial focus on understanding the etiology of intramammary infections to the application of preventative measures, including attempts to breed cows that are resistant to infection. However, breeding for resistance to infection has proven difficult, given the complexity of the disease and the high expense associated with assembling high-quality genotypes and phenotypes. This review provides a brief background on mastitis; illustrates current understanding of the genetics influencing mastitis and the application of this knowledge; and discusses challenges and limitations in understanding these mechanisms and applying these findings to genetic improvement strategies.
Affiliation(s)
- Asha M Miles
- Department of Animal Science, Cornell University, Ithaca, NY 14853.
| | - Heather J Huson
- Department of Animal Science, Cornell University, Ithaca, NY 14853.
| |
|
5
|
Ray R, Li D, Halitschke R, Baldwin IT. Using natural variation to achieve a whole-plant functional understanding of the responses mediated by jasmonate signaling. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2019; 99:414-425. [PMID: 30927293 DOI: 10.1111/tpj.14331] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/24/2019] [Revised: 02/25/2019] [Accepted: 02/27/2019] [Indexed: 06/09/2023]
Abstract
The dramatic advances in our understanding of the molecular biology and biochemistry of jasmonate (JA) signaling have been the subject of several excellent recent reviews that have highlighted the phytohormonal function of this signaling pathway. Here, we focus on the responses mediated by JA signaling which have consequences for a plant's Darwinian fitness, i.e. the organism-level function of JA signaling. The most diverse module in the signaling cascade, the JAZ proteins, and their interactions with other proteins and transcription factors, allow this canonical signaling cascade to mediate a bewildering array of traits in different tissues at different times; the functional coherence of these diverse responses are best appreciated in an organismal/ecological context. From published work, it appears that jasmonates can function as the 'Swiss Army knife' of plant signaling, mediating many different biotic and abiotic stress and developmental responses that allow plants to contextualize their responses to their frequently changing local environments and optimize their fitness. We propose that a deeper analysis of the natural variation in both within-plant and within-population JA signaling components is a profitable means of attaining a coherent whole-plant functional perspective of this signaling cascade, and provide examples of this approach from the Nicotiana attenuata system.
Affiliation(s)
- Rishav Ray
- Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, D-07745, Jena, Germany
| | - Dapeng Li
- Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, D-07745, Jena, Germany
| | - Rayko Halitschke
- Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, D-07745, Jena, Germany
| | - Ian T Baldwin
- Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, D-07745, Jena, Germany
| |
|
6
|
Polygenic and environmental influences on the course of African Americans' alcohol use from early adolescence through young adulthood. Dev Psychopathol 2019; 32:703-718. [PMID: 31256767 DOI: 10.1017/s0954579419000701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
The study examined (a) whether alcohol use subgroups could be identified among African Americans assessed from adolescence through early adulthood, and (b) whether subgroup membership was associated with the interaction between internalizing symptoms and antisocial behavior polygenic risk scores (PRSs) and environmental characteristics (i.e., parental monitoring, community disadvantage). Participants (N = 436) were initially recruited for an elementary school-based prevention trial in a Mid-Atlantic city. Youths reported on the frequency of their past year alcohol use from ages 14-26. DNA was obtained from participants at age 21. Internalizing symptoms and antisocial behavior PRSs were created based on a genome-wide association study (GWAS) conducted by Benke et al. (2014) and Tielbeek et al. (2017), respectively. Parental monitoring and community disadvantage were assessed at age 12. Four classes of past year alcohol use were identified: (a) early-onset, increasing; (b) late-onset, moderate use; (c) low steady; and (d) early-onset, decreasing. In high community disadvantaged settings, participants with a higher internalizing symptoms PRS were more likely to be in the early-onset, decreasing class than the low steady class. When exposed to elevated community disadvantage, participants with a higher antisocial behavior PRS were more likely to be in the early-onset, increasing class than the early-onset, decreasing and late-onset, moderate use classes.
|
7
|
Rabinowitz JA, Kuo SIC, Felder W, Musci RJ, Bettencourt A, Benke K, Sisto DY, Smail E, Uhl G, Maher BS, Kouzis A, Ialongo NS. Associations between an educational attainment polygenic score with educational attainment in an African American sample. GENES, BRAIN, AND BEHAVIOR 2019; 18:e12558. [PMID: 30793481 PMCID: PMC7008934 DOI: 10.1111/gbb.12558] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/11/2018] [Revised: 01/25/2019] [Accepted: 02/07/2019] [Indexed: 12/13/2022]
Abstract
Polygenic propensity for educational attainment has been associated with higher education attendance, academic achievement and criminal offending in predominantly European samples; however, less is known about whether this polygenic propensity is associated with these outcomes among African Americans. Using an educational attainment polygenic score (EA PGS), the present study examined whether this score was associated with post-secondary education, academic achievement and criminal offending in an urban, African American sample. Three cohorts of participants (N = 1050; 43.9% male) were initially recruited for an elementary school-based universal prevention trial in a Mid-Atlantic city and followed into young adulthood. Standardized tests of reading and math achievement were administered in first grade. At age 20, participants reported on their level of education attained, and records of incarceration were obtained from Maryland's Criminal Justice Information System. In young adulthood, DNA was collected and extracted from blood or buccal swabs and genotyped. An EA PGS was created using results from a large-scale genome-wide association study on educational attainment. A higher EA PGS was associated with a greater log odds of post-secondary education. The EA PGS was not associated with reading achievement, although a significant relationship was found with math achievement in the third cohort. These findings contribute to the dearth of molecular genetics work conducted in African American samples and highlight that polygenic propensity for educational attainment is associated with higher education attendance.
Affiliation(s)
- Jill A. Rabinowitz
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| | - Sally I.-C. Kuo
- Department of Psychology, Virginia Commonwealth University College of Humanities and Sciences, Richmond, Virginia
| | - William Felder
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| | - Rashelle J. Musci
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| | - Amie Bettencourt
- Department of Medicine, Division of Child and Adolescent Psychiatry, Johns Hopkins University, Baltimore, Maryland
| | - Kelly Benke
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| | - Danielle Y. Sisto
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| | - Emily Smail
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| | - George Uhl
- Office of Research & Development, New Mexico VA Health Care System, Albuquerque, New Mexico
| | - Brion S. Maher
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| | - Anthony Kouzis
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| | - Nicholas S. Ialongo
- Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
| |
|
8
|
Rabinowitz JA, Musci RJ, Milam AJ, Benke K, Uhl GR, Sisto DY, Ialongo NS, Maher BS. The interplay between externalizing disorders polygenic risk scores and contextual factors on the development of marijuana use disorders. Drug Alcohol Depend 2018; 191:365-373. [PMID: 30195949 PMCID: PMC8005265 DOI: 10.1016/j.drugalcdep.2018.07.016] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/03/2018] [Revised: 07/13/2018] [Accepted: 07/18/2018] [Indexed: 12/28/2022]
Abstract
Externalizing disorders have been extensively linked to substance use problems. However, less is known about whether genetic factors underpinning externalizing disorders and environmental features interact to predict substance use disorders (i.e., marijuana abuse and dependence) among urban African Americans. We examined whether polygenic risk scores (PRS) for conduct disorder (CD) and attention-deficit hyperactivity disorder (ADHD) interacted with contextual factors (i.e., parental monitoring, community disadvantage) to influence risk for marijuana use disorders in a sample of African American youth. Participants (N=1,050; 44.2% male) were initially recruited for an elementary school-based universal prevention trial in a Mid-Atlantic city and followed through age 20. Participants reported on their parental monitoring in sixth grade and whether they were diagnosed with marijuana abuse or dependence at age 20. Blood or saliva samples were genotyped using the Affymetrix 6.0 microarrays. The CD and ADHD PRS were created based on genome-wide association studies conducted by Dick et al. (2010) and Demontis et al. (2017), respectively. Community disadvantage was calculated based on census data when participants were in sixth grade. There was an interaction between the CD PRS and community disadvantage such that a higher CD PRS was associated with greater risk for a marijuana use disorder at higher levels of neighborhood disadvantage. This finding should be interpreted with caution owing to the number of significance tests performed. Implications for etiological models and future research directions are presented.
Affiliation(s)
- Jill A Rabinowitz
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 624 N. Broadway, Baltimore, MD 21205, United States.
| | - Rashelle J Musci
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 624 N. Broadway, Baltimore, MD 21205, United States
| | - Adam J Milam
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 624 N. Broadway, Baltimore, MD 21205, United States
| | - Kelly Benke
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 624 N. Broadway, Baltimore, MD 21205, United States
| | - George R Uhl
- New Mexico VA Healthcare System, 1501 San Pedro Drive, SE, Albuquerque, NM, 87108 United States
| | - Danielle Y Sisto
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 624 N. Broadway, Baltimore, MD 21205, United States
| | - Nicholas S Ialongo
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 624 N. Broadway, Baltimore, MD 21205, United States
| | - Brion S Maher
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 624 N. Broadway, Baltimore, MD 21205, United States
| |
|
9
|
Eanes WF, Koehn RK. AN ANALYSIS OF GENETIC STRUCTURE IN THE MONARCH BUTTERFLY, DANAUS PLEXIPPUS L. Evolution 2017; 32:784-797. [DOI: 10.1111/j.1558-5646.1978.tb04633.x] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 07/15/1977] [Revised: 02/06/1978] [Indexed: 12/01/2022]
Affiliation(s)
- Walter F. Eanes
- Department of Ecology and Evolution State University of New York Stony Brook New York 11794
| | - Richard K. Koehn
- Department of Ecology and Evolution State University of New York Stony Brook New York 11794
| |
|
10
|
Hilbish TJ, Koehn RK. EXCLUSION OF THE ROLE OF SECONDARY CONTACT IN AN ALLELE FREQUENCY CLINE IN THE MUSSEL MYTILUS EDULIS. Evolution 2017; 39:432-443. [PMID: 28564224 DOI: 10.1111/j.1558-5646.1985.tb05679.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/13/1984] [Accepted: 09/14/1984] [Indexed: 11/26/2022]
Abstract
We examined the hypothesis that secondary contact generates an allele-frequency cline at the aminopeptidase-I locus (Lap) in the marine mussel, Mytilus edulis. It has been proposed that variation at the Lap locus is neutral and that the cline results from secondary contact between differentiated oceanic and estuarine populations (Levinton, 1980). We tested this hypothesis by comparing the genotypic distributions in samples from the cline to distributions that incorporate mixing effects. We employed a statistical model that determines the degree of contact using a maximum likelihood estimator and then incorporates the mixing estimates into an expected distribution of genotypes. Wahlund effects resulting from possible admixture are thereby incorporated into the expected distribution. Failure of the model to reconcile the observed with the expected distribution of genotypes indicates that the observed population structure does not result from admixture. The null hypothesis of mixing was unable to explain about 33% of the samples. Combined tests demonstrated the general departure from the mixing model to be highly significant. The distribution of heterozygote discrepancies across the cline was inconsistent with the expectations of a mixing model. Therefore we reject explanations for the structure of the Lap cline that involve secondary contact. Selection directed at the Lap locus appears necessary to explain the genotypic structure of clinal populations.
Affiliation(s)
- Thomas J Hilbish
- Department of Ecology and Evolution, State University of New York at Stony Brook, Stony Brook, NY, 11794
| | - Richard K Koehn
- Department of Ecology and Evolution, State University of New York at Stony Brook, Stony Brook, NY, 11794
| |
|
11
|
Hedrick PW, Parker KM. MHC VARIATION IN THE ENDANGERED GILA TOPMINNOW. Evolution 2017; 52:194-199. [PMID: 28568136 DOI: 10.1111/j.1558-5646.1998.tb05152.x] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 06/30/1997] [Accepted: 11/13/1997] [Indexed: 11/30/2022]
Abstract
Sequence variation at a major histocompatibility complex (MHC) gene, assumed to be involved in parasite and pathogen resistance, was examined in the endangered Gila topminnow (Poeciliopsis o. occidentalis), from the four watersheds where they remain in the United States. This is the first estimate of variation in MHC genes in natural populations of an endangered species. The population that has experienced the most bottlenecks historically was monomorphic for MHC variation. Another population, which earlier had been found to be the only population polymorphic for allozymes, had five MHC alleles, four different from those found in the other populations. Overall, nine different alleles were found. The four populations were highly divergent at MHC with four of the six population pairs not sharing any alleles. However, the magnitude of differentiation between populations on the amino-acid level varied fivefold for the populations that shared no alleles. Using single-stranded conformational polymorphism (SSCP), these alleles segregated consistently with Mendelian expectations in families. Because of the high genetic differentiation between these populations for a potentially adaptive gene, we recommend that the four watersheds be examined further for separate conservation and management.
Affiliation(s)
- Philip W Hedrick
- Department of Biology, Arizona State University, Tempe, Arizona, 85287-1501
| | - Karen M Parker
- Department of Biology, Arizona State University, Tempe, Arizona, 85287-1501
| |
|
12
|
Koehn RK, Milkman R, Mitton JB. POPULATION GENETICS OF MARINE PELECYPODS. IV. SELECTION, MIGRATION AND GENETIC DIFFERENTIATION IN THE BLUE MUSSEL MYTILUS EDULIS. Evolution 2017; 30:2-32. [DOI: 10.1111/j.1558-5646.1976.tb00878.x] [Citation(s) in RCA: 141] [Impact Index Per Article: 17.6] [Received: 01/13/1975] [Indexed: 11/28/2022]
|
13
|
Hejase HA, Liu KJ. Mapping the genomic architecture of adaptive traits with interspecific introgressive origin: a coalescent-based approach. BMC Genomics 2016; 17 Suppl 1:8. [PMID: 26819241 PMCID: PMC4895787 DOI: 10.1186/s12864-015-2298-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
Recent studies of eukaryotes including human and Neandertal, mice, and butterflies have highlighted the major role that interspecific introgression has played in adaptive trait evolution. A common question arises in each case: what is the genomic architecture of the introgressed traits? One common approach that can be used to address this question is association mapping, which looks for genotypic markers that have significant statistical association with a trait. It is well understood that sample relatedness can be a confounding factor in association mapping studies if not properly accounted for. Introgression and other evolutionary processes (e.g., incomplete lineage sorting) typically introduce variation among local genealogies, which can also differ from global sample structure measured across all genomic loci. In contrast, state-of-the-art association mapping methods assume fixed sample relatedness across the genome, which can lead to spurious inference. We therefore propose a new association mapping method called Coal-Map, which uses coalescent-based models to capture local genealogical variation alongside global sample structure. Using simulated and empirical data reflecting a range of evolutionary scenarios, we compare the performance of Coal-Map against EIGENSTRAT, a leading association mapping method in terms of its popularity, power, and type I error control. Our empirical data makes use of hundreds of mouse genomes for which adaptive interspecific introgression has recently been described. We found that Coal-Map's performance is comparable or better than EIGENSTRAT in terms of statistical power and false positive rate. Coal-Map's performance advantage was greatest on model conditions that most closely resembled empirically observed scenarios of adaptive introgression. 
These conditions had: (1) causal SNPs contained in one or a few introgressed genomic loci and (2) varying rates of gene flow - from high rates to very low rates where incomplete lineage sorting dominated as a primary cause of local genealogical variation.
Affiliation(s)
- Hussein A Hejase
- Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, East Lansing, 48824, MI, USA.
| | - Kevin J Liu
- Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, East Lansing, 48824, MI, USA.
| |
|
14
|
Lacour A, Schüller V, Drichel D, Herold C, Jessen F, Leber M, Maier W, Noethen MM, Ramirez A, Vaitsiakhovich T, Becker T. Novel genetic matching methods for handling population stratification in genome-wide association studies. BMC Bioinformatics 2015; 16:84. [PMID: 25880419 PMCID: PMC4367953 DOI: 10.1186/s12859-015-0521-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/11/2014] [Accepted: 02/27/2015] [Indexed: 11/21/2022] Open
Abstract
Background: A usually confronted problem in association studies is the occurrence of population stratification. In this work, we propose a novel framework to consider population matchings in the contexts of genome-wide and sequencing association studies. We employ pairwise and groupwise optimal case-control matchings and present an agglomerative hierarchical clustering, both based on a genetic similarity score matrix. In order to ensure that the resulting matches obtained from the matching algorithm capture correctly the population structure, we propose and discuss two stratum validation methods. We also invent a decisive extension to the Cochran-Armitage Trend test to explicitly take into account the particular population structure.
Results: We assess our framework by simulations of genotype data under the null hypothesis, to affirm that it correctly controls for the type-1 error rate. By a power study we evaluate that structured association testing using our framework displays reasonable power. We compare our result with those obtained from a logistic regression model with principal component covariates. Using the principal components approaches we also find a possible false-positive association to Alzheimer’s disease, which is neither supported by our new methods, nor by the results of a most recent large meta analysis or by a mixed model approach.
Conclusions: Matching methods provide an alternative handling of confounding due to population stratification for statistical tests for which covariates are hard to model. As a benchmark, we show that our matching framework performs equally well to state of the art models on common variants.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0521-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- André Lacour
- German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Vitalia Schüller
- German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Dmitriy Drichel
- German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Christine Herold
- German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Frank Jessen
- German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Abteilung für Psychiatrie und Psychotherapie, Universitätsklinikum Bonn, Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Markus Leber
- Institut für Medizinische Biometrie, Informatik und Epidemiologie, Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Wolfgang Maier
- Abteilung für Psychiatrie und Psychotherapie, Universitätsklinikum Bonn, Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Markus M Noethen
- Institut für Humangenetik and Life & Brain Center, Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Alfredo Ramirez
- Abteilung für Psychiatrie und Psychotherapie, Universitätsklinikum Bonn, Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Tatsiana Vaitsiakhovich
- Institut für Medizinische Biometrie, Informatik und Epidemiologie, Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Tim Becker
- German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
- Institut für Medizinische Biometrie, Informatik und Epidemiologie, Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53127, Germany.
|
15
|
Kadri NK, Guldbrandtsen B, Sørensen P, Sahana G. Comparison of genome-wide association methods in analyses of admixed populations with complex familial relationships. PLoS One 2014; 9:e88926. [PMID: 24662750 PMCID: PMC3963841 DOI: 10.1371/journal.pone.0088926] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 07/30/2013] [Accepted: 01/14/2014] [Indexed: 11/28/2022] Open
Abstract
Population structure is known to cause false-positive detection in association studies. We compared the power, precision, and type-I error rates of various association models in analyses of a simulated dataset with structure at the population (admixture from two populations; P) and family (K) levels. We also compared type-I error rates among models in analyses of publicly available human and dog datasets. The models corrected for none, one, or both structure levels. Correction for K was performed with linear mixed models incorporating familial relationships estimated from pedigrees or genetic markers. Linear models that ignored K were also tested. Correction for P was performed using principal component or structured association analysis. In analyses of simulated and real data, linear mixed models that corrected for K were able to control for type-I error, regardless of whether they also corrected for P. In contrast, correction for P alone in linear models was insufficient. The power and precision of linear mixed models with and without correction for P were similar. Furthermore, power, precision, and type-I error rate were comparable in linear mixed models incorporating pedigree and genomic relationships. In summary, in association studies using samples with both P and K, ancestries estimated using principal components or structured assignment were not sufficient to correct type-I errors. In such cases type-I errors may be controlled by use of linear mixed models with relationships derived from either pedigree or from genetic markers.
Affiliation(s)
- Naveen K. Kadri
- Centre for Quantitative genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Bernt Guldbrandtsen
- Centre for Quantitative genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Peter Sørensen
- Centre for Quantitative genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Goutam Sahana
- Centre for Quantitative genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
|
16
|
Zawistowski M, Reppell M, Wegmann D, St Jean PL, Ehm MG, Nelson MR, Novembre J, Zöllner S. Analysis of rare variant population structure in Europeans explains differential stratification of gene-based tests. Eur J Hum Genet 2014; 22:1137-44. [PMID: 24398795 DOI: 10.1038/ejhg.2013.297] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/30/2013] [Revised: 11/27/2013] [Accepted: 11/28/2013] [Indexed: 11/09/2022] Open
Abstract
There is substantial interest in the role of rare genetic variants in the etiology of complex human diseases. Several gene-based tests have been developed to simultaneously analyze multiple rare variants for association with phenotypic traits. The tests can largely be partitioned into two classes - 'burden' tests and 'joint' tests - based on how they accumulate evidence of association across sites. We used the empirical joint site frequency spectra of rare, nonsynonymous variation from a large multi-population sequencing study to explore the effect of realistic rare variant population structure on gene-based tests. We observed an important difference between the two test classes: their susceptibility to population stratification. Focusing on European samples, we found that joint tests, which allow variants to have opposite directions of effect, consistently showed higher levels of P-value inflation than burden tests. We determined that the differential stratification was caused by two specific patterns in the interpopulation distribution of rare variants, each correlating with inflation in one of the test classes. The pattern that inflates joint tests is more prevalent in real data, explaining the higher levels of inflation in these tests. Furthermore, we show that the different sources of inflation between tests lead to heterogeneous responses to genomic control correction and the number of variants analyzed. Our results indicate that care must be taken when interpreting joint and burden analyses of the same set of rare variants, in particular, to avoid mistaking inflated P-values in joint tests for stronger signals of true associations.
Affiliation(s)
- Matthew Zawistowski
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Mark Reppell
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Daniel Wegmann
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Pamela L St Jean
- Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, NC, USA
- Margaret G Ehm
- Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, NC, USA
- Matthew R Nelson
- Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, NC, USA
- John Novembre
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Sebastian Zöllner
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
|
17
|
Alexander M, Karmaus W, Holloway JW, Zhang H, Roberts G, Kurukulaaratchy RJ, Arshad SH, Ewart S. Effect of GSTM2-5 polymorphisms in relation to tobacco smoke exposures on lung function growth: a birth cohort study. BMC Pulm Med 2013; 13:56. [PMID: 24004509 PMCID: PMC3846453 DOI: 10.1186/1471-2466-13-56] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/20/2012] [Accepted: 08/20/2013] [Indexed: 02/07/2023] Open
Abstract
Background: Genetic variation within GSTM2-5 genes may interfere with detoxification of environmental compounds, thereby having a detrimental effect on lung function following exposures such as tobacco smoke. We aim to investigate the influence of variants and associated methylation in the GSTM gene cluster with changes in lung function growth during adolescence.
Methods: Growth in forced expiratory volume (FEV1), forced vital capacity (FVC), and change in FEV1/FVC ratio measures were obtained from children in the Isle of Wight birth cohort at ages 10 and 18. Illumina GoldenGate assays were used to genotype 10 tagging polymorphisms from GSTM2 (rs574344 and rs12024479), GSTM3 (rs1537236, rs7483, and rs10735234), GSTM4 (rs668413, rs560018, and rs506008), and GSTM5 (rs929166 and rs11807) genes. Diplotypes were generated in the software Phase 3.0.2. DNA methylation was measured in over 450,000 CpG sites using the Infinium HumanMethylation450 BeadChip (Illumina 450K) in a subsample of 245 18-year olds from the Isle of Wight birth cohort. Gender, age, in utero smoke exposure, secondhand smoke exposure (SHS), and current smoking status were assessed via questionnaire; smoke exposures were validated with urine cotinine. We used linear mixed models to estimate the effect of GSTM diplotypes on lung function across time and examine interactions with tobacco smoke.
Results: 1,121 (77%) out of 1,456 children had information on lung function at ages 10 or 18. After adjustment for false discovery rate, one diplotype in GSTM3 had a detrimental effect on changes in FEV1 (p=0.03), and another diplotype in GSTM3 reduced FVC (p=0.02) over time. No significant interactions with smoking were identified. SHS significantly modified the relationship between diplotypes and methylation levels in one GSTM2 CpG site; however, this site did not predict lung function outcomes at age 18. Joint effects of GSTM loci and CpG sites located within these loci on adolescent lung growth were detected.
Conclusions: Diplotypes within GSTM2-5 genes are associated with lung function growth across adolescence, but do not appear to modify the effect of tobacco smoke exposures on adolescent lung growth. Interactions between DNA methylation and diplotypes should be taken into account to gain further understanding on lung function in adolescence.
Affiliation(s)
- Melannie Alexander
- Division of Epidemiology, Biostatistics and Environmental Health, School of Public Health, University of Memphis, 236A Robison Hall, Memphis, TN 38152, USA.
|
18
|
Libiger O, Schork NJ. A Method for Inferring an Individual's Genetic Ancestry and Degree of Admixture Associated with Six Major Continental Populations. Front Genet 2013; 3:322. [PMID: 23335941 PMCID: PMC3543981 DOI: 10.3389/fgene.2012.00322] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 09/27/2012] [Accepted: 12/24/2012] [Indexed: 01/27/2023] Open
Abstract
The determination of the ancestry and genetic backgrounds of the subjects in genetic and general epidemiology studies is a crucial component in the analysis of relevant outcomes or associations. Although there are many methods for differentiating ancestral subgroups among individuals based on genetic markers only a few of these methods provide actual estimates of the fraction of an individual’s genome that is likely to be associated with different ancestral populations. We propose a method for assigning ancestry that works in stages to refine estimates of ancestral population contributions to individual genomes. The method leverages genotype data in the public domain obtained from individuals with known ancestries. Although we showcase the method in the assessment of ancestral genome proportions leveraging largely continental populations, the strategy can be used for assessing within-continent or more subtle ancestral origins with the appropriate data.
Affiliation(s)
- Ondrej Libiger
- Department of Molecular and Experimental Medicine, The Scripps Research Institute and the Scripps Translational Science Institute La Jolla, CA, USA
|
19
|
Lai CQ. Adaptive genetic variation and population differences. Progress in Molecular Biology and Translational Science 2012; 108:461-89. [PMID: 22656388 DOI: 10.1016/b978-0-12-398397-8.00018-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
Abstract
Since the expansion of modern humans (Homo sapiens) from Africa to the rest of the world between 50,000 and 100,000 years ago, the human genome has been shaped not only by demographic history but also by adaptation to local environments, including regional climate, landscape, food sources, culture, and pathogens. Genetic differences among populations interact with environmental factors, such as diet and lifestyle, leading to differences in nutrient metabolism, which translate into differences in susceptibility to a variety of diseases. Individuals from different populations sharing the same environments can exhibit differences in disease risk, as do individuals from the same population living in various regions of the globe. Therefore, it is important to understand how adaptive genetic variations interact with environments to influence health. This knowledge will provide a broad foundation for designing experiments and approaches in nutrigenomics research and strengthening the knowledge base for dietary recommendations for disease prevention. The objectives of this chapter are to (1) understand the methodology employed in examining adaptive genetic variation across populations, (2) establish the importance of adaptive genetic variation to human health, and (3) discuss the implications for nutrigenomics research and disease prevention.
Affiliation(s)
- Chao-Qiang Lai
- Nutrition and Genomics Laboratory, Jean Meyer USDA Human Nutrition Research Center on Aging, Tufts University, Boston, Massachusetts, USA
|
20
|
Yu Z, Deng L. Pseudosibship methods in the case-parents design. Stat Med 2011; 30:3236-51. [PMID: 21953439 DOI: 10.1002/sim.4397] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/02/2011] [Revised: 06/06/2011] [Accepted: 08/10/2011] [Indexed: 01/21/2023]
Abstract
Recent evidence suggests that complex traits are likely determined by multiple loci, each of which contributes a weak to moderate individual effect. Although extensive literature exists on multilocus analysis of unrelated subjects, there are relatively fewer strategies for jointly analyzing multiple loci using family data. Here we address this issue by evaluating two pseudosibship methods: the 1:1 matching, which matches each affected offspring to the pseudosibling formed by the alleles not transmitted to the affected offspring, and the exhaustive matching, which matches each affected offspring to the pseudosiblings formed by all the other possible combinations of parental alleles. We prove that the two matching strategies use exactly and approximately the same amount of information from data under additive and multiplicative genetic models, respectively. Using numerical calculations under a variety of models and testing assumptions, we show that compared with the exhaustive matching, the 1:1 matching has comparable asymptotic power in detecting multiplicative/additive effects in single-locus analysis and main effects in multilocus analysis, and it allows association testing of multiple linked loci. These results pave the way for many existing multilocus analysis methods developed for the case-control (or matched case-control) design to be applied to case-parents data with minor modifications. As an example, with the 1:1 matching, we applied an L1 regularized regression to a Crohn's disease dataset. Using the multiple loci selected in our approach, we obtained an order-of-magnitude decrease in p-value and an 18.9% increase in prediction accuracy when compared with using the most significant individual locus.
Affiliation(s)
- Zhaoxia Yu
- Department of Statistics, University of California, Irvine, CA 92697, USA.
|
21
|
Yu Z, Wang S. Contrasting linkage disequilibrium as a multilocus family-based association test. Genet Epidemiol 2011; 35:487-98. [PMID: 21769928 DOI: 10.1002/gepi.20598] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/14/2010] [Revised: 04/20/2011] [Accepted: 04/24/2011] [Indexed: 02/04/2023]
Abstract
Linkage disequilibrium (LD) of genetic loci is routinely estimated and graphically illustrated in genetic association studies. It has been suggested that the information in LD is also useful for association mapping and genetic association can be detected by comparing LD patterns between cases and controls. Here, we extend this idea to analyze case-parents data by comparing LD patterns between transmitted and nontransmitted genotypes. We provide the condition when contrasting LD is valid for testing gene-gene interactions. A permutation procedure is given to assess statistical significance. One advantage of our proposed methods is that haplotype information is not required. Thus, the implementation of our methods is straightforward and the resulting tests are free from potential bias caused by assumptions made to estimate haplotypes in silico. Since our test statistics use pairwise LD measurements, they are less affected by missing data than many other multilocus methods. With simulated data, we demonstrate that examining LD patterns of case-parents data is a useful multilocus association mapping strategy and it complements existing association mapping methods. The application of our methods to a Crohn's disease data set shows that our methods can detect multilocus association that might be missed by other association methods. Our permutation procedure can also be modified to allow multiple offspring from a family to be analyzed.
Affiliation(s)
- Zhaoxia Yu
- Department of Statistics, University of California-Irvine, CA 92697, USA.
|
22
|
Conditions under which genome-wide association studies will be positively misleading. Genetics 2010; 186:1045-52. [PMID: 20813880 DOI: 10.1534/genetics.110.121665] [Citation(s) in RCA: 130] [Impact Index Per Article: 8.7] [Indexed: 01/22/2023] Open
Abstract
Genome-wide association mapping is a popular method for using natural variation within a species to generate a genotype-phenotype map. Statistical association between an allele at a locus and the trait in question is used as evidence that variation at the locus is responsible for variation of the trait. Indirect association, however, can give rise to statistically significant results at loci unrelated to the trait. We use a haploid, three-locus, binary genetic model to describe the conditions under which these indirect associations become stronger than any of the causative associations in the organism--even to the point of representing the only associations present in the data. These indirect associations are the result of disequilibrium between multiple factors affecting a single trait. Epistasis and population structure can exacerbate the problem but are not required to create it. From a statistical point of view, indirect associations are true associations rather than the result of stochastic noise: they will not be ameliorated by increasing sampling size or marker density and can be reproduced in independent studies.
|
23
|
Li T, Li Z, Ying Z, Zhang H. Influence of population stratification on population-based marker-disease association analysis. Ann Hum Genet 2010; 74:351-60. [PMID: 20529080 PMCID: PMC2897957 DOI: 10.1111/j.1469-1809.2010.00588.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
Population-based genetic association analysis may suffer from the failure to control for confounders such as population stratification (PS). There has been extensive study on the influence of PS on candidate gene-disease association analysis, but much less attention has been paid to its influence on marker-disease association analysis. In this paper, we focus on the Pearson χ² test and the trend test for marker-disease association analysis. The mean and variance of the test statistics are derived under presence of PS, so that the power and inflated type I error rate can be evaluated. It is shown that the bias and the variance distortion are not zero in the presence of both PS and penetrance heterogeneity (PH). Unlike candidate gene-disease association analysis, when PS is present, the bias is not zero no matter whether PH is present or not. This work generalises the published results, where only the fully recessive penetrance model is considered and only the bias is calculated. It is shown that candidate gene-disease association analysis can be treated as a special case of marker-disease association analysis. Consequently, our results extend previous studies on candidate gene-disease association analysis. A simulation study confirms the theoretical findings.
Affiliation(s)
- Tengfei Li
- Department of Mathematics, Fudan University, 220 Handan Road, Shanghai 200433, P.R. China
- Zhaohai Li
- Department of Statistics, George Washington University, 2140 Pennsylvania Ave., N.W. Washington, DC 20052, USA
- Zhiliang Ying
- Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, NY 10027, USA
- Hong Zhang
- Department of Statistics and Finance, University of Science and Technology of China, 96 Jinzhai Road, Hefei, Anhui 230026, P.R. China
|
24
|
Moreno-Macias H, Romieu I, London SJ, Laird NM. Gene-environment interaction tests for family studies with quantitative phenotypes: A review and extension to longitudinal measures. Hum Genomics 2010; 4:302-26. [PMID: 20650819 PMCID: PMC2952941 DOI: 10.1186/1479-7364-4-5-302] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/22/2010] [Accepted: 04/22/2010] [Indexed: 01/08/2023] Open
Abstract
Longitudinal studies are an important tool for analysing traits that change over time, depending on individual characteristics and environmental exposures. Complex quantitative traits, such as lung function, may change over time and appear to depend on genetic and environmental factors, as well as on potential gene-environment interactions. There is a growing interest in modelling both marginal genetic effects and gene-environment interactions. In an admixed population, the use of traditional statistical models may fail to adjust for confounding by ethnicity, leading to bias in the genetic effect estimates. A variety of methods have been developed to account for the genetic substructure of human populations. Family-based designs provide an important resource for avoiding confounding due to admixture. To date, however, most genetic analyses have been applied to cross-sectional designs. In this paper, we propose a methodology which aims to improve the assessment of main genetic effect and gene-environment interaction effects by combining the advantages of both longitudinal studies for continuous phenotypes, and the family-based designs. This approach is based on an extension of ordinary linear mixed models for quantitative phenotypes, which incorporates information from a case-parent design. Our results indicate that use of this method allows both main genetic and gene-environment interaction effects to be estimated without bias, even in the presence of population substructure.
|
25
|
Guan W, Liang L, Boehnke M, Abecasis GR. Genotype-based matching to correct for population stratification in large-scale case-control genetic association studies. Genet Epidemiol 2009; 33:508-17. [PMID: 19170134 PMCID: PMC2732762 DOI: 10.1002/gepi.20403] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
Genome-wide association studies are helping to dissect the etiology of complex diseases. Although case-control association tests are generally more powerful than family-based association tests, population stratification can lead to spurious disease-marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome-wide or large-scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false-positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re-matching after genotyping is a method of choice for genome-wide association studies.
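The first two GSM steps listed in the abstract can be illustrated with a minimal Python sketch. Both pieces are assumptions made for illustration, not the authors' method: the similarity score is a plain identity-by-state (IBS) proportion on 0/1/2 genotype codes, and the matching is a greedy one-to-one pairing rather than the set matching the paper describes. The function names `ibs_similarity` and `greedy_match` are hypothetical. Step (3), conditional logistic regression on the matched sets, is omitted.

```python
import numpy as np


def ibs_similarity(genotypes):
    """Pairwise identity-by-state similarity (assumed score, not the GSM score).

    genotypes: (n_individuals, n_snps) array of minor-allele counts 0/1/2.
    Returns an (n, n) matrix with entries in [0, 1]; 1 means identical genotypes.
    """
    g = np.asarray(genotypes, dtype=float)
    # |g_i - g_j| is the number of alleles NOT shared at each SNP (0, 1, or 2).
    # This builds an n x n x m array, so it only suits toy-sized inputs.
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return 1.0 - diff.mean(axis=2) / 2.0


def greedy_match(similarity, case_idx, control_idx):
    """Pair each case with its most similar unused control (simplified stand-in)."""
    available = list(control_idx)
    pairs = []
    for case in case_idx:
        if not available:
            break  # more cases than controls: remaining cases stay unmatched
        best = max(available, key=lambda ctrl: similarity[case, ctrl])
        pairs.append((case, best))
        available.remove(best)
    return pairs
```

As a usage example, with genotypes `[[0, 0, 2, 2], [2, 2, 0, 0], [0, 0, 2, 1], [2, 2, 0, 1]]`, cases `[0, 1]`, and controls `[2, 3]`, `greedy_match(ibs_similarity(genotypes), [0, 1], [2, 3])` returns `[(0, 2), (1, 3)]`. A greedy pairing depends on the order in which cases are visited; an optimal matching would need an assignment algorithm instead.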
Affiliation(s)
- Weihua Guan
- Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109-2029, USA
|
26
|
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 2009; 19:1655-64. [PMID: 19648217 DOI: 10.1101/gr.094052.109] [Citation(s) in RCA: 5590] [Impact Index Per Article: 349.4] [Indexed: 12/12/2022]
Abstract
Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
Affiliation(s)
- David H Alexander
- Department of Biomathematics, University of California at Los Angeles, Los Angeles, California 90095, USA.
|
27
|
She D, Zhang H, Li Z. Testing Hardy-Weinberg equilibrium using family data from complex surveys. Ann Hum Genet 2009; 73:449-55. [PMID: 19489753 DOI: 10.1111/j.1469-1809.2009.00528.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Genetic data collected during the second phase of the Third National Health and Nutrition Examination Survey (NHANES III) enable us to investigate the association of a wide variety of health factors with regard to genetic variation. The classic question when looking into the genetic variations in a population is whether the population is in the state of Hardy-Weinberg Equilibrium (HWE). Our objective was to develop test procedures using family data from complex surveys such as NHANES III. We developed six Pearson χ²-based tests for a diallelic locus of autosomal genes. The finite sample properties of the proposed test procedures were evaluated via Monte Carlo simulation studies and the Rao-Scott first order corrected test was recommended. Test procedures were applied to three loci from NHANES III genetic databases, i.e., ADRB2, TGFB1, and VDR. HWE was shown to hold at 0.05 level for all three loci when only families with genotypic information available for two parents and for one or more children were used in the analysis.
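For orientation, the textbook Pearson χ² statistic for HWE at a diallelic locus, the kind of statistic that tests like these start from, can be sketched as follows. This is a minimal Python illustration that assumes simple random sampling of unrelated individuals. It does not include the survey-design weighting, family structure, or Rao-Scott correction studied in the paper, and the function name `hwe_chi_square` is hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic for Hardy-Weinberg equilibrium (1 df).

    n_aa, n_ab, n_bb: observed counts of the three genotypes at a diallelic
    locus. Assumes unrelated individuals under simple random sampling.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    # Estimated frequency of allele A from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic locus) to avoid dividing by 0.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

For example, counts of 25, 50, and 25 match the HWE expectation exactly and give a statistic of 0, while counts of 50, 0, and 50 give 100. Under these simplifying assumptions the statistic is compared with a χ² distribution with one degree of freedom, whose 0.05 critical value is about 3.841.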
Affiliation(s)
- Dewei She
- Department of Statistics, The George Washington University, 2140 Pennsylvania Avenue N.W., Washington, DC 20052, USA
|
28
|
Li Y, Graubard BI. Testing Hardy-Weinberg Equilibrium and Homogeneity of Hardy-Weinberg Disequilibrium using Complex Survey Data. Biometrics 2009; 65:1096-104. [DOI: 10.1111/j.1541-0420.2009.01199.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
|
29
|
Lai CQ, Tucker KL, Choudhry S, Parnell LD, Mattei J, García-Bailo B, Beckman K, Burchard EG, Ordovás JM. Population admixture associated with disease prevalence in the Boston Puerto Rican health study. Hum Genet 2009; 125:199-209. [PMID: 19107526 PMCID: PMC2727756 DOI: 10.1007/s00439-008-0612-7] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.6] [Received: 09/05/2008] [Accepted: 12/14/2008] [Indexed: 01/26/2023]
Abstract
Older Puerto Ricans living in the continental U.S. suffer from higher rates of diabetes, obesity, cardiovascular disease and depression compared to non-Hispanic White populations. Complex diseases, such as these, are likely due to multiple, potentially interacting, genetic, environmental and social risk factors. Presumably, many of these environmental and genetic risk factors are contextual. We reasoned that racial background may modify some of these risk factors and be associated with health disparities among Puerto Ricans. The contemporary Puerto Rican population is genetically heterogeneous and originated from three ancestral populations: European settlers, native Taíno Indians, and West Africans. This rich-mixed ancestry of Puerto Ricans provides the intrinsic variability needed to untangle complex gene-environment interactions in disease susceptibility and severity. Herein, we determined whether a specific ancestral background was associated with any of four major disease outcomes (diabetes, obesity, cardiovascular disease, and depression). We estimated the genetic ancestry of 1,129 subjects from the Boston Puerto Rican Health Study based on genotypes of 100 ancestry informative markers (AIMs). We examined the effects of ancestry on tests of association between single AIMs and disease traits. The ancestral composition of this population was 57.2% European, 27.4% African, and 15.4% Native American. African ancestry was negatively associated with type 2 diabetes and cardiovascular disease, and positively correlated with hypertension. It is likely that the high prevalence rate of diabetes in Africans, Hispanics, and Native Americans is not due to genetic variation alone, but to the combined effects of genetic variation interacting with environmental and social factors.
Affiliation(s)
- Chao-Qiang Lai
- Nutrition and Genomics Laboratory, JM-USDA Human Nutrition Research Center on Aging at Tufts University, 711 Washington St, Boston, MA 02111, USA.
|
30
|
Li Z, Zhang H, Zheng G, Gastwirth JL, Gail MH. Excess false positive rate caused by population stratification and disease rate heterogeneity in case–control association studies. Comput Stat Data Anal 2009. [DOI: 10.1016/j.csda.2008.02.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
|
31
|
Varvio-Aho S, Järvinen O, Vepsäläinen K, Pamilo P. Seasonal changes of the enzyme gene pool in water-striders (Gerris). Hereditas 2009. [DOI: 10.1111/j.1601-5223.1979.tb01289.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022] Open
|
32
|
Deviations from Hardy-Weinberg proportions for multiple alleles under viability selection. Genet Res (Camb) 2008; 90:209-16. [PMID: 18426624 DOI: 10.1017/s0016672307009068] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022] Open
Abstract
Departures of genotype frequencies from Hardy-Weinberg proportions (HWP) for a single autosomal locus due to viability selection in a random mating population have been studied only for the two-allele case. In this article, the analysis of deviations from HWP due to constant viability selection is extended to multiple alleles. The deviations for an autosomal locus with k alleles are measured by means of k fixation indices f_ii for homozygotes and k(k-1)/2 fixation indices f_ij for heterozygotes, and expressions are obtained for these indices (F_IS statistics) under the multiallele viability model. Furthermore, expressions for f_ii and f_ij when the multiallele polymorphism is at stable equilibrium are also derived and it is demonstrated that the pattern of multiallele Hardy-Weinberg deviations at equilibrium is characterized by a global heterozygote excess and a deficiency of each of the homozygotes. This pattern may be useful for detecting whether a given multiallelic polymorphism is at stable equilibrium in the population due to viability selection. An analysis of Hardy-Weinberg deviations from published data for the three-allele polymorphism at the beta-globin locus in human populations from West Africa is presented for illustration.
|
33
|
Yan LK, Zheng G, Li Z. Two-Stage Group Sequential Robust Tests in Family-Based Association Studies: Controlling Type I Error. Ann Hum Genet 2008; 72:557-65. [DOI: 10.1111/j.1469-1809.2008.00435.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]
|
34
|
Chen X, Li Z. Inference of haplotype effects in case-control studies using unphased genotype and environmental data. Biom J 2008; 50:270-82. [PMID: 18217697 DOI: 10.1002/bimj.200710396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
A retrospective likelihood-based approach was proposed to test and estimate the effect of haplotype on disease risk using unphased genotype data with adjustment for environmental covariates. The proposed method was also extended to handle data in which the haplotype and environmental covariates are not independent. Likelihood ratio tests were constructed to test the effects of haplotype and gene-environment interaction. Model parameters such as haplotype effect size were estimated using an Expectation Conditional-Maximization (ECM) algorithm developed by Meng and Rubin (1993). Model-based variance estimates were derived using the observed information matrix. Simulation studies were conducted for three different genetic effect models, including dominant effect, recessive effect, and additive effect. The results showed that the proposed method generated unbiased parameter estimates, proper type I error, and true beta coverage probabilities. The model performed well with small or large sample sizes, as well as short or long haplotypes.
Affiliation(s)
- Xiaowu Chen
- Department of Biostatistics, Human Genome Sciences, Inc., 14200 Shady Grove Rd. Rockville, MD, USA
|
35
|
Blaya C, Salum GA, Lima MS, Leistner-Segal S, Manfro GG. Lack of association between the Serotonin Transporter Promoter Polymorphism (5-HTTLPR) and Panic Disorder: a systematic review and meta-analysis. Behav Brain Funct 2007; 3:41. [PMID: 17705872 PMCID: PMC1994953 DOI: 10.1186/1744-9081-3-41] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 07/06/2007] [Accepted: 08/18/2007] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: The aim of this study is to assess the association between the Serotonin Transporter Promoter Polymorphism (5-HTTLPR) and Panic Disorder (PD). METHODS: This is a systematic review and meta-analysis of case-control studies with unrelated individuals of any ethnic origin examining the role of the 5-HTTLPR in PD according to standard diagnostic criteria (DSM or ICD). Articles published in any language between January 1996 and April 2007 were eligible. The electronic databases searched included PubMed, PsychInfo, Lilacs and ISI. Two separate analyses were performed: an analysis by alleles and a stratified analysis separating studies by the quality of control groups. The asymptotic DerSimonian and Laird Q test was used to assess heterogeneity. Results of individual studies were combined using the fixed effect model with respective 95% confidence intervals. RESULTS: Nineteen potential articles were identified, and 10 studies were included in this meta-analysis. No statistically significant association between 5-HTTLPR and PD was found, OR = 0.91 (CI95% 0.80 to 1.03, p = 0.14). Three sub-analyses divided by ethnicity, control group quality and Agoraphobia comorbidity also failed to find any significant association. No evidence of heterogeneity was found between studies in the analyses. CONCLUSION: Results from this systematic review do not provide evidence to support an association between 5-HTTLPR and PD. However, more studies are needed in different ethnic populations in order to evaluate a possible minor effect.
Affiliation(s)
- Carolina Blaya
- Post-Graduate Program in Medical Sciences, Psychiatry, Universidade Federal do Rio Grande do Sul and Anxiety Disorders Program, Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil
- Giovanni A Salum
- Post-Graduate Program in Medical Sciences, Psychiatry, Universidade Federal do Rio Grande do Sul and Anxiety Disorders Program, Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil
- Maurício S Lima
- Associate Professor of Psychiatry, Universidade Católica de Pelotas & Medical Director, Eli Lilly do, Brazil
- Sandra Leistner-Segal
- Post-Graduate Program in Medical Sciences, Psychiatry, Universidade Federal do Rio Grande do Sul and Anxiety Disorders Program, Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil
- Gisele G Manfro
- Post-Graduate Program in Medical Sciences, Psychiatry, Universidade Federal do Rio Grande do Sul and Anxiety Disorders Program, Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil
|
36
|
Dupuis J. Effect of linkage disequilibrium between markers in linkage and association analyses. Genet Epidemiol 2007; 31 Suppl 1:S139-48. [DOI: 10.1002/gepi.20291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/18/2023]
|
37
|
Musani SK, Halbert ND, Redden DT, Allison DB, Derr JN. Marker genotypes and population admixture and their association with body weight, height and relative body mass in United States federal bison herds. Genetics 2006; 174:775-83. [PMID: 16888339 PMCID: PMC1602102 DOI: 10.1534/genetics.106.057547] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022] Open
Abstract
Elucidating genetic influences on bison growth and body composition is of interest, not only because bison are important for historical, cultural, and agricultural reasons, but also because their unusual population history makes them valuable models for finding influential loci in both domestic cattle and humans. We tested for trait loci associated with body weight, height, and bison mass index (BMI) while controlling for estimated ancestry to reduce potential confounding effects due to population admixture in 1316 bison sampled from four U.S. herds. We used 60 microsatellite markers to model each phenotype as a function of herd, sex, age, marker genotypes, and individual ancestry estimates. Statistical significance for genotype and its interaction with ancestry was evaluated using the adaptive false discovery rate. Of the four herds, two appeared to be admixed and two were nonadmixed. Although none of the main effects of the loci were significant, estimated ancestry and its interaction with marker loci were significantly associated with the phenotypes, illustrating the importance of including ancestry in the models and the dependence of genotype-phenotype associations on background ancestry. Individual loci contributed approximately 2.0% of variation in weight, height, and BMI, which confirms the utility and potential importance of adjusting for population stratification.
Affiliation(s)
- Solomon K Musani
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama, Alabama 35294, USA
|
38
|
Li Z, Gastwirth JL, Gail MH. Power and Related Statistical Properties of Conditional Likelihood Score Tests for Association Studies in Nuclear Families with Parental Genotypes. Ann Hum Genet 2005. [DOI: 10.1046/j.1469-1809.2005.00169.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]
|
39
|
Purcell S, Sham P. Properties of structured association approaches to detecting population stratification. Hum Hered 2005; 58:93-107. [PMID: 15711089 DOI: 10.1159/000083030] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 06/11/2004] [Accepted: 09/02/2004] [Indexed: 01/06/2023] Open
Abstract
OBJECTIVE: To examine the properties of the structured association approach for the detection and correction of population stratification. METHOD: A method is developed, within a latent class analysis framework, similar to the methods proposed by Satten et al. (2001) and Pritchard et al. (2000). A series of simulations illustrate the relative impact of number and type of loci, sample size and population structure. RESULTS: The ability to detect stratification and assign individuals to population strata is determined for a number of different scenarios. CONCLUSION: The results underline the importance of careful marker selection.
Affiliation(s)
- Shaun Purcell
- Whitehead Institute, Nine Cambridge Center, Cambridge, MA 02129, USA.
|
40
|
Song K, Elston RC. A powerful method of combining measures of association and Hardy-Weinberg disequilibrium for fine-mapping in case-control studies. Stat Med 2005; 25:105-26. [PMID: 16220513 DOI: 10.1002/sim.2350] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.7] [Indexed: 02/02/2023]
Abstract
We present a new method for fine-mapping a disease susceptibility locus using a case-control design. The new method, termed the 'weighted average (WA) statistic', averages the Cochran-Armitage (CA) trend test statistic and the difference between the Hardy-Weinberg disequilibrium test statistics (the HWD trend) for cases and controls. The main features of the WA statistic are that it mitigates the weaknesses, and maintains the strong points, of both the CA trend test and the HWD trend test. To allow for the extra variance induced by population structure and cryptic relatedness, the WA statistic can be adjusted for variance inflation. Based on the results of a simulation study, when there is no population structure the WA test statistic shows good performance under a variety of genetic disease models. When there is population structure, the adjusted WA statistic maintains the correct probability of type I error. Under all genetic disease models investigated, the adjusted WA statistic has better power than the adjusted CA trend test, the HWD trend test or the product of the adjusted CA trend test and the HWD trend test statistics.
Affiliation(s)
- Kijoung Song
- Glaxo Smith Kline, 1250 S. Collegeville Road, Collegeville, PA 19426, USA.
|
41
|
Abstract
Bipolar disorder (BD) is a chronic, potentially disabling illness with a lifetime morbid risk of approximately 1%. There is substantial evidence for a significant genetic etiology, but gene-mapping efforts have been hampered by the complex mode of inheritance and the likelihood of multiple genes of small effect. In view of the complexity, it may be instructive to understand the biological bases for pathogenesis. Extensive disruption in circadian function is known to occur among patients in relapse. Therefore, it is plausible that circadian dysfunction underlies pathogenesis. Evidence for such a hypothesis is mounting and is reviewed here. If circadian dysfunction can be established as an 'endophenotype' for BD, this may not only enable identification of more homogenous sub-groups, but may also facilitate genetic analyses. For example, it would be logical to investigate polymorphisms of genes encoding key proteins that mediate circadian rhythms. Association studies that analyzed circadian genes in BD have been initiated and are reviewed. Other avenues for research are also discussed.
Affiliation(s)
- Hader A Mansour
- Department of Psychiatry, University of Pittsburgh School of Medicine, Western Psychiatric Institute and Clinic, PA 15213, USA
|
42
|
Li YJ, Pericak-Vance MA, Haines JL, Siddique N, McKenna-Yasek D, Hung WY, Sapp P, Allen CI, Chen W, Hosler B, Saunders AM, Dellefave LM, Brown RH, Siddique T. Apolipoprotein E is associated with age at onset of amyotrophic lateral sclerosis. Neurogenetics 2004; 5:209-13. [PMID: 15657798 DOI: 10.1007/s10048-004-0193-0] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.6] [Received: 06/14/2004] [Accepted: 08/16/2004] [Indexed: 10/26/2022]
Abstract
Apolipoprotein E (APOE) is a confirmed risk factor for Alzheimer disease. APOE is also involved in several other neurodegenerative disorders, including Parkinson disease and multiple sclerosis. Previous studies of amyotrophic lateral sclerosis (Lou Gehrig disease, ALS) have investigated the effect of APOE on the risk of developing ALS, age at onset, site of onset, and duration of the disease. The results have been inconsistent, possibly due to small sample sizes and complete reliance on case-control data. No family-based association studies were performed. To address these limitations, we investigated the relationship between APOE functional polymorphisms and age at onset of ALS in a large set of 508 families. We treated age at onset as a quantitative trait and performed family-based association analysis using the TDTQ5 method. APOE-2 is protective against earlier onset (P = 0.001) with an average age at onset of APOE-2 carriers approximately 3 years later than that of non-APOE-2 carriers. Similar to our previous report, we did not find APOE associated with ALS risk. Our findings suggest that APOE may express its strongest effect through age at onset rather than on risk.
Affiliation(s)
- Yi-Ju Li
- Center for Human Genetics, Department of Medicine, Duke University Medical Center, Durham, North Carolina 27710, USA.
|
43
|
Wang Y, Localio R, Rebbeck TR. Evaluating bias due to population stratification in case-control association studies of admixed populations. Genet Epidemiol 2004; 27:14-20. [PMID: 15185399 DOI: 10.1002/gepi.20003] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.4] [Indexed: 11/08/2022]
Abstract
The potential for bias from population stratification (PS) has raised concerns about case-control studies involving admixed ethnicities. We evaluated the potential bias due to PS in relating a binary outcome with a candidate gene under simulated settings where study populations consist of multiple ethnicities. Disease risks were assigned within the range of prostate cancer rates of African Americans reported in SEER registries assuming k=2, 5, or 10 admixed ethnicities. Genotype frequencies were considered in the range of 5-95%. Under a model assuming no genotype effect on disease (odds ratio (OR)=1), the range of observed OR estimates ignoring ethnicity was 0.64-1.55 for k=2, 0.72-1.33 for k=5, and 0.81-1.22 for k=10. When genotype effect on disease was modeled to be OR=2, the ranges of observed OR estimates were 1.28-3.09, 1.43-2.65, and 1.62-2.42 for k=2, 5, and 10 ethnicities, respectively. Our results indicate that the magnitude of bias is small unless extreme differences exist in genotype frequency. Bias due to PS decreases as the number of admixed ethnicities increases. The biases are bounded by the minimum and maximum of all pairwise baseline disease odds ratios across ethnicities. Therefore, bias due to PS alone may be small when baseline risk differences are small within major categories of admixed ethnicity, such as African Americans.
Affiliation(s)
- Yiting Wang
- Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania, School of Medicine, Philadelphia, Pennsylvania 19104-6021, USA
|
44
|
Lee WC. Detecting population stratification using a panel of single nucleotide polymorphisms. Int J Epidemiol 2004; 32:1120. [PMID: 14681291 DOI: 10.1093/ije/dyg301] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
|
45
|
Devlin B, Roeder K, Wasserman L. Genomic control, a new approach to genetic-based association studies. Theor Popul Biol 2001; 60:155-66. [PMID: 11855950 DOI: 10.1006/tpbi.2001.1542] [Citation(s) in RCA: 372] [Impact Index Per Article: 15.5] [Indexed: 11/22/2022]
Abstract
During the past decade, mutations affecting liability to human disease have been discovered at a phenomenal rate, and that rate is increasing. For the most part, however, those diseases have a relatively simple genetic basis. For diseases with a complex genetic and environmental basis, new approaches are needed to pave the way for more rapid discovery of genes affecting liability. One such approach exploits large, population-based samples and large-scale genotyping to evaluate disease/gene associations. A substantial drawback to such samples is the fact that population heterogeneity can induce spurious associations between genes and disease. We describe a method called genomic control (GC), which obviates many of the concerns about population substructure by using the features of the genomes present in the sample to correct for stratification. Two such approaches are now available. The GC approach exploits the fact that population substructure generates "overdispersion" of statistics used to assess association. By testing multiple polymorphisms throughout the genome, only some of which are pertinent to the disease of interest, the degree of overdispersion generated by population substructure can be estimated and taken into account. The other approach, called Structured Association (SA), assumes that the sampled population, while heterogeneous, is composed of subpopulations that are themselves homogeneous. By using multiple polymorphisms throughout the genome, SA probabilistically assigns sampled individuals to these latent subpopulations. We review in detail the overdispersion GC. In addition to outlining the published ideas on this method, we describe several extensions: quantitative trait studies and case-control studies with haplotypes and multiallelic markers. For each study design our goal is to achieve control similar to that obtained for a family-based study, but with the convenience found in a population-based design.
Affiliation(s)
- B Devlin
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA.
|
46
|
Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet 2001; 68:466-77. [PMID: 11170894 PMCID: PMC1235279 DOI: 10.1086/318195] [Citation(s) in RCA: 180] [Impact Index Per Article: 7.5] [Received: 10/16/2000] [Accepted: 12/15/2000] [Indexed: 11/03/2022] Open
Abstract
We propose a novel latent-class approach to detect and account for population stratification in a case-control study of association between a candidate gene and a disease. In our approach, population substructure is detected and accounted for using data on additional loci that are in linkage equilibrium within subpopulations but have alleles that vary in frequency between subpopulations. We have tested our approach using simulated data based on allele frequencies in 12 short tandem repeat (STR) loci in four populations in Argentina.
Affiliation(s)
- G A Satten
- Centers for Disease Control and Prevention, Atlanta, GA 30341, USA.
|
47
|
Bacanu SA, Devlin B, Roeder K. The power of genomic control. Am J Hum Genet 2000; 66:1933-44. [PMID: 10801388 PMCID: PMC1378064 DOI: 10.1086/302929] [Citation(s) in RCA: 243] [Impact Index Per Article: 9.7] [Received: 02/17/2000] [Accepted: 04/06/2000] [Indexed: 11/03/2022] Open
Abstract
Although association analysis is a useful tool for uncovering the genetic underpinnings of complex traits, its utility is diminished by population substructure, which can produce spurious association between phenotype and genotype within population-based samples. Because family-based designs are robust against substructure, they have risen to the fore of association analysis. Yet, if population substructure could be ignored, this robustness can come at the price of power. Unfortunately it is rarely evident when population substructure can be ignored. Devlin and Roeder recently have proposed a method, termed "genomic control" (GC), which has the robustness of family-based designs even though it uses population-based data. GC uses the genome itself to determine appropriate corrections for population-based association tests. Using the GC method, we contrast the power of two study designs, family trios (i.e., father, mother, and affected progeny) versus case-control. For analysis of trios, we use the TDT test. When population substructure is absent, we find GC is always more powerful than TDT; furthermore, contrary to previous results, we show that as a disease becomes more prevalent the discrepancy in power becomes more extreme. When population substructure is present, however, the results are more complex: TDT is more powerful when population substructure is substantial, and GC is more powerful otherwise. We also explore general issues of power and implementation of GC within the case-control setting and find that, economically, GC is at least comparable to and often less expensive than family-based methods. Therefore, GC methods should prove a useful complement to family-based methods for the genetic analysis of complex traits.
Affiliation(s)
- S A Bacanu
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA.
|
48
|
Abstract
A dense set of single nucleotide polymorphisms (SNP) covering the genome and an efficient method to assess SNP genotypes are expected to be available in the near future. An outstanding question is how to use these technologies efficiently to identify genes affecting liability to complex disorders. To achieve this goal, we propose a statistical method that has several optimal properties: It can be used with case control data and yet, like family-based designs, controls for population heterogeneity; it is insensitive to the usual violations of model assumptions, such as cases failing to be strictly independent; and, by using Bayesian outlier methods, it circumvents the need for Bonferroni correction for multiple tests, leading to better performance in many settings while still constraining risk for false positives. The performance of our genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.
Affiliation(s)
- B Devlin
- Department of Psychiatry, University of Pittsburgh, Pennsylvania 15213, USA.
|
49
|
Abstract
This review provides an overview of forensic inference from genetic markers. Because the judge and jurors are charged with decision-making, the forensic expert's job is to provide a useful summary of the evidence to the court. Hence, this review focuses on the likelihood ratio as a means of summarizing the genetic data for either criminal or civil cases. The properties of the genetic markers frequently used in today's court cases, those being VNTR loci, are discussed in detail. Unlike traditional markers, the data from VNTR loci are complicated because current molecular methods generate data that follow a finite mixture distribution. Critical ancillary issues are also covered, though not in detail.
Affiliation(s)
- B Devlin
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06510
|
50
|
Li CC. Genetics of subdivided populations and its relationships with certain measures of association. Genet Epidemiol 1991; 8:1-11. [PMID: 2060768 DOI: 10.1002/gepi.1370080102] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 12/30/2022]
Abstract
Unlike inbreeding, population subdivision affects different genotypes differently for multiple alleles, so that the simple relationship between inbreeding and subdivision for two alleles no longer holds for all genotypes. In this communication, the detailed effects of subdivision have been studied for three alleles with results easily generalized to any number of alleles. Then an average effect of subdivision is proposed. This average effect of subdivision is found to play the same role as the inbreeding coefficient for multiple alleles, so that the overall relationship between inbreeding and subdivision may be reestablished. In the final section, we discuss the relationship of the genetic result with measures of association of square contingency tables arising from epidemiological, social, educational and psychological studies.
Affiliation(s)
- C C Li
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pennsylvania 15261
|