1
Carter AR, Anderson EL. Correct illustration of assumptions in Mendelian randomization. Int J Epidemiol 2024; 53:dyae050. [PMID: 38580457 DOI: 10.1093/ije/dyae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 03/20/2024] [Indexed: 04/07/2024]
Affiliation(s)
- Alice R Carter
- Human Genetics Centre of Excellence, Novo Nordisk Research Centre Oxford, Oxford, UK
- Emma L Anderson
- Division of Psychiatry, Department of Mental Health of Older People, University College London, London, UK
2
Brīvība M, Atava I, Pečulis R, Elbere I, Ansone L, Rozenberga M, Silamiķelis I, Kloviņš J. Evaluating the Efficacy of Type 2 Diabetes Polygenic Risk Scores in an Independent European Population. Int J Mol Sci 2024; 25:1151. [PMID: 38256224 PMCID: PMC10817091 DOI: 10.3390/ijms25021151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 01/04/2024] [Accepted: 01/09/2024] [Indexed: 01/24/2024]
Abstract
Numerous type 2 diabetes (T2D) polygenic risk scores (PGSs) have been developed to predict individuals' predisposition to the disease. Independent assessment and verification of the best-performing PGSs are warranted to allow rapid application of the developed models. To date, only 3% of T2D PGSs have been evaluated. In this study, we assessed all (n = 102) currently published T2D PGSs in an independent cohort of 3718 individuals that had not been included in the construction or fine-tuning of any T2D PGS. We further chose the best-performing PGS, assessed its performance across major population principal component analysis (PCA) clusters, and compared it with a newly developed population-specific T2D PGS. Our findings revealed that 88% of the published PGSs were significantly associated with T2D; however, their performance was lower than previously reported. We found that PGS performance improved over the years (p-value = 8.01 × 10⁻⁴), with PGS002771 currently showing the best discriminatory power (area under the receiver operating characteristic curve (AUROC) = 0.669) and PGS003443 exhibiting the strongest association (odds ratio (OR) = 1.899). Further investigation revealed no difference in PGS performance across major population PCA clusters or relative to the newly developed population-specific PGS. Our findings revealed a positive trend in T2D PGS performance, with PGSs consistently identifying high-T2D-risk individuals in an independent European population.
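The AUROC reported above can be read as the probability that a randomly chosen case has a higher PGS than a randomly chosen control. A minimal sketch of that rank-based (Mann-Whitney) definition follows; the function name `auroc` and the toy scores are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def auroc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of case-control pairs in which the case scores higher (ties count 1/2)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical PGS values, not data from the study
cases = [0.35, 0.80]
controls = [0.10, 0.40]
print(auroc(cases, controls))  # 0.75
```

The pairwise loop is quadratic and meant only to show the definition; for cohorts of thousands, a rank-sum formulation gives the same value more efficiently.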
Affiliation(s)
- Monta Brīvība
- Latvian Biomedical Research and Study Centre, LV-1067 Riga, Latvia
3
Liu Z, Turkmen AS, Lin S. Bayesian LASSO for population stratification correction in rare haplotype association studies. Stat Appl Genet Mol Biol 2024; 23:sagmb-2022-0034. [PMID: 38235525 PMCID: PMC10794901 DOI: 10.1515/sagmb-2022-0034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 12/19/2023] [Indexed: 01/19/2024]
Abstract
Population stratification (PS) is a major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed models (LMMs) are the current standards for SNP association studies and are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMMs, respectively, have yet to be addressed. Furthermore, only a few theoretical approaches have been proposed to address PS specifically for haplotypes. In this paper, we propose QBLstrat, a new method under the Bayesian LASSO framework that accounts for PS when identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat uses a large number of principal components (PCs) with appropriate priors to correct sufficiently for PS while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and with an existing method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power to identify true positives under PS.
Affiliation(s)
- Zilu Liu
- Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
4
Mas-Sandoval A, Mathieson S, Fumagalli M. The genomic footprint of social stratification in admixing American populations. eLife 2023; 12:e84429. [PMID: 38038347 PMCID: PMC10776089 DOI: 10.7554/elife.84429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 11/22/2023] [Indexed: 12/02/2023]
Abstract
Cultural and socioeconomic differences stratify human societies and shape their genetic structure beyond the effect of geography alone. Although mating is limited by sociocultural stratification, most demographic models in population genetics assume random mating. Taking advantage of the correlation between sociocultural stratification and the proportion of genetic ancestry in admixed populations, we sought to infer the former process in the Americas. To this end, we define a mating model in which the individual proportions of the genome inherited from Native American, European, and sub-Saharan African ancestral populations constrain mating probabilities through ancestry-related assortative mating and sex-bias parameters. We simulate a wide range of admixture scenarios under this model. We then train a deep neural network that achieves good performance in predicting mating parameters from genomic data. Our results show how population stratification, shaped by socially constructed racial and gender hierarchies, has constrained admixture processes in the Americas since European colonization and the subsequent Atlantic slave trade.
Affiliation(s)
- Alex Mas-Sandoval
- Department of Life Sciences, Silwood Park Campus, Imperial College London, London, United Kingdom
- Department of Statistical Sciences, University of Bologna, Bologna, Italy
- Sara Mathieson
- Department of Computer Science, Haverford College, Haverford, United States
- Matteo Fumagalli
- Department of Life Sciences, Silwood Park Campus, Imperial College London, London, United Kingdom
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, United Kingdom
5
Tanigawa Y, Kellis M. Power of inclusion: Enhancing polygenic prediction with admixed individuals. Am J Hum Genet 2023; 110:1888-1902. [PMID: 37890495 PMCID: PMC10645553 DOI: 10.1016/j.ajhg.2023.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 09/22/2023] [Accepted: 09/22/2023] [Indexed: 10/29/2023]
Abstract
Admixed individuals offer unique opportunities for addressing the limited transferability of polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training because of the difficulty of providing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvement in African individuals: 48.9% on average across 60 quantitative traits and up to 50-fold for some traits (neutrophil count, R² = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed average improvements of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for other individuals. We further developed iPGS+refit to jointly model ancestry-shared and ancestry-dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R² = 0.115), exceeding the best predictive performance for the White British group (R² = 0.090 in the iPGS model), even though only 1.49% of individuals used in iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.
Affiliation(s)
- Yosuke Tanigawa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA.
6
Liu Z, Turkmen AS, Lin S. Population stratification correction using Bayesian shrinkage priors for genetic association studies. Ann Hum Genet 2023; 87:302-315. [PMID: 37771252 DOI: 10.1111/ahg.12527] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/18/2022] [Revised: 08/20/2023] [Accepted: 08/24/2023] [Indexed: 09/30/2023]
Abstract
INTRODUCTION Population stratification (PS) is a major source of confounding in population-based genetic association studies of quantitative traits. Principal component regression (PCR) and linear mixed models (LMMs) are two commonly used approaches to account for PS in association studies. Previous studies have shown that an LMM can be interpreted as including all principal components (PCs) as random-effect covariates. However, including all PCs in an LMM may dilute the influence of relevant PCs in some scenarios, while including only a few preselected PCs in PCR may fail to capture the full genetic diversity. MATERIALS AND METHODS To address these shortcomings, we introduce Bayestrat, a method to detect associated variants with PS correction under the Bayesian LASSO framework. To adjust for PS, Bayestrat accommodates a large number of PCs and uses appropriate shrinkage priors to shrink the effects of nonassociated PCs. RESULTS Simulation results show that Bayestrat consistently controls type I error rates and achieves higher power than its non-shrinkage counterparts, especially when the number of PCs included in the model is large. As a demonstration of its utility, we apply Bayestrat to the Multi-Ethnic Study of Atherosclerosis (MESA). Variants and genes associated with serum triglyceride or HDL cholesterol levels are identified in our analyses. DISCUSSION The automatic and self-selecting features of Bayestrat make it particularly suited to situations with complex underlying PS, where it is unknown a priori which PCs are potential confounders, yet the number that must be considered could be large in order to fully account for PS.
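The shrinkage idea behind Bayestrat and QBLstrat can be illustrated in an idealized case. Under independent Laplace (LASSO) priors with a Gaussian likelihood, the posterior mode coincides with the LASSO estimate, and when the covariates are orthonormal (as normalized PC score vectors are among themselves) that estimate reduces to soft-thresholding of the least-squares coefficients. The sketch below shows only this shrinkage step; the methods above work with the full posterior, whose means or medians shrink but are not exactly zero, and the `soft_threshold` helper and `penalty` value are hypothetical.

```python
import math
from typing import List


def soft_threshold(ols_estimates: List[float], penalty: float) -> List[float]:
    """LASSO estimates under an orthonormal design (hypothetical helper).

    Each least-squares coefficient is pulled toward zero by `penalty`;
    coefficients no larger than `penalty` in magnitude are set exactly to zero.
    """
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    shrunk = []
    for b in ols_estimates:
        if abs(b) <= penalty:
            shrunk.append(0.0)  # weak effect, e.g. a PC that is not a confounder
        else:
            shrunk.append(math.copysign(abs(b) - penalty, b))  # strong effect, kept but shrunk
    return shrunk


# Hypothetical least-squares coefficients for four PCs
print(soft_threshold([3.0, -0.5, 0.2, -2.0], penalty=1.0))  # [2.0, 0.0, 0.0, -1.0]
```

This is why including many PCs need not dilute the relevant ones: under shrinkage, PCs with small estimated effects contribute little or nothing, while strongly confounding PCs remain in the model.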
Affiliation(s)
- Zilu Liu
- Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Asuman S Turkmen
- Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, Ohio, USA
7
Han S, Camp SY, Chu H, Collins R, Gillani R, Park J, Bakouny Z, Ricker CA, Reardon B, Moore N, Kofman E, Labaki C, Braun D, Choueiri TK, AlDubayan SH, Van Allen EM. Integrative Analysis of Germline Rare Variants in Clear and Non-Clear Cell Renal Cell Carcinoma. medRxiv 2023:2023.01.18.23284664. [PMID: 36712083 PMCID: PMC9882438 DOI: 10.1101/2023.01.18.23284664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
IMPORTANCE Renal cell carcinoma (RCC) encompasses a set of histologically distinct cancers with a high estimated genetic heritability, of which only a portion is currently explained. Previous rare germline variant studies in RCC have usually pooled clear and non-clear cell RCCs and have not adequately accounted for population stratification, which may significantly affect the interpretation and discovery of certain candidate risk genes. OBJECTIVE To evaluate the enrichment of pathogenic germline variants (PVs) in established cancer-predisposing genes (CPGs) in clear cell and non-clear cell RCC patients compared with cancer-free controls, using approaches that account for population stratification, and to identify unconventional types of germline variants that confer an increased risk of developing RCC. DESIGN, SETTING, AND PARTICIPANTS In 1,436 unselected RCC patients with sufficient data quality, we systematically identified rare germline PVs, cryptic splice variants, and copy number variants (CNVs). From this unselected cohort, 1,356 patients were ancestry-matched with 16,512 cancer-free controls, and gene-level enrichment of rare germline PVs was assessed in 143 CPGs, followed by an investigation of somatic events in matching tumor samples. MAIN OUTCOMES AND MEASURES Gene-level burden of rare germline PVs, identification of secondary somatic events accompanying the germline PVs, and characterization of less-explored types of rare germline PVs in RCC patients. RESULTS In clear cell RCC (n = 976 patients), patients exhibited a significantly higher prevalence of PVs in VHL compared with controls (OR: 39.1, 95% CI: 7.01-218.07, p-value: 4.95e-05, q-value: 0.00584). In non-clear cell RCC (n = 380 patients), patients carried an enriched burden of PVs in FH (OR: 77.9, 95% CI: 18.68-324.97, p-value: 1.55e-08, q-value: 1.83e-06) and MET (OR: 1.98e11, 95% CI: 0-inf, p-value: 2.07e-05, q-value: 3.50e-07).
In a CHEK2-focused analysis with European cases and controls, clear cell RCC patients (n = 906 European patients) harbored nominal enrichment of the previously reported low-penetrance CHEK2 variants p.Ile157Thr (OR: 1.84, 95% CI: 1.00-3.36, p-value: 0.049) and p.Ser428Phe (OR: 5.20, 95% CI: 1.00-26.40, p-value: 0.045), while non-clear cell RCC patients (n = 295 European patients) exhibited nominal enrichment of CHEK2 loss-of-function (LOF) germline PVs (OR: 3.51, 95% CI: 1.10-11.10, p-value: 0.033). RCC patients with germline PVs in FH, MET, and VHL had a significantly earlier age of cancer onset than patients without any germline PVs in CPGs (mean: 46.0 vs 60.2 years, Tukey-adjusted p-value < 0.0001), and more than half had secondary somatic events affecting the same gene (n = 10/15, 66.7%, 95% CI: 38.7-87.0%). Conversely, patients with rare germline PVs in CHEK2 had a similar age of disease onset to patients without any identified germline PVs in CPGs (mean: 60.1 vs 60.2 years, Tukey-adjusted p-value: 0.99), and only 30.4% of these patients carried secondary somatic events in CHEK2 (n = 7/23, 95% CI: 14.1-53.0%). Finally, rare pathogenic germline cryptic splice variants, underexplored in RCC, were identified in SDHA and TSC1, and rare pathogenic germline CNVs were found in 18 patients, including CNVs in FH, SDHA, and VHL. CONCLUSIONS AND RELEVANCE This systematic analysis supports the existing link between several RCC risk genes and elevated RCC risk manifesting as an earlier age of onset. Our analysis calls for caution when assessing the role of germline PVs in CHEK2 because of the burden of founder variants whose frequency varies across ancestry groups. It also broadens the RCC germline landscape of pathogenicity to incorporate previously understudied types of germline variants, such as cryptic splice variants and CNVs.
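The odds ratios and 95% confidence intervals above summarize 2×2 tables of variant carriers versus non-carriers among cases and controls. A minimal sketch of the textbook Wald (Woolf) interval on the log-odds scale follows; the abstract does not state which test or interval the authors used, and the function name and counts are hypothetical. The zero-cell guard is relevant because estimates like the MET result (OR 1.98e11, 95% CI 0-inf) are characteristic of sparse tables, likely with an empty cell, where this interval cannot be computed.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: cases carrying a variant      b: cases not carrying one
    c: controls carrying a variant   d: controls not carrying one
    """
    if min(a, b, c, d) == 0:
        # With an empty cell the log-OR or its standard error is infinite
        raise ValueError("empty cell: use an exact or penalized method instead")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical counts, not data from the study
or_, lower, upper = odds_ratio_wald(10, 90, 5, 95)
print(round(or_, 3))  # 2.111
```

Because the interval is symmetric on the log scale, its bounds multiply to the squared OR; very wide intervals such as 18.68-324.97 arise when carrier counts are small.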
Affiliation(s)
- Seunghun Han
- Ph.D. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sabrina Y. Camp
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hoyin Chu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ryan Collins
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Riaz Gillani
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Boston Children’s Hospital, Boston, MA, USA
- Jihye Park
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ziad Bakouny
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cora A. Ricker
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Brendan Reardon
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicholas Moore
- Department of Therapeutic Radiology, Yale School of Medicine, New Haven, CT, USA
- Eric Kofman
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Chris Labaki
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- David Braun
- Center of Molecular and Cellular Oncology, Yale School of Medicine, New Haven, CT, USA
- Toni K. Choueiri
- Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Brigham and Women’s Hospital, Boston, MA, USA
- Saud H. AlDubayan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics, Brigham and Women’s Hospital, Boston, MA, USA
- College of Medicine, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Eliezer M. Van Allen
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Cancer Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, MA, USA
8
Lee S, Hahn G, Hecker J, Lutz SM, Mullin K, Hide W, Bertram L, DeMeo DL, Tanzi RE, Lange C, Prokopenko D. A comparison between similarity matrices for principal component analysis to assess population stratification in sequenced genetic data sets. Brief Bioinform 2023; 24:bbac611. [PMID: 36585781 PMCID: PMC9851291 DOI: 10.1093/bib/bbac611] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/13/2022] [Revised: 12/07/2022] [Accepted: 12/11/2022] [Indexed: 01/01/2023]
Abstract
Genetic similarity matrices are commonly used to assess population substructure (PS) in genetic studies. Through simulation studies and application to whole-genome sequencing (WGS) data, we evaluate the performance of three genetic similarity matrices: the unweighted and weighted Jaccard similarity matrices and the genetic relationship matrix. We describe scenarios that can create numerical pitfalls and, in some instances, lead to incorrect conclusions. We consider scenarios in which PS is assessed based on loci located across the genome ('globally') and based on loci from a specific genomic region ('locally'). We also compare scenarios in which PS is evaluated based on loci from different minor allele frequency bins: common (>5%), low-frequency (0.5-5%) and rare (<0.5%) single-nucleotide variations (SNVs). Overall, we observe that all approaches provide the best clustering performance when computed from rare SNVs. The performance of the similarity matrices is very similar for common and low-frequency variants, but for rare variants the unweighted Jaccard matrix provides preferable clustering features: based on visual inspection and standard clustering metrics, its clusters in the principal component analysis of rare SNVs are the densest and best separated compared with the other methods and allele frequency cutoffs. In an application, we assess the role of rare variants in local and global PS, using WGS data from multiethnic Alzheimer's disease data sets and European or East Asian populations from the 1000 Genomes Project.
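The unweighted Jaccard similarity between two individuals is the number of variants both carry divided by the number of variants either carries. A minimal sketch of building that matrix from minor-allele counts follows; the function name, the toy genotypes, and the convention of assigning similarity 0 when neither individual carries any variant are assumptions for illustration. A PCA would then be run on this matrix (typically after centering), which is not shown and may differ from the paper's exact procedure.

```python
from typing import List, Sequence


def jaccard_matrix(genotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Unweighted Jaccard similarity between individuals.

    genotypes[i][v] is the minor-allele count (0, 1, 2) of individual i at variant v.
    Each individual is reduced to the set of variants where it carries >= 1 minor allele.
    """
    carried = [{v for v, g in enumerate(row) if g > 0} for row in genotypes]
    n = len(carried)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            union = carried[i] | carried[j]
            # Assumed convention: a pair carrying no variants at all gets similarity 0
            s = len(carried[i] & carried[j]) / len(union) if union else 0.0
            sim[i][j] = sim[j][i] = s
    return sim


# Hypothetical genotypes: 3 individuals x 4 variants
toy = [[1, 0, 2, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
for row in jaccard_matrix(toy):
    print(row)
```

Because only shared carrier status counts, shared absence of a rare allele (the most common state) does not inflate similarity, which is one intuition for why this matrix behaves well on rare SNVs.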
Affiliation(s)
- Sanghun Lee
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Department of Medical Consilience, Division of Medicine, Graduate School, Dankook University, South Korea
- NH Institute for Natural Product Research, Myungji Hospital, South Korea
- Georg Hahn
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Julian Hecker
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Sharon M Lutz
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Kristina Mullin
- Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Winston Hide
- Harvard Medical School, Boston, MA, USA
- Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Lars Bertram
- Lübeck Interdisciplinary Platform for Genome Analytics, University of Lübeck, Lübeck, Germany
- Department of Psychology, University of Oslo, Oslo, Norway
- Dawn L DeMeo
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Rudolph E Tanzi
- Harvard Medical School, Boston, MA, USA
- Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Christoph Lange
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Dmitry Prokopenko
- Harvard Medical School, Boston, MA, USA
- Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
9
Kruijver M, Kelly H, Bright JA, Buckleton J. Evaluating DNA Mixtures with Contributors from Different Populations Using Probabilistic Genotyping. Genes (Basel) 2022; 14:40. [PMID: 36672780 PMCID: PMC9858364 DOI: 10.3390/genes14010040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 12/15/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022]
Abstract
It is common practice to evaluate DNA profiling evidence with likelihood ratios using allele frequency estimates from a relevant population. When multiple populations may be relevant, a choice has to be made. For two-person mixtures without dropout, it has been reported that conservative estimates can be obtained by using the Person of Interest’s population with a θ value of 3%. More accurate estimates can be obtained by explicitly modelling different populations. One option is to present the minimum likelihood ratio across populations; another is to present a stratified likelihood ratio that incorporates a weighted average of likelihoods across multiple populations. For high-template single-source profiles, any difference between the methods is immaterial as far as conclusions are concerned. We revisit this issue in the context of potentially low-level and mixed samples where the contributors may originate from different populations, and we study likelihood ratio behaviour. We first present a method for evaluating DNA profiling evidence using probabilistic genotyping when the contributors may originate from different ethnic groups. In this method, likelihoods are weighted across a prior distribution that assigns sample donors to ethnic groups. The prior distribution can be constrained such that all sample donors are from the same ethnic group, or all permutations can be considered. A simulation study is used to determine the effect of either assumption on the likelihood ratio. The likelihood ratios are also compared with the minimum likelihood ratio across populations. We demonstrate that the common practice of taking the minimum likelihood ratio across populations is not always conservative when F_ST = 0. Population stratification methods may also be non-conservative in some cases. When F_ST > 0 is used in the likelihood ratio calculations, as is recommended, all compared approaches become conservative on average, to varying degrees.
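The two summaries compared in this abstract can be written directly in terms of per-population likelihoods. The minimum likelihood ratio takes the smallest of the population-specific ratios, while one common formulation of the stratified likelihood ratio divides the prior-weighted average likelihood under the prosecution hypothesis (Hp) by the prior-weighted average likelihood under the defence hypothesis (Hd). The sketch below assumes the per-population likelihoods are already available (for example, from probabilistic genotyping software) and that F_ST = 0; the function names, weights, and likelihood values are hypothetical.

```python
from typing import Sequence


def min_lr(lik_hp: Sequence[float], lik_hd: Sequence[float]) -> float:
    """Smallest of the population-specific likelihood ratios."""
    return min(p / d for p, d in zip(lik_hp, lik_hd))


def stratified_lr(lik_hp: Sequence[float], lik_hd: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Prior-weighted average likelihood under Hp divided by that under Hd."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("population weights must sum to 1")
    num = sum(w * p for w, p in zip(weights, lik_hp))
    den = sum(w * d for w, d in zip(weights, lik_hd))
    return num / den


# Hypothetical likelihoods for two populations (F_ST = 0 assumed)
hp = [1.0, 1.0]
hd = [1e-6, 1e-4]
w = [0.5, 0.5]
print(min_lr(hp, hd), stratified_lr(hp, hd, w))
```

With these toy numbers the stratified value (about 1.98 × 10⁴) lies above the minimum (10⁴), showing how the two summaries can differ; neither calculation here includes the θ correction the abstract recommends.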
Affiliation(s)
- Maarten Kruijver
- Institute of Environmental Science and Research, Auckland 1142, New Zealand
- Hannah Kelly
- Institute of Environmental Science and Research, Auckland 1142, New Zealand
- Jo-Anne Bright
- Institute of Environmental Science and Research, Auckland 1142, New Zealand
- John Buckleton
- Institute of Environmental Science and Research, Auckland 1142, New Zealand
- Department of Statistics, University of Auckland, Auckland 1142, New Zealand
10
Yan S, Sha Q, Zhang S. Control for population stratification in genetic association studies based on GWAS summary statistics. Genet Epidemiol 2022; 46:604-614. [PMID: 35766057 DOI: 10.1002/gepi.22493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 05/20/2022] [Accepted: 05/25/2022] [Indexed: 11/11/2022]
Abstract
Over the past years, genome-wide association studies (GWAS) have generated a wealth of new information. Summary data from many GWAS are now publicly available, promoting the development of many statistical methods for association studies based on GWAS summary statistics, which avoid the increasing challenges associated with sharing individual-level genotype and phenotype data. However, for population-based association studies such as GWAS, it has long been recognized that population stratification can seriously confound association results. Large GWAS are very likely to contain population stratification and cryptic relatedness, which inflate the Type I error rate of association tests. Although many methods have been developed to control for population stratification, only two of them can be used without individual-level data: one based on genomic control (GC) and the other based on linkage disequilibrium score regression (LDSC). However, the performance of these two approaches is currently unknown. In this study, we use extensive simulation studies, including populations with subpopulations, spatially structured populations, and populations with cryptic relatedness, to compare the performance of these two approaches in controlling for population stratification using only GWAS summary statistics. Data sets from Genetic Analysis Workshop 19 and the UK Biobank are also used to evaluate the two approaches. We demonstrate that the LDSC intercept can be used as a more accurate correction factor than GC. These results provide useful guidance for researchers who use GWAS summary statistics and need to control for population stratification.
Affiliation(s)
- Shijia Yan
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
11
Burt CH. Challenging the utility of polygenic scores for social science: Environmental confounding, downward causation, and unknown biology. Behav Brain Sci 2022; 46:e207. [PMID: 35551690 PMCID: PMC9653522 DOI: 10.1017/s0140525x22001145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
The sociogenomics revolution is upon us, we are told. Whether revolutionary or not, sociogenomics is poised to flourish given the ease of incorporating polygenic scores (or PGSs) as "genetic propensities" for complex traits into social science research. Pointing to evidence of ubiquitous heritability and the accessibility of genetic data, scholars have argued that social scientists have not only an opportunity but also a duty to add PGSs to social science research. Social science research that ignores genetics is, some proponents argue, at best partial and likely scientifically flawed, misleading, and wasteful. Here, I challenge arguments about the value of genetics for social science and, with it, the claimed necessity of incorporating PGSs into social science models as measures of genetic influences. In so doing, I discuss the impracticability of distinguishing genetic influences from environmental influences because of non-causal gene-environment correlations, especially population stratification, familial confounding, and downward causation. I explain how environmental effects masquerade as genetic influences in PGSs, which undermines their raison d'être as measures of genetic propensity, especially for complex socially contingent behaviors that are the subject of sociogenomics. Additionally, I draw attention to the partial, unknown biology, while highlighting the persistence of an implicit, unavoidable reductionist genes versus environments approach. Leaving sociopolitical and ethical concerns aside, I argue that the potential scientific rewards of adding PGSs to social science are few and greatly overstated and the scientific costs, which include obscuring structural disadvantages and cultural influences, outweigh these meager benefits for most social science applications.
Affiliation(s)
- Callie H Burt
- Department of Criminal Justice & Criminology, Center for Research on Interpersonal Violence (CRIV), Georgia State University, Atlanta, GA, USA ; www.callieburt.org
12
Fernández-Rhodes L, Graff M, Buchanan VL, Justice AE, Highland HM, Guo X, Zhu W, Chen HH, Young KL, Adhikari K, Palmer ND, Below JE, Bradfield J, Pereira AC, Glover L, Kim D, Lilly AG, Shrestha P, Thomas AG, Zhang X, Chen M, Chiang CW, Pulit S, Horimoto A, Krieger JE, Guindo-Martínez M, Preuss M, Schumann C, Smit RA, Torres-Mejía G, Acuña-Alonzo V, Bedoya G, Bortolini MC, Canizales-Quinteros S, Gallo C, González-José R, Poletti G, Rothhammer F, Hakonarson H, Igo R, Adler SG, Iyengar SK, Nicholas SB, Gogarten SM, Isasi CR, Papnicolaou G, Stilp AM, Qi Q, Kho M, Smith JA, Langefeld CD, Wagenknecht L, Mckean-Cowdin R, Gao XR, Nousome D, Conti DV, Feng Y, Allison MA, Arzumanyan Z, Buchanan TA, Ida Chen YD, Genter PM, Goodarzi MO, Hai Y, Hsueh W, Ipp E, Kandeel FR, Lam K, Li X, Nadler JL, Raffel LJ, Roll K, Sandow K, Tan J, Taylor KD, Xiang AH, Yao J, Audirac-Chalifour A, de Jesus Peralta Romero J, Hartwig F, Horta B, Blangero J, Curran JE, Duggirala R, Lehman DE, Puppala S, Fejerman L, John EM, Aguilar-Salinas C, Burtt NP, Florez JC, García-Ortíz H, González-Villalpando C, Mercader J, Orozco L, Tusié-Luna T, Blanco E, Gahagan S, Cox NJ, Hanis C, Butte NF, Cole SA, Comuzzie AG, Voruganti VS, Rohde R, Wang Y, Sofer T, Ziv E, Grant SF, Ruiz-Linares A, Rotter JI, Haiman CA, Parra EJ, Cruz M, Loos RJ, North KE. Ancestral diversity improves discovery and fine-mapping of genetic loci for anthropometric traits-The Hispanic/Latino Anthropometry Consortium. HGG Adv 2022; 3:100099. [PMID: 35399580 PMCID: PMC8990175 DOI: 10.1016/j.xhgg.2022.100099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Accepted: 03/06/2022] [Indexed: 02/05/2023]
Abstract
Hispanic/Latinos have been underrepresented in genome-wide association studies (GWAS) for anthropometric traits despite their notable anthropometric variability, ancestry proportions, and high burden of growth stunting and overweight/obesity. To address this knowledge gap, we analyzed densely imputed genetic data in a sample of Hispanic/Latino adults to identify and fine-map genetic variants associated with body mass index (BMI), height, and BMI-adjusted waist-to-hip ratio (WHRadjBMI). We conducted a GWAS of 18 studies/consortia as part of the Hispanic/Latino Anthropometry (HISLA) Consortium (stage 1, n = 59,771) and generalized our findings in 9 additional studies (stage 2, n = 10,538). We conducted a trans-ancestral GWAS with summary statistics from HISLA stage 1 and existing consortia of European and African ancestries. In our HISLA stage 1 + 2 analyses, we discovered one BMI locus, as well as two BMI signals and another height signal each within established anthropometric loci. In our trans-ancestral meta-analysis, we discovered three BMI loci, one height locus, and one WHRadjBMI locus. We also identified 3 secondary signals for BMI, 28 for height, and 2 for WHRadjBMI in established loci. We show that 336 known BMI, 1,177 known height, and 143 known WHRadjBMI (combined) SNPs demonstrated suggestive transferability (nominal significance and effect estimate directional consistency) in Hispanic/Latino adults. Of these, 36 BMI, 124 height, and 11 WHRadjBMI SNPs were significant after trait-specific Bonferroni correction. Trans-ancestral meta-analysis of the three ancestries showed a small-to-moderate impact of uncorrected population stratification on the resulting effect size estimates. Our findings demonstrate that future studies may also benefit from leveraging diverse ancestries and differences in linkage disequilibrium patterns to discover novel loci and additional signals with less residual population stratification.
Affiliation(s)
- Lindsay Fernández-Rhodes
- Department of Biobehavioral Health, Pennsylvania State University, 219 Biobehavioral Health Building, University Park, PA 16802, USA
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Victoria L. Buchanan
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Anne E. Justice
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17822, USA
- Heather M. Highland
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Wanying Zhu
- Vanderbilt Genetics Institute, Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Hung-Hsin Chen
- Vanderbilt Genetics Institute, Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Kristin L. Young
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Kaustubh Adhikari
- School of Mathematics and Statistics, Faculty of Science, Technology, Engineering and Mathematics, The Open University, MK7 6AA Milton Keynes, UK
- Nicholette D. Palmer
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA
- Jennifer E. Below
- Vanderbilt Genetics Institute, Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jonathan Bradfield
- Center for Applied Genomics, Division of Human Genetics, Department of Pediatrics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Alexandre C. Pereira
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo, São Paulo 05508-220, Brazil
- LáShauntá Glover
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Daeeun Kim
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Adam G. Lilly
- Department of Sociology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Poojan Shrestha
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Alvin G. Thomas
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Xinruo Zhang
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Minhui Chen
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Charleston W.K. Chiang
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90007, USA
- Sara Pulit
- Vertex Pharmaceuticals, W2 6BD Oxford, UK
- Andrea Horimoto
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo, São Paulo 05508-220, Brazil
- Jose E. Krieger
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo, São Paulo 05508-220, Brazil
- Marta Guindo-Martínez
- The Charles Bronfman Institutes for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Novo Nordisk Center for Basic Metabolic Research, University of Copenhagen, 2200 Copenhagen, Denmark
- Michael Preuss
- The Charles Bronfman Institutes for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Claudia Schumann
- Hasso Plattner Institute, University of Potsdam, Digital Health Center, 14482 Potsdam, Germany
- Roelof A.J. Smit
- The Charles Bronfman Institutes for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Gabriela Torres-Mejía
- Department of Research in Cardiovascular Diseases, Diabetes Mellitus, and Cancer, Population Health Research Center, National Institute of Public Health, Cuernavaca, Morelos 62100, Mexico
- Gabriel Bedoya
- Molecular Genetics Investigation Group, University of Antioquia, Medellín 1226, Colombia
- Maria-Cátira Bortolini
- Department of Genetics, Federal University of Rio Grande do Sul, Porto Alegre 90040-060, Brazil
- Samuel Canizales-Quinteros
- Population Genomics Applied to Health Unit, The National Institute of Genomic Medicine and the Faculty of Chemistry at the National Autonomous University of Mexico, Mexico City 04510, Mexico
- Carla Gallo
- Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima 15102, Peru
- Rolando González-José
- Patagonian Institute of the Social and Human Sciences, Patagonian National Center, Puerto Madryn U9120, Argentina
- Giovanni Poletti
- Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima 15102, Peru
- Hakon Hakonarson
- Center for Applied Genomics, Division of Human Genetics, Department of Pediatrics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Robert Igo
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
- Sharon G. Adler
- Division of Nephrology and Hypertension, Harbor-University of California Los Angeles Medical Center, Torrance, CA 90502, USA
- Sudha K. Iyengar
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
- Susanne B. Nicholas
- Department of Medicine, David Geffen School of Medicine at University of California, Los Angeles, CA 90095, USA
- Carmen R. Isasi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Adrienne M. Stilp
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Qibin Qi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Minjung Kho
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Jennifer A. Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Carl D. Langefeld
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA
- Lynne Wagenknecht
- Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA
- Roberta Mckean-Cowdin
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA
- Xiaoyi Raymond Gao
- Department of Ophthalmology and Visual Sciences, Department of Biomedical Informatics, Division of Human Genetics, The Ohio State University, Columbus, OH 43210, USA
- Darryl Nousome
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA
- David V. Conti
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Ye Feng
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA
- Matthew A. Allison
- Department of Family Medicine, University of California, San Diego, CA 92161, USA
- Zorayr Arzumanyan
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Thomas A. Buchanan
- Department of Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Pauline M. Genter
- Department of Medicine, Division of Endocrinology, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Mark O. Goodarzi
- Division of Endocrinology, Diabetes, and Metabolism, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Yang Hai
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Willa Hsueh
- Department of Internal Medicine, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
- Eli Ipp
- Department of Medicine, David Geffen School of Medicine at University of California, Los Angeles, CA 90095, USA
- Department of Medicine, Division of Endocrinology, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Fouad R. Kandeel
- Department of Translational Research & Cellular Therapeutics, Beckman Research Institute of City of Hope, Duarte, CA 91010, USA
- Kelvin Lam
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Xiaohui Li
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Jerry L. Nadler
- Department of Pharmacology at New York Medical College School of Medicine, Valhalla, NY 10595, USA
- Leslie J. Raffel
- Division of Genetic and Genomic Medicine, Department of Pediatrics, University of California, Irvine, CA 92697, USA
- Kathryn Roll
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Kevin Sandow
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Jingyi Tan
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Anny H. Xiang
- Research and Evaluation Branch, Kaiser Permanente of Southern California, Pasadena, CA 91101, USA
- Jie Yao
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Astride Audirac-Chalifour
- Medical Research Unit in Biochemistry, Specialty Hospital, National Medical Center of the Twenty-First Century, Mexican Institute of Social Security, Mexico City 06725, Mexico
- Jose de Jesus Peralta Romero
- Medical Research Unit in Biochemistry, Specialty Hospital, National Medical Center of the Twenty-First Century, Mexican Institute of Social Security, Mexico City 06725, Mexico
- Fernando Hartwig
- Postgraduate Program in Epidemiology, Federal University of Pelotas, Pelotas 96010-610, Brazil
- Bernando Horta
- Postgraduate Program in Epidemiology, Federal University of Pelotas, Pelotas 96010-610, Brazil
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, University of Texas Rio Grande Valley, Brownsville and Edinburg, TX 78520 and 78539, USA
- Joanne E. Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, University of Texas Rio Grande Valley, Brownsville and Edinburg, TX 78520 and 78539, USA
- Ravindranath Duggirala
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, University of Texas Rio Grande Valley, Brownsville and Edinburg, TX 78520 and 78539, USA
- Donna E. Lehman
- Department of Medicine, School of Medicine, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Sobha Puppala
- Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27109, USA
- Laura Fejerman
- Department of Public Health Sciences, School of Medicine, and the Comprehensive Cancer Center, University of California Davis, Davis, CA 95616, USA
- Esther M. John
- Departments of Epidemiology & Population Health and Medicine-Oncology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Carlos Aguilar-Salinas
- Division of Nutrition, Salvador Zubirán National Institute of Health Sciences and Nutrition, Mexico City 14080, Mexico
- Noël P. Burtt
- Programs in Metabolism and Medical and Population Genetics, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Jose C. Florez
- Programs in Metabolism and Medical and Population Genetics, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Humberto García-Ortíz
- Laboratory of Immunogenomics and Metabolic Diseases, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Clicerio González-Villalpando
- Center for Diabetes Studies, Research Unit for Diabetes and Cardiovascular Risk, Center for Population Health Studies, National Institute of Public Health, Mexico City 14080, Mexico
- Josep Mercader
- Programs in Metabolism and Medical and Population Genetics, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Lorena Orozco
- Laboratory of Immunogenomics and Metabolic Diseases, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Teresa Tusié-Luna
- Molecular Biology and Medical Genomics Unit, Institute of Biomedical Research, The National Autonomous University of Mexico and the Salvador Zubirán National Institute of Health Sciences and Nutrition, Mexico City 14080, Mexico
- Estela Blanco
- Center for Community Health, Division of Academic General Pediatrics, University of California at San Diego, San Diego, CA 92093, USA
- Sheila Gahagan
- Center for Community Health, Division of Academic General Pediatrics, University of California at San Diego, San Diego, CA 92093, USA
- Nancy J. Cox
- Vanderbilt Genetics Institute, Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Craig Hanis
- University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Nancy F. Butte
- United States Department of Agriculture, Agricultural Research Service, The Children’s Nutrition Research Center, and the Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
- Shelley A. Cole
- Population Health Program, Texas Biomedical Research Institute, San Antonio, TX 78227, USA
- V. Saroja Voruganti
- Department of Nutrition and Nutrition Research Institute, University of North Carolina at Chapel Hill, Kannapolis, NC 28081, USA
- Rebecca Rohde
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yujie Wang
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Tamar Sofer
- Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Elad Ziv
- Division of General Internal Medicine, Department of Medicine, Helen Diller Family Comprehensive Cancer Center, Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94115, USA
- Struan F.A. Grant
- Center for Applied Genomics, Division of Human Genetics, Department of Pediatrics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Andres Ruiz-Linares
- Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai 200438, China
- Department of Genetics, Evolution and Environment, and Genetics Institute of the University College London, London WC1E 6BT, UK
- Laboratory of Biocultural Anthropology, Law, Ethics, and Health, Aix-Marseille University, Marseille 13385, France
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
- Christopher A. Haiman
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Esteban J. Parra
- Department of Anthropology, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada
- Miguel Cruz
- Medical Research Unit in Biochemistry, Specialty Hospital, National Medical Center of the Twenty-First Century, Mexican Institute of Social Security, Mexico City 06725, Mexico
- Ruth J.F. Loos
- The Charles Bronfman Institutes for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kari E. North
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
13
Cheng S, Lyu J, Shi X, Wang K, Wang Z, Deng M, Sun B, Wang C. Rare variant association tests for ancestry-matched case-control data based on conditional logistic regression. Brief Bioinform 2022; 23:6502553. [PMID: 35021184 DOI: 10.1093/bib/bbab572] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/08/2021] [Revised: 11/29/2021] [Accepted: 12/13/2021] [Indexed: 12/13/2022]
Abstract
With the increasing volume of human sequencing data available, analysis incorporating external controls has become a popular and cost-effective approach to boost statistical power in disease association studies. To prevent spurious association due to population stratification, it is important to match the ancestry backgrounds of cases and controls. However, rare variant association tests based on a standard logistic regression model are conservative when all ancestry-matched strata have the same case-control ratio and might become anti-conservative when the case-control ratio varies across strata. Under the conditional logistic regression (CLR) model, we propose a weighted burden test (CLR-Burden), a variance component test (CLR-SKAT) and a hybrid test (CLR-MiST). We show that the CLR model coupled with ancestry matching is a general approach to control for population stratification, regardless of the spatial distribution of disease risks. Through extensive simulation studies, we demonstrate that the CLR-based tests robustly control type I error under different matching schemes and are more powerful than the standard Burden, SKAT and MiST tests. Furthermore, because CLR-based tests allow for different case-control ratios across strata, a full-matching scheme can be employed to efficiently use all available cases and controls and accelerate the discovery of disease-associated genes.
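As a rough illustration of what a burden test under conditional logistic regression involves, the sketch below computes the score test of beta = 0 for a single weighted burden covariate, conditioning on the number of cases in each ancestry-matched stratum. Under that conditioning, the case set in a stratum is a random subset of fixed size, so the score is "observed minus expected burden among cases" and its variance is that of sampling without replacement. This is a minimal sketch under stated assumptions (no extra covariates, user-supplied variant weights, normal approximation for the p-value); the function name `clr_burden_score_test` is hypothetical, and it is not the authors' CLR-Burden implementation.

```python
import numpy as np
from math import erfc, sqrt


def clr_burden_score_test(y, G, strata, weights=None):
    """Score test of beta = 0 for a weighted burden covariate under conditional
    logistic regression, conditioning on the number of cases per stratum.

    y       : 0/1 case-control status, shape (n,)
    G       : rare-variant genotypes (0/1/2 minor-allele counts), shape (n, p)
    strata  : ancestry-matched stratum label per individual, shape (n,)
    weights : per-variant weights (hypothetical choice; defaults to all ones)
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    strata = np.asarray(strata)
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    x = G @ w  # weighted burden score per individual

    U = 0.0  # score: observed minus expected burden among cases
    V = 0.0  # null variance of the score
    for s in np.unique(strata):
        idx = strata == s
        n_s = int(idx.sum())
        m_s = y[idx].sum()
        if m_s == 0 or m_s == n_s:
            continue  # all-case or all-control strata drop out of the conditional likelihood
        xs = x[idx]
        U += np.sum((y[idx] - m_s / n_s) * xs)
        # variance of a sum of m_s draws without replacement from the n_s burden scores
        V += m_s * (n_s - m_s) / (n_s * (n_s - 1)) * np.sum((xs - xs.mean()) ** 2)

    if V <= 0:
        return U, V, float("nan")  # no burden variation within informative strata
    z = U / sqrt(V)
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value from the standard normal
    return U, V, p


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 300
    strata = rng.integers(0, 5, size=n)
    y = rng.binomial(1, 0.2 + 0.1 * strata / 4)  # case fraction differs by stratum
    G = rng.binomial(2, 0.01, size=(n, 20))
    print(clr_burden_score_test(y, G, strata))
```

Because each stratum contributes through its own case count and size only, strata with different case-control ratios are handled without extra modeling, which is the property the abstract highlights for full matching.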
Affiliation(s)
- Shanshan Cheng
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Jingjing Lyu
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Xian Shi
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Kai Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Zengmiao Wang
- Center for Quantitative Biology, Peking University, Beijing 100871, P. R. China
- Minghua Deng
- Center for Quantitative Biology, Peking University, Beijing 100871, P. R. China
- LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, P. R. China
- Center for Statistical Sciences, Peking University, Beijing 100871, P. R. China
- Baoluo Sun
- Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore
- Chaolong Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Department of Orthopedic Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
14
Yu X, Ho K, Shen Z, Fu X, Huang H, Wu D, Lin Y, Lin Y, Chen W, Su M, Qiu C, Zhuang X, Su Z. The Association of Human Leukocyte Antigen and COVID-19 in Southern China. Open Forum Infect Dis 2021; 8:ofab410. [PMID: 34552996 PMCID: PMC8436377 DOI: 10.1093/ofid/ofab410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2021] [Accepted: 08/06/2021] [Indexed: 12/24/2022]
Abstract
Human leukocyte antigen (HLA) polymorphism is hypothesized to be associated with diverse immune responses toward infectious diseases. Herein, by using multiple subpopulation groups as controls, we confirmed that HLA-B*15:27 and HLA-DRB1*04:06 were associated with coronavirus disease 2019 susceptibility in China. Both alleles were predicted to have weak binding affinities toward viral proteins.
Affiliation(s)
- Xueping Yu
- Department of Infectious Disease, The First Hospital of Quanzhou, affiliated to Fujian Medical University, Quanzhou, Fujian, China
- Kuoting Ho
- HI. Q Biomedical Laboratory, Quanzhou, Fujian, China
- School of Biomedical Science, Huaqiao University, Quanzhou, Fujian, China
- Zhongliang Shen
- Department of Infectious Disease, Huashan Hospital, Fudan University, Shanghai, China
- Xiaoying Fu
- HI. Q Biomedical Laboratory, Quanzhou, Fujian, China
- Hongbo Huang
- Department of Respiratory Disease, The First Hospital of Quanzhou, affiliated to Fujian Medical University, Quanzhou, Fujian, China
- Delun Wu
- HI. Q Biomedical Laboratory, Quanzhou, Fujian, China
- Yancheng Lin
- HI. Q Biomedical Laboratory, Quanzhou, Fujian, China
- Yijian Lin
- Department of Respiratory Disease, The First Hospital of Quanzhou, affiliated to Fujian Medical University, Quanzhou, Fujian, China
- Wenhuang Chen
- Department of Infectious Disease, The First Hospital of Quanzhou, affiliated to Fujian Medical University, Quanzhou, Fujian, China
- Milong Su
- Department of Clinical Laboratory, The First Hospital of Quanzhou, affiliated to Fujian Medical University, Quanzhou, Fujian, China
- Chao Qiu
- Department of Infectious Disease, Huashan Hospital, Fudan University, Shanghai, China
- Xibin Zhuang
- Department of Respiratory Disease, The First Hospital of Quanzhou, affiliated to Fujian Medical University, Quanzhou, Fujian, China
- Zhijun Su
- Department of Infectious Disease, The First Hospital of Quanzhou, affiliated to Fujian Medical University, Quanzhou, Fujian, China
15
Brinster R, Scherer D, Lorenzo Bermejo J. Optimal selection of genetic variants for adjustment of population stratification in European association studies. Brief Bioinform 2021; 21:753-761. [PMID: 30863848 DOI: 10.1093/bib/bbz023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/11/2018] [Revised: 01/24/2019] [Accepted: 02/10/2019] [Indexed: 01/14/2023]
Abstract
Population stratification is usually corrected using principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. The need to genotype only a small number of genetic variants that show large differences in allele frequency among subpopulations (so-called ancestry-informative markers, AIMs) instead of the whole genome for stratification adjustment could represent an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure (IN-AIMs), the combination of PCA and F-statistics, PCA-correlated measurement and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify type I error rate and statistical power in different case-control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models increased type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical and the first two robust PCs achieved the lowest (although still inflated) type I error rates, followed at some distance by the first eight IN-AIMs.
Affiliation(s)
- Regina Brinster
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, Heidelberg, Germany
- Dominique Scherer
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, Heidelberg, Germany
- Justo Lorenzo Bermejo
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, Heidelberg, Germany

16
Mullaert J, Bouaziz M, Seeleuthner Y, Bigio B, Casanova JL, Alcaïs A, Abel L, Cobat A. Taking population stratification into account by local permutations in rare-variant association studies on small samples. Genet Epidemiol 2021; 45:821-829. [PMID: 34402542 DOI: 10.1002/gepi.22426] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/25/2020] [Revised: 06/07/2021] [Accepted: 07/15/2021] [Indexed: 11/08/2022]
Abstract
Many methods for rare variant association studies require permutations to assess the significance of tests. Standard permutations assume that all individuals are exchangeable and do not take population stratification (PS), a known confounding factor in genetic studies, into account. We propose a novel strategy, LocPerm, in which individual phenotypes are permuted only with their closest ancestry-based neighbors. We performed a simulation study, focusing on small samples, to evaluate and compare LocPerm with standard permutations and classical adjustment on first principal components. Under the null hypothesis, LocPerm was the only method providing an acceptable type I error, regardless of sample size and level of stratification. The power of LocPerm was similar to that of standard permutation in the absence of PS, and remained stable in different PS scenarios. We conclude that LocPerm is a method of choice for taking PS and/or small sample size into account in rare variant association studies.
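The idea of permuting phenotypes only among ancestry-based neighbours can be illustrated with a deliberately simplified stand-in: individuals are sorted on a single ancestry coordinate (for example PC1), split into consecutive blocks, and phenotypes are shuffled only within each block. This is a sketch under stated assumptions, not the LocPerm algorithm itself, which defines a neighbourhood around each individual; the function names, the block size and the covariance statistic are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def within_block_permutation(ancestry: Sequence[float], phenotypes: Sequence[float],
                             block_size: int, rng: random.Random) -> List[float]:
    """Shuffle phenotypes only among individuals with adjacent ancestry coordinates."""
    if len(ancestry) != len(phenotypes) or block_size < 1:
        raise ValueError("inputs must have equal length and block_size must be >= 1")
    order = sorted(range(len(ancestry)), key=lambda i: ancestry[i])
    permuted = list(phenotypes)
    for start in range(0, len(order), block_size):
        block = order[start:start + block_size]
        values = [phenotypes[i] for i in block]
        rng.shuffle(values)  # permutation restricted to this ancestry block
        for idx, value in zip(block, values):
            permuted[idx] = value
    return permuted


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance, used here as a toy burden-phenotype association statistic."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def local_permutation_pvalue(burden: Sequence[float], phenotypes: Sequence[float],
                             ancestry: Sequence[float], block_size: int = 4,
                             n_perm: int = 999, seed: int = 1) -> float:
    """Two-sided permutation p-value with the usual +1 correction."""
    rng = random.Random(seed)
    observed = abs(covariance(burden, phenotypes))
    hits = 0
    for _ in range(n_perm):
        permuted = within_block_permutation(ancestry, phenotypes, block_size, rng)
        # Small tolerance so permutations that reproduce the observed value count as ties.
        if abs(covariance(burden, permuted)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

One consequence of this design: a burden score that is constant within every ancestry block yields a p-value of 1, because shuffling within blocks cannot break an association that is entirely explained by ancestry. Standard permutations, by contrast, shuffle across all individuals and would treat such a pattern as signal.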
Affiliation(s)
- Jimmy Mullaert
- Université de Paris, IAME, INSERM, Paris, France; AP-HP, Hôpital Bichat, DEBRC, Paris, France; Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France
- Matthieu Bouaziz
- Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France; Université de Paris, Imagine Institute, Paris, EU, France
- Yoann Seeleuthner
- Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France; Université de Paris, Imagine Institute, Paris, EU, France
- Benedetta Bigio
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, New York, USA
- Jean-Laurent Casanova
- Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France; Université de Paris, Imagine Institute, Paris, EU, France; St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, New York, USA; Howard Hughes Medical Institute, New York, New York, USA
- Alexandre Alcaïs
- Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France; Université de Paris, Imagine Institute, Paris, EU, France
- Laurent Abel
- Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France; Université de Paris, Imagine Institute, Paris, EU, France; St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, New York, USA
- Aurélie Cobat
- Laboratory of Human Genetics of Infectious Diseases, Paris, EU, France; Université de Paris, Imagine Institute, Paris, EU, France

17
Arriaga-MacKenzie IS, Matesi G, Chen S, Ronco A, Marker KM, Hall JR, Scherenberg R, Khajeh-Sharafabadi M, Wu Y, Gignoux CR, Null M, Hendricks AE. Summix: A method for detecting and adjusting for population structure in genetic summary data. Am J Hum Genet 2021; 108:1270-1282. [PMID: 34157305 PMCID: PMC8322937 DOI: 10.1016/j.ajhg.2021.05.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/05/2021] [Accepted: 05/26/2021] [Indexed: 12/11/2022]
Abstract
Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
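The core of this deconvolution is fitting observed summary allele frequencies as a mixture of reference allele frequencies. With exactly two reference groups, the least-squares mixing proportion has a closed form, sketched below; Summix itself fits several reference groups jointly under the constraint that the proportions are non-negative and sum to one, so this is a simplified illustration rather than its implementation. The function name and the toy frequencies are hypothetical.

```python
from typing import Sequence


def two_way_ancestry_proportion(observed: Sequence[float], ref_a: Sequence[float],
                                ref_b: Sequence[float]) -> float:
    """Least-squares estimate of pi in: observed ~ pi * ref_a + (1 - pi) * ref_b."""
    if not (len(observed) == len(ref_a) == len(ref_b)):
        raise ValueError("all allele-frequency vectors must have the same length")
    # Rearranged model: observed - ref_b = pi * (ref_a - ref_b), a regression through the origin.
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference panels are identical; pi is not identifiable")
    # The objective is a convex quadratic in pi, so clipping gives the constrained optimum.
    return min(1.0, max(0.0, num / den))


# Hypothetical reference allele frequencies and an exact 84/16 mixture of them.
ref_afr = [0.80, 0.10, 0.55, 0.30]
ref_eur = [0.20, 0.60, 0.15, 0.35]
mixed = [0.84 * a + 0.16 * e for a, e in zip(ref_afr, ref_eur)]
print(round(two_way_ancestry_proportion(mixed, ref_afr, ref_eur), 6))  # → 0.84
```

SNPs whose reference frequencies barely differ contribute little to the estimate, because they enter the denominator through (ref_a - ref_b)^2; in practice, more differentiated SNPs carry most of the information.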
Affiliation(s)
- Gregory Matesi
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Samuel Chen
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Alexandria Ronco
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Katie M Marker
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jordan R Hall
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Ryan Scherenberg
- Business School, University of Colorado Denver, Denver, CO 80204, USA
- Yinfei Wu
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Christopher R Gignoux
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA
- Megan Null
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA; Mathematics and Physical Sciences, The College of Idaho, Caldwell, ID 83605, USA
- Audrey E Hendricks
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA

18
Reisetter AC, Breheny P. Penalized linear mixed models for structured genetic data. Genet Epidemiol 2021; 45:427-444. [PMID: 33998038 DOI: 10.1002/gepi.22384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2020] [Revised: 03/19/2021] [Accepted: 03/29/2021] [Indexed: 11/12/2022]
Abstract
Many genetic studies that aim to identify genetic variants associated with complex phenotypes are subject to unobserved confounding factors arising from environmental heterogeneity. This poses a challenge to detecting associations of interest and is known to induce spurious associations when left unaccounted for. Penalized linear mixed models (LMMs) are an attractive method to correct for unobserved confounding. These methods correct for varying levels of relatedness and population structure by modeling them as a random effect with a covariance structure estimated from observed genetic data. Despite an extensive literature on penalized regression and LMMs separately, the two are rarely discussed together. The aim of this review is to do so while examining the statistical properties of penalized LMMs in the genetic association setting. Specifically, the ability of penalized LMMs to accurately estimate genetic effects in the presence of environmental confounding has not been well studied. To clarify the important yet subtle distinction between population structure and environmental heterogeneity, we present a detailed review of relevant concepts and methods. In addition, we evaluate the performance of penalized LMMs and competing methods in terms of estimation and selection accuracy in the presence of a number of confounding structures.
Affiliation(s)
- Anna C Reisetter
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
- Patrick Breheny
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA

19
Ratnasekera P, McNeney B. Re-analysis of a Genome-Wide Gene-By-Environment Interaction Study of Case Parent Trios, Adjusted for Population Stratification. Front Genet 2021; 11:600232. [PMID: 33519903 PMCID: PMC7838675 DOI: 10.3389/fgene.2020.600232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/29/2020] [Accepted: 12/08/2020] [Indexed: 11/13/2022]
Abstract
We investigate the impact of confounding on the results of a genome-wide association analysis by Beaty et al., which identified multiple single nucleotide polymorphisms that appeared to modify the effect of maternal smoking, alcohol consumption, or multivitamin supplementation on the risk of cleft palate. The study sample of case-parent trios was primarily of European and East Asian ancestry, and the distribution of all three exposures differed by ancestral group. Such differences raise the possibility that confounders, rather than the exposures, are the risk modifiers, and hence that the inferred gene-environment (G×E) interactions may be spurious. Our analyses generally confirmed the results of Beaty et al. and suggest that the G×E interactions are driven by the European trios, whereas the East Asian trios were less informative.
Affiliation(s)
- Pulindu Ratnasekera
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
- Brad McNeney
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada

20
Abstract
A simulation study demonstrates a better method for separating genetic effects from environmental effects in genome-wide association studies, but there is still some way to go before this becomes a "solved" problem.
Affiliation(s)
- Jennifer Blanc
- Human Genetics, University of Chicago, Chicago, United States
- Jeremy J Berg
- Human Genetics, University of Chicago, Chicago, United States

21
Abstract
Population stratification continues to bias the results of genome-wide association studies (GWAS). When these results are used to construct polygenic scores, even subtle biases can cumulatively lead to large errors. To study the effect of residual stratification, we simulated GWAS under realistic models of demographic history. We show that when population structure is recent, it cannot be corrected using principal components of common variants because they are uninformative about recent history. Consequently, polygenic scores are biased in that they recapitulate environmental structure. Principal components calculated from rare variants or identity-by-descent segments can correct this stratification for some types of environmental effects. While family-based studies are immune to stratification, the hybrid approach of ascertaining variants in GWAS but reestimating effect sizes in siblings reduces but does not eliminate stratification. We show that the effect of population stratification depends not only on allele frequencies and environmental structure but also on demographic history.
Affiliation(s)
- Arslan A Zaidi
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States

22
Chen Y, Peloso GM, Liu CT, DeStefano AL, Dupuis J. Evaluation of population stratification adjustment using genome-wide or exonic variants. Genet Epidemiol 2020; 44:702-716. [PMID: 32608112 PMCID: PMC7722041 DOI: 10.1002/gepi.22332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/13/2019] [Revised: 03/13/2020] [Accepted: 06/18/2020] [Indexed: 11/11/2022]
Abstract
Population stratification may cause inflated type I error and spurious associations when assessing the association between genetic variants and an outcome. Many genetic association studies now use exonic variants, which capture only 1% of the genome; however, population stratification adjustments have not been evaluated in the context of exonic variants. We compare the performance of two established approaches, principal components analysis (PCA) and mixed-effects models, and assess the utility of genome-wide (GW) and exonic variants by simulation and using a data set from the Framingham Heart Study. Our results illustrate that although the PCs and genetic relationship matrices computed from GW and exonic markers differ, the type I error rate of association tests for common variants with additive effects appears to be properly controlled in the presence of population stratification. In addition, by considering single nucleotide variants (SNVs) with different levels of confounding by population stratification, we compare the power of multiple association approaches that account for population stratification, such as PC-based corrections and mixed-effects models. We find that while these two methods achieve similar power for SNVs with a low or medium level of confounding by population stratification, mixed-effects models can reach higher power for SNVs highly confounded by population stratification.
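Both approaches compared in this study can be built on a genetic relationship matrix (GRM): mixed-effects models use it as the covariance of a random effect, and PCs can be obtained as its leading eigenvectors. The sketch below assumes one common construction, standardising each SNP by its sample allele frequency under Hardy-Weinberg scaling and averaging cross-products, G = Z Zᵀ / M, with monomorphic SNPs dropped; it then extracts PC1 by power iteration to stay dependency-free. It is an illustration, not the implementation used in the study, and the function names and toy genotypes are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardized_grm(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """GRM = Z Z^T / M from 0/1/2 genotypes (rows = individuals, columns = SNPs)."""
    n, m = len(genotypes), len(genotypes[0])
    columns = []
    for j in range(m):
        col = [genotypes[i][j] for i in range(n)]
        p = sum(col) / (2 * n)          # sample ALT-allele frequency
        var = 2 * p * (1 - p)           # Hardy-Weinberg genotype variance
        if var == 0:
            continue                    # skip monomorphic SNPs
        columns.append([(g - 2 * p) / math.sqrt(var) for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    k = len(columns)
    return [[sum(z[i] * z[l] for z in columns) / k for l in range(n)] for i in range(n)]


def leading_eigenvector(mat: Matrix, n_iter: int = 200, seed: int = 0) -> Tuple[float, List[float]]:
    """Power iteration: (largest eigenvalue, unit eigenvector) of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in mat]  # random start avoids the null space of G
    for _ in range(n_iter):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in mat]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(row[j] * v[j] for j in range(len(v))) for row in mat]
    eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
    return eigenvalue, v


# Hypothetical genotypes: individuals 0-2 and 3-5 come from two differentiated groups.
geno = [[2, 2, 0], [2, 2, 0], [2, 1, 0], [0, 0, 2], [0, 1, 2], [0, 0, 2]]
grm = standardized_grm(geno)
lam, pc1 = leading_eigenvector(grm)
print([round(x, 2) for x in pc1])
```

The sign of an eigenvector is arbitrary, so only the relative signs of the PC1 entries matter; here they separate the two groups. For real data an optimized linear-algebra library would replace the pure-Python loops.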
Affiliation(s)
- Yuning Chen
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
- Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
- Ching-Ti Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
- Anita L DeStefano
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts
- Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts

23
Hahn G, Lutz SM, Hecker J, Prokopenko D, Cho MH, Silverman EK, Weiss ST, Lange C. locStra: Fast analysis of regional/global stratification in whole-genome sequencing studies. Genet Epidemiol 2020; 45:82-98. [PMID: 32929743 DOI: 10.1002/gepi.22356] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/13/2020] [Revised: 08/05/2020] [Accepted: 08/24/2020] [Indexed: 01/08/2023]
Abstract
locStra is an R package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region of the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding-window approach, the regional similarity matrices are compared with the global ones, based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for specifying the window size is provided. Because the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on a single core, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.
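Two ingredients named in this abstract, the unweighted genetic Jaccard similarity between individuals and a sliding window over variants, can be sketched in a few lines. This plain-Python illustration assumes binary carrier indicators (1 if an individual carries the variant) and a fixed window size and step; it is not the package's sparse C++ implementation, and the function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def jaccard_matrix(carriers: Sequence[Sequence[int]]) -> List[List[float]]:
    """Unweighted Jaccard similarity between individuals from 0/1 carrier indicators.

    Rows are individuals, columns are variants; J(i, k) = |shared| / |union| of carried variants.
    """
    sets = [{j for j, x in enumerate(row) if x} for row in carriers]
    n = len(sets)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            union = sets[i] | sets[k]
            # Assumed convention: two individuals carrying no variant at all get similarity 1.
            sim[i][k] = len(sets[i] & sets[k]) / len(union) if union else 1.0
    return sim


def sliding_windows(n_variants: int, size: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) column ranges covering the variants in overlapping windows."""
    for start in range(0, max(n_variants - size, 0) + 1, step):
        yield start, min(start + size, n_variants)


# Hypothetical carrier matrix: 3 individuals x 6 variants.
carriers = [[1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0], [0, 0, 0, 1, 0, 1]]
global_sim = jaccard_matrix(carriers)
regional = [jaccard_matrix([row[s:e] for row in carriers]) for s, e in sliding_windows(6, 4, 2)]
```

Each regional matrix has the same individuals-by-individuals shape as the global one, which is what allows a metric such as the correlation of leading eigenvectors to compare them window by window.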
Affiliation(s)
- Georg Hahn
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
- Sharon M Lutz
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
- Julian Hecker
- Department of Medicine, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, USA
- Dmitry Prokopenko
- Massachusetts General Hospital, Harvard University, Boston, Massachusetts, USA
- Michael H Cho
- Department of Medicine, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, USA
- Edwin K Silverman
- Department of Medicine, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, USA
- Scott T Weiss
- Department of Medicine, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, USA
- Christoph Lange
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA

24
Abegaz F, Chaichoompu K, Génin E, Fardo DW, König IR, Mahachie John JM, Van Steen K. Principals about principal components in statistical genetics. Brief Bioinform 2020; 20:2200-2216. [PMID: 30219892 DOI: 10.1093/bib/bby081] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/18/2018] [Revised: 07/21/2018] [Accepted: 08/12/2018] [Indexed: 12/13/2022]
Abstract
Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables, while explaining as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding of the different implementations of PCA and of their impact on study results, compared with alternative approaches, is required. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variant analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.

25
Pereira V, Santangelo R, Børsting C, Tvedebrink T, Almeida APF, Carvalho EF, Morling N, Gusmão L. Evaluation of the Precision of Ancestry Inferences in South American Admixed Populations. Front Genet 2020; 11:966. [PMID: 32973885 PMCID: PMC7472784 DOI: 10.3389/fgene.2020.00966] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/31/2020] [Accepted: 07/31/2020] [Indexed: 11/13/2022]
Abstract
Ancestry informative markers (AIMs) are used in forensic genetics to infer the biogeographical ancestry (BGA) of individuals and may also have a prominent role in future police and identification investigations. In the last few years, many studies have reported new AIM sets. These sets include markers (usually around 100 or fewer) selected for different purposes and at different population resolutions. Regardless of the ability of these sets to separate populations from different continents or regions, the uncertainty associated with the estimates they provide and their capacity to accurately report the different ancestral contributions in individuals from admixed populations have rarely been investigated. This study addresses this issue by evaluating different AIM sets. Ancestry inference was carried out in admixed South American populations, at both the population and the individual level. The results of ancestry inferences using AIM sets with different numbers of markers among admixed reference populations were compared. To evaluate the performance of the different ancestry panels at the individual level, expected and observed estimates among families and their offspring were compared, considering that (1) the apportionment of ancestry in the offspring should be close to the average ancestry of the parents, and (2) full siblings should present similar ancestry values. The results illustrate the importance of a good balance not only between the number of markers and their ability to differentiate ancestral populations, but also in the differentiation among reference groups, to obtain more precise values of genetic ancestry. This work also highlights the importance of estimating the errors associated with the use of a limited number of markers. We demonstrate that although these errors have a moderate effect at the population level, they may have an important impact at the individual level.
Considering that many AIM sets are being described for inferences at the individual level rather than at the population level, e.g., in association studies or the determination of a suspect's BGA, the results of this work point to the need for a more careful evaluation of the uncertainty associated with ancestry estimates in admixed populations when small AIM sets are used.
Affiliation(s)
- Vania Pereira
- Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Roberta Santangelo
- Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Claus Børsting
- Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Torben Tvedebrink
- Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark
- Ana Paula F Almeida
- DNA Diagnostic Laboratory, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Elizeu F Carvalho
- DNA Diagnostic Laboratory, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Niels Morling
- Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Leonor Gusmão
- DNA Diagnostic Laboratory, State University of Rio de Janeiro, Rio de Janeiro, Brazil; Instituto de Investigação e Inovação em Saúde, i3S, Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal

26
Chen M, Sidore C, Akiyama M, Ishigaki K, Kamatani Y, Schlessinger D, Cucca F, Okada Y, Chiang CWK. Evidence of Polygenic Adaptation in Sardinia at Height-Associated Loci Ascertained from the Biobank Japan. Am J Hum Genet 2020; 107:60-71. [PMID: 32533944 PMCID: PMC7332648 DOI: 10.1016/j.ajhg.2020.05.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/25/2019] [Accepted: 05/19/2020] [Indexed: 01/31/2023]
Abstract
Adult height is one of the earliest putative examples of polygenic adaptation in humans. However, this conclusion was recently challenged because residual uncorrected stratification from large-scale consortium studies was considered responsible for the previously noted genetic difference. It thus remains an open question whether height loci exhibit signals of polygenic adaptation in any human population. We re-examined this question, focusing on one of the shortest European populations, the Sardinians, in addition to mainland European populations. We utilized height-associated loci from the Biobank Japan (BBJ) dataset to further alleviate concerns of biased ascertainment of GWAS loci and showed that the Sardinians remain significantly shorter than expected under neutrality (∼0.22 standard deviation shorter than Utah residents with ancestry from northern and western Europe [CEU] on the basis of polygenic height scores, p = 3.89 × 10⁻⁴). We also found the trajectory of polygenic height scores between the Sardinian and the British populations diverged over at least the last 10,000 years (p = 0.0082), consistent with a signature of polygenic adaptation driven primarily by the Sardinian population. Although the polygenic score-based analysis showed a much subtler signature in mainland European populations, we found a clear and robust adaptive signature in the UK population by using a haplotype-based statistic, the trait singleton density score (tSDS), driven by the height-increasing alleles (p = 9.1 × 10⁻⁴). In summary, by ascertaining height loci in a distant East Asian population, we further supported the evidence of polygenic adaptation at height-associated loci among the Sardinians. In mainland Europeans, the adaptive signature was detected in haplotype-based analysis but not in polygenic score-based analysis.
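The comparison "∼0.22 standard deviation shorter on the basis of polygenic height scores" rests on an additive polygenic score and a difference in group means expressed in standard-deviation units. The sketch below shows that arithmetic under two assumptions the abstract does not spell out: the score is a plain sum of effect size times effect-allele dosage, and the difference is scaled by the standard deviation of the pooled scores. The function names and toy data are hypothetical; the study's own standardization may differ.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def polygenic_score(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Additive score: sum over SNPs of effect size times effect-allele dosage (0 to 2)."""
    if len(dosages) != len(betas):
        raise ValueError("one dosage per effect size is required")
    return sum(b * d for b, d in zip(betas, dosages))


def mean_difference_in_sd(group_a: Sequence[Sequence[float]], group_b: Sequence[Sequence[float]],
                          betas: Sequence[float]) -> float:
    """(mean score of A - mean score of B) / SD of all scores pooled (an assumed scaling)."""
    scores_a: List[float] = [polygenic_score(g, betas) for g in group_a]
    scores_b: List[float] = [polygenic_score(g, betas) for g in group_b]
    pooled_sd = pstdev(scores_a + scores_b)
    if pooled_sd == 0:
        raise ValueError("scores do not vary; the difference cannot be scaled")
    return (mean(scores_a) - mean(scores_b)) / pooled_sd


# Hypothetical effect sizes and dosages for two SNPs in two small groups.
betas = [0.5, 0.2]
group_a = [[2, 0], [2, 0]]
group_b = [[0, 0], [0, 0]]
print(mean_difference_in_sd(group_a, group_b, betas))  # → 2.0
```

A negative value means group A has the lower mean score, which is the direction reported for the Sardinians relative to CEU.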
Affiliation(s)
- Minhui Chen
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Carlo Sidore
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche, Monserrato 09042, Cagliari, Italy
- Masato Akiyama
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Department of Ocular Pathology and Imaging Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka 812-8582, Japan
- Kazuyoshi Ishigaki
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
- Yoichiro Kamatani
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Kyoto-McGill International Collaborative School in Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto 606-8501, Japan
- David Schlessinger
- Laboratory of Genetics and Genomics, National Institute on Aging, US National Institutes of Health, Baltimore, MD 21224, USA
- Francesco Cucca
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche, Monserrato 09042, Cagliari, Italy
- Yukinori Okada
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka 565-0871, Japan
- Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Quantitative and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA

27
Naderi S, Moradi MH, Farhadian M, Yin T, Jaeger M, Scheper C, Korkuc P, Brockmann GA, König S, May K. Assessing selection signatures within and between selected lines of dual-purpose black and white and German Holstein cattle. Anim Genet 2020; 51:391-408. [PMID: 32100321 DOI: 10.1111/age.12925] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 02/02/2020] [Indexed: 12/29/2022]
Abstract
The aim of this study was to detect selection signatures considering cows from the German Holstein (GH) and the local dual-purpose black and white (DSN) population, as well as from generated sub-populations. The 4654 GH and 261 DSN cows were genotyped with the BovineSNP50 Genotyping BeadChip. The geographical herd location was used as an environmental descriptor to create the East-DSN and West-DSN sub-populations. In addition, two further sub-populations of GH cows were generated, using the extreme values for solutions of residual effects of cows for the claw disorder dermatitis digitalis. These groups represented the most susceptible and most resistant cows. We used the cross-population extended haplotype homozygosity (XP-EHH) method to identify the most recent selection signatures. Furthermore, we calculated Wright's fixation index (FST). Chromosomal segments for the top 0.1 percentile of negative or positive XP-EHH scores were studied in detail. For gene annotations, we used the Ensembl database and considered a window of 250 kbp downstream and upstream of each core SNP corresponding to peaks of XP-EHH. In addition, functional interactions among potential candidate genes were inferred via gene network analyses. The most outstanding XP-EHH score was on chromosome 12 (at 77.34 Mb) for DSN and on chromosome 20 (at 36.29-38.42 Mb) for GH. Selection signature locations harbored QTL for several economically important milk and meat quality traits, reflecting the different breeding goals for GH and DSN. The average FST value between GH and DSN was quite low (0.068), indicating shared founders. For group stratifications according to cow health, several identified potential candidate genes influence disease resistance, especially to dermatitis digitalis.
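Wright's fixation index compares heterozygosity within populations with heterozygosity in the pooled population. The sketch below computes a simple Nei-style estimate, F_ST = (H_T - H_S) / H_T, for two populations from allele frequencies, combining SNPs as a ratio of sums rather than an average of per-SNP ratios. The abstract does not state which estimator the authors used; unbiased estimators such as Weir and Cockerham's additionally correct for sample size, which this sketch does not. The function name and toy frequencies are hypothetical.

```python
from typing import Sequence


def fst_ratio_of_sums(freqs_pop1: Sequence[float], freqs_pop2: Sequence[float]) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for two populations, summed over SNPs before dividing."""
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("both populations need one frequency per SNP")
    num = den = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)                       # expected heterozygosity, pooled
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population heterozygosity
        num += h_t - h_s
        den += h_t
    if den == 0:
        raise ValueError("all SNPs are monomorphic in the pooled sample")
    return num / den


# Hypothetical ALT-allele frequencies at three SNPs in two cattle populations.
pop_x = [0.20, 0.55, 0.70]
pop_y = [0.40, 0.50, 0.60]
print(round(fst_ratio_of_sums(pop_x, pop_y), 4))
```

Identical frequencies give 0 and fixed differences give 1; values near 0.07, like the GH-DSN average reported above, indicate that most genetic variation is shared between the populations.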
Affiliation(s)
- S Naderi
- Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany
- M H Moradi
- Department of Animal Sciences, Arak University, Shahid Beheshti Street, Arak, Iran
- M Farhadian
- Department of Animal Science, University of Tabriz, 29 Bahman Boulevard, Tabriz, Iran
- T Yin
- Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany
- M Jaeger
- Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany
- C Scheper
- Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany
- P Korkuc
- Albrecht Daniel Thaer Institute for Agricultural and Horticultural Sciences, Humboldt University Berlin, Invalidenstr. 42, Berlin, D-10115, Germany
- G A Brockmann
- Albrecht Daniel Thaer Institute for Agricultural and Horticultural Sciences, Humboldt University Berlin, Invalidenstr. 42, Berlin, D-10115, Germany
- S König
- Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany
- K May
- Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany

28
An J, Won S, Lutz SM, Hecker J, Lange C. Effect of population stratification on SNP-by-environment interaction. Genet Epidemiol 2019; 43:1046-1055. [PMID: 31429121 DOI: 10.1002/gepi.22250] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/21/2019] [Revised: 06/04/2019] [Accepted: 07/11/2019] [Indexed: 11/10/2022]
Abstract
False-positive rates in genome-wide association analysis are affected by population stratification, and if stratification is not correctly adjusted for, the statistical analysis can produce large numbers of false-negative findings. Various approaches have therefore been proposed to address this problem in genome-wide association studies. However, despite its importance, only a few studies have examined it in genome-wide single nucleotide polymorphism (SNP)-by-environment interaction studies. In this report, we illustrate the scenarios that can lead to inflated false-positive rates in association mapping and describe an approach to maintaining the overall type-1 error rate.
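One common, generic way to check whether stratification is inflating a genome-wide scan, including a scan of SNP-by-environment interaction terms, is the genomic inflation factor λ: the median observed 1-df chi-square statistic divided by its expected null median (about 0.4549). The Python sketch below assumes the input is a list of 1-df chi-square statistics for the interaction term. Genomic control is shown only for illustration; it is not necessarily the adjustment the authors propose, and the function names are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom (about 0.4549364).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats: list[float]) -> float:
    """Genomic inflation factor: observed median / expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats: list[float]) -> list[float]:
    """Deflate statistics by lambda; lambda is floored at 1 so statistics are never inflated."""
    lam = max(inflation_factor(chi2_stats), 1.0)
    return [c / lam for c in chi2_stats]


# Hypothetical interaction-term statistics from a small scan.
stats = [0.1, 2 * CHI2_1DF_MEDIAN, 9.0]
print(inflation_factor(stats), genomic_control(stats))
```

A λ close to 1 suggests little genome-wide inflation; a λ well above 1 in an interaction scan hints that stratification, or another confounder, remains uncorrected.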
Affiliation(s)
- Jaehoon An
- Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, South Korea
- Sungho Won
- Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, South Korea; Interdisciplinary Program for Bioinformatics, College of Natural Science, Seoul National University, Seoul, South Korea; Institute of Health and Environment, Seoul National University, Seoul, South Korea; Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Sharon M Lutz
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts
- Julian Hecker
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Christoph Lange
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts
29
Cabreros I, Storey JD. A Likelihood-Free Estimator of Population Structure Bridging Admixture Models and Principal Components Analysis. Genetics 2019; 212:1009-1029. [PMID: 31028112 PMCID: PMC6707457 DOI: 10.1534/genetics.119.302159] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/07/2019] [Accepted: 04/08/2019] [Indexed: 11/18/2022]
Abstract
We introduce a simple and computationally efficient method for fitting the admixture model of genetic population structure, called ALStructure. The strategy of ALStructure is to first estimate the low-dimensional linear subspace of the population admixture components, and then search for a model within this subspace that is consistent with the admixture model's natural probabilistic constraints. Central to this strategy is the observation that all models belonging to this constrained space of solutions are risk-minimizing and have equal likelihood, rendering any additional optimization unnecessary. The low-dimensional linear subspace is estimated through a recently introduced principal components analysis method that is appropriate for genotype data, thereby providing a solution that has both principal components and probabilistic admixture interpretations. Our approach differs fundamentally from other existing methods for estimating admixture, which aim to fit the admixture model directly by searching for parameters that maximize the likelihood function or the posterior probability. We observe that ALStructure typically outperforms existing methods both in accuracy and computational speed under a wide array of simulated and real human genotype datasets. Throughout this work, we emphasize that the admixture model is a special case of a much broader class of models for which algorithms similar to ALStructure may be successfully employed.
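The admixture model's probabilistic constraints require each individual's ancestry proportions to be non-negative and sum to 1, and each allele frequency to lie in [0, 1]. One generic way to map an unconstrained estimate onto that space is a Euclidean projection onto the probability simplex (the standard sort-based algorithm) combined with clipping of frequencies. The Python sketch below illustrates only that constraint step under these assumptions; it is not ALStructure's published algorithm, and the function names are hypothetical.

```python
def project_to_simplex(v: list[float]) -> list[float]:
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        # The condition holds for j = 1..rho and fails afterwards, so keep the last passing threshold.
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def clip_frequency(p: float) -> float:
    """Allele frequencies must lie in [0, 1]."""
    return min(max(p, 0.0), 1.0)


def constrain(ancestry_rows: list[list[float]], freqs: list[float]):
    """Project each individual's ancestry proportions onto the simplex and clip the frequencies."""
    return ([project_to_simplex(row) for row in ancestry_rows],
            [clip_frequency(p) for p in freqs])


# Hypothetical unconstrained estimates for one individual and three SNPs.
print(constrain([[-1.0, 0.5, 0.3]], [-0.1, 0.4, 1.3]))
```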
Affiliation(s)
- Irineo Cabreros
- Program in Applied and Computational Mathematics, Princeton University, New Jersey 08544
- John D Storey
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, New Jersey 08544
30
Kerminen S, Martin AR, Koskela J, Ruotsalainen SE, Havulinna AS, Surakka I, Palotie A, Perola M, Salomaa V, Daly MJ, Ripatti S, Pirinen M. Geographic Variation and Bias in the Polygenic Scores of Complex Diseases and Traits in Finland. Am J Hum Genet 2019; 104:1169-1181. [PMID: 31155286 PMCID: PMC6562021 DOI: 10.1016/j.ajhg.2019.05.001] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 11/06/2018] [Accepted: 04/29/2019] [Indexed: 12/12/2022]
Abstract
Polygenic scores (PSs) are becoming a useful tool to identify individuals with high genetic risk for complex diseases, and several projects are currently testing their utility for translational applications. It is also tempting to use PSs to assess whether genetic variation can explain a part of the geographic distribution of a phenotype. However, it is not well known how the population genetic properties of the training and target samples affect the geographic distribution of PSs. Here, we evaluate geographic differences, and related biases, of PSs in Finland in a geographically well-defined sample of 2,376 individuals from the National FINRISK study. First, we detect geographic differences in PSs for coronary artery disease (CAD), rheumatoid arthritis, schizophrenia, waist-hip ratio (WHR), body-mass index (BMI), and height, but not for Crohn disease or ulcerative colitis. Second, we use height as a model trait to thoroughly assess the possible population genetic biases in PSs and apply similar approaches to the other phenotypes. Most importantly, we detect suspiciously large accumulations of geographic differences for CAD, WHR, BMI, and height, suggesting bias arising from the population's genetic structure rather than from a direct genotype-phenotype association. This work demonstrates how sensitive the geographic patterns of current PSs are to small biases even within relatively homogeneous populations and provides simple tools to identify such biases. A thorough understanding of the effects of population genetic structure on PSs is essential for translational applications of PSs.
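A polygenic score is typically a weighted sum of effect-allele dosages, with the weights taken from a training GWAS. The Python sketch below assumes hard-called dosages (0, 1 or 2) already aligned to the effect allele and uses hypothetical weights and region labels; it computes per-individual scores and regional means, the kind of summary whose geographic differences the study examines for bias.

```python
from collections import defaultdict


def polygenic_score(dosages: list[int], weights: list[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same SNPs")
    return sum(d * w for d, w in zip(dosages, weights))


def mean_score_by_region(individuals, weights):
    """individuals: iterable of (region_label, dosages); returns {region: mean score}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, dosages in individuals:
        totals[region] += polygenic_score(dosages, weights)
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}


# Hypothetical two-SNP score and three individuals from two regions.
weights = [0.5, -0.2]
people = [("East", [2, 1]), ("East", [0, 0]), ("West", [1, 2])]
print(mean_score_by_region(people, weights))
```

Differences between such regional means are exactly where the study warns that structure-driven bias in the training weights can look like genuine geographic variation.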
Affiliation(s)
- Sini Kerminen
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland
- Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Jukka Koskela
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland
- Sanni E Ruotsalainen
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland
- Aki S Havulinna
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland; National Institute of Health and Welfare, Helsinki 00271, Finland
- Ida Surakka
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland; Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Aarno Palotie
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Psychiatric and Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Markus Perola
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland; National Institute of Health and Welfare, Helsinki 00271, Finland
- Veikko Salomaa
- National Institute of Health and Welfare, Helsinki 00271, Finland
- Mark J Daly
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Samuli Ripatti
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland; Department of Public Health, University of Helsinki, Helsinki 00014, Finland
- Matti Pirinen
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki 00014, Finland; Department of Public Health, University of Helsinki, Helsinki 00014, Finland; Helsinki Institute for Information Technology and Department of Mathematics and Statistics, University of Helsinki, Helsinki 00014, Finland
31
Panarella M, Burkett KM. A Cautionary Note on the Effects of Population Stratification Under an Extreme Phenotype Sampling Design. Front Genet 2019; 10:398. [PMID: 31130982 PMCID: PMC6509877 DOI: 10.3389/fgene.2019.00398] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/31/2018] [Accepted: 04/12/2019] [Indexed: 11/13/2022]
Abstract
Extreme phenotype sampling (EPS) is a popular study design used to reduce genotyping or sequencing costs. Assuming continuous phenotype data are available on a large cohort, EPS involves genotyping or sequencing only those individuals with extreme phenotypic values. Although this design has been shown to have high power to detect genetic effects even at smaller sample sizes, little attention has been paid to the effects of confounding variables, and in particular population stratification. Using extensive simulations, we demonstrate that the false positive rate under the EPS design is greatly inflated relative to a random sample of equal size or a “case-control”-like design where the cases are from one phenotypic extreme and the controls randomly sampled. The inflated false positive rate is observed even with allele frequency and phenotype mean differences taken from European population data. We show that the effects of confounding are not reduced by increasing the sample size. We also show that including the top principal components in a logistic regression model is sufficient for controlling the type 1 error rate using data simulated with a population genetics model and using 1,000 Genomes genotype data. Our results suggest that when an EPS study is conducted, it is crucial to adjust for all confounding variables. For genetic association studies this requires genotyping a sufficient number of markers to allow for ancestry estimation. Unfortunately, this could increase the costs of a study if sequencing or genotyping was only planned for candidate genes or pathways; the available genetic data would not be suitable for ancestry correction as many of the variants could have a true association with the trait.
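The inflation mechanism can be reproduced with a small simulation. The Python sketch below assumes two equally sized subpopulations that differ both in allele frequency at a null SNP (0.2 vs 0.5) and in mean phenotype; it genotypes only the lowest and highest 10% of a cohort and compares allele counts between the two tails with a Pearson chi-square test. All parameter values and function names are hypothetical, chosen for illustration rather than taken from the study's simulations.

```python
import random


def allelic_chi2(low: list[int], high: list[int]) -> float:
    """Pearson chi-square on the 2x2 table of allele counts (low tail vs high tail)."""
    table = [[sum(low), 2 * len(low) - sum(low)],
             [sum(high), 2 * len(high) - sum(high)]]
    total = sum(sum(row) for row in table)
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for row in table:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            chi2 += (row[j] - expected) ** 2 / expected
    return chi2


def simulate_eps(mean_shift: float, n: int = 20_000, tail: float = 0.1,
                 p_a: float = 0.2, p_b: float = 0.5, seed: int = 1) -> float:
    """Chi-square for a null SNP when only the phenotypic extremes are genotyped."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        in_b = rng.random() < 0.5                                  # subpopulation, unobserved by the test
        p = p_b if in_b else p_a
        genotype = int(rng.random() < p) + int(rng.random() < p)   # SNP has no effect on phenotype
        phenotype = rng.gauss(mean_shift if in_b else 0.0, 1.0)    # subpopulations differ in mean only
        cohort.append((phenotype, genotype))
    cohort.sort()                                                  # order individuals by phenotype
    k = int(n * tail)
    low = [g for _, g in cohort[:k]]
    high = [g for _, g in cohort[-k:]]
    return allelic_chi2(low, high)


print(simulate_eps(1.0), simulate_eps(0.0))
```

With `mean_shift=1.0` the null SNP typically yields a very large statistic, because each tail is dominated by one subpopulation; with `mean_shift=0.0` the statistic behaves like an ordinary 1-df chi-square.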
Affiliation(s)
- Michela Panarella
- Department of Biology, University of Ottawa, Ottawa, ON, Canada; Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
- Kelly M Burkett
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
32
Ahsan A, Monir M, Meng X, Rahaman M, Chen H, Chen M. Identification of epistasis loci underlying rice flowering time by controlling population stratification and polygenic effect. DNA Res 2019; 26:119-130. [PMID: 30590457 PMCID: PMC6476725 DOI: 10.1093/dnares/dsy043] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/17/2018] [Accepted: 11/21/2018] [Indexed: 01/28/2023]
Abstract
Flowering time is an important agronomic trait, determined by multiple genes, gene-gene interactions and environmental factors. Population stratification and polygenic effects might confound the genetic effects of the causal loci underlying this complex trait. We proposed a two-step approach for detecting epistatic interactions underlying rice flowering time that accounts for population structure and polygenic effects. Simulation studies showed that this approach performs better than classical and PC-linear approaches in terms of power and false discovery rate in the presence of population stratification and polygenic effects. Whole-genome epistasis analyses identified 589 putative genetic interactions for flowering time. Eighteen of these interactions are located within 10 kilobases of regions of known protein-protein interactions. Thirty-seven SNPs lie near twenty-five genes involved in the rice and/or Arabidopsis (orthologue) flowering pathway. Bioinformatics analysis showed that the gene pairs of 66.55% of the identified interactions (392 of the 589 interactions) share similarity in various genomic features. Moreover, significant numbers of the detected epistatic genes are highly expressed in different floral tissues. Our findings highlight the importance of epistasis analysis that controls for population stratification and polygenic effects, and provide novel insights into the genetic architecture of rice flowering that could assist breeding programmes.
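The basic idea of testing a SNP-SNP interaction while adjusting for structure can be sketched as a regression of the phenotype on both SNPs, their product, and the top principal components of the genotype matrix. This NumPy sketch is a simplified stand-in for the authors' two-step approach: it adjusts for population structure through PCs only and does not model the polygenic (kinship) effect. The function names and simulated parameters are hypothetical.

```python
import numpy as np


def top_pcs(genotypes: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal component scores of an (individuals x SNPs) genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def interaction_t(y, snp_a, snp_b, covariates) -> float:
    """OLS t statistic for the a*b term in y ~ 1 + covariates + a + b + a*b."""
    n = len(y)
    X = np.column_stack([np.ones(n), covariates, snp_a, snp_b, snp_a * snp_b])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    var_beta = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[-1] / np.sqrt(var_beta[-1, -1]))


# Hypothetical data: two subpopulations differing in allele frequency and phenotype mean.
rng = np.random.default_rng(0)
n, m = 500, 200
pop = rng.integers(0, 2, n)
freqs = np.where(pop[:, None] == 1, 0.6, 0.4) * np.ones((1, m))
G = rng.binomial(2, freqs).astype(float)
a, b = G[:, 0], G[:, 1]
y = 1.0 * a * b + 1.0 * pop + rng.normal(size=n)   # true interaction plus a stratified mean shift
t = interaction_t(y, a, b, top_pcs(G, 2))
print(t)
```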
Affiliation(s)
- Asif Ahsan
- The State Key Laboratory of Plant Physiology and Biochemistry, Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Mamun Monir
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Xianwen Meng
- The State Key Laboratory of Plant Physiology and Biochemistry, Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Matiur Rahaman
- The State Key Laboratory of Plant Physiology and Biochemistry, Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh
- Hongjun Chen
- The State Key Laboratory of Plant Physiology and Biochemistry, Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Ming Chen
- The State Key Laboratory of Plant Physiology and Biochemistry, Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
33
Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, Chiang CWK, Hirschhorn J, Daly MJ, Patterson N, Neale B, Mathieson I, Reich D, Sunyaev SR. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 2019; 8:e39702. [PMID: 30895926 PMCID: PMC6428571 DOI: 10.7554/elife.39702] [Citation(s) in RCA: 199] [Impact Index Per Article: 39.8] [Received: 07/02/2018] [Accepted: 01/15/2019] [Indexed: 01/03/2023]
Abstract
Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
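Why sub-significant SNPs amplify the problem can be seen in a toy model. Assume every SNP has zero true effect, but each GWAS estimate picks up a small stratification term proportional to the SNP's allele-frequency difference along an uncorrected ancestry axis, and the two populations being compared differ along that same axis. The Python sketch below (all distributions, parameter values and names hypothetical) sums 2 × estimated effect × frequency difference over SNPs, which is the expected difference in mean polygenic score between the two populations.

```python
import random


def mean_ps_difference(n_snps: int, strat_bias: float, seed: int = 7) -> float:
    """Expected mean polygenic-score difference between two populations in a toy model.

    Every SNP has zero true effect. Hypothetical quantities per SNP:
      axis_diff - allele-frequency difference along an uncorrected ancestry axis in the GWAS
      test_diff - frequency difference between the two compared populations (shares that axis)
      beta_hat  - GWAS estimate = sampling noise + strat_bias * axis_diff
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_snps):
        axis_diff = rng.gauss(0.0, 0.05)
        test_diff = axis_diff + rng.gauss(0.0, 0.02)
        beta_hat = rng.gauss(0.0, 0.01) + strat_bias * axis_diff
        total += 2 * beta_hat * test_diff   # 2 x frequency difference = mean dosage difference
    return total


for m in (1_000, 10_000):
    print(m, mean_ps_difference(m, strat_bias=0.0), mean_ps_difference(m, strat_bias=0.2))
```

Because the stratification term has the same expected sign at every SNP, it accumulates roughly linearly with the number of SNPs, while pure estimation noise grows only like its square root; in this toy model, adding many sub-significant SNPs therefore amplifies a spurious between-population difference.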
Affiliation(s)
- Mashaal Sohail
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Robert M Maier
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
- Andrea Ganna
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Alex Bloemendal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
- Alicia R Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
- Michael C Turchin
- Center for Computational Molecular Biology, Brown University, Providence, United States
- Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
- Charleston WK Chiang
- Department of Preventive Medicine, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, United States
- Joel Hirschhorn
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Departments of Pediatrics and Genetics, Harvard Medical School, Boston, United States
- Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, United States
- Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Nick Patterson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Department of Genetics, Harvard Medical School, Boston, United States
- Benjamin Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- David Reich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Department of Genetics, Harvard Medical School, Boston, United States
- Howard Hughes Medical Institute, Harvard Medical School, Boston, United States
- Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, United States
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
34
Lee JJ, McGue M, Iacono WG, Chow CC. The accuracy of LD Score regression as an estimator of confounding and genetic correlations in genome-wide association studies. Genet Epidemiol 2018; 42:783-795. [PMID: 30251275 DOI: 10.1002/gepi.22161] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 12/21/2017] [Revised: 08/03/2018] [Accepted: 08/07/2018] [Indexed: 01/03/2023]
Abstract
To infer that a single-nucleotide polymorphism (SNP) either affects a phenotype or is in linkage disequilibrium with a causal site, we must have some assurance that any SNP-phenotype correlation is not the result of confounding with environmental variables that also affect the trait. Here we study the properties of linkage disequilibrium (LD) Score regression, a recently developed method for using summary statistics from genome-wide association studies to ensure that confounding does not inflate the number of false positives. We do not treat the effects of genetic variation as a random variable and are thus able to obtain results about the unbiasedness of this method. We demonstrate that LD Score regression can produce estimates of confounding at null SNPs that are unbiased or conservative under fairly general conditions. This robustness holds in the case of the parent genotype affecting the offspring phenotype through some environmental mechanism, despite the resulting correlation over SNPs between LD Scores and the degree of confounding. Additionally, we demonstrate that LD Score regression can produce reasonably robust estimates of the genetic correlation, even when its estimates of the genetic covariance and the two univariate heritabilities are substantially biased.
Affiliation(s)
- James J Lee
- Department of Psychology, University of Minnesota Twin Cities, Minneapolis, Minnesota
- Matt McGue
- Department of Psychology, University of Minnesota Twin Cities, Minneapolis, Minnesota
- William G Iacono
- Department of Psychology, University of Minnesota Twin Cities, Minneapolis, Minnesota
- Carson C Chow
- Mathematical Biology Section, Laboratory of Biological Modeling, NIDDK, National Institutes of Health, Bethesda, Maryland
35
Li M, He Z, Tong X, Witte JS, Lu Q. Detecting Rare Mutations with Heterogeneous Effects Using a Family-Based Genetic Random Field Method. Genetics 2018; 210:463-476. [PMID: 30104420 DOI: 10.1534/genetics.118.301266] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/04/2018] [Accepted: 07/29/2018] [Indexed: 01/19/2023]
Abstract
The genetic etiology of many complex diseases is highly heterogeneous. A complex disease can be caused by multiple mutations within the same gene or by mutations in multiple genes at various genomic loci. Although these disease-susceptibility mutations can be collectively common in the population, they are often individually rare or even private to certain families. Family-based studies are powerful for detecting rare variants enriched in families, which is an important feature for sequencing studies given the heterogeneous nature of rare variants. In addition, family designs can provide robust protection against population stratification. Nevertheless, statistical methods for analyzing family-based sequencing data are underdeveloped, especially those accounting for the heterogeneous etiology of complex diseases. In this article, we introduce a random field framework for detecting gene-phenotype associations in family-based sequencing studies, referred to as family-based genetic random field (FGRF). Similar to existing family-based association tests, FGRF can use within-family and between-family information separately or jointly to test an association. We demonstrate that FGRF has statistical power comparable to existing methods when there is no genetic heterogeneity, but can improve statistical power when there is genetic heterogeneity across families. The proposed method also shares the advantages of conventional family-based association tests (e.g., robustness to population stratification). Finally, we applied the proposed method to sequencing data from the Minnesota Twin Family Study and revealed several genes, including SAMD14, potentially associated with alcohol dependence.
36
Naret O, Chaturvedi N, Bartha I, Hammer C, Fellay J. Correcting for Population Stratification Reduces False Positive and False Negative Results in Joint Analyses of Host and Pathogen Genomes. Front Genet 2018; 9:266. [PMID: 30105048 PMCID: PMC6078058 DOI: 10.3389/fgene.2018.00266] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/15/2018] [Accepted: 07/02/2018] [Indexed: 11/23/2022]
Abstract
Studies of host genetic determinants of pathogen sequence variations can identify sites of genomic conflicts, by highlighting variants that are implicated in immune response on the host side and adaptive escape on the pathogen side. However, systematic genetic differences in host and pathogen populations can lead to inflated type I (false positive) and type II (false negative) error rates in genome-wide association analyses. Here, we demonstrate through a simulation that correcting for both host and pathogen stratification reduces spurious signals and increases power to detect real associations in a variety of tested scenarios. We confirm the validity of the simulations by showing comparable results in an analysis of paired human and HIV genomes.
Affiliation(s)
- Olivier Naret
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nimisha Chaturvedi
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Istvan Bartha
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Christian Hammer
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jacques Fellay
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Precision Medicine Unit, Lausanne University Hospital, Lausanne, Switzerland
37
Yang J, Chen S, Abecasis G. Improved score statistics for meta-analysis in single-variant and gene-level association studies. Genet Epidemiol 2018; 42:333-343. [PMID: 29696691 DOI: 10.1002/gepi.22123] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/06/2017] [Revised: 03/04/2018] [Accepted: 03/16/2018] [Indexed: 01/09/2023]
Abstract
Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although standard meta-analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with varying case-control ratios. Here, we investigate the power loss of standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods that perform equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% of the power lost with the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration. In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association tests, providing a useful framework for ensuring well-powered, convenient, cross-study analyses.
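For reference, a standard form of score-based meta-analysis combines studies by summing per-study score statistics U_k and their variances V_k, giving the 1-df statistic (ΣU_k)² / ΣV_k. The Python sketch below shows that baseline for a quantitative trait with no covariates, assuming a null model y ~ 1 in each study. It does not implement the authors' improved meta-score statistics or their allele-frequency-based stratification correction, and the function names are hypothetical.

```python
import statistics


def study_score(genotypes: list[float], phenotypes: list[float]) -> tuple[float, float]:
    """Score U and its variance V for one study under the null linear model y ~ 1."""
    g_bar = statistics.fmean(genotypes)
    y_bar = statistics.fmean(phenotypes)
    sigma2 = statistics.pvariance(phenotypes)   # null-model residual variance (MLE form)
    u = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotypes)) / sigma2
    v = sum((g - g_bar) ** 2 for g in genotypes) / sigma2
    return u, v


def meta_score(studies: list[tuple[float, float]]) -> float:
    """Combine per-study (U_k, V_k) pairs into a 1-df chi-square statistic."""
    u = sum(s[0] for s in studies)
    v = sum(s[1] for s in studies)
    return u * u / v


# Hypothetical per-study summaries.
print(meta_score([study_score([0, 1, 2], [1.0, 2.0, 3.0]), (2.0, 4.0)]))
```

Because each study is centred on its own means, this baseline can diverge from a joint analysis of pooled individual-level data when studies are unbalanced, which is the gap the improved statistics target.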
Affiliation(s)
- Jingjing Yang
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America.,Department of Human Genetics, Center for Computational and Quantitative Genetics, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Sai Chen
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Gonçalo Abecasis
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
38
Duan Q, Xu Z, Raffield L, Chang S, Wu D, Lange EM, Reiner AP, Li Y. A robust and powerful two-step testing procedure for local ancestry adjusted allelic association analysis in admixed populations. Genet Epidemiol 2018; 42:288-302. [PMID: 29226381 PMCID: PMC5851818 DOI: 10.1002/gepi.22104] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/07/2017] [Revised: 09/07/2017] [Accepted: 10/20/2017] [Indexed: 12/23/2022]
Abstract
Genetic association studies in admixed populations allow us to gain a deeper understanding of the genetic architecture of human diseases and traits. However, population stratification, complicated linkage disequilibrium (LD) patterns, and the complex interplay of allelic and ancestry effects on phenotypic traits pose challenges in such analyses. These issues may lead to spurious associations and/or reduced statistical power. Fortunately, if handled appropriately, these same challenges provide unique opportunities for gene mapping. To address these challenges and exploit these opportunities, we propose a robust and powerful two-step testing procedure, Local Ancestry Adjusted Allelic (LAAA) association. In the first step, LAAA robustly captures associations due to allelic effect, ancestry effect, and interaction effect, allowing detection of effect heterogeneity across ancestral populations. In the second step, LAAA identifies the source of association, namely allelic, ancestry, or the combination. By jointly modeling allele, local ancestry, and ancestry-specific allelic effects, LAAA is highly powerful in capturing the presence of interaction between ancestry and allele effect. We evaluated the validity and statistical power of LAAA through simulations over a broad spectrum of scenarios. We further illustrated its usefulness by applying it to African American participants of the Candidate Gene Association Resource (CARe) to test for association with hemoglobin levels. We replicated loci previously identified by independent groups that would have been missed in CARe without joint testing. Moreover, the loci for which LAAA detected potential effect heterogeneity were replicated among African Americans from the Women's Health Initiative study. LAAA is freely available at https://yunliweb.its.unc.edu/LAAA.
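The joint modeling of allele, local ancestry, and ancestry-specific allelic effects can be pictured as three covariates per individual at a tested SNP. The sketch below is a hypothetical coding, assuming phased haplotypes that each carry a local-ancestry label (the labels "AFR" and "EUR" and the function name are illustrative); it is not the LAAA software. A first-step joint test would compare a regression of the trait on all three covariates against a model without them.

```python
from typing import Tuple

# One haplotype at the tested SNP: (copies of the tested allele, 0 or 1; local-ancestry label).
Haplotype = Tuple[int, str]


def joint_model_covariates(hap1: Haplotype, hap2: Haplotype, ancestry: str = "AFR") -> Tuple[int, int, int]:
    """Return (allele count, local-ancestry count, ancestry-specific allele count)."""
    haps = (hap1, hap2)
    # Total copies of the tested allele (0, 1 or 2), ignoring ancestry.
    allele_count = sum(allele for allele, _ in haps)
    # Number of the two haplotypes inherited from the chosen ancestral population.
    ancestry_count = sum(1 for _, label in haps if label == ancestry)
    # Copies of the tested allele sitting on haplotypes of the chosen ancestry;
    # this term lets the allele's effect differ by ancestral background.
    allele_on_ancestry = sum(allele for allele, label in haps if label == ancestry)
    return allele_count, ancestry_count, allele_on_ancestry


print(joint_model_covariates((1, "AFR"), (1, "EUR")))  # prints (2, 1, 1)
```

The third covariate is what allows an allele-by-ancestry interaction to be detected; with only the first two, the allele would be forced to have the same effect on every ancestral background.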
Affiliation(s)
- Qing Duan
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA
- Department of Statistics, University of North Carolina, Chapel Hill, NC, USA
- Zheng Xu
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE
- Initiative of Quantitative Life Sciences, University of Nebraska-Lincoln, Lincoln, NE
- Laura Raffield
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Suhua Chang
- Institute of Psychology, Chinese Academy of Science, Beijing, China
- Di Wu
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Periodontology, University of North Carolina, Chapel Hill, NC, USA
- Ethan M. Lange
- Department of Medicine, University of Colorado at Denver, Anschutz Medical Campus, Aurora, CO, USA
- Department of Biostatistics and Informatics, University of Colorado at Denver, Anschutz Medical Campus, Aurora, CO, USA
- Alex P. Reiner
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Yun Li
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
39
Martin ER, Tunc I, Liu Z, Slifer SH, Beecham AH, Beecham GW. Properties of global- and local-ancestry adjustments in genetic association tests in admixed populations. Genet Epidemiol 2017; 42:214-229. [PMID: 29288582 DOI: 10.1002/gepi.22103] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 03/24/2017] [Revised: 09/24/2017] [Accepted: 10/29/2017] [Indexed: 12/31/2022]
Abstract
Population substructure can lead to confounding in tests for genetic association, and failure to adjust properly can result in spurious findings. Here we address this issue of confounding by considering the impact of global ancestry (average ancestry across the genome) and local ancestry (ancestry at a specific chromosomal location) on regression parameters and relative power in ancestry-adjusted and -unadjusted models. We examine theoretical expectations under different scenarios for population substructure by applying different regression models, verify and generalize the results using simulations, and explore the findings in real-world admixed populations. We show that admixture does not lead to confounding when the trait locus is tested directly in a single admixed population. However, if there is more complex population structure, or if a marker locus in linkage disequilibrium (LD) with the trait locus is tested, both global and local ancestry can be confounders. Additionally, we show that the genotype parameters of the adjusted and unadjusted models all provide tests for LD between the marker and trait locus, but in different contexts. The local-ancestry-adjusted model tests for LD in the ancestral populations, while tests using the unadjusted and the global-ancestry-adjusted models depend on LD in the admixed population(s), which may be enriched due to different ancestral allele frequencies. Practically, this implies that global-ancestry adjustment should be used for screening, but local-ancestry adjustment may better inform fine mapping and provide better effect estimates at trait loci.
Affiliation(s)
- Eden R Martin
- John P. Hussman Institute for Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, United States of America.,John T. MacDonald Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
- Ilker Tunc
- Bioinformatics and Systems Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Zhi Liu
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Susan H Slifer
- John P. Hussman Institute for Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
- Ashley H Beecham
- John T. MacDonald Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
- Gary W Beecham
- John P. Hussman Institute for Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, United States of America.,John T. MacDonald Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, United States of America
40
Hettige NC, Bani-Fatemi A, De Luca V. Validating a research ethnicity questionnaire using genomic markers. Pharmacogenomics 2017; 18:1649-1657. [PMID: 29173001 DOI: 10.2217/pgs-2017-0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
AIM Population stratification is a confounding factor in genetic association studies. Self-report measures, the most common method of collecting ethnicity, may be less reliable for psychiatric patients. This study aims to validate our research ethnicity questionnaire as a reliable measure of genetic ancestry. METHODS Responses to our questionnaire were compared with genetic ancestry inferred by structured association tests and dimensionality reduction methods. Our research tool was also compared with a standard multiple-choice questionnaire. RESULTS Our research questionnaire was highly consistent with genetic ancestry. The standard questionnaire was less consistent in identifying ethnicity. CONCLUSION Collecting information on the geographical ancestry of each individual's grandparents provides a more comprehensive view of ethnicity, helping to guard against population stratification and avoid wasting funds on genotyping.
Affiliation(s)
- Nuwan C Hettige
- Institute of Medical Science, University of Toronto, 1 King's College Circle, Toronto, Ontario, M5S 1A8, Canada.,Center for Addiction & Mental Health, 250 College Street, Toronto, Ontario, M5T 1R8, Canada
- Ali Bani-Fatemi
- Institute of Medical Science, University of Toronto, 1 King's College Circle, Toronto, Ontario, M5S 1A8, Canada.,Center for Addiction & Mental Health, 250 College Street, Toronto, Ontario, M5T 1R8, Canada
- Vincenzo De Luca
- Institute of Medical Science, University of Toronto, 1 King's College Circle, Toronto, Ontario, M5S 1A8, Canada.,Department of Psychiatry, University of Toronto, 250 College Street, Toronto, Ontario, Canada.,Center for Addiction & Mental Health, 250 College Street, Toronto, Ontario, M5T 1R8, Canada
41
Lin BD, Willemsen G, Abdellaoui A, Bartels M, Ehli EA, Davies GE, Boomsma DI, Hottenga JJ. The Genetic Overlap Between Hair and Eye Color. Twin Res Hum Genet 2016; 19:595-9. [PMID: 27852355 DOI: 10.1017/thg.2016.85] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
We identified genetic variants for eye color through a genome-wide association study (GWAS) in a Dutch Caucasian family-based population sample and examined the genetic correlation between hair and eye color using data from unrelated participants from the Netherlands Twin Register. Using the Genome-wide Complex Trait Analysis software package, we found strong genetic correlations between various combinations of hair and eye colors. The strongest positive correlations were found for blue eyes with blond hair (0.87) and brown eyes with dark hair (0.71), whereas blue eyes with dark hair and brown eyes with blond hair showed the strongest negative correlations (-0.64 and -0.94, respectively). Red hair with green/hazel eyes showed the weakest correlation (-0.14). All analyses were corrected for age and sex, and we explored the effects of correcting for principal components (PCs) that represent ancestry and describe the genetic stratification of the Netherlands. When the first three PCs were included as covariates, the genetic correlations between the phenotypes disappeared. This is not unexpected, since hair and eye colors are strong indicators of an individual's ancestry. This makes it difficult to separate the effects of population stratification from the true genetic effects of variants on these particular phenotypes.
42
Hellwege J, Keaton J, Giri A, Gao X, Velez Edwards DR, Edwards TL. Population Stratification in Genetic Association Studies. Curr Protoc Hum Genet 2017; 95:1.22.1-1.22.23. [PMID: 29044472 PMCID: PMC6007879 DOI: 10.1002/cphg.48] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Indexed: 12/20/2022]
Abstract
Population stratification (PS) is a primary consideration in studies of genetic determinants of human traits. Failure to control for PS may lead to confounding, causing a study to fail for lack of significant results, or resources to be wasted following false-positive signals. Here, historical and current approaches for addressing PS when performing genetic association studies in human populations are reviewed. Methods for detecting the presence of PS, including global and local ancestry methods, are described. Also described are approaches for accounting for PS when calculating association statistics, such that measures of association are not confounded. Many traits are being examined for the first time in minority populations, which may inherently feature PS. © 2017 by John Wiley & Sons, Inc.
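As one concrete example of detecting and correcting for stratification at the level of association statistics, genomic control can be sketched as below. This is a generic illustration of a classic method, not code from the review; it assumes 1-degree-of-freedom chi-square statistics from many SNPs, most of which are null, and the function names are hypothetical. The inflation factor λ is the observed median statistic divided by the null median of about 0.4549.

```python
import statistics
from typing import List, Sequence

# Median of the chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats: Sequence[float]) -> float:
    """Inflation factor: observed median 1-df chi-square divided by its null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(chi2_stats: Sequence[float], lam: float) -> List[float]:
    """Divide each statistic by lambda; by convention, never inflate (lambda floored at 1)."""
    factor = max(lam, 1.0)
    return [s / factor for s in chi2_stats]


chi2 = [0.2, 0.9, 1.1, 0.4, 2.5]
lam = genomic_control_lambda(chi2)
print(round(lam, 3), gc_correct(chi2, lam))
```

A λ well above 1 across many SNPs suggests stratification (or cryptic relatedness); dividing by λ is a blunt, genome-wide correction, which is one reason ancestry-based adjustments such as principal components are often preferred.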
Affiliation(s)
- Jacklyn Hellwege
- Vanderbilt Genetics Institute, Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Jacob Keaton
- Vanderbilt Genetics Institute, Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Ayush Giri
- Vanderbilt Genetics Institute, Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Xiaoyi Gao
- Department of Ophthalmology and Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Digna R. Velez Edwards
- Vanderbilt Genetics Institute, Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Todd L. Edwards
- Vanderbilt Genetics Institute, Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA
43
Jiang Y, Ji Y, Sibley AB, Li YJ, Allen AS. Leveraging population information in family-based rare variant association analyses of quantitative traits. Genet Epidemiol 2016; 41:98-107. [PMID: 27917519 DOI: 10.1002/gepi.22022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/31/2015] [Revised: 09/22/2016] [Accepted: 09/22/2016] [Indexed: 12/15/2022]
Abstract
Confounding due to population substructure is always a concern in genetic association studies. Although methods have been proposed to adjust for population stratification in the context of common variation, it is unclear how well these approaches will work when interrogating rare variation. Family-based association tests can be constructed that are robust to population stratification. For example, when considering a quantitative trait, a linear model can be used that decomposes genetic effects into between- and within-family components, and a test of the within-family component is robust to population stratification. However, this within-family test ignores between-family information, potentially leading to a loss of power. Here, we propose a family-based two-stage rare-variant test for quantitative traits. We first construct a weight for each variant within a gene, or other genetic unit, based on score tests of between-family effect parameters. These weights are then used to combine variants using score tests of within-family effect parameters. Because the between-family and within-family tests are orthogonal under the null hypothesis, this two-stage approach can increase power while still maintaining validity. Using simulation, we show that this two-stage test can significantly improve power while correctly maintaining type I error. We further show that the two-stage approach maintains the robustness to population stratification of the within-family test, and we illustrate this using simulations reflecting samples composed of continental and closely related subpopulations.
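The between/within decomposition described in this abstract can be sketched as follows, assuming sibships genotyped at a single variant coded 0/1/2 (family IDs and function names are hypothetical). Each offspring genotype is split into the family mean (between part) and the deviation from it (within part); in a model such as y = μ + β_b·b + β_w·w, a test of β_w uses only the deviations, so stratification that shifts whole families does not bias it. The authors' weighting stage is not reproduced.

```python
from typing import Dict, List, Tuple


def between_within(families: Dict[str, List[float]]) -> Dict[str, List[Tuple[float, float]]]:
    """Split each offspring genotype into (between-family, within-family) parts."""
    decomposed = {}
    for fam_id, genotypes in families.items():
        fam_mean = sum(genotypes) / len(genotypes)
        # Between part: the family mean. Within part: deviation from that mean.
        decomposed[fam_id] = [(fam_mean, g - fam_mean) for g in genotypes]
    return decomposed


def cross_product(decomposed: Dict[str, List[Tuple[float, float]]]) -> float:
    """Sum of between * within over all offspring; zero because deviations sum to zero per family."""
    return sum(b * w for pairs in decomposed.values() for b, w in pairs)


sibships = {"fam1": [0, 1, 2], "fam2": [2, 2], "fam3": [0, 1]}
parts = between_within(sibships)
print(parts["fam1"], cross_product(parts))  # prints [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)] 0.0
```

The zero cross-product is the design-level orthogonality between the two components; it is what lets between-family information be used to build weights without contaminating the within-family test.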
Affiliation(s)
- Yu Jiang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.,Center for Statistical Genetics and Genomics, Duke University, Durham, NC, USA
- Yunqi Ji
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.,Duke Molecular Physiology Institute, Duke University, Durham, NC, USA
- Alexander B Sibley
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.,Duke Cancer Institute, Duke University, Durham, NC, USA
- Yi-Ju Li
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.,Duke Molecular Physiology Institute, Duke University, Durham, NC, USA
- Andrew S Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.,Center for Statistical Genetics and Genomics, Duke University, Durham, NC, USA
44
Yashin AI, Zhbannikov I, Arbeeva L, Arbeev KG, Wu D, Akushevich I, Yashkin A, Kovtun M, Kulminski AM, Stallard E, Kulminskaya I, Ukraintseva S. Pure and Confounded Effects of Causal SNPs on Longevity: Insights for Proper Interpretation of Research Findings in GWAS of Populations with Different Genetic Structures. Front Genet 2016; 7:188. [PMID: 27877192 PMCID: PMC5099244 DOI: 10.3389/fgene.2016.00188] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/30/2016] [Accepted: 10/07/2016] [Indexed: 11/13/2022] Open
Abstract
This paper shows that the effects of causal SNPs on lifespan, estimated through GWAS, may be confounded and the genetic structure of the study population may be responsible for this effect. Simulation experiments show that levels of linkage disequilibrium (LD) and other parameters of the population structure describing connections between two causal SNPs may substantially influence separate estimates of the effect of the causal SNPs on lifespan. This study suggests that differences in LD levels between two causal SNP loci within two study populations may contribute to the failure to replicate previous GWAS findings. The results of this paper also show that successful replication of the results of genetic association studies does not necessarily guarantee proper interpretation of the effect of a causal SNP on lifespan.
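How LD between two causal SNPs can distort a single-SNP estimate is easy to see in a simplified setting. The sketch below assumes a quantitative trait with additive, haplotype-level effects of alleles A and B and no interaction; it is an illustration, not the authors' lifespan simulation model, and the function names are hypothetical. Regressing the trait on A alone yields β_A + β_B·D / (p_A(1 − p_A)), so two populations with identical causal effects but different LD (D) give different marginal estimates.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D, r^2) from haplotype frequency p_AB and allele frequencies p_A, p_B."""
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


def marginal_slope(beta_a: float, beta_b: float, p_ab: float, p_a: float, p_b: float) -> float:
    """Slope of the trait on allele A alone when both A and B are causal (haplotype level).

    With y = beta_a * x_A + beta_b * x_B + noise, regressing y on x_A alone gives
    beta_a + beta_b * Cov(x_A, x_B) / Var(x_A) = beta_a + beta_b * D / (p_A (1 - p_A)).
    """
    d, _ = ld_stats(p_ab, p_a, p_b)
    return beta_a + beta_b * d / (p_a * (1 - p_a))


# Same causal effects, two populations that differ only in LD between A and B.
print(marginal_slope(0.1, 0.2, 0.25, 0.5, 0.5))  # D = 0: estimate equals beta_a
print(marginal_slope(0.1, 0.2, 0.40, 0.5, 0.5))  # D = 0.15: estimate absorbs part of B's effect
```

This mirrors the abstract's point that differing LD between study populations can cause replication failures, and that a replicated marginal effect still need not equal the SNP's own causal effect.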
Affiliation(s)
- Anatoliy I Yashin
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Ilya Zhbannikov
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Liubov Arbeeva
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Konstantin G Arbeev
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Deqing Wu
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Igor Akushevich
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Arseniy Yashkin
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Mikhail Kovtun
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Alexander M Kulminski
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Eric Stallard
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Irina Kulminskaya
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Svetlana Ukraintseva
- Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
45
Oetjens MT, Brown-Gentry K, Goodloe R, Dilks HH, Crawford DC. Population Stratification in the Context of Diverse Epidemiologic Surveys Sans Genome-Wide Data. Front Genet 2016; 7:76. [PMID: 27200085 PMCID: PMC4858524 DOI: 10.3389/fgene.2016.00076] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/28/2015] [Accepted: 04/18/2016] [Indexed: 01/01/2023] Open
Abstract
Population stratification, or confounding by genetic ancestry, is a potential cause of false associations in genetic association studies. Estimation of and adjustment for genetic ancestry has become common practice, thanks in part to the availability of ancestry informative markers on genome-wide association study (GWAS) arrays. While array data are now widespread, they are not ubiquitous: several large epidemiologic and clinic-based studies lack genome-wide data. One such large epidemiologic study lacking genome-wide data accessible to investigators is the National Health and Nutrition Examination Surveys (NHANES), population-based cross-sectional surveys of Americans, linked to demographic, health, and lifestyle data, conducted by the Centers for Disease Control and Prevention. DNA samples (n = 14,998) were extracted from biospecimens from consented NHANES participants between 1991–1994 (NHANES III, phase 2) and 1999–2002 and represent three major self-identified racial/ethnic groups: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). As the Epidemiologic Architecture for Genes Linked to Environment study, we genotyped candidate gene and GWAS-identified index variants in NHANES as part of the larger Population Architecture using Genomics and Epidemiology I study for collaborative genetic association studies. To enable basic quality control, such as estimation of genetic ancestry to control for population stratification, in NHANES without genome-wide data, we outline here strategies that use limited genetic data to identify the markers optimal for characterizing genetic ancestry. From among 411 and 295 autosomal SNPs available in NHANES III and NHANES 1999–2002, respectively, we demonstrate that markers with ancestry information can be identified to estimate global ancestry.
Despite limited resolution, global genetic ancestry is highly correlated with self-identified race for the majority of participants, although less so for ethnicity. Overall, the strategies outlined here for a large epidemiologic study can be applied to other datasets that are available for genotype–phenotype studies but lack genome-wide data.
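The general idea of picking ancestry-informative markers from a limited SNP set and estimating global ancestry from them can be sketched as below. Both steps are assumptions for illustration, not the authors' strategy: markers are ranked by the absolute allele-frequency difference between two hypothetical source populations, and ancestry is a grid-search maximum-likelihood proportion under a two-population model with Hardy-Weinberg proportions and independent markers. SNP names and frequencies are invented.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical reference-allele frequencies in two source populations: {snp: (p_pop1, p_pop2)}.
Freqs = Dict[str, Tuple[float, float]]


def rank_markers(freqs: Freqs, k: int) -> List[str]:
    """Keep the k SNPs with the largest absolute frequency difference between populations."""
    return sorted(freqs, key=lambda snp: abs(freqs[snp][0] - freqs[snp][1]), reverse=True)[:k]


def estimate_ancestry(genotypes: Dict[str, int], freqs: Freqs, grid: int = 101) -> float:
    """Grid-search maximum-likelihood proportion q of population-1 ancestry.

    Assumes Hardy-Weinberg proportions and independent markers; genotype = count (0-2)
    of the reference allele, whose expected frequency is q * p1 + (1 - q) * p2.
    """
    best_q, best_ll = 0.0, -math.inf
    for i in range(grid):
        q = i / (grid - 1)
        ll = 0.0
        for snp, g in genotypes.items():
            p1, p2 = freqs[snp]
            p = min(max(q * p1 + (1 - q) * p2, 1e-9), 1 - 1e-9)  # avoid log(0)
            ll += g * math.log(p) + (2 - g) * math.log(1 - p)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q


ref = {"snpA": (0.9, 0.1), "snpB": (0.5, 0.5), "snpC": (0.2, 0.7)}
aims = rank_markers(ref, 2)
print(aims, estimate_ancestry({"snpA": 2, "snpC": 0}, ref))  # prints ['snpA', 'snpC'] 1.0
```

With only a few hundred SNPs such estimates are coarse, which is consistent with the abstract's note of limited resolution; the markers with large frequency differences carry most of the information.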
Affiliation(s)
- Matthew T Oetjens
- Center for Human Genetics Research Vanderbilt University, Nashville TN, USA
- Robert Goodloe
- Center for Human Genetics Research Vanderbilt University, Nashville TN, USA
- Dana C Crawford
- Department of Epidemiology and Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland OH, USA
46
Li M, Li J, He Z, Lu Q, Witte JS, Macleod SL, Hobbs CA, Cleves MA. Testing Allele Transmission of an SNP Set Using a Family-Based Generalized Genetic Random Field Method. Genet Epidemiol 2016; 40:341-51. [PMID: 27061818 DOI: 10.1002/gepi.21970] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/13/2015] [Revised: 02/19/2016] [Accepted: 02/22/2016] [Indexed: 12/20/2022]
Abstract
Family-based association studies are commonly used in genetic research because they can be robust to population stratification (PS). Recent advances in high-throughput genotyping technologies have produced a massive amount of genomic data in family-based studies. However, current family-based association tests mainly evaluate individual variants one at a time. In this article, we introduce a family-based generalized genetic random field (FB-GGRF) method to test the joint association between a set of autosomal SNPs (i.e., single-nucleotide polymorphisms) and disease phenotypes. The proposed method is a natural extension of a recently developed GGRF method for population-based case-control studies. It models offspring genotypes conditional on parental genotypes and is thus robust to PS. Through simulations, we showed that under various disease scenarios FB-GGRF has improved power over a commonly used family-based sequence kernel association test (FB-SKAT). Further, similar to GGRF, the proposed FB-GGRF method is asymptotically well-behaved and does not require empirical adjustment of the type I error rates. We illustrate the proposed method using a study of congenital heart defects with family trios from the National Birth Defects Prevention Study (NBDPS).
Affiliation(s)
- Ming Li
- Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, Indiana, United States of America
- Jingyun Li
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Zihuai He
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
- John S Witte
- Department of Epidemiology and Biostatistics, University of California at San Francisco, San Francisco, California, United States of America
- Stewart L Macleod
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Charlotte A Hobbs
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Mario A Cleves
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
47
Dhankani V, Gibbs DL, Knijnenburg T, Kramer R, Vockley J, Niederhuber J, Shmulevich I, Bernard B. Using Incomplete Trios to Boost Confidence in Family Based Association Studies. Front Genet 2016; 7:34. [PMID: 27047537 PMCID: PMC4796035 DOI: 10.3389/fgene.2016.00034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/31/2015] [Accepted: 02/25/2016] [Indexed: 11/18/2022] Open
Abstract
Most currently available family-based association tests are designed to account only for nuclear families with complete genotypes for parents as well as offspring. As whole genome sequencing becomes increasingly inexpensive, genetic studies are able to collect data for more families and from large family cohorts with the goal of improving statistical power. However, because of missing genotypes, many families are not included in family-based association tests, negating the benefits of large-scale sequencing data. Here, we present CIFBAT, a method that uses incomplete families in the Family Based Association Test (FBAT) to evaluate robustness against missing data. CIFBAT computes quantile intervals of the FBAT statistic by randomly choosing valid completions of incomplete family genotypes based on Mendelian inheritance rules. By considering all valid completions equally likely and computing quantile intervals over many randomized iterations, CIFBAT avoids assuming a homogeneous population structure or any particular missingness pattern in the data. Using simulated data, we show that the quantile intervals computed by CIFBAT are useful in validating the robustness of the FBAT statistic against missing data and in identifying genomic markers with higher precision. We also propose a novel set of candidate genomic markers for uterine-related abnormalities from analysis of familial whole genome sequences, and provide validation for a previously established set of candidate markers for type 1 diabetes. We have provided a software package that incorporates TDT, robustTDT, FBAT, and CIFBAT. The data format proposed for the software uses half the memory space of standard FBAT-format (PED) files, making it efficient for large-scale genome-wide association studies.
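The completion idea in this abstract (random, equally likely, Mendelian-valid completions of incomplete trios, summarized as a quantile interval of a test statistic) can be sketched as below. This is a simplified stand-in, not the CIFBAT package: it uses the classic TDT statistic instead of FBAT, handles only trios with one missing parent (trios with both parents missing are dropped, an assumption), and all function names are hypothetical. Genotypes are counts (0/1/2) of allele A, and input trios are assumed Mendelian-consistent.

```python
import random
from typing import List, Optional, Set, Tuple

# Trio of genotype counts (0/1/2) of allele A: (father, mother, child); None marks a missing parent.
Trio = Tuple[Optional[int], Optional[int], int]


def transmissible(genotype: int) -> Set[int]:
    """A-allele counts (0 or 1) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def valid_missing_parent(known: int, child: int) -> List[int]:
    """Genotypes for the missing parent that are Mendelian-consistent with the known parent and child."""
    return [g for g in (0, 1, 2)
            if any(a + b == child for a in transmissible(known) for b in transmissible(g))]


def tdt_chi2(trios: List[Tuple[int, int, int]]) -> float:
    """McNemar-style TDT: (b - c)^2 / (b + c) over heterozygous-parent transmissions."""
    b = c = 0
    for father, mother, child in trios:
        hets = sum(1 for p in (father, mother) if p == 1)
        a_from_homs = sum(1 for p in (father, mother) if p == 2)
        a_from_hets = child - a_from_homs  # A alleles that must have come from heterozygotes
        b += a_from_hets
        c += hets - a_from_hets
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def completion_interval(trios: List[Trio], iterations: int = 1000,
                        lower: float = 0.05, upper: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Quantile interval of the TDT statistic over random valid completions of incomplete trios."""
    rng = random.Random(seed)
    draws = []
    for _ in range(iterations):
        completed = []
        for father, mother, child in trios:
            if father is None and mother is None:
                continue  # both parents missing: dropped in this sketch
            if father is None or mother is None:
                known = mother if father is None else father
                options = valid_missing_parent(known, child)
                if not options:
                    raise ValueError("trio is not Mendelian-consistent")
                filled = rng.choice(options)  # every valid completion equally likely
                father, mother = (filled, mother) if father is None else (father, filled)
            completed.append((father, mother, child))
        draws.append(tdt_chi2(completed))
    draws.sort()
    return draws[int(lower * (iterations - 1))], draws[int(upper * (iterations - 1))]


trios = [(1, 0, 1), (1, 2, 2), (None, 0, 1), (1, None, 0)]
print(completion_interval(trios, iterations=200))
```

A narrow interval means the statistic barely depends on how the missing genotypes are filled in, which is the robustness check the abstract describes; a wide one flags markers whose signal hinges on the missing data.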
Affiliation(s)
- Joseph Vockley
- Inova Translational Medicine Institute, Falls Church, VA, USA; School of Medicine, Virginia Commonwealth University, Richmond, VA, USA
- John Niederhuber
- Inova Translational Medicine Institute, Falls Church, VA, USA; School of Medicine, Johns Hopkins University, Baltimore, MD, USA
48
Vieira PCM, Burbano RMR, Fernandes DCRO, Montenegro RC, Dos Santos SEB, Sortica VA, Assumpção PP, Ribeiro-Dos-Santos ÂKC, Carvalho AA, Dos Santos NPC. Population stratification effect on cancer susceptibility in an admixed population from Brazilian Amazon. Anticancer Res 2015; 35:2009-2014. [PMID: 25862854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
BACKGROUND/AIM Many efforts have been made to identify candidate genes involved in cancer susceptibility. The present study aimed to investigate the association between the Arg194Trp (XRCC1), Ala222Val (MTHFR) and Arg521Lys (EGFR) polymorphisms (SNPs) and susceptibility to gastric and breast carcinoma in patients from the Brazilian Amazon, controlling for population structure. MATERIALS AND METHODS The SNPs were genotyped by TaqMan® SNP Genotyping Assays. Ancestry was estimated by analysis of a panel of 48 ancestry informative markers. RESULTS Logistic regression analysis showed that a 10% increase in African ancestry was associated with higher cancer risk and a 10% increase in European ancestry with lower cancer risk (odds ratio (OR)=1.919 and 0.676, respectively). In a preliminary chi-square analysis, a positive association between the Arg521Lys (EGFR) polymorphism and carcinoma susceptibility was found (p=0.037); however, when two different methodologies to control population structure bias were used, this association was lost (p=0.064 and p=0.256). CONCLUSION Genetic ancestry influences gastric and breast cancer risk, highlighting the importance of population structure inference in association studies in highly admixed populations, such as those from the Brazilian Amazon.
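Odds ratios reported per 10% of ancestry can be rescaled to other differences in ancestry proportion. The sketch below assumes ancestry entered the logistic model linearly on the logit scale (not stated in the abstract), so ORs compound multiplicatively: under that assumption, a 20-percentage-point difference in African ancestry corresponds to an OR of about 1.919² ≈ 3.68. Function names are hypothetical.

```python
import math


def log_or_per_unit(or_per_step: float, step: float = 0.10) -> float:
    """Logistic coefficient per unit (0-1) ancestry proportion implied by an OR per `step`."""
    return math.log(or_per_step) / step


def or_for_change(or_per_step: float, change: float, step: float = 0.10) -> float:
    """OR for an arbitrary change in ancestry proportion, assuming a linear logit."""
    return math.exp(log_or_per_unit(or_per_step, step) * change)


# Reported per-10% ORs, rescaled to a 20-percentage-point difference.
print(round(or_for_change(1.919, 0.20), 2))  # African ancestry: prints 3.68
print(round(or_for_change(0.676, 0.20), 2))  # European ancestry: prints 0.46
```

If ancestry had instead been modeled categorically or non-linearly, this rescaling would not hold.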
Affiliation(s)
- Priscilla Cristina Moura Vieira
- Oncology Research Center, Federal University of Pará, Belém, Pará, Brazil; Human Cytogenetics Laboratory, Federal University of Pará, Belém, Pará, Brazil
- Rommel Mario Rodríguez Burbano
- Oncology Research Center, Federal University of Pará, Belém, Pará, Brazil; Human Cytogenetics Laboratory, Federal University of Pará, Belém, Pará, Brazil
- Raquel Carvalho Montenegro
- Oncology Research Center, Federal University of Pará, Belém, Pará, Brazil; Human Cytogenetics Laboratory, Federal University of Pará, Belém, Pará, Brazil
49
Wang X, Zhang S, Li Y, Li M, Sha Q. A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 2015; 39:294-305. [PMID: 25758547 DOI: 10.1002/gepi.21894] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/04/2014] [Revised: 01/09/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Population stratification has long been recognized as an issue in genetic association studies because, if not appropriately corrected, unrecognized population stratification can lead to both false-positive and false-negative findings and can obscure true association signals. This issue can be even worse in rare variant association analyses because rare variants often show stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we propose a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA), in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA upweights rare variants and variants that are strongly associated with the phenotype. Additionally, it accounts for the direction of association and allows for local ancestry differences among study subjects. Extensive simulations show that TOWA controls the type I error rate in the presence of population stratification and is more powerful than existing methods. We have also applied TOWA to a real sequencing data set. Our simulation studies and real data analysis indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.
Affiliation(s)
- Xuexia Wang
- Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America
50
Li M, He Z, Schaid DJ, Cleves MA, Nick TG, Lu Q. A powerful nonparametric statistical framework for family-based association analyses. Genetics 2015; 200:69-78. [PMID: 25745024 DOI: 10.1534/genetics.115.175174] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/28/2015] [Accepted: 02/23/2015] [Indexed: 01/04/2023]
Abstract
Family-based study designs are commonly used in genetic research. They have many desirable features, including robustness to population stratification (PS). With advances in high-throughput technologies and ever-decreasing genotyping costs, it has become common for family studies to examine a large number of variants for their associations with disease phenotypes. The yield from the analysis of these family-based genetic data can be enhanced by adopting computationally efficient and powerful statistical methods. We propose a general framework for a family-based U-statistic, referred to as family-U, for family-based association studies. Unlike existing parametric methods, the proposed method makes no assumptions about the underlying disease model and can be applied to various phenotypes (e.g., binary and quantitative phenotypes) and pedigree structures (e.g., nuclear families and extended pedigrees). By using only within-family information, it offers robust protection against PS. In the absence of PS, it can also use additional information (i.e., between-family information) to improve power. Through simulations, we demonstrated that family-U attained higher power than a commonly used method, the family-based association test, under various disease scenarios. We further illustrated the new method with an application to large-scale family data from the Framingham Heart Study. By using additional between-family information, family-U confirmed a previously reported association of CHRNA5 with nicotine dependence.