951
Wang T, Chen YPP, Bowman PJ, Goddard ME, Hayes BJ. A hybrid expectation maximisation and MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL mapping. BMC Genomics 2016; 17:744. [PMID: 27654580 PMCID: PMC5031345 DOI: 10.1186/s12864-016-3082-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/23/2016] [Accepted: 09/10/2016] [Indexed: 11/23/2022] Open
Abstract
Background Bayesian mixture models in which the effects of SNPs are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Markov chain Monte Carlo (MCMC) sampling, which requires long compute times with large genomic data sets. Here, we present an efficient approach (termed HyB_BR), which is a hybrid of an Expectation-Maximisation algorithm followed by a limited number of MCMC iterations, without the requirement for burn-in.
Results To test prediction accuracy from HyB_BR, dairy cattle and human disease trait data were used. In the dairy cattle data, there were four quantitative traits (milk volume, protein kg, fat% in milk and fertility) measured in 16,214 cattle from two breeds genotyped for 632,002 SNPs. Genomic predictions were validated either in a subset of cattle from the reference set or in animals from a third breed that were not in the reference set. In all cases, HyB_BR gave almost identical accuracies to Bayesian mixture models implemented with full MCMC; however, computational time was reduced to as little as 1/17 of that required by full MCMC. The SNPs with high posterior probability of a non-zero effect were also very similar between full MCMC and HyB_BR, with several known genes affecting milk production in this category, as well as some novel genes. HyB_BR was also applied to seven human diseases with 4890 individuals genotyped for around 300 K SNPs in a case/control design, from the Wellcome Trust Case Control Consortium (WTCCC). In this data set, the results demonstrated again that HyB_BR performed as well as Bayesian mixture models with full MCMC for genomic predictions and genetic architecture inference while reducing the computational time from 45 h with full MCMC to 3 h with HyB_BR.
Conclusions The results for quantitative traits in cattle and disease in humans demonstrate that HyB_BR performs as well as Bayesian mixture models implemented with full MCMC in terms of prediction accuracy, while running up to 17 times faster than the full MCMC implementations. The HyB_BR algorithm makes simultaneous genomic prediction, QTL mapping and inference of genetic architecture feasible in large genomic data sets.
Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3082-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Tingting Wang
  - School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, Australia; Biosciences Research, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Melbourne, VIC, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, Australia
- Yi-Ping Phoebe Chen
  - School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, Australia
- Phil J Bowman
  - Biosciences Research, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Melbourne, VIC, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, Australia; School of Applied Systems Biology, La Trobe University, Melbourne, VIC, Australia
- Michael E Goddard
  - Biosciences Research, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Melbourne, VIC, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, Australia; Faculty of Veterinary and Agricultural Science, University of Melbourne, Melbourne, VIC, Australia
- Ben J Hayes
  - Biosciences Research, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Melbourne, VIC, Australia; Dairy Futures Cooperative Research Centre, Melbourne, VIC, Australia; Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, Queensland, Australia
952
Jones SE, Tyrrell J, Wood AR, Beaumont RN, Ruth KS, Tuke MA, Yaghootkar H, Hu Y, Teder-Laving M, Hayward C, Roenneberg T, Wilson JF, Del Greco F, Hicks AA, Shin C, Yun CH, Lee SK, Metspalu A, Byrne EM, Gehrman PR, Tiemeier H, Allebrandt KV, Freathy RM, Murray A, Hinds DA, Frayling TM, Weedon MN. Genome-Wide Association Analyses in 128,266 Individuals Identifies New Morningness and Sleep Duration Loci. PLoS Genet 2016; 12:e1006125. [PMID: 27494321 PMCID: PMC4975467 DOI: 10.1371/journal.pgen.1006125] [Citation(s) in RCA: 270] [Impact Index Per Article: 30.0] [Received: 02/12/2016] [Accepted: 05/24/2016] [Indexed: 11/18/2022] Open
Abstract
Disrupted circadian rhythms and reduced sleep duration are associated with several human diseases, particularly obesity and type 2 diabetes, but until recently, little was known about the genetic factors influencing these heritable traits. We performed genome-wide association studies of self-reported chronotype (morning/evening person) and self-reported sleep duration in 128,266 white British individuals from the UK Biobank study. Sixteen variants were associated with chronotype (P < 5×10⁻⁸), including variants near the known circadian rhythm genes RGS16 (1.21 odds of morningness, 95% CI [1.15, 1.27], P = 3×10⁻¹²) and PER2 (1.09 odds of morningness, 95% CI [1.06, 1.12], P = 4×10⁻¹⁰). The PER2 signal has previously been associated with iris function. We sought replication using self-reported data from 89,283 23andMe participants; thirteen of the chronotype signals remained associated at P < 5×10⁻⁸ on meta-analysis and eleven of these reached P < 0.05 in the same direction in the 23andMe study. We also replicated 9 additional variants identified when the 23andMe study was used as a discovery GWAS of chronotype (all P < 0.05 and meta-analysis P < 5×10⁻⁸). For sleep duration, we replicated one known signal in PAX8 (2.6 minutes per allele, 95% CI [1.9, 3.2], P = 5.7×10⁻¹⁶) and identified and replicated two novel associations at VRK2 (2.0 minutes per allele, 95% CI [1.3, 2.7], P = 1.2×10⁻⁹; and 1.6 minutes per allele, 95% CI [1.1, 2.2], P = 7.6×10⁻⁹). Although we found genetic correlation between chronotype and BMI (rG = 0.056, P = 0.05), undersleeping and BMI (rG = 0.147, P = 1×10⁻⁵), and oversleeping and BMI (rG = 0.097, P = 0.04), Mendelian Randomisation analyses, with limited power, provided no consistent evidence of causal associations between BMI or type 2 diabetes and chronotype or sleep duration.
Our study brings the total number of loci associated with chronotype to 22 and with sleep duration to three, and provides new insights into the biology of sleep and circadian rhythms in humans.
Numerous studies have identified links between too little or too much sleep and circadian misalignment with metabolic disorders such as obesity and type 2 diabetes. However, cause-and-effect is not easily determined, because of multiple confounding factors affecting both sleep patterns and disease risk. Using the first release of the UK Biobank study, which combines detailed measurements and questionnaire data with genetic data, we investigate the genetics of two self-report sleep measures, chronotype and average sleep duration, in 128,266 white British individuals. We replicate previous genetic associations and identify seven and two novel genetic variants influencing chronotype and sleep duration, respectively. Associated variants are located near genes implicated in circadian rhythm regulation (RGS16, PER2), near a serotonin receptor gene (HTR6) and another gene (INADL) encoding a protein thought to be important in photosensitive retinal cells, cells known to communicate with the brain’s primary circadian pacemaker. Using the genetic risk factors, we estimate the unconfounded causal associations of BMI and type 2 diabetes on sleep patterns (and vice versa) through Mendelian Randomisation. However, we find no evidence for causal associations in either direction. The full UK Biobank release of 500,000 individuals will boost our power to detect causal associations.
Affiliation(s)
- Samuel E. Jones
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Jessica Tyrrell
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Andrew R. Wood
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Robin N. Beaumont
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Katherine S. Ruth
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Marcus A. Tuke
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Hanieh Yaghootkar
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Youna Hu
  - 23andMe Inc., Mountain View, California, United States of America
  - A9.com Inc, Palo Alto, California, United States of America
- Maris Teder-Laving
  - Estonian Genome Center and Institute of Molecular and Cell Biology of University of Tartu, Estonian Biocentre, Tartu, Estonia
- Caroline Hayward
  - Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, Edinburgh, Scotland
- Till Roenneberg
  - Institute of Medical Psychology, Ludwig-Maximilians-University, Munich, Germany
- James F. Wilson
  - Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, Edinburgh, Scotland
  - Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
- Fabiola Del Greco
  - Center for Biomedicine, European Academy of Bolzano, Bozen, Italy–affiliated Institute of the University of Lübeck, Lübeck, Germany
- Andrew A. Hicks
  - Center for Biomedicine, European Academy of Bolzano, Bozen, Italy–affiliated Institute of the University of Lübeck, Lübeck, Germany
- Chol Shin
  - Division of Pulmonary, Sleep and Critical Care Medicine, Department of Internal Medicine, Korea University Ansan Hospital, Ansan, Republic of Korea
  - Institute of Human Genomic Study, College of Medicine, Korea University Ansan Hospital, Ansan, Republic of Korea
- Chang-Ho Yun
  - Department of Neurology, Bundang Clinical Neuroscience Institute, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Seung Ku Lee
  - Institute of Human Genomic Study, College of Medicine, Korea University Ansan Hospital, Ansan, Republic of Korea
- Andres Metspalu
  - Estonian Genome Center and Institute of Molecular and Cell Biology of University of Tartu, Estonian Biocentre, Tartu, Estonia
- Enda M. Byrne
  - The University of Queensland, Queensland Brain Institute, Brisbane, Australia
- Philip R. Gehrman
  - Perelman School of Medicine of the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Henning Tiemeier
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, Netherlands
  - Department of Psychiatry, Erasmus Medical Center, Rotterdam, Netherlands
- Karla V. Allebrandt
  - Institute of Medical Psychology, Ludwig-Maximilians-University, Munich, Germany
- Rachel M. Freathy
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Anna Murray
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- David A. Hinds
  - 23andMe Inc., Mountain View, California, United States of America
- Timothy M. Frayling
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
- Michael N. Weedon
  - Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
953
Transethnic Genetic-Correlation Estimates from Summary Statistics. Am J Hum Genet 2016; 99:76-88. [PMID: 27321947 DOI: 10.1016/j.ajhg.2016.05.001] [Citation(s) in RCA: 218] [Impact Index Per Article: 24.2] [Received: 12/19/2015] [Accepted: 05/03/2016] [Indexed: 11/22/2022] Open
Abstract
The increasing number of genetic association studies conducted in multiple populations provides an unprecedented opportunity to study how the genetic architecture of complex phenotypes varies between populations, a problem important for both medical and population genetics. Here, we have developed a method for estimating the transethnic genetic correlation: the correlation of causal-variant effect sizes at SNPs common in populations. This method takes advantage of the entire spectrum of SNP associations and uses only summary-level data from genome-wide association studies. This avoids the computational costs and privacy concerns associated with genotype-level information while remaining scalable to hundreds of thousands of individuals and millions of SNPs. We applied our method to data on gene expression, rheumatoid arthritis, and type 2 diabetes and overwhelmingly found that the genetic correlation was significantly less than 1. Our method is implemented in a Python package called Popcorn.
954
Simmons S, Sahinalp C, Berger B. Enabling Privacy-Preserving GWASs in Heterogeneous Human Populations. Cell Syst 2016; 3:54-61. [PMID: 27453444 PMCID: PMC4994706 DOI: 10.1016/j.cels.2016.04.013] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 02/05/2016] [Revised: 04/08/2016] [Accepted: 04/17/2016] [Indexed: 11/26/2022]
Abstract
The proliferation of large genomic databases offers the potential to perform increasingly larger-scale genome-wide association studies (GWASs). Due to privacy concerns, however, access to these data is limited, greatly reducing their usefulness for research. Here, we introduce a computational framework for performing GWASs that adapts principles of differential privacy (a cryptographic theory that facilitates secure analysis of sensitive data) to both protect private phenotype information (e.g., disease status) and correct for population stratification. This framework enables us to produce privacy-preserving GWAS results based on EIGENSTRAT and linear mixed model (LMM)-based statistics, both of which correct for population stratification. We test our differentially private statistics, PrivSTRAT and PrivLMM, on simulated and real GWAS datasets and find they are able to protect privacy while returning meaningful results. Our framework can be used to securely query private genomic datasets to discover which specific genomic alterations may be associated with a disease, thus increasing the availability of these valuable datasets.
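As a rough illustration of the differential privacy principle mentioned in this abstract, the sketch below releases a noisy count using the textbook Laplace mechanism. It is not PrivSTRAT or PrivLMM, whose details are not given here; the function names, the example count of 120 cases, and the choice of epsilon are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    `sensitivity` is the most the count can change when one individual's
    record (e.g. disease status) changes; for a simple count it is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example: 120 cases carry a given allele.
rng = random.Random(0)
noisy_counts = [private_count(120, epsilon=0.5, rng=rng) for _ in range(20000)]
mean_noisy = sum(noisy_counts) / len(noisy_counts)
```

Smaller values of epsilon give stronger privacy but noisier answers; each single release is perturbed, while the average over many hypothetical releases stays close to the true count.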
Affiliation(s)
- Sean Simmons
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Cenk Sahinalp
  - School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada; School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
- Bonnie Berger
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
955
Physical and neurobehavioral determinants of reproductive onset and success. Nat Genet 2016; 48:617-623. [PMID: 27089180 DOI: 10.1038/ng.3551] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/17/2015] [Accepted: 03/24/2016] [Indexed: 12/16/2022]
Abstract
The ages of puberty, first sexual intercourse and first birth signify the onset of reproductive ability, behavior and success, respectively. In a genome-wide association study of 125,667 UK Biobank participants, we identify 38 loci associated (P < 5 × 10⁻⁸) with age at first sexual intercourse. These findings were taken forward in 241,910 men and women from Iceland and 20,187 women from the Women's Genome Health Study. Several of the identified loci also exhibit associations (P < 5 × 10⁻⁸) with other reproductive and behavioral traits, including age at first birth (variants in or near ESR1 and RBM6-SEMA3F), number of children (CADM2 and ESR1), irritable temperament (MSRA) and risk-taking propensity (CADM2). Mendelian randomization analyses infer causal influences of earlier puberty timing on earlier first sexual intercourse, earlier first birth and lower educational attainment. In turn, likely causal consequences of earlier first sexual intercourse include reproductive, educational, psychiatric and cardiometabolic outcomes.
956
Hoffmann TJ, Witte JS. Strategies for Imputing and Analyzing Rare Variants in Association Studies. Trends Genet 2016; 31:556-563. [PMID: 26450338 DOI: 10.1016/j.tig.2015.07.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 05/09/2015] [Revised: 07/28/2015] [Accepted: 07/31/2015] [Indexed: 01/22/2023]
Abstract
Rare genetic variants may be responsible for a significant amount of the uncharacterized genetic risk underlying many diseases. An efficient approach to characterizing the disease burden of rare variants may be to impute them into existing large datasets. It is well known that the ability to impute a rare variant is dependent both on the array choice and number of individuals in the reference panel carrying that variant, although it is still unclear exactly how well imputation will work for rare variants. Here, we review the additional challenges that arise when imputing rare variants, looking at studies that have been able to impute rare variants, methods behind merging reference panels, approaches for imputing rare variants, and methods for analyzing rare variants.
Affiliation(s)
- Thomas J Hoffmann
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94143, USA
- John S Witte
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94143, USA; Department of Urology, University of California San Francisco, San Francisco, CA 94158, USA; UCSF Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
957
Chen H, Wang C, Conomos MP, Stilp AM, Li Z, Sofer T, Szpiro AA, Chen W, Brehm JM, Celedón JC, Redline S, Papanicolaou GJ, Thornton TA, Laurie CC, Rice K, Lin X. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models. Am J Hum Genet 2016; 98:653-66. [PMID: 27018471 DOI: 10.1016/j.ajhg.2016.02.012] [Citation(s) in RCA: 287] [Impact Index Per Article: 31.9] [Received: 10/29/2015] [Accepted: 02/17/2016] [Indexed: 11/17/2022] Open
Abstract
Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
Affiliation(s)
- Han Chen
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Chaolong Wang
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Matthew P Conomos
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Adrienne M Stilp
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Zilin Li
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Mathematics, Tsinghua University, Beijing 100084, P. R. China
- Tamar Sofer
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Adam A Szpiro
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Wei Chen
  - Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA 15224, USA
- John M Brehm
  - Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA 15224, USA
- Juan C Celedón
  - Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA 15224, USA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Departments of Medicine and Neurology, Brigham and Women's Hospital, Boston, MA 02115, USA
- George J Papanicolaou
  - Prevention and Population Sciences Program, Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, NIH, Bethesda, MD 20892, USA
- Timothy A Thornton
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Cathy C Laurie
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Kenneth Rice
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
958
Joo JWJ, Hormozdiari F, Han B, Eskin E. Multiple testing correction in linear mixed models. Genome Biol 2016; 17:62. [PMID: 27039378 PMCID: PMC4818520 DOI: 10.1186/s13059-016-0903-6] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 09/09/2015] [Accepted: 02/17/2016] [Indexed: 08/30/2023] Open
Abstract
BACKGROUND Multiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM. RESULTS We were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach. CONCLUSIONS We provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data.
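The permutation baseline that this abstract calls the gold standard can be sketched as follows: shuffle the phenotype, record the largest test statistic across all markers, and take an upper quantile of those maxima as the genome-wide threshold. This is a minimal sketch of the plain permutation test only, not the authors' LMM-aware method; the absolute-correlation statistic, the function names, and the toy data sizes are assumptions chosen to keep the example self-contained.

```python
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation between a marker and a phenotype."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or phenotype carries no signal
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(genotypes, phenotype, n_perm, alpha, rng):
    """Family-wise threshold from the permutation distribution of the
    maximum statistic over all markers."""
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        maxima.append(max(abs_corr(g, shuffled) for g in genotypes))
    maxima.sort()
    index = min(len(maxima) - 1, int((1 - alpha) * len(maxima)))
    return maxima[index]


# Toy data: 20 markers coded 0/1/2 for 40 individuals, random phenotype.
data_rng = random.Random(42)
genotypes = [[data_rng.choice([0, 1, 2]) for _ in range(40)] for _ in range(20)]
phenotype = [data_rng.gauss(0.0, 1.0) for _ in range(40)]

thr_05 = permutation_threshold(genotypes, phenotype, 200, 0.05, random.Random(1))
thr_50 = permutation_threshold(genotypes, phenotype, 200, 0.50, random.Random(1))
```

Because every permutation recomputes all marker statistics, the cost grows with markers times permutations, which is consistent with the abstract's point that a faster alternative is needed at genome scale.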
Affiliation(s)
- Jong Wha J Joo
  - Bioinformatics IDP, University of California, Los Angeles, CA, USA
- Farhad Hormozdiari
  - Computer Science Department, University of California, Los Angeles, CA, USA
- Buhm Han
  - Department of Convergence Medicine, University of Ulsan College of Medicine & Asan Institute for Life Sciences, Asan Medical Center, Seoul, 138-736, Republic of Korea
- Eleazar Eskin
  - Computer Science Department, University of California, Los Angeles, CA, USA; Department of Human Genetics, University of California, Los Angeles, CA, USA
959
Lane JM, Vlasac I, Anderson SG, Kyle SD, Dixon WG, Bechtold DA, Gill S, Little MA, Luik A, Loudon A, Emsley R, Scheer FAJL, Lawlor DA, Redline S, Ray DW, Rutter MK, Saxena R. Genome-wide association analysis identifies novel loci for chronotype in 100,420 individuals from the UK Biobank. Nat Commun 2016; 7:10889. [PMID: 26955885 PMCID: PMC4786869 DOI: 10.1038/ncomms10889] [Citation(s) in RCA: 206] [Impact Index Per Article: 22.9] [Received: 10/22/2015] [Accepted: 01/29/2016] [Indexed: 12/26/2022] Open
Abstract
Our sleep timing preference, or chronotype, is a manifestation of our internal biological clock. Variation in chronotype has been linked to sleep disorders, cognitive and physical performance, and chronic disease. Here we perform a genome-wide association study of self-reported chronotype within the UK Biobank cohort (n=100,420). We identify 12 new genetic loci that implicate known components of the circadian clock machinery and point to previously unstudied genetic variants and candidate genes that might modulate core circadian rhythms or light-sensing pathways. Pathway analyses highlight central nervous and ocular systems and fear-response-related processes. Genetic correlation analysis suggests chronotype shares underlying genetic pathways with schizophrenia, educational attainment and possibly BMI. Further, Mendelian randomization suggests that evening chronotype relates to higher educational attainment. These results not only expand our knowledge of the circadian system in humans but also expose the influence of circadian characteristics over human health and life-history variables such as educational attainment.
Affiliation(s)
- Jacqueline M. Lane
  - Center for Human Genetic Research Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA
- Irma Vlasac
  - Center for Human Genetic Research Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA
- Simon G. Anderson
  - Cardiovascular Research Group, Institute of Cardiovascular Sciences, The University of Manchester, Manchester M139PL, UK
- Simon D. Kyle
  - Sleep and Circadian Neuroscience Institute (SCNi), Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX12JD, UK
- William G. Dixon
  - Centre for Musculoskeletal Research Institute of Inflammation and Repair, The University of Manchester, Manchester M139PL, UK
- David A. Bechtold
  - Faculty of Life Sciences, The University of Manchester, Manchester M139PL, UK
- Shubhroz Gill
  - Chemical Biology Program, Broad Institute, Cambridge, Massachusetts 02142, USA
- Max A. Little
  - Department of Mathematics, Engineering and Applied Science, Aston University, Birmingham B47ET, UK
- Annemarie Luik
  - Sleep and Circadian Neuroscience Institute (SCNi), Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX12JD, UK
- Andrew Loudon
  - Faculty of Life Sciences, The University of Manchester, Manchester M139PL, UK
- Richard Emsley
  - Institute of Population Health, The University of Manchester, Manchester M139PL, UK
- Frank A. J. L. Scheer
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- Deborah A. Lawlor
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol BS81TH, UK
  - School of Social and Community Medicine, University of Bristol, Bristol BS81TH, UK
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- David W. Ray
  - Centre for Endocrinology and Diabetes, Institute of Human Development, The University of Manchester, Manchester M139PL, UK
- Martin K. Rutter
  - Centre for Endocrinology and Diabetes, Institute of Human Development, The University of Manchester, Manchester M139PL, UK
  - Manchester Diabetes Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M139PL, UK
- Richa Saxena
  - Center for Human Genetic Research Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  - Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
960
Tyrrell J, Jones SE, Beaumont R, Astley CM, Lovell R, Yaghootkar H, Tuke M, Ruth KS, Freathy RM, Hirschhorn JN, Wood AR, Murray A, Weedon MN, Frayling TM. Height, body mass index, and socioeconomic status: mendelian randomisation study in UK Biobank. BMJ 2016; 352:i582. [PMID: 26956984 PMCID: PMC4783516 DOI: 10.1136/bmj.i582] [Citation(s) in RCA: 202] [Impact Index Per Article: 22.4] [Indexed: 12/31/2022]
Abstract
OBJECTIVE To determine whether height and body mass index (BMI) have a causal role in five measures of socioeconomic status.
DESIGN Mendelian randomisation study to test for causal effects of differences in stature and BMI on five measures of socioeconomic status. Mendelian randomisation exploits the fact that genotypes are randomly assigned at conception and thus not confounded by non-genetic factors.
SETTING UK Biobank.
PARTICIPANTS 119,669 men and women of British ancestry, aged between 37 and 73 years.
MAIN OUTCOME MEASURES Age completed full time education, degree level education, job class, annual household income, and Townsend deprivation index.
RESULTS In the UK Biobank study, shorter stature and higher BMI were observationally associated with several measures of lower socioeconomic status. The associations between shorter stature and lower socioeconomic status tended to be stronger in men, and the associations between higher BMI and lower socioeconomic status tended to be stronger in women. For example, a 1 standard deviation (SD) higher BMI was associated with a £210 (€276; $300; 95% confidence interval £84 to £420; P=6 × 10⁻³) lower annual household income in men and a £1890 (£1680 to £2100; P=6 × 10⁻¹⁵) lower annual household income in women. Genetic analysis provided evidence that these associations were partly causal. A genetically determined 1 SD (6.3 cm) taller stature caused a 0.06 (0.02 to 0.09) year older age of completing full time education (P=0.01), a 1.12 (1.07 to 1.18) times higher odds of working in a skilled profession (P=6 × 10⁻⁷), and a £1130 (£680 to £1580) higher annual household income (P=4 × 10⁻⁸). Associations were stronger in men. A genetically determined 1 SD higher BMI (4.6 kg/m²) caused a £2940 (£1680 to £4200; P=1 × 10⁻⁵) lower annual household income and a 0.10 (0.04 to 0.16) SD (P=0.001) higher level of deprivation in women only.
CONCLUSIONS These data support evidence that height and BMI play an important partial role in determining several aspects of a person's socioeconomic status, especially women's BMI for income and deprivation and men's height for education, income, and job class. These findings have important social and health implications, supporting evidence that overweight people, especially women, are at a disadvantage and that taller people, especially men, are at an advantage.
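As a rough illustration of the causal estimates quoted above, a single-instrument Mendelian randomisation analysis is often summarised with a ratio (Wald) estimator: the gene-outcome association divided by the gene-exposure association. The abstract does not say which estimator the authors used, so the sketch below is an assumption; the function name `wald_ratio` and all input numbers are hypothetical and are not taken from the study.

```python
def wald_ratio(beta_gx, beta_gy, se_gy):
    """Ratio (Wald) estimate of the effect of an exposure X on an outcome Y.

    beta_gx: per-allele effect of the genetic instrument on the exposure
    beta_gy: per-allele effect of the same instrument on the outcome
    se_gy:   standard error of beta_gy
    The standard error is a first-order approximation that ignores
    uncertainty in beta_gx.
    """
    if beta_gx == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_gy / beta_gx
    se = abs(se_gy / beta_gx)
    return estimate, se


# Hypothetical inputs: the instrument raises the exposure by 0.5 SD per allele
# and shifts the outcome by 100 units per allele (SE 40).
estimate, se = wald_ratio(beta_gx=0.5, beta_gy=100.0, se_gy=40.0)
ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"effect per 1 SD of exposure: {estimate:.1f} "
      f"(95% CI {ci_low:.1f} to {ci_high:.1f})")
```

Because the instrument is a genotype fixed at conception, the ratio is read as the change in outcome per unit of genetically determined exposure, which is how the per-SD income effects in the abstract are phrased.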
Affiliation(s)
- Jessica Tyrrell
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK; European Centre for Environment and Human Health, University of Exeter Medical School, The Knowledge Spa, Truro TR1 3HD, UK
| | - Samuel E Jones
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Robin Beaumont
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Christina M Astley
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Basic and Translational Obesity Research and Division of Endocrinology, Boston Children's Hospital, Boston, MA 02115, USA
| | - Rebecca Lovell
- European Centre for Environment and Human Health, University of Exeter Medical School, The Knowledge Spa, Truro TR1 3HD, UK
| | - Hanieh Yaghootkar
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Marcus Tuke
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Katherine S Ruth
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Rachel M Freathy
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Joel N Hirschhorn
- European Centre for Environment and Human Health, University of Exeter Medical School, The Knowledge Spa, Truro TR1 3HD, UK; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Andrew R Wood
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Anna Murray
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Michael N Weedon
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
| | - Timothy M Frayling
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter EX2 5DW, UK
|
961
|
Galinsky KJ, Bhatia G, Loh PR, Georgiev S, Mukherjee S, Patterson NJ, Price AL. Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. Am J Hum Genet 2016; 98:456-472. [PMID: 26924531 PMCID: PMC4827102 DOI: 10.1016/j.ajhg.2015.12.022] [Citation(s) in RCA: 248] [Impact Index Per Article: 27.6] [Received: 07/31/2015] [Accepted: 12/31/2015] [Indexed: 01/13/2023] Open
Abstract
Searching for genetic variants with unusual differentiation between subpopulations is an established approach for identifying signals of natural selection. However, existing methods generally require discrete subpopulations. We introduce a method that infers selection using principal components (PCs) by identifying variants whose differentiation along top PCs is significantly greater than the null distribution of genetic drift. To enable the application of this method to large datasets, we developed the FastPCA software, which employs recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using the PC-based test for natural selection, we replicate previously known selected loci and identify three new genome-wide significant signals of selection, including selection in Europeans at ADH1B. The coding variant rs1229984*T has previously been associated with a decreased risk of alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents. We also detect selection signals at IGFBP3 and IGH, which have also previously been associated with human disease.
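The cost reduction described above comes from never forming the individuals-by-individuals matrix. The sketch below is not FastPCA's algorithm; it is plain power iteration, a simpler relative, shown only to illustrate how a top PC can be approximated using products with the genotype matrix alone. The function name `top_pc`, the toy matrix, and the fixed random seed are all assumptions made for the example.

```python
import math
import random


def top_pc(X, n_iter=50, seed=0):
    """Approximate the top principal axis over individuals by power iteration.

    X is a list of M rows (variants), each a list of N values (individuals),
    assumed to be centred per variant. Each iteration costs O(M * N) and the
    N x N matrix X^T X is never built explicitly.
    """
    n_variants, n_individuals = len(X), len(X[0])
    rng = random.Random(seed)
    # Random start: a constant vector would be annihilated by centred rows.
    v = [rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
    for _ in range(n_iter):
        # u = X v (length M), then w = X^T u (length N).
        u = [sum(x * vj for x, vj in zip(row, v)) for row in X]
        w = [sum(X[i][j] * u[i] for i in range(n_variants))
             for j in range(n_individuals)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data")
        v = [wj / norm for wj in w]
    return v


# Toy centred genotype matrix: 2 variants x 3 individuals.
X = [[1.0, -1.0, 0.0],
     [2.0, -2.0, 0.0]]
pc1 = top_pc(X)
print([round(value, 3) for value in pc1])
```

In this toy matrix the first two individuals sit at opposite ends of the leading axis and the third at zero, so the returned vector separates them. Randomised methods of the kind the abstract refers to recover several PCs at once in a small number of passes over the data.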
Affiliation(s)
- Kevin J Galinsky
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| | - Gaurav Bhatia
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | | | - Sayan Mukherjee
- Departments of Statistical Science, Computer Science, and Mathematics, Duke University, Durham, NC 27708, USA
| | - Nick J Patterson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Alkes L Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
|
962
|
Pilling LC, Atkins JL, Bowman K, Jones SE, Tyrrell J, Beaumont RN, Ruth KS, Tuke MA, Yaghootkar H, Wood AR, Freathy RM, Murray A, Weedon MN, Xue L, Lunetta K, Murabito JM, Harries LW, Robine JM, Brayne C, Kuchel GA, Ferrucci L, Frayling TM, Melzer D. Human longevity is influenced by many genetic variants: evidence from 75,000 UK Biobank participants. Aging (Albany NY) 2016; 8:547-60. [PMID: 27015805 PMCID: PMC4833145 DOI: 10.18632/aging.100930] [Citation(s) in RCA: 93] [Impact Index Per Article: 10.3] [Received: 02/05/2016] [Accepted: 03/10/2016] [Indexed: 11/25/2022]
Abstract
Variation in human lifespan is 20 to 30% heritable in twins but few genetic variants have been identified. We undertook a Genome Wide Association Study (GWAS) using age at death of parents of middle-aged UK Biobank participants of European descent (n=75,244 with father's and/or mother's data, excluding early deaths). Genetic risk scores for 19 phenotypes (n=777 proven variants) were also tested. In GWAS, a nicotine receptor locus (CHRNA3, previously associated with increased smoking and lung cancer) was associated with fathers' survival. Less common variants requiring further confirmation were also identified. Offspring of longer lived parents had more protective alleles for coronary artery disease, systolic blood pressure, body mass index, cholesterol and triglyceride levels, type-1 diabetes, inflammatory bowel disease and Alzheimer's disease. In candidate analyses, variants in the TOMM40/APOE locus were associated with longevity, but FOXO variants were not. Associations between extreme longevity (mothers ≥98 years, fathers ≥95 years, n=1,339) and disease alleles were similar, with an additional association with HDL cholesterol (p=5.7 × 10⁻³). These results support a multiple protective factors model influencing lifespan and longevity (top 1% survival) in humans, with prominent roles for cardiovascular-related pathways. Several of these genetically influenced risks, including blood pressure and tobacco exposure, are potentially modifiable.
Affiliation(s)
- Luke C. Pilling
- Epidemiology and Public Health Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Janice L. Atkins
- Epidemiology and Public Health Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Kirsty Bowman
- Epidemiology and Public Health Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Samuel E. Jones
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Jessica Tyrrell
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Robin N. Beaumont
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Katherine S. Ruth
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Marcus A. Tuke
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Hanieh Yaghootkar
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Andrew R. Wood
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Rachel M. Freathy
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Anna Murray
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Michael N. Weedon
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Luting Xue
- School of Public Health, Department of Biostatistics, Boston University, Boston, MA 02215, USA
| | - Kathryn Lunetta
- School of Public Health, Department of Biostatistics, Boston University, Boston, MA 02215, USA
- The Framingham Heart Study, Framingham, MA 01702, USA
| | - Joanne M. Murabito
- The Framingham Heart Study, Framingham, MA 01702, USA
- Section of General Internal Medicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
| | - Lorna W. Harries
- Institute of Biomedical and Clinical Sciences, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - Jean-Marie Robine
- Institut National de la Santé et de la Recherche Médicale (INSERM U1198), 34394 Montpellier, France
- Ecole Pratique des Hautes études (EPHE), 75014 Paris, France
| | - Carol Brayne
- Cambridge Institute of Public Health, School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SR, UK
| | - George A. Kuchel
- Center on Aging, University of Connecticut, Farmington, CT 06030, USA
| | | | - Timothy M. Frayling
- Genetics of Complex Traits Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
| | - David Melzer
- Epidemiology and Public Health Group, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK
- Center on Aging, University of Connecticut, Farmington, CT 06030, USA
|
963
|
Pausch H, Emmerling R, Schwarzenbacher H, Fries R. A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle. Genet Sel Evol 2016; 48:14. [PMID: 26883850 PMCID: PMC4756527 DOI: 10.1186/s12711-016-0190-4] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 11/02/2015] [Accepted: 01/26/2016] [Indexed: 12/16/2022] Open
Abstract
Background The availability of whole-genome sequence data from key ancestors in bovine populations provides an exhaustive catalogue of polymorphic sites that segregate within and across cattle breeds. Sequence variants identified from the sequenced genome of key ancestors can be imputed into animals that have been genotyped using medium- and high-density genotyping arrays. Association analysis with imputed sequences, particularly when applied to multiple traits simultaneously, is a very powerful approach to detect candidate causal variants that underlie complex phenotypes. Results We used whole-genome sequence data from 157 key ancestors of the German Fleckvieh cattle population to impute 20,561,798 sequence variants into 10,363 animals that had (partly imputed) genotypes based on 634,109 single nucleotide polymorphisms (SNPs). Rare variants were more frequent among the sequence-derived than the array-derived genotypes. Association studies with imputed sequence variants were performed using seven correlated udder conformation traits as response variables. The calculation of an approximate multi-trait test statistic enabled us to detect 12 quantitative trait loci (QTL) (P < 2.97 × 10⁻⁹) that affect different morphological features of the mammary gland. Among the tested variants, the most significant associations were found for imputed sequence variants at 11 QTL, whereas the top association signal was observed for an array-derived variant at a QTL on bovine chromosome 14. Seven QTL were associated with multiple phenotypes. Most QTL were located in non-coding regions of the genome but in close proximity of candidate genes that could be involved in mammary gland morphology (SP5, GC, NPFFR2, CRIM1, RXFP2, TBX5, RBM19 and ADAM12). Conclusions Using imputed sequence variants in association analyses allows the detection of QTL at maximum resolution. Multi-trait approaches can reveal QTL that are not detected in single-trait association studies.
Most QTL for udder conformation traits were located in non-coding regions of the genome, which suggests that mutations in regulatory sequences are the major determinants of variation in mammary gland morphology in cattle.
Affiliation(s)
- Hubert Pausch
- Lehrstuhl fuer Tierzucht, Technische Universitaet Muenchen, 85354, Freising, Germany.
| | - Reiner Emmerling
- Institut fuer Tierzucht, Bayerische Landesanstalt fuer Landwirtschaft, 85586, Poing, Germany.
| | | | - Ruedi Fries
- Lehrstuhl fuer Tierzucht, Technische Universitaet Muenchen, 85354, Freising, Germany.
|
964
|
Day FR, Loh PR, Scott RA, Ong KK, Perry JRB. A Robust Example of Collider Bias in a Genetic Association Study. Am J Hum Genet 2016; 98:392-3. [PMID: 26849114 DOI: 10.1016/j.ajhg.2015.12.019] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 10/22/2015] [Accepted: 12/17/2015] [Indexed: 11/19/2022] Open
Affiliation(s)
- Felix R Day
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Box 285, Hills Road, Cambridge CB2 0QQ, UK
| | - Po-Ru Loh
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
| | - Robert A Scott
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Box 285, Hills Road, Cambridge CB2 0QQ, UK
| | - Ken K Ong
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Box 285, Hills Road, Cambridge CB2 0QQ, UK
| | - John R B Perry
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Box 285, Hills Road, Cambridge CB2 0QQ, UK.
|
965
|
Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genet 2016; 12:e1005767. [PMID: 26828793 PMCID: PMC4734661 DOI: 10.1371/journal.pgen.1005767] [Citation(s) in RCA: 825] [Impact Index Per Article: 91.7] [Received: 04/19/2015] [Accepted: 12/03/2015] [Indexed: 12/05/2022] Open
Abstract
False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a fixed effect and random effect Mixed Linear Model (MLM) that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. The modified MLM method, Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and use them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid the model over-fitting problem in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Both real and simulated data analyses demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include an efficient computing time that is linear in both the number of individuals and the number of markers. Now, a dataset with half a million individuals and half a million markers can be analyzed within three days.
Affiliation(s)
- Xiaolei Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, China
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
| | - Meng Huang
- Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, United States of America
| | - Bin Fan
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Edward S. Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
- United States Department of Agriculture (USDA)–Agricultural Research Service (ARS), Ithaca, New York, United States of America
| | - Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, United States of America
- Department of Animal Sciences, Northeast Agricultural University, Harbin, Heilongjiang, China
|
966
|
Ruth KS, Beaumont RN, Tyrrell J, Jones SE, Tuke MA, Yaghootkar H, Wood AR, Freathy RM, Weedon MN, Frayling TM, Murray A. Genetic evidence that lower circulating FSH levels lengthen menstrual cycle, increase age at menopause and impact female reproductive health. Hum Reprod 2016; 31:473-81. [PMID: 26732621 PMCID: PMC4716809 DOI: 10.1093/humrep/dev318] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 09/19/2015] [Accepted: 11/25/2015] [Indexed: 12/22/2022] Open
Abstract
STUDY QUESTION How does a genetic variant in the FSHB promoter, known to alter FSH levels, impact female reproductive health? SUMMARY ANSWER The T allele of the FSHB promoter polymorphism (rs10835638; c.-211G>T) results in longer menstrual cycles and later menopause and, while having detrimental effects on fertility, is protective against endometriosis. WHAT IS KNOWN ALREADY The FSHB promoter polymorphism (rs10835638; c.-211G>T) affects levels of FSHB transcription and, as a result, circulating levels of FSH. FSH is required for normal fertility and genetic variants at the FSHB locus are associated with age at menopause and polycystic ovary syndrome (PCOS). STUDY DESIGN, SIZE, DURATION We used cross-sectional data from the UK Biobank to look at associations between the FSHB promoter polymorphism and reproductive traits, and performed a genome-wide association study (GWAS) for length of menstrual cycle. PARTICIPANTS/MATERIALS, SETTING, METHODS We included white British individuals aged 40-69 years in 2006-2010, in the May 2015 release of genetic data from UK Biobank. We tested the FSH-lowering T allele of the FSHB promoter polymorphism (rs10835638; c.-211G>T) for associations with 29, mainly female, reproductive phenotypes in up to 63 350 women and 56 608 men. We conducted a GWAS in 9534 individuals to identify genetic variants associated with length of menstrual cycle. MAIN RESULTS AND THE ROLE OF CHANCE The FSH-lowering T allele of the FSHB promoter polymorphism (rs10835638; MAF 0.16) was associated with longer menstrual cycles [0.16 SD (c. 1 day) per minor allele; 95% confidence interval (CI) 0.12-0.20; P = 6 × 10⁻¹⁶], later age at menopause (0.13 years per minor allele; 95% CI 0.04-0.22; P = 5.7 × 10⁻³), greater female nulliparity [odds ratio (OR) = 1.06; 95% CI 1.02-1.11; P = 4.8 × 10⁻³] and lower risk of endometriosis (OR = 0.79; 95% CI 0.69-0.90; P = 4.1 × 10⁻⁴).
The FSH-lowering T allele was not associated with other female reproductive illnesses or conditions in our study and we did not replicate associations with male infertility or PCOS. In the GWAS for menstrual cycle length, only variants near the FSHB gene reached genome-wide significance (P < 5 × 10⁻⁹). LIMITATIONS, REASONS FOR CAUTION The data included might be affected by recall bias. Cycle length was not available for 25% of women still cycling (1% did not answer, 6% did not know and for 18% cycle length was recorded as 'irregular'). Women with a cycle length recorded were aged over 40 and were approaching menopause; however, we did not find evidence that this affected the results. Many of the groups with illnesses had relatively small sample sizes and so the study may have been under-powered to detect an effect. WIDER IMPLICATIONS OF THE FINDINGS We found a strong novel association between a genetic variant that lowers FSH levels and longer menstrual cycles, at a locus previously robustly associated with age at menopause. The variant was also associated with nulliparity and endometriosis risk. These findings should now be verified in a second independent group of patients. We conclude that lifetime differences in circulating levels of FSH between individuals can influence menstrual cycle length and a range of reproductive outcomes, including menopause timing, infertility, endometriosis and PCOS. STUDY FUNDING/COMPETING INTERESTS None. TRIAL REGISTRATION NUMBER Not applicable.
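The odds ratios, confidence intervals and P values reported above are linked through the standard error on the log-odds scale. The sketch below shows that relationship for the endometriosis result (OR 0.79, 95% CI 0.69-0.90), assuming a normal (Wald) approximation, which the abstract does not confirm. The function name `z_and_p_from_or_ci` is hypothetical, and because the published inputs are rounded the recovered P value only approximates the reported one.

```python
import math


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-scale SE, z statistic and two-sided P from an OR and its 95% CI.

    Assumes the interval was built as exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal deviate.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


se, z, p = z_and_p_from_or_ci(0.79, 0.69, 0.90)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, two-sided P = {p:.1e}")
```

The recovered P value is of the same order as the reported 4.1 × 10⁻⁴; the remaining gap reflects rounding of the OR and interval to two decimal places.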
Affiliation(s)
- Katherine S Ruth
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Robin N Beaumont
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Jessica Tyrrell
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Samuel E Jones
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Marcus A Tuke
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Hanieh Yaghootkar
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Andrew R Wood
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Rachel M Freathy
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Michael N Weedon
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Timothy M Frayling
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
| | - Anna Murray
- Genetics of Complex Traits, University of Exeter Medical School, RILD Level 3, Royal Devon and Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
|
967
|
Canela-Xandri O, Law A, Gray A, Woolliams JA, Tenesa A. A new tool called DISSECT for analysing large genomic data sets using a Big Data approach. Nat Commun 2015; 6:10162. [PMID: 26657010 PMCID: PMC4682108 DOI: 10.1038/ncomms10162] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 10/20/2015] [Accepted: 11/10/2015] [Indexed: 12/03/2022] Open
Abstract
Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. Availability of computing power can limit computational analysis of large genetic and genomic datasets. Here, Canela-Xandri, et al. describe a software called DISSECT that is capable of analyzing large-scale genetic data by distributing the work across thousands of networked computers.
Affiliation(s)
- Oriol Canela-Xandri
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh EH25 9RG, UK
| | - Andy Law
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh EH25 9RG, UK
| | - Alan Gray
- EPCC, The University of Edinburgh, Edinburgh EH9 3FD, UK
| | - John A Woolliams
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh EH25 9RG, UK
| | - Albert Tenesa
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush Campus, Edinburgh EH25 9RG, UK; MRC HGU at the MRC IGMM, University of Edinburgh, Edinburgh EH4 2XU, UK
|
968
|
Loh PR, Bhatia G, Gusev A, Finucane HK, Bulik-Sullivan BK, Pollack SJ, Schizophrenia Working Group of the Psychiatric Genomics Consortium, de Candia TR, Lee SH, Wray NR, Kendler KS, O’Donovan MC, Neale BM, Patterson N, Price AL. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat Genet 2015; 47:1385-92. [PMID: 26523775 PMCID: PMC4666835 DOI: 10.1038/ng.3431] [Citation(s) in RCA: 314] [Impact Index Per Article: 31.4] [Received: 06/05/2015] [Accepted: 10/02/2015] [Indexed: 12/15/2022]
Abstract
Heritability analyses of genome-wide association study (GWAS) cohorts have yielded important insights into complex disease architecture, and increasing sample sizes hold the promise of further discoveries. Here we analyze the genetic architectures of schizophrenia in 49,806 samples from the PGC and nine complex diseases in 54,734 samples from the GERA cohort. For schizophrenia, we infer an overwhelmingly polygenic disease architecture in which ≥71% of 1-Mb genomic regions harbor ≥1 variant influencing schizophrenia risk. We also observe significant enrichment of heritability in GC-rich regions and in higher-frequency SNPs for both schizophrenia and GERA diseases. In bivariate analyses, we observe significant genetic correlations (ranging from 0.18 to 0.85) for several pairs of GERA diseases; genetic correlations were on average 1.3 times stronger than the correlations of overall disease liabilities. To accomplish these analyses, we developed a fast algorithm for multicomponent, multi-trait variance-components analysis that overcomes prior computational barriers that made such analyses intractable at this scale.
Affiliation(s)
- Po-Ru Loh
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Gaurav Bhatia
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Alexander Gusev
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Hilary K Finucane
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Brendan K Bulik-Sullivan
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Samuela J Pollack
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | | | - Teresa R de Candia
- Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, Colorado, United States
| | - Sang Hong Lee
- The Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
- School of Environmental and Rural Science, University of New England, Armidale, New South Wales, Australia
| | - Naomi R Wray
- The Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
| | - Kenneth S Kendler
- Department of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Michael C O’Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Institute of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Nick Patterson
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| |
|
969
|
Two-Variance-Component Model Improves Genetic Prediction in Family Datasets. Am J Hum Genet 2015; 97:677-90. [PMID: 26544803 DOI: 10.1016/j.ajhg.2015.10.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/15/2015] [Accepted: 10/03/2015] [Indexed: 12/15/2022] Open
Abstract
Genetic prediction based on either identity by state (IBS) sharing or pedigree information has been investigated extensively with best linear unbiased prediction (BLUP) methods. Such methods were pioneered in plant and animal-breeding literature and have since been applied to predict human traits, with the aim of eventual clinical utility. However, methods to combine IBS sharing and pedigree information for genetic prediction in humans have not been explored. We introduce a two-variance-component model for genetic prediction: one component for IBS sharing and one for approximate pedigree structure, both estimated with genetic markers. In simulations using real genotypes from the Candidate-gene Association Resource (CARe) and Framingham Heart Study (FHS) family cohorts, we demonstrate that the two-variance-component model achieves gains in prediction r² over standard BLUP at current sample sizes, and we project, based on simulations, that these gains will continue to hold at larger sample sizes. Accordingly, in analyses of four quantitative phenotypes from CARe and two quantitative phenotypes from FHS, the two-variance-component model significantly improves prediction r² in each case, with up to a 20% relative improvement. We also find that standard mixed-model association tests can produce inflated test statistics in datasets with related individuals, whereas the two-variance-component model corrects for inflation.
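As a rough illustration of the two-variance-component idea in the abstract above, the following sketch computes BLUP predictions from two relationship matrices. It is not the authors' implementation: the function name, argument names, and the assumption that the variance components (`var_ibs`, `var_ped`, `var_e`) have already been estimated elsewhere (for example by REML) are all hypothetical choices made for this example.

```python
import numpy as np

def two_component_blup(k_ibs, k_ped, y, train, test, var_ibs, var_ped, var_e):
    """BLUP for test individuals from two kernels (hypothetical helper).

    k_ibs, k_ped : (n, n) relationship matrices (IBS-based and pedigree-like)
    y            : (n,) phenotypes; only the entries indexed by `train` are used
    var_*        : variance components, assumed to be estimated beforehand
    """
    # Total genetic covariance is the variance-weighted sum of both kernels.
    g = var_ibs * k_ibs + var_ped * k_ped
    # Phenotypic covariance among training individuals adds residual noise.
    v = g[np.ix_(train, train)] + var_e * np.eye(len(train))
    mu = y[train].mean()
    # Standard BLUP: mean plus cov(test, train) V^{-1} (y_train - mean).
    return mu + g[np.ix_(test, train)] @ np.linalg.solve(v, y[train] - mu)
```

Setting `var_ped` to zero reduces this to single-kernel BLUP, which is the baseline the abstract compares against.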
|
970
|
Popescu AA, Huber KT. PSIKO2: a fast and versatile tool to infer population stratification on various levels in GWAS. Bioinformatics 2015; 31:3552-4. [PMID: 26142187 DOI: 10.1093/bioinformatics/btv396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/03/2015] [Accepted: 06/24/2015] [Indexed: 11/14/2022] Open
Abstract
Genome-wide association studies are an invaluable tool for identifying genotypic loci linked with agriculturally important traits or certain diseases. The signal on which such studies rely can, however, be obscured by population stratification, making it necessary to account for it in some way. Population stratification depends on when admixture happened and thus can occur at various levels. To aid in its inference at the genome level, we recently introduced psiko, and comparison with leading methods indicates that it has attractive properties. However, until now, it could not be used for local ancestry inference, which is preferable in cases of recent admixture because the genome level tends to be too coarse to properly account for processes acting on small segments of a genome. To bring the powerful ideas underpinning psiko to bear in such studies, we extended it to psiko2, which we introduce here. Availability and implementation: Source code, binaries and user manual are freely available at https://www.uea.ac.uk/computing/psiko. Contact: Andrei-Alin.Popescu@uea.ac.uk or Katharina.Huber@cmp.uea.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Andrei-Alin Popescu
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
| | - Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
| |
|
971
|
Shriner D, Bentley AR, Doumatey AP, Chen G, Zhou J, Adeyemo A, Rotimi CN. Phenotypic variance explained by local ancestry in admixed African Americans. Front Genet 2015; 6:324. [PMID: 26579196 PMCID: PMC4625172 DOI: 10.3389/fgene.2015.00324] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/21/2015] [Accepted: 10/13/2015] [Indexed: 01/11/2023] Open
Abstract
We surveyed 26 quantitative traits and disease outcomes to understand the proportion of phenotypic variance explained by local ancestry in admixed African Americans. After inferring local ancestry as the number of African-ancestry chromosomes at hundreds of thousands of genotyped loci across all autosomes, we used a linear mixed effects model to estimate the variance explained by local ancestry in two large independent samples of unrelated African Americans. We found that local ancestry at major and polygenic effect genes can explain up to 20 and 8% of phenotypic variance, respectively. These findings provide evidence that most but not all additive genetic variance is explained by genetic markers undifferentiated by ancestry. These results also inform the proportion of health disparities due to genetic risk factors and the magnitude of error in association studies not controlling for local ancestry.
Affiliation(s)
- Daniel Shriner
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
| | - Amy R Bentley
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
| | - Ayo P Doumatey
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
| | - Guanjie Chen
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
| | - Jie Zhou
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
| | - Adebowale Adeyemo
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
| | - Charles N Rotimi
- Center for Research on Genomics and Global Health, National Human Genome Research Institute Bethesda, MD, USA
| |
|
972
|
Coleman JRI, Euesden J, Patel H, Folarin AA, Newhouse S, Breen G. Quality control, imputation and analysis of genome-wide genotyping data from the Illumina HumanCoreExome microarray. Brief Funct Genomics 2015; 15:298-304. [PMID: 26443613 DOI: 10.1093/bfgp/elv037] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022] Open
Abstract
The decreasing cost of performing genome-wide association studies has made genomics widely accessible. However, there is a paucity of guidance for best practice in conducting such analyses. For the results of a study to be valid and replicable, multiple biases must be addressed in the course of data preparation and analysis. In addition, standardizing methods across small, independent studies would increase comparability and the potential for effective meta-analysis. This article provides a discussion of important aspects of quality control, imputation and analysis of genome-wide data from a low-coverage microarray, as well as a straightforward guide to performing a genome-wide association study. A detailed protocol is provided online, with example scripts available at https://github.com/JoniColeman/gwas_scripts.
|
973
|
Vilhjálmsson B, Yang J, Finucane H, Gusev A, Lindström S, Ripke S, Genovese G, Loh PR, Bhatia G, Do R, Hayeck T, Won HH, Kathiresan S, Pato M, Pato C, Tamimi R, Stahl E, Zaitlen N, Pasaniuc B, Belbin G, Kenny EE, Schierup MH, De Jager P, Patsopoulos NA, McCarroll S, Daly M, Purcell S, Chasman D, Neale B, Goddard M, Visscher PM, Kraft P, Patterson N, Price AL, Ripke S, Neale B, Corvin A, Walters J, Farh KH, Holmans P, Lee P, Bulik-Sullivan B, Collier D, Huang H, Pers T, Agartz I, Agerbo E, Albus M, Alexander M, Amin F, Bacanu S, Begemann M, Belliveau R, Bene J, Bergen S, Bevilacqua E, Bigdeli T, Black D, Bruggeman R, Buccola N, Buckner R, Byerley W, Cahn W, Cai G, Campion D, Cantor R, Carr V, Carrera N, Catts S, Chambert K, Chan R, Chen R, Chen E, Cheng W, Cheung E, Chong S, Cloninger C, Cohen D, Cohen N, Cormican P, Craddock N, Crowley J, Curtis D, Davidson M, Davis K, Degenhardt F, Del Favero J, DeLisi L, Demontis D,
Dikeos D, Dinan T, Djurovic S, Donohoe G, Drapeau E, Duan J, Dudbridge F, Durmishi N, Eichhammer P, Eriksson J, Escott-Price V, Essioux L, Fanous A, Farrell M, Frank J, Franke L, Freedman R, Freimer N, Friedl M, Friedman J, Fromer M, Genovese G, Georgieva L, Gershon E, Giegling I, Giusti-Rodríguez P, Godard S, Goldstein J, Golimbet V, Gopal S, Gratten J, Grove J, de Haan L, Hammer C, Hamshere M, Hansen M, Hansen T, Haroutunian V, Hartmann A, Henskens F, Herms S, Hirschhorn J, Hoffmann P, Hofman A, Hollegaard M, Hougaard D, Ikeda M, Joa I, Julia A, Kahn R, Kalaydjieva L, Karachanak-Yankova S, Karjalainen J, Kavanagh D, Keller M, Kelly B, Kennedy J, Khrunin A, Kim Y, Klovins J, Knowles J, Konte B, Kucinskas V, Kucinskiene Z, Kuzelova-Ptackova H, Kahler A, Laurent C, Keong J, Lee S, Legge S, Lerer B, Li M, Li T, Liang KY, Lieberman J, Limborska S, Loughland C, Lubinski J, Lönnqvist J, Macek M, Magnusson P, Maher B, Maier W, Mallet J, Marsal S, Mattheisen M, Mattingsdal M, McCarley R, McDonald C, McIntosh A, Meier S, Meijer C, Melegh B, Melle I, Mesholam-Gately R, Metspalu A, Michie P, Milani L, Milanova V, Mokrab Y, Morris D, Mors O, Mortensen P, Murphy K, Murray R, Myin-Germeys I, Müller-Myhsok B, Nelis M, Nenadic I, Nertney D, Nestadt G, Nicodemus K, Nikitina-Zake L, Nisenbaum L, Nordin A, O’Callaghan E, O’Dushlaine C, O’Neill F, Oh SY, Olincy A, Olsen L, Van Os J, Pantelis C, Papadimitriou G, Papiol S, Parkhomenko E, Pato M, Paunio T, Pejovic-Milovancevic M, Perkins D, Pietiläinen O, Pimm J, Pocklington A, Powell J, Price A, Pulver A, Purcell S, Quested D, Rasmussen H, Reichenberg A, Reimers M, Richards A, Roffman J, Roussos P, Ruderfer D, Salomaa V, Sanders A, Schall U, Schubert C, Schulze T, Schwab S, Scolnick E, Scott R, Seidman L, Shi J, Sigurdsson E, Silagadze T, Silverman J, Sim K, Slominsky P, Smoller J, So HC, Spencer C, Stahl E, Stefansson H, Steinberg S, Stogmann E, Straub R, Strengman E, Strohmaier J, Stroup T, Subramaniam M, Suvisaari J, Svrakic D,
Szatkiewicz J, Söderman E, Thirumalai S, Toncheva D, Tooney P, Tosato S, Veijola J, Waddington J, Walsh D, Wang D, Wang Q, Webb B, Weiser M, Wildenauer D, Williams N, Williams S, Witt S, Wolen A, Wong E, Wormley B, Wu J, Xi H, Zai C, Zheng X, Zimprich F, Wray N, Stefansson K, Visscher P, Adolfsson R, Andreassen O, Blackwood D, Bramon E, Buxbaum J, Børglum A, Cichon S, Darvasi A, Domenici E, Ehrenreich H, Esko T, Gejman P, Gill M, Gurling H, Hultman C, Iwata N, Jablensky A, Jonsson E, Kendler K, Kirov G, Knight J, Lencz T, Levinson D, Li Q, Liu J, Malhotra A, McCarroll S, McQuillin A, Moran J, Mortensen P, Mowry B, Nöthen M, Ophoff R, Owen M, Palotie A, Pato C, Petryshen T, Posthuma D, Rietschel M, Riley B, Rujescu D, Sham P, Sklar P, St. Clair D, Weinberger D, Wendland J, Werge T, Daly M, Sullivan P, O’Donovan M, Kraft P, Hunter DJ, Adank M, Ahsan H, Aittomäki K, Baglietto L, Berndt S, Blomquist C, Canzian F, Chang-Claude J, Chanock SJ, Crisponi L, Czene K, Dahmen N, Silva IDS, Easton D, Eliassen AH, Figueroa J, Fletcher O, Garcia-Closas M, Gaudet MM, Gibson L, Haiman CA, Hall P, Hazra A, Hein R, Henderson BE, Hofman A, Hopper JL, Irwanto A, Johansson M, Kaaks R, Kibriya MG, Lichtner P, Lindström S, Liu J, Lund E, Makalic E, Meindl A, Meijers-Heijboer H, Müller-Myhsok B, Muranen TA, Nevanlinna H, Peeters PH, Peto J, Prentice RL, Rahman N, Sánchez MJ, Schmidt DF, Schmutzler RK, Southey MC, Tamimi R, Travis R, Turnbull C, Uitterlinden AG, van der Luijt RB, Waisfisz Q, Wang Z, Whittemore AS, Yang R, Zheng W. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am J Hum Genet 2015; 97:576-92. [PMID: 26430803 DOI: 10.1016/j.ajhg.2015.09.001] [Citation(s) in RCA: 867] [Impact Index Per Article: 86.7] [Received: 03/26/2015] [Accepted: 09/01/2015] [Indexed: 11/24/2022] Open
Abstract
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
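To make the idea of a posterior mean effect size under LD more concrete, here is a minimal sketch of LD-aware shrinkage under a Gaussian (infinitesimal) prior. The abstract does not state which prior is used, so the prior, the function names, and the parameters `n` (training sample size) and `h2` (assumed heritability) are assumptions of this example and not a description of the LDpred software.

```python
import numpy as np

def posterior_mean_inf(beta_marginal, ld, n, h2):
    """Shrink marginal effects using an LD matrix (hypothetical helper).

    Assumes standardized genotypes and a Gaussian prior in which each of
    the m markers explains h2 / m of the variance, which gives the
    ridge-like solution (ld + m / (n * h2) * I)^{-1} beta_marginal.
    """
    m = len(beta_marginal)
    shrink = m / (n * h2)
    return np.linalg.solve(ld + shrink * np.eye(m), beta_marginal)

def polygenic_score(genotypes, weights):
    """Risk score per individual: weighted sum of standardized genotypes."""
    return genotypes @ weights
```

With an identity LD matrix the solution reduces to uniform shrinkage of each marginal estimate; off-diagonal LD entries are what let correlated markers share, instead of double-count, a signal.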
|
974
|
Casale FP, Rakitsch B, Lippert C, Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nat Methods 2015; 12:755-8. [PMID: 26076425 DOI: 10.1038/nmeth.3439] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 11/07/2014] [Accepted: 05/18/2015] [Indexed: 01/17/2023]
Abstract
Set tests are a powerful approach for genome-wide association testing between groups of genetic variants and quantitative traits. We describe mtSet (http://github.com/PMBio/limix), a mixed-model approach that enables joint analysis across multiple correlated traits while accounting for population structure and relatedness. mtSet effectively combines the benefits of set tests with multi-trait modeling and is computationally efficient, enabling genetic analysis of large cohorts (up to 500,000 individuals) and multiple traits.
Affiliation(s)
- Francesco Paolo Casale
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Barbara Rakitsch
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Christoph Lippert
- [1] Microsoft Research, Los Angeles, California, USA. [2] Human Longevity, Inc., Mountain View, California, USA
| | - Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| |
|
975
|
Hayeck T, Zaitlen N, Loh PR, Vilhjalmsson B, Pollack S, Gusev A, Yang J, Chen GB, Goddard M, Visscher P, Patterson N, Price A. Mixed model with correction for case-control ascertainment increases association power. Am J Hum Genet 2015; 96:720-30. [PMID: 25892111 DOI: 10.1016/j.ajhg.2015.03.004] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 08/21/2014] [Accepted: 03/05/2015] [Indexed: 01/06/2023] Open
Abstract
We introduce a liability-threshold mixed linear model (LTMLM) association statistic for case-control studies and show that it has a well-controlled false-positive rate and more power than existing mixed-model methods for diseases with low prevalence. Existing mixed-model methods suffer a loss in power under case-control ascertainment, but no solution has been proposed. Here, we solve this problem by using a χ² score statistic computed from posterior mean liabilities (PMLs) under the liability-threshold model. Each individual's PML is conditional not only on that individual's case-control status but also on every individual's case-control status and the genetic relationship matrix (GRM) obtained from the data. The PMLs are estimated with a multivariate Gibbs sampler; the liability-scale phenotypic covariance matrix is based on the GRM, and a heritability parameter is estimated via Haseman-Elston regression on case-control phenotypes and then transformed to the liability scale. In simulations of unrelated individuals, the LTMLM statistic was correctly calibrated and achieved higher power than existing mixed-model methods for diseases with low prevalence, and the magnitude of the improvement depended on sample size and severity of case-control ascertainment. In a Wellcome Trust Case Control Consortium 2 multiple sclerosis dataset with >10,000 samples, LTMLM was correctly calibrated and attained a 4.3% improvement (p = 0.005) in χ² statistics over existing mixed-model methods at 75 known associated SNPs, consistent with simulations. Larger increases in power are expected at larger sample sizes. In conclusion, case-control studies of diseases with low prevalence can achieve power higher than that in existing mixed-model methods.
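The heritability step mentioned in the abstract above (Haseman-Elston regression on case-control phenotypes, followed by a transformation to the liability scale) can be sketched as follows. This is an illustration only: the no-intercept regression and the particular observed-to-liability conversion used here are common textbook choices assumed for the example, and the function names are hypothetical; the abstract itself does not give the formulas.

```python
import numpy as np
from statistics import NormalDist

def haseman_elston(grm, y):
    """Slope of phenotype products on off-diagonal GRM entries (no intercept)."""
    y = (y - y.mean()) / y.std()           # standardize the 0/1 phenotype
    upper = np.triu_indices_from(grm, k=1)  # each pair i < j counted once
    a = grm[upper]
    products = np.outer(y, y)[upper]
    return float(a @ products / (a @ a))

def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Convert observed-scale heritability to the liability scale.

    Assumed conversion: h2_obs * K^2 (1-K)^2 / (z^2 * P (1-P)), where K is
    the population prevalence, P the case fraction in the sample, and z the
    standard normal density at the liability threshold.
    """
    normal = NormalDist()
    threshold = normal.inv_cdf(1.0 - prevalence)
    z = normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))
```

For a rare disease studied with a balanced case-control design, the conversion factor differs substantially from 1, which is why the ascertainment has to be accounted for explicitly.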
|