1. Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of the results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole exome/genome sequencing association studies: it can handle a wide range of association models and provides researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - WELBIO department, WEL Research Institute, Wavre, Belgium
2. He KY, Kelly TN, Wang H, Liang J, Zhu L, Cade BE, Assimes TL, Becker LC, Beitelshees AL, Bielak LF, Bress AP, Brody JA, Chang YPC, Chang YC, de Vries PS, Duggirala R, Fox ER, Franceschini N, Furniss AL, Gao Y, Guo X, Haessler J, Hung YJ, Hwang SJ, Irvin MR, Kalyani RR, Liu CT, Liu C, Martin LW, Montasser ME, Muntner PM, Mwasongwe S, Naseri T, Palmas W, Reupena MS, Rice KM, Sheu WHH, Shimbo D, Smith JA, Snively BM, Yanek LR, Zhao W, Blangero J, Boerwinkle E, Chen YDI, Correa A, Cupples LA, Curran JE, Fornage M, He J, Hou L, Kaplan RC, Kardia SLR, Kenny EE, Kooperberg C, Lloyd-Jones D, Loos RJF, Mathias RA, McGarvey ST, Mitchell BD, North KE, Peyser PA, Psaty BM, Raffield LM, Rao DC, Redline S, Reiner AP, Rich SS, Rotter JI, Taylor KD, Tracy R, Vasan RS, Morrison AC, Levy D, Chakravarti A, Arnett DK, Zhu X. Rare coding variants in RCN3 are associated with blood pressure. BMC Genomics 2022; 23:148. [PMID: 35183128 PMCID: PMC8858539 DOI: 10.1186/s12864-022-08356-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 02/01/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: While large genome-wide association studies have identified nearly one thousand loci associated with variation in blood pressure, rare variant identification is still a challenge. In family-based cohorts, genome-wide linkage scans have been successful in identifying rare genetic variants for blood pressure. This study aims to identify low frequency and rare genetic variants within previously reported linkage regions on chromosomes 1 and 19 in African American families from the Trans-Omics for Precision Medicine (TOPMed) program. Genetic association analyses weighted by linkage evidence were completed with whole genome sequencing data within and across TOPMed ancestral groups consisting of 60,388 individuals of European, African, East Asian, Hispanic, and Samoan ancestries. RESULTS: Associations of low frequency and rare variants in RCN3 and multiple other genes were observed for blood pressure traits in TOPMed samples. The association of low frequency and rare coding variants in RCN3 was further replicated in UK Biobank samples (N = 403,522), and reached genome-wide significance for diastolic blood pressure (p = 2.01 × 10⁻⁷). CONCLUSIONS: Low frequency and rare variants in RCN3 contribute to blood pressure variation. This study demonstrates that focusing association analyses in linkage regions greatly reduces multiple-testing burden and improves power to identify novel rare variants associated with blood pressure traits.
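The multiple-testing argument in the conclusions can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a Bonferroni correction and made-up test counts; neither the correction method nor the numbers are taken from the study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical numbers of tests, chosen only for illustration.
GENOME_WIDE_TESTS = 1_000_000
LINKAGE_REGION_TESTS = 5_000

genome_wide = bonferroni_threshold(0.05, GENOME_WIDE_TESTS)
region_only = bonferroni_threshold(0.05, LINKAGE_REGION_TESTS)

# Fewer tests give a less stringent per-test threshold.
print(f"genome-wide threshold:    {genome_wide:.1e}")
print(f"linkage-region threshold: {region_only:.1e}")
```

Restricting the analysis to fewer candidate variants raises the per-test threshold, which is one way a region-focused design can gain power.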
Affiliation(s)
- Karen Y He
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH, 44106, USA
- Tanika N Kelly
  - Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Heming Wang
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Jingjing Liang
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH, 44106, USA
- Luke Zhu
  - Center for Human Genetics & Genomics, New York University Grossman School of Medicine, New York, NY, USA
- Brian E Cade
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Themistocles L Assimes
  - Department of Medicine (Division of Cardiovascular Medicine), Stanford University, Palo Alto, CA, USA
- Lewis C Becker
  - GeneSTAR Research Program, Department of Medicine, Divisions of Cardiology and General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Amber L Beitelshees
  - Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Lawrence F Bielak
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Adam P Bress
  - Department of Population Health Sciences, University of Utah School of Medicine, Salt Lake City, UT, USA
- Jennifer A Brody
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Yen-Pei Christy Chang
  - Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Yi-Cheng Chang
  - Graduate Institute of Medical Genomics and Proteomics, National Taiwan University, Taipei City, Taiwan
  - Institute of Biomedical Sciences, Academia Sinica, Taipei City, Taiwan
  - Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Paul S de Vries
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ravindranath Duggirala
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Ervin R Fox
  - Division of Cardiovascular Diseases, Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Nora Franceschini
  - Department of Epidemiology, UNC Gillings School of Global Public Health, Chapel Hill, NC, USA
- Anna L Furniss
  - Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
- Yan Gao
  - Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
- Xiuqing Guo
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Jeffrey Haessler
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Yi-Jen Hung
  - Institute of Preventive Medicine, National Defense Medical Center, New Taipei City, Taiwan
- Shih-Jen Hwang
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
  - Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
- Marguerite Ryan Irvin
  - Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Rita R Kalyani
  - GeneSTAR Research Program, Department of Medicine, Division of Endocrinology, Diabetes and Metabolism, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ching-Ti Liu
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
  - Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
- Chunyu Liu
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
  - Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
- Lisa Warsinger Martin
  - Division of Cardiology, Department of Medicine, George Washington University, Washington, DC, USA
- May E Montasser
  - Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Paul M Muntner
  - Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Take Naseri
  - Ministry of Health, Government of Samoa, Apia, Samoa
- Walter Palmas
  - Division of Cardiology, Columbia University Irving Medical Center, New York, NY, USA
- Kenneth M Rice
  - Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
- Wayne H-H Sheu
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung City, Taiwan
- Daichi Shimbo
  - Division of Cardiology, Columbia University Irving Medical Center, New York, NY, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
  - Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Beverly M Snively
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Lisa R Yanek
  - GeneSTAR Research Program, Department of Medicine, Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Wei Zhao
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- John Blangero
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yii-Der Ida Chen
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
  - Division of Genomic Outcomes, Department of Pediatrics, Harbor-UCLA Medical Center Professor of Pediatrics, UCLA, Torrance, CA, USA
- Adolfo Correa
  - Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
- L Adrienne Cupples
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
  - Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
- Joanne E Curran
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Myriam Fornage
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Brown Foundation Institute of Molecular Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jiang He
  - Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Lifang Hou
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University Chicago, Evanston, IL, USA
- Robert C Kaplan
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, USA
- Sharon L R Kardia
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Eimear E Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Donald Lloyd-Jones
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, USA
- Ruth J F Loos
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Rasika A Mathias
  - GeneSTAR Research Program, Department of Medicine, Divisions of Allergy and Clinical Immunology and General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Stephen T McGarvey
  - International Health Institute and Department of Epidemiology, School of Public Health, Brown University, Providence, RI, USA
  - Department of Anthropology, Brown University, Providence, RI, USA
- Braxton D Mitchell
  - Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
  - Geriatrics Research and Education Clinical Center, Veterans Affairs Medical Center, Baltimore, MD, USA
- Kari E North
  - Department of Epidemiology, UNC Gillings School of Global Public Health, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Patricia A Peyser
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Bruce M Psaty
  - Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Laura M Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- D C Rao
  - Division of Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Alex P Reiner
  - Department of Epidemiology, University of Washington, Seattle, WA, USA
- Stephen S Rich
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Jerome I Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Kent D Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Russell Tracy
  - Department of Pathology & Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT, USA
  - Department of Biochemistry, University of Vermont, Burlington, VT, USA
- Ramachandran S Vasan
  - Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
  - Department of Medicine, School of Medicine, Boston University, Boston, MA, USA
- Alanna C Morrison
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Daniel Levy
  - Framingham Heart Study, National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Aravinda Chakravarti
  - Center for Human Genetics & Genomics, New York University Grossman School of Medicine, New York, NY, USA
- Donna K Arnett
  - University of Kentucky College of Public Health, Lexington, KY, USA
- Xiaofeng Zhu
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH, 44106, USA
3. Gao C, Sha Q, Zhang S, Zhang K. MF-TOWmuT: Testing an optimally weighted combination of common and rare variants with multiple traits using family data. Genet Epidemiol 2020; 45:64-81. [PMID: 33047835 DOI: 10.1002/gepi.22355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2020] [Revised: 08/03/2020] [Accepted: 08/18/2020] [Indexed: 11/11/2022]
Abstract
With rapid advancements in sequencing technologies and the accumulation of electronic health records, a large number of genetic variants and multiple correlated human complex traits have become available in many genetic association studies. Thus, it becomes necessary and important to develop new methods that can jointly analyze the association between multiple genetic variants and multiple traits. Compared with methods that only use a single marker or trait, the joint analysis of multiple genetic variants and multiple traits is more powerful since such an analysis can fully incorporate the correlation structure of genetic variants and/or traits and their mutual dependence patterns. However, most existing methods that simultaneously analyze multiple genetic variants and multiple traits are only applicable to unrelated samples. We develop a new method called MF-TOWmuT to detect association of multiple phenotypes and multiple genetic variants in a genomic region with family samples. MF-TOWmuT is based on an optimally weighted combination of variants. Our method can be applied to both rare and common variants and both qualitative and quantitative traits. Our simulation results show that (1) the type I error of MF-TOWmuT is preserved; (2) MF-TOWmuT outperforms two existing methods, the Multiple Family-based Quasi-Likelihood Score Test and the Multivariate Family-based Rare Variant Association Test, in terms of power. We also illustrate the usefulness of MF-TOWmuT by analyzing genotypic and phenotypic data from the Genetics of Kidneys in Diabetes study. An R program is available at https://github.com/gaochengPRC/MF-TOWmuT.
Affiliation(s)
- Cheng Gao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
- Kui Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
4. Jones RM, Melton PE, Pinese M, Rea AJ, Ingley E, Ballinger ML, Wood DJ, Thomas DM, Moses EK. Identification of novel sarcoma risk genes using a two-stage genome wide DNA sequencing strategy in cancer cluster families and population case and control cohorts. BMC Medical Genetics 2019; 20:69. [PMID: 31053105 PMCID: PMC6499942 DOI: 10.1186/s12881-019-0808-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/31/2018] [Accepted: 04/16/2019] [Indexed: 12/26/2022]
Abstract
Background: Although familial clustering of cancers is relatively common, only a small proportion of familial cancer risk can be explained by known cancer predisposition genes. Methods: In this study we employed a two-stage approach to identify candidate sarcoma risk genes. First, we conducted whole exome sequencing in three multigenerational cancer families ascertained through a sarcoma proband (n = 19) in order to prioritize candidate genes for validation in an independent case-control cohort of sarcoma patients using family-based association and segregation analysis. The second stage employed a burden analysis of rare variants within prioritized candidate genes identified from stage one in 560 sarcoma cases and 1144 healthy ageing controls, for which whole genome sequence was available. Results: Variants from eight genes were identified in stage one. Following gene-based burden testing and after correction for multiple testing, two of these genes, ABCB5 and C16orf96, were determined to show statistically significant association with cancer. The ABCB5 gene was found to have a higher burden of putative regulatory variants (OR = 4.9, p-value = 0.007, q-value = 0.04) based on allele counts in sarcoma cases compared to controls. C16orf96 was found to have a significantly lower burden (OR = 0.58, p-value = 0.0004, q-value = 0.003) of regulatory variants in controls compared to sarcoma cases. Conclusions: Based on these genetic association data we propose that ABCB5 and C16orf96 are novel candidate risk genes for sarcoma. Although neither of these two genes has been previously associated with sarcoma, ABCB5 has been shown to share clinical drug resistance associations with melanoma and leukaemia, and C16orf96 shares regulatory elements with genes that are involved with TNF-alpha mediated apoptosis in a p53/TP53-dependent manner. Future genetic studies in other family and population cohorts will be required for further validation of these novel findings.
Electronic supplementary material: The online version of this article (10.1186/s12881-019-0808-9) contains supplementary material, which is available to authorized users.
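The odds ratios quoted in the results are described as being based on allele counts in cases and controls. As a rough illustration only, the sketch below computes an odds ratio and a Woolf-type 95% confidence interval from a 2x2 table of allele counts. The function name and the counts are hypothetical, and the study's actual burden test may differ from this simple estimate.

```python
import math


def allele_odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio and Woolf 95% CI from a 2x2 table of allele counts."""
    counts = (case_alt, case_ref, control_alt, control_ref)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for this simple estimate")
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (low, high)


# Hypothetical allele counts, not taken from the study.
odds_ratio, (ci_low, ci_high) = allele_odds_ratio(
    case_alt=20, case_ref=980, control_alt=10, control_ref=1990
)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An odds ratio above 1 indicates that the alternate alleles are relatively more frequent in cases; a value below 1 indicates the opposite.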
Affiliation(s)
- Rachel M Jones
  - The Curtin UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Health and Medical Sciences, M409 The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Western Australia
  - Medical School, Faculty of Health and Medical Sciences, University of Western Australia, Crawley, Australia
- Phillip E Melton
  - The Curtin UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Health and Medical Sciences, M409 The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Western Australia
  - School of Pharmacy and Biomedical Sciences, Faculty of Health Sciences, Curtin University, Bentley, Western Australia
- Mark Pinese
  - Cancer Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Alexander J Rea
  - The Curtin UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Health and Medical Sciences, M409 The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Western Australia
- Evan Ingley
  - School of Veterinary and Life Sciences, Murdoch University, Murdoch, Australia
  - Harry Perkins Institute of Medical Research, Murdoch, Western Australia
  - The Centre for Medical Research, The University of Western Australia, Crawley, Australia
- Mandy L Ballinger
  - Cancer Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- David J Wood
  - Medical School, Faculty of Health and Medical Sciences, University of Western Australia, Crawley, Australia
- David M Thomas
  - Cancer Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Eric K Moses
  - The Curtin UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Health and Medical Sciences, M409 The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Western Australia
  - School of Pharmacy and Biomedical Sciences, Faculty of Health Sciences, Curtin University, Bentley, Western Australia
  - School of Biomedical Sciences, Faculty of Health and Medical Sciences, The University of Western Australia, Crawley, Australia
5. Qin H, Zhao J, Zhu X. Identifying Rare Variant Associations in Admixed Populations. Sci Rep 2019; 9:5458. [PMID: 30931973 PMCID: PMC6443736 DOI: 10.1038/s41598-019-41845-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/15/2018] [Accepted: 03/12/2019] [Indexed: 12/27/2022] Open
Abstract
An admixed population and its ancestral populations bear different burdens of a complex disease. The ancestral populations may have different haplotypes of deleterious alleles and thus ancestry-gene interaction can influence disease risk in the admixed population. Among admixed individuals, deleterious haplotypes and their ancestries are dependent and can provide non-redundant association information. Herein we propose a local ancestry boosted sum test (LABST) for identifying chromosomal blocks that harbor rare variants but have no ancestry switches. For such a stable ancestral block, our LABST exploits ancestry-gene interaction and the number of rare alleles therein. Under the null of no genetic association, the test statistic asymptotically follows a chi-square distribution with one degree of freedom (1-df). Our LABST properly controlled type I error rates under extensive simulations, suggesting that the asymptotic approximation was accurate for the null distribution of the test statistic. In terms of power for identifying rare variant associations, our LABST uniformly outperformed several well-known methods under four important modes of disease genetics over a large range of relative risks. In conclusion, exploiting ancestry-gene interaction can boost statistical power for rare variant association mapping in admixed populations.
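Because the test statistic is stated to follow a 1-df chi-square distribution under the null, an asymptotic p-value follows directly from the normal tail: if Z is standard normal, P(Z² > x) = erfc(√(x/2)). The sketch below shows only this conversion. The function name is an assumption for illustration, and the LABST statistic itself is not reproduced here.

```python
import math


def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail p-value of a chi-square distribution with one degree of freedom."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    # For Z ~ N(0, 1): P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# 3.841 is the familiar 5% critical value for one degree of freedom.
print(f"p-value at 3.841: {chi2_1df_pvalue(3.841):.4f}")
```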
Affiliation(s)
- Huaizhen Qin
  - Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, 32611, USA
  - Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, 1440 Canal Street, New Orleans, LA, 70112, USA
- Jinying Zhao
  - Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, 32611, USA
- Xiaofeng Zhu
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, 10900 Euclid Avenue, Cleveland, Ohio, 44106, USA
6. Guo Y, Zhou Y. A modified association test for rare and common variants based on affected sib-pair design. J Theor Biol 2019; 467:1-6. [PMID: 30707975 DOI: 10.1016/j.jtbi.2019.01.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/01/2018] [Accepted: 01/08/2019] [Indexed: 11/18/2022]
Abstract
Current genome-wide association analysis has identified a great number of rare and common variants associated with common complex traits. However, more effective approaches for detecting associations of rare and common variants with common diseases are still needed. Approaches designed for rare variant association analysis lose power when detecting the effects of rare and common variants simultaneously. In this paper, we extend an existing method of testing for rare variant association based on affected sib pairs (TOW-sib) and propose a variable weight test for rare and common variant association based on affected sib pairs (abbreviated as VW-TOWsib). The VW-TOWsib can be used to detect the association of rare and common variants with complex diseases. Simulation results in various scenarios show that our proposed method is more powerful than existing methods for detecting effects of rare and common variants. At the same time, the VW-TOWsib also performs well as a method for rare variant association analysis.
Affiliation(s)
- Yixing Guo
  - Department of Statistics, School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin 150080, China
- Ying Zhou
  - Department of Statistics, School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin 150080, China
7. Caspers M, Blocquiaux S, Charlier R, Lefevre J, De Bock K, Thomis M. Metabolic fitness in relation to genetic variation and leukocyte DNA methylation. Physiol Genomics 2019; 51:12-26. [DOI: 10.1152/physiolgenomics.00077.2018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022] Open
Abstract
Metabolic syndrome (MetS) is a highly prevalent condition causing increased risk of several life-threatening diseases. MetS has a pronounced hereditary basis but is also influenced by environmental factors, partly through epigenetic mechanisms. In this study, the five phenotypes underlying MetS were incorporated into a continuous score for metabolic fitness (MF), and associations with both genotypic variation and leukocyte DNA methylation were investigated. Baseline MF phenotypes (waist circumference, blood pressure, blood glucose, serum triglycerides, and high-density lipoproteins) of 710 healthy Flemish adults were measured. After a 10-year period, follow-up measures were derived from 618 of these subjects. Genotyping was performed for 65 preselected MF-related genetic variants. Next, full genetic predisposition scores (GPSs) were calculated, combining genotype scores of multiple genetic variants. Additionally, stepwise GPSs were constructed, including only the most predictive genetic variants for the different MF phenotypes. For a subset of 68 middle-aged men, global and gene-specific DNA methylation was investigated, and a biological pathway analysis was performed. The full GPSs were predictive for some baseline MF phenotypes, but not for changes over time. Only a limited number of genetic variants were significantly predictive individually. In contrast, global and gene-specific DNA methylation was associated with changes in the MF phenotypes rather than with the baseline measures, indicating that effects of DNA methylation on MF are somewhat delayed. Furthermore, several biological pathways were associated with the MF phenotypes through gene promoter methylation. For CETP, G6PC2, MC4R, and TFAP2B, both a genetic and an epigenetic relationship with MF were found.
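The abstract does not state how genotype scores are combined into a GPS. A minimal sketch, assuming an additive model in which each variant contributes its risk-allele count (0, 1 or 2), optionally multiplied by a per-variant weight. The function name, counts, and weights below are hypothetical.

```python
def genetic_predisposition_score(risk_allele_counts, weights=None):
    """Additive score: sum of risk-allele counts, optionally weighted per variant."""
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("each risk-allele count must be 0, 1 or 2")
    if weights is None:
        # Unweighted score: every variant contributes equally.
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("weights and counts must have the same length")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


# Hypothetical genotypes for one individual at four variants.
counts = [0, 1, 2, 1]
unweighted = genetic_predisposition_score(counts)
weighted = genetic_predisposition_score(counts, weights=[0.5, 1.0, 0.25, 2.0])
print(unweighted, weighted)
```

A stepwise score of the kind mentioned in the abstract could reuse the same function on a subset of variants, although the selection procedure itself is not described in the source.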
Affiliation(s)
- M. Caspers
  - Department of Movement Sciences, KU Leuven, Leuven, Belgium
- S. Blocquiaux
  - Department of Movement Sciences, KU Leuven, Leuven, Belgium
- R. Charlier
  - Department of Movement Sciences, KU Leuven, Leuven, Belgium
- J. Lefevre
  - Department of Movement Sciences, KU Leuven, Leuven, Belgium
- K. De Bock
  - Department of Health Sciences and Technology, ETH Zürich, Schwerzenbach, Switzerland
- M. Thomis
  - Department of Movement Sciences, KU Leuven, Leuven, Belgium
|
8
|
Combined linkage and association analysis identifies rare and low frequency variants for blood pressure at 1q31. Eur J Hum Genet 2018; 27:269-277. [PMID: 30262922 DOI: 10.1038/s41431-018-0277-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/14/2017] [Revised: 07/12/2018] [Accepted: 08/28/2018] [Indexed: 12/24/2022] Open
Abstract
High blood pressure (BP) is a major risk factor for cardiovascular disease (CVD) and is more prevalent in African Americans as compared to other US groups. Although large, population-based genome-wide association studies (GWAS) have identified over 300 common polymorphisms modulating inter-individual BP variation, largely in European ancestry subjects, most of them do not localize to regions previously identified through family-based linkage studies. This discrepancy has remained unexplained despite the statistical power differences between current GWAS and prior linkage studies. To address this issue, we performed genome-wide linkage analysis of BP traits in African-American families from the Family Blood Pressure Program (FBPP) and genotyped on the Illumina Human Exome BeadChip v1.1. We identified a genomic region on chromosome 1q31 with LOD score 3.8 for pulse pressure (PP), a region we previously implicated in DBP studies of European ancestry families. Although no reported GWAS variants map to this region, combined linkage and association analysis of PP identified 81 rare and low frequency exonic variants accounting for the linkage evidence. Replication analysis in eight independent African ancestry cohorts (N = 16,968) supports this specific association with PP (P = 0.0509). Additional association and network analyses identified multiple potential candidate genes in this region expressed in multiple tissues and with a strong biological support for a role in BP. In conclusion, multiple genes and rare variants on 1q31 contribute to PP variation. Beyond producing new insights into PP, we demonstrate how family-based linkage and association studies can implicate specific rare and low frequency variants for complex traits.
|
9
|
He KY, Wang H, Cade BE, Nandakumar P, Giri A, Ware EB, Haessler J, Liang J, Smith JA, Franceschini N, Le TH, Kooperberg C, Edwards TL, Kardia SLR, Lin X, Chakravarti A, Redline S, Zhu X. Rare variants in fox-1 homolog A (RBFOX1) are associated with lower blood pressure. PLoS Genet 2017; 13:e1006678. [PMID: 28346479 PMCID: PMC5386302 DOI: 10.1371/journal.pgen.1006678] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 10/10/2016] [Revised: 04/10/2017] [Accepted: 03/09/2017] [Indexed: 12/23/2022] Open
Abstract
Many large genome-wide association studies (GWAS) have identified common blood pressure (BP) variants. However, most of the identified BP variants do not overlap with the linkage evidence observed from family studies. We thus hypothesize that multiple rare variants contribute to the observed linkage evidence. We performed linkage analysis using 517 individuals in 130 European families from the Cleveland Family Study (CFS) who have been genotyped on the Illumina OmniExpress Exome array. The largest linkage peak was observed on chromosome 16p13 (MLOD = 2.81) for systolic blood pressure (SBP). Follow-up conditional linkage and association analyses in the linkage region identified multiple rare, coding variants in RBFOX1 associated with reduced SBP. In a 17-member CFS family, carriers of the missense variant rs149974858 are normotensive despite being obese (average BMI = 60 kg/m²). Gene-based association test of rare variants using SKAT-O showed significant association with SBP (p-value = 0.00403) and DBP (p-value = 0.0258) in the CFS participants and the association was replicated in large independent replication studies (N = 57,234, p-value = 0.013 for SBP, 0.0023 for PP). RBFOX1 is expressed in brain tissues, the atrial appendage and left ventricle in the heart, and in skeletal muscle tissues, organs/tissues which are potentially related to blood pressure. Our study showed that associations of rare variants could be efficiently detected using family information.
Affiliation(s)
- Karen Y. He
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
- Heming Wang
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
- Brian E. Cade
  - Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
- Priyanka Nandakumar
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Ayush Giri
  - Division of Epidemiology, Department of Medicine, Institute for Medicine and Public Health, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
- Erin B. Ware
  - Biosocial Methods Collaborative, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Jeffrey Haessler
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Jingjing Liang
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
- Jennifer A. Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Nora Franceschini
  - Department of Epidemiology, UNC Gillings School of Global Public Health, Chapel Hill, North Carolina, United States of America
- Thu H. Le
  - Department of Medicine, Division of Nephrology, University of Virginia, Charlottesville, Virginia, United States of America
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Todd L. Edwards
  - Division of Epidemiology, Department of Medicine, Institute for Medicine and Public Health, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
- Sharon L. R. Kardia
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Aravinda Chakravarti
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
  - Division of Sleep Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Xiaofeng Zhu
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
  - * E-mail:
|
10
|
Fu J, Beaty TH, Scott AF, Hetmanski J, Parker MM, Wilson JEB, Marazita ML, Mangold E, Albacha-Hejazi H, Murray JC, Bureau A, Carey J, Cristiano S, Ruczinski I, Scharpf RB. Whole exome association of rare deletions in multiplex oral cleft families. Genet Epidemiol 2017; 41:61-69. [PMID: 27910131 PMCID: PMC5154821 DOI: 10.1002/gepi.22010] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/22/2016] [Revised: 09/21/2016] [Accepted: 09/21/2016] [Indexed: 11/11/2022]
Abstract
By sequencing the exomes of distantly related individuals in multiplex families, rare mutational and structural changes to coding DNA can be characterized and their relationship to disease risk can be assessed. Recently, several rare single nucleotide variants (SNVs) were associated with an increased risk of nonsyndromic oral cleft, highlighting the importance of rare sequence variants in oral clefts and illustrating the strength of family-based study designs. However, the extent to which rare deletions in coding regions of the genome occur and contribute to risk of nonsyndromic clefts is not well understood. To identify putative structural variants underlying risk, we developed a pipeline for rare hemizygous deletions in families from whole exome sequencing and statistical inference based on rare variant sharing. Among 56 multiplex families with 115 individuals, we identified 53 regions with one or more rare hemizygous deletions. We found 45 of the 53 regions contained rare deletions occurring in only one family member. Members of the same family shared a rare deletion in only eight regions. We also devised a scalable global test for enrichment of shared rare deletions.
Affiliation(s)
- Jack Fu
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Terri H. Beaty
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Alan F. Scott
  - Center for Inherited Disease Research and Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore MD, USA
- Jacqueline Hetmanski
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Margaret M. Parker
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston MA, USA
- Joan E. Bailey Wilson
  - Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore MD, USA
- Mary L. Marazita
  - Department of Oral Biology, Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, PA, USA
- Jeffrey C. Murray
  - Department of Pediatrics, School of Medicine, University of Iowa, IA, USA
- Alexandre Bureau
  - Centre de Recherche de l’Institut Universitaire en Santé Mentale de Québec and Département de Médecine Sociale et Préventive, Université Laval, Québec, Canada
- Jacob Carey
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Stephen Cristiano
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Ingo Ruczinski
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Robert B. Scharpf
  - Department of Oncology, Johns Hopkins School of Medicine, Baltimore MD, USA
|
11
|
Zhu H, Wang Z, Wang X, Sha Q. A novel statistical method for rare-variant association studies in general pedigrees. BMC Proc 2016; 10:193-196. [PMID: 27980635 PMCID: PMC5133499 DOI: 10.1186/s12919-016-0029-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
Both population-based and family-based designs are commonly used in genetic association studies to identify rare variants that underlie complex diseases. For any type of study design, the statistical power will be improved if rare variants can be enriched in the samples. Family-based designs, with ascertainment based on phenotype, may enrich the sample for causal rare variants and thus can be more powerful than population-based designs. Therefore, it is important to develop family-based statistical methods that can account for ascertainment. In this paper, we develop a novel statistical method for rare-variant association studies in general pedigrees for quantitative traits. This method uses a retrospective view that treats the traits as fixed and the genotypes as random, which allows us to account for complex and undefined ascertainment of families. We then apply the newly developed method to the Genetic Analysis Workshop 19 data set and compare the power of the new method with two other methods for general pedigrees. The results show that the newly proposed method increases power in most of the cases we consider, more than the other two methods.
Affiliation(s)
- Huanhuan Zhu
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
- Zhenchuan Wang
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203-5017 USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
|
12
|
Zhou YJ, Wang Y, Chen LL. Detecting the Common and Individual Effects of Rare Variants on Quantitative Traits by Using Extreme Phenotype Sampling. Genes (Basel) 2016; 7:genes7010002. [PMID: 26784232 PMCID: PMC4728382 DOI: 10.3390/genes7010002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/29/2015] [Revised: 12/21/2015] [Accepted: 01/05/2016] [Indexed: 12/19/2022] Open
Abstract
Next-generation sequencing technology has made it possible to detect rare genetic variants associated with complex human traits. In recent literature, various methods specifically designed for rare variants are proposed. These tests can be broadly classified into burden and nonburden tests. In this paper, we take advantage of the burden and nonburden tests, and consider the common effect and the individual deviations from the common effect. To achieve robustness, we use two methods of combining p-values, Fisher's method and the minimum-p method. In rare variant association studies, to improve the power of the tests, we explore the advantage of the extreme phenotype sampling. At first, we dichotomize the continuous phenotypes before analysis, and the two extremes are treated as two different groups representing a dichotomous phenotype. We next compare the powers of several methods based on extreme phenotype sampling and random sampling. Extensive simulation studies show that our proposed methods by using extreme phenotype sampling are the most powerful or very close to the most powerful one in various settings of true models when the same sample size is used.
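The two p-value combination rules named in this abstract, Fisher's method and the minimum-p method, can be sketched in their textbook forms. This is only an illustration under the assumption that the combined p-values are independent, which may not hold for burden and nonburden tests computed on the same data; the function names are hypothetical and are not taken from the paper.

```python
import math


def chi2_sf_even_df(x: float, k: int) -> float:
    """Survival function of a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom there is a closed form:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(pvalues):
    """Fisher's method: -2 * sum(log p_i) is chi-square with 2k df under independence."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, len(pvalues))


def min_p_combined(pvalues):
    """Minimum-p method: 1 - (1 - min p)^k under independence."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


# Hypothetical p-values from a burden test and a nonburden test.
example = [0.01, 0.20]
print(fisher_combined_p(example), min_p_combined(example))
```

When the component tests are correlated, the null distribution of either statistic is usually obtained by permutation or simulation rather than from these closed forms.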
Affiliation(s)
- Ya-Jing Zhou
  - Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
  - School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China.
- Yong Wang
  - Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
- Li-Li Chen
  - Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
  - School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China.
|
13
|
Renin angiotensinogen system gene polymorphisms and essential hypertension among people of West African descent: a systematic review. J Hum Hypertens 2015; 30:467-78. [DOI: 10.1038/jhh.2015.114] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/29/2015] [Revised: 10/09/2015] [Accepted: 10/15/2015] [Indexed: 01/11/2023]
|
14
|
Greco B, Hainline A, Arbet J, Grinde K, Benitez A, Tintle N. A general approach for combining diverse rare variant association tests provides improved robustness across a wider range of genetic architectures. Eur J Hum Genet 2015; 24:767-73. [PMID: 26508571 DOI: 10.1038/ejhg.2015.194] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/08/2014] [Revised: 06/13/2015] [Accepted: 07/03/2015] [Indexed: 11/09/2022] Open
Abstract
The widespread availability of genome sequencing data made possible by way of next-generation technologies has yielded a flood of different gene-based rare variant association tests. Most of these tests have been published because they have superior power for particular genetic architectures. However, for applied researchers it is challenging to know which test to choose in practice when little is known a priori about genetic architecture. Recently, tests have been proposed which combine two particular individual tests (one burden and one variance components) to minimize power loss while improving robustness to a wider range of genetic architectures. In our analysis we propose an expansion of these approaches, yielding a general method that works for combining any number of individual tests. We demonstrate that running multiple different tests on the same data set and using a Bonferroni correction for multiple testing is never better than combining tests using our general method. We also find that using a test statistic that is highly robust to the inclusion of non-causal variants (joint-infinity) together with a previously published combined test (sequence kernel adaptive test-optimal) provides improved robustness to a wide range of genetic architectures and should be considered for use in practice. Software for this approach is supplied. We support the increased use of combined tests in practice - as well as further exploration of novel combined testing approaches using the general framework provided here - to maximize robustness of rare variant testing strategies against a wide range of genetic architectures.
Affiliation(s)
- Brian Greco
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Allison Hainline
  - Department of Biostatistics, Vanderbilt University, Nashville, TN, USA
- Jaron Arbet
  - Department of Statistics, Winona State University, Winona, MN, USA
  - Department of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Kelsey Grinde
  - Department of Mathematics, Statistics and Computer Science, St Olaf College, Northfield, MN, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Nathan Tintle
  - Department of Mathematics, Statistics and Computer Science, Dordt College, Sioux Center, IA, USA
|
15
|
Detecting association of rare and common variants by adaptive combination of P-values. Genet Res (Camb) 2015; 97:e20. [PMID: 26440553 DOI: 10.1017/s0016672315000208] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022] Open
Abstract
Genome-wide association studies (GWAS) can detect common variants associated with diseases. Next generation sequencing technology has made it possible to detect rare variants. Most association tests, including burden tests and nonburden tests, mainly target rare variants by upweighting rare variant effects and downweighting common variant effects. But there is increasing evidence that complex diseases are caused by both common and rare variants. In this paper, we extend the ADA method (adaptive combination of P-values; Lin et al., 2014) for rare variants only and propose a RC-ADA method (common and rare variants by adaptive combination of P-values). Our proposed method combines the per-site P-values with the weights based on minor allele frequencies (MAFs). The RC-ADA is robust to directions of effects of causal variants and inclusion of a high proportion of neutral variants. The performance of the RC-ADA method is compared with several other association methods. Extensive simulation studies show that the RC-ADA method is more powerful than other association methods over a wide range of models.
|
16
|
Association between AVPR1A, DRD2, and ASPM and endophenotypes of communication disorders. Psychiatr Genet 2015; 24:191-200. [PMID: 24849541 DOI: 10.1097/ypg.0000000000000045] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/21/2023]
Abstract
OBJECTIVES: Speech sound disorder (SSD) is one of the most common communication disorders, with a prevalence rate of 16% at 3 years of age, and an estimated 3.8% of children still presenting speech difficulties at 6 years of age. Several studies have identified promising associations between communication disorders and genes in brain and neuronal pathways; however, there have been few studies focusing on SSD and its associated endophenotypes. On the basis of the hypothesis that neuronal genes may influence endophenotypes common to communication disorders, we focused on three genes related to brain and central nervous system functioning: the dopamine D2 receptor (DRD2) gene, the arginine-vasopressin receptor 1a (AVPR1A) gene, and the microcephaly-associated protein gene (ASPM). METHODS: We examined the association of these genes with key endophenotypes of SSD - phonological memory measured through multisyllabic and nonword repetition, vocabulary measured using the Expressive One Word Picture Vocabulary Test and Peabody Picture Vocabulary Test, and reading decoding measured using the Woodcock Reading Mastery Tests Revised - as well as with the clinical phenotype of SSD. We genotyped tag single nucleotide polymorphisms in these genes and examined 498 individuals from 180 families. RESULTS: These data show that several single nucleotide polymorphisms in all three genes were associated with phonological memory, vocabulary, and reading decoding, with P less than 0.05. Notably, associations in AVPR1A (rs11832266) were significant after multiple testing correction. Gene-level tests showed that DRD2 was associated with vocabulary, ASPM with vocabulary and reading decoding, and AVPR1A with all three endophenotypes. CONCLUSION: Endophenotypes common to SSD, language impairment, and reading disability are all associated with these neuronal pathway genes.
|
17
|
Kao CF, Liu JR, Hung H, Kuo PH. A robust GWSS method to simultaneously detect rare and common variants for complex disease. PLoS One 2015; 10:e0120873. [PMID: 25880329 PMCID: PMC4399906 DOI: 10.1371/journal.pone.0120873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2014] [Accepted: 01/26/2015] [Indexed: 11/19/2022] Open
Abstract
The rapid advances in sequencing technologies and the resulting next-generation sequencing data provide the opportunity to detect disease-associated variants with a better solution, in particular for low-frequency variants. Although both common and rare variants might exert their independent effects on the risk for the trait of interest, previous methods to detect the association effects rarely consider them simultaneously. We proposed a class of test statistics, the generalized weighted-sum statistic (GWSS), to detect disease associations in the presence of common and rare variants with a case-control study design. Information of rare variants was aggregated using a weighted sum method, while signal directions and strength of the variants were considered at the same time. Permutations were performed to obtain the empirical p-values of the test statistics. Our simulation showed that, compared to the existing methods, the GWSS method had better performance in most of the scenarios. The GWSS (in particular VDWSS-t) method is particularly robust for opposite association directions, association strength, and varying distributions of minor-allele frequencies. It is therefore promising for detecting disease-associated loci. For empirical data application, we also applied our GWSS method to the Genetic Analysis Workshop 17 data, and the results were consistent with the simulation, suggesting good performance of our method. As re-sequencing studies become more popular to identify putative disease loci, we recommend the use of this newly developed GWSS to detect associations with both common and rare variants.
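The abstract mentions aggregating variants with a weighted sum and obtaining empirical p-values by permutation. The sketch below shows that general idea only: it is not the GWSS statistic itself, whose definition is not given here. The one-sided case-burden statistic, the weights, and all function names are assumptions made for illustration.

```python
import random


def burden_scores(genotypes, weights):
    """Per-subject weighted sum of minor-allele counts."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def case_burden(scores, labels):
    """Total weighted burden carried by cases (label 1)."""
    return sum(s for s, y in zip(scores, labels) if y == 1)


def permutation_p_value(genotypes, labels, weights, n_perm=999, seed=0):
    """One-sided empirical p-value from shuffling case/control labels.

    Uses the (hits + 1) / (n_perm + 1) estimator, so the result is never zero.
    """
    rng = random.Random(seed)
    scores = burden_scores(genotypes, weights)
    observed = case_burden(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if case_burden(scores, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two variants, six cases carrying variants, six controls without.
toy_genotypes = [[1, 0], [0, 1], [1, 1], [1, 0], [0, 1], [1, 0]] + [[0, 0]] * 6
toy_labels = [1] * 6 + [0] * 6
toy_weights = [1.0, 2.0]  # assumed weights, e.g. larger for the rarer variant
print(permutation_p_value(toy_genotypes, toy_labels, toy_weights))
```

A two-sided or direction-aware statistic, as the abstract describes for variants with opposite effects, would replace `case_burden` while the permutation loop stays the same.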
Affiliation(s)
- Chung-Feng Kao
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Jia-Rou Liu
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Department of Public Health, Chang Gung University, Taoyuan, Taiwan
- Hung Hung
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Research Center for Genes, Environment and Human Health, National Taiwan University, Taipei, Taiwan
  - * E-mail: (PHK); (HH)
- Po-Hsiu Kuo
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Research Center for Genes, Environment and Human Health, National Taiwan University, Taipei, Taiwan
  - * E-mail: (PHK); (HH)
|
18
|
Wen SH, Yeh JI. Cohen's h for detection of disease association with rare genetic variants. BMC Genomics 2014; 15:875. [PMID: 25294186 PMCID: PMC4198687 DOI: 10.1186/1471-2164-15-875] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/20/2014] [Accepted: 10/03/2014] [Indexed: 11/16/2022] Open
Abstract
Background: The power of the genome wide association studies starts to go down when the minor allele frequency (MAF) is below 0.05. Here, we proposed the use of Cohen’s h in detecting disease associated rare variants. The variance stabilizing effect based on the arcsine square root transformation of MAFs to generate Cohen’s h contributed to the statistical power for rare variants analysis. We re-analyzed published datasets, one microarray and one sequencing based, and used simulation to compare the performance of Cohen’s h with the risk difference (RD) and odds ratio (OR). Results: The analysis showed that the type 1 error rate of Cohen’s h was as expected and Cohen’s h and RD were both less biased and had higher power than OR. The advantage of Cohen’s h was more obvious when MAF was less than 0.01. Conclusions: Cohen’s h can increase the power to find genetic association of rare variants and diseases, especially when MAF is less than 0.01.
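As a small worked illustration of the effect size discussed in this abstract, the sketch below computes Cohen's h from its standard definition, the difference between arcsine-square-root transformed proportions, alongside the risk difference and odds ratio. The allele frequencies are hypothetical and the function names are not taken from the paper.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2)) for two proportions."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))


def risk_difference(p1: float, p2: float) -> float:
    """Plain difference of the two proportions."""
    return p1 - p2


def odds_ratio(p1: float, p2: float) -> float:
    """Odds ratio; undefined when either proportion is 0 or 1."""
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))


# Hypothetical minor allele frequencies in cases and controls.
case_maf, control_maf = 0.010, 0.005
print(cohens_h(case_maf, control_maf))
print(risk_difference(case_maf, control_maf))
print(odds_ratio(case_maf, control_maf))

# The same absolute difference (0.005) gives a larger h near zero than near 0.5,
# because the arcsine square root transformation stretches small proportions.
print(cohens_h(0.010, 0.005) > cohens_h(0.505, 0.500))
```

Unlike the odds ratio, h stays defined when a variant is absent from one group (a proportion of 0), which is common for very rare variants.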
Affiliation(s)
- Jih-I Yeh
  - Department of Molecular Biology and Human Genetics, Tzu-Chi University, 701, Sec 3, Chung-Yang Rd, Hualien 97004, Taiwan.
|
19
|
Hainline A, Alvarez C, Luedtke A, Greco B, Beck A, Tintle NL. Evaluation of the power and type I error of recently proposed family-based tests of association for rare variants. BMC Proc 2014; 8:S36. [PMID: 25519321 PMCID: PMC4143711 DOI: 10.1186/1753-6561-8-s1-s36] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/03/2022] Open
Abstract
Until very recently, few methods existed to analyze rare-variant association with binary phenotypes in complex pedigrees. We consider a set of recently proposed methods applied to the simulated and real hypertension phenotype as part of the Genetic Analysis Workshop 18. Minimal power of the methods is observed for genes containing variants with weak effects on the phenotype. Application of the methods to the real hypertension phenotype yielded no genes meeting a strict Bonferroni cutoff of significance. Some prior literature connects 3 of the 5 most associated genes (p < 1 × 10⁻⁴) to hypertension or related phenotypes. Further methodological development is needed to extend these methods to handle covariates, and to explore more powerful test alternatives.
Affiliation(s)
- Allison Hainline
  - Department of Statistics, Baylor University, 1311 S 5th St., Waco, TX 76798, USA
- Carolina Alvarez
  - Department of Biostatistics, Florida International University, 11200 SW 8th St., Miami, FL 33199, USA
- Alexander Luedtke
  - Division of Biostatistics, University of California, Berkeley, 101 Sproul Hall, Berkeley, CA 94720, USA
- Brian Greco
  - Department of Mathematics and Statistics, Grinnell College, 733 Broad St., Grinnell, IA 50112, USA
- Andrew Beck
  - Department of Mathematics, Loyola University Chicago, 1032 W. Sheridan Rd, Chicago, IL 60660, USA
- Nathan L Tintle
  - Department of Mathematics, Statistics and Computer Science, 498 4th Ave. NE, Dordt College, Sioux Center, IA 51250, USA
|
20
|
Abstract
The cost of next-generation sequencing is now approaching that of the first generation of genome-wide single-nucleotide genotyping panels, but this is still out of reach for large-scale epidemiologic studies with tens of thousands of subjects. Furthermore, the anticipated yield of millions of rare variants poses serious challenges for distinguishing causal from noncausal variants for disease. We explore the merits of using family-based designs for sequencing substudies to identify novel variants and prioritize them for their likelihood of causality. While the sharing of variants within families means that family-based designs may be less efficient for discovery than sequencing of a comparable number of unrelated individuals, the ability to exploit cosegregation of variants with disease within families helps distinguish causal from noncausal ones. We introduce a score test criterion for prioritizing discovered variants in terms of their likelihood of being functional. We compare the relative statistical efficiency of 2-stage versus 1-stage family-based designs by application to the Genetic Analysis Workshop 18 simulated sequence data.
Affiliation(s)
- Zhao Yang
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9234, USA
- Duncan C Thomas
  - Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9234, USA
|
21
|
Feng T, Zhu X. Whole genome sequencing data from pedigrees suggests linkage disequilibrium among rare variants created by population admixture. BMC Proc 2014; 8:S44. [PMID: 25519326 PMCID: PMC4143626 DOI: 10.1186/1753-6561-8-s1-s44] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022] Open
Abstract
Next-generation sequencing technologies have been designed to discover rare and de novo variants and are an important tool for identifying rare disease variants. Many statistical methods have been developed to test, using next-generation sequencing data, for rare variants that are associated with a trait. However, many of these methods make assumptions that rare variants are in linkage equilibrium in a gene. In this report, we studied whether transmitted or untransmitted haplotypes carry an excess of rare variants using the whole genome sequencing data of 15 large Mexican American pedigrees provided by the Genetic Analysis Workshop 18. We observed that an excess of rare variants are carried on either transmitted or nontransmitted haplotypes from parents to offspring. Further analyses suggest that such nonrandom associations among rare variants can be attributed to population admixture and single-nucleotide variant calling errors. Our results have significant implications for rare variant association studies, especially those conducted in admixed populations.
Affiliation(s)
- Tao Feng
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
- Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
|
22
|
Wang H, Zhu X. De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc 2014; 8:S24. [PMID: 25519376 PMCID: PMC4143763 DOI: 10.1186/1753-6561-8-s1-s24] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
De novo mutations enrich sequence diversity and carry clues to evolutionary selection. Recent studies suggest that de novo mutations could be one of the risk factors for complex diseases. We conducted a survey of de novo mutations using the whole genome sequence data, available only for the odd-numbered autosomes, of Mexican American families provided by Genetic Analysis Workshop 18. We extracted 8 three-generation families with sequencing data available from 20 large pedigrees. By comparing the known single nucleotide variants (SNVs) in dbSNP129 and the de novo variants transmitted in the Mexican American families, we were able to estimate a de novo mutation rate of 1.64 (±0.42) × 10^-8 per position per haploid genome. This result is consistent with estimates in the literature that required extensive validation efforts, such as genotyping and further resequencing. Our analysis suggests the importance of using family samples for studying rare variants.
Affiliation(s)
- Heming Wang
- Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106-4945, USA
- Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106-4945, USA
|
23
|
Abstract
This article focuses on conducting global testing for association between a binary trait and a set of rare variants (RVs), although its application can be much broader to other types of traits, common variants (CVs), and gene set or pathway analysis. We show that many of the existing tests have deteriorating performance in the presence of many nonassociated RVs: their power can dramatically drop as the proportion of nonassociated RVs in the group to be tested increases. We propose a class of so-called sum of powered score (SPU) tests, each of which is based on the score vector from a general regression model and hence can deal with different types of traits and adjust for covariates, e.g., principal components accounting for population stratification. The SPU tests generalize the sum test, a representative burden test based on pooling or collapsing genotypes of RVs, and a sum of squared score (SSU) test that is closely related to several other powerful variance component tests; a previous study (Basu and Pan 2011) has demonstrated good performance of one, but not both, of the Sum and SSU tests in many situations. The SPU tests are versatile in the sense that one of them is often powerful, although its identity varies with the unknown true association parameters. We propose an adaptive SPU (aSPU) test to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios. We conducted extensive simulations to show superior performance of the aSPU test over several state-of-the-art association tests in the presence of many nonassociated RVs. Finally, we applied the SPU and aSPU tests to the GAW17 mini-exome sequence data to compare their practical performance with some existing tests, demonstrating their potential usefulness.
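As a rough illustration of the statistics named in this abstract, the Python sketch below computes a score vector for a binary trait without covariates and the corresponding SPU statistics. The form SPU(γ) = Σ_j U_j^γ, the minimum-p-value construction of aSPU, the permutation scheme, and all function names are assumptions made for this example; they are not stated in the abstract, and this is not the authors' implementation.

```python
import numpy as np

def score_vector(y, X):
    """Score vector U_j = sum_i (y_i - mean(y)) * x_ij (assumed: no covariates)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    return X.T @ (y - y.mean())

def spu(U, gamma):
    """Sum of powered scores; gamma = inf is taken as max |U_j| (assumption)."""
    U = np.asarray(U, dtype=float)
    if np.isinf(gamma):
        return float(np.max(np.abs(U)))
    return float(np.sum(U ** gamma))

def aspu(y, X, gammas=(1, 2, 3, 4, np.inf), n_perm=200, seed=0):
    """Permutation p-values for each SPU(gamma) and for their minimum (aSPU)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    obs = np.array([abs(spu(score_vector(y, X), g)) for g in gammas])
    # Null statistics obtained by permuting the trait labels.
    null = np.empty((n_perm, len(gammas)))
    for b in range(n_perm):
        U_b = score_vector(rng.permutation(y), X)
        null[b] = [abs(spu(U_b, g)) for g in gammas]
    p_spu = (1 + (null >= obs).sum(axis=0)) / (n_perm + 1)
    # p-value of each permuted statistic relative to all permutations.
    counts = (null[:, None, :] >= null[None, :, :]).sum(axis=0)
    p_null = counts / n_perm
    # aSPU: compare the smallest observed p-value with its permutation null.
    p_aspu = (1 + (p_null.min(axis=1) <= p_spu.min()).sum()) / (n_perm + 1)
    return p_spu, float(p_aspu)
```

With γ = 1 the statistic reduces to a sum (burden-type) test and with γ = 2 to a sum of squared scores, which matches the abstract's statement that the SPU family generalizes both.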
|
24
|
Cook K, Benitez A, Fu C, Tintle N. Evaluating the impact of genotype errors on rare variant tests of association. Front Genet 2014; 5:62. [PMID: 24744770 PMCID: PMC3978329 DOI: 10.3389/fgene.2014.00062] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2013] [Accepted: 03/11/2014] [Indexed: 01/23/2023] Open
Abstract
The new class of rare variant tests has usually been evaluated assuming perfect genotype information. In reality, rare variant genotypes may be incorrect, and so rare variant tests should be robust to imperfect data. Errors and uncertainty in SNP genotyping are already known to dramatically impact statistical power for single marker tests on common variants and, in some cases, inflate the type I error rate. Recent results show that uncertainty in genotype calls derived from sequencing reads is dependent on several factors, including read depth, calling algorithm, number of alleles present in the sample, and the frequency at which an allele segregates in the population. We have recently proposed a general framework for the evaluation and investigation of rare variant tests of association, classifying most rare variant tests into one of two broad categories (length or joint tests). We use this framework to relate factors affecting genotype uncertainty to the power and type I error rate of rare variant tests. We find that non-differential genotype errors (an error process that occurs independent of phenotype) decrease power, with larger decreases for extremely rare variants, and for the common homozygote to heterozygote error. Differential genotype errors (an error process that is associated with phenotype status) lead to inflated type I error rates, which are more likely to occur at sites with more common homozygote to heterozygote errors than vice versa. Finally, our work suggests that certain rare variant tests and study designs may be more robust to the inclusion of genotype errors. Further work is needed to directly integrate genotype calling algorithm decisions, study costs and test statistic choices to provide comprehensive design and analysis advice which appropriately accounts for the impact of genotype errors.
Affiliation(s)
- Kaitlyn Cook
- Department of Mathematics, Carleton College, Northfield, MN, USA
- Alejandra Benitez
- Department of Applied Mathematics, Brown University, Providence, RI, USA
- Casey Fu
- Department of Mathematics, Massachusetts Institute of Technology, Boston, MA, USA
- Nathan Tintle
- Department of Mathematics, Statistics and Computer Science, Dordt College, Sioux Center, IA, USA
|
25
|
Test of rare variant association based on affected sib-pairs. Eur J Hum Genet 2014; 23:229-37. [PMID: 24667785 DOI: 10.1038/ejhg.2014.43] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2013] [Revised: 11/06/2013] [Accepted: 12/30/2013] [Indexed: 11/08/2022] Open
Abstract
With the development of sequencing techniques, there is increasing interest in detecting associations between rare variants and complex traits. Quite a few statistical methods to detect associations between rare variants and complex traits have been developed for unrelated individuals. Statistical methods for detecting rare variant associations under family-based designs have not received as much attention as methods for unrelated individuals. Recent studies show that rare disease variants will be enriched in family data and thus family-based designs may improve power to detect rare variant associations. In this article, we propose a novel test of association between the optimally weighted combination of variants and the trait of interest for affected sib-pairs. The optimal weights are analytically derived and can be calculated from sampled genotypes and phenotypes. Based on the optimal weights, the proposed method is robust to the directions of the effects of causal variants and is less affected by neutral variants than existing methods are. Our simulation results show that, in all the cases, the proposed method is substantially more powerful than existing methods based on unrelated individuals and existing methods based on affected sib-pairs.
|
26
|
Sha Q, Zhang S. A novel test for testing the optimally weighted combination of rare and common variants based on data of parents and affected children. Genet Epidemiol 2014; 38:135-43. [PMID: 24382753 PMCID: PMC4162402 DOI: 10.1002/gepi.21787] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2013] [Revised: 10/28/2013] [Accepted: 12/02/2013] [Indexed: 11/10/2022]
Abstract
With the development of sequencing technologies, the direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population-based methods for unrelated individuals. A limitation of population-based methods is that spurious associations can occur when there is a population structure. For rare variants, this problem can be more serious, because the spectrum of rare variation can be very different in diverse populations and because methods to control for population stratification in population-based rare variant association studies do not currently exist. A solution to the problem of population stratification is to use family-based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW-PAC). TOW-PAC is a family-based association test that tests the combined effect of rare and common variants in a genomic region, and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family-based association tests are robust to population stratification although population-based association tests can be seriously confounded by population stratification. The results of power comparisons show that the power of TOW-PAC increases with an increase of the number of affected children in each family and TOW-PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
27
|
Turkmen AS, Lin S. Blocking approach for identification of rare variants in family-based association studies. PLoS One 2014; 9:e86126. [PMID: 24465912 PMCID: PMC3900483 DOI: 10.1371/journal.pone.0086126] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2013] [Accepted: 12/09/2013] [Indexed: 01/14/2023] Open
Abstract
With the advent of next-generation sequencing technology, rare variant association analysis is increasingly being conducted to identify genetic variants associated with complex traits. In recent years, significant effort has been devoted to developing powerful statistical methods to test such associations for population-based designs. However, there has been relatively little development for family-based designs, although family data have been shown to be more powerful for detecting rare variants. This study introduces a blocking approach that extends two popular family-based common variant association tests to rare variant association studies. Several options are considered to partition a genomic region (gene) into "independent" blocks by which information from SNVs is aggregated within a block and an overall test statistic for the entire genomic region is calculated by combining information across these blocks. The proposed methodology allows different variants to have different directions (risk or protective) and specification of minor allele frequency threshold is not needed. We carried out a simulation to verify the validity of the method by showing that type I error is well under control when the underlying null hypothesis and the assumption of independence across blocks are satisfied. Further, data from the Genetic Analysis Workshop [Formula: see text] are utilized to illustrate the feasibility and performance of the proposed methodology in a realistic setting.
Affiliation(s)
- Asuman S Turkmen
- Statistics Department, The Ohio State University, Columbus, Ohio, United States of America; Statistics Department, The Ohio State University, Newark, Ohio, United States of America
- Shili Lin
- Statistics Department, The Ohio State University, Columbus, Ohio, United States of America
|
28
|
Thomas DC, Yang Z, Yang F. Two-phase and family-based designs for next-generation sequencing studies. Front Genet 2013; 4:276. [PMID: 24379824 PMCID: PMC3861783 DOI: 10.3389/fgene.2013.00276] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2013] [Accepted: 11/19/2013] [Indexed: 12/21/2022] Open
Abstract
The cost of next-generation sequencing is now approaching that of early GWAS panels, but is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genomewide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an on-going study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved and use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Zhao Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Fan Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
|
29
|
Wang X, Lee S, Zhu X, Redline S, Lin X. GEE-based SNP set association test for continuous and discrete traits in family-based association studies. Genet Epidemiol 2013; 37:778-86. [PMID: 24166731 PMCID: PMC4007511 DOI: 10.1002/gepi.21763] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2013] [Revised: 08/17/2013] [Accepted: 09/10/2013] [Indexed: 12/17/2022]
Abstract
Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, which test one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P-values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P-value GEE test for an SNP-set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.
Affiliation(s)
- Xuefeng Wang
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
- Seunggeun Lee
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
- Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA 44106
- Susan Redline
- Department of Medicine, Brigham and Women’s Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Xihong Lin
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
|
30
|
|
31
|
|
32
|
Fang S, Zhang S, Sha Q. Detecting association of rare variants by testing an optimally weighted combination of variants for quantitative traits in general families. Ann Hum Genet 2013; 77:524-34. [PMID: 23968488 DOI: 10.1111/ahg.12038] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2012] [Accepted: 07/10/2013] [Indexed: 12/01/2022]
Abstract
Although next-generation sequencing technology allows sequencing the whole genome of large groups of individuals, the development of powerful statistical methods for rare variant association studies is still underway. Even though many statistical methods have been developed for mapping rare variants, most of these methods are for unrelated individuals only, whereas family data have been shown to improve power to detect rare variants. The majority of the existing methods for unrelated individuals is essentially testing the effect of a weighted combination of variants with different weighting schemes. The performance of these methods depends on the weights being used. Recently, researchers proposed a test for Testing the effect of an Optimally Weighted combination of variants (TOW) for unrelated individuals. In this article, we extend our previously developed TOW for unrelated individuals to family-based data and propose a novel test for Testing the effect of an Optimally Weighted combination of variants for Family-based designs (TOW-F). The optimal weights are analytically derived. The results of extensive simulation studies show that TOW-F is robust to population stratification in a wide range of population structures, is robust to the direction and magnitude of the effects of causal variants, and is relatively robust to the percentage of neutral variants.
Affiliation(s)
- Shurong Fang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
|
33
|
Liu K, Fast S, Zawistowski M, Tintle NL. A geometric framework for evaluating rare variant tests of association. Genet Epidemiol 2013; 37:345-57. [PMID: 23526307 DOI: 10.1002/gepi.21722] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2012] [Revised: 02/12/2013] [Accepted: 02/13/2013] [Indexed: 11/08/2022]
Abstract
The wave of next-generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers.
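To make the length-versus-angle distinction in this abstract concrete, here is a small Python sketch that treats per-site rare allele counts in cases and controls as two vectors and reports the difference in their Euclidean lengths and the angle between them. The function name, the use of raw counts rather than frequencies, and the toy numbers are assumptions for illustration only; the abstract does not specify the actual test statistics.

```python
import numpy as np

def length_and_angle(case_counts, control_counts):
    """Return (length difference, angle in radians) for two nonzero count vectors."""
    a = np.asarray(case_counts, dtype=float)
    b = np.asarray(control_counts, dtype=float)
    len_a, len_b = np.linalg.norm(a), np.linalg.norm(b)
    # Clip guards against rounding slightly outside [-1, 1] before arccos.
    cos_theta = np.clip(a @ b / (len_a * len_b), -1.0, 1.0)
    return float(len_a - len_b), float(np.arccos(cos_theta))

# Toy example: the same total burden distributed differently across two sites.
diff, angle = length_and_angle([3, 4], [4, 3])
```

In this toy case the two vectors have equal length but a nonzero angle between them, the kind of pattern that a length-only comparison would miss and a joint length-or-angle comparison could pick up.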
Affiliation(s)
- Keli Liu
- Department of Statistics, Harvard University, Cambridge, MA, USA
|
34
|
Wang X, Morris NJ, Zhu X, Elston RC. A variance component based multi-marker association test using family and unrelated data. BMC Genet 2013; 14:17. [PMID: 23497289 PMCID: PMC3614458 DOI: 10.1186/1471-2156-14-17] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2012] [Accepted: 02/11/2013] [Indexed: 02/02/2023] Open
Abstract
Background: Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples. Results: The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates. Conclusions: We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
Affiliation(s)
- Xuefeng Wang
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
|
35
|
Shugart YY, Zhu Y, Guo W, Xiong M. Weighted pedigree-based statistics for testing the association of rare variants. BMC Genomics 2012; 13:667. [PMID: 23176082 PMCID: PMC3827928 DOI: 10.1186/1471-2164-13-667] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2012] [Accepted: 11/12/2012] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: With the advent of next-generation sequencing (NGS) technologies, researchers are now generating a deluge of data on high dimensional genomic variations, whose analysis is likely to reveal rare variants involved in the complex etiology of disease. Standing in the way of such discoveries, however, is the fact that statistics for rare variants are currently designed for use with population-based data. In this paper, we introduce a pedigree-based statistic specifically designed to test for rare variants in family-based data. The additional power of pedigree-based statistics stems from the fact that while rare variants related to diseases or traits of interest occur only infrequently in populations, in families with multiple affected individuals, such variants are enriched. Note that while the proposed statistic can be applied with and without statistical weighting, our simulations show that its power increases when weighting (WSS and VT) is applied. RESULTS: Our working hypothesis was that, since rare variants are concentrated in families with multiple affected individuals, pedigree-based statistics should detect rare variants more powerfully than population-based statistics. To evaluate how well our new pedigree-based statistics perform in association studies, we develop a general framework for sequence-based association studies capable of handling data from pedigrees of various types and also from unrelated individuals. In short, we developed a procedure for transforming population-based statistics into tests for family-based associations. Furthermore, we modify two existing tests, the weighted sum-square test and the variable-threshold test, and apply both to our family-based collapsing methods. We demonstrate that the new family-based tests are more powerful than the corresponding population-based tests and that they generate a reasonable type I error rate. To demonstrate feasibility, we apply the newly developed tests to a pedigree-based GWAS data set from the Framingham Heart Study (FHS). FHS-GWAS data contain approximately 5000 uncommon variants with frequencies less than 0.05. Potential association findings in these data demonstrate the feasibility of the software PB-STAR (note, PB-STAR is now freely available to the public). CONCLUSION: Our tests show that when analyzing for rare variants, a pedigree-based design is more powerful than a population-based case-control design. We further demonstrate that a pedigree-based statistic's power to detect rare variants increases in direct relation to the proportion of affected individuals within the pedigree.
Affiliation(s)
- Yin Yao Shugart
- Unit of Statistical Genomics, Division of Intramural Division Program, National Institute of Mental Health, National Institute of Health, Bethesda, MD, USA
- Yun Zhu
- Division of Biostatistics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Wei Guo
- Unit of Statistical Genomics, Division of Intramural Division Program, National Institute of Mental Health, National Institute of Health, Bethesda, MD, USA
- Momiao Xiong
- Division of Biostatistics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genetics Center, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, TX 77225, USA
|
36
|
Abstract
PURPOSE OF REVIEW: Recent identification of over 60 loci contributing to the susceptibility of developing type 1 diabetes (T1D) provides a timely opportunity to assess what is currently known of the genetics of T1D, and what these discoveries may tell us about the disease itself. RECENT FINDINGS: The major findings will be discussed under five main themes: T1D risk gene identification, molecular mechanisms of susceptibility, shared genetic cause with other diseases, development of novel analytical methods, and understanding disease heterogeneity. SUMMARY: The plethora of T1D risk genes that have been identified risks overwhelming clinicians with lists of gene names and symbols that have little bearing on management, and provides a challenge for researchers to place the genetics of T1D in a more amenable clinical context.
Affiliation(s)
- Grant Morahan
- Centre for Diabetes Research, The Western Australian Institute for Medical Research, University of Western Australia, Perth, Western Australia, Australia.
|
37
|
Fang S, Sha Q, Zhang S. Two adaptive weighting methods to test for rare variant associations in family-based designs. Genet Epidemiol 2012; 36:499-507. [PMID: 22674630 DOI: 10.1002/gepi.21646] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2011] [Revised: 04/26/2012] [Accepted: 04/26/2012] [Indexed: 11/06/2022]
Abstract
Although next-generation DNA sequencing technologies have made rare variant association studies feasible and affordable, the development of powerful statistical methods for rare variant association studies is still under way. Most of the existing methods for rare variant association studies compare the number of rare mutations in a group of rare variants (in a gene or a pathway) between cases and controls. However, these methods assume that all causal variants increase disease risk. Recently, several methods that are robust to the direction and magnitude of effects of causal variants have been proposed. However, they are applicable to unrelated individuals only, whereas family data have been shown to improve power to detect rare variants. In this article, we propose two adaptive weighting methods for rare variant association studies based on family data for quantitative traits. Using extensive simulation studies, we evaluate and compare our proposed methods with two methods based on the weights proposed by Madsen and Browning. Our results show that both proposed methods are robust to population stratification, robust to the direction and magnitude of the effects of causal variants, and more powerful than the methods using weights suggested by Madsen and Browning, especially when both risk and protective variants are present.
Affiliation(s)
- Shurong Fang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan 49931, USA
|
38
|
Statistical Challenges in Sequence-Based Association Studies with Population- and Family-Based Designs. Statistics in Biosciences 2012. [DOI: 10.1007/s12561-012-9062-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022]
|
39
|
Namkung J, Raska P, Kang J, Liu Y, Lu Q, Zhu X. Analysis of exome sequences with and without incorporating prior biological knowledge. Genet Epidemiol 2012; 35 Suppl 1:S48-55. [PMID: 22128058 DOI: 10.1002/gepi.20649] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
Next-generation sequencing technology provides new opportunities and challenges in the search for genetic variants that underlie complex traits. It will also presumably uncover many new rare variants, but exactly how these variants should be incorporated into the data analysis remains a question. Several papers in our group from Genetic Analysis Workshop 17 evaluated different methods of rare variant analysis, including single-variant, gene-based, and pathway-based analyses and analyses that incorporated biological information. Although the performance of some of these methods strongly depends on the underlying disease model, integration of known biological information is helpful in detecting causal genes. Two work groups demonstrated that use of a Bayesian network and a collapsing receiver operating characteristic curve approach improves risk prediction when a disease is caused by many rare variants. Another work group suggested that modeling local rather than global ancestry may be beneficial when controlling the effect of population structure in rare variant association analysis.
Affiliation(s)
- Junghyun Namkung
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA
|
40
|
Liu DJ, Leal SM. A unified framework for detecting rare variant quantitative trait associations in pedigree and unrelated individuals via sequence data. Hum Hered 2012; 73:105-22. [PMID: 22555759 DOI: 10.1159/000336293] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/17/2011] [Accepted: 01/07/2012] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES: There is great interest in sequencing unrelated or pedigree samples to detect rare variant quantitative trait associations. In order to reduce the cost of sequencing and improve power, many studies sequence selected samples with extreme traits. Existing methods for detecting rare variant associations were developed for unrelated samples. Methods are needed to analyze (selected or randomly ascertained) pedigree samples. METHODS: We propose a unified framework of modeling extreme trait genetic associations (MEGA) with rare variants. Using MEGA and appropriate permutation algorithms, many rare variant tests can be extended to family data. As an application, we compared study designs using both sib-pairs and unrelated individuals. Extensive simulations were carried out using realistic population genetic and complex trait models. RESULTS: It is demonstrated that when extreme sampling is implemented within equal-sized cohorts of unrelated individuals or sib-pairs, analyzing unrelated individuals is consistently more powerful than studying sib-pairs. A higher proportion of rare variants can be identified through sequencing unrelated samples compared to sibs. Alternatively, if samples are ascertained using fixed thresholds from an infinite-sized population, sequencing one sib with the most extreme trait from each extreme concordant sib-pair is consistently the most powerful design. CONCLUSIONS: MEGA will play an important role in the analysis of sequence-based genetic association studies.
Affiliation(s)
- Dajiang J Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
41
|
Dai Y, Jiang R, Dong J. Weighted selective collapsing strategy for detecting rare and common variants in genetic association study. BMC Genet 2012; 13:7. [PMID: 22309429 PMCID: PMC3296579 DOI: 10.1186/1471-2156-13-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/31/2011] [Accepted: 02/06/2012] [Indexed: 01/12/2023] Open
Abstract
Background: Genome-wide association studies (GWAS) have been used successfully in detecting associations between common genetic variants and complex diseases. However, common SNPs detected by current GWAS only explain a small proportion of heritable variability. With the development of next-generation sequencing technologies, researchers find more and more evidence to support the role played by rare variants in heritable variability. However, rare and common variants are often studied separately. The objective of this paper is to develop a robust strategy to analyze association between complex traits and genetic regions using both common and rare variants. Results: We propose a weighted selective collapsing strategy for both candidate gene studies and genome-wide association scans. The strategy considers genetic information from both common and rare variants, selectively collapses all variants in a given region by a forward selection procedure, and uses an adaptive weight to favor more likely causal rare variants. Under this strategy, two tests are proposed. One test, denoted by BwSC, is sensitive to the directions of genetic effects, and it separates the deleterious and protective effects into two components. Another, denoted by BwSCd, is robust to the directions of genetic effects, and it considers the difference of the two components. In our simulation studies, BwSC achieves a higher power when the causal variants have the same genetic effect, while BwSCd is as powerful as several existing tests when a mixed genetic effect exists. Both of the proposed tests work well with and without the existence of genetic effects from common variants. Conclusions: Two tests using a weighted selective collapsing strategy provide potentially powerful methods for association studies of sequencing data. The tests have a higher power when both common and rare variants contribute to the heritable variability and the effect of common variants is not strong enough to be detected by traditional methods. Our simulation studies have demonstrated a substantially higher power for both tests in all scenarios regardless of whether the common SNPs are associated with the trait or not.
Affiliation(s)
- Yilin Dai
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA.
|