1. Wang Y, Zang C, Li Z, Guo CC, Lai D, Wei P. A comparative study of statistical methods for identifying differentially expressed genes in spatial transcriptomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.17.638726. [PMID: 40027680 PMCID: PMC11870610 DOI: 10.1101/2025.02.17.638726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Spatial transcriptomics (ST) provides unprecedented insights into gene expression patterns while retaining spatial context, making it a valuable tool for understanding complex tissue architectures, such as those found in cancers. Seurat, by far the most popular tool for analyzing ST data, uses the Wilcoxon rank-sum test by default for differential expression analysis. However, as a nonparametric method that disregards spatial correlations, the Wilcoxon test can lead to inflated false positive rates and misleading findings. This limitation highlights the need for a more robust statistical approach that effectively incorporates spatial correlations. To this end, we propose a Generalized Score Test (GST) in the Generalized Estimating Equations (GEEs) framework as a robust solution for differential gene expression analysis in ST. We conducted a comprehensive comparison of the GST with existing methods, including the Wilcoxon rank-sum test and GEEs with the robust Wald test. In extensive simulations, the GST, by appropriately accounting for spatial correlations, demonstrated superior Type I error control and power comparable to that of the other methods. In applications to ST datasets from breast and prostate cancer, the GST-identified differentially expressed genes were enriched in pathways directly implicated in cancer progression. In contrast, the genes identified by the Wilcoxon test were enriched in non-cancer pathways, and the test produced substantial false positives, highlighting its limitations for spatially structured data. Our findings suggest that the GST approach is well suited to ST data, offering more accurate identification of biologically relevant gene expression changes. We have implemented the proposed method in the R package "SpatialGEE", available on GitHub.
Author Summary
Spatial transcriptomics (ST) provides unprecedented insights into gene expression patterns while retaining spatial context, making it a valuable tool for studying complex tissue architectures and disease etiology. Seurat, a widely used software tool for analyzing ST data, relies on the Wilcoxon rank-sum test for differential expression analysis. However, this test ignores spatial correlations, leading to inflated false positive rates and misleading findings. This limitation highlights the need for a more robust statistical approach that effectively incorporates spatial correlations. To this end, we have proposed a Generalized Score Test (GST) in the Generalized Estimating Equations (GEEs) framework as a robust solution for differential gene expression analysis in ST. In extensive simulations, the GST, by appropriately accounting for spatial correlations, demonstrated superior false positive rate control and power comparable to that of other methods. In applications to ST datasets from breast and prostate cancer, the GST identified cancer-related genes and pathways more accurately than the Wilcoxon test, which produced misleading results. We have implemented the proposed method in the R package "SpatialGEE", available on GitHub.
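The abstract's critique turns on the independence assumption built into the Wilcoxon rank-sum test. The sketch below is a textbook version of that test (mid-ranks for ties, tie-corrected normal approximation, no continuity correction), written only for illustration; it is not Seurat's implementation and not the authors' GST. Every spot enters the ranking as if it were an independent observation, which is precisely what spatially correlated spots violate. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Treats all len(x) + len(y) observations as independent; any spatial
    correlation between neighbouring spots is ignored.
    Returns (U, p_value), where U is the Mann-Whitney statistic for x.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: subtract sum(t^3 - t) over groups of t tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return u, p


if __name__ == "__main__":
    # Hypothetical expression values for one gene in two tissue regions.
    u, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])
    print(u, round(p, 4))
```

On the toy vectors this gives U = 0 and a two-sided p of roughly 0.05. If those six values came from neighbouring, positively correlated spots, the effective sample size would be smaller than six, and a p-value computed this way would typically be too small, which is the inflation the abstract describes.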
2. Zhou Z, Tang X, Chen W, Chen Q, Ye B, Johar AS, Kullo IJ, Ding K. Rare loss-of-function variants in matrisome genes are enriched in Ebstein's anomaly. HGG ADVANCES 2024; 5:100258. [PMID: 38006208 PMCID: PMC10726248 DOI: 10.1016/j.xhgg.2023.100258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/20/2023] [Accepted: 11/20/2023] [Indexed: 11/26/2023]
Abstract
Ebstein's anomaly, a rare congenital heart disease, is distinguished by the failure of embryological delamination of the tricuspid valve leaflets from the underlying primitive right ventricular myocardium. Insight into the genetic basis of Ebstein's anomaly allows a more precise definition of its pathogenesis. In this study, two distinct cohorts from the Chinese Han population were included: a case-control cohort of 82 unrelated cases and 125 controls without cardiac phenotypes, and a trio cohort of 36 parent-offspring trios. Whole-exome sequencing data from all 315 participants were used to identify qualifying variants, comprising rare functional variants (minor allele frequency < 0.1% in East Asians in the gnomAD database) and high-confidence (HC) loss-of-function (LoF) variants. Various statistical models, including burden tests and variance-component models, were used to identify rare variants, genes, and biological pathways associated with Ebstein's anomaly. Significant associations were observed between Ebstein's anomaly and rare HC LoF variants in genes related to the matrisome, a collection of extracellular matrix (ECM) components. Specifically, 47 genes with HC LoF variants were found exclusively or predominantly in cases, and nine genes carried such variants in the probands. Over half of the unrelated cases (n = 42) and approximately one-third of the probands (n = 12) carried one or two LoF variants in these prioritized genes. These results highlight the role of the matrisome in the pathogenesis of Ebstein's anomaly and contribute to a better understanding of the genetic architecture underlying this condition. Our findings may inform the genetic diagnosis and treatment of Ebstein's anomaly.
Affiliation(s)
- Zhou Zhou
  - Department of Laboratory Medicine, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, P.R. China
- Xia Tang
  - State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200433, P.R. China
- Wen Chen
  - Department of Laboratory Medicine, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, P.R. China
- Qianlong Chen
  - Department of Laboratory Medicine, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, P.R. China
- Bo Ye
  - Department of Clinical Data Research, Chongqing Emergency Medical Center, Chongqing Key Laboratory of Emergency Medicine, Chongqing University Central Hospital, Chongqing University, Chongqing 400014, P.R. China
- Angad S Johar
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Iftikhar J Kullo
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Keyue Ding
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55905, USA
3. Liu Z, Xu J, Tan J, Li X, Zhang F, Ouyang W, Wang S, Huang Y, Li S, Pan X. Genetic overlap for ten cardiovascular diseases: A comprehensive gene-centric pleiotropic association analysis and Mendelian randomization study. iScience 2023; 26:108150. [PMID: 37908310 PMCID: PMC10613921 DOI: 10.1016/j.isci.2023.108150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 08/13/2023] [Accepted: 10/02/2023] [Indexed: 11/02/2023]
Abstract
Recent studies suggest that pleiotropic effects may explain the genetic architecture of cardiovascular diseases (CVDs). We conducted a comprehensive gene-centric pleiotropic association analysis of ten CVDs using genome-wide association study (GWAS) summary statistics to identify pleiotropic genes and pathways that may underlie multiple CVDs. We found shared genetic mechanisms underlying the pathophysiology of CVDs, with over two-thirds of the diseases sharing genes and single-nucleotide polymorphisms (SNPs). Significant positive genetic correlations were observed for more than half of the CVD pairs. Additionally, we investigated the pleiotropic genes shared between different CVDs, as well as their functional pathways and distribution across tissues. Moreover, six hub genes (ALDH2, XPO1, HSPA1L, ESR2, WDR12, and RAB1A), as well as 26 potential targeted drugs, were identified. Our study provides further evidence for the pleiotropic effects of genetic variants on CVDs and highlights the importance of considering pleiotropy in genetic association studies.
Affiliation(s)
- Zeye Liu
  - Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
  - National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
  - Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
  - National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
- Jing Xu
  - State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
- Jiangshan Tan
  - Key Laboratory of Pulmonary Vascular Medicine, National Clinical Research Center of Cardiovascular Diseases, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
- Xiaofei Li
  - Department of Cardiology, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Fengwen Zhang
  - Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
  - National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
  - Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
  - National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
- Wenbin Ouyang
  - Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
  - National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
  - Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
  - National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
- Shouzheng Wang
  - Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
  - National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
  - Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
  - National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
- Yuan Huang
  - State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Pediatric Cardiac Surgery Center, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
- Shoujun Li
  - State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Pediatric Cardiac Surgery Center, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
- Xiangbin Pan
  - Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
  - National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
  - Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
  - National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
4. Dossa HRG, Bureau A, Maziade M, Lakhal-Chaieb L, Oualkacha K. A novel rare variants association test for binary traits in family-based designs via copulas. Stat Methods Med Res 2023; 32:2096-2122. [PMID: 37832140 PMCID: PMC10683345 DOI: 10.1177/09622802231197977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2023]
Abstract
With the increasing cost-effectiveness of whole-genome sequencing technology, more sophisticated statistical methods for testing genetic association with both rare and common variants are being investigated to identify genetic variation between individuals. Several methods that group variants, known as gene-based approaches, have been developed. For instance, advanced extensions of the sequence kernel association test, a widely used variant-set test, have been proposed for unrelated samples and extended to family data. Family data have been shown to be powerful for analyzing rare variants. However, most such methods capture familial relatedness using a random-effect component within the generalized linear mixed model framework. Therefore, there is a need for unified and flexible methods to study the association between a set of genetic variants and a trait, especially a binary outcome. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models for capturing familial dependence structure. In this work, we propose a flexible family-based association test for both rare and common variants in the presence of binary traits. The method, termed the novel rare variant association test (NRVAT), uses a marginal logistic model and a Gaussian copula, the latter modeling the dependence between relatives. An analytic score-type test is derived. Through simulations, we show that our method can achieve greater power than existing approaches. The proposed model is applied to investigate the association between schizophrenia and bipolar disorder in a family-based cohort consisting of 17 extended families from Eastern Quebec.
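To make the copula idea concrete, here is a sketch of how a Gaussian copula can link binary outcomes for two relatives whose marginal probabilities follow a logistic model. It is a simulation illustration only, assuming a single latent correlation `rho` for the pair chosen arbitrarily; NRVAT's actual dependence parameterization and its score-type test are not reproduced here, and the function names are hypothetical.

```python
import math
import random


def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_pair(mu1, mu2, rho, rng):
    """Draw binary traits (y1, y2) for two relatives via a Gaussian copula.

    mu1, mu2: marginal probabilities P(y = 1), e.g. from a logistic model.
    rho: correlation of the latent normals (illustrative, not kinship-derived).
    """
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1 = e1
    z2 = rho * e1 + math.sqrt(1.0 - rho * rho) * e2  # corr(z1, z2) = rho
    # The normal CDF turns each latent normal into a Uniform(0, 1) margin.
    u1, u2 = std_normal_cdf(z1), std_normal_cdf(z2)
    # Because u is uniform, P(u < mu) = mu, so the logistic margins are kept.
    return int(u1 < mu1), int(u2 < mu2)


if __name__ == "__main__":
    rng = random.Random(2023)
    # Hypothetical marginal model: logit P(y = 1) = -1.0 + 0.5 * genotype.
    mu_carrier = logistic(-1.0 + 0.5 * 1)
    mu_noncarrier = logistic(-1.0)
    pairs = [simulate_pair(mu_carrier, mu_noncarrier, 0.5, rng) for _ in range(20000)]
    print(sum(a for a, _ in pairs) / len(pairs), mu_carrier)
```

The design point is that dependence and margins are specified separately: the logistic model fixes each relative's disease probability, while `rho` only controls how often relatives are concordant.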
Affiliation(s)
- Houssou R. G. Dossa
  - Département de Mathématiques, Université du Québec à Montréal (UQAM), Québec, Canada
- Alexandre Bureau
  - Département de Médecine Sociale et Préventive, Université Laval, Québec, Canada
  - Centre de Recherche CERVO, Quebec, Canada
- Michel Maziade
  - Centre de Recherche CERVO, Quebec, Canada
  - Département de Psychiatrie et Neuroscience, Université Laval, Québec, Canada
- Lajmi Lakhal-Chaieb
  - Département de Mathématiques et Statistique, Université Laval, Québec, Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal (UQAM), Québec, Canada
5. Rajabli F, Kunkle BW. Strategies in Aggregation Tests for Rare Variants. Curr Protoc 2023; 3:e931. [PMID: 37988228 DOI: 10.1002/cpz1.931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Genome-wide association studies (GWAS) have successfully identified numerous common variants involved in complex diseases, but these findings explain only a limited portion of heritability. Advances in high-throughput sequencing technology have made it possible to assess the contribution of rare variants to common diseases. However, studying rare variants is challenging because of their low frequency: well-established common-variant methods are underpowered to identify rare variants in GWAS. To address this challenge, several new methods have been developed to examine the role of rare variants in complex diseases. These approaches test the aggregate effect of multiple rare variants in a predefined genetic region. Provided here is an overview of statistical approaches, with step-by-step protocols and hands-on R scripts for aggregation tests in four categories: burden tests, adaptive burden tests, variance-component tests, and combined tests. Also explained are the concepts of rare variants, permutation tests, kernel methods, and genetic variant annotation. Finally, we discuss bioinformatics tools for annotation, family-based designs for rare-variant analysis, population stratification adjustment, and meta-analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
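Of the four categories, the burden test is the simplest to show. The following is a minimal sketch, assuming an unweighted count of minor alleles at variants below a hypothetical 1% MAF cut-off, a continuous trait, and no covariates. It is written in Python for illustration and is not one of the article's R scripts; the function names are hypothetical.

```python
import math


def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Collapse rare variants in a region into one burden count per person.

    genotypes: one list per individual of minor-allele counts (0, 1, 2).
    mafs: minor-allele frequency of each variant (same order as the columns).
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def burden_score_test(burden, y):
    """Score test of H0: beta = 0 in y = alpha + beta * burden + Gaussian error.

    Returns (T, p) with T asymptotically chi-square(1) under H0.
    """
    n = len(y)
    y_bar = sum(y) / n
    b_bar = sum(burden) / n
    sigma2 = sum((v - y_bar) ** 2 for v in y) / n  # null-model MLE of variance
    s_bb = sum((b - b_bar) ** 2 for b in burden)
    if s_bb == 0 or sigma2 == 0:
        raise ValueError("burden or trait has no variation; test undefined")
    # Score for beta (times sigma2), centred because alpha is estimated.
    u = sum((b - b_bar) * (v - y_bar) for b, v in zip(burden, y))
    t = u * u / (sigma2 * s_bb)
    p = math.erfc(math.sqrt(t / 2.0))  # upper tail of chi-square with 1 df
    return t, p
```

Here T equals n times the squared sample correlation between burden and trait. Because every qualifying variant is counted with the same sign, a burden test loses power when a region mixes risk and protective variants; that is the situation variance-component tests are designed to handle.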
Affiliation(s)
- Farid Rajabli
  - Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
- Brian W Kunkle
  - Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
6. Genetic correlation and gene-based pleiotropy analysis for four major neurodegenerative diseases with summary statistics. Neurobiol Aging 2023; 124:117-128. [PMID: 36740554 DOI: 10.1016/j.neurobiolaging.2022.12.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/10/2021] [Revised: 03/25/2022] [Accepted: 12/27/2022] [Indexed: 01/02/2023]
Abstract
Recent genome-wide association studies have suggested shared genetic components among neurodegenerative diseases. However, the pleiotropic association patterns among them remain poorly understood. Here we analyzed 4 major neurodegenerative diseases, Alzheimer's disease (AD), Parkinson's disease (PD), frontotemporal dementia (FTD), and amyotrophic lateral sclerosis (ALS), and found suggestive positive genetic correlations. We next implemented a gene-centric pleiotropy analysis with a powerful method called PLACO and detected 280 pleiotropic associations (226 unique genes) with these diseases. Functional analyses demonstrated that these genes were enriched in the pancreas, liver, heart, blood, brain, and muscle tissues, and that 42 pleiotropic genes exhibited drug-gene interactions with 341 drugs. Using Mendelian randomization, we found that AD and PD can increase the risk of developing ALS, and that AD and ALS can increase the risk of developing FTD. Overall, this study provides in-depth insights into shared genetic components and causal relationships among the 4 major neurodegenerative diseases, indicating that genetic overlap and causality commonly drive their co-occurrence. It also has important implications for understanding the etiology of neurodegenerative diseases and for drug development and therapeutic targeting.
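The abstract does not say which Mendelian randomization estimator was used. As one common choice, the sketch below computes the fixed-effect inverse-variance weighted (IVW) estimate from SNP-level summary statistics. It assumes independent instruments and ignores uncertainty in the SNP-exposure estimates; the function name is hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Equivalent to averaging each SNP's Wald ratio beta_outcome / beta_exposure
    with weights beta_exposure**2 / se_outcome**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have one entry per SNP")
    num = sum(bx * by / s ** 2
              for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    return num / den, 1.0 / math.sqrt(den)
```

In practice, IVW is usually reported alongside estimators that are more robust to pleiotropic instruments, which matters here because the instruments for one neurodegenerative disease may act on another through shared pathways.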
7. Lu H, Qiao J, Shao Z, Wang T, Huang S, Zeng P. A comprehensive gene-centric pleiotropic association analysis for 14 psychiatric disorders with GWAS summary statistics. BMC Med 2021; 19:314. [PMID: 34895209 PMCID: PMC8667366 DOI: 10.1186/s12916-021-02186-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 08/16/2021] [Accepted: 11/10/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND Recent genome-wide association studies (GWASs) have revealed the polygenic nature of psychiatric disorders and discovered a few single-nucleotide polymorphisms (SNPs) associated with multiple psychiatric disorders. However, the extent and pattern of pleiotropy among distinct psychiatric disorders remain incompletely understood. METHODS We analyzed 14 psychiatric disorders using summary statistics from the largest GWASs to date. We first applied cross-trait linkage disequilibrium score regression (LDSC) to estimate genetic correlations between disorders. Then, we performed a gene-based pleiotropy analysis by first aggregating a set of SNP-level associations into a single gene-level association signal using MAGMA. From a methodological perspective, we viewed the identification of pleiotropic associations across the entire genome as a high-dimensional problem of composite null hypothesis testing and used a novel method called PLACO for pleiotropy mapping. Finally, we performed functional analysis of the identified pleiotropic genes and used Mendelian randomization to detect causal associations between these disorders. RESULTS We confirmed extensive genetic correlation among psychiatric disorders, on the basis of which these disorders can be grouped into three distinct categories. We detected a large number of pleiotropic associations (5884 associations involving 2424 unique genes) and found that differentially expressed pleiotropic genes were significantly enriched in the pancreas, liver, heart, and brain, and that these genes were markedly enriched in biological processes regulating neurodevelopment, neurogenesis, and neuron differentiation, offering substantial evidence supporting the validity of the identified pleiotropic loci.
We further demonstrated that 342 of the identified pleiotropic genes were linked to 6353 drugs through drug-gene interactions, which can be classified into distinct types including inhibitor, agonist, blocker, antagonist, and modulator. We also revealed causal associations among psychiatric disorders, indicating that genetic overlap and causality commonly drove the observed co-occurrence of these disorders. CONCLUSIONS Our study is among the first large-scale efforts to characterize gene-level pleiotropy across a greatly expanded set of psychiatric disorders and provides important insight into the shared genetic etiology underlying these disorders. The findings could inform psychiatric nosology, identify potential neurobiological mechanisms predisposing to specific clinical presentations, and pave the way toward effective drug targets for clinical treatment.
Affiliation(s)
- Haojie Lu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhonghe Shao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
8. Wang S, Meigs JB, Dupuis J. Genetic association tests in family samples for multi-category phenotypes. BMC Genomics 2021; 22:873. [PMID: 34863089 PMCID: PMC8642939 DOI: 10.1186/s12864-021-08107-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2021] [Accepted: 10/19/2021] [Indexed: 01/07/2023]
Abstract
BACKGROUND Advances in statistical methods and sequencing technology have led to numerous novel discoveries in human genetics over the past two decades. Among phenotypes of interest, most attention has been given to genetic associations with continuous or binary traits. Efficient statistical methods have been proposed and are available for both types of traits under different study designs. However, for multinomial categorical traits in related samples, efficient statistical methods and software are lacking. RESULTS We propose an efficient score test to analyze a multinomial trait in family samples in the context of genome-wide association/sequencing studies. An alternative Wald statistic is also proposed. We also extend the methodology to ordinal traits. We performed extensive simulation studies to evaluate the type-I error of the score and Wald tests, compared with multinomial logistic regression for unrelated samples, under different allele frequencies and study designs. We also evaluated the power of these methods. Results show that both the score and Wald tests have well-controlled type-I error rates, but multinomial logistic regression has an inflated type-I error rate when applied to family samples. We illustrate the score test with an application to the Framingham Heart Study to uncover genetic variants associated with diabesity, a multi-category phenotype. CONCLUSION Both proposed tests have correct type-I error rates and similar power. However, because the Wald statistic relies on computationally intensive estimation, it is less efficient than the score test for large-scale genetic association studies. We provide computer implementations for both multinomial and ordinal traits.
Affiliation(s)
- Shuai Wang
  - Pfizer Inc, Global Product Development, Groton, CT, 06340, USA
- James B Meigs
  - Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
  - Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Josée Dupuis
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA
9. Lu H, Wei Y, Jiang Z, Zhang J, Wang T, Huang S, Zeng P. Integrative eQTL-weighted hierarchical Cox models for SNP-set based time-to-event association studies. J Transl Med 2021; 19:418. [PMID: 34627275 PMCID: PMC8502405 DOI: 10.1186/s12967-021-03090-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/09/2021] [Accepted: 09/26/2021] [Indexed: 01/23/2023]
Abstract
BACKGROUND Integrating functional annotations into SNP-set association studies has proven to be a powerful analysis strategy. Statistical methods for such integration have been developed for continuous and binary phenotypes; however, SNP-set integrative approaches for time-to-event or survival outcomes are lacking. METHODS Here we propose IEHC, an integrative eQTL (expression quantitative trait loci) hierarchical Cox regression for SNP-set based survival association analysis, which models the effect sizes of genetic variants as a function of eQTL in a hierarchical manner. Three p-value combination tests are developed to examine the joint effects of eQTL and genetic variants after a novel decorrelating modification of the statistics for the two components. An omnibus test (IEHC-ACAT) is further adapted to aggregate the strengths of all available tests. RESULTS Simulations demonstrated that the IEHC joint tests were more powerful when both eQTL and genetic variants contributed to the association signal, while IEHC-ACAT was robust and often outperformed other approaches across various simulation scenarios. Applying IEHC to ten TCGA cancers while incorporating eQTL from relevant GTEx tissues, we found substantial correlations between the two types of effect sizes of genetic variants from TCGA and GTEx, and identified 21 (9 unique) cancer-associated genes that would otherwise be missed by approaches not incorporating eQTL. CONCLUSION IEHC represents a flexible, robust, and powerful approach for integrating functional omics information to enhance the power to identify association signals for the survival risk of complex human cancers.
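The omnibus step relies on ACAT, the Cauchy combination test. Below is a minimal sketch of the standard Cauchy combination of k p-values, assuming the inputs lie strictly between 0 and 1 and using equal weights unless others are supplied. It does not reproduce IEHC's decorrelation step or its component tests; the function name is hypothetical.

```python
import math


def acat(pvalues, weights=None):
    """Cauchy combination (ACAT) of p-values from possibly correlated tests.

    Each p-value is mapped to a standard Cauchy variate tan((0.5 - p) * pi).
    Under the null, their weighted average (weights summing to 1) is
    approximately standard Cauchy in the tail even when the tests are
    correlated, so its upper-tail probability is the combined p-value.
    """
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    k = len(pvalues)
    if weights is None:
        weights = [1.0] * k
    total = sum(weights)
    stat = sum(w / total * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(stat) / math.pi
```

Because tan grows like 1/(p·π) as p approaches 0, a single strong component test dominates the combined statistic, which is what makes the omnibus test robust when only one of the component tests carries the signal. For extremely small p-values, implementations commonly switch to that 1/(p·π) approximation to avoid losing floating-point precision.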
Affiliation(s)
- Haojie Lu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yongyue Wei
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, 211166, Jiangsu, China
- Zhou Jiang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jinhui Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
10. Lu H, Zhang J, Jiang Z, Zhang M, Wang T, Zhao H, Zeng P. Detection of Genetic Overlap Between Rheumatoid Arthritis and Systemic Lupus Erythematosus Using GWAS Summary Statistics. Front Genet 2021; 12:656545. [PMID: 33815486 PMCID: PMC8012913 DOI: 10.3389/fgene.2021.656545] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/22/2021] [Accepted: 03/01/2021] [Indexed: 01/04/2023]
Abstract
Background Clinical and epidemiological studies have suggested that systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) are comorbid and that common genetic etiologies can partly explain this coexistence. However, the shared genetic determinants underlying the two diseases remain largely unknown. Methods Our analysis relied on summary statistics from genome-wide association studies of SLE (N = 23,210) and RA (N = 58,284). We first evaluated the genetic correlation between RA and SLE through linkage disequilibrium score regression (LDSC). Then, we performed a multiple-tissue eQTL (expression quantitative trait loci) weighted integrative analysis for each of the two diseases and aggregated association evidence across tissues via the recently proposed harmonic mean P-value (HMP) combination strategy, which can produce a single well-calibrated P-value for correlated test statistics. Afterwards, we conducted pleiotropy-informed association analysis using conjunction conditional FDR (ccFDR) to identify potential pleiotropic genes associated with both RA and SLE. Results We found a significant positive genetic correlation (rg = 0.404, P = 6.01E-10) between RA and SLE via LDSC. Based on the multiple-tissue eQTL weighted integrative analysis and the HMP combination across tissues, we discovered 14 potential pleiotropic genes by ccFDR, four of which were likely novel (INPP5B, OR5K2, RP11-2C24.5, and CTD-3105H18.4). The SNP effect sizes of these pleiotropic genes were typically positively correlated, with an average correlation of 0.579. Functionally, these genes were implicated in multiple autoimmune-relevant pathways, such as the inositol phosphate metabolic process, membrane, and the glucagon signaling pathway. Conclusion This study reveals common genetic components between RA and SLE and provides candidate loci for understanding the molecular mechanisms underlying the comorbidity of the two diseases.
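The harmonic mean P-value itself is simple to compute. The sketch below computes the weighted HMP, assuming the tissue-level P-values are already available and using equal weights by default. It returns the raw HMP and omits the asymptotically exact adjustment based on the Landau distribution, so it should be read as an approximation that can be somewhat anti-conservative and is most reliable for small values. The function name is hypothetical.

```python
def harmonic_mean_p(pvalues, weights=None):
    """Weighted harmonic mean P-value: sum(w) / sum(w / p).

    Weights default to equal and are effectively normalized to sum to 1.
    Returns the raw HMP without the Landau-based correction.
    """
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    if weights is None:
        weights = [1.0] * len(pvalues)
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))
```

Because the HMP averages reciprocals, it is dominated by the smallest tissue-level P-values, so one tissue with a strong eQTL-weighted signal can drive the combined result without any need to model the correlation between tissues explicitly.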
Affiliation(s)
- Haojie Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Jinhui Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Zhou Jiang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Meng Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Huashuo Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
11
Jiang Y, Chiu CY, Yan Q, Chen W, Gorin MB, Conley YP, Lakhal-Chaieb ML, Cook RJ, Amos CI, Wilson AF, Bailey-Wilson JE, McMahon FJ, Vazquez AI, Yuan A, Zhong X, Xiong M, Weeks DE, Fan R. Gene-Based Association Testing of Dichotomous Traits With Generalized Functional Linear Mixed Models Using Extended Pedigrees: Applications to Age-Related Macular Degeneration. J Am Stat Assoc 2020; 116:531-545. [PMID: 34321704 PMCID: PMC8315575 DOI: 10.1080/01621459.2020.1799809] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/03/2017] [Revised: 07/09/2020] [Accepted: 07/17/2020] [Indexed: 10/23/2022]
Abstract
Genetics plays a role in age-related macular degeneration (AMD), a common cause of blindness in the elderly. Powerful methods are needed for region-based association testing between a dichotomous trait such as AMD and genetic variants in family data. Here, we apply our newly developed generalized functional linear mixed models (GFLMM) to test for gene-based association in a set of AMD families. Using common and rare variants, we observe significant association with two known AMD genes: CFH and ARMS2. Using rare variants, we find suggestive signals in four genes: ASAH1, CLEC6A, TMEM63C, and SGSM1. Intriguingly, ASAH1 is down-regulated in AMD aqueous humor, and ASAH1 deficiency leads to retinal inflammation and increased vulnerability to oxidative stress. These findings were made possible by our GFLMM, which models the effect of a major gene as a fixed mean, the polygenic contributions as a random variation, and the correlation of pedigree members by kinship coefficients. Simulations indicate that the GFLMM likelihood ratio tests (LRTs) accurately control the Type I error rates. The LRTs have similar or higher power than existing retrospective kernel and burden statistics. Our GFLMM-based statistics provide a new tool for conducting family-based genetic studies of complex diseases. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Affiliation(s)
- Yingda Jiang
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
- Wei Chen
- Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
- Michael B. Gorin
- Department of Ophthalmology, David Geffen School of Medicine, UCLA Stein Eye Institute, Los Angeles, CA
- Yvette P. Conley
- Department of Health Promotion and Development, University of Pittsburgh, Pittsburgh, PA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Richard J. Cook
- Department of Statistics and Actuarial Science, Waterloo, ON, Canada
- Alexander F. Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Joan E. Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Francis J. McMahon
- Human Genetics Branch and Genetic Basis of Mood and Anxiety Disorders Section, National Institute of Mental Health, NIH, Bethesda, MD
- Ana I. Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI
- Ao Yuan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
- Xiaogang Zhong
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
- Momiao Xiong
- Human Genetics Center, University of Texas, Houston, TX
- Daniel E. Weeks
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Ruzong Fan
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
12
de Andrade M, Mazo Lopera MA, Duarte NE. Bivariate traits association analysis using generalized estimating equations in family data. Stat Appl Genet Mol Biol 2020; 19:sagmb-2019-0030. [PMID: 32374294 DOI: 10.1515/sagmb-2019-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Genome-wide association studies (GWAS) are becoming fundamental in the arduous task of deciphering the etiology of complex diseases. Most statistical models used to address gene-disease association consider a single response variable. However, certain diseases commonly have correlated phenotypes, as in cardiovascular diseases. GWAS typically sample unrelated individuals from a population, and shared familial risk factors are not investigated. In this paper, we propose a bivariate model for family data that associates two phenotypes with a genetic region. Using generalized estimating equations (GEE), we model two phenotypes, either discrete, continuous, or a mixture of the two, as a function of genetic variables and other important covariates. We incorporate the kinship relationships into the working correlation matrix, extended to a bivariate analysis. This work develops the estimation method and the assessment of the joint gene-set effect on both phenotypes. We also evaluate the proposed methodology with a simulation study and an application to real data.
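To make the idea of putting kinship into the GEE working correlation concrete, the sketch below treats the simplest case rather than the authors' bivariate model: one continuous phenotype, an identity link, and a working correlation fixed at twice the kinship matrix of a hypothetical two-parent, two-child family. With a fixed working correlation and an identity link, the estimating equation is linear in the coefficients, so it solves in closed form and the robust (sandwich) covariance follows directly. The function name, family structure, and simulated data are assumptions for illustration; genotypes are drawn independently within families, ignoring Mendelian transmission.

```python
import numpy as np

# Twice the kinship matrix (2 * Phi) for a nuclear family ordered as
# father, mother, child 1, child 2: parents are unrelated, and
# parent-offspring and full-sib pairs have kinship 1/4 (so 0.5 here).
R_FAMILY = np.array([
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
])


def gee_identity_fixed_r(X_list, y_list, R):
    """GEE with identity link and a fixed working correlation R shared by
    all families. Returns (beta_hat, robust sandwich covariance)."""
    R_inv = np.linalg.inv(R)
    p = X_list[0].shape[1]
    bread = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, y in zip(X_list, y_list):
        bread += X.T @ R_inv @ X
        rhs += X.T @ R_inv @ y
    # Solves sum_i X_i' R^-1 (y_i - X_i beta) = 0; the scale parameter cancels.
    beta = np.linalg.solve(bread, rhs)
    meat = np.zeros((p, p))
    for X, y in zip(X_list, y_list):
        u = X.T @ R_inv @ (y - X @ beta)  # family-level estimating function
        meat += np.outer(u, u)
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv


rng = np.random.default_rng(1)
true_beta = np.array([1.0, 0.5])        # intercept, additive SNP effect
chol = np.linalg.cholesky(R_FAMILY)     # induces within-family correlation
X_list, y_list = [], []
for _ in range(300):                    # 300 hypothetical families
    # Simplification: genotypes drawn independently, no Mendelian transmission
    g = rng.binomial(2, 0.3, size=4).astype(float)
    X = np.column_stack([np.ones(4), g])
    X_list.append(X)
    y_list.append(X @ true_beta + chol @ rng.standard_normal(4))

beta_hat, robust_cov = gee_identity_fixed_r(X_list, y_list, R_FAMILY)
robust_se = np.sqrt(np.diag(robust_cov))
print("beta:", beta_hat, "robust SE:", robust_se)
```

Because the sandwich covariance remains consistent even when the working correlation is misspecified (given a correct mean model and independent families), the kinship-based working matrix mainly buys efficiency rather than validity, which is the usual motivation for GEE in family data.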
Affiliation(s)
- Mariza de Andrade
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, 55905, USA
- Mauricio A Mazo Lopera
- Escuela de Estadística, Universidad Nacional de Colombia, Medellín, Antioquia, 050022, Colombia
- Nubia E Duarte
- Departamento de Matemáticas, Universidad Nacional de Colombia, Manizales, Caldas, 170001, Colombia
13
Kim Y, Chi YY, Zou F. An efficient integrative resampling method for gene-trait association analysis. Genet Epidemiol 2019; 44:197-207. [PMID: 31820489 DOI: 10.1002/gepi.22271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2019] [Revised: 10/27/2019] [Accepted: 11/25/2019] [Indexed: 11/07/2022]
Abstract
Genetic association studies are popular for identifying genetic variants, such as single nucleotide polymorphisms (SNPs), that are associated with complex traits. Statistical tests are commonly performed one SNP at a time under an assumed mode of inheritance, such as a recessive, additive, or dominant genetic model. Such an analysis can have inadequate power when the assumed model deviates from the true underlying genetic model. We propose an integrative association test procedure under a generalized linear model framework to flexibly model data arising from these three common genetic models and beyond. A computationally efficient resampling procedure is adopted to estimate the null distribution of the proposed test statistic. Simulation results show that our methods maintain the Type I error rate irrespective of the presence of confounding covariates and achieve adequate power compared with methods that use the true genetic model. The new methods are applied to two genetic studies, on resistance to severe malaria and on sarcoidosis.
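The power loss described here arises because a single assumed coding can miss the true model. One widely used remedy, shown below as a simplified stand-in rather than the authors' GLM-based integrative test, is a MAX statistic: code each SNP under the recessive, additive, and dominant models, take the largest absolute correlation with the trait, and calibrate that maximum by resampling. This Python sketch assumes a continuous trait, no covariates, and a plain permutation null instead of the authors' efficient resampling procedure; all function names and the simulated data are hypothetical.

```python
import math
import random
from typing import List, Sequence


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def genetic_codings(genotypes: Sequence[int]) -> List[List[int]]:
    """Recode minor-allele counts (0/1/2) under three inheritance models."""
    recessive = [1 if g == 2 else 0 for g in genotypes]
    additive = list(genotypes)
    dominant = [1 if g >= 1 else 0 for g in genotypes]
    return [recessive, additive, dominant]


def max_statistic(codings: List[List[int]], trait: Sequence[float]) -> float:
    """Largest absolute correlation across the candidate codings."""
    return max(abs_corr(c, trait) for c in codings)


def max_test_pvalue(genotypes: Sequence[int], trait: Sequence[float],
                    n_perm: int = 2000, seed: int = 0) -> float:
    """Permutation P-value for the MAX statistic (trait values shuffled)."""
    rng = random.Random(seed)
    codings = genetic_codings(genotypes)
    observed = max_statistic(codings, trait)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_statistic(codings, shuffled) >= observed:
            exceed += 1
    # Add-one estimate: never returns exactly zero
    return (exceed + 1) / (n_perm + 1)


# Simulated SNP with a purely recessive effect on a continuous trait
sim = random.Random(42)
geno = [sum(sim.random() < 0.4 for _ in range(2)) for _ in range(300)]
trait = [1.0 * (g == 2) + sim.gauss(0.0, 1.0) for g in geno]
print(max_test_pvalue(geno, trait, n_perm=500))
```

Because all three codings are evaluated on the same permuted trait, the resampling null automatically reflects the strong correlation among them, which a Bonferroni correction over three tests would ignore.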
Affiliation(s)
- Yeonil Kim
- Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey
- Yueh-Yun Chi
- Department of Biostatistics, University of Florida, Gainesville, Florida
- Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
14
Gurinovich A, Bae H, Farrell JJ, Andersen SL, Monti S, Puca A, Atzmon G, Barzilai N, Perls TT, Sebastiani P. PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects. Bioinformatics 2019; 35:3046-3054. [PMID: 30624692 PMCID: PMC6735784 DOI: 10.1093/bioinformatics/btz017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/24/2018] [Revised: 11/01/2018] [Accepted: 01/04/2019] [Indexed: 12/13/2022]
Abstract
MOTIVATION Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has different effects on a phenotype in different populations, genome-wide association studies applied to the dataset as a whole may not pinpoint such differences. Identifying population-specific effects of genetic variants is especially important in studies that may eventually lead to the development of diagnostic tests or to drug discovery. RESULTS In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering, and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype. AVAILABILITY AND IMPLEMENTATION PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found, with instructions on its installation and usage, at the following GitHub repository: https://github.com/gurinovich/PopCluster. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Harold Bae
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA
- John J Farrell
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Stacy L Andersen
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Stefano Monti
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Annibale Puca
- Department of Medicine and Surgery, University of Salerno, Fisciano, Italy
- Cardiovascular Research Unit, IRCCS MultiMedica, Sesto San Giovanni, Italy
- Gil Atzmon
- Department of Medicine and Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Nir Barzilai
- Department of Medicine and Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Thomas T Perls
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
15
Chen Y, Adrianto I, Ianuzzi MC, Garman L, Montgomery CG, Rybicki BA, Levin AM, Li J. Extended methods for gene-environment-wide interaction scans in studies of admixed individuals with varying degrees of relationships. Genet Epidemiol 2019; 43:414-426. [PMID: 30793815 DOI: 10.1002/gepi.22196] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/07/2018] [Revised: 12/26/2018] [Accepted: 01/24/2019] [Indexed: 11/08/2022]
Abstract
The etiology of many complex diseases involves environmental exposures, inherited genetic predisposition, and interactions between them. Gene-environment-wide interaction studies (GEWIS) provide a means to identify the interactions between genetic variation and environmental exposures that underlie disease risk. However, current GEWIS methods cannot adjust for the potentially complex correlations in studies of admixed populations with varying degrees of relationship (both known and unknown) among individuals. We developed novel generalized estimating equation (GEE) based methods, GEE-adaptive and GEE-joint, to account for phenotypic correlations due to kinship while adjusting for covariates, including measures of genome-wide ancestry. In simulation studies of admixed individuals, both methods controlled family-wise error rates, an advantage over the case-only approach. They demonstrated higher power than traditional case-control methods across a wide range of alternative hypotheses, especially where both marginal and interaction effects were present. We applied the proposed methods to conduct a GEWIS of a known sarcoidosis risk factor (insecticide exposure) and risk of sarcoidosis in African Americans and identified two novel loci with suggestive evidence of G × E interaction.
Affiliation(s)
- Yalei Chen
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan.,Center for Bioinformatics, Henry Ford Health System, Detroit, Michigan
- Indra Adrianto
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan.,Center for Bioinformatics, Henry Ford Health System, Detroit, Michigan
- Michael C Ianuzzi
- Department of Internal Medicine, Northwell Staten Island University Hospital, Staten Island, New York, New York
- Lori Garman
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma
- Courtney G Montgomery
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma
- Benjamin A Rybicki
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan
- Albert M Levin
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan.,Center for Bioinformatics, Henry Ford Health System, Detroit, Michigan
- Jia Li
- Department of Public Health Sciences, Henry Ford Health System, Detroit, Michigan.,Center for Bioinformatics, Henry Ford Health System, Detroit, Michigan
16
Chen H, Huffman JE, Brody JA, Wang C, Lee S, Li Z, Gogarten SM, Sofer T, Bielak LF, Bis JC, Blangero J, Bowler RP, Cade BE, Cho MH, Correa A, Curran JE, de Vries PS, Glahn DC, Guo X, Johnson AD, Kardia S, Kooperberg C, Lewis JP, Liu X, Mathias RA, Mitchell BD, O’Connell JR, Peyser PA, Post WS, Reiner AP, Rich SS, Rotter JI, Silverman EK, Smith JA, Vasan RS, Wilson JG, Yanek LR, Redline S, Smith NL, Boerwinkle E, Borecki IB, Cupples LA, Laurie CC, Morrison AC, Rice KM, Lin X. Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. Am J Hum Genet 2019; 104:260-274. [PMID: 30639324 DOI: 10.1016/j.ajhg.2018.12.012] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 08/18/2018] [Accepted: 12/17/2018] [Indexed: 12/12/2022]
Abstract
With advances in whole-genome sequencing (WGS) technology, more advanced statistical methods for testing genetic association with rare variants are being developed. Methods in which variants are grouped for analysis are also known as variant-set, gene-based, and aggregate unit tests. The burden test and the sequence kernel association test (SKAT) are two widely used variant-set tests; they were originally developed for samples of unrelated individuals and were later extended to family data with known pedigree structures. However, computationally efficient and powerful variant-set tests are needed to make analyses tractable in large-scale WGS studies with complex study samples. In this paper, we propose the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework. These tests can be applied to large-scale WGS studies involving samples with population structure and relatedness, such as those in the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program. SMMATs share the same null model for different variant sets, and a virtue of this null model, which includes covariates only, is that it needs to be fit only once for all tests in each genome-wide analysis. Simulation studies show that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness. We also illustrate our tests in a real data example analyzing plasma fibrinogen levels in the TOPMed program (n = 23,763) using the Analysis Commons, a cloud-based computing platform.
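The computational point made here, that the covariate-only null model is fit once and then reused for every variant set, can be sketched with a weighted burden score test for a continuous trait. This Python sketch assumes unrelated individuals and ordinary least squares, so it deliberately omits the relatedness matrix and variance components of the SMMAT mixed-model null; the class name, weights, and simulated data are hypothetical. For a burden B = G w and null residuals r, it uses the score U = B'r with variance sigma^2 B'(I - H)B, where H is the covariate hat matrix, and refers U^2 / Var(U) to a chi-square distribution with 1 degree of freedom.

```python
import math
import numpy as np


class LinearNullModel:
    """Covariate-only linear null model, fit once and reused for every test."""

    def __init__(self, X: np.ndarray, y: np.ndarray):
        n, p = X.shape
        self.X = X
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        self.resid = y - X @ beta
        self.sigma2 = float(self.resid @ self.resid) / (n - p)
        # Cache (X'X)^-1 so that projecting covariates out of a burden is cheap
        self.xtx_inv = np.linalg.inv(X.T @ X)

    def burden_pvalue(self, G: np.ndarray, weights: np.ndarray) -> float:
        """Score-test P-value for the weighted burden G @ weights."""
        burden = G @ weights
        score = float(burden @ self.resid)  # U = B'r
        xb = self.X.T @ burden
        # Var(U) = sigma^2 * (B'B - B'X (X'X)^-1 X'B) = sigma^2 * B'(I - H)B
        var_u = self.sigma2 * (float(burden @ burden) - float(xb @ self.xtx_inv @ xb))
        if var_u <= 1e-12:
            return 1.0  # burden is (numerically) explained by the covariates
        stat = score * score / var_u
        # Upper tail of chi-square(1): P(Z^2 > t) = erfc(sqrt(t / 2))
        return math.erfc(math.sqrt(stat / 2.0))


rng = np.random.default_rng(7)
n = 2000
age = rng.normal(50.0, 10.0, n)
X = np.column_stack([np.ones(n), age])
y = 0.02 * age + rng.standard_normal(n)

null_model = LinearNullModel(X, y)  # fit once for the whole scan
# Five hypothetical variant sets of 10 rare variants each (MAF 1%)
variant_sets = [rng.binomial(2, 0.01, size=(n, 10)).astype(float) for _ in range(5)]
pvals = [null_model.burden_pvalue(G, np.ones(G.shape[1])) for G in variant_sets]
print(pvals)
```

In a scan, only `burden_pvalue` runs per variant set; the cached residuals and projection are what make reusing one null model cheap, and the same residuals could equally feed a SKAT-type statistic.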
Affiliation(s)
- Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
17
Rhoades R, Jackson F, Teng S. Discovery of rare variants implicated in schizophrenia using next-generation sequencing. JOURNAL OF TRANSLATIONAL GENETICS AND GENOMICS 2019; 3:1-20. [PMID: 33981965 PMCID: PMC8112455 DOI: 10.20517/jtgg.2018.26] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 04/12/2023]
Abstract
Schizophrenia is a highly heritable psychiatric disorder that affects 1% of the population. Genome-wide association studies have identified common variants in candidate genes associated with schizophrenia, but the genetic mechanisms of this disorder have not yet been elucidated. The discovery of rare genetic variants that contribute to schizophrenia symptoms promises to help explain the missing heritability of the disease. Next-generation sequencing techniques are revolutionizing the field of psychiatric genetics. Various statistical approaches have been developed for rare variant association testing in case-control and family studies. Targeted resequencing, whole-exome sequencing, and whole-genome sequencing, combined with these computational tools, are used to discover rare genetic variants in schizophrenia. The findings provide useful information for characterizing rare mutations and elucidating the genetic mechanisms by which the variants cause schizophrenia.
Affiliation(s)
- Raina Rhoades
- Department of Biology, Howard University, Washington, DC 20059, USA
- Fatimah Jackson
- Department of Biology, Howard University, Washington, DC 20059, USA
- Shaolei Teng
- Department of Biology, Howard University, Washington, DC 20059, USA
18
Wang X, Boekstegers F, Brinster R. Methods and results from the genome-wide association group at GAW20. BMC Genet 2018; 19:79. [PMID: 30255814 PMCID: PMC6157187 DOI: 10.1186/s12863-018-0649-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
BACKGROUND This paper summarizes the contributions from the Genome-wide Association Study group (GWAS group) of GAW20. The GWAS group contributions focused on topics such as association tests, phenotype imputation, and the application of empirical kinships. The goals of the contributions varied. Different methods were applied to either a real or a simulated data set based on the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study. Different outcomes and covariates were considered, and quality control procedures varied across contributions. RESULTS Heritability and family structure played a major role in some contributions. Including family information and data-based adaptive weights was found to improve power in genome-wide association studies. Gene-level approaches were shown to be more powerful than single-marker analysis. Other contributions compared pedigree-based and empirical kinship matrices and found similar results in heritability estimation, association mapping, and genomic prediction. A new approach for linkage mapping of triglyceride levels was able to identify a novel linkage signal. CONCLUSIONS This summary paper reports on promising statistical approaches and findings of the members of the GWAS group, applied to real and simulated data, which encompass the current topics of epigenetics and pharmacogenomics.
Affiliation(s)
- Xuexia Wang
- University of North Texas, GAB 459, 1155 Union Circle #311430, Denton, TX 76203 USA
- Felix Boekstegers
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany
- Regina Brinster
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany
19
Wu X, Guan T, Liu DJ, León Novelo LG, Bandyopadhyay D. ADAPTIVE-WEIGHT BURDEN TEST FOR ASSOCIATIONS BETWEEN QUANTITATIVE TRAITS AND GENOTYPE DATA WITH COMPLEX CORRELATIONS. Ann Appl Stat 2018; 12:1558-1582. [PMID: 30214655 DOI: 10.1214/17-aoas1121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/06/2023]
Abstract
High-throughput sequencing has often been used to screen samples from pedigrees or with population structure, producing genotype data with complex correlations arising from both familial relationships and linkage disequilibrium. With such data, it is critical to account for these genotypic correlations when assessing the contribution of variants by gene or pathway. Recognizing the limitations of existing association testing methods, we propose the Adaptive-weight Burden Test (ABT), a retrospective, mixed-model test for genetic association of quantitative traits using genotype data with complex correlations. This method makes full use of genotypic correlations across both samples and variants and adopts "data-driven" weights to improve power. We derive the ABT statistic and its explicit distribution under the null hypothesis, and demonstrate through simulation studies that it is generally more powerful than the fixed-weight burden test and family-based SKAT in various scenarios, while controlling the type I error rate. Further investigation reveals the connection of ABT with kernel tests, as well as the adaptability of its weights to the direction of genetic effects. The application of ABT is illustrated by a whole-genome analysis of genes with common and rare variants associated with fasting glucose from the NHLBI "Grand Opportunity" Exome Sequencing Project.
Affiliation(s)
- Xiaowei Wu
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
- Ting Guan
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
- Dajiang J Liu
- Department of Public Health Sciences, Hershey Institute of Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
- Luis G León Novelo
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, TX 77030, USA
20
Wang X, Zhang Z, Morris N, Cai T, Lee S, Wang C, Yu TW, Walsh CA, Lin X. Rare variant association test in family-based sequencing studies. Brief Bioinform 2018; 18:954-961. [PMID: 27677958 DOI: 10.1093/bib/bbw083] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/06/2016] [Indexed: 12/20/2022]
Abstract
The objective of this article is to introduce valid and robust methods for the analysis of rare variants in family-based exome chip, whole-exome sequencing, or whole-genome sequencing data. Family-based designs provide unique opportunities to detect genetic variants, complementing studies of unrelated individuals. Currently, few methods and software tools have been developed to support family-based association studies of rare variants, especially for binary traits. In this article, we address this gap by extending existing burden and kernel-based gene set association tests for population data to related samples, with a particular emphasis on binary phenotypes. The proposed approach blends the strengths of kernel machine methods and generalized estimating equations. Importantly, the efficient generalized kernel score test can be applied as a mega-analysis framework to combine studies with different designs. We illustrate the application of the proposed method using data from an exome sequencing study of autism. Methods discussed in this article are implemented in an R package 'gskat', which is available on CRAN and GitHub.
21
Detecting Rare Mutations with Heterogeneous Effects Using a Family-Based Genetic Random Field Method. Genetics 2018; 210:463-476. [PMID: 30104420 DOI: 10.1534/genetics.118.301266] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/04/2018] [Accepted: 07/29/2018] [Indexed: 01/19/2023]
Abstract
The genetic etiology of many complex diseases is highly heterogeneous. A complex disease can be caused by multiple mutations within the same gene or mutations in multiple genes at various genomic loci. Although these disease-susceptibility mutations can be collectively common in the population, they are often individually rare or even private to certain families. Family-based studies are powerful for detecting rare variants enriched in families, an important feature for sequencing studies given the heterogeneous nature of rare variants. In addition, family designs can provide robust protection against population stratification. Nevertheless, statistical methods for analyzing family-based sequencing data are underdeveloped, especially those accounting for the heterogeneous etiology of complex diseases. In this article, we introduce a random field framework for detecting gene-phenotype associations in family-based sequencing studies, referred to as family-based genetic random field (FGRF). Similar to existing family-based association tests, FGRF can use within-family and between-family information separately or jointly to test an association. We demonstrate that FGRF has statistical power comparable to existing methods when there is no genetic heterogeneity, but can improve statistical power when there is genetic heterogeneity across families. The proposed method also shares the advantages of conventional family-based association tests (e.g., robustness to population stratification). Finally, we applied the proposed method to sequencing data from the Minnesota Twin Family Study and revealed several genes, including SAMD14, potentially associated with alcohol dependence.
22
Wang K. Conditional asymptotic inference for the kernel association test. Bioinformatics 2018; 33:3733-3739. [PMID: 28961861 DOI: 10.1093/bioinformatics/btx511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/07/2017] [Accepted: 08/08/2017] [Indexed: 11/14/2022]
Abstract
Motivation The kernel association test (KAT) is popular in biological studies for its ability to combine weak effects potentially of opposite direction. Its P-value is typically assessed via its (unconditional) asymptotic distribution. However, such an asymptotic distribution is known only for continuous traits and for dichotomous traits. Furthermore, the derived P-values are known to be conservative when the sample size is small, especially for the important case of dichotomous traits. One alternative is the permutation test, a widely accepted approximation to exact finite-sample conditional inference. However, it is time-consuming in practice because of the stringent significance criteria commonly used in these analyses. Results Based on a previous theoretical result, a conditional asymptotic distribution for the KAT is introduced. This distribution provides an alternative approximation to the exact distribution of the KAT. An explicit expression of this distribution is provided, from which P-values can be easily computed. This method applies to any type of trait. The usefulness of this approach is demonstrated via extensive simulation studies using real genotype data and an analysis of genetic data from the Ocular Hypertension Treatment Study. Numerical results showed that the new method can control the type I error rate and is slightly conservative compared with the permutation method. Nevertheless, the proposed method may be used as a fast screening method, with the time-consuming permutation procedure then conducted only at locations that show signals of association. Availability and implementation An implementation of the proposed method is provided in the R package iGasso. Contact kai-wang@uiowa.edu.
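The permutation test that this abstract treats as the reference can be sketched for a weighted linear kernel K = G W G', for which the statistic Q = r'Kr, with r = y - mean(y), reduces to a weighted sum of squared per-variant scores. This pure-Python sketch assumes a continuous trait without covariates and is not the iGasso implementation; the function names and simulated data are hypothetical. It also shows where the cost comes from: with B permutations the smallest attainable P-value is 1/(B + 1), so reaching a stringent threshold such as 2.5 × 10^-6 needs on the order of 400,000 permutations per test.

```python
import random
from typing import Sequence


def kat_statistic(G: Sequence[Sequence[float]], y: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Q = r' G W G' r with r = y - mean(y) and W = diag(weights)."""
    n = len(y)
    ybar = sum(y) / n
    r = [v - ybar for v in y]
    q = 0.0
    for j, w in enumerate(weights):
        s_j = sum(G[i][j] * r[i] for i in range(n))  # j-th entry of G'r
        q += w * s_j * s_j
    return q


def kat_permutation_pvalue(G, y, weights, n_perm: int = 1000,
                           seed: int = 0) -> float:
    """Permutation P-value for Q; the smallest attainable value is 1/(n_perm+1)."""
    rng = random.Random(seed)
    observed = kat_statistic(G, y, weights)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if kat_statistic(G, y_perm, weights) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: 200 people, 5 rare variants, only variant 0 is causal
sim = random.Random(3)
n, m = 200, 5
G = [[float(sim.random() < 0.05) for _ in range(m)] for _ in range(n)]
y = [2.0 * G[i][0] + sim.gauss(0.0, 1.0) for i in range(n)]
print(kat_permutation_pvalue(G, y, [1.0] * m, n_perm=500))
```

Because each per-variant score is squared before summing, effects in opposite directions do not cancel, which is the property of the KAT that the abstract highlights.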
Affiliation(s)
- Kai Wang
- Department of Biostatistics, University of Iowa, Iowa City, IA 52246, USA
23
Hackinger S, Zeggini E. Statistical methods to detect pleiotropy in human complex traits. Open Biol 2018; 7:rsob.170125. [PMID: 29093210 PMCID: PMC5717338 DOI: 10.1098/rsob.170125] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 05/22/2017] [Accepted: 09/29/2017] [Indexed: 12/13/2022]
Abstract
In recent years, pleiotropy, the phenomenon of one genetic locus influencing several traits, has become a widely researched field in human genetics. With the increasing availability of genome-wide association study summary statistics, as well as the establishment of deeply phenotyped sample collections, it is now possible to systematically assess the genetic overlap between multiple traits and diseases. In addition to increasing power to detect associated variants, multi-trait methods can also aid our understanding of how different disorders are aetiologically linked by highlighting relevant biological pathways. Many tools are available for such analyses, each with its own advantages and limitations. In this review, we outline some of the currently available methods to conduct multi-trait analyses. First, we briefly introduce the concept of pleiotropy and outline the current landscape of pleiotropy research in human genetics; second, we describe analytical considerations and analysis methods; finally, we discuss future directions for the field.
24
Fernández MV, Budde J, Del-Aguila JL, Ibañez L, Deming Y, Harari O, Norton J, Morris JC, Goate AM, NIA-LOAD family study group, NCRAD, Cruchaga C. Evaluation of Gene-Based Family-Based Methods to Detect Novel Genes Associated With Familial Late Onset Alzheimer Disease. Front Neurosci 2018; 12:209. [PMID: 29670507 PMCID: PMC5893779 DOI: 10.3389/fnins.2018.00209] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 12/26/2017] [Accepted: 03/15/2018] [Indexed: 12/22/2022]
Abstract
Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD.
Affiliation(s)
- Maria V. Fernández
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- John Budde
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Jorge L. Del-Aguila
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Laura Ibañez
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Yuetiva Deming
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Oscar Harari
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Joanne Norton
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- John C. Morris
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, St. Louis, MO, United States
- Alison M. Goate
- Department of Neuroscience, Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, MO, United States
25
Davenport CA, Maity A, Sullivan PF, Tzeng JY. A Powerful Test for SNP Effects on Multivariate Binary Outcomes using Kernel Machine Regression. STATISTICS IN BIOSCIENCES 2018; 10:117-138. [PMID: 30420901 PMCID: PMC6226013 DOI: 10.1007/s12561-017-9189-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/05/2016] [Revised: 12/20/2016] [Accepted: 03/15/2017] [Indexed: 10/19/2022]
Abstract
Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual and they may share common marker effects. In this paper, we propose a procedure to test for the effect of a SNP set on multiple, possibly correlated, binary responses. We develop a score-based test using a nonparametric modeling framework that jointly models the global effect of the marker set. We account for the nonlinear effects and potentially complicated interactions between markers using reproducing kernels. Our testing procedure only requires estimation under the null hypothesis, and we use multivariate generalized estimating equations (GEEs) to estimate the model components to account for the correlation among the outcomes. We evaluate the finite-sample performance of our test via a simulation study and demonstrate our methods using the CATIE antibody study data and the CoLaus Study data.
Affiliation(s)
- Clemontina A Davenport
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC 27707, USA
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Patrick F Sullivan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jung-Ying Tzeng
- Department of Statistics, Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA. Department of Statistics, National Cheng-Kung University, Tainan, Taiwan. Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
26
He Z, Lee S, Zhang M, Smith JA, Guo X, Palmas W, Kardia SL, Ionita-Laza I, Mukherjee B. Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA). Genet Epidemiol 2017; 41:801-810. [PMID: 29076270 PMCID: PMC5696115 DOI: 10.1002/gepi.22081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/10/2017] [Revised: 08/24/2017] [Accepted: 08/24/2017] [Indexed: 11/09/2022]
Abstract
Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene-based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one-at-a-time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, usual sandwich/model-based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated/inflated type I error rate without further corrections. In this paper, we develop a group of tests for rare-variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within-subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi-Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.
Affiliation(s)
- Zihuai He
- Department of Biostatistics, Columbia University, New York, NY 10032
- Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
- Min Zhang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
- Jennifer A. Smith
- Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109
- Xiuqing Guo
- Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90509
- Walter Palmas
- Department of Medicine, Columbia University, New York, NY 10032
- Bhramar Mukherjee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
27
Xu Z, Xu G, Pan W, for the Alzheimer's Disease Neuroimaging Initiative. Adaptive testing for association between two random vectors in moderate to high dimensions. Genet Epidemiol 2017; 41:599-609. [PMID: 28714590 PMCID: PMC5643233 DOI: 10.1002/gepi.22059] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/27/2017] [Revised: 04/26/2017] [Accepted: 05/17/2017] [Indexed: 01/09/2023]
Abstract
Testing for association between two random vectors is a common and important task in many fields; however, existing tests, such as Escoufier's RV test, are suitable only for low-dimensional data, not for high-dimensional data. In moderate to high dimensions, it is necessary to consider sparse signals, which are often expected when only a few, rather than many, variables are associated with each other. We generalize the RV test to moderate-to-high dimensions. The key idea is to data-adaptively weight each variable pair based on its empirical association. As a consequence, the proposed test is adaptive, alleviating the effects of noise accumulation in high-dimensional data and thus maintaining power under both dense and sparse alternative hypotheses. We show connections between the proposed test and several existing tests, such as a generalized estimating equations-based adaptive test, multivariate kernel machine regression (KMR), and kernel distance methods. Furthermore, we modify the proposed adaptive test so that it can be powerful for nonlinear or nonmonotonic associations. We use both real data and simulated data to demonstrate the advantages and usefulness of the proposed new test. The new test is freely available in R package aSPC on CRAN at https://cran.r-project.org/web/packages/aSPC/index.html and https://github.com/jasonzyx/aSPC.
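The two ingredients named in this abstract, Escoufier's RV coefficient and data-adaptive weighting of each variable pair by its empirical association, can be sketched as follows. This is an illustration under stated assumptions, not the aSPC package: the hypothetical `spc` function raises each cross-correlation to a power γ (larger even γ up-weights the strongest pairs; γ = ∞ keeps only the largest |r|), and the permutation-based combination across γ that makes the test adaptive is omitted. Columns are assumed non-constant.

```python
import math


def center_columns(M):
    """Subtract each column's mean from a list-of-rows matrix."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[v - m for v, m in zip(row, means)] for row in M]


def crossprod(A, B):
    """A' B for list-of-rows matrices with the same number of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def frob2(S):
    """Squared Frobenius norm of a matrix."""
    return sum(v * v for row in S for v in row)


def rv_coefficient(X, Y):
    """RV = ||Sxy||^2 / sqrt(||Sxx||^2 * ||Syy||^2) on column-centered data."""
    Xc, Yc = center_columns(X), center_columns(Y)
    return frob2(crossprod(Xc, Yc)) / math.sqrt(
        frob2(crossprod(Xc, Xc)) * frob2(crossprod(Yc, Yc)))


def cross_correlations(X, Y):
    """Pearson correlation r[j][k] between column j of X and column k of Y."""
    Xc, Yc = center_columns(X), center_columns(Y)
    S = crossprod(Xc, Yc)
    sx = [math.sqrt(sum(row[j] ** 2 for row in Xc)) for j in range(len(Xc[0]))]
    sy = [math.sqrt(sum(row[k] ** 2 for row in Yc)) for k in range(len(Yc[0]))]
    return [[S[j][k] / (sx[j] * sy[k]) for k in range(len(sy))]
            for j in range(len(sx))]


def spc(X, Y, gamma):
    """Sum of powered correlations over all (X, Y) variable pairs."""
    r = [v for row in cross_correlations(X, Y) for v in row]
    if gamma == math.inf:
        return max(abs(v) for v in r)
    return sum(v ** gamma for v in r)
```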
Affiliation(s)
- Zhiyuan Xu
- Division of Biostatistics, University of Minnesota
- Gongjun Xu
- Department of Statistics, University of Michigan
- Wei Pan
- Division of Biostatistics, University of Minnesota
28
He Z, Xu B, Lee S, Ionita-Laza I. Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data. Am J Hum Genet 2017; 101:340-352. [PMID: 28844485 PMCID: PMC5590864 DOI: 10.1016/j.ajhg.2017.07.011] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 03/31/2017] [Accepted: 07/18/2017] [Indexed: 12/14/2022]
Abstract
Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests.
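Because burden and dispersion statistics can be built from per-variant score statistics, pooling studies only requires each study's score vector U_s and its covariance V_s. The sketch below assumes the common score-based meta-analysis convention of summing the U_s and V_s, and a single, hypothetical annotation-derived weight vector w. It does not reproduce the authors' unified tests that combine several annotations, and the dispersion P-value (a weighted chi-square mixture) is not computed. Function names are hypothetical.

```python
import math


def meta_combine(scores, covs):
    """Pool per-study score vectors U_s and covariance matrices V_s by summation."""
    m = len(scores[0])
    U = [sum(s[j] for s in scores) for j in range(m)]
    V = [[sum(c[i][j] for c in covs) for j in range(m)] for i in range(m)]
    return U, V


def burden_test(U, V, w):
    """Weighted burden statistic T = (w'U)^2 / (w'Vw); chi-square(1) under H0."""
    m = len(w)
    num = sum(wj * uj for wj, uj in zip(w, U)) ** 2
    den = sum(w[i] * V[i][j] * w[j] for i in range(m) for j in range(m))
    T = num / den
    p = math.erfc(math.sqrt(T / 2.0))  # P(chi2_1 > T) = P(|Z| > sqrt(T))
    return T, p


def dispersion_statistic(U, w):
    """SKAT-type Q = sum_j (w_j * U_j)^2. Its null is a weighted chi-square
    mixture whose weights depend on V; that P-value is not computed here."""
    return sum((wj * uj) ** 2 for wj, uj in zip(w, U))


if __name__ == "__main__":
    # Two hypothetical studies, two variants each, identity covariances.
    U, V = meta_combine([[1.0, 0.0], [0.0, 2.0]],
                        [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]])
    w = [1.0, 0.5]  # hypothetical annotation-derived weights
    print(U, V, burden_test(U, V, w), dispersion_statistic(U, w))
```

Only summary-level quantities (U_s, V_s) enter these functions, which is why such tests can be meta-analyzed without sharing individual-level data.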
Affiliation(s)
- Zihuai He
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Bin Xu
- Department of Psychiatry, Columbia University, New York, NY 10032, USA
- Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
29
Jadhav S, Koul HL, Lu Q. Dependent generalized functional linear models. Biometrika 2017; 104:987-994. [PMID: 29353911 DOI: 10.1093/biomet/asx044] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
This paper considers testing for no effect of functional covariates on response variables in multivariate regression. We use generalized estimating equations to determine the underlying parameters and establish their joint asymptotic normality. This is then used to test the significance of the effect of predictors on the vector of response variables. Simulations demonstrate the importance of considering existing correlation structures in the data. To explore the effect of treating genetic data as a function, we perform a simulation study using gene sequencing data and find that the performance of our test is comparable to that of another popular method used in sequencing studies. We present simulations to explore the behaviour of our test under varying sample size, cluster size and dimension of the parameter to be estimated, and an application where we are able to confirm known associations between nicotine dependence and neuronal nicotinic acetylcholine receptor subunit genes.
Affiliation(s)
- S Jadhav
- Department of Statistics & Probability, Michigan State University, 619 Red Cedar Road, East Lansing, Michigan 48824, U.S.A
- H L Koul
- Department of Statistics & Probability, Michigan State University, 619 Red Cedar Road, East Lansing, Michigan 48824, U.S.A
- Q Lu
- Department of Epidemiology & Biostatistics, Michigan State University, B601 West Fee Hall, 909 Fee Road, East Lansing, Michigan 48824, U.S.A
30
Dai W, Yang M, Wang C, Cai T. Sequence robust association test for familial data. Biometrics 2017; 73:876-884. [PMID: 28273695 PMCID: PMC11141465 DOI: 10.1111/biom.12643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2015] [Revised: 09/01/2016] [Accepted: 11/01/2016] [Indexed: 01/12/2023]
Abstract
Genome-wide association studies (GWAS) and next generation sequencing studies (NGSS) are often performed in family studies to improve power in identifying genetic variants that are associated with clinical phenotypes. Efficient analysis of genome-wide studies with familial data is challenging due to the difficulty in modeling shared but unmeasured genetic and/or environmental factors that cause dependencies among family members. Existing genetic association testing procedures for family studies largely rely on generalized estimating equations (GEE) or linear mixed-effects (LME) models. These procedures may fail to properly control type I error when the imposed model assumptions fail. In this article, we propose the Sequence Robust Association Test (SRAT), a fully rank-based, flexible approach that tests for association between a set of genetic variants and an outcome, while accounting for within-family correlation and adjusting for covariates. Compared with existing methods, SRAT has the advantages of allowing for unknown correlation structures and making weaker assumptions about the outcome distribution. We provide theoretical justifications for SRAT and show that SRAT includes the well-known Wilcoxon rank sum test as a special case. Extensive simulation studies suggest that SRAT provides better protection against type I error rate inflation, and could be much more powerful than existing methods for settings with skewed outcome distributions. For illustration, we also apply SRAT to the familial data from the Framingham Heart Study and Offspring Study to examine the association between an inflammatory marker and a few sets of genetic variants.
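Because SRAT contains the Wilcoxon rank sum test as a special case, a sketch of that special case may help. It assumes independent observations and uses midranks for ties with the usual tie-corrected normal approximation; it does not model within-family correlation, covariates, or variant sets, which is what SRAT adds. The function names are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks with ties given their average rank; also returns tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2.0  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_rank_sum(x, y):
    """Rank sum W of sample x, tie-corrected z statistic, two-sided normal P-value."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = midranks(list(x) + list(y))
    W = sum(ranks[:n1])
    mean = n1 * (n + 1) / 2.0
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    z = (W - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return W, z, p
```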
Affiliation(s)
- Wei Dai
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, US
- Ming Yang
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, US
- Chaolong Wang
- Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Tianxi Cai
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, US
31
Joint association analysis of a binary and a quantitative trait in family samples. Eur J Hum Genet 2016; 25:130-136. [PMID: 27782109 DOI: 10.1038/ejhg.2016.134] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/01/2016] [Revised: 07/05/2016] [Accepted: 09/06/2016] [Indexed: 12/12/2022]
Abstract
In recent years, improved genotyping and sequencing technologies have enabled the discovery of new loci associated with various diseases or traits. For instance, by testing the association with each single-nucleotide variant (SNV) separately, genome-wide association studies (GWAS) have achieved tremendous success in identifying SNVs associated with specific traits. However, little is known about the common genetic basis of multiple traits, owing to a lack of efficient methods. With the use of extended quasi-likelihood, a Wald test has been proposed to perform a bivariate analysis of a continuous and a binary trait in unrelated samples. However, owing to its low computational efficiency, it has not been implemented in real applications to large-scale genetic studies. In this paper, we propose an efficient bivariate robust score test for two traits, one continuous and one binary, based on extended generalized estimating equations. Our approach is applicable to both family-based and unrelated study designs and can be extended to test the association of multiple traits. Our simulation studies demonstrate that the type-I error rate of our approach is well controlled in all minor allele frequency (MAF) scenarios, with MAF ranging from 1 to 30%, and that the method is more powerful in certain MAF scenarios than univariate testing with correction for multiple testing. Because of the computational advantage of score tests, our approach is readily applicable to GWAS or sequencing studies. Finally, we present a real application to uncover genetic variants associated with body mass index and type-2 diabetes in the Framingham Heart Study.
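Once a score vector U = (U₁, U₂) for the two traits and a robust covariance estimate V of U are available, the joint score statistic is a simple quadratic form. The sketch below assumes U and V have already been estimated (the extended-GEE estimation in the paper is not shown) and uses the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-T/2). The function name is hypothetical.

```python
import math


def bivariate_score_test(U, V):
    """Joint score statistic T = U' V^{-1} U for two traits.

    U: length-2 score vector; V: 2x2 covariance of U (e.g. a robust estimate).
    Under H0, T is approximately chi-square with 2 df, whose survival
    function is exp(-T / 2).
    """
    (a, b), (c, d) = V
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("V must be positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]  # explicit 2x2 inverse
    T = sum(U[i] * inv[i][j] * U[j] for i in range(2) for j in range(2))
    return T, math.exp(-T / 2.0)
```

Only one model fit under the null is needed to obtain U and V, which is the computational advantage of score tests that the abstract refers to.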
32
Wang L, Choi S, Lee S, Park T, Won S. Comparing family-based rare variant association tests for dichotomous phenotypes. BMC Proc 2016; 10:181-186. [PMID: 27980633 PMCID: PMC5133528 DOI: 10.1186/s12919-016-0027-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 02/06/2023]
Abstract
Background: It has been repeatedly stressed that family-based samples suffer less from genetic heterogeneity and that association analyses with family-based samples are expected to be powerful for detecting susceptibility loci for rare disease. Various approaches for rare-variant analysis with family-based samples have been proposed. Methods: In this report, the performance of existing methods was compared using the simulated data set provided as part of Genetic Analysis Workshop 19 (GAW19). We considered the rare variant transmission disequilibrium test (RV-TDT), the generalized estimating equations-based kernel association (GEE-KM) test, an extended combined multivariate and collapsing test for pedigree data (known as Pedigree Combined Multivariate and Collapsing [PedCMC]), gene-level kernel and burden association tests with disease status for pedigree data (PedGene), and the family-based rare variant association test (FARVAT). Results: PedGene and FARVAT were usually the most efficient, and the optimal test statistic provided by FARVAT is robust under different disease models. Furthermore, FARVAT is implemented in C++ and is computationally faster than the other methods. Conclusions: Considering both statistical and computational efficiency, we conclude that FARVAT is a good choice for rare-variant analysis with extended families.
Affiliation(s)
- Longfei Wang
- Interdisciplinary Program in bioinformatics, Seoul National University, Seoul, 151-742 Korea
- Sungkyoung Choi
- Interdisciplinary Program in bioinformatics, Seoul National University, Seoul, 151-742 Korea
- Sungyoung Lee
- Interdisciplinary Program in bioinformatics, Seoul National University, Seoul, 151-742 Korea
- Taesung Park
- Interdisciplinary Program in bioinformatics, Seoul National University, Seoul, 151-742 Korea; Department of Statistics, Seoul National University, Seoul, 151-742 Korea
- Sungho Won
- Interdisciplinary Program in bioinformatics, Seoul National University, Seoul, 151-742 Korea; Department of Public Health Science, Seoul National University, Seoul, 151-742 Korea; Institute of Health Environment, Seoul National University, Seoul, 151-742 Korea
33
SORL1 variants across Alzheimer's disease European American cohorts. Eur J Hum Genet 2016; 24:1828-1830. [PMID: 27650968 DOI: 10.1038/ejhg.2016.122] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 03/17/2016] [Revised: 06/27/2016] [Accepted: 08/05/2016] [Indexed: 11/08/2022]
Abstract
The accumulation of the toxic Aβ peptide in Alzheimer's disease (AD) largely relies upon an efficient recycling of amyloid precursor protein (APP). Recent genetic association studies have described rare variants in SORL1 with putative pathogenic consequences in the recycling of APP. In this work, we examine the presence of rare coding variants in SORL1 in three different European American cohorts: early-onset, late-onset AD (LOAD) and familial LOAD.
34
Wang K. Boosting the Power of the Sequence Kernel Association Test by Properly Estimating Its Null Distribution. Am J Hum Genet 2016; 99:104-14. [PMID: 27292111 DOI: 10.1016/j.ajhg.2016.05.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 01/21/2016] [Accepted: 05/02/2016] [Indexed: 01/22/2023]
Abstract
The sequence kernel association test (SKAT) is probably the most popular statistical test used in rare-variant association studies. Its null distribution involves unknown parameters that need to be estimated. The current estimation method has a valid type I error rate, but the power is compromised given that all subjects are used for estimation. I have developed an estimation method that uses only control subjects. Named SKAT+, this method uses the same test statistic as SKAT but differs in the way the null distribution is estimated. Extensive simulation studies and applications to data from the Genetic Analysis Workshop 17 and the Ocular Hypertension Treatment Study demonstrated that SKAT+ has superior power over SKAT while maintaining control over the type I error rate. This method is applicable to extensions of SKAT in the literature.
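Under the null, a SKAT-type statistic is distributed as a weighted sum of independent 1-df chi-squares, Q ~ Σ λⱼ χ²₁, where the weights λⱼ depend on quantities estimated from the data; this estimation step is what SKAT+ changes. The sketch below assumes the λⱼ are already given and shows two generic ways to obtain a P-value from them: Monte Carlo simulation, and a Satterthwaite-type scaled chi-square a·χ²_d that matches the mixture's mean (Σλⱼ) and variance (2Σλⱼ²). Neither is claimed to be what SKAT or SKAT+ uses internally, and the function names are hypothetical.

```python
import random


def mixture_chisq_pvalue(q, lambdas, n_sim=100000, seed=1):
    """Monte Carlo estimate of P(sum_j lambda_j * Z_j^2 >= q), Z_j iid N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        s = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        if s >= q:
            hits += 1
    return (hits + 1) / (n_sim + 1)


def satterthwaite_params(lambdas):
    """Scale a and df d such that a * chi2_d has the mixture's mean and variance."""
    s1 = sum(lambdas)
    s2 = sum(lam * lam for lam in lambdas)
    return s2 / s1, s1 * s1 / s2
```

Either way, errors in the estimated λⱼ propagate directly into the P-value, which is why the choice of subjects used to estimate the null distribution can affect power.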
Affiliation(s)
- Kai Wang
- Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA 52242, USA.
35
Wang L, Lee S, Gim J, Qiao D, Cho M, Elston RC, Silverman EK, Won S. Family-Based Rare Variant Association Analysis: A Fast and Efficient Method of Multivariate Phenotype Association Analysis. Genet Epidemiol 2016; 40:502-11. [PMID: 27312886 DOI: 10.1002/gepi.21985] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/17/2016] [Revised: 05/01/2016] [Accepted: 05/08/2016] [Indexed: 12/19/2022]
Abstract
Family-based designs have been repeatedly shown to be powerful in detecting the significant rare variants associated with human diseases. Furthermore, human diseases are often defined by the outcomes of multiple phenotypes, and thus we expect multivariate family-based analyses may be very efficient in detecting associations with rare variants. However, few statistical methods implementing this strategy have been developed for family-based designs. In this report, we describe one such implementation: the multivariate family-based rare variant association tool (mFARVAT). mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes, and tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. Simulation results show that the proposed method is generally robust and efficient for various disease models, and we identify some promising candidate genes associated with chronic obstructive pulmonary disease. The software of mFARVAT is freely available at http://healthstat.snu.ac.kr/software/mfarvat/, implemented in C++ and supported on Linux and MS Windows.
Affiliation(s)
- Longfei Wang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Sungyoung Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Jungsoo Gim
- Institute of Health and Environment, Seoul National University, Seoul, Korea
- Dandi Qiao
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Michael Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Robert C Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Sungho Won
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; Graduate School of Public Health, Seoul National University, Seoul, Korea
36
Powerful and Adaptive Testing for Multi-trait and Multi-SNP Associations with GWAS and Sequencing Data. Genetics 2016; 203:715-31. [PMID: 27075728 DOI: 10.1534/genetics.115.186502] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 12/22/2015] [Accepted: 04/02/2016] [Indexed: 11/18/2022]
Abstract
Testing for genetic association with multiple traits has become increasingly important, not only because of its potential to boost statistical power, but also for its direct relevance to applications. For example, there is accumulating evidence showing that some complex neurodegenerative and psychiatric diseases like Alzheimer's disease are due to disrupted brain networks, for which it would be natural to identify genetic variants associated with a disrupted brain network, represented as a set of multiple traits, one for each of multiple brain regions of interest. In spite of its promise, testing for multivariate trait associations is challenging: if not appropriately used, its power can be much lower than testing on each univariate trait separately (with a proper control for multiple testing). Furthermore, differing from most existing methods for single-SNP-multiple-trait associations, we consider SNP set-based association testing to decipher complicated joint effects of multiple SNPs on multiple traits. Because the power of a test critically depends on several unknown factors such as the proportions of associated SNPs and of traits, we propose a highly adaptive test at both the SNP and trait levels, giving higher weights to those likely associated SNPs and traits, to yield high power across a wide spectrum of situations. We illuminate relationships among the proposed and some existing tests, showing that the proposed test covers several existing tests as special cases. We compare the performance of the new test with that of several existing tests, using both simulated and real data. The methods were applied to structural magnetic resonance imaging data drawn from the Alzheimer's Disease Neuroimaging Initiative to identify genes associated with gray matter atrophy in the human brain default mode network (DMN). 
For genome-wide association studies (GWAS), genes AMOTL1 on chromosome 11 and APOE on chromosome 19 were discovered by the new test to be significantly associated with the DMN. Notably, gene AMOTL1 was not detected by single SNP-based analyses. To our knowledge, AMOTL1 has not been highlighted in other Alzheimer's disease studies before, although it was indicated to be related to cognitive impairment. The proposed method is also applicable to rare variants in sequencing data and can be extended to pathway analysis.
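To make the idea of adaptively up-weighting likely associated SNPs and traits concrete, the sketch below uses the sum-of-powered-score (SPU) family that the Xu and Pan abstract (entry 41) also mentions: SPU(γ) = Σ_j U_j^γ over a stacked score vector. Here γ = 1 behaves like a burden-type sum, γ = 2 like a sum of squares, and γ = ∞ like a max-type test, and the adaptive version takes the minimum p-value over γ. This is an illustrative sketch, not the authors' method or software: it assumes the stacked scores are independent standard normals under the null (real multi-trait, multi-SNP scores are correlated, so permutation or the estimated covariance would be needed), and the γ grid, `n_sim`, and function names are hypothetical.

```python
import bisect
import random


def spu(scores, gamma):
    """SPU(gamma) = sum_j U_j**gamma; gamma = inf is taken as max_j |U_j|."""
    if gamma == float("inf"):
        return max(abs(u) for u in scores)
    return sum(u ** gamma for u in scores)


def aspu_pvalue(z, gammas=(1, 2, 4, 8, float("inf")), n_sim=1000, seed=1):
    """Monte Carlo adaptive-SPU p-value.

    ASSUMPTION: under the null the scores are independent N(0, 1); this is a
    simplification for illustration only.
    """
    rng = random.Random(seed)
    k = len(z)
    # Null replicates of |SPU(gamma)| for every gamma, from the same draws.
    null = {g: [] for g in gammas}
    for _ in range(n_sim):
        zb = [rng.gauss(0.0, 1.0) for _ in range(k)]
        for g in gammas:
            null[g].append(abs(spu(zb, g)))
    sorted_null = {g: sorted(v) for g, v in null.items()}

    def n_at_least(g, t):
        # Number of null replicates whose statistic is >= t.
        return n_sim - bisect.bisect_left(sorted_null[g], t)

    # Observed per-gamma p-values, then their minimum.
    obs_min_p = min(
        (1 + n_at_least(g, abs(spu(z, g)))) / (n_sim + 1) for g in gammas
    )
    # Null distribution of the minimum p-value, reusing the same replicates.
    null_min_p = [
        min(n_at_least(g, null[g][b]) / n_sim for g in gammas)
        for b in range(n_sim)
    ]
    return (1 + sum(p <= obs_min_p for p in null_min_p)) / (n_sim + 1)
```

Taking the minimum p-value over γ is what lets the test adapt: a few large scores favor large γ, many small same-direction scores favor γ = 1.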
37
Li M, Li J, He Z, Lu Q, Witte JS, Macleod SL, Hobbs CA, Cleves MA. Testing Allele Transmission of an SNP Set Using a Family-Based Generalized Genetic Random Field Method. Genet Epidemiol 2016; 40:341-51. [PMID: 27061818 DOI: 10.1002/gepi.21970] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/13/2015] [Revised: 02/19/2016] [Accepted: 02/22/2016] [Indexed: 12/20/2022]
Abstract
Family-based association studies are commonly used in genetic research because they can be robust to population stratification (PS). Recent advances in high-throughput genotyping technologies have produced a massive amount of genomic data in family-based studies. However, current family-based association tests mainly evaluate individual variants one at a time. In this article, we introduce a family-based generalized genetic random field (FB-GGRF) method to test the joint association between a set of autosomal SNPs (i.e., single-nucleotide polymorphisms) and disease phenotypes. The proposed method is a natural extension of a recently developed GGRF method for population-based case-control studies. It models offspring genotypes conditional on parental genotypes and is thus robust to PS. Through simulations, we show that under various disease scenarios the FB-GGRF has greater power than the commonly used family-based sequence kernel association test (FB-SKAT). Further, like GGRF, the proposed FB-GGRF method is asymptotically well behaved and does not require empirical adjustment of the type I error rate. We illustrate the proposed method using a study of congenital heart defects with family trios from the National Birth Defects Prevention Study (NBDPS).
Affiliation(s)
- Ming Li
- Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, Indiana, United States of America
- Jingyun Li
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Zihuai He
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
- John S Witte
- Department of Epidemiology and Biostatistics, University of California at San Francisco, San Francisco, California, United States of America
- Stewart L Macleod
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Charlotte A Hobbs
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Mario A Cleves
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
38
Wu B, Pankow JS. On Sample Size and Power Calculation for Variant Set-Based Association Tests. Ann Hum Genet 2016; 80:136-43. [PMID: 26831402 PMCID: PMC4761288 DOI: 10.1111/ahg.12147] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 08/19/2015] [Accepted: 12/07/2015] [Indexed: 01/03/2023]
Abstract
Sample size and power calculations are an important part of designing new sequence-based association studies. The recently developed SEQPower and SPS programs adopted computationally intensive Monte Carlo simulations to empirically estimate power for a series of variant set association (VSA) test methods, including the sequence kernel association test (SKAT). It is desirable to develop methods that can quickly and accurately compute power without intensive Monte Carlo simulations. We show that the power computed for SKAT with the existing analytical approach can be inflated, especially at small significance levels, which are often of primary interest for large-scale whole-genome and whole-exome sequencing projects. We propose a new χ²-approximation-based approach to accurately and efficiently compute sample size and power. In addition, we propose and implement a more accurate "exact" method to compute power, which is more efficient than the Monte Carlo approach, although it generally involves more computation than the χ² approximation. The exact approach can produce very accurate results and can be used to verify other approximation approaches. We implement the proposed methods in publicly available R programs that can be readily adapted when planning sequencing projects.
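As background for the χ² approximation discussed above: under the null, a SKAT-type statistic is distributed as a weighted sum of independent χ²₁ variables, Q = Σ_j λ_j χ²₁, with the λ_j given by eigenvalues of a kernel-related matrix. A generic way to replace this mixture with a single scaled χ² is two-moment (Satterthwaite) matching, sketched below next to the brute-force Monte Carlo route. This is an assumption-labeled illustration, not Wu and Pankow's power formula (which concerns the distribution under the alternative); the λ values and function names are hypothetical.

```python
import random


def satterthwaite(lambdas):
    """Match mean and variance of sum_j lambda_j * chi2_1 to a * chi2_df."""
    s1 = sum(lambdas)
    s2 = sum(lam * lam for lam in lambdas)
    a = s2 / s1          # scale factor
    df = s1 * s1 / s2    # effective degrees of freedom
    return a, df


def simulate_mixture(lambdas, n_sim=20000, seed=7):
    """Monte Carlo draws of Q = sum_j lambda_j * Z_j**2 (the 'empirical' route)."""
    rng = random.Random(seed)
    return [
        sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        for _ in range(n_sim)
    ]
```

Matching a·df = Σλ_j and 2a²·df = 2Σλ_j² gives a = Σλ_j²/Σλ_j and df = (Σλ_j)²/Σλ_j². Agreement in the first two moments does not guarantee accuracy far in the tail, which is where the small significance levels mentioned in the abstract live.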
Affiliation(s)
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- James S. Pankow
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
39
Abstract
Empirical studies and evolutionary theory support a role for rare variants in the etiology of complex traits. Given this motivation and increasing affordability of whole-exome and whole-genome sequencing, methods for rare variant association have been an active area of research for the past decade. Here, we provide a survey of the current literature and developments from the Genetics Analysis Workshop 19 (GAW19) Collapsing Rare Variants working group. In particular, we present the generalized linear regression framework and associated score statistic for the 2 major types of methods: burden and variance components methods. We further show that by simply modifying weights within these frameworks we arrive at many of the popular existing methods, for example, the cohort allelic sums test and sequence kernel association test. Meta-analysis techniques are also described. Next, we describe the 6 contributions from the GAW19 Collapsing Rare Variants working group. These included development of new methods, such as a retrospective likelihood for family data, a method using genomic structure to compare cases and controls, a haplotype-based meta-analysis, and a permutation-based method for combining different statistical tests. In addition, one contribution compared a mega-analysis of family-based and population-based data to meta-analysis. Finally, the power of existing family-based methods for binary traits was compared. We conclude with suggestions for open research questions.
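Within the score-statistic framework described above, a burden test collapses the weighted per-variant scores before squaring, whereas a variance-component (SKAT-type) test squares them first; changing the weights w_j then moves between specific methods. The sketch assumes a continuous trait, an intercept-only null model (no covariates), and additive 0/1/2 genotype coding, and leaves scores unscaled by the residual variance, which changes only constants. Function names are illustrative, and the Beta(1, 25) MAF weight is included only as an example of the commonly cited SKAT default weighting.

```python
def score_vector(genotypes, trait):
    """U_j = sum_i G_ij * (y_i - ybar): per-variant score under an intercept-only null."""
    n = len(trait)
    ybar = sum(trait) / n
    resid = [y - ybar for y in trait]
    m = len(genotypes[0])
    return [sum(genotypes[i][j] * resid[i] for i in range(n)) for j in range(m)]


def burden_stat(u, w):
    """Burden: square of the weighted sum of scores (variants collapsed first)."""
    return sum(wj * uj for wj, uj in zip(w, u)) ** 2


def skat_stat(u, w):
    """Variance component (SKAT-type): sum of squared weighted scores."""
    return sum((wj * uj) ** 2 for wj, uj in zip(w, u))


def beta_1_25_weight(maf):
    """Beta(1, 25) density at the minor allele frequency; up-weights rarer variants."""
    return 25.0 * (1.0 - maf) ** 24
```

For example, with genotypes `[[0, 1], [1, 0], [0, 0], [1, 1]]` and trait `[1.0, 2.0, 0.0, 3.0]`, the scores are `[2.0, 1.0]`; with equal weights the burden statistic is 9.0 and the SKAT-type statistic is 5.0.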
Affiliation(s)
- Stephanie A Santorico
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
- Audrey E Hendricks
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
40
Kim J, Bai Y, Pan W. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics. Genet Epidemiol 2015; 39:651-63. [PMID: 26493956 DOI: 10.1002/gepi.21931] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 04/17/2015] [Accepted: 08/12/2015] [Indexed: 01/01/2023]
Abstract
We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics without access to individual-level genotype and phenotype data. Because summary statistics for most published GWASs are substantially easier to obtain than individual-level phenotype and genotype data, and multiple correlated traits are often collected, this problem has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than, or complementary to, existing methods.
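Adaptive tests on single-SNP, multiple-trait Z-scores need the correlation among the traits' Z-statistics, which summary data do not supply directly. One common approach, assumed here rather than taken from the abstract, is to estimate that correlation from SNPs that appear null, i.e. whose Z-scores are all small in magnitude. The 1.96 screening cutoff and the function names are hypothetical.

```python
def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def null_z_correlation(z_rows, cutoff=1.96):
    """Estimate the trait-by-trait correlation of Z-scores from apparently null SNPs.

    z_rows: one row per SNP, one Z-score per trait.
    cutoff: hypothetical screen; SNPs with any |Z| >= cutoff are dropped.
    """
    kept = [row for row in z_rows if max(abs(z) for z in row) < cutoff]
    k = len(z_rows[0])
    cols = [[row[t] for row in kept] for t in range(k)]
    return [
        [1.0 if s == t else pearson(cols[s], cols[t]) for t in range(k)]
        for s in range(k)
    ]
```

The resulting matrix could then serve as the null covariance when simulating Z-scores to calibrate an adaptive test.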
Affiliation(s)
- Junghi Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Yun Bai
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
41
Xu Z, Pan W. Approximate score-based testing with application to multivariate trait association analysis. Genet Epidemiol 2015. [PMID: 26198454 DOI: 10.1002/gepi.21911] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/13/2023]
Abstract
For genome-wide association studies and DNA sequencing studies, several powerful score-based tests, such as kernel machine regression and sum of powered score tests, have been proposed in the last few years. However, extensions of these score-based tests to more complex models, such as mixed-effects models for analysis of multiple and correlated traits, have been hindered by the unavailability of the score vector, due to either no output from statistical software or no closed-form solution at all. We propose a simple and general method to asymptotically approximate the score vector based on an asymptotically normal and consistent estimate of a parameter vector to be tested and its (consistent) covariance matrix. The proposed method is applicable to both maximum-likelihood estimation and estimating function-based approaches. We use the derived approximate score vector to extend several score-based tests to mixed-effects models. We demonstrate the feasibility and possible power gains of these tests in association analysis of multiple and correlated quantitative or binary traits with both real and simulated data. The proposed method is easy to implement with a wide applicability.
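A minimal sketch of the approximation idea, assuming the usual first-order argument: if β̂ is approximately N(β, V) and V⁻¹ plays the role of the Fisher information, then the score at β = 0 is approximately U ≈ V⁻¹β̂, with Cov(U) ≈ V⁻¹. Whether this matches the paper's exact construction is an assumption; the small linear solver, the function names, and the example V are illustrative.

```python
def solve(a, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def approx_score(beta_hat, cov_beta):
    """First-order approximation U ~= V^{-1} beta_hat (so Cov(U) ~= V^{-1})."""
    return solve(cov_beta, beta_hat)
```

Plugging U into the score-test quadratic form U′Cov(U)⁻¹U = U′VU simply recovers the Wald statistic β̂′V⁻¹β̂ = U·β̂; the gain comes from feeding U into other score-based statistics, such as the sum of powered score tests named in the abstract.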
Affiliation(s)
- Zhiyuan Xu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
42
Lin DY, Tao R, Kalsbeek W, Zeng D, Gonzalez F, Fernández-Rhodes L, Graff M, Koch G, North K, Heiss G. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet 2014; 95:675-88. [PMID: 25480034 DOI: 10.1016/j.ajhg.2014.11.005] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 09/22/2014] [Accepted: 11/11/2014] [Indexed: 12/27/2022]
Abstract
The cohort design allows investigators to explore the genetic basis of a variety of diseases and traits in a single study while avoiding major weaknesses of the case-control design. Most cohort studies employ multistage cluster sampling with unequal probabilities to conveniently select participants with desired characteristics, and participants from different clusters might be genetically related. Analysis that ignores the complex sampling design can yield biased estimation of the genetic association and inflation of the type I error. Herein, we develop weighted estimators that reflect unequal selection probabilities and differential nonresponse rates, and we derive variance estimators that properly account for the sampling design and the potential relatedness of participants in different sampling units. We compare, both analytically and numerically, the performance of the proposed weighted estimators with unweighted estimators that disregard the sampling design. We demonstrate the usefulness of the proposed methods through analysis of MetaboChip data in the Hispanic Community Health Study/Study of Latinos, which is the largest health study of the Hispanic/Latino population in the United States aimed at identifying risk factors for various diseases and determining the role of genes and environment in the occurrence of diseases. We provide guidelines on the use of weighted and unweighted estimators, as well as the relevant software.
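To illustrate what "weighted estimators" and design-aware variances mean in the simplest case, the sketch below fits a single-SNP linear model with sampling weights (for example, inverse selection probabilities) and computes a cluster-robust sandwich variance that sums estimating-function contributions within each primary sampling unit. It is a simplified stand-in, not the authors' estimator: it ignores nonresponse adjustment, stratification, and the relatedness across sampling units that the abstract explicitly handles, and the weights, cluster labels, and function name are hypothetical.

```python
from collections import defaultdict


def weighted_slope_cluster_var(g, y, w, cluster):
    """Weighted slope of y on g (with intercept) and its cluster-robust variance.

    w: sampling weights, e.g. 1 / selection probability (assumed given).
    cluster: primary sampling unit label for each participant.
    """
    sw = sum(w)
    gbar = sum(wi * gi for wi, gi in zip(w, g)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sgg = sum(wi * (gi - gbar) ** 2 for wi, gi in zip(w, g))
    sgy = sum(wi * (gi - gbar) * (yi - ybar) for wi, gi, yi in zip(w, g, y))
    beta = sgy / sgg
    alpha = ybar - beta * gbar
    # Sum the slope's estimating-function contributions within each cluster,
    # so correlation among members of the same cluster is allowed for.
    per_cluster = defaultdict(float)
    for wi, gi, yi, c in zip(w, g, y, cluster):
        resid = yi - alpha - beta * gi
        per_cluster[c] += wi * (gi - gbar) * resid
    var = sum(s * s for s in per_cluster.values()) / sgg ** 2
    return beta, var
```

Setting every weight to 1 gives the unweighted estimate that disregards the sampling design; comparing the two on the same data is the kind of contrast the abstract describes.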
43
Li M, Cleves MA, Mallick H, Erickson SW, Tang X, Nick TG, Macleod SL, Hobbs CA, National Birth Defect Prevention Study. A genetic association study detects haplotypes associated with obstructive heart defects. Hum Genet 2014; 133:1127-38. [PMID: 24894164 PMCID: PMC4313870 DOI: 10.1007/s00439-014-1453-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/07/2014] [Accepted: 05/20/2014] [Indexed: 10/25/2022]
Abstract
The development of congenital heart defects (CHDs) involves a complex interplay between genetic variants, epigenetic variants, and environmental exposures. Previous studies have suggested that susceptibility to CHDs is associated with maternal genotypes, fetal genotypes, and maternal-fetal genotype (MFG) interactions. We conducted a haplotype-based genetic association study of obstructive heart defects (OHDs), aiming to detect the genetic effects of 877 SNPs involved in the homocysteine, folate, and transsulfuration pathways. Genotypes were available for 285 mother-offspring pairs with OHD-affected pregnancies and 868 mother-offspring pairs with unaffected pregnancies. A penalized logistic regression model with an adaptive least absolute shrinkage and selection operator (lasso) was applied to dissect the maternal effects, fetal effects, and MFG interaction effects associated with OHDs. By examining 140 haplotype blocks for association with OHDs, we identified 9 blocks that are potentially associated with OHD occurrence. Four haplotype blocks, located in genes MGMT, MTHFS, CBS, and DNMT3L, were statistically significant using a Bayesian false-discovery probability threshold of 0.8. Two blocks in MGMT and MTHFS appear to have significant fetal effects, while the CBS and DNMT3L genes may have significant MFG interaction effects.
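For readers unfamiliar with the Bayesian false-discovery probability (BFDP), the sketch below computes it with Wakefield's (2007) approximate Bayes factor under a normal prior on the effect (for example, a log-odds ratio). The prior standard deviation and prior probability of association are hypothetical inputs, and how the authors combined BFDP with the lasso-selected haplotype blocks is not described in the abstract; a block would presumably be flagged when its BFDP falls below the 0.8 threshold.

```python
import math


def bfdp(beta_hat, se, prior_sd, prior_prob_assoc):
    """Bayesian false-discovery probability via an approximate Bayes factor.

    ABF = P(data | H0) / P(data | H1), with effect ~ N(0, prior_sd**2) under H1.
    Returns P(H0 | data) = ABF * PO / (ABF * PO + 1), PO = prior odds of H0.
    """
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta_hat / se) ** 2
    abf = math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))
    prior_odds_h0 = (1.0 - prior_prob_assoc) / prior_prob_assoc
    return abf * prior_odds_h0 / (abf * prior_odds_h0 + 1.0)
```

Because BFDP depends on the prior probability of association, the same estimate and standard error can fall on either side of a fixed threshold under different priors.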
Affiliation(s)
- Ming Li
- Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, 13 Children’s Way Mail Slot 512-40, Little Rock, AR 72202, USA
- Mario A. Cleves
- Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, 13 Children’s Way Mail Slot 512-40, Little Rock, AR 72202, USA
- Himel Mallick
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Stephen W. Erickson
- Department of Biostatistics, University of Arkansas for Medical Sciences, Little Rock, AR 72202, USA
- Xinyu Tang
- Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, 13 Children’s Way Mail Slot 512-40, Little Rock, AR 72202, USA
- Todd G. Nick
- Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, 13 Children’s Way Mail Slot 512-40, Little Rock, AR 72202, USA
- Stewart L. Macleod
- Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, 13 Children’s Way Mail Slot 512-40, Little Rock, AR 72202, USA
- Charlotte A. Hobbs
- Department of Pediatrics, College of Medicine, University of Arkansas for Medical Sciences, 13 Children’s Way Mail Slot 512-40, Little Rock, AR 72202, USA
44
Jiang Y, Conneely KN, Epstein MP. Flexible and robust methods for rare-variant testing of quantitative traits in trios and nuclear families. Genet Epidemiol 2014; 38:542-51. [PMID: 25044337 DOI: 10.1002/gepi.21839] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/24/2014] [Revised: 05/21/2014] [Accepted: 05/29/2014] [Indexed: 11/07/2022]
Abstract
Most rare-variant association tests for complex traits are applicable only to population-based or case-control resequencing studies. There are fewer rare-variant association tests for family-based resequencing studies, which is unfortunate because pedigrees possess many attractive characteristics for such analyses. Family-based studies can be more powerful than their population-based counterparts due to increased genetic load and further enable the implementation of rare-variant association tests that, by design, are robust to confounding due to population stratification. With this in mind, we propose a rare-variant association test for quantitative traits in families; this test integrates the QTDT approach of Abecasis et al. into the kernel-based SNP association test KMFAM of Schifano et al. The resulting within-family test enjoys the many benefits of the kernel framework for rare-variant association testing, including rapid evaluation of P-values and preservation of power when a region harbors rare causal variation that acts in different directions on phenotype. Additionally, by design, this within-family test is robust to confounding due to population stratification. Although within-family association tests are generally less powerful than their counterparts that use all genetic information, we show that we can recover much of this power (although still ensuring robustness to population stratification) using a straightforward screening procedure. Our method accommodates covariates and allows for missing parental genotype data, and we have written software implementing the approach in R for public use.
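The QTDT idea this abstract builds on splits each offspring genotype into a between-family part and a within-family deviation, and the within-family part is what makes a test robust to population stratification. The sketch covers only the two simplest cases (both parents genotyped, or siblings only) with additive 0/1/2 coding; QTDT's rules for partially missing parental data, the kernel step, and the screening procedure are not shown, and the function name is hypothetical.

```python
def within_family_component(child_genotypes, father=None, mother=None):
    """Split additive genotypes (0/1/2) into between- and within-family parts.

    With both parents genotyped, the between part is the Mendelian expectation
    E(g | parents) = (g_father + g_mother) / 2; otherwise it falls back to the
    mean genotype of the genotyped siblings.
    Returns (between, [within_1, within_2, ...]).
    """
    if father is not None and mother is not None:
        between = (father + mother) / 2.0
    else:
        between = sum(child_genotypes) / len(child_genotypes)
    return between, [g - between for g in child_genotypes]
```

For a trio with two heterozygous parents and a homozygous child, the child's within-family deviation is +1.0; for two siblings with genotypes 0 and 2 and no parental data, the deviations are -1.0 and +1.0.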
Affiliation(s)
- Yunxuan Jiang
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
45
Abstract
The use of genetically isolated populations can empower next-generation association studies. In this review, we discuss the advantages of this approach and review study design and analytical considerations of genetic association studies focusing on isolates. We cite successful examples of using population isolates in association studies and outline potential ways forward.
46
Lee S, Abecasis G, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014; 95:5-23. [PMID: 24995866 DOI: 10.1016/j.ajhg.2014.06.009] [Citation(s) in RCA: 721] [Impact Index Per Article: 65.5] [Received: 01/02/2014] [Indexed: 12/30/2022]
Abstract
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions.
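The combined omnibus tests mentioned above can be illustrated by the SKAT-O family, which interpolates between a variance-component statistic and a burden statistic through a parameter ρ: Q_ρ = (1 − ρ)·Q_SKAT + ρ·Q_burden. The sketch computes only the statistics on a ρ grid from a given score vector and weights; obtaining an omnibus p-value additionally requires the joint null distribution across ρ, which is omitted. The grid and function name are illustrative.

```python
def skat_o_family(u, w, rhos=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden for each rho on a grid.

    u: per-variant score statistics; w: per-variant weights.
    """
    wu = [wj * uj for wj, uj in zip(w, u)]
    q_skat = sum(x * x for x in wu)   # variance-component end (rho = 0)
    q_burden = sum(wu) ** 2           # burden end (rho = 1)
    return {rho: (1.0 - rho) * q_skat + rho * q_burden for rho in rhos}
```

With opposite-signed scores such as `[2.0, -2.0]` and equal weights, the burden end (ρ = 1) collapses to 0.0 while the variance-component end (ρ = 0) is 8.0, which is the usual motivation for searching over ρ rather than fixing one test in advance.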