1
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024]
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
2
Qadri QR, Lai X, Zhao W, Zhang Z, Zhao Q, Ma P, Pan Y, Wang Q. Exploring the Interplay between the Hologenome and Complex Traits in Bovine and Porcine Animals Using Genome-Wide Association Analysis. Int J Mol Sci 2024; 25:6234. [PMID: 38892420 PMCID: PMC11172659 DOI: 10.3390/ijms25116234] [Received: 03/15/2024] [Revised: 05/25/2024] [Accepted: 05/29/2024] [Indexed: 06/21/2024]
Abstract
Genome-wide association studies (GWAS) significantly enhance our ability to identify trait-associated genomic variants by considering the host genome. Moreover, the hologenome refers to the host organism's collective genetic material and its associated microbiome. In this study, we utilized the hologenome framework, called Hologenome-wide association studies (HWAS), to dissect the architecture of complex traits, including milk yield, methane emissions, rumen physiology in cattle, and gut microbial composition in pigs. We employed four statistical models: (1) GWAS, (2) Microbial GWAS (M-GWAS), (3) HWAS-CG (hologenome interaction estimated using COvariance between Random Effects Genome-based restricted maximum likelihood (CORE-GREML)), and (4) HWAS-H (hologenome interaction estimated using the Hadamard product method). We applied Bonferroni correction to interpret the significant associations in the complex traits. The GWAS and M-GWAS detected one and sixteen significant SNPs for milk yield traits, respectively, whereas the HWAS-CG and HWAS-H each identified eight SNPs. Moreover, HWAS-CG revealed four, and the remaining models identified three SNPs each for methane emissions traits. The GWAS and HWAS-CG detected one and three SNPs for rumen physiology traits, respectively. For the pigs' gut microbial composition traits, the GWAS, M-GWAS, HWAS-CG, and HWAS-H identified 14, 16, 13, and 12 SNPs, respectively. We further explored these associations through SNP annotation and by analyzing biological processes and functional pathways. Additionally, we integrated our GWA results with expression quantitative trait locus (eQTL) data using transcriptome-wide association studies (TWAS) and summary-based Mendelian randomization (SMR) methods for a more comprehensive understanding of SNP-trait associations. Our study revealed hologenomic variability in agriculturally important traits, enhancing our understanding of host-microbiome interactions.
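The abstract names a Hadamard product method (HWAS-H) for the host-microbiome interaction but gives no formulas. The sketch below illustrates the general idea under stated assumptions: the host genomic relationship matrix and a microbial relationship matrix are both built as standardized cross-products (an assumption, not necessarily the authors' construction), the interaction kernel is their element-wise (Hadamard) product, and the Bonferroni correction divides the significance level by the number of tests. All function names and toy data are hypothetical.

```python
import numpy as np


def relationship_matrix(features):
    """Standardized cross-product relationship matrix (n x n).

    `features` is an n-individuals x p-features array, e.g. host SNP
    dosages or microbial abundances. Constant columns are dropped to
    avoid division by zero.
    """
    x = np.asarray(features, dtype=float)
    sd = x.std(axis=0)
    keep = sd > 0
    z = (x[:, keep] - x[:, keep].mean(axis=0)) / sd[keep]
    return z @ z.T / z.shape[1]


def hadamard_interaction(host_rm, microbial_rm):
    """Element-wise (Hadamard) product of two n x n relationship matrices."""
    return np.asarray(host_rm) * np.asarray(microbial_rm)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical toy data: 5 animals, 200 SNPs, 50 microbial taxa.
rng = np.random.default_rng(1)
snps = rng.integers(0, 3, size=(5, 200))
microbes = rng.poisson(5.0, size=(5, 50))
k_int = hadamard_interaction(relationship_matrix(snps), relationship_matrix(microbes))
print(k_int.shape, bonferroni_threshold(0.05, 40_000))
```

Because the Hadamard product of two symmetric positive semidefinite matrices is itself positive semidefinite (Schur product theorem), the interaction kernel can be used as a covariance structure for a random effect.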
Affiliation(s)
- Qamar Raza Qadri
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xueshuang Lai
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Science, Zhejiang University, Hangzhou 310030, China
- Wei Zhao
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Science, Zhejiang University, Hangzhou 310030, China
- Zhenyang Zhang
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Science, Zhejiang University, Hangzhou 310030, China
- Qingbo Zhao
- Institute of Swine Science, Nanjing Agricultural University, Nanjing 210095, China
- Peipei Ma
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yuchun Pan
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Science, Zhejiang University, Hangzhou 310030, China
- Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
- Qishan Wang
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Science, Zhejiang University, Hangzhou 310030, China
- Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
3
Shi X, Bu A, Yang Y, Wang Y, Zhao C, Fan J, Yang C, Jia X. Investigating the shared genetic architecture between breast and ovarian cancers. Genet Mol Biol 2024; 47:e20230181. [PMID: 38626574 PMCID: PMC11021043 DOI: 10.1590/1678-4685-gmb-2023-0181] [Received: 06/25/2023] [Accepted: 12/27/2023] [Indexed: 04/18/2024]
Abstract
High heritability and strong correlation have been observed in breast and ovarian cancers. However, their shared genetic architecture remains unclear. Linkage disequilibrium score regression (LDSC) and heritability estimation from summary statistics (ρ-HESS) were applied to estimate heritability and genetic correlations. The bivariate causal mixture model (MiXeR) was used to quantify the polygenic overlap. Then, stratified LDSC (S-LDSC) was used to identify tissue and cell type specificity. Meanwhile, the adaptive association test MTaSPUsSet was performed to identify potential pleiotropic genes. The single nucleotide polymorphism (SNP) heritability was 13% for breast cancer and 5% for ovarian cancer. There was a significant genetic correlation between breast and ovarian cancers (rg = 0.21). Breast and ovarian cancers exhibited polygenic overlap, sharing 0.4K of 2.8K causal variants. Tissue and cell type specificity analyses showed significant enrichment in female breast mammary tissue, uterus, and kidney tissues, and in adipose cells. Moreover, 74 potential pleiotropic genes were identified between breast and ovarian cancers, which were related to the regulation of the cell cycle and cell death. We quantified the shared genetic architecture between breast and ovarian cancers and shed light on the biological basis of their comorbidity. Ultimately, these findings facilitate the understanding of disease etiology.
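LDSC rests on the LD score regression identity E[χ²_j] = N h² ℓ_j / M + N a + 1, where ℓ_j is the LD score of SNP j, N the GWAS sample size, M the number of SNPs, and a a confounding term. A minimal sketch of turning the regression slope into an h² estimate, assuming plain ordinary least squares; the LDSC software additionally uses regression weights and a block jackknife for standard errors, which are omitted here. The data are simulated, and the function name is hypothetical.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n, m):
    """Estimate SNP heritability from the LD score regression slope.

    Fits chi2_j ~ intercept + slope * l_j by ordinary least squares and
    converts the slope with h2 = slope * M / N. The intercept estimates
    1 + N*a; values well above 1 suggest confounding such as stratification.
    """
    slope, intercept = np.polyfit(np.asarray(ld_scores, dtype=float),
                                  np.asarray(chi2, dtype=float), 1)
    return slope * m / n, intercept


# Simulated example: statistics scattered around the model expectation.
rng = np.random.default_rng(0)
m, n, h2_true = 100_000, 50_000, 0.13
ld = rng.uniform(1, 200, size=m)
expected = 1 + n * h2_true * ld / m
chi2 = expected * rng.chisquare(1, size=m)   # scaled 1-df noise with mean = expected
h2_hat, intercept_hat = ldsc_h2(chi2, ld, n, m)
print(f"h2 = {h2_hat:.3f}, intercept = {intercept_hat:.2f}")
```

The noise model here is only a convenient stand-in for real association statistics; it keeps the expectation of each χ² equal to the LDSC model value so that the slope recovers h².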
Affiliation(s)
- Xuezhong Shi
- Zhengzhou University, College of Public Health, Department of Epidemiology and Biostatistics, Zhengzhou, Henan, China
- Anqi Bu
- Zhengzhou University, College of Public Health, Department of Epidemiology and Biostatistics, Zhengzhou, Henan, China
- Yongli Yang
- Zhengzhou University, College of Public Health, Department of Epidemiology and Biostatistics, Zhengzhou, Henan, China
- Yuping Wang
- Zhengzhou University, College of Public Health, Department of Epidemiology and Biostatistics, Zhengzhou, Henan, China
- Chenyu Zhao
- Zhengzhou University, College of Public Health, Department of Epidemiology and Biostatistics, Zhengzhou, Henan, China
- Jingwen Fan
- Zhengzhou University, College of Public Health, Department of Epidemiology and Biostatistics, Zhengzhou, Henan, China
- Chaojun Yang
- Zhengzhou University, College of Public Health, Department of Epidemiology and Biostatistics, Zhengzhou, Henan, China
- Xiaocan Jia
- Zhengzhou University, College of Public Health, Department of Epidemiology and Biostatistics, Zhengzhou, Henan, China
4
Wang Y, Yang Y, Jia X, Zhao C, Yang C, Fan J, Wang N, Shi X. Identification of the shared genetic architecture underlying seven autoimmune diseases with GWAS summary statistics. Front Immunol 2024; 14:1303675. [PMID: 38259487 PMCID: PMC10800382 DOI: 10.3389/fimmu.2023.1303675] [Received: 09/28/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024]
Abstract
Background Common clinical symptoms and immunopathological mechanisms have been observed among multiple autoimmune diseases (ADs), but their shared genetic etiology remains unclear. Methods GWAS summary statistics for seven ADs were downloaded from Open Targets Genetics and Dryad. Linkage disequilibrium score regression (LDSC) was applied to estimate overall genetic correlations, the bivariate causal mixture model (MiXeR) was used to quantify the polygenic overlap, and stratified LDSC partitioned heritability to reveal tissue- and cell-type-specific enrichments. Finally, we applied a novel adaptive association test, MTaSPUsSet, to identify pleiotropic genes. Results The heritability of the seven ADs was high, ranging from 0.1228 to 0.5972, and strong genetic correlations among certain phenotypes ranged from 0.185 to 0.721. There was substantial polygenic overlap, with approximately 0.03K to 0.21K shared SNPs. SNP heritability was specifically enriched in immune/hematopoietic-related tissues and cells. Furthermore, we identified 32 pleiotropic genes associated with the seven ADs, 23 of which were considered novel. These genes were involved in several cell regulation pathways and immunologic signatures. Conclusion We comprehensively explored the shared genetic architecture across seven ADs. The findings advance the exploration of the common molecular mechanisms and biological processes involved, and facilitate understanding of disease etiology.
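MTaSPUsSet extends the sum of powered score (SPU) family of tests to multiple traits and gene sets. As a hedged illustration of the building block only (not MTaSPUsSet itself), the sketch below runs a single-trait, gene-level adaptive SPU test on summary Z-scores: SPU(γ) = Σ_j Z_j^γ for several powers γ, with γ = ∞ taken as max_j |Z_j|, and the adaptive p-value obtained by Monte Carlo simulation of Z ~ N(0, R), where R is the SNP correlation (LD) matrix. The γ set, the simulation count, and the function names are assumptions, not the settings used in the cited studies.

```python
import numpy as np


def spu_stats(z, gammas):
    """|SPU(gamma)| for each gamma; accepts a length-k vector or a (B x k) matrix."""
    z = np.asarray(z, dtype=float)
    cols = []
    for g in gammas:
        if np.isinf(g):
            cols.append(np.max(np.abs(z), axis=-1))        # SPU(inf) = max_j |Z_j|
        else:
            cols.append(np.abs(np.sum(z ** g, axis=-1)))   # |sum_j Z_j^gamma|
    return np.stack(cols, axis=-1)


def aspu_pvalue(z, ld, gammas=(1, 2, 4, 8, np.inf), n_sim=10_000, seed=0):
    """Adaptive SPU p-value by Monte Carlo under the null Z ~ N(0, LD)."""
    rng = np.random.default_rng(seed)
    k = len(z)
    null = spu_stats(rng.multivariate_normal(np.zeros(k), ld, size=n_sim), gammas)
    obs = spu_stats(z, gammas)

    # Per-gamma p-values of the observed statistics.
    p_obs = (1 + (null >= obs).sum(axis=0)) / (n_sim + 1)

    # Per-gamma p-values of every null replicate, from the null's own ranks.
    p_null = np.empty_like(null)
    for j in range(null.shape[1]):
        col = null[:, j]
        n_ge = n_sim - np.searchsorted(np.sort(col), col, side="left")
        p_null[:, j] = n_ge / n_sim

    # aSPU statistic = minimum p over gammas, compared with its null distribution.
    return (1 + (p_null.min(axis=1) <= p_obs.min()).sum()) / (n_sim + 1)


# Hypothetical LD matrix and Z-scores for three SNPs in one gene.
ld = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.3], [0.1, 0.3, 1.0]])
print(aspu_pvalue(np.array([3.5, 3.0, 2.8]), ld))
```

Small γ favours many weak same-direction effects, while large γ (and γ = ∞) favours a few strong signals; taking the minimum p-value over γ and recalibrating it against the same null draws is what makes the test adaptive.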
Affiliation(s)
- Xuezhong Shi
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, China
5
Chi J, Xu M, Sheng X, Zhou Y. Association detection between multiple traits and rare variants based on family data via a nonparametric method. PeerJ 2023; 11:e16040. [PMID: 37780393 PMCID: PMC10541022 DOI: 10.7717/peerj.16040] [Received: 03/20/2023] [Accepted: 08/15/2023] [Indexed: 10/03/2023]
Abstract
Background The rapid development of next-generation sequencing technologies allows researchers to analyze complex human diseases at the molecular level. It has been shown that, in addition to common variants, rare variants play important roles in human diseases. Thus, effective statistical methods are needed to test for associations between traits (e.g., diseases) and rare variants. More and more rare genetic variants are being detected throughout the human genome, which makes it possible to study them. Yet complex diseases are usually measured in a variety of forms, such as binary, ordinal, quantitative, or some mixture of these. Therefore, the genetic mapping problem can be framed as detecting associations between multiple traits and multiple loci while fully accounting for the correlation structure among the traits. Methods In this article, we construct a new nonparametric statistic using generalized Kendall's τ theory based on family data. The new test statistic has an asymptotic distribution and can be used to study associations between multiple traits and rare variants, which broadens the ways to identify genetic factors in complex human diseases. Results We apply our method (called Nonp-FAM) to simulated data and GAW17 data, and conduct a comprehensive comparison with several existing methods. Experimental results show that the proposed family-based method is powerful and robust for testing associations between multiple traits and rare variants, even in the presence of population stratification.
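Nonp-FAM is built on generalized Kendall's τ. The multi-trait, family-based statistic itself is not specified in the abstract, so the sketch below shows only the classical univariate building block that such generalizations start from: Kendall's τ-a, which scores every pair of observations as concordant (+1), discordant (-1), or tied (0). It is an O(n²) illustration with a hypothetical function name, not the authors' statistic.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs contribute 0, and tau-a is not rescaled for ties.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)


# Hypothetical trait values vs. rare-variant counts for five individuals.
print(kendall_tau_a([2.1, 3.4, 1.8, 4.0, 2.9], [0, 1, 0, 2, 1]))   # 0.8
```

Because τ depends only on the signs of pairwise differences, it is invariant to monotone transformations of either variable, which is one reason rank-based statistics suit traits measured on binary, ordinal, or quantitative scales.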
Affiliation(s)
- Jinling Chi
- Department of Statistics, Heilongjiang University, Harbin, China
- School of Mathematics and Statistics, Xidian University, Xi’an, China
- Meijuan Xu
- Department of Statistics, Heilongjiang University, Harbin, China
- Xiaona Sheng
- School of Information Engineering, Harbin University, Harbin, China
- Ying Zhou
- Department of Statistics, Heilongjiang University, Harbin, China
6
Lin Z, Xue H, Pan W. Combining Mendelian randomization and network deconvolution for inference of causal networks with GWAS summary data. PLoS Genet 2023; 19:e1010762. [PMID: 37200398 PMCID: PMC10231771 DOI: 10.1371/journal.pgen.1010762] [Received: 01/12/2023] [Revised: 05/31/2023] [Accepted: 04/25/2023] [Indexed: 05/20/2023]
Abstract
Mendelian randomization (MR) has been increasingly applied for causal inference with observational data by using genetic variants as instrumental variables (IVs). However, the current practice of MR has been largely restricted to investigating the total causal effect between two traits, whereas it would be useful to infer the direct causal effect between any two of many traits (by accounting for indirect or mediating effects through other traits). For this purpose, we propose a two-step approach: we first apply an extended MR method to infer (i.e., both estimate and test) a causal network of total effects among multiple traits, and then we modify a graph deconvolution algorithm to infer the corresponding network of direct effects. Simulation studies showed that our proposed method performed much better than existing ones. We applied the method to 17 large-scale GWAS summary datasets (with median N = 256,879 and a median number of IVs of 48) to infer the causal networks of both total and direct effects among 11 common cardiometabolic risk factors, 4 cardiometabolic diseases (coronary artery disease, stroke, type 2 diabetes, atrial fibrillation), Alzheimer's disease and asthma, identifying some interesting causal pathways. We also provide an R Shiny app (https://zhaotongl.shinyapps.io/cMLgraph/) for users to explore any subset of the 17 traits of interest.
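The second step above converts a network of total causal effects into one of direct effects. Under a linear, acyclic model in which B[i, j] is the direct effect of trait i on trait j, total effects sum over all directed paths, T = B + B^2 + ... = (I - B)^-1 - I, so the direct effects can be recovered by deconvolution as B = T (I + T)^-1. A minimal sketch of this identity, assuming exactly known total effects; the authors' modified algorithm must also cope with estimation error and testing, which is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def total_from_direct(direct):
    """Total effects T = (I - B)^-1 - I, i.e. the sum over all directed paths."""
    eye = np.eye(direct.shape[0])
    return np.linalg.inv(eye - direct) - eye


def direct_from_total(total):
    """Deconvolution: recover direct effects B = T (I + T)^-1."""
    eye = np.eye(total.shape[0])
    return total @ np.linalg.inv(eye + total)


# Toy chain: trait 0 -> trait 1 (0.5), trait 1 -> trait 2 (0.4); no direct 0 -> 2 edge.
b = np.zeros((3, 3))
b[0, 1], b[1, 2] = 0.5, 0.4
t = total_from_direct(b)
print(round(t[0, 2], 3))                       # mediated total effect 0.5 * 0.4 = 0.2
print(np.allclose(direct_from_total(t), b))    # True
```

The example shows why the second step matters: two-trait MR between traits 0 and 2 would report a total effect of 0.2 even though no direct edge exists, and deconvolution attributes it entirely to the path through trait 1.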
Affiliation(s)
- Zhaotong Lin
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Haoran Xue
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
7
A clustering linear combination method for multiple phenotype association studies based on GWAS summary statistics. Sci Rep 2023; 13:3389. [PMID: 36854754 PMCID: PMC9975197 DOI: 10.1038/s41598-023-30415-3] [Received: 11/17/2022] [Accepted: 02/22/2023] [Indexed: 03/02/2023]
Abstract
There is strong evidence that joint analysis of multiple phenotypes in genome-wide association studies (GWAS) can increase statistical power when detecting associations between genetic variants and complex human diseases. We previously developed the Clustering Linear Combination (CLC) method and a computationally efficient CLC (ceCLC) method to test the association between multiple phenotypes and a genetic variant, both of which perform very well. However, both methods require individual-level genotypes and phenotypes, which are often not easily accessible. In this research, we develop a novel method called sCLC for association studies of multiple phenotypes and a genetic variant based on GWAS summary statistics. We use LD score regression to estimate the correlation matrix among phenotypes. The sCLC test statistic is constructed from GWAS summary statistics and has an approximate Cauchy distribution. We perform a variety of simulation studies and compare sCLC with other commonly used methods for multiple phenotype association studies using GWAS summary statistics. Simulation results show that sCLC controls Type I error rates well and has the highest power in most scenarios. Moreover, we apply the newly developed method to the UK Biobank GWAS summary statistics from category XIII, comprising 70 related musculoskeletal system and connective tissue phenotypes. The results demonstrate that sCLC detects the largest number of significant SNPs, and most of these SNPs can be matched to genes that have been reported in the GWAS Catalog to be associated with those phenotypes. Furthermore, sCLC also identifies some novel signals that were missed by standard GWAS, providing new insight into the potential genetic factors underlying musculoskeletal system and connective tissue phenotypes.
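The abstract states that the sCLC statistic has an approximate Cauchy distribution. The sketch below shows the generic Cauchy combination rule that this property enables: map each p-value to tan((0.5 - p)π), take a weighted average, and read the combined p-value off the standard Cauchy tail. It illustrates that rule under equal default weights (an assumption) and is not the sCLC statistic itself, whose construction is not given in the abstract; the function name is hypothetical.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Each p is mapped to tan((0.5 - p) * pi), which is standard Cauchy when
    p ~ Uniform(0, 1). The weighted average is again standard Cauchy under
    independence, and its upper tail stays close to standard Cauchy even
    when the p-values are dependent.
    """
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    total = sum(weights)
    t = sum(w * math.tan((0.5 - p) * math.pi) for p, w in zip(pvalues, weights)) / total
    return 0.5 - math.atan(t) / math.pi


print(cauchy_combination([0.001, 0.9]))   # dominated by the smallest p-value
print(cauchy_combination([0.3]))          # a single p-value is returned unchanged
```

The heavy Cauchy tail is what makes the combined p-value driven by the smallest inputs, so a strong signal in one component is not washed out by many null components.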
8
Zhang J, Liang X, Gonzales S, Liu J, Gao XR, Wang X. A gene based combination test using GWAS summary data. BMC Bioinformatics 2023; 24:2. [PMID: 36597047 PMCID: PMC9811798 DOI: 10.1186/s12859-022-05114-x] [Received: 06/02/2022] [Accepted: 12/13/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND Gene-based association tests provide a useful alternative and complement to the usual single-marker association tests, especially in genome-wide association studies (GWAS). How variants within a gene are weighted plays an important role in the power of a gene-based association test. Appropriate weights can boost statistical power, especially when detecting genetic variants with weak effects on a trait. One major limitation of existing gene-based association tests lies in using weights that are predetermined biologically or empirically. This limitation often attenuates the power of a test. On the other hand, the effect sizes or directions of causal genetic variants in real data are usually unknown, driving a need for a flexible yet robust methodology for gene-based association tests. Furthermore, access to individual-level data is often limited, while thousands of GWAS summary datasets are publicly and freely available. RESULTS To resolve these limitations, we propose a combination test named OWC, which is based on summary statistics from GWAS data. Several traditional methods, including the burden test, weighted sum of squared score test [SSU], weighted sum statistic [WSS], SNP-set Kernel Association Test [SKAT], and the score test, are special cases of OWC. To evaluate the performance of OWC, we perform extensive simulation studies. Results of simulation studies demonstrate that OWC outperforms several existing popular methods. We further show that OWC outperforms comparison methods in real-world data analyses using schizophrenia GWAS summary data and fasting glucose GWAS meta-analysis data. The proposed method is implemented in an R package available at https://github.com/Xuexia-Wang/OWC-R-package. CONCLUSIONS We propose a novel gene-based association test that incorporates four different weighting schemes (two constant weights and two weights proportional to the normal statistic Z) and includes several popular methods as its special cases.
Results of the simulation studies and real data analyses illustrate that the proposed test, OWC, outperforms comparable methods in most scenarios. These results demonstrate that OWC is a useful tool that adapts to the underlying biological model for a disease by appropriately weighting genetic variants and combining well-known gene-based tests.
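OWC is described as including the burden test as one of its special cases. As a hedged sketch of that special case only, a weighted burden statistic can be computed from per-SNP Z-scores z, the SNP correlation (LD) matrix R, and fixed weights w as T = (wᵀz)² / (wᵀRw), which follows χ²₁ under the null because wᵀz ~ N(0, wᵀRw). The combination step and the Z-proportional weights that define OWC are not reproduced; the function name and inputs are made up for illustration.

```python
import math

import numpy as np


def burden_test(z, ld, w):
    """Weighted burden test from summary Z-scores.

    T = (w'z)^2 / (w'Rw) follows chi-square(1) under H0 when w is fixed in
    advance, so p = P(chi2_1 > T) = erfc(sqrt(T / 2)).
    """
    z, ld, w = (np.asarray(a, dtype=float) for a in (z, ld, w))
    stat = float(w @ z) ** 2 / float(w @ ld @ w)
    return stat, math.erfc(math.sqrt(stat / 2.0))


z = [2.0, 1.5, 0.5]                       # hypothetical SNP Z-scores in one gene
ld = [[1.0, 0.4, 0.2], [0.4, 1.0, 0.3], [0.2, 0.3, 1.0]]
print(burden_test(z, ld, w=[1.0, 1.0, 1.0]))   # constant weights
```

The χ²₁ reference distribution holds only because w is chosen without looking at z; weights that depend on the Z-scores themselves change the null distribution, which is part of what a combination test such as OWC has to account for.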
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA
- Xiaoyu Liang
- Department of Epidemiology and Biostatistics, Michigan State University, 909 Wilson Rd Room B601, East Lansing, MI 48824 USA
- Samantha Gonzales
- Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA
- Jianguo Liu
- Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA
- Xiaoyi Raymond Gao
- Department of Ophthalmology and Visual Science, Department of Biomedical Informatics, Division of Human Genetics, Ohio State University, 915 Olentangy River Road, Columbus, OH 43212 USA
- Xuexia Wang
- Department of Biostatistics, Robert Stempel College of Public Health and Social Work, Florida International University, 11200 SW 8th Street, Miami, FL 33174 USA
9
Zhao C, Jia X, Wang Y, Luo Z, Fan J, Shi X, Yang Y. Overlapping genetic susceptibility of seven autoimmune diseases: SPU tests based on genome-wide association summary statistics. Gene 2022; 851:147036. [DOI: 10.1016/j.gene.2022.147036] [Received: 07/07/2022] [Revised: 10/26/2022] [Accepted: 11/04/2022] [Indexed: 11/11/2022]
10
Identifying pleiotropic genes for major psychiatric disorders with GWAS summary statistics using multivariate adaptive association tests. J Psychiatr Res 2022; 155:471-482. [PMID: 36183601 DOI: 10.1016/j.jpsychires.2022.09.038] [Received: 05/27/2022] [Revised: 08/17/2022] [Accepted: 09/16/2022] [Indexed: 11/21/2022]
Abstract
BACKGROUND Genome-wide association studies (GWAS) have discovered a number of single nucleotide polymorphisms (SNPs) related to major psychiatric disorders. However, it is not completely clear which genes play a pleiotropic role in multiple disorders. The study aimed to identify pleiotropic genes across five psychiatric disorders using multivariate adaptive association tests. METHODS Summary statistics for five psychiatric disorders were downloaded from the Psychiatric Genomics Consortium. We applied linkage disequilibrium score regression (LDSC) to estimate genetic correlations and conducted tissue and cell type specificity analyses based on Multi-marker Analysis of GenoMic Annotation (MAGMA). Then, we identified pleiotropic genes using the MTaSPUsSet and aSPUs tests. Finally, we performed functional analyses of the pleiotropic genes. RESULTS We confirmed significant genetic correlations, as well as brain tissue and neuron specificity, among the five disorders. In total, 100 pleiotropic genes were found to be significantly associated with the five psychiatric disorders, 55 of which were novel. These genes were functionally enriched in neuron differentiation and synaptic transmission. LIMITATIONS The direction of effect of the pleiotropic genes could not be distinguished because individual-level data were unavailable. CONCLUSION We identified pleiotropic genes using multivariate adaptive association tests and explored their biological function. The findings may provide novel insight into the development and implementation of prevention and treatment, as well as targeted drug discovery, in practice.
11
Yang Y, Basu S, Zhang L. A Bayesian hierarchically structured prior for gene-based association testing with multiple traits in genome-wide association studies. Genet Epidemiol 2022; 46:63-72. [PMID: 34787916 PMCID: PMC8795481 DOI: 10.1002/gepi.22437] [Received: 06/24/2021] [Revised: 09/28/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023]
Abstract
Although genome-wide association studies (GWAS) often collect data on multiple correlated traits for complex diseases, conventional gene-based analysis is usually univariate and therefore treats traits as uncorrelated. Multivariate analysis of multiple correlated traits can potentially increase the power to detect genes that affect some or all of these traits. In this study, we propose the multivariate hierarchically structured variable selection (HSVS-M) model, a flexible Bayesian model that tests the association of a gene with multiple correlated traits. With only summary statistics, HSVS-M can account for the correlations among genetic variants and among traits simultaneously and can also estimate the various directions and magnitudes of associations between a gene and multiple traits. Simulation studies show that HSVS-M substantially outperforms competing methods in various scenarios, particularly when variants in a gene are associated with a trait in similar directions and magnitudes. We applied HSVS-M to the summary statistics of a meta-analysis GWAS on four lipid traits from the Global Lipids Genetics Consortium and identified 15 genes that have also been confirmed as risk factors in previous studies.
Affiliation(s)
- Yi Yang
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Saonli Basu
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Lin Zhang
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
12
Tong L, Zhou Y, Guo Y, Ding H, Ji D. Quantitative trait locus mapping analysis of multiple traits when using genotype data with potential errors. PeerJ 2021; 9:e12187. [PMID: 34631317 PMCID: PMC8475548 DOI: 10.7717/peerj.12187] [Received: 06/10/2021] [Accepted: 08/30/2021] [Indexed: 12/03/2022]
Abstract
Background Quantitative trait locus (QTL) analysis aims to locate and estimate the effects of the genes influencing quantitative traits and to infer the relationship between gene variants and changes in phenotypic characteristics using statistical methods. Some methods have been developed to map QTLs of multiple traits assuming that a given dataset contains no genotype errors. However, real genetic data may contain errors because of the limitations of biotechnology. Common genetic data correction methods can only reduce errors; they cannot estimate the degree of error. In this paper, we propose a QTL mapping strategy for multiple traits in the presence of genotype errors. Methods The additive effect, dominance effect, recombination rate, error rate, and other parameters of QTLs can be obtained simultaneously using this new method in the framework of multiple-interval mapping. Results Our simulation results show that the accuracy of parameter estimation can be improved by considering marker genotype errors during the analysis of genetic data. Real data analysis also shows that the new method proposed in this paper can map the QTLs of multiple traits more accurately.
Affiliation(s)
- Liang Tong
- School of Science, Harbin University of Science and Technology, Harbin, P. R. China
- School of Information Engineering, Suihua University, Suihua, P. R. China
- Ying Zhou
- School of Mathematical Sciences, Heilongjiang University and Heilongjiang Provincial Key Laboratory of the Theory and Computation of Complex Systems, Harbin, P. R. China
- Yixing Guo
- Dalian University of Science and Technology, Dalian, P. R. China
- Hui Ding
- School of Information Engineering, Suihua University, Suihua, P. R. China
- Donghai Ji
- School of Science, Harbin University of Science and Technology, Harbin, P. R. China
13
Yang T, Tang H, Risch HA, Olson SH, Petersen G, Bracci PM, Gallinger S, Hung R, Neale RE, Scelo G, Duell EJ, Kurtz RC, Khaw KT, Severi G, Sund M, Wareham N, Amos CI, Li D, Wei P. Incorporating multiple sets of eQTL weights into gene-by-environment interaction analysis identifies novel susceptibility loci for pancreatic cancer. Genet Epidemiol 2020; 44:880-892. [PMID: 32779232 PMCID: PMC7657998 DOI: 10.1002/gepi.22348] [Received: 02/03/2020] [Revised: 07/14/2020] [Accepted: 07/30/2020] [Indexed: 11/11/2022]
Abstract
It is of great scientific interest to identify interactions between genetic variants and environmental exposures that may modify the risk of complex diseases. However, larger sample sizes are usually required to detect gene-by-environment interaction (G × E) than to detect genetic main effects. To boost statistical power and improve the understanding of the underlying molecular mechanisms, we incorporate functional genomics information, specifically expression quantitative trait loci (eQTLs), into a data-adaptive G × E test called aGEw. This test adaptively chooses the best eQTL weights from multiple tissues and provides an extra layer of weighting at the genetic variant level. Extensive simulations show that the aGEw test can control the Type 1 error rate, and its power is resilient to the inclusion of neutral variants and noninformative external weights. We applied the proposed aGEw test to the Pancreatic Cancer Case-Control Consortium (discovery cohort of 3,585 cases and 3,482 controls) and the PanScan II genome-wide association study data (replication cohort of 2,021 cases and 2,105 controls) with smoking as the exposure of interest. Two novel putative smoking-related pancreatic cancer susceptibility genes, TRIP10 and KDM3A, were identified. The aGEw test is implemented in an R package aGE.
Affiliation(s)
- Tianzhong Yang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Hongwei Tang
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sara H. Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Gloria Petersen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Paige M. Bracci
- Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Steven Gallinger
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
- Rayjean Hung
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
- Rachel E. Neale
- Cancer Aetiology and Prevention Group, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Eric J. Duell
- Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology - Bellvitge Biomedical Research Institute (ICO-IDIBELL), Avda. Gran Via 199-203, 08908 L’Hospitalet de Llobregat, Barcelona, Spain
- Robert C. Kurtz
- Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Kay-Tee Khaw
- Department of Public Health and Primary Care, University of Cambridge, UK
- Gianluca Severi
- Gustave Roussy, F-94805, Villejuif, France
- CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, 94805, Villejuif, France
- Malin Sund
- Department of Surgical and Perioperative Sciences, Umeå University, Sweden
- Nick Wareham
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Christopher I Amos
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Donghui Li
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
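A minimal Python sketch of the adaptive idea in the abstract: form one weighted G × E burden statistic per tissue-specific eQTL weight set, take the smallest p-value across tissues, and calibrate that minimum by Monte Carlo. This is not the aGE package or the exact aGEw statistic. It assumes the per-variant G × E score statistics are already standardized and independent N(0, 1) under the null (no LD), and the function names and toy weights are hypothetical.

```python
import math
import random
from typing import List, Sequence


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def weighted_burden(z: Sequence[float], w: Sequence[float]) -> float:
    """Weighted sum of independent N(0,1) scores, rescaled to unit variance."""
    num = sum(wi * zi for wi, zi in zip(w, z))
    return num / math.sqrt(sum(wi * wi for wi in w))


def min_p(z: Sequence[float], weight_sets: List[Sequence[float]]) -> float:
    """Smallest per-tissue p-value: the adaptive choice among weight sets."""
    return min(two_sided_p(weighted_burden(z, w)) for w in weight_sets)


def adaptive_weighted_gxe(z: Sequence[float], weight_sets: List[Sequence[float]],
                          n_sim: int = 2000, seed: int = 1) -> float:
    """Monte Carlo p-value for the minimum p-value over weight sets.

    Assumes the variant-level G x E z-scores are independent N(0,1) under
    the null; correlated variants would require simulating from their
    LD-based covariance instead.
    """
    rng = random.Random(seed)
    observed = min_p(z, weight_sets)
    hits = 0
    for _ in range(n_sim):
        null_z = [rng.gauss(0.0, 1.0) for _ in z]
        if min_p(null_z, weight_sets) <= observed:
            hits += 1
    # the +1 terms keep the estimate strictly positive
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # hypothetical eQTL weights for three variants in two tissues
    weights = [[0.8, 0.1, 0.5], [0.0, 0.9, 0.3]]
    print(adaptive_weighted_gxe([2.5, 0.3, 1.9], weights))
```

Because every simulated null draw is scored against all weight sets at once, the correlation between the tissue-specific statistics is reflected in the null distribution of the minimum, so no separate correction across tissues is applied.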
14
Genome-Wide Association Study and Pathway Analysis for Heterophil/Lymphocyte (H/L) Ratio in Chicken. Genes (Basel) 2020; 11:genes11091005. [PMID: 32867375 PMCID: PMC7563235 DOI: 10.3390/genes11091005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/12/2020] [Revised: 08/19/2020] [Accepted: 08/19/2020] [Indexed: 12/27/2022]
Abstract
Disease control and prevention have been critical factors in the dramatic growth of the poultry industry. Disease resistance in chickens can be improved through genetic selection for immunocompetence. The heterophil/lymphocyte ratio (H/L) in the blood reflects the immune system status of chickens. Our objective was to conduct a genome-wide association study (GWAS) and pathway analysis to identify possible biological mechanisms involved in H/L traits. In this study, GWAS for H/L was performed in 1317 Cobb broilers to identify significant single-nucleotide polymorphisms (SNPs) associated with H/L. Eight SNPs reached the significance threshold (p < 1/8068). The significant SNP on GGA 19 (chicken chromosome 19) was located in the gene for complement C1q binding protein (C1QBP). The wild-type and mutant individuals showed significant differences in H/L at five identified SNPs (p < 0.05). Pathway analysis identified nine associated pathways (p < 0.05). By combining GWAS with pathway analysis, we found that all SNPs after QC explained 12.4% of the phenotypic variation in H/L, and that 52 SNPs associated with H/L explained as much as 9.7% of the phenotypic variation in H/L. Our findings contribute to the understanding of the genetic regulation of H/L and provide theoretical support.
15
Luo L, Shen J, Zhang H, Chhibber A, Mehrotra DV, Tang ZZ. Multi-trait analysis of rare-variant association summary statistics using MTAR. Nat Commun 2020; 11:2850. [PMID: 32503972 PMCID: PMC7275056 DOI: 10.1038/s41467-020-16591-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/01/2019] [Accepted: 05/09/2020] [Indexed: 12/13/2022]
Abstract
Integrating association evidence across multiple traits can improve the power of gene discovery and reveal pleiotropy. Most multi-trait analysis methods focus on individual common variants in genome-wide association studies. Here, we introduce multi-trait analysis of rare-variant associations (MTAR), a framework for joint analysis of association summary statistics between multiple rare variants and different traits. MTAR achieves substantial power gain by leveraging the genome-wide genetic correlation measure to inform the degree of gene-level effect heterogeneity across traits. We apply MTAR to rare-variant summary statistics for three lipid traits in the Global Lipids Genetics Consortium. 99 genome-wide significant genes were identified in the single-trait-based tests, and MTAR increases this to 139. Among the 11 novel lipid-associated genes discovered by MTAR, 7 are replicated in an independent UK Biobank GWAS analysis. Our study demonstrates that MTAR is substantially more powerful than single-trait-based tests and highlights the value of MTAR for novel gene discovery.
Affiliation(s)
- Lan Luo
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, 07065, USA
- Hong Zhang
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, 07065, USA
- Aparna Chhibber
- Genetics and Pharmacogenomics, Merck & Co., Inc., West Point, Pennsylvania, 19446, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, Pennsylvania, 19454, USA
- Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, 53715, USA
- Wisconsin Institute for Discovery, Madison, Wisconsin, 53715, USA
16
Jiang D, Deng J, Dong C, Ma X, Xiao Q, Zhou B, Yang C, Wei L, Conran C, Zheng SL, Ng IOL, Yu L, Xu J, Sham PC, Qi X, Hou J, Ji Y, Cao G, Li M. Knowledge-based analyses reveal new candidate genes associated with risk of hepatitis B virus related hepatocellular carcinoma. BMC Cancer 2020; 20:403. [PMID: 32393195 PMCID: PMC7216662 DOI: 10.1186/s12885-020-06842-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/31/2019] [Accepted: 04/07/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Recent genome-wide association studies (GWASs) have suggested several susceptibility loci of hepatitis B virus (HBV)-related hepatocellular carcinoma (HCC) by statistical analysis at individual single-nucleotide polymorphisms (SNPs). However, these loci only explain a small fraction of HBV-related HCC heritability. In the present study, we aimed to identify additional susceptibility loci of HBV-related HCC using advanced knowledge-based analysis. METHODS We performed knowledge-based analysis (including gene- and gene-set-based association tests) on variant-level association p-values from two existing GWASs of HBV-related HCC. Five different types of gene-sets were collected for the association analysis. A number of SNPs within the gene prioritized by the knowledge-based association tests were selected to replicate genetic associations in an independent sample of 965 cases and 923 controls. RESULTS The gene-based association analysis detected four genes significantly or suggestively associated with HBV-related HCC risk: SLC39A8, GOLGA8M, SMIM31, and WHAMMP2. The gene-set-based association analysis prioritized two promising gene sets for HCC, cell cycle G1/S transition and NOTCH1 intracellular domain regulates transcription. Within the gene sets, three promising candidate genes (CDC45, NCOR1 and KAT2A) were further prioritized for HCC. Among genes of liver-specific expression, multiple genes previously implicated in HCC were also highlighted. However, probably due to small sample size, none of the genes prioritized by the knowledge-based association analyses were successfully replicated by variant-level association test in the independent sample. CONCLUSIONS This comprehensive knowledge-based association mining study suggested several promising genes and gene-sets associated with HBV-related HCC risks, which would facilitate follow-up functional studies on the pathogenic mechanism of HCC.
Affiliation(s)
- Deke Jiang
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Jiaen Deng
- Department of Psychiatry, the University of Hong Kong, Pokfulam, Hong Kong
- Xiaopin Ma
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Qianyi Xiao
- Center for Genomic Translational Medicine and Prevention, School of Public Health, Fudan University, Shanghai, China
- Bin Zhou
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Chou Yang
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Lin Wei
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL, USA
- Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Carly Conran
- Program for Personalized Cancer Care, NorthShore University HealthSystem, Pritzker School of Medicine, University of Chicago, Evanston, IL, USA
- S Lilly Zheng
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL, USA
- Irene Oi-Lin Ng
- Department of Pathology, the University of Hong Kong, Pokfulam, Hong Kong
- State Key Laboratory of Liver Research, the University of Hong Kong, Pokfulam, Hong Kong
- Long Yu
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Jianfeng Xu
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL, USA
- Pak C Sham
- The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong
- Xiaolong Qi
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Jinlin Hou
- State Key Laboratory of Organ Failure Research, Guangdong Key Laboratory of Viral Hepatitis Research, Institutes of Liver Diseases Research of Guangdong Province, Department of Infectious Diseases and Hepatology Unit, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yuan Ji
- Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Guangwen Cao
- Department of Epidemiology, Second Military Medical University, Shanghai, China
- Miaoxin Li
- Department of Psychiatry, the University of Hong Kong, Pokfulam, Hong Kong
- The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong
- State Key Laboratory for Cognitive and Brain Sciences, the University of Hong Kong, Pokfulam, Hong Kong
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, China
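Gene-based tests of the kind described in this abstract combine variant-level p-values into one gene-level p-value. The sketch below uses the plain Simes combination, p_gene = min_j (m · p_(j) / j) over the m sorted variant p-values, which assumes the variants are independent; LD-aware extensions such as GATES roughly replace the raw counts with effective numbers of independent tests. It is an illustration only, not the procedure used in the study, and the function name is hypothetical.

```python
from typing import Sequence


def simes_gene_p(variant_p: Sequence[float]) -> float:
    """Simes combination of variant p-values into one gene-level p-value.

    p_gene = min_j (m * p_(j) / j), where p_(1) <= ... <= p_(m).
    Valid under independence (and some positive-dependence settings);
    it does not model LD between the variants.
    """
    if not variant_p:
        raise ValueError("need at least one variant p-value")
    ordered = sorted(variant_p)
    m = len(ordered)
    return min(1.0, min(m * p / rank for rank, p in enumerate(ordered, start=1)))


if __name__ == "__main__":
    # hypothetical p-values for three SNPs mapped to one gene
    print(simes_gene_p([0.01, 0.04, 0.5]))
```

For the three hypothetical p-values above, the smallest scaled term is 3 × 0.01 / 1 = 0.03, so the gene-level p-value is about 0.03.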
17
Guo B, Wu B. Powerful and efficient SNP-set association tests across multiple phenotypes using GWAS summary data. Bioinformatics 2020; 35:1366-1372. [PMID: 30239606 DOI: 10.1093/bioinformatics/bty811] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/03/2018] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 01/09/2023]
Abstract
MOTIVATION Many GWAS conducted in the past decade have identified tens of thousands of disease-related variants, which in total explain only part of the heritability for most traits. Many more genetic variants with small effect sizes remain to be discovered. This has motivated the development of sequencing studies with larger sample sizes and increased resolution of genotyped variants, e.g., the ongoing NHLBI Trans-Omics for Precision Medicine (TOPMed) whole genome sequencing project. An alternative approach is the development of novel and more powerful statistical methods. The current dominant approach in GWAS analysis is the "single trait single variant" association test, even though most GWAS are conducted in deeply phenotyped cohorts with many correlated traits measured. In this paper, we aim to develop rigorous methods that integrate multiple correlated traits and multiple variants to improve the power to detect novel variants. In recognition of the difficulty of accessing raw genotype and phenotype data due to privacy and logistical concerns, we develop methods that are applicable to publicly available GWAS summary data. RESULTS We build rigorous statistical models for GWAS summary statistics to motivate novel multi-trait SNP-set association tests, including a variance component test, a burden test and their adaptive test, and develop efficient numerical algorithms to quickly compute their analytical P-values. We implement the proposed methods in an open source R package. We conduct thorough simulation studies to verify that the proposed methods rigorously control type I errors at the genome-wide significance level, and further demonstrate their utility via comprehensive analysis of GWAS summary data for multiple lipid traits and glycemic traits. We identified many novel loci that were not detected by the individual-trait-based GWAS analysis.
AVAILABILITY AND IMPLEMENTATION We have implemented the proposed methods in an R package freely available at http://www.github.com/baolinwu/MSKAT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bin Guo
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
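A multi-trait burden statistic of the kind named in this abstract can be computed from summary Z-scores alone. The sketch below assumes, as a modeling simplification, that under the null the m × K matrix of SNP-by-trait Z-scores has covariance Σ ⊗ R (Σ = LD among the m SNPs, R = correlation among the K trait statistics). The weighted burden vector b = Zᵀw then has covariance (wᵀΣw)R, so bᵀR⁻¹b / (wᵀΣw) is chi-square with K degrees of freedom. This is not the MSKAT package code; its variance-component and adaptive tests are not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def chi2_sf(x: float, df: int) -> float:
    """Chi-square upper tail for integer df.

    Uses Q(v + 2) = Q(v) + (x/2)^(v/2) e^(-x/2) / Gamma(v/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) or Q(2) = e^(-x/2).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    nu = 2 if df % 2 == 0 else 1
    q = math.exp(-half) if nu == 2 else math.erfc(math.sqrt(half))
    while nu < df:
        a = nu / 2.0
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        nu += 2
    return q


def multi_trait_burden(z: Matrix, ld: Matrix, trait_corr: Matrix,
                       w: Sequence[float]) -> Tuple[float, float]:
    """Weighted multi-trait burden test; returns (statistic, p-value).

    z: m x K SNP-by-trait Z-scores; ld: m x m LD matrix;
    trait_corr: K x K null correlation of the trait statistics.
    """
    m, k = len(z), len(z[0])
    # one weighted burden score per trait
    b = [sum(w[j] * z[j][t] for j in range(m)) for t in range(k)]
    # w' Sigma w scales the trait correlation into cov(b)
    scale = sum(w[i] * ld[i][j] * w[j] for i in range(m) for j in range(m))
    stat = sum(bi * ri for bi, ri in zip(b, solve(trait_corr, b))) / scale
    return stat, chi2_sf(stat, k)


if __name__ == "__main__":
    # hypothetical: two SNPs in mild LD, two correlated traits, equal weights
    z = [[2.0, 1.5], [1.8, 1.1]]
    print(multi_trait_burden(z, [[1.0, 0.3], [0.3, 1.0]], [[1.0, 0.4], [0.4, 1.0]], [1.0, 1.0]))
```

A variance-component (SKAT-type) counterpart would replace the linear burden with a quadratic form whose null distribution is a mixture of chi-squares; that part is not sketched here.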
18
Sha Q, Wang Z, Zhang X, Zhang S. A clustering linear combination approach to jointly analyze multiple phenotypes for GWAS. Bioinformatics 2019; 35:1373-1379. [PMID: 30239574 PMCID: PMC6477981 DOI: 10.1093/bioinformatics/bty810] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/28/2017] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 12/16/2022]
Abstract
SUMMARY There is increasing interest in the joint analysis of multiple phenotypes for genome-wide association studies (GWASs), for the following reasons. First, cohorts usually collect multiple phenotypes, and complex diseases are usually measured by multiple correlated intermediate phenotypes. Second, jointly analyzing multiple phenotypes may increase statistical power for detecting genetic variants associated with complex diseases. Third, there is increasing evidence that pleiotropy is a widespread phenomenon in complex diseases. In this paper, we develop a clustering linear combination (CLC) method to jointly analyze multiple phenotypes for GWASs. In the CLC method, we first cluster individual statistics into positively correlated clusters, then combine the individual statistics linearly within each cluster and combine the between-cluster terms in a quadratic form. CLC is not only robust to different signs of the means of individual statistics, but also reduces the degrees of freedom of the test statistic. We also theoretically prove that if we can cluster the individual statistics correctly, CLC is the most powerful test among all tests with certain quadratic forms. Our simulation results show that CLC is either the most powerful test or has similar power to the most powerful test among the tests we compared, and CLC is much more powerful than other tests when effect sizes align with inferred clusters. We also evaluate the performance of CLC through a real case study. AVAILABILITY AND IMPLEMENTATION R code for implementing our method is available at http://www.math.mtu.edu/~shuzhang/software.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Xiao Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
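For a fixed cluster assignment, the "linear within clusters, quadratic between clusters" construction can be written as T = Zᵀ Σ⁻¹ B (Bᵀ Σ⁻¹ B)⁻¹ Bᵀ Σ⁻¹ Z, where Z holds the K single-trait statistics with null correlation Σ and B is the K × L 0/1 cluster-membership matrix. Under the null, Bᵀ Σ⁻¹ Z is normal with covariance Bᵀ Σ⁻¹ B, so T is chi-square with L degrees of freedom. This Python sketch reflects our reading of the abstract and assumes the clusters are already given; the paper's own clustering step and its choice of the number of clusters are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def chi2_sf(x: float, df: int) -> float:
    """Chi-square upper tail for integer df via the df -> df + 2 recurrence."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    nu = 2 if df % 2 == 0 else 1
    q = math.exp(-half) if nu == 2 else math.erfc(math.sqrt(half))
    while nu < df:
        a = nu / 2.0
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        nu += 2
    return q


def clc_statistic(z: Sequence[float], sigma: Matrix,
                  clusters: Sequence[int]) -> Tuple[float, float]:
    """CLC-style statistic for a given clustering; returns (T, p).

    clusters: label 0..L-1 for each trait, with every label used.
    """
    k = len(z)
    n_clust = max(clusters) + 1
    # B[t][l] = 1 when trait t belongs to cluster l
    b_mat = [[1.0 if clusters[t] == l else 0.0 for l in range(n_clust)] for t in range(k)]
    sinv_z = solve(sigma, z)
    sinv_b = [solve(sigma, [b_mat[t][l] for t in range(k)]) for l in range(n_clust)]
    # v = B' Sigma^-1 z: one linear combination per cluster
    v = [sum(b_mat[t][l] * sinv_z[t] for t in range(k)) for l in range(n_clust)]
    # M = B' Sigma^-1 B: null covariance of v
    m = [[sum(b_mat[t][l] * sinv_b[l2][t] for t in range(k)) for l2 in range(n_clust)]
         for l in range(n_clust)]
    stat = sum(vi * ui for vi, ui in zip(v, solve(m, v)))
    return stat, chi2_sf(stat, n_clust)


if __name__ == "__main__":
    # hypothetical: three traits, the first two forming one cluster
    sigma = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.1], [0.1, 0.1, 1.0]]
    print(clc_statistic([2.1, 1.8, -0.4], sigma, [0, 0, 1]))
```

With one cluster per trait (B = I), T reduces to the usual Zᵀ Σ⁻¹ Z with K degrees of freedom; with a single cluster it is a one-degree-of-freedom linear combination, which loses power when effects have opposite signs. For example, Z = (2, -1) with Σ = I gives T = 0.5 with one cluster but T = 5 with two.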
19
Yazdani A, Yazdani A, Méndez Giráldez R, Aguilar D, Sartore L. A Multi-Trait Approach Identified Genetic Variants Including a Rare Mutation in RGS3 with Impact on Abnormalities of Cardiac Structure/Function. Sci Rep 2019; 9:5845. [PMID: 30971721 PMCID: PMC6458140 DOI: 10.1038/s41598-019-41362-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/27/2018] [Accepted: 03/05/2019] [Indexed: 01/29/2023]
Abstract
Heart failure is a major cause of premature death. Given the heterogeneity of the heart failure syndrome, identifying genetic determinants of cardiac function and structure may provide greater insights into heart failure. Despite progress in understanding the genetic basis of heart failure through genome-wide association studies, the heritability of heart failure is not well understood. Gaining further insights into mechanisms that contribute to heart failure requires systematic approaches that go beyond single-trait analysis. We integrated a Bayesian multi-trait approach and a Bayesian network approach to analyze 10 correlated traits of cardiac structure and function measured across 3387 individuals with whole exome sequence data. While single-trait-based approaches did not find any significant genetic variant, the integrative Bayesian multi-trait approach identified 3 novel variants located in the genes RGS3, CHD3, and MRPL38 with significant impact on cardiac traits such as left ventricular volume index, parasternal long axis interventricular septum thickness, and mean left ventricular wall thickness. Among these, the rare variant NC_000009.11:g.116346115C>A (rs144636307) in RGS3 showed a pleiotropic effect on left ventricular mass index, left ventricular volume index and maximal left atrial anterior-posterior diameter, while RGS3 can inhibit TGF-beta signaling associated with left ventricle dilation and systolic dysfunction.
Affiliation(s)
- Akram Yazdani
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Climax Data Pattern, Boston, MA, USA
- Azam Yazdani
- School of Medicine, Boston University, Boston, MA, USA
- Raúl Méndez Giráldez
- Lineberger Comprehensive Cancer Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Luca Sartore
- National Institute of Statistical Science, Washington, DC, USA
20
Fan Q, Zhang F, Wang W, Xu J, Hao J, He A, Wen Y, Li P, Liang X, Du Y, Liu L, Wu C, Wang S, Wang X, Ning Y, Guo X. GWAS summary-based pathway analysis correcting for the genetic confounding impact of environmental exposures. Brief Bioinform 2019; 19:725-730. [PMID: 28334273 DOI: 10.1093/bib/bbx025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/18/2016] [Indexed: 11/13/2022]
Abstract
Genome-wide association study (GWAS)-based pathway association analysis is a powerful approach for the genetic studies of human complex diseases. However, the genetic confounding effects of environment exposure-related genes can decrease the accuracy of GWAS-based pathway association analysis of target diseases. In this study, we developed a pathway association analysis approach, named Mendelian randomization-based pathway enrichment analysis (MRPEA), which was capable of correcting the genetic confounding effects of environmental exposures, using the GWAS summary data of environmental exposures. After analyzing the real GWAS summary data of cardiovascular disease and cigarette smoking, we observed significantly improved performance of MRPEA compared with traditional pathway association analysis (TPAA) without adjusting for environmental exposures. Further, simulation studies found that MRPEA generally outperformed TPAA under various scenarios. We hope that MRPEA could help to fill the gap of TPAA and identify novel causal pathways for complex diseases.
Affiliation(s)
- Qianrui Fan
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Feng Zhang
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Wenyu Wang
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Jiawen Xu
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Jingcan Hao
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Awen He
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Yan Wen
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Ping Li
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xiao Liang
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Yanan Du
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Li Liu
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Cuiyan Wu
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Sen Wang
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xi Wang
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Yujie Ning
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xiong Guo
- Health Science Center, Xi'an Jiaotong University, Xi'an, Shaanxi, China
21
Derkach A, Pfeiffer RM. Subset testing and analysis of multiple phenotypes. Genet Epidemiol 2019; 43:492-505. [PMID: 30920058 DOI: 10.1002/gepi.22199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/18/2018] [Revised: 02/08/2019] [Accepted: 02/19/2019] [Indexed: 11/08/2022]
Abstract
Meta-analysis of multiple genome-wide association studies (GWAS) is effective for detecting single- or multimarker associations with complex traits. We develop a flexible procedure (subset testing and analysis of multiple phenotypes [STAMP]) based on mixture models to perform a region-based meta-analysis of different phenotypes using data from different GWAS and identify subsets of associated phenotypes. Our model framework helps distinguish true associations from between-study heterogeneity. As a measure of association, we compute for each phenotype the posterior probability that the genetic region under investigation is truly associated. Extensive simulations show that STAMP is more powerful than standard approaches for meta-analyses when the proportion of truly associated outcomes is between 25% and 50%. For other settings, the power of STAMP is similar to that of existing methods. We illustrate our method on two examples, the association of a region on chromosome 9p21 with the risk of 14 cancers, and the associations of expression of quantitative trait loci from two genetic regions with their cis-single-nucleotide polymorphisms measured in 17 tissue types using data from The Cancer Genome Atlas.
Affiliation(s)
- Andriy Derkach
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland
- Ruth M Pfeiffer
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland
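The per-phenotype posterior probability of association described in this abstract comes from a mixture model. A deliberately simplified two-group version is sketched below in Python: a phenotype's region-level Z-score is N(0, 1) if the region is not associated and N(0, 1 + τ²) if it is, with prior association probability π. This is not STAMP's model, which per the abstract is region-based and also separates true associations from between-study heterogeneity; π and τ² are treated as known here rather than estimated, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, var: float) -> float:
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_association(z_scores: Sequence[float], prior: float,
                          tau2: float) -> List[float]:
    """Posterior P(associated | Z) for each phenotype under a two-group model.

    Null: Z ~ N(0, 1). Associated: Z ~ N(0, 1 + tau2), i.e. a random effect
    with variance tau2 added to unit sampling noise. prior and tau2 are
    assumed known; a real analysis would estimate them (e.g. by EM).
    """
    out = []
    for z in z_scores:
        alt = prior * normal_pdf(z, 1.0 + tau2)
        null = (1.0 - prior) * normal_pdf(z, 1.0)
        out.append(alt / (alt + null))
    return out


if __name__ == "__main__":
    # hypothetical Z-scores for one region across four phenotypes
    print(posterior_association([0.3, 1.2, 3.5, 4.8], prior=0.25, tau2=3.0))
```

With π = 0.5 and τ² = 3, a Z-score of 0 yields a posterior of 1/3, below the prior, because a null statistic is more likely under the narrower null component; the posterior rises monotonically with |Z|.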
22
23
Liang X, Sha Q, Zhang S. Joint analysis of multiple phenotypes in association studies using allele-based clustering approach for non-normal distributions. Ann Hum Genet 2018; 82:389-395. [PMID: 29932453 PMCID: PMC6188849 DOI: 10.1111/ahg.12260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/22/2017] [Revised: 03/15/2018] [Accepted: 05/11/2018] [Indexed: 11/29/2022]
Abstract
In the study of complex diseases, several correlated phenotypes are usually measured. There is also increasing evidence that jointly testing the association between a single-nucleotide polymorphism (SNP) and multiple dependent phenotypes is often more powerful than analyzing one phenotype at a time. Therefore, developing statistical methods to test for genetic association with multiple phenotypes has become increasingly important. In this paper, we develop an Allele-based Clustering Approach (ACA) for the joint analysis of multiple non-normal phenotypes in association studies. In ACA, we consider the alleles at a SNP of interest as a dependent variable with two classes, and the correlated phenotypes as predictors to predict the alleles at the SNP of interest. We perform extensive simulation studies to evaluate the performance of ACA and compare the power of ACA with that of the Adaptive Fisher's Combination test, the Trait-based Association Test that uses the Extended Simes procedure, Fisher's Combination test, the standard MANOVA, and the joint model of Multiple Phenotypes. Our simulation studies show that the proposed method has correct type I error rates and is much more powerful than other methods for some non-normal distributions.
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
24
Deng Y, Pan W. Improved Use of Small Reference Panels for Conditional and Joint Analysis with GWAS Summary Statistics. Genetics 2018; 209:401-408. [PMID: 29674520 PMCID: PMC5972416 DOI: 10.1534/genetics.118.300813] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/09/2018] [Accepted: 04/04/2018] [Indexed: 02/08/2023]
Abstract
Due to issues of practicality and confidentiality of genomic data sharing on a large scale, typically only meta- or mega-analyzed genome-wide association study (GWAS) summary data, not individual-level data, are publicly available. Reanalyses of such GWAS summary data for a wide range of applications have become increasingly common and useful; they often require an external reference panel with individual-level genotypic data to infer linkage disequilibrium (LD) among genetic variants. However, with a sample size of only a few hundred, as in the widely used 1000 Genomes Project European sample, LD estimation errors are not negligible and often lead to dramatically increased numbers of false positives in subsequent analyses of GWAS summary data. To alleviate the problem in the context of association testing for a group of SNPs, we propose an alternative estimator of the covariance matrix based on an idea similar to multiple imputation. We use numerical examples based on both simulated and real data to demonstrate the severe problem with the use of the 1000 Genomes Project reference panels, and the improved performance of our new approach.
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
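The step this abstract warns about, plugging a reference-panel LD estimate into a summary-statistic SNP-set test, can be sketched in Python as follows. The LD matrix R̂ is estimated as the sample correlation of reference genotypes (allele counts 0/1/2), and the set is tested with the quadratic form Zᵀ R̂⁻¹ Z, which is chi-square with m degrees of freedom only when R̂ matches the true LD. The authors' multiple-imputation-style covariance estimator is not implemented here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def ld_matrix(genotypes: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlation between the SNP columns of an n x m genotype matrix.

    Genotypes are allele counts (0/1/2); monomorphic SNPs must be removed
    first, since their standard deviation is zero.
    """
    n, m = len(genotypes), len(genotypes[0])
    centered = []
    for j in range(m):
        col = [float(row[j]) for row in genotypes]
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(m)] for i in range(m)]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def chi2_sf(x: float, df: int) -> float:
    """Chi-square upper tail for integer df via the df -> df + 2 recurrence."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    nu = 2 if df % 2 == 0 else 1
    q = math.exp(-half) if nu == 2 else math.erfc(math.sqrt(half))
    while nu < df:
        a = nu / 2.0
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        nu += 2
    return q


def snp_set_test(z: Sequence[float], ld: Matrix) -> Tuple[float, float]:
    """Quadratic SNP-set statistic z' R^-1 z and its chi-square(m) p-value.

    The p-value is calibrated only if `ld` is close to the true LD in the
    GWAS sample; per the abstract, noisy estimates from small panels can
    produce many more false positives.
    """
    stat = sum(zi * ui for zi, ui in zip(z, solve(ld, z)))
    return stat, chi2_sf(stat, len(z))


if __name__ == "__main__":
    # hypothetical five-person reference panel with two SNPs
    r_hat = ld_matrix([[0, 0], [1, 0], [2, 1], [1, 2], [0, 1]])
    print(r_hat[0][1], snp_set_test([1.0, 1.0], r_hat))
```

In the toy panel the estimated correlation is 2/7, so for Z = (1, 1) the statistic is 2 / (1 + 2/7) = 14/9; a different five-person draw from the same population could give a very different R̂, which is the instability the abstract describes.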
25
Liang X, Sha Q, Rho Y, Zhang S. A hierarchical clustering method for dimension reduction in joint analysis of multiple phenotypes. Genet Epidemiol 2018; 42:344-353. [PMID: 29682782 DOI: 10.1002/gepi.22124] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/03/2017] [Revised: 02/01/2018] [Accepted: 02/19/2018] [Indexed: 12/25/2022]
Abstract
Genome-wide association studies (GWAS) have become a very effective research tool for identifying genetic variants underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex diseases, the association between a genetic variant and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be more powerful than univariate analysis, and can shed new light on the biological mechanisms underlying complex diseases. In this paper, we develop a novel variable reduction method using a hierarchical clustering method (HCM) for the joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step applies a dimension reduction technique by using a representative phenotype for each cluster of phenotypes. In the second step, existing methods are used to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the power of multivariate analysis of variance (MANOVA), the joint model of multiple phenotypes (MultiPhen), and the trait-based association test that uses the extended Simes procedure (TATES), each with and without HCM. Our simulation studies show that using HCM is more powerful than not using it in most scenarios. We also illustrate the usefulness of HCM by analyzing whole-genome genotyping data from a lung function study.
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Yeonwoo Rho
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
26
Deng Y, Pan W. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses. Genetics 2017; 207:1285-1299. [PMID: 28971959 PMCID: PMC5714448 DOI: 10.1534/genetics.117.300347] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/04/2017] [Accepted: 09/29/2017] [Indexed: 11/18/2022]
Abstract
There is growing interest in testing genetic pleiotropy, which occurs when a single genetic variant influences multiple traits. Several methods have been proposed; however, they have some limitations. First, all of them require individual-level genotype and phenotype data, whereas, for logistical and other reasons, typically only summary statistics of univariate SNP-trait associations from meta- or mega-analyzed large genome-wide association study (GWAS) data are available. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits arising from correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, despite substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still have high residual cardiovascular risk, and for these patients it might be helpful to reduce their triglyceride (TG) level. To identify new therapeutic targets for this purpose, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting TG for LDL; otherwise, a pleiotropic effect detected by a marginal model could simply reflect the variant's association with LDL alone, given the well-known correlation between the two lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied to both marginal and conditional analyses. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the working independence model for robust inference. We provide numerical examples based on both simulated and real data, including two large lipid GWAS summary association datasets based on ∼100,000 and ∼189,000 samples, respectively, to demonstrate the difference between marginal and conditional analyses, as well as the effectiveness of our new approach.
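To make the marginal-versus-conditional point concrete, here is a deliberately simple pleiotropy check on two trait z-scores. It swaps in the classical intersection-union (max-p) rule and is not the authors' likelihood ratio or GEE procedure: the null "the SNP is associated with at most one trait" is rejected only when both per-trait tests reject, which keeps the level at `alpha` for any correlation between the z-statistics. The function names, the genome-wide threshold default, and the z-values in the usage lines are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided standard-normal p-value for a z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def pleiotropy_iut(z_trait1, z_trait2, alpha=5e-8):
    """Intersection-union test of H0: the SNP affects at most one trait.

    Returns (reject, p_max). H0 is rejected only if max(p1, p2) < alpha,
    i.e. each trait-specific test is significant on its own.
    """
    p_max = max(two_sided_p(z_trait1), two_sided_p(z_trait2))
    return p_max < alpha, p_max


if __name__ == "__main__":
    z_ldl = 12.0            # hypothetical LDL z-score
    z_tg_marginal = 6.5     # hypothetical marginal TG z-score
    z_tg_conditional = 1.2  # hypothetical TG z-score after adjusting for LDL
    print(pleiotropy_iut(z_ldl, z_tg_marginal))
    print(pleiotropy_iut(z_ldl, z_tg_conditional))
```

With these illustrative numbers the marginal pair is declared pleiotropic, while the LDL-adjusted pair is not, which is the kind of discrepancy the abstract warns about when traits are correlated.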
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
27
Deng Y, Pan W. Conditional analysis of multiple quantitative traits based on marginal GWAS summary statistics. Genet Epidemiol 2017; 41:427-436. [PMID: 28464407 PMCID: PMC5536980 DOI: 10.1002/gepi.22046] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 11/22/2016] [Revised: 01/09/2017] [Accepted: 02/04/2017] [Indexed: 12/22/2022]
Abstract
There has been increasing interest in joint association testing of multiple traits for possible pleiotropic effects. However, even in the presence of pleiotropy, most existing methods cannot distinguish direct from indirect effects of a genetic variant, say a single-nucleotide polymorphism (SNP), on multiple traits; a conditional analysis of one trait adjusting for other traits is perhaps the simplest and most common approach to this question. However, with only genome-wide association study (GWAS) summary statistics and no individual-level genotypic and phenotypic data, as is typical of most large-scale GWAS consortium studies, we are not aware of any existing method for such a conditional analysis. We propose such a conditional analysis, providing the formulas needed to fit a joint linear regression model for multiple quantitative traits. Furthermore, our method can accommodate conditional analysis on multiple SNPs as well as on multiple quantitative traits, which is expected to be useful for fine mapping. We provide numerical examples based on both simulated and real GWAS data to demonstrate the effectiveness of our proposed approach, and illustrate the possible usefulness of conditional analysis by contrasting its results with those of standard marginal analyses.
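The core of a summary-statistics conditional analysis is that a joint linear regression on standardized variables depends only on their correlation matrix. The sketch below uses that standard OLS identity for the two-trait, one-SNP case; it is not Deng and Pan's published formulas. Assumptions, stated loudly: both marginal z-scores come from the same `n` individuals, all variables are standardized, the phenotypic correlation `r12` is known (e.g. estimated elsewhere), and each z-score is treated as the t-statistic of a simple regression so that r = z / sqrt(n - 2 + z²). The function names are hypothetical.

```python
import math


def z_to_r(z, n):
    """Invert t = r*sqrt(n-2)/sqrt(1-r^2) to recover the SNP-trait correlation."""
    return z / math.sqrt(n - 2 + z * z)


def conditional_z(z1, z2, r12, n):
    """z-statistic for the SNP on trait 1 after adjusting for trait 2.

    Assumes same-sample marginal z-scores z1, z2, standardized variables,
    and a known phenotypic correlation r12 between the two traits.
    """
    r_g1, r_g2 = z_to_r(z1, n), z_to_r(z2, n)
    det = 1 - r_g2 ** 2  # determinant of corr matrix of (SNP, trait 2)
    # Joint OLS coefficients for trait1 ~ SNP + trait2, from R^{-1} r
    b_g = (r_g1 - r12 * r_g2) / det
    b_t2 = (r12 - r_g1 * r_g2) / det
    r_squared = b_g * r_g1 + b_t2 * r12
    # Var(b_g) = s^2 (X'X)^{-1}_{11} with X'X = (n-1) R and s^2 = RSS/(n-3)
    se_g = math.sqrt((1 - r_squared) / ((n - 3) * det))
    return b_g / se_g


if __name__ == "__main__":
    n = 100_000
    # Hypothetical SNP whose effect on trait 1 is entirely mediated by trait 2:
    # corr(SNP, trait2) = 0.02, corr(trait1, trait2) = 0.4, corr(SNP, trait1) = 0.008
    z_t2 = 0.02 * math.sqrt(n - 2) / math.sqrt(1 - 0.02 ** 2)
    z_t1 = 0.008 * math.sqrt(n - 2) / math.sqrt(1 - 0.008 ** 2)
    print(round(z_t1, 2), conditional_z(z_t1, z_t2, 0.4, n))
```

In this constructed example the marginal z-score for trait 1 is about 2.53, yet the conditional z-score is essentially zero, because the SNP's correlation with trait 1 is exactly what its effect on the correlated trait 2 predicts. Extending the sketch to more traits or SNPs would replace the 2×2 inverse with a general matrix solve.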
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA