1
Guo H, Li T, Wang Z. Pleiotropic genetic association analysis with multiple phenotypes using multivariate response best-subset selection. BMC Genomics 2023; 24:759. [PMID: 38082214 PMCID: PMC10712198 DOI: 10.1186/s12864-023-09820-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/20/2023] [Indexed: 12/18/2023]
Abstract
Genetic pleiotropy refers to the simultaneous association of a gene with multiple phenotypes. It is widespread across the genome and can help in understanding the common genetic mechanisms of diseases or traits. In this study, we propose a pleiotropic association analysis method based on a multivariate response best-subset selection (MRBSS) model. Unlike traditional genetic association models, it treats the high-dimensional genotypic data as response variables and the multiple phenotypic data as predictor variables. The response best-subset selection procedure is converted into a 0-1 integer optimization problem by introducing a separation parameter and a tuning parameter, and the model parameters are estimated by a curve search under a modified Bayesian information criterion. Simulation experiments show that MRBSS markedly reduces computational time, achieves higher statistical power in most of the scenarios considered, and keeps the type I error rate low. Applications to datasets of maize yield traits and pig lipid traits further verify its effectiveness.
Affiliation(s)
- Hongping Guo
- School of Mathematics and Statistics, Hubei Normal University, Huangshi, 435002, People's Republic of China
- Tong Li
- School of Mathematics and Statistics, Hubei Normal University, Huangshi, 435002, People's Republic of China
- Zixuan Wang
- School of Mathematics and Statistics, South-Central Minzu University, Wuhan, 430074, People's Republic of China
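To make the "genotype as response, phenotypes as predictors" idea concrete, here is a deliberately simplified sketch for a single SNP that exhaustively scores every subset of phenotypes with an ordinary Gaussian BIC. It is not MRBSS: the abstract's 0-1 integer formulation, separation and tuning parameters, curve search, and modified BIC are all replaced, by assumption, with brute-force enumeration and the standard BIC. The function name `bic_best_subset` is hypothetical.

```python
import itertools

import numpy as np


def bic_best_subset(genotype, phenotypes):
    """Brute-force best-subset selection for one SNP (illustrative only).

    genotype: length-n vector treated as the response (e.g. 0/1/2 dosages).
    phenotypes: n x q matrix treated as predictors.
    Returns (best_subset, best_bic); best_subset is a tuple of column indices.
    """
    n, q = phenotypes.shape
    best_subset, best_bic = (), np.inf
    for size in range(q + 1):
        for subset in itertools.combinations(range(q), size):
            # Design matrix: intercept plus the selected phenotype columns.
            X = np.column_stack([np.ones(n)] + [phenotypes[:, j] for j in subset])
            coef, *_ = np.linalg.lstsq(X, genotype, rcond=None)
            rss = float(np.sum((genotype - X @ coef) ** 2))
            # Gaussian BIC up to an additive constant:
            # n * log(RSS / n) + (number of coefficients) * log(n).
            bic = n * np.log(rss / n) + X.shape[1] * np.log(n)
            if bic < best_bic:
                best_subset, best_bic = subset, bic
    return best_subset, best_bic
```

With q phenotypes the loop fits 2^q models per SNP, so exhaustive search is only practical for a handful of phenotypes; the abstract reports that MRBSS substantially reduces computational time.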
2
Kim K, Jun TH, Ha BK, Wang S, Sun H. New statistical selection method for pleiotropic variants associated with both quantitative and qualitative traits. BMC Bioinformatics 2023; 24:381. [PMID: 37817069 PMCID: PMC10563219 DOI: 10.1186/s12859-023-05505-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Accepted: 09/28/2023] [Indexed: 10/12/2023]
Abstract
BACKGROUND Identification of pleiotropic variants associated with multiple phenotypic traits has received increasing attention in genetic association studies. Overlapping genetic associations across multiple traits help to detect weak genetic associations missed by single-trait analyses. Many statistical methods have been developed to identify pleiotropic variants, but most are limited to quantitative traits, even though pleiotropic effects on both quantitative and qualitative traits have been observed. This is a statistically challenging problem because there is no appropriate multivariate distribution for modeling quantitative and qualitative data together. Alternatively, meta-analysis methods can be applied; these integrate summary statistics of individual variants associated with either a quantitative or a qualitative trait, without accounting for correlations among genetic variants. RESULTS We propose a new statistical selection method based on a unified selection score quantifying how a genetic variant, i.e., a pleiotropic variant, is associated with both quantitative and qualitative traits. In extensive simulation studies covering various types of pleiotropic effects on both quantitative and qualitative traits, the proposed method outperformed existing meta-analysis methods in terms of true positive selection. We also applied the method to a peanut dataset with 6 quantitative and 2 qualitative traits and a cowpea dataset with 2 quantitative and 6 qualitative traits, and in both analyses detected some potentially pleiotropic variants missed by the existing methods. CONCLUSIONS The proposed method can locate pleiotropic variants associated with both quantitative and qualitative traits. It has been implemented in the R package 'UNISS', which can be downloaded from http://github.com/statpng/uniss.
Affiliation(s)
- Kipoong Kim
- Department of Statistics, Pusan National University, 46241, Busan, Korea
- Tae-Hwan Jun
- Department of Plant Bioscience, Pusan National University, 50463, Miryang, Korea
- Bo-Keun Ha
- Department of Applied Plant Science, Chonnam National University, 61186, Gwangju, Korea
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, 10032, USA
- Hokeun Sun
- Department of Statistics, Pusan National University, 46241, Busan, Korea
3
Wei Q, Chen L, Zhou Y, Wang H. An adaptive test based on principal components for detecting multiple phenotype associations using GWAS summary data. Genetica 2023; 151:97-104. [PMID: 36656460 DOI: 10.1007/s10709-023-00179-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 01/11/2023] [Indexed: 01/20/2023]
Abstract
Extensive evidence from genome-wide association studies (GWAS) has shown that jointly analyzing multiple phenotypes can improve the power of association tests compared with the traditional single-variant, single-trait approach. Here we propose an adaptive test based on principal components (ATPC) that is powerful and efficient for detecting associations between a single variant and multiple traits. The method requires only GWAS summary statistics, which are often available. We first estimate the trait correlation matrix by LD score regression. Then, based on this correlation matrix, we construct a series of test statistics that contain different numbers of principal components. The final test statistic combines the P values of these principal component-based statistics using the aggregated Cauchy association test. A notable feature of the method is that the analytical P value of the test statistic can be computed quickly without permutation. Extensive simulation studies demonstrate that ATPC controls the type I error rate and performs powerfully and robustly compared with several existing tests across a wide range of simulation settings. Analysis of the lipid GWAS summary data from the Global Lipids Genetics Consortium shows that ATPC identifies 230 new SNPs that were missed by the original single-trait association analysis. According to the GWAS Catalog, some of the SNPs and mapped genes identified by ATPC have been reported to be associated with lipid traits. Further analysis of the GWAS results also revealed some Gene Ontology terms and biological pathways related to lipids.
Affiliation(s)
- Qianran Wei
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China
- Lili Chen
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China
- Yajing Zhou
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China
- Huiyi Wang
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China
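The principal-component statistics and Cauchy combination described in the ATPC abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the trait correlation matrix `R` is already available (its estimation by LD score regression is not shown), uses one statistic for each number of leading components with equal ACAT weights, and the names `chi2_sf`, `acat` and `pc_acat_test` are hypothetical. Under the null, z ~ N(0, R), so each projection onto an eigenvector u_j has variance lambda_j and the scaled squares are independent chi-square(1) variables.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df >= 1, via
    Q(df + 2, x) = Q(df, x) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2                 # df = 2 base case
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1      # df = 1 base case
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def acat(pvalues):
    """Equal-weight aggregated Cauchy combination of p-values."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / len(pvalues)
    return 0.5 - math.atan(t) / math.pi


def pc_acat_test(z, R):
    """z: per-trait z-scores for one SNP; R: trait correlation matrix (assumed known).

    For k = 1..K, T_k = sum_{j<=k} (u_j' z)^2 / lambda_j over the top-k principal
    components of R, which is chi-square with k df under H0. Returns (p, p_k list).
    """
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = (eigvecs.T @ np.asarray(z, dtype=float)) ** 2 / eigvals
    pvals = [chi2_sf(float(contrib[:k].sum()), k) for k in range(1, len(eigvals) + 1)]
    return acat(pvals), pvals
```

The chi-square tail uses a standard recurrence so the sketch needs only NumPy and the standard library; `scipy.stats.chi2.sf` would serve the same purpose where SciPy is available. ACAT is dominated by the smallest p-values, which is what lets the combined test adapt to whichever number of components carries the signal.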
4
Vilor-Tejedor N, Garrido-Martín D, Rodriguez-Fernandez B, Lamballais S, Guigó R, Gispert JD. Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO! Comput Struct Biotechnol J 2021; 19:5800-5810. [PMID: 34765095 PMCID: PMC8567328 DOI: 10.1016/j.csbj.2021.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2021] [Revised: 10/08/2021] [Accepted: 10/12/2021] [Indexed: 12/01/2022]
Abstract
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individuals. Most studies focus on individual correlation and association tests between genetic variants and a single brain measurement. Despite the great success of univariate approaches, the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes makes the development and application of multivariate methods crucial. In this article, we review novel methods and strategies for the joint analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.
Affiliation(s)
- Natalia Vilor-Tejedor
- Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Universitat Pompeu Fabra, Barcelona, Spain
- Diego Garrido-Martín
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Sander Lamballais
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Juan Domingo Gispert
- Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
- Centro de Investigación Biomédica en Red Bioingeniería, Biomateriales y Nanomedicina, Madrid, Spain
5
Sasaki T, Ikeda K, Nakajima T, Kawabata-Iwakawa R, Iizuka T, Dharmawan T, Tamura S, Niwamae N, Tange S, Nishiyama M, Kaneko Y, Kurabayashi M. Multiple arrhythmic and cardiomyopathic phenotypes associated with an SCN5A A735E mutation. J Electrocardiol 2021; 65:122-127. [PMID: 33610078 DOI: 10.1016/j.jelectrocard.2021.01.019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/07/2020] [Revised: 01/29/2021] [Accepted: 01/29/2021] [Indexed: 12/19/2022]
Abstract
BACKGROUND SCN5A mutations are associated with multiple arrhythmic and cardiomyopathic phenotypes, including Brugada syndrome (BrS), sinus node dysfunction (SND), atrioventricular block, supraventricular tachyarrhythmias (SVTs), long QT syndrome (LQTS), dilated cardiomyopathy, and left ventricular noncompaction. Several single SCN5A mutations have been associated with an overlap of some of these phenotypes, but never with an overlap of all of them. OBJECTIVE We encountered two pedigrees with multiple arrhythmic phenotypes, with or without cardiomyopathic phenotypes, and sought to identify a responsible mutation and reveal its functional abnormalities. METHODS Target panel sequencing of 72 genes, including genes related to inherited arrhythmia syndromes and cardiomyopathies, was performed in two probands. Cascade screening was performed by Sanger sequencing. Wild-type or identified mutant SCN5A was expressed in tsA201 cells, and whole-cell sodium currents (INa) were recorded using patch-clamp techniques. RESULTS We identified an SCN5A A735E mutation in these probands and did not identify any other mutations. All eight mutation carriers exhibited at least one of the arrhythmic phenotypes. Two patients exhibited multiple arrhythmic phenotypes: one (a 15-year-old girl) exhibited BrS, SND, and exercise- and epinephrine-induced QT prolongation; the other (a 4-year-old boy) exhibited BrS, SND, and SVTs. A third patient (a 30-year-old man) exhibited all arrhythmic and cardiomyopathic phenotypes except LQTS. One male died suddenly at age 22. Functional analysis revealed that the mutant did not produce functional INa. CONCLUSIONS A non-functional SCN5A A735E mutation could be associated with multiple arrhythmic and cardiomyopathic phenotypes, although other unidentified factors may also be involved in the phenotypic variability of the mutation carriers.
Affiliation(s)
- Takashi Sasaki
- Department of Cardiovascular Medicine, Japanese Red Cross Maebashi Hospital, Maebashi, Gunma, Japan
- Kentaro Ikeda
- Department of Cardiology, Gunma Children's Medical Center, Shibukawa, Gunma, Japan
- Tadashi Nakajima
- Department of Cardiovascular Medicine, Gunma University Graduate School of Medicine, Maebashi, Gunma, Japan
- Reika Kawabata-Iwakawa
- Division of Integrated Oncology Research, Gunma University Initiative for Advanced Research, Maebashi, Gunma, Japan
- Takashi Iizuka
- Department of Cardiovascular Medicine, Gunma University Graduate School of Medicine, Maebashi, Gunma, Japan
- Tommy Dharmawan
- Department of Cardiovascular Medicine, Gunma University Graduate School of Medicine, Maebashi, Gunma, Japan
- Shuntaro Tamura
- Department of Cardiovascular Medicine, Gunma University Graduate School of Medicine, Maebashi, Gunma, Japan
- Nogiku Niwamae
- Department of Cardiovascular Medicine, Japanese Red Cross Maebashi Hospital, Maebashi, Gunma, Japan
- Shoichi Tange
- Department of Cardiovascular Medicine, Japanese Red Cross Maebashi Hospital, Maebashi, Gunma, Japan
- Yoshiaki Kaneko
- Department of Cardiovascular Medicine, Gunma University Graduate School of Medicine, Maebashi, Gunma, Japan
- Masahiko Kurabayashi
- Department of Cardiovascular Medicine, Gunma University Graduate School of Medicine, Maebashi, Gunma, Japan
6
Chen L, Zhou Y. A fast and powerful aggregated Cauchy association test for joint analysis of multiple phenotypes. Genes Genomics 2021; 43:69-77. [PMID: 33432394 DOI: 10.1007/s13258-020-01034-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/07/2020] [Accepted: 12/23/2020] [Indexed: 11/27/2022]
Abstract
BACKGROUND Pleiotropy is a widespread phenomenon in complex human diseases. Jointly analyzing multiple phenotypes can improve the power to detect genetic variants and help uncover the underlying genetic mechanisms. OBJECTIVE This study aims to detect associations between genetic variants in a genomic region and multiple phenotypes. METHODS We develop an aggregated Cauchy association test, abbreviated "Multi-ACAT", to detect the association between rare variants in a genomic region and multiple phenotypes. Multi-ACAT first tests the association between each rare variant and the multiple phenotypes using reverse regression to obtain variant-level p-values, and then takes a linear combination of the transformed p-values as the test statistic, which approximately follows a Cauchy distribution under the null hypothesis. RESULTS Extensive simulation studies show that when the proportion of causal variants in a genomic region is extremely small, Multi-ACAT is more powerful than several other methods and is robust to bidirectional effects of causal variants. Finally, we illustrate the method by analyzing two phenotypes, systolic blood pressure (SBP) and diastolic blood pressure (DBP), from Genetic Analysis Workshop 19 (GAW19). CONCLUSION Multi-ACAT is extremely fast to compute, does not require modeling the complex distribution of multiple correlated phenotypes, and can be applied when noise phenotypes are present.
Affiliation(s)
- Lili Chen
- School of Mathematical Sciences, Heilongjiang University, No. 74 Xuefu Road, Nangang District, Harbin, 150080, People's Republic of China
- Yajing Zhou
- School of Mathematical Sciences, Heilongjiang University, No. 74 Xuefu Road, Nangang District, Harbin, 150080, People's Republic of China
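The linear combination of transformed p-values in the Multi-ACAT abstract has a standard closed form. As a sketch, assuming the usual ACAT definition of Liu and Xie with nonnegative weights w_i for the k variant-level p-values p_i (the weights Multi-ACAT actually uses are not stated in this abstract):

```latex
T_{\mathrm{ACAT}} = \sum_{i=1}^{k} w_i \tan\{(0.5 - p_i)\pi\},
\qquad
p_{\mathrm{ACAT}} \approx \frac{1}{2} - \frac{1}{\pi}\arctan\!\left(\frac{T_{\mathrm{ACAT}}}{\sum_{i=1}^{k} w_i}\right).
```

Under the null each transformed term tan{(0.5 - p_i)π} is standard Cauchy, so the weighted sum is Cauchy with scale equal to the sum of the weights when the p-values are independent, and approximately so in the upper tail under dependence. This is why the p-value can be read directly off the Cauchy distribution without permutation.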
7
Wen Y, Lu Q. An optimal kernel-based multivariate U-statistic to test for associations with multiple phenotypes. Biostatistics 2020; 23:705-720. [PMID: 33108446 DOI: 10.1093/biostatistics/kxaa049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/10/2020] [Revised: 09/24/2020] [Accepted: 10/03/2020] [Indexed: 11/13/2022]
Abstract
Set-based analyses that jointly consider multiple predictors in a group have been broadly used for association tests. However, their power can be sensitive to the distribution of phenotypes and to the underlying relationships between predictors and outcomes. Moreover, most set-based methods are designed for single-trait analysis, which makes it hard to explore pleiotropic effects and borrow information when multiple phenotypes are available. Here, we propose a kernel-based multivariate U-statistic (KMU) that is robust and powerful in testing the association between a set of predictors and multiple outcomes. We employ a rank-based kernel function for the outcomes, which makes the method robust to various outcome distributions. Rather than relying on a single kernel, the test statistic is built from multiple kernels selected in a data-driven manner, and is thus capable of capturing various complex relationships between predictors and outcomes. We derive the asymptotic properties of the test statistic. Through simulations, we demonstrate that KMU controls the type I error rate and has higher power than its counterparts. We further show its practical utility by analyzing whole-genome sequencing data from the Alzheimer's Disease Neuroimaging Initiative study, in which novel genes were detected to be associated with imaging phenotypes.
Affiliation(s)
- Y Wen
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Qing Lu
- Department of Biostatistics, College of Public Health, University of Florida, Gainesville, FL, USA
8
Xia Y, Cai TT, Li H. Joint testing and false discovery rate control in high-dimensional multivariate regression. Biometrika 2019; 105:249-269. [PMID: 30799872 DOI: 10.1093/biomet/asx085] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/02/2017] [Indexed: 01/15/2023]
Abstract
Multivariate regression with high-dimensional covariates has many applications in genomic and genetic research, in which some covariates are expected to be associated with multiple responses. This paper considers joint testing for regression coefficients over multiple responses and develops simultaneous testing methods with false discovery rate control. The test statistic is based on inverse regression and bias-corrected group lasso estimates of the regression coefficients and is shown to have an asymptotic chi-squared null distribution. A row-wise multiple testing procedure is developed to identify the covariates associated with the responses. The procedure is shown to control the false discovery proportion and false discovery rate at a prespecified level asymptotically. Simulations demonstrate the gain in power, relative to entrywise testing, in detecting the covariates associated with the responses. The test is applied to an ovarian cancer dataset to identify the microRNA regulators that regulate protein expression.
Affiliation(s)
- Yin Xia
- Department of Statistics, School of Management, Fudan University, Shanghai, China
- T Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A
- Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A
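The abstract's final step turns row-wise test statistics into a set of selected covariates with false discovery rate control. As a stand-in, the sketch below applies the classical Benjamini-Hochberg step-up rule to one p-value per covariate (row); Xia, Cai and Li derive their own threshold for the row-wise procedure, so this illustrates the idea rather than their method. `benjamini_hochberg` is a hypothetical helper name.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices (ascending) rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose sorted p-value is at most rank * alpha / m.
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

For example, with p-values [0.02, 0.025, 0.9] and alpha = 0.05, both of the first two covariates are selected even though 0.02 exceeds its own threshold 0.05/3: the step-up rule rejects everything up to the largest rank that passes.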
9
Agniel D, Cai T. Analysis of multiple diverse phenotypes via semiparametric canonical correlation analysis. Biometrics 2017; 73:1254-1265. [PMID: 28407213 DOI: 10.1111/biom.12690] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/01/2015] [Revised: 02/01/2017] [Accepted: 02/01/2017] [Indexed: 11/30/2022]
Abstract
Studying multiple outcomes simultaneously allows researchers to begin to identify underlying factors that affect all of a set of diseases (i.e., shared etiology) and what may give rise to differences in disorders between patients (i.e., disease subtypes). In this work, our goal is to build risk scores that are predictive of multiple phenotypes simultaneously and identify subpopulations at high risk of multiple phenotypes. Such analyses could yield insight into etiology or point to treatment and prevention strategies. The standard canonical correlation analysis (CCA) can be used to relate multiple continuous outcomes to multiple predictors. However, in order to capture the full complexity of a disorder, phenotypes may include a diverse range of data types, including binary, continuous, ordinal, and censored variables. When phenotypes are diverse in this way, standard CCA is not possible and no methods currently exist to model them jointly. In the presence of such complications, we propose a semi-parametric CCA method to develop risk scores that are predictive of multiple phenotypes. To guard against potential model mis-specification, we also propose a nonparametric calibration method to identify subgroups that are at high risk of multiple disorders. A resampling procedure is also developed to account for the variability in these estimates. Our method opens the door to synthesizing a wide array of data sources for the purposes of joint prediction.
Affiliation(s)
- Denis Agniel
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Tianxi Cai
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts
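The abstract takes standard canonical correlation analysis (CCA) as its starting point for continuous outcomes. A minimal NumPy sketch of that standard CCA follows; it is not the semiparametric extension for binary, ordinal and censored phenotypes proposed in the paper. It whitens each block with a Cholesky factor and reads the leading canonical pair off an SVD. The small `ridge` term is an assumption added for numerical stability, and `first_canonical_pair` is a hypothetical name.

```python
import numpy as np


def first_canonical_pair(X, Y, ridge=1e-8):
    """Leading canonical correlation and weight vectors (a, b) for X (n x p), Y (n x q)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # With Sxx = Lx Lx^T and Syy = Ly Ly^T, the canonical correlations are the
    # singular values of M = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Ly, np.linalg.solve(Lx, Sxy).T).T
    U, s, Vt = np.linalg.svd(M)
    # Map the whitened singular vectors back to weights on the original variables.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return s[0], a, b
```

The weights satisfy a' Sxx a = b' Syy b = 1 (up to the ridge), so the correlation between X a and Y b equals the returned canonical correlation.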
10
Gordon D, Londono D, Patel P, Kim W, Finch SJ, Heiman GA. An Analytic Solution to the Computation of Power and Sample Size for Genetic Association Studies under a Pleiotropic Mode of Inheritance. Hum Hered 2017; 81:194-209. [PMID: 28315880 DOI: 10.1159/000457135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/01/2016] [Accepted: 01/20/2017] [Indexed: 01/14/2023]
Abstract
Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes.
Affiliation(s)
- Derek Gordon
- Department of Genetics, The State University of New Jersey, Piscataway, NJ, USA
11
Mägi R, Suleimanov YV, Clarke GM, Kaakinen M, Fischer K, Prokopenko I, Morris AP. SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes. BMC Bioinformatics 2017; 18:25. [PMID: 28077070 PMCID: PMC5225593 DOI: 10.1186/s12859-016-1437-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/21/2016] [Accepted: 12/17/2016] [Indexed: 11/10/2022]
Abstract
Background Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have been successful in identifying loci contributing genetic effects to a wide range of complex human diseases and quantitative traits. The traditional approach to GWAS analysis is to consider each phenotype separately, despite the fact that many diseases and quantitative traits are correlated with each other and are often measured in the same sample of individuals. Multivariate analyses of correlated phenotypes have been demonstrated, by simulation, to increase power to detect association with SNPs, and thus may enable improved detection of novel loci contributing to diseases and quantitative traits. Results We have developed the SCOPA software to enable GWAS analysis of multiple correlated phenotypes. The software implements “reverse regression” methodology, which treats the genotype of an individual at a SNP as the outcome and the phenotypes as predictors in a general linear model. SCOPA can be applied to quantitative traits and categorical phenotypes, and can accommodate imputed genotypes under a dosage model. The accompanying META-SCOPA software enables meta-analysis of association summary statistics from SCOPA across GWAS. Application of SCOPA to two GWAS of high- and low-density lipoprotein cholesterol, triglycerides and body mass index, and subsequent meta-analysis with META-SCOPA, highlighted stronger association signals than univariate phenotype analysis at established lipid and obesity loci. The META-SCOPA meta-analysis also revealed a novel signal of association at genome-wide significance for triglycerides mapping to GPC5 (lead SNP rs71427535, p = 1.1 × 10^-8), which has not been reported in previous large-scale GWAS of lipid traits. Conclusions The SCOPA and META-SCOPA software enable discovery and dissection of multiple phenotype association signals through implementation of a powerful reverse regression approach.
Affiliation(s)
- Reedik Mägi
- Estonian Genome Center, University of Tartu, Tartu, Estonia
- Yury V Suleimanov
- Computation-based Science and Technology Research Center, Cyprus Institute, Nicosia, Cyprus
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Geraldine M Clarke
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Krista Fischer
- Estonian Genome Center, University of Tartu, Tartu, Estonia
- Andrew P Morris
- Estonian Genome Center, University of Tartu, Tartu, Estonia
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Department of Biostatistics, University of Liverpool, Liverpool, UK
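The reverse-regression idea in the SCOPA abstract (genotype as the outcome, phenotypes as predictors in a linear model) can be sketched for one SNP with an ordinary least-squares fit and an F-test that all phenotype coefficients are zero. This is an illustration under that assumption only; SCOPA's own test statistic and its handling of categorical phenotypes may differ, and `reverse_regression_f` is a hypothetical name.

```python
import numpy as np


def reverse_regression_f(dosage, phenotypes):
    """Reverse regression for one SNP.

    Regress the genotype dosage (outcome) on all phenotypes (predictors) and
    F-test H0: every phenotype coefficient is zero.
    Returns (f_stat, df_num, df_den).
    """
    dosage = np.asarray(dosage, dtype=float)
    n, q = phenotypes.shape
    X = np.column_stack([np.ones(n), phenotypes])            # intercept + phenotypes
    coef, *_ = np.linalg.lstsq(X, dosage, rcond=None)
    rss_full = float(np.sum((dosage - X @ coef) ** 2))
    rss_null = float(np.sum((dosage - dosage.mean()) ** 2))  # intercept-only model
    df_num, df_den = q, n - q - 1
    f_stat = ((rss_null - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den
```

The p-value is the upper tail of an F distribution with (`df_num`, `df_den`) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df_num, df_den)` if SciPy is available. Because the outcome is simply a numeric dosage, imputed dosages in [0, 2] can be used directly, in line with the abstract's dosage model.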
12
Wang P, Rahman M, Jin L, Xiong M. A new statistical framework for genetic pleiotropic analysis of high dimensional phenotype data. BMC Genomics 2016; 17:881. [PMID: 27821073 PMCID: PMC5100198 DOI: 10.1186/s12864-016-3169-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/10/2015] [Accepted: 10/18/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND The widely used genetic pleiotropic analyses of multiple phenotypes are often designed to examine the relationship between common variants and a few phenotypes. They are not suited to settings in which both the phenotype data and the genotype data (e.g., next-generation sequencing data) are high dimensional. To overcome these limitations of traditional genetic pleiotropic analysis of multiple phenotypes, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we extend the traditional multivariate SEMs to sparse functional SEMs. To deal with high dimensional phenotype and genotype data, we employ functional data analysis and the alternating direction method of multipliers (ADMM) to reduce data dimension and improve computational efficiency. RESULTS Using large-scale simulations, we showed that the proposed methods have higher power to detect the true causal genetic pleiotropic structure than other existing methods. Simulations also demonstrate that gene-based pleiotropic analysis has higher power than single variant-based pleiotropic analysis. The proposed method is applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) with 11 phenotypes, identifying a network with 137 genes connected to 11 phenotypes and 341 edges. Among them, 114 genes showed pleiotropic genetic effects, and 45 genes have been reported in the literature to be associated with the analyzed phenotypes or with other cardiovascular disease (CVD)-related phenotypes. CONCLUSIONS Our proposed sparse functional SEMs can incorporate both common and rare variants into the analysis, and the ADMM algorithm can efficiently solve the penalized SEMs. Using this model we can jointly infer the genetic architecture and the causal phenotype network structure, and decompose the genetic effect into direct, indirect, and total effects.
Affiliation(s)
- Panpan Wang
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA
- State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200433, China
- Mohammad Rahman
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA
- Li Jin
- State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200433, China
- Momiao Xiong
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA
- Human Genetics Center, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, TX, 77225, USA
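Wang et al. solve their penalized sparse functional SEMs with ADMM. The SEM itself is beyond a short example, but the ADMM pattern (a ridge-type x-update, a soft-thresholding z-update, and a running scaled dual update) can be sketched on the simpler lasso problem, minimize (1/2)||Ax - b||^2 + lam * ||x||_1, using the standard scaled-form splitting described by Boyd et al. This illustrates ADMM in general, not the authors' algorithm; `lasso_admm` and `soft_threshold` are hypothetical names.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def lasso_admm(A, b, lam, rho=1.0, n_iter=1000, tol=1e-8):
    """Scaled-form ADMM for the lasso with the splitting x - z = 0."""
    p = A.shape[1]
    z = np.zeros(p)
    u = np.zeros(p)
    # (A^T A + rho I) never changes, so it is inverted once up front; for larger
    # problems a cached Cholesky factor (e.g. scipy.linalg.cho_factor) is preferable.
    inv = np.linalg.inv(A.T @ A + rho * np.eye(p))
    Atb = A.T @ b
    for _ in range(n_iter):
        x = inv @ (Atb + rho * (z - u))          # ridge-type x-update
        z_old = z
        z = soft_threshold(x + u, lam / rho)     # sparsity-inducing z-update
        u = u + x - z                            # scaled dual update
        # Stop when both the primal and dual residuals are small.
        if np.linalg.norm(x - z) < tol and np.linalg.norm(rho * (z - z_old)) < tol:
            break
    return z
```

Only the z-update involves the penalty, which is what makes ADMM convenient for penalized models: the smooth part and the nonsmooth part are handled in separate, simple steps.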
13
Agniel D, Liao KP, Cai T. Estimation and testing for multiple regulation of multivariate mixed outcomes. Biometrics 2016; 72:1194-1205. [PMID: 26910481 DOI: 10.1111/biom.12495] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/01/2015] [Revised: 11/01/2015] [Accepted: 12/01/2015] [Indexed: 11/27/2022]
Abstract
Considerable interest has recently been focused on studying multiple phenotypes simultaneously in both epidemiological and genomic studies, either to capture the multidimensionality of complex disorders or to understand shared etiology of related disorders. We seek to identify multiple regulators or predictors that are associated with multiple outcomes when these outcomes may be measured on very different scales or composed of a mixture of continuous, binary, and not-fully observed elements. We first propose an estimation technique to put all effects on similar scales, and we induce sparsity on the estimated effects. We provide standard asymptotic results for this estimator and show that resampling can be used to quantify uncertainty in finite samples. We finally provide a multiple testing procedure which can be geared specifically to the types of multiple regulators of interest, and we establish that, under standard regularity conditions, the familywise error rate will approach 0 as sample size diverges. Simulation results indicate that our approach can improve over unregularized methods both in reducing bias in estimation and improving power for testing.
Affiliation(s)
- Denis Agniel
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, U.S.A. 02115
- Katherine P Liao
- Brigham and Women's Hospital, Boston, Massachusetts, U.S.A. 02115
- Tianxi Cai
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, U.S.A. 02115