1
Xi D, Cui D, Zhang M, Zhang J, Shang M, Guo L, Han J, Du L. Identification of genetic basis of brain imaging by group sparse multi-task learning leveraging summary statistics. Comput Struct Biotechnol J 2024; 23:3288-3299. [PMID: 39296810 PMCID: PMC11409045 DOI: 10.1016/j.csbj.2024.08.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 08/29/2024] [Accepted: 08/29/2024] [Indexed: 09/21/2024] Open
Abstract
Brain imaging genetics is an evolving neuroscience topic aiming to identify genetic variations related to neuroimaging measurements of interest. Traditional linear regression methods have shown success, but their reliance on individual-level imaging and genetic data limits their applicability. Herein, we proposed S-GsMTLR, a group sparse multi-task linear regression method designed to harness summary statistics from genome-wide association studies (GWAS) of neuroimaging quantitative traits. S-GsMTLR directly employs GWAS summary statistics, bypassing the requirement for raw imaging genetic data, and applies multivariate multi-task sparse learning to these univariate GWAS results. It amalgamates the strengths of conventional sparse learning methods, including sophisticated modeling techniques and efficient feature selection. Additionally, we implemented a rapid optimization strategy to alleviate computational burdens by identifying genetic variants associated with phenotypes of interest across the entire chromosome. We first evaluated S-GsMTLR using summary statistics derived from the Alzheimer's Disease Neuroimaging Initiative. The results were remarkably encouraging, demonstrating its comparability to conventional methods in modeling and identification of risk loci. Furthermore, our method was evaluated with two additional GWAS summary statistics datasets: One focused on white matter microstructures and the other on whole brain imaging phenotypes, where the original individual-level data was unavailable. The results not only highlighted S-GsMTLR's ability to pinpoint significant loci but also revealed intriguing structures within genetic variations and loci that went unnoticed by GWAS. These findings suggest that S-GsMTLR is a promising multivariate sparse learning method in brain imaging genetics. It eliminates the need for original individual-level imaging and genetic data while demonstrating commendable modeling and feature selection capabilities.
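As a rough illustration of the group-sparse idea mentioned in this abstract, the sketch below computes a row-wise ℓ2,1 penalty on a SNP-by-trait coefficient matrix and the matching proximal (group soft-thresholding) step. It is a generic building block of group-sparse multi-task regression, not the authors' S-GsMTLR solver. The function names `l21_norm` and `prox_l21`, the choice of one group per SNP row, and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per SNP)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def prox_l21(W, tau):
    """Group soft-thresholding: shrink each row of W toward zero by tau.

    Rows whose norm is at most tau become exactly zero, which drops that
    SNP for every trait (task) at once.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zeros.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale


# Hypothetical coefficients: 2 SNPs (rows) x 2 imaging traits (columns).
W = np.array([[3.0, 4.0], [0.1, 0.1]])
W_shrunk = prox_l21(W, tau=1.0)
```

In this toy case the first row (norm 5) is scaled by 0.8 and the second row (norm below 1) is removed entirely, which is the joint feature selection behaviour that a group-sparse penalty is meant to provide.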
Affiliation(s)
- Duo Xi
- Northwestern Polytechnical University, Xi'an, 710072, China
- Dingnan Cui
- Northwestern Polytechnical University, Xi'an, 710072, China
- Jin Zhang
- Northwestern Polytechnical University, Xi'an, 710072, China
- Muheng Shang
- Northwestern Polytechnical University, Xi'an, 710072, China
- Lei Guo
- Northwestern Polytechnical University, Xi'an, 710072, China
- Junwei Han
- Northwestern Polytechnical University, Xi'an, 710072, China
- Lei Du
- Northwestern Polytechnical University, Xi'an, 710072, China
2
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
3
Dubovik T, Lukačišin M, Starosvetsky E, LeRoy B, Normand R, Admon Y, Alpert A, Ofran Y, G'Sell M, Shen-Orr SS. Interactions between immune cell types facilitate the evolution of immune traits. Nature 2024; 632:350-356. [PMID: 38866051 PMCID: PMC11306095 DOI: 10.1038/s41586-024-07661-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 06/04/2024] [Indexed: 06/14/2024]
Abstract
An essential prerequisite for evolution by natural selection is variation among individuals in traits that affect fitness [1]. The ability of a system to produce selectable variation, known as evolvability [2], thus markedly affects the rate of evolution. Although the immune system is among the fastest-evolving components in mammals [3], the sources of variation in immune traits remain largely unknown [4,5]. Here we show that an important determinant of the immune system's evolvability is its organization into interacting modules represented by different immune cell types. By profiling immune cell variation in bone marrow of 54 genetically diverse mouse strains from the Collaborative Cross [6], we found that variation in immune cell frequencies is polygenic and that many associated genes are involved in homeostatic balance through cell-intrinsic functions of proliferation, migration and cell death. However, we also found genes associated with the frequency of a particular cell type that are expressed in a different cell type, exerting their effect in what we term cyto-trans. The vertebrate evolutionary record shows that genes associated in cyto-trans have faced weaker negative selection, thus increasing the robustness and hence evolvability [2,7,8] of the immune system. This phenomenon is similarly observable in human blood. Our findings suggest that interactions between different components of the immune system provide a phenotypic space in which mutations can produce variation with little detriment, underscoring the role of modularity in the evolution of complex systems [9].
Affiliation(s)
- Tania Dubovik
- Department of Immunology, Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
- CytoReason, Tel-Aviv, Israel
- Martin Lukačišin
- Department of Immunology, Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
- Elina Starosvetsky
- Department of Immunology, Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
- CytoReason, Tel-Aviv, Israel
- Benjamin LeRoy
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA
- Nike, Beaverton, OR, USA
- Rachelly Normand
- Department of Immunology, Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
- Massachusetts General Hospital, Boston, MA, USA
- Yasmin Admon
- Department of Immunology, Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
- CytoReason, Tel-Aviv, Israel
- Ayelet Alpert
- Department of Immunology, Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
- Department of Oncology, Rambam Health Care Campus, Haifa, Israel
- Yishai Ofran
- Department of Immunology, Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
- Department of Haematology and Bone Marrow Transplantation, Rambam Health Care Campus, Haifa, Israel
- Haematology and Bone Marrow Transplantation Department and the Eisenberg R&D Authority, Shaare Zedek Medical Centre, Faculty of Medicine, Hebrew University, Jerusalem, Israel
- Max G'Sell
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA
- Shai S Shen-Orr
- Department of Immunology, Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel.
4
Deng Q, Song C, Lin S. An adaptive and robust method for multi-trait analysis of genome-wide association studies using summary statistics. Eur J Hum Genet 2024; 32:681-690. [PMID: 37237036 PMCID: PMC11153499 DOI: 10.1038/s41431-023-01389-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 05/01/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023] Open
Abstract
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human traits or diseases in the past decade. Nevertheless, much of the heritability of many traits is still unaccounted for. Commonly used single-trait analysis methods are conservative, while multi-trait methods improve statistical power by integrating association evidence across multiple traits. In contrast to individual-level data, GWAS summary statistics are usually publicly available, and thus methods using only summary statistics have greater usage. Although many methods have been developed for joint analysis of multiple traits using summary statistics, there are many issues, including inconsistent performance, computational inefficiency, and numerical problems when considering lots of traits. To address these challenges, we propose a multi-trait adaptive Fisher method for summary statistics (MTAFS), a computationally efficient method with robust power performance. We applied MTAFS to two sets of brain imaging derived phenotypes (IDPs) from the UK Biobank, including a set of 58 Volumetric IDPs and a set of 212 Area IDPs. Through annotation analysis, the underlying genes of the SNPs identified by MTAFS were found to exhibit higher expression and are significantly enriched in brain-related tissues. Together with results from a simulation study, MTAFS shows its advantage over existing multi-trait methods, with robust performance across a range of underlying settings. It controls type 1 error well and can efficiently handle a large number of traits.
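For orientation, the classical (non-adaptive) Fisher combination that methods of this family build on can be sketched as follows. This is not MTAFS itself: the abstract describes an adaptive procedure designed for many correlated traits, whereas this sketch assumes independent p-values. The function name `fisher_combine` is hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    """
    k = len(pvalues)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


# Two hypothetical per-trait p-values for the same SNP.
combined = fisher_combine([0.05, 0.05])
```

With correlated traits the independence assumption fails and this p-value would be too small, which is one reason summary-statistics methods estimate the trait correlation first.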
Affiliation(s)
- Qiaolan Deng
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, USA
- Department of Statistics, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
- Chi Song
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, USA
- Shili Lin
- Department of Statistics, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA.
5
Garrido-Martín D, Calvo M, Reverter F, Guigó R. A fast non-parametric test of association for multiple traits. Genome Biol 2023; 24:230. [PMID: 37828616 PMCID: PMC10571397 DOI: 10.1186/s13059-023-03076-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 09/27/2023] [Indexed: 10/14/2023] Open
Abstract
The increasing availability of multidimensional phenotypic data in large cohorts of genotyped individuals requires efficient methods to identify genetic effects on multiple traits. Permutational multivariate analysis of variance (PERMANOVA) offers a powerful non-parametric approach. However, it relies on permutations to assess significance, which hinders the analysis of large datasets. Here, we derive the limiting null distribution of the PERMANOVA test statistic, providing a framework for the fast computation of asymptotic p values. Our asymptotic test presents controlled type I error and high power, often outperforming parametric approaches. We illustrate its applicability in the context of QTL mapping and GWAS.
Affiliation(s)
- Diego Garrido-Martín
- Department of Genetics, Microbiology and Statistics, Universitat de Barcelona (UB), Av. Diagonal 643, Barcelona, 08028, Spain.
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Catalonia, Spain.
- Miquel Calvo
- Department of Genetics, Microbiology and Statistics, Universitat de Barcelona (UB), Av. Diagonal 643, Barcelona, 08028, Spain
- Ferran Reverter
- Department of Genetics, Microbiology and Statistics, Universitat de Barcelona (UB), Av. Diagonal 643, Barcelona, 08028, Spain
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
6
Kim K, Jun TH, Ha BK, Wang S, Sun H. New statistical selection method for pleiotropic variants associated with both quantitative and qualitative traits. BMC Bioinformatics 2023; 24:381. [PMID: 37817069 PMCID: PMC10563219 DOI: 10.1186/s12859-023-05505-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Accepted: 09/28/2023] [Indexed: 10/12/2023] Open
Abstract
BACKGROUND: Identification of pleiotropic variants associated with multiple phenotypic traits has received increasing attention in genetic association studies. Overlapping genetic associations from multiple traits help to detect weak genetic associations missed by single-trait analyses. Many statistical methods have been developed to identify pleiotropic variants, but most of them are limited to quantitative traits, even though pleiotropic effects on both quantitative and qualitative traits have been observed. This is a statistically challenging problem because no appropriate multivariate distribution exists for modeling quantitative and qualitative data together. Alternatively, meta-analysis methods can be applied; these integrate summary statistics of individual variants associated with either a quantitative or a qualitative trait, without accounting for correlations among genetic variants.
RESULTS: We propose a new statistical selection method based on a unified selection score that quantifies how strongly a genetic variant, i.e., a candidate pleiotropic variant, is associated with both quantitative and qualitative traits. In extensive simulation studies covering various types of pleiotropic effects on both quantitative and qualitative traits, the proposed method outperformed the existing meta-analysis methods in terms of true positive selection. We also applied the proposed method to a peanut dataset with 6 quantitative and 2 qualitative traits, and a cowpea dataset with 2 quantitative and 6 qualitative traits. In both analyses we detected some potentially pleiotropic variants missed by the existing methods.
CONCLUSIONS: The proposed method is able to locate pleiotropic variants associated with both quantitative and qualitative traits. It has been implemented in an R package, 'UNISS', which can be downloaded from http://github.com/statpng/uniss.
Affiliation(s)
- Kipoong Kim
- Department of Statistics, Pusan National University, 46241, Busan, Korea
- Tae-Hwan Jun
- Department of Plant Bioscience, Pusan National University, 50463, Miryang, Korea
- Bo-Keun Ha
- Department of Applied Plant Science, Chonnam National University, 61186, Gwangju, Korea
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, 10032, USA
- Hokeun Sun
- Department of Statistics, Pusan National University, 46241, Busan, Korea.
7
Zhai S, Guo B, Wu B, Mehrotra DV, Shen J. Integrating multiple traits for improving polygenic risk prediction in disease and pharmacogenomics GWAS. Brief Bioinform 2023:7169140. [PMID: 37200155 DOI: 10.1093/bib/bbad181] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/22/2023] [Revised: 03/30/2023] [Accepted: 04/21/2023] [Indexed: 05/20/2023] Open
Abstract
Polygenic risk score (PRS) has been recently developed for predicting complex traits and drug responses. It remains unknown whether multi-trait PRS (mtPRS) methods, by integrating information from multiple genetically correlated traits, can improve prediction accuracy and power for PRS analysis compared with single-trait PRS (stPRS) methods. In this paper, we first review commonly used mtPRS methods and find that they do not directly model the underlying genetic correlations among traits, which has been shown to be useful in guiding multi-trait association analysis in the literature. To overcome this limitation, we propose a mtPRS-PCA method to combine PRSs from multiple traits with weights obtained from performing principal component analysis (PCA) on the genetic correlation matrix. To accommodate various genetic architectures covering different effect directions, signal sparseness and across-trait correlation structures, we further propose an omnibus mtPRS method (mtPRS-O) by combining P values from mtPRS-PCA, mtPRS-ML (mtPRS based on machine learning) and stPRSs using Cauchy Combination Test. Our extensive simulation studies show that mtPRS-PCA outperforms other mtPRS methods in both disease and pharmacogenomics (PGx) genome-wide association studies (GWAS) contexts when traits are similarly correlated, with dense signal effects and in similar effect directions, and mtPRS-O is consistently superior to most other methods due to its robustness under various genetic architectures. We further apply mtPRS-PCA, mtPRS-O and other methods to PGx GWAS data from a randomized clinical trial in the cardiovascular domain and demonstrate performance improvement of mtPRS-PCA in both prediction accuracy and patient stratification as well as the robustness of mtPRS-O in PRS association test.
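The PCA-weighting step described in this abstract can be pictured with a small sketch: take a genetic correlation matrix, use the loadings of its leading principal component as weights, and form a weighted sum of standardized single-trait PRSs. This is only a reading of the abstract under stated assumptions, not the authors' mtPRS-PCA implementation. Using only the first component, the sign convention, the function names `pca_weights` and `combine_prs`, and all numbers are assumptions.

```python
import numpy as np


def pca_weights(genetic_corr):
    """Loadings of the leading principal component of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(genetic_corr)  # eigenvalues ascending
    w = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
    # Eigenvectors are defined up to sign; fix it so the weights sum >= 0.
    return -w if w.sum() < 0 else w


def combine_prs(prs_matrix, weights):
    """Weighted sum of column-standardized PRSs (rows = individuals)."""
    z = (prs_matrix - prs_matrix.mean(axis=0)) / prs_matrix.std(axis=0)
    return z @ weights


# Hypothetical genetic correlation between two traits and toy PRS values.
genetic_corr = np.array([[1.0, 0.5], [0.5, 1.0]])
weights = pca_weights(genetic_corr)
prs = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
mt_prs = combine_prs(prs, weights)
```

For two equally correlated traits the leading component weights both traits equally, so the combined score reduces to a scaled average of the two standardized PRSs.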
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Bin Guo
- Data and Genome Science, Merck & Co., Inc., Cambridge, MA 02141, USA
- Baolin Wu
- Department of Epidemiology and Biostatistics, University of California Irvine, Irvine, CA 92697, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
8
Wei Q, Chen L, Zhou Y, Wang H. An adaptive test based on principal components for detecting multiple phenotype associations using GWAS summary data. Genetica 2023; 151:97-104. [PMID: 36656460 DOI: 10.1007/s10709-023-00179-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 01/11/2023] [Indexed: 01/20/2023]
Abstract
Extensive evidence from genome-wide association studies (GWAS) has shown that jointly analyzing multiple phenotypes can improve the power of the association test compared to the traditional single variant versus single trait approach. Here we propose an adaptive test based on principal components (ATPC) that is powerful and efficient for discovering the association between a single variant and multiple traits. Our method only needs GWAS summary statistics that are often available. We first estimate the trait correlation matrix by LD score regression. Then, based on the correlation matrix, we construct a series of test statistics that contain different numbers of principal components. The ultimate test statistic combines the P values of these principal component-based statistics by using the aggregated Cauchy association test. The analytical P-value of the test statistic can be computed quickly without the permutation process, which is the notable feature of our proposed method. The extensive simulation studies demonstrate that ATPC can control the type I error rates and have powerful and robust performance compared to several existing tests in a wide range of simulation settings. The analysis of the lipids GWAS summary data from the Global Lipids Genetics Consortium shows that ATPC identifies 230 new SNPs that are missed by the original single trait association analysis. By searching the GWAS Catalog, some SNPs and mapped genes identified by ATPC are reported to be associated with lipid traits. Through further analysis for GWAS results, we also find some Gene Ontology terms and biological pathways related to lipids.
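The p-value combination step mentioned in this abstract can be illustrated with the standard Cauchy combination formula, in which each p-value is mapped to a Cauchy-distributed quantity, the quantities are averaged with weights, and the result is mapped back to a p-value. This is a minimal sketch of that general technique, not the ATPC implementation. The function name `cauchy_combine`, the default of equal weights, and the example values are assumptions.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values via the Cauchy combination (ACAT-style) formula.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights summing to one,
    and the combined p-value is 0.5 - arctan(T) / pi.
    """
    if not pvalues or any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must be non-empty and lie strictly in (0, 1)")
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    total_w = sum(weights)
    t = sum(w / total_w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values from statistics built on 1, 2 and 3 principal components.
combined = cauchy_combine([0.001, 0.2, 0.6])
```

Because the formula is analytic, no permutation is required, which matches the computational advantage the abstract attributes to this kind of test.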
Affiliation(s)
- Qianran Wei
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China
- Lili Chen
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China.
- Yajing Zhou
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China
- Huiyi Wang
- Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin, 150080, China
9
Du L, Zhang J, Zhao Y, Shang M, Guo L, Han J. inMTSCCA: An Integrated Multi-task Sparse Canonical Correlation Analysis for Multi-omic Brain Imaging Genetics. Genomics, Proteomics & Bioinformatics 2023; 21:396-413. [PMID: 37442417 PMCID: PMC10634656 DOI: 10.1016/j.gpb.2023.03.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/13/2021] [Revised: 01/29/2023] [Accepted: 03/14/2023] [Indexed: 07/15/2023]
Abstract
Identifying genetic risk factors for Alzheimer's disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown the great value in uncovering risk genes compared to case-control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from the shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of cross-endophenotype (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, i.e., pairwise endophenotype correlation-guided MTSCCA (pcMTSCCA) and high-order endophenotype correlation-guided MTSCCA (hocMTSCCA). pcMTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. hocMTSCCA used high-order correlations among these multi-omic data for regularization. To figure out genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared pcMTSCCA and hocMTSCCA with three related methods on both simulation and real (consisting of neuroimaging data, proteomic analytes, and genetic data) datasets. The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using multi-omic endophenotypes and their CEP associations is promising to reveal genetic risk factors. 
The source code and manual of inMTSCCA are available at https://ngdc.cncb.ac.cn/biocode/tools/BT007330.
Affiliation(s)
- Lei Du
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.
- Jin Zhang
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Ying Zhao
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Muheng Shang
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lei Guo
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Junwei Han
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
10
A clustering linear combination method for multiple phenotype association studies based on GWAS summary statistics. Sci Rep 2023; 13:3389. [PMID: 36854754 PMCID: PMC9975197 DOI: 10.1038/s41598-023-30415-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 02/22/2023] [Indexed: 03/02/2023] Open
Abstract
There is strong evidence showing that joint analysis of multiple phenotypes in genome-wide association studies (GWAS) can increase statistical power when detecting the association between genetic variants and human complex diseases. We previously developed the Clustering Linear Combination (CLC) method and a computationally efficient CLC (ceCLC) method to test the association between multiple phenotypes and a genetic variant, which perform very well. However, both of these methods require individual-level genotypes and phenotypes that are often not easily accessible. In this research, we develop a novel method called sCLC for association studies of multiple phenotypes and a genetic variant based on GWAS summary statistics. We use the LD score regression to estimate the correlation matrix among phenotypes. The test statistic of sCLC is constructed by GWAS summary statistics and has an approximate Cauchy distribution. We perform a variety of simulation studies and compare sCLC with other commonly used methods for multiple phenotype association studies using GWAS summary statistics. Simulation results show that sCLC can control Type I error rates well and has the highest power in most scenarios. Moreover, we apply the newly developed method to the UK Biobank GWAS summary statistics from the XIII category with 70 related musculoskeletal system and connective tissue phenotypes. The results demonstrate that sCLC detects the most number of significant SNPs, and most of these identified SNPs can be matched to genes that have been reported in the GWAS catalog to be associated with those phenotypes. Furthermore, sCLC also identifies some novel signals that were missed by standard GWAS, which provide new insight into the potential genetic factors of the musculoskeletal system and connective tissue phenotypes.
11
Cappa EP, Chen C, Klutsch JG, Sebastian-Azcona J, Ratcliffe B, Wei X, Da Ros L, Ullah A, Liu Y, Benowicz A, Sadoway S, Mansfield SD, Erbilgin N, Thomas BR, El-Kassaby YA. Multiple-trait analyses improved the accuracy of genomic prediction and the power of genome-wide association of productivity and climate change-adaptive traits in lodgepole pine. BMC Genomics 2022; 23:536. [PMID: 35870886 PMCID: PMC9308220 DOI: 10.1186/s12864-022-08747-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 07/08/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Genomic prediction (GP) and genome-wide association (GWA) analyses are currently being employed to accelerate breeding cycles and to identify alleles or genomic regions of complex traits in forest tree species. Here, 1490 interior lodgepole pine (Pinus contorta Dougl. ex. Loud. var. latifolia Engelm) trees from four open-pollinated progeny trials were genotyped with 25,099 SNPs, and phenotyped for 15 growth, wood quality, pest resistance, drought tolerance, and defense chemical (monoterpenes) traits. The main objectives of this study were to: (1) identify genetic markers associated with these traits and determine their genetic architecture, and compare the markers detected by single- (ST) and multiple-trait (MT) GWA models; (2) evaluate and compare the accuracy and control of bias of the genomic predictions for these traits under different ST and MT parametric and non-parametric GP methods. For GWA, ST and MT analyses were compared using a linear transformation of genomic breeding values from the respective genomic best linear unbiased prediction (GBLUP) model. For GP, ST and MT parametric and non-parametric (Reproducing Kernel Hilbert Spaces, RKHS) models were compared in terms of prediction accuracy (PA) and control of bias.
Results: MT-GWA analyses identified more significant associations than ST. Some SNPs showed potential pleiotropic effects. Averaging across traits, PA from the studied ST-GP models did not differ significantly from each other, with generally a slight superiority of the RKHS method. MT-GP models showed significantly higher PA (and lower bias) than the ST models, and the PA (bias) of the RKHS approach was generally significantly higher (lower) than that of GBLUP.
Conclusions: The power of GWA and the accuracy of GP were improved when MT models were used in this lodgepole pine population. Given the number of GP and GWA models fitted and the traits assessed across four progeny trials, this work has produced the most comprehensive empirical genomic study across any lodgepole pine population to date.
12
Single trait versus principal component based association analysis for flowering related traits in pigeonpea. Sci Rep 2022; 12:10453. [PMID: 35729192 PMCID: PMC9211048 DOI: 10.1038/s41598-022-14568-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/05/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022] Open
Abstract
Pigeonpea, a tropical photosensitive crop, harbors significant diversity for days to flowering, but little is known about the genes that govern these differences. Our goal in the current study was to use a genome-wide association strategy to discover the loci that regulate days to flowering in pigeonpea. Single-trait as well as principal component (PC) based association studies were conducted on a diverse collection of 142 pigeonpea lines for days to first and fifty percent flowering over 3 years, as well as plant height and number of seeds per pod. The analysis used seven association mapping models (GLM, MLM, MLMM, CMLM, EMLM, FarmCPU and SUPER), and further comparison revealed that FarmCPU is more robust in controlling both false positives and false negatives, as it incorporates multiple markers as covariates to eliminate confounding between the testing marker and kinship. Cumulatively, a set of 22 SNPs was found to be associated with either days to first flowering (DOF), days to fifty percent flowering (DFF) or both, of which 15 were unique to trait-based GWAS, 4 to PC-based GWAS, and 3 were shared by both. Because PC1 represents DOF, DFF and plant height (PH), four SNPs found associated with PC1 can be inferred to be pleiotropic. A window of ± 2 kb around the associated SNPs was aligned with available transcriptome data generated for the transition from vegetative to reproductive phase in pigeonpea. Annotation analysis of these regions revealed the presence of genes which might be involved in floral induction, such as Cytochrome p450 like Tata box binding protein, Auxin response factors, Pin like genes, F box protein, U box domain protein, chromatin remodelling complex protein, and RNA methyltransferase. In summary, it appears that auxin responsive genes could be involved in regulating DOF and DFF, as the majority of the associated loci contained genes in their vicinity that are components of auxin signaling pathways.
Overall, our findings indicate that the use of principal component analysis in GWAS is statistically more robust in terms of identifying genes, and that FarmCPU is a better choice than the other aforementioned models in dealing with both false positive and false negative associations and thus can be used for traits with complex inheritance.
|
13
|
Vilor-Tejedor N, Garrido-Martín D, Rodriguez-Fernandez B, Lamballais S, Guigó R, Gispert JD. Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO! Comput Struct Biotechnol J 2021; 19:5800-5810. [PMID: 34765095 PMCID: PMC8567328 DOI: 10.1016/j.csbj.2021.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/22/2021] [Revised: 10/08/2021] [Accepted: 10/12/2021] [Indexed: 12/01/2022]
Abstract
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.
Affiliation(s)
- Natalia Vilor-Tejedor
  - Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
  - Universitat Pompeu Fabra, Barcelona, Spain
- Diego Garrido-Martín
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Sander Lamballais
  - Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Juan Domingo Gispert
  - Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
  - Centro de Investigación Biomédica en Red Bioingeniería, Biomateriales y Nanomedicina, Madrid, Spain
|
14
|
Wolf JM, Westra J, Tintle N. Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates. Front Genet 2021; 12:745901. [PMID: 34712269 PMCID: PMC8546319 DOI: 10.3389/fgene.2021.745901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2021] [Accepted: 09/23/2021] [Indexed: 12/03/2022]
Abstract
While the promise of electronic medical record and biobank data is large, major questions remain about patient privacy, computational hurdles, and data access. One promising area of recent development is pre-computing non-individually identifiable summary statistics to be made publicly available for exploration and downstream analysis. In this manuscript we demonstrate how to utilize pre-computed linear association statistics between individual genetic variants and phenotypes to infer genetic relationships between products of phenotypes (e.g., ratios; logical combinations of binary phenotypes using "and" and "or") with customized covariate choices. We propose a method to approximate covariate adjusted linear models for products and logical combinations of phenotypes using only pre-computed summary statistics. We evaluate our method's accuracy through several simulation studies and an application modeling ratios of fatty acids using data from the Framingham Heart Study. These studies show consistent ability to recapitulate analysis results performed on individual level data including maintenance of the Type I error rate, power, and effect size estimates. An implementation of this proposed method is available in the publicly available R package pcsstools.
Affiliation(s)
- Jack M. Wolf
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States
- Jason Westra
  - Department of Mathematics, Computer Science, and Statistics, Dordt University, Sioux Center, IA, United States
- Nathan Tintle
  - Department of Mathematics, Computer Science, and Statistics, Dordt University, Sioux Center, IA, United States
  - Department of Population Health Nursing Science, College of Nursing, University of Illinois Chicago, Chicago, IL, United States
|
15
|
Julienne H, Laville V, McCaw ZR, He Z, Guillemot V, Lasry C, Ziyatdinov A, Nerin C, Vaysse A, Lechat P, Ménager H, Le Goff W, Dube MP, Kraft P, Ionita-Laza I, Vilhjálmsson BJ, Aschard H. Multitrait GWAS to connect disease variants and biological mechanisms. PLoS Genet 2021; 17:e1009713. [PMID: 34460823 PMCID: PMC8437297 DOI: 10.1371/journal.pgen.1009713] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/20/2021] [Revised: 09/13/2021] [Accepted: 07/12/2021] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.
Affiliation(s)
- Hanna Julienne
  - Department of Computational Biology, Institut Pasteur, Paris, France
- Vincent Laville
  - Department of Computational Biology, Institut Pasteur, Paris, France
- Zachary R. McCaw
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California, United States of America
- Vincent Guillemot
  - Department of Computational Biology, Institut Pasteur, Paris, France
- Carla Lasry
  - Department of Computational Biology, Institut Pasteur, Paris, France
- Andrey Ziyatdinov
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Cyril Nerin
  - Department of Computational Biology, Institut Pasteur, Paris, France
- Amaury Vaysse
  - Department of Computational Biology, Institut Pasteur, Paris, France
- Pierre Lechat
  - Department of Computational Biology, Institut Pasteur, Paris, France
- Hervé Ménager
  - Department of Computational Biology, Institut Pasteur, Paris, France
- Wilfried Le Goff
  - Sorbonne Université, INSERM, Institute of Cardiometabolism and Nutrition (ICAN), UMR_S 1166, Paris, France
- Marie-Pierre Dube
  - Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, Canada
  - Université de Montréal, Faculty of Medicine, Department of medicine, Université de Montréal, Montreal, Canada
- Peter Kraft
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Iuliana Ionita-Laza
  - Department of Biostatistics, Columbia University, New York, New York, United States of America
- Bjarni J. Vilhjálmsson
  - National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Hugues Aschard
  - Department of Computational Biology, Institut Pasteur, Paris, France
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
|
16
|
Wu C. Multi-trait Genome-Wide Analyses of the Brain Imaging Phenotypes in UK Biobank. Genetics 2020; 215:947-958. [PMID: 32540950 PMCID: PMC7404235 DOI: 10.1534/genetics.120.303242] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/03/2019] [Accepted: 06/09/2020] [Indexed: 01/08/2023]
Abstract
Many genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated, traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits, and may yield inflated Type 1 error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are rapidly becoming available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating the Type 1 error rate was well controlled. More importantly, aMAT identified 24 distinct risk loci, 13 of which were missed by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified far fewer risk loci. Finally, four additional sets of traits were analyzed and yielded similar conclusions.
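The genomic inflation factor quoted in this abstract is conventionally computed as the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.455). The following is a minimal sketch of that standard definition, not aMAT's own code; the function name `genomic_inflation_factor` and the assumption that the input is a list of two-sided p-values are illustrative choices.

```python
from statistics import NormalDist, median

# Expected median of a chi-square(1) variable: the square of the
# 75th-percentile standard normal quantile (about 0.4549).
EXPECTED_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Return lambda_GC for a list of two-sided p-values (hypothetical helper)."""
    norm = NormalDist()
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # Two-sided p-value -> z-score via the lower tail, which stays
        # numerically stable for very small p; chi-square(1) = z ** 2.
        z = norm.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2
```

A value close to 1 indicates test statistics that are well calibrated in bulk, while values well above 1 suggest inflation from confounding or a miscalibrated test.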
Affiliation(s)
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, Florida 32306
|
17
|
Nguyen TH, Dobbyn A, Brown RC, Riley BP, Buxbaum JD, Pinto D, Purcell SM, Sullivan PF, He X, Stahl EA. mTADA is a framework for identifying risk genes from de novo mutations in multiple traits. Nat Commun 2020; 11:2929. [PMID: 32522981 PMCID: PMC7287090 DOI: 10.1038/s41467-020-16487-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/02/2018] [Accepted: 05/06/2020] [Indexed: 11/12/2022]
Abstract
Joint analysis of multiple traits can result in the identification of associations not found through the analysis of each trait in isolation. Studies of neuropsychiatric disorders and congenital heart disease (CHD) which use de novo mutations (DNMs) from parent-offspring trios have reported multiple putatively causal genes. However, a joint analysis method designed to integrate DNMs from multiple studies has yet to be implemented. We here introduce multiple-trait TADA (mTADA) which jointly analyzes two traits using DNMs from non-overlapping family samples. We first demonstrate that mTADA is able to leverage genetic overlaps to increase the statistical power of risk-gene identification. We then apply mTADA to large datasets of >13,000 trios for five neuropsychiatric disorders and CHD. We report additional risk genes for schizophrenia, epileptic encephalopathies and CHD. We outline some shared and specific biological information of intellectual disability and CHD by conducting systems biology analyses of genes prioritized by mTADA.
Affiliation(s)
- Tan-Hoang Nguyen
  - Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Amanda Dobbyn
  - Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ruth C Brown
  - Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Brien P Riley
  - Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Joseph D Buxbaum
  - Seaver Autism Center, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Dalila Pinto
  - Seaver Autism Center, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - The Mindich Child Health & Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Shaun M Purcell
  - Sleep Center, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Patrick F Sullivan
  - Departments of Genetics and Psychiatry, University of North Carolina, Chapel Hill, NC, USA
- Xin He
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Grossman Institute for Neuroscience, Quantitative Biology and Human Behavior, University of Chicago, Chicago, IL, USA
- Eli A Stahl
  - Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
18
|
Lin J, Tabassum R, Ripatti S, Pirinen M. MetaPhat: Detecting and Decomposing Multivariate Associations From Univariate Genome-Wide Association Statistics. Front Genet 2020; 11:431. [PMID: 32499813 PMCID: PMC7242752 DOI: 10.3389/fgene.2020.00431] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/04/2020] [Accepted: 04/07/2020] [Indexed: 11/21/2022]
Abstract
Background: Multivariate testing tools that integrate multiple genome-wide association studies (GWAS) have become important as the number of phenotypes gathered from study cohorts and biobanks has increased. While these tools have been shown to boost statistical power considerably over univariate tests, an important remaining challenge is to interpret which traits are driving the multivariate association and which traits are just passengers with minor contributions to the genotype-phenotype association statistic. Results: We introduce MetaPhat, a novel bioinformatics tool to conduct GWAS of multiple correlated traits using univariate GWAS results and to decompose multivariate associations into sets of central traits based on intuitive trace plots that visualize Bayesian Information Criterion (BIC) and P-value statistics of multivariate association models. We validate MetaPhat with Global Lipids Genetics Consortium GWAS results, and we apply MetaPhat to univariate GWAS results for 21 heritable and correlated polyunsaturated lipid species from 2,045 Finnish samples, detecting seven independent loci associated with a cluster of lipid species. In most cases, we are able to decompose these multivariate associations to only three to five central traits out of all 21 traits included in the analyses. We release MetaPhat as an open source tool written in Python with built-in support for multi-processing, quality control, clumping and intuitive visualizations using the R software. Conclusion: MetaPhat efficiently decomposes associations between multivariate phenotypes and genetic variants into smaller sets of central traits and improves the interpretation and specificity of genome-phenome associations. MetaPhat is freely available under the MIT license at: https://sourceforge.net/projects/meta-pheno-association-tracer.
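To illustrate how a multivariate association can be formed from univariate GWAS results alone, the sketch below computes a generic quadratic-form statistic T = zᵀR⁻¹z for one variant, where z holds the per-trait z-scores and R is an estimated trait correlation matrix. This is a textbook summary-statistic test shown under stated assumptions, not MetaPhat's implementation; the names `solve_linear_system` and `multivariate_chi2` are hypothetical, and R is assumed to be supplied and non-singular.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs are not modified.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("trait correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def multivariate_chi2(z_scores, trait_corr):
    """Return T = z' R^{-1} z for one variant (hypothetical helper).

    Under the null of no association with any trait, T is approximately
    chi-square distributed with len(z_scores) degrees of freedom.
    """
    weights = solve_linear_system(trait_corr, z_scores)
    return sum(z * w for z, w in zip(z_scores, weights))
```

For two traits with correlation 0.5, z-scores of (2, -2) give T = 16 while (2, 2) give T ≈ 5.33, which shows how effects that run against the trait correlation can yield a stronger joint signal than either univariate test suggests.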
Affiliation(s)
- Jake Lin
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science HiLIFE, University of Helsinki, Helsinki, Finland
- Rubina Tabassum
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science HiLIFE, University of Helsinki, Helsinki, Finland
- Samuli Ripatti
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science HiLIFE, University of Helsinki, Helsinki, Finland
  - Public Health, University of Helsinki, Helsinki, Finland
  - Broad Institute, Massachusetts Institute of Technology, Harvard University, Cambridge, MA, United States
- Matti Pirinen
  - Institute for Molecular Medicine Finland FIMM, Helsinki Institute of Life Science HiLIFE, University of Helsinki, Helsinki, Finland
  - Public Health, University of Helsinki, Helsinki, Finland
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
19
|
Effect of non-normality and low count variants on cross-phenotype association tests in GWAS. Eur J Hum Genet 2019; 28:300-312. [PMID: 31582815 DOI: 10.1038/s41431-019-0514-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/03/2018] [Revised: 09/01/2019] [Accepted: 09/05/2019] [Indexed: 01/21/2023]
Abstract
Many complex human diseases, such as type 2 diabetes, are characterized by multiple underlying traits/phenotypes that have substantially shared genetic architecture. Multivariate analysis of correlated traits has the potential to increase the power of detecting underlying common genetic loci. Several cross-phenotype association methods have been proposed; some require individual-level data on traits and genotypes, while others require only summary-level data. In this article, we explore whether non-normality of the multivariate trait distribution affects the inference from some of the existing multi-trait methods and how that effect depends on the allele count of the genetic variant being tested. We find that most of these tests are susceptible to biases that lead to spurious association signals. Even after controlling for confounders that may contribute to non-normality and then applying inverse normal transformation on the residuals of each trait, these tests may have inflated type I errors for variants with low minor allele counts (MACs). A likelihood ratio test of association based on the ordinal regression of individual-level genotype conditional on the traits seems to be the least biased and can maintain type I error when the MAC is reasonably large (e.g., MAC > 30). Application of these methods to publicly available summary statistics of eight amino acid traits on European samples seems to exhibit systematic inflation (especially for variants with low MAC), which is consistent with our findings from simulation experiments.
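The inverse normal transformation mentioned in this abstract is commonly implemented as a rank-based mapping of trait residuals onto standard normal quantiles. The sketch below uses the Blom offset (c = 0.375) with average ranks for ties; both choices, and the function name `inverse_normal_transform`, are assumptions for illustration, since the abstract does not specify which variant the authors applied.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transform of a list of numbers (hypothetical helper)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Find the run of tied values and give each the average 1-based rank.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    norm = NormalDist()
    # Blom-style quantile: (rank - c) / (n - 2c + 1), always inside (0, 1).
    return [norm.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0)) for r in ranks]
```

The transform preserves the ordering of the residuals and makes each trait marginally close to normal, but, as the abstract cautions, that alone may not prevent inflated type I error for variants with low minor allele counts.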
|