Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Lee D, Williamson VS, Bigdeli TB, Riley BP, Fanous AH, Vladimirov VI, Bacanu SA. JEPEG: a summary statistics based tool for gene-level joint testing of functional variants. Bioinformatics 2014;31:1176-82. [PMID: 25505091 PMCID: PMC4393522 DOI: 10.1093/bioinformatics/btu816] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 07/09/2014] [Accepted: 12/07/2014] [Indexed: 01/03/2023]

Cited by Other Article(s)

Lee D, Bacanu SA. GAUSS: a summary-statistics-based R package for accurate estimation of linkage disequilibrium for variants, Gaussian imputation, and TWAS analysis of cosmopolitan cohorts. Bioinformatics 2024;40:btae203. [PMID: 38632050 PMCID: PMC11052653 DOI: 10.1093/bioinformatics/btae203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 02/25/2024] [Accepted: 04/16/2024] [Indexed: 04/19/2024]

Abstract

MOTIVATION

As the availability of larger and more ethnically diverse reference panels grows, there is an increase in demand for ancestry-informed imputation of genome-wide association studies (GWAS), and other downstream analyses, e.g. fine-mapping. Performing such analyses at the genotype level is computationally challenging and necessitates, at best, a laborious process to access individual-level genotype and phenotype data. Summary-statistics-based tools, not requiring individual-level data, provide an efficient alternative that streamlines computational requirements and promotes open science by simplifying the re-analysis and downstream analysis of existing GWAS summary data. However, existing tools perform only disparate parts of needed analysis, have only command-line interfaces, and are difficult to extend/link by applied researchers.

RESULTS

To address these challenges, we present Genome Analysis Using Summary Statistics (GAUSS), a comprehensive and user-friendly R package designed to facilitate the re-analysis/downstream analysis of GWAS summary statistics. GAUSS offers an integrated toolkit for a range of functionalities, including (i) estimating ancestry proportion of study cohorts, (ii) calculating ancestry-informed linkage disequilibrium, (iii) imputing summary statistics of unobserved variants, (iv) conducting transcriptome-wide association studies, and (v) correcting for "Winner's Curse" biases. Notably, GAUSS utilizes an expansive, multi-ethnic reference panel consisting of 32 953 genomes from 29 ethnic groups. This panel enhances the range and accuracy of imputable variants, including the ability to impute summary statistics of rarer variants. As a result, GAUSS elevates the quality and applicability of existing GWAS analyses without requiring access to subject-level genotypic and phenotypic information.

AVAILABILITY AND IMPLEMENTATION

The GAUSS R package, complete with its source code, is readily accessible to the public via our GitHub repository at https://github.com/statsleelab/gauss. To further assist users, we provided illustrative use-case scenarios that are conveniently found at https://statsleelab.github.io/gauss/, along with a comprehensive user guide detailed in Supplementary Text S1.

Moore A, Marks JA, Quach BC, Guo Y, Bierut LJ, Gaddis NC, Hancock DB, Page GP, Johnson EO. Evaluating 17 methods incorporating biological function with GWAS summary statistics to accelerate discovery demonstrates a tradeoff between high sensitivity and high positive predictive value. Commun Biol 2023;6:1199. [PMID: 38001305 PMCID: PMC10673847 DOI: 10.1038/s42003-023-05413-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 10/03/2023] [Indexed: 11/26/2023]

Gedik H, Peterson RE, Riley BP, Vladimirov VI, Bacanu SA. Integrative Post-Genome-Wide Association Study Analyses Relevant to Psychiatric Disorders: Imputing Transcriptome and Proteome Signals. Complex Psychiatry 2023;9:130-144. [PMID: 37588130 PMCID: PMC10425719 DOI: 10.1159/000530223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 03/09/2023] [Indexed: 08/18/2023]

Abstract

Background

The genome-wide association study (GWAS) is a common tool to identify genetic variants associated with complex traits, including psychiatric disorders (PDs). However, post-GWAS analyses are needed to extend the statistical inference to biologically relevant entities, e.g., genes, proteins, and pathways. To achieve this goal, researchers developed methods that incorporate biologically relevant intermediate molecular phenotypes, such as gene expression and protein abundance, which are posited to mediate the variant-trait association. Transcriptome-wide association study (TWAS) and proteome-wide association study (PWAS) are commonly used methods to test the association between these molecular mediators and the trait.

Summary

In this review, we discuss the most recent developments in TWAS and PWAS. These methods integrate existing "omic" information with the GWAS summary statistics for trait(s) of interest. Specifically, they impute transcript/protein data and test the association between imputed gene expression/protein level with phenotype of interest by using (i) GWAS summary statistics and (ii) reference transcriptomic/proteomic/genomic datasets. TWAS and PWAS are suitable as analysis tools for (i) primary association scan and (ii) fine-mapping to identify potentially causal genes for PDs.

Key Messages

As post-GWAS analyses, TWAS and PWAS have the potential to highlight causal genes for PDs. These prioritized genes could indicate targets for the development of novel drug therapies. For researchers attempting such analyses, we recommend Mendelian randomization tools that use GWAS statistics for both trait and reference datasets, e.g., summary-data-based Mendelian randomization (SMR). We base our recommendation on three points: (i) the same tool can be used for both TWAS and PWAS; (ii) pre-computed weights are not required, which makes the analysis easier to update for larger reference datasets; and (iii) most larger transcriptome reference datasets are publicly available and easy to transform into a format compatible with SMR analysis.
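
To make the SMR recommendation concrete, the sketch below computes the approximate SMR test statistic as it is commonly described in the SMR literature (the formula is not stated in the review summarized above): the product of the squared GWAS and QTL Z-scores at the top cis-QTL variant, divided by their sum, compared against a chi-squared distribution with 1 degree of freedom. The function name `smr_test` and the input values are illustrative assumptions, and the sketch leaves out the HEIDI heterogeneity test that usually accompanies SMR.

```python
import math


def smr_test(z_gwas, z_qtl):
    """Approximate SMR statistic and p-value for one gene (hypothetical helper).

    z_gwas: Z-score of the top cis-QTL variant in the trait GWAS.
    z_qtl:  Z-score of the same variant in the eQTL/pQTL reference study.
    """
    num = (z_gwas ** 2) * (z_qtl ** 2)
    den = (z_gwas ** 2) + (z_qtl ** 2)
    if den == 0.0:
        # No signal in either study: statistic 0, p-value 1.
        return 0.0, 1.0
    t_smr = num / den
    # Upper tail of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return t_smr, p_value


# Illustrative values only: a modest GWAS signal and a stronger QTL signal.
t, p = smr_test(3.0, 4.0)
print(f"T_SMR = {t:.2f}, p = {p:.4f}")
```

Because the statistic is bounded by the smaller of the two squared Z-scores, a gene can only reach significance when both the GWAS and the QTL association are strong at the chosen variant.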

Gazal S, Weissbrod O, Hormozdiari F, Dey KK, Nasser J, Jagadeesh KA, Weiner DJ, Shi H, Fulco CP, O'Connor LJ, Pasaniuc B, Engreitz JM, Price AL. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat Genet 2022;54:827-836. [PMID: 35668300 PMCID: PMC9894581 DOI: 10.1038/s41588-022-01087-y] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 07/29/2021] [Accepted: 04/27/2022] [Indexed: 02/04/2023]

Affiliation(s)

Steven Gazal: Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Omer Weissbrod: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Farhad Hormozdiari: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Kushal K Dey: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Joseph Nasser: Broad Institute of MIT and Harvard, Cambridge, MA, USA
Karthik A Jagadeesh: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Daniel J Weiner: Broad Institute of MIT and Harvard, Cambridge, MA, USA
Huwenbo Shi: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Charles P Fulco: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Systems Biology, Harvard Medical School, Boston, MA, USA; Bristol Myers Squibb, Cambridge, MA, USA
Luke J O'Connor: Broad Institute of MIT and Harvard, Cambridge, MA, USA
Bogdan Pasaniuc: Departments of Computational Medicine, Human Genetics, Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
Jesse M Engreitz: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; BASE Initiative, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Stanford University School of Medicine, Stanford, CA, USA
Alkes L Price: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

Xie Y, Shan N, Zhao H, Hou L. Transcriptome wide association studies: general framework and methods. Quantitative Biology 2021. [DOI: 10.15302/j-qb-020-0228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]

Wang L, Xia Y, Chen Y, Dai R, Qiu W, Meng Q, Kuney L, Chen C. Brain Banks Spur New Frontiers in Neuropsychiatric Research and Strategies for Analysis and Validation. Genomics, Proteomics & Bioinformatics 2019;17:402-414. [PMID: 31811942 PMCID: PMC6943778 DOI: 10.1016/j.gpb.2019.02.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/11/2019] [Revised: 02/13/2019] [Accepted: 03/01/2019] [Indexed: 12/27/2022]

Liu Y, Xie J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J Am Stat Assoc 2019;115:393-402. [PMID: 33012899 DOI: 10.1080/01621459.2018.1554485] [Citation(s) in RCA: 153] [Impact Index Per Article: 30.6] [Indexed: 10/27/2022]

Zhang H, Wheeler W, Song L, Yu K. Proper joint analysis of summary association statistics requires the adjustment of heterogeneity in SNP coverage pattern. Brief Bioinform 2018;19:1337-1343. [PMID: 28981575 DOI: 10.1093/bib/bbx072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2017] [Indexed: 11/12/2022]

Rüeger S, McDaid A, Kutalik Z. Evaluation and application of summary statistic imputation to discover new height-associated loci. PLoS Genet 2018;14:e1007371. [PMID: 29782485 PMCID: PMC5983877 DOI: 10.1371/journal.pgen.1007371] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/01/2017] [Revised: 06/01/2018] [Accepted: 04/18/2018] [Indexed: 12/11/2022]

Abstract

As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation boasts a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Chatzinakos C, Lee D, Webb BT, Vladimirov VI, Kendler KS, Bacanu SA. JEPEGMIX2: improved gene-level joint analysis of eQTLs in cosmopolitan cohorts. Bioinformatics 2018;34:286-288. [PMID: 28968763 PMCID: PMC5860197 DOI: 10.1093/bioinformatics/btx509] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/13/2017] [Revised: 07/10/2017] [Accepted: 09/13/2017] [Indexed: 12/17/2022]

Benner C, Havulinna AS, Järvelin MR, Salomaa V, Ripatti S, Pirinen M. Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-wide Association Studies. Am J Hum Genet 2017;101:539-551. [PMID: 28942963 DOI: 10.1016/j.ajhg.2017.08.012] [Citation(s) in RCA: 129] [Impact Index Per Article: 18.4] [Received: 03/21/2017] [Accepted: 08/17/2017] [Indexed: 01/15/2023]

Zhu X, Stephens M. Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann Appl Stat 2017;11:1561-1592. [PMID: 29399241 PMCID: PMC5796536 DOI: 10.1214/17-aoas1046] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Indexed: 07/23/2023]

Gordon D, Londono D, Patel P, Kim W, Finch SJ, Heiman GA. An Analytic Solution to the Computation of Power and Sample Size for Genetic Association Studies under a Pleiotropic Mode of Inheritance. Hum Hered 2017;81:194-209. [PMID: 28315880 DOI: 10.1159/000457135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/01/2016] [Accepted: 01/20/2017] [Indexed: 01/14/2023]

Abstract

Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes.

Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 2016;18:117-127. [PMID: 27840428 DOI: 10.1038/nrg.2016.142] [Citation(s) in RCA: 248] [Impact Index Per Article: 31.0] [Indexed: 02/07/2023]

Liu J, Wan X, Ma S, Yang C. EPS: an empirical Bayes approach to integrating pleiotropy and tissue-specific information for prioritizing risk genes. Bioinformatics 2016;32:1856-64. [DOI: 10.1093/bioinformatics/btw081] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/15/2015] [Accepted: 02/05/2016] [Indexed: 12/12/2022]

Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, Jansen R, de Geus EJC, Boomsma DI, Wright FA, Sullivan PF, Nikkola E, Alvarez M, Civelek M, Lusis AJ, Lehtimäki T, Raitoharju E, Kähönen M, Seppälä I, Raitakari OT, Kuusisto J, Laakso M, Price AL, Pajukanta P, Pasaniuc B. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet 2016;48:245-52. [PMID: 26854917 DOI: 10.1038/ng.3506] [Citation(s) in RCA: 1151] [Impact Index Per Article: 143.9] [Received: 06/22/2015] [Accepted: 01/14/2016] [Indexed: 02/07/2023]

Affiliation(s)

Alexander Gusev: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
Arthur Ko: Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, USA
Huwenbo Shi: Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, California, USA
Gaurav Bhatia: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
Wonil Chung: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Brenda W J H Penninx: Department of Psychiatry, VU University Medical Center, Amsterdam, the Netherlands
Rick Jansen: Department of Psychiatry, VU University Medical Center, Amsterdam, the Netherlands
Eco J C de Geus: Department of Biological Psychology, VU University, Amsterdam, the Netherlands
Dorret I Boomsma: Department of Biological Psychology, VU University, Amsterdam, the Netherlands
Fred A Wright: Bioinformatics Research Center, Department of Statistics, Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, USA
Patrick F Sullivan: Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Psychiatry, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Elina Nikkola: Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
Marcus Alvarez: Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
Mete Civelek: Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
Aldons J Lusis: Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA; Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
Terho Lehtimäki: Department of Clinical Chemistry, Fimlab Laboratories and University of Tampere School of Medicine, Tampere, Finland
Emma Raitoharju: Department of Clinical Chemistry, Fimlab Laboratories and University of Tampere School of Medicine, Tampere, Finland
Mika Kähönen: Department of Clinical Physiology, Pirkanmaa Hospital District and University of Tampere School of Medicine, Tampere, Finland
Ilkka Seppälä: Department of Clinical Chemistry, Fimlab Laboratories and University of Tampere School of Medicine, Tampere, Finland
Olli T Raitakari: Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland; Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
Johanna Kuusisto: Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
Markku Laakso: Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
Alkes L Price: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
Päivi Pajukanta: Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA; Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, USA
Bogdan Pasaniuc: Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA; Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, California, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA

Lamparter D, Marbach D, Rueedi R, Kutalik Z, Bergmann S. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. PLoS Comput Biol 2016;12:e1004714. [PMID: 26808494 PMCID: PMC4726509 DOI: 10.1371/journal.pcbi.1004714] [Citation(s) in RCA: 208] [Impact Index Per Article: 26.0] [Received: 06/16/2015] [Accepted: 12/17/2015] [Indexed: 12/17/2022]

Abstract

Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which offers not only significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.

Genome-wide association studies (GWAS) typically generate lists of trait- or disease-associated SNPs. Yet, such output sheds little light on the underlying molecular mechanisms and tools are needed to extract biological insight from the results at the SNP level. Pathway analysis tools integrate signals from multiple SNPs at various positions in the genome in order to map associated genomic regions to well-established pathways, i.e., sets of genes known to act in concert. The nature of GWAS association results requires specifically tailored methods for this task. Here, we present Pascal (Pathway scoring algorithm), a tool that allows gene and pathway-level analysis of GWAS association results without the need to access the original genotypic data. Pascal was designed to be fast, accurate and to have high power to detect relevant pathways. We extensively tested our approach on a large collection of real GWAS association results and saw better discovery of confirmed pathways than with other popular methods. We believe that these results together with the ease-of-use of our publicly available software will allow Pascal to become a useful addition to the toolbox of the GWAS community.
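
The two gene-level statistics named in the Pascal abstract, the sum and the maximum of per-SNP chi-squared statistics, can be illustrated with a minimal sketch. It is not Pascal's algorithm: it assumes the SNPs in a gene are independent (no linkage disequilibrium), whereas Pascal is described as handling the general case with analytic and numerical solutions. The helper names `chi2_sf` and `gene_scores` and the Z-scores are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j < df/2} y^j / j!
        total, term = 0.0, 1.0
        for j in range(df // 2):
            if j > 0:
                term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: start from the 1-df tail and add half-integer gamma terms.
    tail = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        tail += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return tail


def gene_scores(z_scores):
    """Sum and max chi-squared gene statistics with p-values.

    ASSUMPTION: SNPs are independent, so the sum follows a chi-squared
    distribution with len(z_scores) degrees of freedom and the max can be
    corrected with a Sidak-style adjustment.
    """
    chi2 = [z * z for z in z_scores]
    k = len(chi2)
    sum_stat, max_stat = sum(chi2), max(chi2)
    p_sum = chi2_sf(sum_stat, k)
    p_max = 1.0 - (1.0 - chi2_sf(max_stat, 1)) ** k
    return {"sum": sum_stat, "p_sum": p_sum, "max": max_stat, "p_max": p_max}


# Illustrative Z-scores for three SNPs mapped to one gene.
print(gene_scores([1.0, 2.0, -0.5]))
```

The sum statistic pools evidence across all SNPs in the gene, so it favours genes with many weak signals, while the maximum statistic responds to a single strong SNP. With correlated SNPs the independence-based p-values above would be miscalibrated, which is the problem the LD-aware solutions are meant to address.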

Lee D, Williamson VS, Bigdeli TB, Riley BP, Webb BT, Fanous AH, Kendler KS, Vladimirov VI, Bacanu SA. JEPEGMIX: gene-level joint analysis of functional SNPs in cosmopolitan cohorts. Bioinformatics 2016;32:295-7. [PMID: 26428293 PMCID: PMC4708106 DOI: 10.1093/bioinformatics/btv567] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/13/2015] [Revised: 09/01/2015] [Accepted: 09/22/2015] [Indexed: 12/26/2022]

Lee D, Bigdeli TB, Williamson VS, Vladimirov VI, Riley BP, Fanous AH, Bacanu SA. DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts. Bioinformatics 2015;31:3099-104. [PMID: 26059716 PMCID: PMC4576696 DOI: 10.1093/bioinformatics/btv348] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 09/23/2014] [Accepted: 05/29/2015] [Indexed: 01/09/2023]

Abstract

Motivation: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, are not publicly available. While there are much less demanding methods that avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts.

Results: To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources.

Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix.

Contact: dlee4@vcu.edu

Supplementary information: Supplementary Data are available at Bioinformatics online.
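
The idea in the DISTMIX abstract, imputing the statistic at an unmeasured SNP from statistics at measured SNPs plus ethnic proportions, can be sketched as a conditional expectation under a multivariate normal model. This is an illustration under stated assumptions and not the DISTMIX implementation: the cohort LD matrix is taken as the proportion-weighted average of per-group correlation matrices, a small ridge term is added for numerical stability, and all function names and numbers are hypothetical.

```python
def mix_ld(ld_by_group, proportions):
    """Proportion-weighted average of per-group LD (correlation) matrices."""
    n = len(ld_by_group[0])
    return [[sum(w * ld[i][j] for ld, w in zip(ld_by_group, proportions))
             for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def impute_z(ld, measured, target, z_measured, ridge=0.1):
    """Impute the Z-score at `target` from Z-scores at `measured` SNPs.

    Returns (z_hat, info), where z_hat = s_tm * S_mm^{-1} * z_m and
    info = s_tm * S_mm^{-1} * s_mt. The ridge value is an assumption.
    """
    s_mm = [[ld[i][j] + (ridge if i == j else 0.0) for j in measured]
            for i in measured]
    s_tm = [ld[target][j] for j in measured]
    weights = solve(s_mm, s_tm)  # S_mm is symmetric, so this is S_mm^{-1} s_mt
    z_hat = sum(w * z for w, z in zip(weights, z_measured))
    info = sum(w * s for w, s in zip(weights, s_tm))
    return z_hat, info


# Two hypothetical ancestry groups, three SNPs; SNP 2 is unmeasured.
ld_a = [[1.0, 0.2, 0.8], [0.2, 1.0, 0.3], [0.8, 0.3, 1.0]]
ld_b = [[1.0, 0.4, 0.4], [0.4, 1.0, 0.5], [0.4, 0.5, 1.0]]
mixed = mix_ld([ld_a, ld_b], [0.5, 0.5])
print(impute_z(mixed, measured=[0, 1], target=2, z_measured=[3.0, 1.0]))
```

The `info` value plays the role of an imputation-quality measure: it approaches 1 when the measured SNPs tag the target well and falls toward 0 when they do not, so low-information imputed statistics can be filtered or down-weighted.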
