1. Clauw P, Ellis TJ, Liu HJ, Sasaki E. Beyond the Standard GWAS: A Guide for Plant Biologists. Plant & Cell Physiology 2025; 66:431-443. [PMID: 38988201 PMCID: PMC12085090 DOI: 10.1093/pcp/pcae079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 07/05/2024] [Accepted: 07/10/2024] [Indexed: 07/12/2024]
Abstract
Classic genome-wide association studies (GWAS) look for associations between individual single-nucleotide polymorphisms (SNPs) and phenotypes of interest. With the rapid progress of high-throughput genotyping and phenotyping technologies, GWAS have become increasingly powerful for detecting genetic determinants and their molecular mechanisms underpinning natural phenotypic variation. However, GWAS frequently yield results with neither expected nor promising loci, nor any significant associations. This is often because associations between SNPs and a single phenotype are confounded, for example with the environment, other traits or complex genetic structures. Such confounding can mask true genotype-phenotype associations, or inflate spurious associations. To address these problems, numerous methods have been developed that go beyond the standard model. Such advanced GWAS models are flexible and can offer improved statistical power for understanding the genetics underlying complex traits. Despite this advantage, these models have not been widely adopted and implemented compared to the standard GWAS approach, partly because this literature is diverse and often technical. In this review, our aim is to provide an overview of the application and the benefits of various advanced GWAS models for handling complex traits and genetic structures, targeting plant biologists who wish to carry out GWAS more effectively.
Affiliation(s)
- Pieter Clauw
  - Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, Vienna 1030, Austria
- Thomas James Ellis
  - Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, Vienna 1030, Austria
- Hai-Jun Liu
  - Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, Vienna 1030, Austria
  - Yazhouwan National Laboratory, Sanya 572024, China
- Eriko Sasaki
  - Faculty of Science, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan
2. Rudra P, Zhou YH, Nobel A, Wright FA. Control of false discoveries in grouped hypothesis testing for eQTL data. BMC Bioinformatics 2024; 25:147. [PMID: 38605284 PMCID: PMC11007981 DOI: 10.1186/s12859-024-05736-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 03/08/2024] [Indexed: 04/13/2024]
Abstract
BACKGROUND Expression quantitative trait locus (eQTL) analysis aims to detect the genetic variants that influence the expression of one or more genes. Gene-level eQTL testing forms a natural grouped-hypothesis testing strategy with clear biological importance. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not be powerful or easily apply to eQTL data, for which certain structured alternatives may be defensible and may enable the researcher to avoid overly conservative approaches. RESULTS In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypotheses. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. The heterogeneity of effect sizes in different groups is considered by the introduction of a random effects component. Our method, entitled Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding. As a convenient alternate approach, we also propose Z-REG-FDR, an approximate version of REG-FDR, that uses only Z-statistics of association between genotype and expression for each gene-SNP pair. The performance of Z-REG-FDR is evaluated using both simulated and real data. Simulations demonstrate that Z-REG-FDR performs similarly to REG-FDR, but with much improved computational speed. CONCLUSION Our results demonstrate that the Z-REG-FDR method performs favorably compared to other methods in terms of statistical power and control of FDR. It can be of great practical use for grouped hypothesis testing for eQTL analysis or similar problems in statistical genomics due to its fast computation and ability to be fit using only summary data.
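For readers unfamiliar with FDR control, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure applied to one p-value per gene. It is not REG-FDR or Z-REG-FDR, whose random-effects model is not specified in this abstract; the function name, gene labels, and p-values are all hypothetical and serve only to illustrate what "controlling the FDR by thresholding" means in the simplest case.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical Benjamini-Hochberg step-up procedure (not REG-FDR).

    Returns a list of booleans, True where the hypothesis is rejected
    at false discovery rate level `alpha`.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:largest_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical gene-level p-values, one per gene (illustrative values only).
gene_names = ["geneA", "geneB", "geneC", "geneD", "geneE"]
gene_p_values = [0.001, 0.30, 0.012, 0.04, 0.75]

decisions = benjamini_hochberg(gene_p_values, alpha=0.05)
significant_genes = [g for g, r in zip(gene_names, decisions) if r]
print(significant_genes)
```

In this sketch each gene contributes a single p-value; how the SNP-level evidence within a gene is combined into a group-level statistic is exactly the part that the methods in the abstract address and that this example leaves out.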
Affiliation(s)
- Pratyaydipta Rudra
  - Department of Statistics, Oklahoma State University, Stillwater, OK, USA
- Yi-Hui Zhou
  - Bioinformatics Research Center, Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Andrew Nobel
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
- Fred A Wright
  - Bioinformatics Research Center, Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC, USA
3. Lu Y, Oliva M, Pierce BL, Liu J, Chen LS. Integrative cross-omics and cross-context analysis elucidates molecular links underlying genetic effects on complex traits. Nat Commun 2024; 15:2383. [PMID: 38493154 PMCID: PMC10944527 DOI: 10.1038/s41467-024-46675-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 03/06/2024] [Indexed: 03/18/2024]
Abstract
Genetic effects on functionally related 'omic' traits often co-occur in relevant cellular contexts, such as tissues. Motivated by the multi-tissue methylation quantitative trait loci (mQTLs) and expression QTLs (eQTLs) analysis, we propose X-ING (Cross-INtegrative Genomics) for cross-omics and cross-context integrative analysis. X-ING takes as input multiple matrices of association statistics, each obtained from different omics data types across multiple cellular contexts. It models the latent binary association status of each statistic, captures the major association patterns among omics data types and contexts, and outputs the posterior mean and probability for each input statistic. X-ING enables the integration of effects from different omics data with varying effect distributions. In the multi-tissue cis-association analysis, X-ING shows improved detection and replication of mQTLs by integrating eQTL maps. In the trans-association analysis, X-ING reveals an enrichment of trans-associations in many disease/trait-relevant tissues.
Affiliation(s)
- Yihao Lu
  - Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Meritxell Oliva
  - Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
  - Genomics Research Center, AbbVie, North Chicago, IL, USA
- Brandon L Pierce
  - Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Jin Liu
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen, China
- Lin S Chen
  - Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
4. Goldmann K, Spiliopoulou A, Iakovliev A, Plant D, Nair N, Cubuk C, McKeigue P, Barnes MR, Barton A, Pitzalis C, Lewis MJ. Expression quantitative trait loci analysis in rheumatoid arthritis identifies tissue specific variants associated with severity and outcome. Ann Rheum Dis 2024; 83:288-299. [PMID: 37979960 PMCID: PMC10894812 DOI: 10.1136/ard-2023-224540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 10/20/2023] [Indexed: 11/20/2023]
Abstract
OBJECTIVE Genome-wide association studies have successfully identified more than 100 loci associated with susceptibility to rheumatoid arthritis (RA). However, our understanding of the functional effects of genetic variants in causing RA and their effects on disease severity and response to treatment remains limited. METHODS In this study, we conducted expression quantitative trait locus (eQTL) analysis to dissect the link between genetic variants and gene expression comparing the disease tissue against blood using RNA-Sequencing of synovial biopsies (n=85) and blood samples (n=51) from treatment-naïve patients with RA from the Pathobiology of Early Arthritis Cohort. RESULTS This identified 898 eQTL genes in synovium and genes loci in blood, with 232 genes in common to both synovium and blood, although notably many eQTL were tissue specific. Examining the HLA region, we uncovered a specific eQTL at HLA-DPB2 with the critical triad of single-nucleotide polymorphisms (SNPs) rs3128921 driving synovial HLA-DPB2 expression, and both rs3128921 and HLA-DPB2 gene expression correlating with clinical severity and increasing probability of the lympho-myeloid pathotype. CONCLUSIONS This analysis highlights the need to explore functional consequences of genetic associations in disease tissue. HLA-DPB2 SNP rs3128921 could potentially be used to stratify patients to more aggressive treatment immediately at diagnosis.
Affiliation(s)
- Katriona Goldmann
  - Centre for Experimental Medicine & Rheumatology, William Harvey Research Institute, Queen Mary University of London, London, UK
- Athina Spiliopoulou
  - Centre for Population Health Sciences, Usher Institute, University of Edinburgh, Edinburgh, UK
- Andrii Iakovliev
  - Centre for Population Health Sciences, Usher Institute, University of Edinburgh, Edinburgh, UK
- Darren Plant
  - Centre for Genetics and Genomics Versus Arthritis, University of Manchester Centre for Musculoskeletal Research, Manchester, UK
- Nisha Nair
  - Centre for Genetics and Genomics Versus Arthritis, University of Manchester Centre for Musculoskeletal Research, Manchester, UK
- Cankut Cubuk
  - Centre for Experimental Medicine & Rheumatology, William Harvey Research Institute, Queen Mary University of London, London, UK
- Paul McKeigue
  - Centre for Population Health Sciences, Usher Institute, University of Edinburgh, Edinburgh, UK
- Michael R Barnes
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, UK
- Anne Barton
  - Centre for Genetics and Genomics Versus Arthritis, University of Manchester Centre for Musculoskeletal Research, Manchester, UK
- Costantino Pitzalis
  - Centre for Experimental Medicine & Rheumatology, William Harvey Research Institute, Queen Mary University of London, London, UK
- Myles J Lewis
  - Centre for Experimental Medicine & Rheumatology, William Harvey Research Institute, Queen Mary University of London, London, UK
5. Fang Z, Li G, Li W, Pu X, Xiang D. Distributed eQTL analysis with auxiliary information. J Stat Plan Inference 2024; 228:34-45. [PMID: 38264292 PMCID: PMC10805471 DOI: 10.1016/j.jspi.2023.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
Expression quantitative trait locus (eQTL) analysis is a useful tool to identify genetic loci that are associated with gene expression levels. Large collaborative efforts such as the Genotype-Tissue Expression (GTEx) project provide valuable resources for eQTL analysis in different tissues. Most existing methods, however, either focus on one tissue at a time, or analyze multiple tissues to identify eQTLs jointly present in multiple tissues. There is a lack of powerful methods to identify eQTLs in a target tissue while effectively borrowing strength from auxiliary tissues. In this paper, we propose a novel statistical framework to improve the eQTL detection efficacy in the tissue of interest with auxiliary information from other tissues. This framework can enhance the power of the hypothesis test for eQTL effects by incorporating shared and specific effects from multiple tissues into the test statistics. We also devise data-driven and distributed computing approaches for efficient implementation of eQTL detection when the number of tissues is large. Numerical studies in simulation demonstrate the efficacy of the proposed method, and the real data analysis of the GTEx example provides novel insights into eQTL findings in different tissues.
Affiliation(s)
- Zhiwen Fang
  - KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
- Gen Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Wendong Li
  - School of Statistics and Management, Shanghai Institute of International Finance and Economics, Shanghai University of Finance and Economics, Shanghai, China
- Xiaolong Pu
  - KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
- Dongdong Xiang
  - KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
6. McCaw ZR, Gaynor SM, Sun R, Lin X. Leveraging a surrogate outcome to improve inference on a partially missing target outcome. Biometrics 2023; 79:1472-1484. [PMID: 35218565 PMCID: PMC11023615 DOI: 10.1111/biom.13629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2021] [Revised: 12/18/2021] [Accepted: 01/11/2022] [Indexed: 11/30/2022]
Abstract
Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (Spray) for leveraging information from a correlated surrogate outcome (eg, expression in blood) to improve inference on a partially missing target outcome (eg, expression in SSN). Rather than regarding the surrogate outcome as a proxy for the target outcome, Spray jointly models the target and surrogate outcomes within a bivariate regression framework. Unobserved values of either outcome are treated as missing data. We describe and implement an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness. Spray estimates the same association parameter estimated by standard eQTL mapping and controls the type I error even when the target and surrogate outcomes are truly uncorrelated. We demonstrate analytically and empirically, using simulations and GTEx data, that in comparison with marginally modeling the target outcome, jointly modeling the target and surrogate outcomes increases estimation precision and improves power.
Affiliation(s)
- Zachary R. McCaw
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Sheila M. Gaynor
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Ryan Sun
  - Department of Biostatistics, MD Anderson Cancer Center, Houston, TX
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Department of Statistics, Harvard University, Cambridge, MA
7. Cordero RY, Cordero JB, Stiemke AB, Datta LW, Buyske S, Kugathasan S, McGovern DPB, Brant SR, Simpson CL. Trans-ancestry, Bayesian meta-analysis discovers 20 novel risk loci for inflammatory bowel disease in an African American, East Asian and European cohort. Hum Mol Genet 2023; 32:873-882. [PMID: 36308435 PMCID: PMC9941836 DOI: 10.1093/hmg/ddac269] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/24/2022] [Revised: 10/19/2022] [Accepted: 10/25/2022] [Indexed: 11/14/2022]
Abstract
Inflammatory bowel disease (IBD) is an immune-mediated chronic intestinal disorder with major phenotypes: ulcerative colitis (UC) and Crohn's disease (CD). Multiple studies have identified over 240 IBD susceptibility loci. However, most studies have centered on European (EUR) and East Asian (EAS) populations. The prevalence of IBD in non-EUR, including African Americans (AAs), has risen in recent years. Here we present the first attempt to identify loci in AAs using a trans-ancestry Bayesian approach (MANTRA) accounting for heterogeneity between diverse ancestries while allowing for the similarity between closely related populations. We meta-analyzed genome-wide association studies (GWAS) and Immunochip data from a 2015 EUR meta-analysis of 38 155 IBD cases and 48 485 controls and EAS Immunochip study of 2824 IBD cases and 3719 controls, and our recent AA IBD GWAS of 2345 cases and 5002 controls. Across the major IBD phenotypes, we found significant evidence for 92% of 205 loci lead SNPs from the 2015 meta-analysis, but also for three IBD loci only established in later studies. We detected 20 novel loci, all containing immunity-related genes or genes with other evidence for IBD or immune-mediated disease relevance: PLEKHG5;TNFRSF25 (encoding death receptor 3, receptor for TNFSF15 gene product TL1A), XKR6, ELMO1, BC021024;PI4KB;PSMD4 and APLP1 for IBD; AUTS2, XKR6, OSER1, TET2;AK094561, BCAP29 and APLP1 for CD; and GABBR1;MOG, DQ570892, SPDEF;ILRUN, SMARCE1;CCR7;KRT222;KRT24;KRT25, ANKS1A;TCP11, IL7, LRRC18;WDFY4, XKR6 and TNFSF4 for UC. Our study highlights the value of combining low-powered genomic studies from understudied populations of diverse ancestral backgrounds together with a high-powered study to enable novel locus discovery, including potentially important therapeutic IBD gene targets.
Affiliation(s)
- Roberto Y Cordero
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Jennifer B Cordero
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Andrew B Stiemke
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Lisa W Datta
  - Meyerhoff Inflammatory Bowel Disease Center, Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21231, USA
- Steven Buyske
  - Department of Statistics and Biostatistics, Rutgers University, Piscataway, NJ 08854, USA
- Subra Kugathasan
  - Department of Pediatrics and Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Dermot P B McGovern
  - F. Widjaja Foundation Inflammatory Bowel and Immunobiology Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Steven R Brant
  - Meyerhoff Inflammatory Bowel Disease Center, Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21231, USA
  - Rutgers Crohn’s and Colitis Center of New Jersey, Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
  - Human Genetics Institute of New Jersey and Department of Genetics, School of Arts and Sciences, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Claire L Simpson
  - Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
8. Zhou YH, Gallins PJ, Etheridge AS, Jima D, Scholl E, Wright FA, Innocenti F. A resource for integrated genomic analysis of the human liver. Sci Rep 2022; 12:15151. [PMID: 36071064 PMCID: PMC9452507 DOI: 10.1038/s41598-022-18506-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Accepted: 08/08/2022] [Indexed: 11/18/2022]
Abstract
In this study, we generated whole-transcriptome RNA-Seq from n = 192 genotyped liver samples and used these data with existing data from the GTEx Project (RNA-Seq) and previous liver eQTL (microarray) studies to create an enhanced transcriptomic sequence resource in the human liver. Analyses of genotype-expression associations show pronounced enrichment of associations with genes of drug response. The associations are primarily consistent across the two RNA-Seq datasets, with some modest variation, indicating the importance of obtaining multiple datasets to produce a robust resource. We further used an empirical Bayesian model to compare eQTL patterns in liver and an additional 20 GTEx tissues, finding that MHC genes, and especially class II genes, are enriched for liver-specific eQTL patterns. To illustrate the utility of the resource to augment GWAS analysis with small sample sizes, we developed a novel meta-analysis technique to combine several liver eQTL data sources. We also illustrate its application using a transcriptome-enhanced re-analysis of a study of neutropenia in pancreatic cancer patients. The associations of genotype with liver expression, including splice variation and its genetic associations, are made available in a searchable genome browser.
Affiliation(s)
- Yi-Hui Zhou
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
- Paul J Gallins
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
- Amy S Etheridge
  - Division of Pharmacotherapy and Experimental Therapeutics, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
- Dereje Jima
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
- Elizabeth Scholl
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
- Fred A Wright
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Federico Innocenti
  - Division of Pharmacotherapy and Experimental Therapeutics, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
9. Multivariate phenotype analysis enables genome-wide inference of mammalian gene function. PLoS Biol 2022; 20:e3001723. [PMID: 35944064 PMCID: PMC9391051 DOI: 10.1371/journal.pbio.3001723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2020] [Revised: 08/19/2022] [Accepted: 06/22/2022] [Indexed: 11/23/2022]
Abstract
The function of the majority of genes in the human and mouse genomes is unknown. Investigating and illuminating this dark genome is a major challenge for the biomedical sciences. The International Mouse Phenotyping Consortium (IMPC) is addressing this through the generation and broad-based phenotyping of a knockout (KO) mouse line for every protein-coding gene, producing a multidimensional data set that underlies a genome-wide annotation map from genes to phenotypes. Here, we develop a multivariate (MV) statistical approach and apply it to IMPC data comprising 148 phenotypes measured across 4,548 KO lines. There are 4,256 (1.4% of 302,997 observed data measurements) hits called by the univariate (UV) model analysing each phenotype separately, compared to 31,843 (10.5%) hits in the observed data results of the MV model, corresponding to an estimated 7.5-fold increase in power of the MV model relative to the UV model. One key property of the data set is its 55.0% rate of missingness, resulting from quality control filters and incomplete measurement of some KO lines. This raises the question of whether it is possible to infer perturbations at phenotype-gene pairs at which data are not available, i.e., to infer some in vivo effects using statistical analysis rather than experimentation. We demonstrate that, even at missing phenotypes, the MV model can detect perturbations with power comparable to the single-phenotype analysis, thereby filling in the complete gene-phenotype map with good sensitivity. A factor analysis of the MV model's fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data and facilitate biological interpretation of perturbations. We also demonstrate that the MV approach strengthens the correspondence between IMPC phenotypes and existing gene annotation databases. 
Analysis of a subset of KO lines measured in replicate across multiple laboratories confirms that the MV model increases power with high replicability.
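The percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; it assumes that the total number of possible measurements is the number of phenotypes times the number of KO lines, and that the quoted 7.5-fold figure corresponds to the simple ratio of hit counts, neither of which the abstract states explicitly. Variable names are illustrative.

```python
# Counts taken directly from the abstract.
n_phenotypes = 148
n_ko_lines = 4_548
n_observed = 302_997   # observed data measurements
uv_hits = 4_256        # hits called by the univariate (UV) model
mv_hits = 31_843       # hits called by the multivariate (MV) model

# Assumption: every phenotype could in principle be measured in every KO line.
n_possible = n_phenotypes * n_ko_lines

missing_rate = 1 - n_observed / n_possible
uv_hit_rate = uv_hits / n_observed
mv_hit_rate = mv_hits / n_observed
# Assumption: "fold increase" taken as the plain ratio of hit counts.
fold_increase = mv_hits / uv_hits

print(f"possible measurements: {n_possible}")
print(f"missingness: {missing_rate:.1%}")
print(f"UV hit rate: {uv_hit_rate:.1%}")
print(f"MV hit rate: {mv_hit_rate:.1%}")
print(f"MV / UV hits: {fold_increase:.1f}-fold")
```

Under these assumptions the computed values round to the figures reported in the abstract (55.0% missingness, 1.4% and 10.5% hit rates, and roughly 7.5-fold), which suggests the quoted numbers are internally consistent; the authors' actual power estimate may be derived differently.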
10. Cuomo ASE, Heinen T, Vagiaki D, Horta D, Marioni JC, Stegle O. CellRegMap: a statistical framework for mapping context-specific regulatory variants using scRNA-seq. Mol Syst Biol 2022; 18:e10663. [PMID: 35972065 PMCID: PMC9380406 DOI: 10.15252/msb.202110663] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/31/2021] [Revised: 06/28/2022] [Accepted: 07/01/2022] [Indexed: 11/11/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) enables characterizing the cellular heterogeneity in human tissues. Recent technological advances have enabled the first population-scale scRNA-seq studies in hundreds of individuals, making it possible to assay genetic effects with single-cell resolution. However, existing strategies to analyze these data remain based on principles established for the genetic analysis of bulk RNA-seq. In particular, current methods depend on a priori definitions of discrete cell types, and hence cannot assess allelic effects across subtle cell types and cell states. To address this, we propose the Cell Regulatory Map (CellRegMap), a statistical framework to test for and quantify genetic effects on gene expression in individual cells. CellRegMap provides a principled approach to identify and characterize genotype-context interactions of known eQTL variants using scRNA-seq data. This model-based approach resolves allelic effects across cellular contexts of different granularity, including genetic effects specific to cell subtypes and continuous cell transitions. We validate CellRegMap using simulated data and apply it to previously identified eQTL from two recent studies of differentiating iPSCs, where we uncover hundreds of eQTL displaying heterogeneity of genetic effects across cellular contexts. Finally, we identify fine-grained genetic regulation in neuronal subtypes for eQTL that are colocalized with human disease variants.
Affiliation(s)
- Anna S E Cuomo
  - European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
  - Wellcome Sanger Institute, Cambridge, UK
  - Present address: Garvan Institute of Medical Science, Sydney, NSW, Australia
- Tobias Heinen
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Centre (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory (EMBL), Genome Biology, Heidelberg, Germany
  - Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany
- Danai Vagiaki
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Centre (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory (EMBL), Genome Biology, Heidelberg, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Danilo Horta
  - European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- John C Marioni
  - European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
  - Wellcome Sanger Institute, Cambridge, UK
  - Cancer Research UK Cambridge Institute, Cambridge, UK
- Oliver Stegle
  - European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
  - Wellcome Sanger Institute, Cambridge, UK
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Centre (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory (EMBL), Genome Biology, Heidelberg, Germany
11. Towards the Genetic Architecture of Complex Gene Expression Traits: Challenges and Prospects for eQTL Mapping in Humans. Genes (Basel) 2022; 13:235. [PMID: 35205280 PMCID: PMC8871770 DOI: 10.3390/genes13020235] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/20/2021] [Revised: 01/21/2022] [Accepted: 01/25/2022] [Indexed: 12/10/2022]
Abstract
The discovery of expression quantitative trait loci (eQTLs) and their target genes (eGenes) has not only compensated for the limitations of genome-wide association studies for complex phenotypes but has also provided a basis for predicting gene expression. Efforts have been made to develop analytical methods in statistical genetics, a key discipline in eQTL analysis. In particular, mixed model-based and deep learning-based analytical methods have been extremely beneficial in mapping eQTLs and predicting gene expression. Nevertheless, we still face many challenges associated with eQTL discovery. Here, we discuss two key aspects of these challenges: (1) the complexity of eTraits with various factors such as polygenicity and epistasis and (2) the voluminous work required for various types of eQTL profiles. The properties and prospects of statistical methods, including the mixed model method, Bayesian inference, the deep learning method, and the integration method, are presented as future directions for eQTL discovery. This review will help expedite the design and use of efficient methods for eQTL discovery and eTrait prediction.
12. Molstad AJ, Sun W, Hsu L. A covariance-enhanced approach to multi-tissue joint eQTL mapping with application to transcriptome-wide association studies. Ann Appl Stat 2021; 15:998-1016. [PMID: 34413922 DOI: 10.1214/20-aoas1432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Transcriptome-wide association studies based on genetically predicted gene expression have the potential to identify novel regions associated with various complex traits. It has been shown that incorporating expression quantitative trait loci (eQTLs) corresponding to multiple tissue types can improve power for association studies involving complex etiology. In this article, we propose a new multivariate response linear regression model and method for predicting gene expression in multiple tissues simultaneously. Unlike existing methods for multi-tissue joint eQTL mapping, our approach incorporates tissue-tissue expression correlation, which allows us to more efficiently handle missing expression measurements and more accurately predict gene expression using a weighted summation of eQTL genotypes. We show through simulation studies that our approach performs better than the existing methods in many scenarios. We use our method to estimate eQTL weights for 29 tissues collected by GTEx, and show that our approach significantly improves expression prediction accuracy compared to competitors. Using our eQTL weights, we perform a multi-tissue-based S-MultiXcan [2] transcriptome-wide association study and show that our method leads to more discoveries in novel regions and more discoveries overall than the existing methods. Estimated eQTL weights and code for implementing the method are available for download online at github.com/ajmolstad/MTeQTLResults.
|
13
|
Umans BD, Battle A, Gilad Y. Where Are the Disease-Associated eQTLs? Trends Genet 2021; 37:109-124. [PMID: 32912663 PMCID: PMC8162831 DOI: 10.1016/j.tig.2020.08.009] [Citation(s) in RCA: 183] [Impact Index Per Article: 45.8] [Received: 05/28/2020] [Revised: 08/07/2020] [Accepted: 08/14/2020] [Indexed: 02/07/2023]
Abstract
Most disease-associated variants, although located in putatively regulatory regions, do not have detectable effects on gene expression. One explanation could be that we have not examined gene expression in the cell types or conditions that are most relevant for disease. Even large-scale efforts to study gene expression across tissues are limited to human samples obtained opportunistically or postmortem, mostly from adults. In this review we evaluate recent findings and suggest an alternative strategy, drawing on the dynamic and highly context-specific nature of gene regulation. We discuss new technologies that can extend the standard regulatory mapping framework to more diverse, disease-relevant cell types and states.
Affiliation(s)
- Benjamin D Umans
  - Department of Medicine, University of Chicago, Chicago, IL, USA.
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Yoav Gilad
  - Department of Medicine, University of Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA.
|
14
|
Yan KK, Zhao H, Wu JT, Pang H. An enhanced machine learning tool for cis-eQTL mapping with regularization and confounder adjustments. Genet Epidemiol 2020; 44:798-810. [PMID: 32700329 PMCID: PMC7875251 DOI: 10.1002/gepi.22341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2020] [Revised: 07/07/2020] [Accepted: 07/07/2020] [Indexed: 11/07/2022]
Abstract
Many expression quantitative trait loci (eQTL) studies have been conducted to investigate the biological effects of variants in gene regulation. However, these eQTL studies may suffer from low or moderate statistical power and an overly conservative false-discovery rate. In practice, most algorithms for eQTL identification do not model the joint effects of multiple genetic variants with weak or moderate influence. Here we present a novel machine-learning algorithm, lasso least-squares kernel machine (LSKM-LASSO), that models the association between multiple genetic variants and phenotypic traits simultaneously in the presence of nongenetic and genetic confounding. With a more general and flexible framework for the estimation of genetic confounding, LSKM-LASSO is able to provide a more accurate evaluation of the joint effects of multiple genetic variants. Our simulations demonstrate that our approach outperforms three state-of-the-art alternatives in terms of eQTL identification and phenotype prediction. We then apply our method to genotype and gene expression data of 11 tissues obtained from the Genotype-Tissue Expression project. Our algorithm was able to identify more genes with eQTL than other algorithms. By incorporating a regularization term and combining it with least-squares kernel machine, LSKM-LASSO provides a powerful tool for eQTL mapping and phenotype prediction.
Affiliation(s)
- Kang K. Yan
  - School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hongyu Zhao
  - Department of Biostatistics, Yale University, New Haven, Connecticut
- Joseph T. Wu
  - School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Herbert Pang
  - School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
|
15
|
Quantify and control reproducibility in high-throughput experiments. Nat Methods 2020; 17:1207-1213. [PMID: 33046893 DOI: 10.1038/s41592-020-00978-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/09/2020] [Accepted: 09/14/2020] [Indexed: 11/09/2022]
Abstract
Ensuring reproducibility of results in high-throughput experiments is crucial for biomedical research. Here, we propose a set of computational methods, INTRIGUE, to evaluate and control reproducibility in high-throughput settings. Our approaches are built on a new definition of reproducibility that emphasizes directional consistency when experimental units are assessed with signed effect size estimates. The proposed methods are designed to (1) assess the overall reproducible quality of multiple studies and (2) evaluate reproducibility at the individual experimental unit levels. We demonstrate the proposed methods in detecting unobserved batch effects via simulations. We further illustrate the versatility of the proposed methods in transcriptome-wide association studies: in addition to reproducible quality control, they are also suited to investigating genuine biological heterogeneity. Finally, we discuss the potential extensions of the proposed methods in other vital areas of reproducible research (for example, publication bias and conceptual replications).
|
16
|
Zhang Y, Quick C, Yu K, Barbeira A, Luca F, Pique-Regi R, Kyung Im H, Wen X. PTWAS: investigating tissue-relevant causal molecular mechanisms of complex traits using probabilistic TWAS analysis. Genome Biol 2020; 21:232. [PMID: 32912253 PMCID: PMC7488550 DOI: 10.1186/s13059-020-02026-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 10/29/2019] [Accepted: 04/20/2020] [Indexed: 01/02/2023] Open
Abstract
We propose a new computational framework, probabilistic transcriptome-wide association study (PTWAS), to investigate causal relationships between gene expressions and complex traits. PTWAS applies the established principles from instrumental variables analysis and takes advantage of probabilistic eQTL annotations to delineate and tackle the unique challenges arising in TWAS. PTWAS not only confers higher power than the existing methods but also provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type-specific gene-to-trait effects. We illustrate the power of PTWAS by analyzing the eQTL data across 49 tissues from GTEx (v8) and GWAS summary statistics from 114 complex traits.
Affiliation(s)
- Yuhua Zhang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Corbin Quick
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA
- Ketian Yu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Alvaro Barbeira
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Francesca Luca
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- Roger Pique-Regi
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- Hae Kyung Im
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Xiaoquan Wen
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
|
17
|
The statistical practice of the GTEx Project: from single to multiple tissues. QUANTITATIVE BIOLOGY 2020. [DOI: 10.1007/s40484-020-0210-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
18
|
A Multi-Omics Perspective of Quantitative Trait Loci in Precision Medicine. Trends Genet 2020; 36:318-336. [PMID: 32294413 DOI: 10.1016/j.tig.2020.01.009] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 06/26/2019] [Revised: 01/05/2020] [Accepted: 01/21/2020] [Indexed: 02/07/2023]
Abstract
Quantitative trait loci (QTL) analysis is an important approach to investigate the effects of genetic variants identified through an increasing number of large-scale, multidimensional 'omics data sets. In this 'big data' era, the research community has identified a significant number of molecular QTLs (molQTLs) and increased our understanding of their effects. Herein, we review multiple categories of molQTLs, including those associated with transcriptome, post-transcriptional regulation, epigenetics, proteomics, metabolomics, and the microbiome. We summarize approaches to identify molQTLs and to infer their causal effects. We further discuss the integrative analysis of molQTLs through a multi-omics perspective. Our review highlights future opportunities to better understand the functional significance of genetic variants and to utilize the discovery of molQTLs in precision medicine.
|
19
|
Abstract
Expression quantitative trait loci (eQTL) analysis identifies genetic variants that regulate the expression level of a gene. The genetic regulation may persist or vary in different tissues. When data are available on multiple tissues, it is often desired to borrow information across tissues and conduct an integrative analysis. Here we describe a multi-tissue eQTL analysis procedure, which improves the identification of different types of eQTL and facilitates the assessment of tissue specificity.
Affiliation(s)
- Gen Li
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA.
|
20
|
Zhuang Y, Wade K, Saba LM, Kechris K. Development of a tissue augmented Bayesian model for expression quantitative trait loci analysis. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2019; 17:122-143. [PMID: 31731343 PMCID: PMC7384761 DOI: 10.3934/mbe.2020007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
Expression quantitative trait loci (eQTL) analyses detect genetic variants (SNPs) associated with RNA expression levels of genes. The conventional eQTL analysis is to perform individual tests for each gene-SNP pair using simple linear regression and to perform the test on each tissue separately ignoring the extensive information known about RNA expression in other tissue(s). Although Bayesian models have been recently developed to improve eQTL prediction on multiple tissues, they are often based on uninformative priors or treat all tissues equally. In this study, we develop a novel tissue augmented Bayesian model for eQTL analysis (TA-eQTL), which takes prior eQTL information from a different tissue into account to better predict eQTL for another tissue. We demonstrate that our modified Bayesian model has comparable performance to several existing methods in terms of sensitivity and specificity using allele-specific expression (ASE) as the gold standard. Furthermore, the tissue augmented Bayesian model improves the power and accuracy for local-eQTL prediction especially when the sample size is small. In summary, TA-eQTL's performance is comparable to existing methods but has additional flexibility to evaluate data from different platforms, can focus prediction on one tissue using only summary statistics from the secondary tissue(s), and provides a closed form solution for estimation.
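The "conventional" analysis this abstract refers to, one simple linear regression per gene-SNP pair, can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the cited paper: the function name `simple_eqtl_test`, the 0/1/2 genotype coding and the simulated data are all hypothetical, and no covariates or multiple-testing correction are included.

```python
import numpy as np

def simple_eqtl_test(genotype, expression):
    """Ordinary least-squares slope and t-statistic for one gene-SNP pair.

    Assumes the genotype vector is not constant (otherwise the slope is undefined)
    and that the fit is not exact (otherwise the standard error is zero).
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(expression, dtype=float)
    n = g.size
    gc = g - g.mean()                    # centred genotype (allele count)
    yc = y - y.mean()                    # centred expression
    sxx = gc @ gc
    beta = (gc @ yc) / sxx               # OLS slope: expression change per allele
    resid = yc - beta * gc
    sigma2 = (resid @ resid) / (n - 2)   # residual variance (slope + intercept fitted)
    se = np.sqrt(sigma2 / sxx)           # standard error of the slope
    return beta, beta / se

# Hypothetical simulated data: 200 samples, true per-allele effect of 0.5.
rng = np.random.default_rng(0)
genotype = rng.integers(0, 3, size=200)  # allele counts in {0, 1, 2}
expression = 0.5 * genotype + rng.normal(size=200)
beta, t_stat = simple_eqtl_test(genotype, expression)
print(f"slope = {beta:.3f}, t = {t_stat:.2f}")
```

In a real single-tissue scan this test would be repeated for every gene-SNP pair in each tissue separately, which is the tissue-by-tissue independence that multi-tissue approaches such as the one described above try to relax.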
Affiliation(s)
- Yonghua Zhuang
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver Anschutz Medical Campus, Mail Stop B119, 13001 E. 17th Place, Aurora, 80045, USA
- Kristen Wade
  - Human Medical Genetics and Genomics Program, School of Medicine, University of Colorado Denver Anschutz Medical Campus, 80045, Aurora, USA
- Laura M. Saba
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Denver Anschutz Medical Campus, 80045, Aurora, USA
- Katerina Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver Anschutz Medical Campus, Mail Stop B119, 13001 E. 17th Place, Aurora, 80045, USA
  - Correspondence: Tel: +13037244363, +13037249697
|
21
|
Ray EL, Qian J, Brecha R, Reilly MP, Foulkes AS. Stochastic imputation for integrated transcriptome association analysis of a longitudinally measured trait. Stat Methods Med Res 2019; 29:1167-1180. [PMID: 31172883 DOI: 10.1177/0962280219852720] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
The mechanistic pathways linking genetic polymorphisms and complex disease traits remain largely uncharacterized. At the same time, expansive new transcriptome data resources offer unprecedented opportunity to unravel the mechanistic underpinnings of complex disease associations. Two-stage strategies involving conditioning on a single, penalized regression imputation for transcriptome association analysis have been described for cross-sectional traits. In this manuscript, we propose an alternative two-stage approach based on stochastic regression imputation that additionally incorporates error in the predictive model. Application of a bootstrap procedure offers flexibility when a closed form predictive distribution is not available. The two-stage strategy is also generalized to longitudinally measured traits, using a linear mixed effects modeling framework and a composite test statistic to evaluate whether the genetic component of gene-level expression modifies the biomarker trajectory over time. Simulation studies are performed to evaluate relative performance with respect to type-1 error rates, coverage, estimation error, and power under a range of conditions. A case study is presented to investigate the association between whole blood expression for each of five inflammasome genes and inflammatory response over time after endotoxin challenge.
Affiliation(s)
- Evan L Ray
  - Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, USA
- Jing Qian
  - Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, USA
- Regina Brecha
  - Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, USA
- Andrea S Foulkes
  - Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, USA
|
22
|
Xiang D, Zhao SD, Tony Cai T. Signal classification for the integrative analysis of multiple sequences of large-scale multiple tests. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12323] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Affiliation(s)
- Dongdong Xiang
  - East China Normal University; Shanghai People's Republic of China
- T. Tony Cai
  - University of Pennsylvania; Philadelphia USA
|
23
|
Urbut SM, Wang G, Carbonetto P, Stephens M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat Genet 2019; 51:187-195. [PMID: 30478440 PMCID: PMC6309609 DOI: 10.1038/s41588-018-0268-8] [Citation(s) in RCA: 227] [Impact Index Per Article: 37.8] [Received: 12/23/2016] [Accepted: 10/01/2018] [Indexed: 11/26/2022]
Abstract
We introduce new statistical methods for analyzing genomic data sets that measure many effects in many conditions (for example, gene expression changes under many treatments). These new methods improve on existing methods by allowing for arbitrary correlations in effect sizes among conditions. This flexible approach increases power, improves effect estimates and allows for more quantitative assessments of effect-size heterogeneity compared to simple shared or condition-specific assessments. We illustrate these features through an analysis of locally acting variants associated with gene expression (cis expression quantitative trait loci (eQTLs)) in 44 human tissues. Our analysis identifies more eQTLs than existing approaches, consistent with improved power. We show that although genetic effects on expression are extensively shared among tissues, effect sizes can still vary greatly among tissues. Some shared eQTLs show stronger effects in subsets of biologically related tissues (for example, brain-related tissues), or in only one tissue (for example, testis). Our methods are widely applicable, computationally tractable for many conditions and available online.
Affiliation(s)
- Sarah M Urbut
  - Pritzker School of Medicine, Growth & Development Training Program, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Gao Wang
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Peter Carbonetto
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Research Computing Center, University of Chicago, Chicago, IL, USA
- Matthew Stephens
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA.
  - Department of Statistics, University of Chicago, Chicago, IL, USA.
|
24
|
Ip HF, Jansen R, Abdellaoui A, Bartels M, Boomsma DI, Nivard MG. Characterizing the Relation Between Expression QTLs and Complex Traits: Exploring the Role of Tissue Specificity. Behav Genet 2018; 48:374-385. [PMID: 30030655 PMCID: PMC6097736 DOI: 10.1007/s10519-018-9914-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/22/2017] [Accepted: 07/04/2018] [Indexed: 01/14/2023]
Abstract
Measurement of gene expression levels and detection of eQTLs (expression quantitative trait loci) are difficult in tissues with limited sample availability, such as the brain. However, eQTL overlap between tissues might be high, which would allow for inference of eQTL functioning in the brain via eQTLs detected in readily accessible tissues, e.g. whole blood. Applying Stratified Linkage Disequilibrium Score Regression (SLDSR), we quantified the enrichment in polygenic signal of blood and brain eQTLs in genome-wide association studies (GWAS) of 11 complex traits. We looked at eQTLs discovered in 44 tissues by the Genotype-Tissue Expression (GTEx) consortium and two other large representative studies, and found no tissue-specific eQTL effects. Next, we integrated the GTEx eQTLs with regions associated with tissue-specific histone modifiers, and interrogated their effect on rheumatoid arthritis and schizophrenia. We observed substantially enriched effects of eQTLs located inside regions bearing modification H3K4me1 on schizophrenia, but not rheumatoid arthritis, and not tissue-specific. Finally, we extracted eQTLs associated with tissue-specific differentially expressed genes and determined their effects on rheumatoid arthritis and schizophrenia; these analyses revealed limited enrichment of eQTLs associated with genes specifically expressed in specific tissues. Our results pointed to strong enrichment of eQTLs in their effect on complex traits, without evidence for tissue-specific effects. Lack of tissue-specificity can be either due to a lack of statistical power or due to the true absence of tissue-specific effects. We conclude that eQTLs are strongly enriched in GWAS signal and that the enrichment is not specific to the eQTL discovery tissue. Until sample sizes for eQTL discovery grow sufficiently large, working with relatively accessible tissues as a proxy for eQTL discovery is sensible, and restricting lookups for GWAS hits to a specific tissue for which limited samples are available might not be advisable.
Affiliation(s)
- Hill F Ip
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
- Rick Jansen
  - Department of Psychiatry, VU University Medical Center, Amsterdam, The Netherlands
- Abdel Abdellaoui
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Meike Bartels
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Dorret I Boomsma
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Michel G Nivard
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
|
25
|
Bhalala OG, Nath AP, Inouye M, Sibley CR. Identification of expression quantitative trait loci associated with schizophrenia and affective disorders in normal brain tissue. PLoS Genet 2018; 14:e1007607. [PMID: 30142156 PMCID: PMC6126875 DOI: 10.1371/journal.pgen.1007607] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 08/08/2017] [Revised: 09/06/2018] [Accepted: 08/02/2018] [Indexed: 01/12/2023] Open
Abstract
Schizophrenia and the affective disorders, here comprising bipolar disorder and major depressive disorder, are psychiatric illnesses that lead to significant morbidity and mortality worldwide. Whilst understanding of their pathobiology remains limited, large case-control studies have recently identified single nucleotide polymorphisms (SNPs) associated with these disorders. However, discerning the functional effects of these SNPs has been difficult as the associated causal genes are unknown. Here we evaluated whether schizophrenia- and affective disorder-associated SNPs are correlated with gene expression within human brain tissue. Specifically, to identify expression quantitative trait loci (eQTLs), we leveraged disorder-associated SNPs identified from 11 genome-wide association studies with gene expression levels in post-mortem, neurologically-normal tissue from two independent human brain tissue expression datasets (UK Brain Expression Consortium (UKBEC) and Genotype-Tissue Expression (GTEx)). Utilizing stringent multi-region meta-analyses, we identified 2,224 cis-eQTLs associated with expression of 40 genes, including 11 non-coding RNAs. One cis-eQTL, rs16969968, results in a functionally disruptive missense mutation in CHRNA5, a schizophrenia-implicated gene. Importantly, comparing across tissues, we find that blood eQTLs capture < 10% of brain cis-eQTLs. In contrast, > 30% of brain-associated eQTLs are significant in tibial nerve. This study identifies putatively causal genes whose expression in region-specific tissue may contribute to the risk of schizophrenia and affective disorders.
Affiliation(s)
- Oneil G. Bhalala
  - Systems Genomics Lab, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - The Royal Melbourne Hospital, Melbourne Health, Parkville, Victoria, Australia
  - * E-mail: (OGB); (CRS)
- Artika P. Nath
  - Systems Genomics Lab, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Department of Microbiology and Immunology, The Peter Doherty Institute, University of Melbourne, Parkville, Victoria, Australia
  - Cambridge Baker Systems Genomics Initiative, Baker Heart & Diabetes Institute, Melbourne, Victoria, Australia
  - Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Michael Inouye
  - Systems Genomics Lab, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Cambridge Baker Systems Genomics Initiative, Baker Heart & Diabetes Institute, Melbourne, Victoria, Australia
  - Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
  - Department of Clinical Pathology, The University of Melbourne, Parkville, Victoria, Australia
  - Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
  - The Alan Turing Institute, British Library, London, United Kingdom
- Christopher R. Sibley
  - Department of Clinical Pathology, The University of Melbourne, Parkville, Victoria, Australia
  - Department of Molecular Neuroscience, University College London Institute of Neurology, Russell Square House, Russell Square, London, United Kingdom
  - Department of Medicine, Division of Brain Sciences, Imperial College London, Burlington Danes, London, United Kingdom
  - * E-mail: (OGB); (CRS)
|
26
|
Scarpa JR, Jiang P, Gao VD, Fitzpatrick K, Millstein J, Olker C, Gotter A, Winrow CJ, Renger JJ, Kasarskis A, Turek FW, Vitaterna MH. Cross-species systems analysis identifies gene networks differentially altered by sleep loss and depression. SCIENCE ADVANCES 2018; 4:eaat1294. [PMID: 30050989 PMCID: PMC6059761 DOI: 10.1126/sciadv.aat1294] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 01/26/2018] [Accepted: 06/18/2018] [Indexed: 06/08/2023]
Abstract
To understand the transcriptomic organization underlying sleep and affective function, we studied a population of (C57BL/6J × 129S1/SvImJ) F2 mice by measuring 283 affective and sleep phenotypes and profiling gene expression across four brain regions. We identified converging molecular bases for sleep and affective phenotypes at both the single-gene and gene-network levels. Using publicly available transcriptomic datasets collected from sleep-deprived mice and patients with major depressive disorder (MDD), we identified three cortical gene networks altered by the sleep/wake state and depression. The network-level actions of sleep loss and depression were opposite to each other, providing a mechanistic basis for the sleep disruptions commonly observed in depression, as well as the reported acute antidepressant effects of sleep deprivation. We highlight one particular network composed of circadian rhythm regulators and neuronal activity-dependent immediate-early genes. The key upstream driver of this network, Arc, may act as a nexus linking sleep and depression. Our data provide mechanistic insights into the role of sleep in affective function and MDD.
Affiliation(s)
- Joseph R. Scarpa
  - Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Peng Jiang
  - Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
- Vance D. Gao
  - Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
- Karrie Fitzpatrick
  - Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
- Christopher Olker
  - Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
- Anthony Gotter
  - Department of Neuroscience, Merck Research Laboratories, West Point, PA 19486, USA
- John J. Renger
  - Department of Neuroscience, Merck Research Laboratories, West Point, PA 19486, USA
- Andrew Kasarskis
  - Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Fred W. Turek
  - Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
- Martha H. Vitaterna
  - Center for Sleep and Circadian Biology, Department of Neurobiology, Northwestern University, Evanston, IL 60208, USA
|
27
|
Palowitch J, Shabalin A, Zhou YH, Nobel AB, Wright FA. Estimation of cis-eQTL effect sizes using a log of linear model. Biometrics 2018; 74:616-625. [PMID: 29073327 PMCID: PMC5920774 DOI: 10.1111/biom.12810] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/01/2016] [Revised: 09/01/2017] [Accepted: 09/01/2017] [Indexed: 11/29/2022]
Abstract
The study of expression Quantitative Trait Loci (eQTL) is an important problem in genomics and biomedicine. While detection (testing) of eQTL associations has been widely studied, less work has been devoted to the estimation of eQTL effect size. To reduce false positives, detection methods frequently rely on linear modeling of rank-based normalized or log-transformed gene expression data. Unfortunately, these approaches do not correspond to the simplest model of eQTL action, and thus yield estimates of eQTL association that can be uninterpretable and inaccurate. In this article, we propose a new, log-of-linear model for eQTL action, termed ACME, that captures allelic contributions to cis-acting eQTLs in an additive fashion, yielding effect size estimates that correspond to a biologically coherent model of cis-eQTLs. We describe a non-linear least-squares algorithm to fit the model by maximum likelihood, and obtain corresponding p-values. We perform careful investigation of the model using a combination of simulated data and data from the Genotype Tissue Expression (GTEx) project. Our results reveal little evidence for dominance effects, a parsimonious result that accords with a simple biological model for allele-specific expression and supports use of the ACME model. We show that Type-I error is well-controlled under our approach in a realistic setting, so that rank-based normalizations are unnecessary. Furthermore, we show that such normalizations can be detrimental to power and estimation accuracy under the proposed model. We then show, through effect size analyses of whole-genome cis-eQTLs in the GTEx data, that using standard normalizations instead of ACME noticeably affects the ranking and sign of estimates.
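The contrast between a conventional log-linear model and a "log-of-linear" model can be written out as below. This is a hedged illustration only: the symbols (y_i for untransformed expression of sample i, s_i for its allele count, the alpha and beta coefficients, and the error term) are notation assumed here, covariates are omitted, and the exact parameterization used in the cited paper may differ.

```latex
% Conventional log-linear model: the allele count acts additively on the log scale.
\log y_i = \alpha + \beta\, s_i + \varepsilon_i, \qquad s_i \in \{0, 1, 2\}

% Log-of-linear form: the allele count acts additively on the original
% expression scale, and the logarithm is taken of that linear predictor.
\log y_i = \log\!\left(\beta_0 + \beta_1 s_i\right) + \varepsilon_i,
\qquad \beta_0 + \beta_1 s_i > 0
```

Under the second form, expression on the original scale is linear in the allele count up to multiplicative noise, which is one way to read the abstract's statement that allelic contributions are captured "in an additive fashion". Because the coefficients sit inside the logarithm, the model is non-linear in its parameters, consistent with the non-linear least-squares fitting the abstract mentions.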
Affiliation(s)
- John Palowitch
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Andrey Shabalin
  - Department of Psychiatry, University of Utah, Salt Lake City, Utah 84108, U.S.A
- Yi-Hui Zhou
  - Bioinformatics Research Center and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, U.S.A
- Andrew B Nobel
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Fred A Wright
  - Bioinformatics Research Center and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, U.S.A
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, U.S.A
|
28
|
Li G, Jima D, Wright FA, Nobel AB. HT-eQTL: integrative expression quantitative trait loci analysis in a large number of human tissues. BMC Bioinformatics 2018. [PMID: 29523079 PMCID: PMC5845327 DOI: 10.1186/s12859-018-2088-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
Abstract
Background: Expression quantitative trait loci (eQTL) analysis identifies genetic markers associated with the expression of a gene. Most existing eQTL analyses and methods investigate association in a single, readily available tissue, such as blood. Joint analysis of eQTL in multiple tissues has the potential to improve, and expand the scope of, single-tissue analyses. Large-scale collaborative efforts such as the Genotype-Tissue Expression (GTEx) program are currently generating high quality data in a large number of tissues. However, computational constraints limit genome-wide multi-tissue eQTL analysis. Results: We develop an integrative method under a hierarchical Bayesian framework for eQTL analysis in a large number of tissues. The model fitting procedure is highly scalable, and the computing time is a polynomial function of the number of tissues. Multi-tissue eQTLs are identified through a local false discovery rate approach, which rigorously controls the false discovery rate. Using simulation and GTEx real data studies, we show that the proposed method has superior performance to existing methods in terms of computing time and the power of eQTL discovery. Conclusions: We provide a scalable method for eQTL analysis in a large number of tissues. The method enables the identification of eQTL with different configurations and facilitates the characterization of tissue specificity.
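The abstract above says discoveries are called "through a local false discovery rate approach" but does not spell out the rule. One common way to turn local false discovery rates (lfdr) into a list of discoveries is adaptive thresholding: sort the lfdr values and keep the largest set whose average lfdr stays at or below the target level. The sketch below shows that generic rule only; whether HT-eQTL uses exactly this procedure is an assumption, and the function name `lfdr_reject` and the toy values are hypothetical.

```python
def lfdr_reject(lfdr, alpha=0.05):
    """Return indices of tests rejected by adaptive lfdr thresholding.

    The rejected set is the largest prefix of the ascending-sorted lfdr
    values whose running mean does not exceed alpha; that running mean
    estimates the false discovery rate of the rejected set.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    rejected, running = [], 0.0
    for rank, i in enumerate(order, start=1):
        running += lfdr[i]
        # Values are sorted, so the running mean never decreases:
        # once it exceeds alpha it stays above alpha.
        if running / rank > alpha:
            break
        rejected.append(i)
    return rejected


# Toy lfdr values for six gene-SNP pairs (illustrative, not real data).
toy_lfdr = [0.01, 0.40, 0.02, 0.90, 0.08, 0.03]
hits = lfdr_reject(toy_lfdr, alpha=0.05)
```

Note that the pair with lfdr 0.08 is kept even though it exceeds 0.05 on its own, because the rule bounds the average over the rejected set rather than each value individually.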
Affiliation(s)
- Gen Li
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 W 168 Street, New York, USA
- Dereje Jima
  - Center for Human Health and the Environment and Bioinformatics Research Center, North Carolina State University, 850 Main Campus Drive, Raleigh, 27695, USA
- Fred A Wright
  - Center for Human Health and the Environment and Bioinformatics Research Center, North Carolina State University, 850 Main Campus Drive, Raleigh, 27695, USA
  - Department of Statistics and Biological Sciences, North Carolina State University, 2311 Stinson Drive, Raleigh, 27695, USA
- Andrew B Nobel
  - Department of Statistics and Operations Research and Department of Biostatistics, University of North Carolina at Chapel Hill, 318 E Cameron Avenue, Chapel Hill, 27599, USA
|
29
|
O'Brien TD, Jia P, Caporaso NE, Landi MT, Zhao Z. Weak sharing of genetic association signals in three lung cancer subtypes: evidence at the SNP, gene, regulation, and pathway levels. Genome Med 2018; 10:16. [PMID: 29486777 PMCID: PMC5828003 DOI: 10.1186/s13073-018-0522-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 11/12/2017] [Accepted: 02/13/2018] [Indexed: 01/07/2023]
Abstract
BACKGROUND: There are two main types of lung cancer: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC has many subtypes, but the two most common are lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). These subtypes are mainly classified by physiological and pathological characteristics, although there is increasing evidence of genetic and molecular differences as well. Although some work has been done at the somatic level to explore the genetic and biological differences among subtypes, little work has been done that interrogates these differences at the germline level to characterize the unique and shared susceptibility genes for each subtype. METHODS: We used single-nucleotide polymorphisms (SNPs) from a genome-wide association study (GWAS) of European samples to interrogate the similarity of the subtypes at the SNP, gene, pathway, and regulatory levels. We expanded these genotyped SNPs to include all SNPs in linkage disequilibrium (LD) using data from the 1000 Genomes Project. We mapped these SNPs to several lung tissue expression quantitative trait loci (eQTL) and enhancer datasets to identify regulatory SNPs and their target genes. We used these genes to perform a biological pathway analysis for each subtype. RESULTS: We identified 8295, 8734, and 8361 SNPs with moderate association signals for LUAD, LUSC, and SCLC, respectively. Those SNPs had p < 1 × 10⁻³ in the original GWAS or were in LD (r² > 0.8, Europeans) with the genotyped SNPs. We identified 215, 320, and 172 disease-associated genes for LUAD, LUSC, and SCLC, respectively. Only five genes (CHRNA5, IDH3A, PSMA4, RP11-650L12.2, and TBC1D2B) overlapped all subtypes. Furthermore, we observed only two pathways from the Kyoto Encyclopedia of Genes and Genomes shared by all subtypes. At the regulatory level, only three eQTL target genes and two enhancer target genes overlapped between all subtypes.
CONCLUSIONS: Our results suggest that the three lung cancer subtypes do not share much genetic signal at the SNP, gene, pathway, or regulatory level, which differs from the common subtype classification based upon histology. However, three (CHRNA5, IDH3A, and PSMA4) of the five genes shared between the subtypes are well-known lung cancer genes that may act as general lung cancer genes regardless of subtype.
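The SNP selection rule in the abstract above (a p-value cutoff of 1 × 10⁻³, expanded by LD at r² > 0.8) can be illustrated with a small sketch. It is not the authors' pipeline: it assumes phased 0/1 haplotypes from a reference panel, computes the standard pairwise r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), and uses hypothetical names (`ld_r2`, `select_snps`) and made-up toy SNPs.

```python
def ld_r2(hap_a, hap_b):
    """Pairwise LD r^2 between two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def select_snps(gwas_p, haplotypes, p_cut=1e-3, r2_cut=0.8):
    """Keep genotyped SNPs with p < p_cut plus panel SNPs in high LD with them."""
    lead = {snp for snp, p in gwas_p.items() if p < p_cut}
    proxies = {
        snp
        for snp in haplotypes
        if snp not in lead
        and any(ld_r2(haplotypes[snp], haplotypes[hit]) > r2_cut for hit in lead)
    }
    return lead | proxies


# Toy reference panel of 8 haplotypes (illustrative, not real data).
haplotypes = {
    "snp1": [1, 1, 1, 0, 0, 0, 1, 0],
    "snp2": [1, 1, 1, 0, 0, 0, 1, 0],  # perfect LD with snp1
    "snp3": [1, 1, 1, 0, 0, 0, 0, 0],  # moderate LD with snp1
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],  # weak LD with snp1
}
gwas_p = {"snp1": 2e-4, "snp4": 0.3}  # only genotyped SNPs carry p-values
selected = select_snps(gwas_p, haplotypes)
```

In this toy panel only `snp2` joins `snp1`, because `snp3` falls below the r² cutoff; per-subtype SNP sets built this way could then be compared with ordinary set intersections.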
Affiliation(s)
- Timothy D O'Brien
  - Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
- Peilin Jia
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
- Neil E Caporaso
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Maria Teresa Landi
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Zhongming Zhao
  - Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St. Suite 820, Houston, TX, 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
30
|
Drag M, Hansen MB, Kadarmideen HN. Systems genomics study reveals expression quantitative trait loci, regulator genes and pathways associated with boar taint in pigs. PLoS One 2018; 13:e0192673. [PMID: 29438444 PMCID: PMC5811030 DOI: 10.1371/journal.pone.0192673] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/20/2017] [Accepted: 01/29/2018] [Indexed: 01/14/2023]
Abstract
Boar taint is an offensive odour and/or taste from a proportion of non-castrated male pigs caused by skatole and androstenone accumulation during sexual maturity. Castration is widely used to avoid boar taint but is currently under debate because of animal welfare concerns. This study aimed to identify expression quantitative trait loci (eQTLs) with potential effects on boar taint compounds to improve breeding possibilities for reduced boar taint. Danish Landrace male boars with low, medium and high genetic merit for skatole and human nose score (HNS) were slaughtered at ~100 kg. Gene expression profiles were obtained by RNA-Seq, and genotype data were obtained by an Illumina 60K Porcine SNP chip. Following quality control and filtering, 10,545 and 12,731 genes from liver and testis were included in the eQTL analysis, together with 20,827 SNP variants. A total of 205 and 109 single-tissue eQTLs associated with 102 and 58 unique genes were identified in liver and testis, respectively. By employing a multivariate Bayesian hierarchical model, 26 eQTLs were identified as significant multi-tissue eQTLs. The highest densities of eQTLs were found on pig chromosomes SSC12, SSC1, SSC13, SSC9 and SSC14. Functional characterisation of eQTLs revealed functions within regulation of androgen and the intracellular steroid hormone receptor signalling pathway and of xenobiotic metabolism by cytochrome P450 system and cellular response to oestradiol. A QTL enrichment test revealed 89 QTL traits curated by the Animal Genome PigQTL database to be significantly overlapped by the genomic coordinates of cis-acting eQTLs. Finally, a subset of 35 cis-acting eQTLs overlapped with known boar taint QTL traits. These eQTLs could be useful in the development of a DNA test for boar taint but careful monitoring of other overlapping QTL traits should be performed to avoid any negative consequences of selection.
Affiliation(s)
- Markus Drag
  - Section of Anatomy, Biochemistry and Physiology, Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg C, Denmark
- Mathias B. Hansen
  - Section of Anatomy, Biochemistry and Physiology, Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg C, Denmark
- Haja N. Kadarmideen
  - Section of Anatomy, Biochemistry and Physiology, Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg C, Denmark
  - Section of Systems Genomics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, Lyngby, Denmark
|
31
|
Battle A, Brown CD, Engelhardt BE, Montgomery SB. Genetic effects on gene expression across human tissues. Nature 2017; 550:204-213. [PMID: 29022597 PMCID: PMC5776756 DOI: 10.1038/nature24277] [Citation(s) in RCA: 2744] [Impact Index Per Article: 343.0] [Received: 09/08/2016] [Accepted: 09/15/2017] [Indexed: 12/12/2022]
Abstract
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
Affiliation(s)
- Alexis Battle
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Christopher D Brown
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Barbara E Engelhardt
  - Department of Computer Science and Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey 08540, USA
- Stephen B Montgomery
  - Department of Genetics, Stanford University, Stanford, California 94305, USA
  - Department of Pathology, Stanford University, Stanford, California 94305, USA
|
32
|
Wen X, Pique-Regi R, Luca F. Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization. PLoS Genet 2017; 13:e1006646. [PMID: 28278150 PMCID: PMC5363995 DOI: 10.1371/journal.pgen.1006646] [Citation(s) in RCA: 160] [Impact Index Per Article: 20.0] [Received: 09/29/2016] [Revised: 03/23/2017] [Accepted: 02/21/2017] [Indexed: 01/25/2023]
Abstract
We propose a novel statistical framework for integrating the result from molecular quantitative trait loci (QTL) mapping into genome-wide genetic association analysis of complex traits, with the primary objectives of quantitatively assessing the enrichment of the molecular QTLs in complex trait-associated genetic variants and the colocalizations of the two types of association signals. We introduce a natural Bayesian hierarchical model that treats the latent association status of molecular QTLs as SNP-level annotations for candidate SNPs of complex traits. We detail a computational procedure to seamlessly perform enrichment, fine-mapping and colocalization analyses, which is a distinct feature compared to the existing colocalization analysis procedures in the literature. The proposed approach is computationally efficient and requires only summary-level statistics. We evaluate and demonstrate the proposed computational approach through extensive simulation studies and analyses of blood lipid data and the whole blood eQTL data from the GTEx project. In addition, a useful utility from our proposed method enables the computation of expected colocalization signals using simple characteristics of the association data. Using this utility, we further illustrate the importance of enrichment analysis on the ability to discover colocalized signals and the potential limitations of currently available molecular QTL data. The software pipeline that implements the proposed computation procedures, enloc, is freely available at https://github.com/xqwen/integrative.
Genome-wide association studies (GWAS) have been tremendously successful in identifying genetic variants that impact complex diseases. However, the roles of such studies in disease etiology remain poorly understood, primarily because a large proportion of the GWAS findings are located in the non-coding region of the genome. Recent advancements in high-throughput sequencing technology enable the systematic investigation of molecular quantitative trait loci (QTLs), which are genetic variants that directly affect molecular phenotypes (e.g., gene expression, transcription factor binding and DNA methylation). Linking molecular QTLs to GWAS findings intuitively represents an important step for interpreting the biological and clinical relevance of the GWAS results. In this paper, we describe a rigorous and efficient computational approach that assesses the enrichment and overlap between the GWAS findings and molecular QTLs. Importantly, we illustrate that the accurate quantification of overlapping between molecular QTL and GWAS signals requires reliable enrichment estimation. Our proposed approach fully accounts for the intrinsic uncertainty embedded in the association analyses of GWAS and molecular QTL mapping, and it outperforms the existing state-of-the-art approaches. Applying the proposed approach to the GWAS data of blood lipid traits and the whole blood expression QTLs (eQTLs) yields some novel biological insights and also illustrates the potential limitations of the currently available molecular QTL data.
Affiliation(s)
- Xiaoquan Wen
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Roger Pique-Regi
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
- Francesca Luca
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, United States of America
|
33
|
Wen X, Luca F, Pique-Regi R. Cross-population joint analysis of eQTLs: fine mapping and functional annotation. PLoS Genet 2015; 11:e1005176. [PMID: 25906321 PMCID: PMC4408026 DOI: 10.1371/journal.pgen.1005176] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Received: 09/04/2014] [Accepted: 03/25/2015] [Indexed: 12/19/2022]
Abstract
Mapping expression quantitative trait loci (eQTLs) has been shown to be a powerful tool for uncovering the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistent across populations, accounting for population heterogeneity in allele frequencies and linkage disequilibrium patterns. Furthermore, by integrating genomic annotations, our analysis framework enables high-resolution functional analysis of eQTLs. We applied our statistical approach to analyze the GEUVADIS data consisting of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; ii) many genes harbor multiple independent eQTLs in their cis regions; and iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10⁻²²).
Affiliation(s)
- Xiaoquan Wen
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Francesca Luca
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA
- Roger Pique-Regi
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
  - Department of Clinical and Translational Sciences, Wayne State University, Detroit, MI, USA
|
34
|
Lock EF, Soldano KL, Garrett ME, Cope H, Markunas CA, Fuchs H, Grant G, Dunson DB, Gregory SG, Ashley-Koch AE. Joint eQTL assessment of whole blood and dura mater tissue from individuals with Chiari type I malformation. BMC Genomics 2015; 16:11. [PMID: 25609184 PMCID: PMC4342828 DOI: 10.1186/s12864-014-1211-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/16/2014] [Accepted: 12/30/2014] [Indexed: 01/08/2023]
Abstract
BACKGROUND: Expression quantitative trait loci (eQTL) play an important role in the regulation of gene expression. Gene expression levels and eQTLs are expected to vary from tissue to tissue, and therefore multi-tissue analyses are necessary to fully understand complex genetic conditions in humans. Dura mater tissue likely interacts with cranial bone growth and thus may play a role in the etiology of Chiari Type I Malformation (CMI) and related conditions, but it is often inaccessible and its gene expression has not been well studied. A genetic basis to CMI has been established; however, the specific genetic risk factors are not well characterized. RESULTS: We present an assessment of eQTLs for whole blood and dura mater tissue from individuals with CMI. A joint-tissue analysis identified 239 eQTLs in either dura or blood, with 79% of these eQTLs shared by both tissues. Several identified eQTLs were novel and these implicate genes involved in bone development (IPO8, XYLT1, and PRKAR1A), and ribosomal pathways related to marrow and bone dysfunction, as potential candidates in the development of CMI. CONCLUSIONS: Despite strong overall heterogeneity in expression levels between blood and dura, the majority of cis-eQTLs are shared by both tissues. The power to detect shared eQTLs was improved by using an integrative statistical approach. The identified tissue-specific and shared eQTLs provide new insight into the genetic basis for CMI and related conditions.
Affiliation(s)
- Eric F Lock
  - Department of Medicine, Duke University Medical Center, Durham, NC, USA
  - Department of Statistical Science, Duke University, Durham, NC, USA
- Karen L Soldano
  - Department of Medicine, Duke University Medical Center, Durham, NC, USA
  - Duke Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
- Melanie E Garrett
  - Department of Medicine, Duke University Medical Center, Durham, NC, USA
  - Duke Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
- Heidi Cope
  - Department of Medicine, Duke University Medical Center, Durham, NC, USA
  - Duke Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
- Herbert Fuchs
  - Division of Neurosurgery, Department of Surgery, Duke University Medical Center, Durham, NC, USA
- Gerald Grant
  - Division of Neurosurgery, Department of Surgery, Duke University Medical Center, Durham, NC, USA
  - Department of Neurosurgery, Stanford University/Lucile Packard Children's Hospital, Stanford, CA, USA
- David B Dunson
  - Department of Statistical Science, Duke University, Durham, NC, USA
- Simon G Gregory
  - Department of Medicine, Duke University Medical Center, Durham, NC, USA
  - Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
- Allison E Ashley-Koch
  - Department of Medicine, Duke University Medical Center, Durham, NC, USA
  - Duke Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
|