1
Zou Y, Carbonetto P, Xie D, Wang G, Stephens M. Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model. bioRxiv: The Preprint Server for Biology 2024:2023.04.14.536893. [PMID: 37425935 PMCID: PMC10327118 DOI: 10.1101/2023.04.14.536893] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 07/11/2023]
Abstract
We introduce mvSuSiE, a multi-trait fine-mapping method for identifying putative causal variants from genetic association data (individual-level or summary data). mvSuSiE learns patterns of shared genetic effects from data, and exploits these patterns to improve power to identify causal SNPs. Comparisons on simulated data show that mvSuSiE is competitive in speed, power and precision with existing multi-trait methods, and uniformly improves on single-trait fine-mapping (SuSiE) in each trait separately. We applied mvSuSiE to jointly fine-map 16 blood cell traits using data from the UK Biobank. By jointly analyzing the traits and modeling heterogeneous effect sharing patterns, we discovered a much larger number of causal SNPs (>3,000) compared with single-trait fine-mapping, and with narrower credible sets. mvSuSiE also more comprehensively characterized the ways in which the genetic variants affect one or more blood cell traits; 68% of causal SNPs showed significant effects in more than one blood cell type.
Affiliation(s)
- Yuxin Zou
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Regeneron Genetics Center, Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Dongyue Xie
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Gao Wang
- Gertrude H. Sergievsky Center, Department of Neurology, Columbia University, New York, NY, USA
- Matthew Stephens
- Department of Statistics, University of Chicago, Chicago, IL, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
2
Scott DAV, Benavente E, Libiseller-Egger J, Fedorov D, Phelan J, Ilina E, Tikhonova P, Kudryavstev A, Galeeva J, Clark T, Lewin A. Bayesian compositional regression with microbiome features via variational inference. BMC Bioinformatics 2023; 24:210. [PMID: 37217852 PMCID: PMC10201722 DOI: 10.1186/s12859-023-05219-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 03/02/2023] [Indexed: 05/24/2023] [Open Access]
Abstract
The microbiome plays a key role in the health of the human body. Interest often lies in finding features of the microbiome, alongside other covariates, which are associated with a phenotype of interest. One important property of microbiome data, which is often overlooked, is its compositionality as it can only provide information about the relative abundance of its constituting components. Typically, these proportions vary by several orders of magnitude in datasets of high dimensions. To address these challenges we develop a Bayesian hierarchical linear log-contrast model which is estimated by mean field Monte-Carlo co-ordinate ascent variational inference (CAVI-MC) and easily scales to high dimensional data. We use novel priors which account for the large differences in scale and constrained parameter space associated with the compositional covariates. A reversible jump Monte Carlo Markov chain guided by the data through univariate approximations of the variational posterior probability of inclusion, with proposal parameters informed by approximating variational densities via auxiliary parameters, is used to estimate intractable marginal expectations. We demonstrate that our proposed Bayesian method performs favourably against existing frequentist state of the art compositional data analysis methods. We then apply the CAVI-MC to the analysis of real data exploring the relationship of the gut microbiome to body mass index.
Affiliation(s)
- Darren A. V. Scott
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel Street, London, United Kingdom
- Ernest Benavente
- Laboratory of Experimental Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Julian Libiseller-Egger
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel Street, London, United Kingdom
- Dmitry Fedorov
- Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, Russia
- Jody Phelan
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel Street, London, United Kingdom
- Elena Ilina
- Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, Russia
- Polina Tikhonova
- Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, Russia
- Bioinformatics and Genomics Intercollege Graduate Program, Huck Institutes of Life Sciences, Pennsylvania State University, Pennsylvania, USA
- Julia Galeeva
- Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, Russia
- Taane Clark
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel Street, London, United Kingdom
- Alex Lewin
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel Street, London, United Kingdom
3
Bottolo L, Banterle M, Richardson S, Ala-Korpela M, Järvelin MR, Lewin A. A computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional quantitative trait loci discovery. J R Stat Soc Ser C Appl Stat 2021; 70:886-908. [PMID: 35001978 PMCID: PMC7612194 DOI: 10.1111/rssc.12490] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]
Abstract
Our work is motivated by the search for metabolite quantitative trait loci (QTL) in a cohort of more than 5000 people. There are 158 metabolites measured by NMR spectroscopy in the 31-year follow-up of the Northern Finland Birth Cohort 1966 (NFBC66). These metabolites, as with many multivariate phenotypes produced by high-throughput biomarker technology, exhibit strong correlation structures. Existing approaches for combining such data with genetic variants for multivariate QTL analysis generally ignore phenotypic correlations or make restrictive assumptions about the associations between phenotypes and genetic loci. We present a computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional data, with cell-sparse variable selection and sparse graphical structure for covariance selection. Cell sparsity allows different phenotype responses to be associated with different genetic predictors and the graphical structure is used to represent the conditional dependencies between phenotype variables. To achieve feasible computation of the large model space, we exploit a factorisation of the covariance matrix. Applying the model to the NFBC66 data with 9000 directly genotyped single nucleotide polymorphisms, we are able to simultaneously estimate genotype-phenotype associations and the residual dependence structure among the metabolites. The R package BayesSUR with full documentation is available at https://cran.r-project.org/web/packages/BayesSUR/.
Affiliation(s)
- Leonardo Bottolo
- Department of Medical Genetics, University of Cambridge, Cambridge, UK
- The Alan Turing Institute, London, UK
- MRC Biostatistics Unit, Cambridge, UK
- Marco Banterle
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Sylvia Richardson
- The Alan Turing Institute, London, UK
- MRC Biostatistics Unit, Cambridge, UK
- Mika Ala-Korpela
- Computational Medicine, Faculty of Medicine, University of Oulu and Biocenter Oulu, Oulu, Finland
- NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland
- Marjo-Riitta Järvelin
- Center for Life Course Health Research, University of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
- Department of Epidemiology and Biostatistics, Imperial College London, London, UK
- MRC-PHE Centre for Environment and Health, Imperial College London, London, UK
- Department of Life Sciences, Brunel University London, Uxbridge, UK
- Alex Lewin
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
4
Witte F, Ruiz-Orera J, Mattioli CC, Blachut S, Adami E, Schulz JF, Schneider-Lunitz V, Hummel O, Patone G, Mücke MB, Šilhavý J, Heinig M, Bottolo L, Sanchis D, Vingron M, Chekulaeva M, Pravenec M, Hubner N, van Heesch S. A trans locus causes a ribosomopathy in hypertrophic hearts that affects mRNA translation in a protein length-dependent fashion. Genome Biol 2021; 22:191. [PMID: 34183069 PMCID: PMC8240307 DOI: 10.1186/s13059-021-02397-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/17/2020] [Accepted: 06/02/2021] [Indexed: 12/23/2022] [Open Access]
Abstract
BACKGROUND: Little is known about the impact of trans-acting genetic variation on the rates with which proteins are synthesized by ribosomes. Here, we investigate the influence of such distant genetic loci on the efficiency of mRNA translation and define their contribution to the development of complex disease phenotypes within a panel of rat recombinant inbred lines. RESULTS: We identify several tissue-specific master regulatory hotspots that each control the translation rates of multiple proteins. One of these loci is restricted to hypertrophic hearts, where it drives a translatome-wide and protein length-dependent change in translational efficiency, altering the stoichiometric translation rates of sarcomere proteins. Mechanistic dissection of this locus across multiple congenic lines points to a translation machinery defect, characterized by marked differences in polysome profiles and misregulation of the small nucleolar RNA SNORA48. Strikingly, from yeast to humans, we observe reproducible protein length-dependent shifts in translational efficiency as a conserved hallmark of translation machinery mutants, including those that cause ribosomopathies. Depending on the factor mutated, a pre-existing negative correlation between protein length and translation rates could either be enhanced or reduced, which we propose to result from mRNA-specific imbalances in canonical translation initiation and reinitiation rates. CONCLUSIONS: We show that distant genetic control of mRNA translation is abundant in mammalian tissues, exemplified by a single genomic locus that triggers a translation-driven molecular mechanism. Our work illustrates the complexity through which genetic variation can drive phenotypic variability between individuals and thereby contribute to complex disease.
Affiliation(s)
- Franziska Witte
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Present Address: NUVISAN ICB GmbH, Lead Discovery-Structural Biology, 13353, Berlin, Germany
- Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Camilla Ciolli Mattioli
- Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115, Berlin, Germany
- Present Address: Department of Biological Regulation, Weizmann Institute of Science, 7610001, Rehovot, Israel
- Susanne Blachut
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Eleonora Adami
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Present Address: Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore, 169857, Singapore
- Jana Felicitas Schulz
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Valentin Schneider-Lunitz
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Oliver Hummel
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Giannino Patone
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Michael Benedikt Mücke
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347, Berlin, Germany
- Charité-Universitätsmedizin, 10117, Berlin, Germany
- Jan Šilhavý
- Institute of Physiology of the Czech Academy of Sciences, 4, 142 20, Praha, Czech Republic
- Matthias Heinig
- Institute of Computational Biology (ICB), HMGU, Ingolstaedter Landstr. 1, 85764 Neuherberg, Munich, Germany
- Department of Informatics, Technische Universitaet Muenchen (TUM), Boltzmannstr. 3, 85748 Garching, Munich, Germany
- Leonardo Bottolo
- Department of Medical Genetics, University of Cambridge, Cambridge, CB2 0QQ, UK
- The Alan Turing Institute, London, NW1 2DB, UK
- MRC Biostatistics Unit, University of Cambridge, Cambridge, CB2 0SR, UK
- Daniel Sanchis
- Institut de Recerca Biomedica de Lleida (IRBLLEIDA), Universitat de Lleida, Edifici Biomedicina-I. Av. Rovira Roure, 80, 25198, Lleida, Spain
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195, Berlin, Germany
- Marina Chekulaeva
- Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115, Berlin, Germany
- Michal Pravenec
- Institute of Physiology of the Czech Academy of Sciences, 4, 142 20, Praha, Czech Republic
- Norbert Hubner
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany.
- DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347, Berlin, Germany.
- Charité-Universitätsmedizin, 10117, Berlin, Germany.
- Sebastiaan van Heesch
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany.
- Present Address: The Princess Máxima Center for Pediatric Oncology, Utrecht, the Netherlands.
5
Zhang J, Sun M, Zhao Y, Geng G, Hu Y. Identification of Gingivitis-Related Genes Across Human Tissues Based on the Summary Mendelian Randomization. Front Cell Dev Biol 2021; 8:624766. [PMID: 34026747 PMCID: PMC8134671 DOI: 10.3389/fcell.2020.624766] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/01/2020] [Accepted: 12/02/2020] [Indexed: 11/13/2022] [Open Access]
Abstract
Periodontal diseases are among the most frequent inflammatory diseases affecting children and adolescents; they affect the supporting structures of the teeth, lead to tooth loss, and contribute to systemic inflammation. Gingivitis is the most common periodontal infection. It is mainly caused by substances produced by microbial plaque, systemic disorders, and genetic abnormalities in the host. Identifying gingivitis-related genes across human tissues is significant not only for understanding disease mechanisms but also for disease development and clinical diagnosis. The genome-wide association study (GWAS) is a commonly used method to mine disease-related genetic variants. However, due to factors such as linkage disequilibrium, it is difficult for GWAS to identify genes directly related to the disease. Hence, we constructed a data integration method that uses Summary Mendelian randomization (SMR) to combine GWAS with expression quantitative trait locus (eQTL) data to identify gingivitis-related genes. Five eQTL studies from different human tissues and one GWAS study were referenced in this paper. This study identified several candidate SNPs and genes related to gingivitis in a tissue-specific or cross-tissue manner. Further, we also analyzed and explained the functions of these genes. The R program for the SMR method has been uploaded to GitHub (https://github.com/hxdde/SMR).
Affiliation(s)
- Jiahui Zhang
- Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Mingai Sun
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Yuanyuan Zhao
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Guannan Geng
- Department of Endocrinology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
6
Abstract
Omics data are being generated and collected at unprecedented scale. During the last decade, single omics, such as genomics, transcriptomics, proteomics, and metabolomics, have already highlighted pathophysiological pathways underpinning a variety of conditions across all the fields of medicine. In fact, high-throughput data generated by the comprehensive and unbiased analysis of an entire segment of the flow of genetic information (i.e., genetic variants in the case of genomics, or gene expression in transcriptomics) certainly provide a plethora of information and a precious support to dissect the mechanisms involved in complex diseases. Yet the most effective approach, set to fully exploit the potential of such big data, lies in the possibility to integrate various omics to unveil previously unappreciated pathways. This approach is the foundation of Systems Biology and allows to overcome the limitations inherent to single omics and traditional biology analyses. A robust and powerful strategy has been developed to integrate genetics and gene expression data in the framework of Systems Genetics. With this technique the first two layers of the flow of genetic information are integrated and specifically it is possible to pinpoint which genetic variants are associated with gene co-expression networks. Here we present a versatile bioinformatic protocol that can be used to study the Systems Genetics of CTLs, in order to identify genes (also known as master regulators) that influence the activation of biological pathways in these cells in a particular state or condition.
Affiliation(s)
- Francesco Pesce
- Department of Emergency and Organ Transplantation, Nephrology, Dialysis and Transplantation Unit, University of Bari "A. Moro", Bari, Italy.
- Paolo Protopapa
- Department of Emergency and Organ Transplantation, Nephrology, Dialysis and Transplantation Unit, University of Bari "A. Moro", Bari, Italy
7
Ruffieux H, Davison AC, Hager J, Inshaw J, Fairfax BP, Richardson S, Bottolo L. A Global-Local Approach for Detecting Hotspots in Multiple-Response Regression. Ann Appl Stat 2020; 14:905-928. [PMID: 34992707 PMCID: PMC7612176 DOI: 10.1214/20-aoas1332] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Abstract
We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional mechanisms underlying disease endpoints. Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to large predictor and response vectors, for example, of dimensions 10^3-10^5 in genetic applications. We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable to the above dimensions. Our proposal implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior. Its global-local formulation shrinks noise globally and, hence, accommodates the highly sparse nature of genetic analyses while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.
Affiliation(s)
- Jamie Inshaw
- Wellcome Centre for Human Genetics, Oxford, University of Oxford
- Benjamin P. Fairfax
- Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford
- Sylvia Richardson
- MRC Biostatistics Unit, University of Cambridge
- Alan Turing Institute
- Leonardo Bottolo
- MRC Biostatistics Unit, University of Cambridge
- Alan Turing Institute
- Department of Medical Genetics, University of Cambridge
8
Zhao Z, Zucknick M. Structured penalized regression for drug sensitivity prediction. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12400] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/11/2022]
9
A Multi-Omics Perspective of Quantitative Trait Loci in Precision Medicine. Trends Genet 2020; 36:318-336. [PMID: 32294413 DOI: 10.1016/j.tig.2020.01.009] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 06/26/2019] [Revised: 01/05/2020] [Accepted: 01/21/2020] [Indexed: 02/07/2023]
Abstract
Quantitative trait loci (QTL) analysis is an important approach to investigate the effects of genetic variants identified through an increasing number of large-scale, multidimensional 'omics data sets. In this 'big data' era, the research community has identified a significant number of molecular QTLs (molQTLs) and increased our understanding of their effects. Herein, we review multiple categories of molQTLs, including those associated with transcriptome, post-transcriptional regulation, epigenetics, proteomics, metabolomics, and the microbiome. We summarize approaches to identify molQTLs and to infer their causal effects. We further discuss the integrative analysis of molQTLs through a multi-omics perspective. Our review highlights future opportunities to better understand the functional significance of genetic variants and to utilize the discovery of molQTLs in precision medicine.
10
Abstract
Expression quantitative trait loci (eQTL) analysis identifies genetic variants that regulate the expression level of a gene. The genetic regulation may persist or vary in different tissues. When data are available on multiple tissues, it is often desired to borrow information across tissues and conduct an integrative analysis. Here we describe a multi-tissue eQTL analysis procedure, which improves the identification of different types of eQTL and facilitates the assessment of tissue specificity.
Affiliation(s)
- Gen Li
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA.
11
Turchin MC, Stephens M. Bayesian multivariate reanalysis of large genetic studies identifies many new associations. PLoS Genet 2019; 15:e1008431. [PMID: 31596850 PMCID: PMC6802844 DOI: 10.1371/journal.pgen.1008431] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/24/2019] [Revised: 10/21/2019] [Accepted: 09/17/2019] [Indexed: 01/08/2023] [Open Access]
Abstract
Genome-wide association studies (GWAS) have now been conducted for hundreds of phenotypes of relevance to human health. Many such GWAS involve multiple closely-related phenotypes collected on the same samples. However, the vast majority of these GWAS have been analyzed using simple univariate analyses, which consider one phenotype at a time. This is despite the fact that, at least in simulation experiments, multivariate analyses have been shown to be more powerful at detecting associations. Here, we conduct multivariate association analyses on 13 different publicly-available GWAS datasets that involve multiple closely-related phenotypes. These data include large studies of anthropometric traits (GIANT), plasma lipid traits (GlobalLipids), and red blood cell traits (HaemgenRBC). Our analyses identify many new associations (433 in total across the 13 studies), many of which replicate when follow-up samples are available. Overall, our results demonstrate that multivariate analyses can help make more effective use of data from both existing and future GWAS.
Genome-wide association studies (GWAS) have become a common and powerful tool for identifying significant correlations between markers of genetic variation and physical traits of interest. Often these studies are conducted by comparing genetic variation against single traits one at a time (‘univariate’); however, it has previously been shown that it is possible to increase your power to detect significant associations by comparing genetic variation against multiple traits simultaneously (‘multivariate’). Despite this apparent increase in power though, researchers still rarely conduct multivariate GWAS, even when studies have multiple traits readily available. Here, we reanalyze 13 previously published GWAS using a multivariate method and find >400 additional associations. Our method makes use of univariate GWAS summary statistics and is available as a software package, thus making it accessible to other researchers interested in conducting the same analyses. We also show, using studies that have multiple releases, that our new associations have high rates of replication. Overall, we argue multivariate approaches in GWAS should no longer be overlooked and how, often, there is low-hanging fruit in the form of new associations by running these methods on data already collected.
Affiliation(s)
- Michael C. Turchin
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Matthew Stephens
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
12
Adriaens ME, Lodder EM, Moreno‐Moral A, Šilhavý J, Heinig M, Glinge C, Belterman C, Wolswinkel R, Petretto E, Pravenec M, Remme CA, Bezzina CR. Systems Genetics Approaches in Rat Identify Novel Genes and Gene Networks Associated With Cardiac Conduction. J Am Heart Assoc 2018; 7:e009243. [PMID: 30608189 PMCID: PMC6404199 DOI: 10.1161/jaha.118.009243] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/21/2018] [Accepted: 08/03/2018] [Indexed: 01/20/2023]
Abstract
Background: Electrocardiographic (ECG) parameters are regarded as intermediate phenotypes of cardiac arrhythmias. Insight into the genetic underpinnings of these parameters is expected to contribute to the understanding of cardiac arrhythmia mechanisms. Here we used HXB/BXH recombinant inbred rat strains to uncover genetic loci and candidate genes modulating ECG parameters. Methods and Results: RR interval, PR interval, QRS duration, and QTc interval were measured from ECGs obtained in 6 male rats from each of the 29 available HXB/BXH recombinant inbred strains. Genes at loci displaying significant quantitative trait loci (QTL) effects were prioritized by assessing the presence of protein-altering variants, and by assessment of cis expression QTL (eQTL) effects and correlation of transcript abundance to the respective trait in the heart. Cardiac RNA-seq data were additionally used to generate gene co-expression networks. QTL analysis of ECG parameters identified 2 QTL for PR interval, respectively, on chromosomes 10 and 17. At the chromosome 10 QTL, cis-eQTL effects were identified for Acbd4, Cd300lg, Fam171a2, and Arhgap27; the transcript abundance in the heart of these 4 genes was correlated with PR interval. At the chromosome 17 QTL, a cis-eQTL was uncovered for Nhlrc1 candidate gene; the transcript abundance of this gene was also correlated with PR interval. Co-expression analysis furthermore identified 50 gene networks, 6 of which were correlated with PR interval or QRS duration, both parameters of cardiac conduction. Conclusions: These newly identified genetic loci and gene networks associated with the ECG parameters of cardiac conduction provide a starting point for future studies with the potential of identifying novel mechanisms underlying cardiac electrical function.
Affiliation(s)
- Michiel E. Adriaens
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
- Maastricht Centre for Systems Biology, Maastricht University, Maastricht, The Netherlands
- Elisabeth M. Lodder
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
- Jan Šilhavý
- Institute of Physiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Matthias Heinig
- Institute of Computational Biology, Helmholtz Zentrum München, München, Germany
- Charlotte Glinge
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
- Charly Belterman
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
- Rianne Wolswinkel
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
- Enrico Petretto
- The MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
- Duke‐NUS Medical School, Singapore
- Michal Pravenec
- Institute of Physiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Carol Ann Remme
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
- Connie R. Bezzina
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
| |
Collapse
|
13
|
Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits. Biophys Rev 2018; 10:1053-1060. [PMID: 29934864 PMCID: PMC6082306 DOI: 10.1007/s12551-018-0435-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/18/2018] [Accepted: 06/13/2018] [Indexed: 12/31/2022]
Abstract
Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait-associated locus to a candidate gene or set of candidate genes to prioritize for follow-up mechanistic studies is anything but straightforward. Genomic technologies based on next-generation sequencing now offer multiple opportunities to dissect the gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci that modulate transcript splicing, known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.
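The notion of allelic imbalance described in this abstract (expression of the two alleles departing from the expected 1:1 ratio) can be illustrated with a small sketch. The abstract does not say which test is used; the exact two-sided binomial test below is an assumption chosen for simplicity, and the function name `allelic_imbalance_pvalue` and the read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int,
                             expected_ref_fraction: float = 0.5) -> float:
    """Exact two-sided binomial p-value for departure from the expected ratio.

    Hypothetical helper: treats each allele-specific read as an independent
    draw, which ignores overdispersion and mapping bias seen in real data.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no informative reads, no evidence of imbalance
    p = expected_ref_fraction
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are at most as likely as the observed one.
    tail = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Illustrative counts at one heterozygous site (made-up numbers).
balanced = allelic_imbalance_pvalue(10, 10)
skewed = allelic_imbalance_pvalue(20, 0)
print(f"balanced site p = {balanced:.3g}, skewed site p = {skewed:.3g}")
```

A balanced site gives a p-value of 1, while a site with all 20 reads on one allele gives 2 x 0.5^20. Real analyses typically account for extra-binomial variation, which this sketch does not attempt.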
|
14
|
Li G, Jima D, Wright FA, Nobel AB. HT-eQTL: integrative expression quantitative trait loci analysis in a large number of human tissues. BMC Bioinformatics 2018. [PMID: 29523079 PMCID: PMC5845327 DOI: 10.1186/s12859-018-2088-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
Abstract
Background Expression quantitative trait loci (eQTL) analysis identifies genetic markers associated with the expression of a gene. Most existing eQTL analyses and methods investigate association in a single, readily available tissue, such as blood. Joint analysis of eQTL in multiple tissues has the potential to improve, and expand the scope of, single-tissue analyses. Large-scale collaborative efforts such as the Genotype-Tissue Expression (GTEx) program are currently generating high quality data in a large number of tissues. However, computational constraints limit genome-wide multi-tissue eQTL analysis. Results We develop an integrative method under a hierarchical Bayesian framework for eQTL analysis in a large number of tissues. The model fitting procedure is highly scalable, and the computing time is a polynomial function of the number of tissues. Multi-tissue eQTLs are identified through a local false discovery rate approach, which rigorously controls the false discovery rate. Using simulation and GTEx real data studies, we show that the proposed method has superior performance to existing methods in terms of computing time and the power of eQTL discovery. Conclusions We provide a scalable method for eQTL analysis in a large number of tissues. The method enables the identification of eQTL with different configurations and facilitates the characterization of tissue specificity.
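The abstract states that discoveries are made through a local false discovery rate (lfdr) approach but does not spell out the rule. One commonly used rule, shown here only as an assumption and not as the HT-eQTL implementation, sorts candidates by lfdr and keeps the largest set whose average lfdr stays at or below the target level. The function name `lfdr_discoveries` and the input values are hypothetical.

```python
from typing import List, Sequence


def lfdr_discoveries(lfdr: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return indices of discoveries under an average-lfdr thresholding rule.

    Assumed rule: rank candidates from smallest to largest lfdr and keep
    adding them while the running mean of the selected lfdr values is <= alpha.
    The running mean estimates the false discovery rate of the selected set.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    selected: List[int] = []
    running_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        running_sum += lfdr[idx]
        if running_sum / rank > alpha:
            break  # adding this candidate would exceed the target level
        selected.append(idx)
    return selected


# Made-up posterior null probabilities for five gene-SNP pairs.
example_lfdr = [0.01, 0.90, 0.02, 0.20, 0.03]
print(lfdr_discoveries(example_lfdr, alpha=0.05))
```

In this toy input the three smallest values have a mean of 0.02, and adding the fourth raises the mean to 0.065, so only the first three ranked pairs are kept.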
Affiliation(s)
- Gen Li
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 W 168 Street, New York, USA
- Dereje Jima
  - Center for Human Health and the Environment and Bioinformatics Research Center, North Carolina State University, 850 Main Campus Drive, Raleigh, 27695, USA
- Fred A Wright
  - Center for Human Health and the Environment and Bioinformatics Research Center, North Carolina State University, 850 Main Campus Drive, Raleigh, 27695, USA
  - Department of Statistics and Biological Sciences, North Carolina State University, 2311 Stinson Drive, Raleigh, 27695, USA
- Andrew B Nobel
  - Department of Statistics and Operations Research and Department of Biostatistics, University of North Carolina at Chapel Hill, 318 E Cameron Avenue, Chapel Hill, 27599, USA
|
15
|
Fang J, Zhang JG, Deng HW, Wang YP. Joint Detection of Associations between DNA Methylation and Gene Expression from Multiple Cancers. IEEE J Biomed Health Inform 2017; 22:1960-1969. [PMID: 29990049 DOI: 10.1109/jbhi.2017.2784621] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
DNA methylation plays an important role in the development of various cancers, mainly through the regulation of gene expression. Hence, the study of the relation between DNA methylation and gene expression is of particular interest for understanding cancers. Recently, an increasing number of datasets have become available from multiple cancers, which makes it possible to study both the similarities and differences of genomic alterations across multiple tumor types. However, most of the existing pan-cancer analysis methods perform simple aggregations, which may overlook the heterogeneity of the interactions. In this paper, we propose a novel method to jointly detect complex associations between DNA methylation and gene expression levels from multiple cancers. The main idea is to apply joint sparse canonical correlation analysis to detect a small set of methylated sites, which are associated with another set of genes either shared across cancers or specific to a particular group (group-specific) of cancers. These methylated sites and genes form a complex module with strong multivariate correlations. We further introduce a joint sparse precision matrix estimation method to identify driver methylation-gene pairs in the module. These pairs are characterized by significant partial correlations, which may imply high functional impacts and contribute complementary information to the main step. We apply our method to The Cancer Genome Atlas (TCGA) datasets with 1166 samples from four cancers. The results reveal significant shared and group-specific interactions between DNA methylation and gene expression levels. To promote reproducible research, the Matlab code is available at https://sites.google.com/site/jianfang86/jointTCGA.
|
16
|
Asafu-Adjei J, Mahlet GT, Coull B, Balasubramanian R, Lev M, Schwamm L, Betensky R. Bayesian Variable Selection Methods for Matched Case-Control Studies. Int J Biostat 2017; 13:/j/ijb.ahead-of-print/ijb-2016-0043/ijb-2016-0043.xml. [PMID: 28157692 DOI: 10.1515/ijb-2016-0043] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/25/2022]
Abstract
Matched case-control designs are currently used in many biomedical applications. To ensure high efficiency and statistical power in identifying features that best discriminate cases from controls, it is important to account for the use of matched designs. However, in the setting of high dimensional data, few variable selection methods account for matching. Bayesian approaches to variable selection have several advantages, including the fact that such approaches visit a wider range of model subsets. In this paper, we propose a variable selection method to account for case-control matching in a Bayesian context and apply it using simulation studies, a matched brain imaging study conducted at Massachusetts General Hospital, and a matched cardiovascular biomarker study conducted by the High Risk Plaque Initiative.
|
17
|
Abstract
The aim of expression Quantitative Trait Locus (eQTL) mapping is the identification of DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association analyses (GWAS), eQTL mapping has become a useful tool to understand the functional context in which these variants operate and eventually narrow down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing for association of all transcript-marker pairs. However, these "one-at-a-time" strategies are (1) unable to control the number of false positives when an intricate Linkage Disequilibrium structure is present and (2) often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
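The univariate strategy criticized in this abstract, testing every transcript-marker pair separately, can be sketched as follows. This is an illustration only: the abstract does not name a test statistic, so the Pearson correlation, the permutation p-value, and the Bonferroni correction used here are assumptions, and all names (`pearson_r`, `permutation_pvalue`, `univariate_eqtl_scan`) and data are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(genotype: Sequence[float], expression: Sequence[float],
                       n_perm: int, rng: random.Random) -> float:
    """Two-sided permutation p-value for the genotype-expression correlation."""
    observed = abs(pearson_r(genotype, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(genotype, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def univariate_eqtl_scan(markers: Dict[str, List[float]],
                         transcripts: Dict[str, List[float]],
                         alpha: float = 0.05, n_perm: int = 999,
                         seed: int = 0) -> List[Tuple[str, str, float, bool]]:
    """Test all transcript-marker pairs one at a time with Bonferroni control."""
    rng = random.Random(seed)
    threshold = alpha / (len(markers) * len(transcripts))
    results = []
    for marker, genotype in markers.items():
        for transcript, expression in transcripts.items():
            p = permutation_pvalue(genotype, expression, n_perm, rng)
            results.append((marker, transcript, p, p <= threshold))
    return results


# Toy data: 12 individuals, genotypes coded as 0/1/2 allele counts.
toy_markers = {"snp1": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
               "snp2": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]}
toy_transcripts = {"geneA": [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0, 3.0, 3.1, 2.9, 3.0],
                   "geneB": [2.0, 1.5, 2.2, 1.8, 2.1, 1.7, 1.9, 2.3, 1.6, 2.0, 1.8, 2.2]}
for row in univariate_eqtl_scan(toy_markers, toy_transcripts):
    print(row)
```

Because each pair is tested in isolation, markers in linkage disequilibrium with a true eQTL can pass the threshold as well, and the Bonferroni penalty grows with the number of pairs, which is the loss of power the abstract refers to.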
Affiliation(s)
- Martha Imprialou
  - Centre for Complement and Inflammation Research, Imperial College London, Hammersmith Hospital, Du Cane Road, London, W12 0NN, UK
- Enrico Petretto
  - Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Leonardo Bottolo
  - Department of Medical Genetics, University of Cambridge, Box 238, Lv 6 Addenbrooke's Treatment Centre, Addenbrooke's Hospital, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
  - Department of Mathematics, Imperial College London, 180 Queen's Gate, London, SW7 2AZ, UK
|
18
|
Moreno-Moral A, Pesce F, Behmoaras J, Petretto E. Systems Genetics as a Tool to Identify Master Genetic Regulators in Complex Disease. Methods Mol Biol 2017; 1488:337-362. [PMID: 27933533 DOI: 10.1007/978-1-4939-6427-7_16] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Abstract
Systems genetics stems from systems biology and similarly employs integrative modeling approaches to describe the perturbations and phenotypic effects observed in a complex system. However, in the case of systems genetics the main source of perturbation is naturally occurring genetic variation, which can be analyzed at the systems-level to explain the observed variation in phenotypic traits. In contrast with conventional single-variant association approaches, the success of systems genetics has been in the identification of gene networks and molecular pathways that underlie complex disease. In addition, systems genetics has proven useful in the discovery of master trans-acting genetic regulators of functional networks and pathways, which in many cases revealed unexpected gene targets for disease. Here we detail the central components of a fully integrated systems genetics approach to complex disease, starting from assessment of genetic and gene expression variation, linking DNA sequence variation to mRNA (expression QTL mapping), gene regulatory network analysis and mapping the genetic control of regulatory networks. By summarizing a few illustrative (and successful) examples, we highlight how different data-modeling strategies can be effectively integrated in a systems genetics study.
Affiliation(s)
- Aida Moreno-Moral
  - Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Francesco Pesce
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, Hammersmith Campus, Imperial Centre for Translational and Experimental Medicine, London, UK
- Jacques Behmoaras
  - Centre for Complement and Inflammation Research, Imperial College London, Hammersmith Hospital, Du Cane Road, London, W12 0NN, UK
- Enrico Petretto
  - Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
|
19
|
Moreno-Moral A, Petretto E. From integrative genomics to systems genetics in the rat to link genotypes to phenotypes. Dis Model Mech 2016; 9:1097-1110. [PMID: 27736746 PMCID: PMC5087832 DOI: 10.1242/dmm.026104] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/29/2022]
Abstract
Complementary to traditional gene mapping approaches used to identify the hereditary components of complex diseases, integrative genomics and systems genetics have emerged as powerful strategies to decipher the key genetic drivers of molecular pathways that underlie disease. Broadly speaking, integrative genomics aims to link cellular-level traits (such as mRNA expression) to the genome to identify their genetic determinants. With the characterization of several cellular-level traits within the same system, the integrative genomics approach evolved into a more comprehensive study design, called systems genetics, which aims to unravel the complex biological networks and pathways involved in disease, and in turn map their genetic control points. The first fully integrated systems genetics study was carried out in rats, and the results, which revealed conserved trans-acting genetic regulation of a pro-inflammatory network relevant to type 1 diabetes, were translated to humans. Many studies using different organisms subsequently stemmed from this example. The aim of this Review is to describe the most recent advances in the fields of integrative genomics and systems genetics applied in the rat, with a focus on studies of complex diseases ranging from inflammatory to cardiometabolic disorders. We aim to provide the genetics community with a comprehensive insight into how the systems genetics approach came to life, starting from the first integrative genomics strategies [such as expression quantitative trait loci (eQTLs) mapping] and concluding with the most sophisticated gene network-based analyses in multiple systems and disease states. Although not limited to studies that have been directly translated to humans, we will focus particularly on the successful investigations in the rat that have led to primary discoveries of genes and pathways relevant to human disease.
Affiliation(s)
- Aida Moreno-Moral
  - Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore (NUS) Medical School, Singapore
- Enrico Petretto
  - Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore (NUS) Medical School, Singapore
|