1
Zhang X, Jiang W, Zhao H. Integration of expression QTLs with fine mapping via SuSiE. PLoS Genet 2024; 20:e1010929. [PMID: 38271473 PMCID: PMC10846745 DOI: 10.1371/journal.pgen.1010929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 02/06/2024] [Accepted: 01/06/2024] [Indexed: 01/27/2024]
Abstract
Genome-wide association studies (GWASs) have achieved remarkable success in associating thousands of genetic variants with complex traits. However, the presence of linkage disequilibrium (LD) makes it challenging to identify the causal variants. To address this critical gap from association to causation, many fine-mapping methods have been proposed to assign well-calibrated probabilities of causality to candidate variants, taking into account the underlying LD pattern. In this manuscript, we introduce a statistical framework that incorporates expression quantitative trait locus (eQTL) information into fine-mapping, built on the sum of single-effects (SuSiE) regression model. Our new method, SuSiE2, connects two SuSiE models, one for eQTL analysis and one for genetic fine-mapping. This is achieved by first computing the posterior inclusion probabilities (PIPs) from an eQTL-based SuSiE model with the expression level of the candidate gene as the phenotype. These calculated PIPs are then utilized as prior inclusion probabilities for risk variants in another SuSiE model for the trait of interest. By prioritizing functional variants within the candidate region using eQTL information, SuSiE2 improves on SuSiE by increasing the detection rate of causal SNPs and reducing the average size of credible sets. We compared the performance of SuSiE2 with other multi-trait fine-mapping methods with respect to power, coverage, and precision through simulations and applications to the GWAS results of Alzheimer's disease (AD) and body mass index (BMI). Our results demonstrate the better performance of SuSiE2 both when the in-sample LD matrix is used in inference and when an external reference panel is used.
Affiliation(s)
- Xiangyu Zhang
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Wei Jiang
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Hongyu Zhao
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
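The two-stage idea in the abstract above (PIPs from an eQTL model reused as prior inclusion probabilities for the trait model) can be illustrated with a minimal sketch. This is not the SuSiE2 implementation: it assumes a single causal variant (one single-effect regression rather than the full sum of single effects), uses Wakefield-style approximate Bayes factors, and rescales the eQTL PIPs to sum to one. The function name `single_effect_pips`, the prior variance, and the eQTL PIP values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def single_effect_pips(X, y, prior_weights, prior_var=0.2):
    """PIPs under a one-causal-variant model (illustrative simplification)."""
    X = X - X.mean(axis=0)                      # center genotypes
    y = y - y.mean()                            # center phenotype
    xtx = np.einsum("ij,ij->j", X, X)           # per-variant sum of squares
    beta_hat = X.T @ y / xtx                    # marginal effect estimates
    s2 = y.var() / xtx                          # rough variance of beta_hat
    z2 = beta_hat ** 2 / s2                     # squared z-scores
    shrink = prior_var / (s2 + prior_var)
    # Approximate log Bayes factor for "variant j is causal" vs. the null
    log_bf = 0.5 * np.log(1.0 - shrink) + 0.5 * z2 * shrink
    log_post = np.log(prior_weights) + log_bf   # prior enters additively in logs
    log_post -= log_post.max()                  # numerical stability
    post = np.exp(log_post)
    return post / post.sum()


rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X[:, 1] = X[:, 0]                               # variant 1 in perfect LD with variant 0
y = 0.5 * X[:, 0] + rng.normal(size=n)

uniform_prior = np.full(p, 1.0 / p)
eqtl_pips = np.array([0.90, 0.05, 0.02, 0.02, 0.01])  # made-up stage-one PIPs
informed_prior = eqtl_pips / eqtl_pips.sum()          # assumed normalization

pip_uniform = single_effect_pips(X, y, uniform_prior)
pip_informed = single_effect_pips(X, y, informed_prior)
```

With a uniform prior the two variants in perfect LD are indistinguishable and split the posterior mass equally; with the eQTL-derived prior the posterior ratio between them equals the prior ratio, which is how expression evidence can break an LD tie in this simplified setting.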
2
Zhang X, Jiang W, Zhao H. Integration of Expression QTLs with fine mapping via SuSiE. medRxiv 2023:2023.10.03.23294486. [PMID: 37873337 PMCID: PMC10593033 DOI: 10.1101/2023.10.03.23294486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Genome-wide association studies (GWASs) have achieved remarkable success in associating thousands of genetic variants with complex traits. However, the presence of linkage disequilibrium (LD) makes it challenging to identify the causal variants. To address this critical gap from association to causation, many fine mapping methods have been proposed to assign well-calibrated probabilities of causality to candidate variants, taking into account the underlying LD pattern. In this manuscript, we introduce a statistical framework that incorporates expression quantitative trait locus (eQTL) information into fine mapping, built on the sum of single-effects (SuSiE) regression model. Our new method, SuSiE2, connects two SuSiE models, one for eQTL analysis and one for genetic fine mapping. This is achieved by first computing the posterior inclusion probabilities (PIPs) from an eQTL-based SuSiE model with the expression level of the candidate gene as the phenotype. These calculated PIPs are then utilized as prior inclusion probabilities for risk variants in another SuSiE model for the trait of interest. By leveraging eQTL information to prioritize functional variants within the candidate region, SuSiE2 enhances the power of detecting causal SNPs while reducing false positives and the average size of credible sets. The advantages of SuSiE2 over SuSiE are demonstrated by simulations and an application to a single-cell epigenomic study for Alzheimer's disease. We also demonstrate that eQTL information can be used by SuSiE2 to compensate for the power loss caused by an inaccurate LD matrix.
Affiliation(s)
- Xiangyu Zhang
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Wei Jiang
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Hongyu Zhao
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
3
Bottolo L, Banterle M, Richardson S, Ala-Korpela M, Järvelin MR, Lewin A. A computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional quantitative trait loci discovery. J R Stat Soc Ser C Appl Stat 2021; 70:886-908. [PMID: 35001978 PMCID: PMC7612194 DOI: 10.1111/rssc.12490] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]
Abstract
Our work is motivated by the search for metabolite quantitative trait loci (QTL) in a cohort of more than 5000 people. There are 158 metabolites measured by NMR spectroscopy in the 31-year follow-up of the Northern Finland Birth Cohort 1966 (NFBC66). These metabolites, as with many multivariate phenotypes produced by high-throughput biomarker technology, exhibit strong correlation structures. Existing approaches for combining such data with genetic variants for multivariate QTL analysis generally ignore phenotypic correlations or make restrictive assumptions about the associations between phenotypes and genetic loci. We present a computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional data, with cell-sparse variable selection and sparse graphical structure for covariance selection. Cell sparsity allows different phenotype responses to be associated with different genetic predictors and the graphical structure is used to represent the conditional dependencies between phenotype variables. To achieve feasible computation of the large model space, we exploit a factorisation of the covariance matrix. Applying the model to the NFBC66 data with 9000 directly genotyped single nucleotide polymorphisms, we are able to simultaneously estimate genotype-phenotype associations and the residual dependence structure among the metabolites. The R package BayesSUR with full documentation is available at https://cran.r-project.org/web/packages/BayesSUR/.
Affiliation(s)
- Leonardo Bottolo
  - Department of Medical Genetics, University of Cambridge, Cambridge, UK
  - The Alan Turing Institute, London, UK
  - MRC Biostatistics Unit, Cambridge, UK
- Marco Banterle
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Sylvia Richardson
  - The Alan Turing Institute, London, UK
  - MRC Biostatistics Unit, Cambridge, UK
- Mika Ala-Korpela
  - Computational Medicine, Faculty of Medicine, University of Oulu and Biocenter Oulu, Oulu, Finland
  - NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland
- Marjo-Riitta Järvelin
  - Center for Life Course Health Research, University of Oulu, Oulu, Finland
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Department of Epidemiology and Biostatistics, Imperial College London, London, UK
  - MRC-PHE Centre for Environment and Health, Imperial College London, London, UK
  - Department of Life Sciences, Brunel University London, Uxbridge, UK
- Alex Lewin
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
4
Witte F, Ruiz-Orera J, Mattioli CC, Blachut S, Adami E, Schulz JF, Schneider-Lunitz V, Hummel O, Patone G, Mücke MB, Šilhavý J, Heinig M, Bottolo L, Sanchis D, Vingron M, Chekulaeva M, Pravenec M, Hubner N, van Heesch S. A trans locus causes a ribosomopathy in hypertrophic hearts that affects mRNA translation in a protein length-dependent fashion. Genome Biol 2021; 22:191. [PMID: 34183069 PMCID: PMC8240307 DOI: 10.1186/s13059-021-02397-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/17/2020] [Accepted: 06/02/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND: Little is known about the impact of trans-acting genetic variation on the rates with which proteins are synthesized by ribosomes. Here, we investigate the influence of such distant genetic loci on the efficiency of mRNA translation and define their contribution to the development of complex disease phenotypes within a panel of rat recombinant inbred lines.
RESULTS: We identify several tissue-specific master regulatory hotspots that each control the translation rates of multiple proteins. One of these loci is restricted to hypertrophic hearts, where it drives a translatome-wide and protein length-dependent change in translational efficiency, altering the stoichiometric translation rates of sarcomere proteins. Mechanistic dissection of this locus across multiple congenic lines points to a translation machinery defect, characterized by marked differences in polysome profiles and misregulation of the small nucleolar RNA SNORA48. Strikingly, from yeast to humans, we observe reproducible protein length-dependent shifts in translational efficiency as a conserved hallmark of translation machinery mutants, including those that cause ribosomopathies. Depending on the factor mutated, a pre-existing negative correlation between protein length and translation rates could either be enhanced or reduced, which we propose to result from mRNA-specific imbalances in canonical translation initiation and reinitiation rates.
CONCLUSIONS: We show that distant genetic control of mRNA translation is abundant in mammalian tissues, exemplified by a single genomic locus that triggers a translation-driven molecular mechanism. Our work illustrates the complexity through which genetic variation can drive phenotypic variability between individuals and thereby contribute to complex disease.
Affiliation(s)
- Franziska Witte
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
  - Present Address: NUVISAN ICB GmbH, Lead Discovery-Structural Biology, 13353, Berlin, Germany
- Jorge Ruiz-Orera
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Camilla Ciolli Mattioli
  - Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115, Berlin, Germany
  - Present Address: Department of Biological Regulation, Weizmann Institute of Science, 7610001, Rehovot, Israel
- Susanne Blachut
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Eleonora Adami
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
  - Present Address: Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore, 169857, Singapore
- Jana Felicitas Schulz
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Valentin Schneider-Lunitz
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Oliver Hummel
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Giannino Patone
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
- Michael Benedikt Mücke
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347, Berlin, Germany
  - Charité-Universitätsmedizin, 10117, Berlin, Germany
- Jan Šilhavý
  - Institute of Physiology of the Czech Academy of Sciences, 4, 142 20, Praha, Czech Republic
- Matthias Heinig
  - Institute of Computational Biology (ICB), HMGU, Ingolstaedter Landstr. 1, 85764 Neuherberg, Munich, Germany
  - Department of Informatics, Technische Universitaet Muenchen (TUM), Boltzmannstr. 3, 85748 Garching, Munich, Germany
- Leonardo Bottolo
  - Department of Medical Genetics, University of Cambridge, Cambridge, CB2 0QQ, UK
  - The Alan Turing Institute, London, NW1 2DB, UK
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, CB2 0SR, UK
- Daniel Sanchis
  - Institut de Recerca Biomedica de Lleida (IRBLLEIDA), Universitat de Lleida, Edifici Biomedicina-I. Av. Rovira Roure, 80, 25198, Lleida, Spain
- Martin Vingron
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195, Berlin, Germany
- Marina Chekulaeva
  - Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 10115, Berlin, Germany
- Michal Pravenec
  - Institute of Physiology of the Czech Academy of Sciences, 4, 142 20, Praha, Czech Republic
- Norbert Hubner
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347, Berlin, Germany
  - Charité-Universitätsmedizin, 10117, Berlin, Germany
- Sebastiaan van Heesch
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
  - Present Address: The Princess Máxima Center for Pediatric Oncology, Utrecht, the Netherlands
5
Ruffieux H, Fairfax BP, Nassiri I, Vigorito E, Wallace C, Richardson S, Bottolo L. EPISPOT: An epigenome-driven approach for detecting and interpreting hotspots in molecular QTL studies. Am J Hum Genet 2021; 108:983-1000. [PMID: 33909991 PMCID: PMC8206410 DOI: 10.1016/j.ajhg.2021.04.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/13/2020] [Accepted: 04/08/2021] [Indexed: 12/27/2022]
Abstract
We present EPISPOT, a fully joint framework which exploits large panels of epigenetic annotations as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, EPISPOT accommodates functional information for both cis and trans actions, including QTL hotspot effects. It effectively couples simultaneous QTL analysis of thousands of genetic variants and molecular traits with hypothesis-free selection of biologically interpretable annotations which directly contribute to the QTL effects. This unified, epigenome-aided learning boosts statistical power and sheds light on the regulatory basis of the uncovered hits; EPISPOT therefore marks an essential step toward improving the challenging detection and functional interpretation of trans-acting genetic variants and hotspots. We illustrate the advantages of EPISPOT in simulations emulating real-data conditions and in a monocyte expression QTL study, which confirms known hotspots and finds other signals, as well as plausible mechanisms of action. In particular, by highlighting the role of monocyte DNase-I sensitivity sites from >150 epigenetic annotations, we clarify the mediation effects and cell-type specificity of major hotspots close to the lysozyme gene. Our approach forgoes the daunting and underpowered task of one-annotation-at-a-time enrichment analyses for prioritizing cis and trans QTL hits and is tailored to any transcriptomic, proteomic, or metabolomic QTL problem. By enabling principled epigenome-driven QTL mapping transcriptome-wide, EPISPOT helps progress toward a better functional understanding of genetic regulation.
Affiliation(s)
- Hélène Ruffieux
  - MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK
- Benjamin P Fairfax
  - Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK
- Isar Nassiri
  - Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK
- Elena Vigorito
  - MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK
- Chris Wallace
  - MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge CB2 0AW, UK
- Sylvia Richardson
  - MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; The Alan Turing Institute, London NW1 2DB, UK
- Leonardo Bottolo
  - MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK; The Alan Turing Institute, London NW1 2DB, UK; Department of Medical Genetics, University of Cambridge, Cambridge CB2 0QQ, UK
6
Wang G, Sarkar A, Carbonetto P, Stephens M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J R Stat Soc Series B Stat Methodol 2020; 82:1273-1300. [DOI: 10.1111/rssb.12388] [Citation(s) in RCA: 176] [Impact Index Per Article: 35.2] [Indexed: 12/19/2022]
7
A fully joint Bayesian quantitative trait locus mapping of human protein abundance in plasma. PLoS Comput Biol 2020; 16:e1007882. [PMID: 32492067 PMCID: PMC7295243 DOI: 10.1371/journal.pcbi.1007882] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/27/2019] [Revised: 06/15/2020] [Accepted: 04/16/2020] [Indexed: 11/19/2022]
Abstract
Molecular quantitative trait locus (QTL) analyses are increasingly popular to explore the genetic architecture of complex traits, but existing studies do not leverage shared regulatory patterns and suffer from a large multiplicity burden, which hampers the detection of weak signals such as trans associations. Here, we present a fully multivariate proteomic QTL (pQTL) analysis performed with our recently proposed Bayesian method LOCUS on data from two clinical cohorts, with plasma protein levels quantified by mass-spectrometry and aptamer-based assays. Our two-stage study identifies 136 pQTL associations in the first cohort, of which >80% replicate in the second independent cohort and have significant enrichment with functional genomic elements and disease risk loci. Moreover, 78% of the pQTLs whose protein abundance was quantified by both proteomic techniques are confirmed across assays. Our thorough comparisons with standard univariate QTL mapping on (1) these data and (2) synthetic data emulating the real data show how LOCUS borrows strength across correlated protein levels and markers on a genome-wide scale to effectively increase statistical power. Notably, 15% of the pQTLs uncovered by LOCUS would be missed by the univariate approach, including several trans and pleiotropic hits with successful independent validation. Finally, the analysis of extensive clinical data from the two cohorts indicates that the genetically-driven proteins identified by LOCUS are enriched in associations with low-grade inflammation, insulin resistance and dyslipidemia and might therefore act as endophenotypes for metabolic diseases. While considerations on the clinical role of the pQTLs are beyond the scope of our work, these findings generate useful hypotheses to be explored in future research; all results are accessible online from our searchable database. 
Thanks to its efficient variational Bayes implementation, LOCUS can analyze jointly thousands of traits and millions of markers. Its applicability goes beyond pQTL studies, opening new perspectives for large-scale genome-wide association and QTL analyses. Diet, Obesity and Genes (DiOGenes) trial registration number: NCT00390637.
Exploring the functional mechanisms between the genotype and disease endpoints in view of identifying innovative therapeutic targets has prompted molecular quantitative trait locus studies, which assess how genetic variants (single nucleotide polymorphisms, SNPs) affect intermediate gene (eQTL), protein (pQTL) or metabolite (mQTL) levels. However, conventional univariate screening approaches do not account for local dependencies and association structures shared by multiple molecular levels and markers. Conversely, the current joint modelling approaches are restricted to small datasets by computational constraints. We illustrate and exploit the advantages of our recently introduced Bayesian framework LOCUS in a fully multivariate pQTL study, with ≈300K tag SNPs (capturing information from 4M markers) and 100–1,000 plasma protein levels measured by two distinct technologies. LOCUS identifies novel pQTLs that replicate in an independent cohort, confirms signals documented in studies 2–18 times larger, and detects more pQTLs than a conventional two-stage univariate analysis of our datasets. Moreover, some of these pQTLs might be of biomedical relevance and would therefore deserve dedicated investigation. Our extensive numerical experiments on these data and on simulated data demonstrate that the increased statistical power of LOCUS over standard approaches is largely attributable to its ability to exploit shared information across outcomes while efficiently accounting for the genetic correlation structures at a genome-wide level.
8
Ruffieux H, Davison AC, Hager J, Inshaw J, Fairfax BP, Richardson S, Bottolo L. A Global-Local Approach for Detecting Hotspots in Multiple-Response Regression. Ann Appl Stat 2020; 14:905-928. [PMID: 34992707 PMCID: PMC7612176 DOI: 10.1214/20-aoas1332] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Abstract
We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional mechanisms underlying disease endpoints. Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to large predictor and response vectors, for example, of dimensions 10³ to 10⁵ in genetic applications. We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable to the above dimensions. Our proposal implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior. Its global-local formulation shrinks noise globally and, hence, accommodates the highly sparse nature of genetic analyses while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.
Affiliation(s)
- Jamie Inshaw
  - Wellcome Centre for Human Genetics, Oxford, University of Oxford
- Benjamin P. Fairfax
  - Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford
- Sylvia Richardson
  - MRC Biostatistics Unit, University of Cambridge
  - Alan Turing Institute
- Leonardo Bottolo
  - MRC Biostatistics Unit, University of Cambridge
  - Alan Turing Institute
  - Department of Medical Genetics, University of Cambridge
9
Chen H, Moreno-Moral A, Pesce F, Devapragash N, Mancini M, Heng EL, Rotival M, Srivastava PK, Harmston N, Shkura K, Rackham OJL, Yu WP, Sun XM, Tee NGZ, Tan ELS, Barton PJR, Felkin LE, Lara-Pezzi E, Angelini G, Beltrami C, Pravenec M, Schafer S, Bottolo L, Hubner N, Emanueli C, Cook SA, Petretto E. WWP2 regulates pathological cardiac fibrosis by modulating SMAD2 signaling. Nat Commun 2019; 10:3616. [PMID: 31399586 PMCID: PMC6689010 DOI: 10.1038/s41467-019-11551-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 10/10/2018] [Accepted: 07/19/2019] [Indexed: 01/03/2023]
Abstract
Cardiac fibrosis is a final common pathology in inherited and acquired heart diseases that causes cardiac electrical and pump failure. Here, we use systems genetics to identify a pro-fibrotic gene network in the diseased heart and show that this network is regulated by the E3 ubiquitin ligase WWP2, specifically by the WWP2-N terminal isoform. Importantly, the WWP2-regulated pro-fibrotic gene network is conserved across different cardiac diseases characterized by fibrosis: human and murine dilated cardiomyopathy and repaired tetralogy of Fallot. Transgenic mice lacking the N-terminal region of the WWP2 protein show improved cardiac function and reduced myocardial fibrosis in response to pressure overload or myocardial infarction. In primary cardiac fibroblasts, WWP2 positively regulates the expression of pro-fibrotic markers and extracellular matrix genes. TGFβ1 stimulation promotes nuclear translocation of the WWP2 isoforms containing the N-terminal region and their interaction with SMAD2. WWP2 mediates the TGFβ1-induced nucleocytoplasmic shuttling and transcriptional activity of SMAD2.
Affiliation(s)
- Huimei Chen
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
- Aida Moreno-Moral
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
- Francesco Pesce
  - Department of Emergency and Organ Transplantation (DETO), University of Bari, 70124, Bari, Italy
- Nithya Devapragash
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
- Massimiliano Mancini
  - SOC di Anatomia Patologica, Ospedale San Giovanni di Dio, 50123, Florence, Italy
- Ee Ling Heng
  - National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
- Maxime Rotival
  - Unit of Human Evolutionary Genetics, Institute Pasteur, 75015, Paris, France
- Prashant K Srivastava
  - Division of Brain Sciences, Imperial College Faculty of Medicine, London, W12 0NN, UK
- Nathan Harmston
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
- Kirill Shkura
  - Division of Brain Sciences, Imperial College Faculty of Medicine, London, W12 0NN, UK
- Owen J L Rackham
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
- Wei-Ping Yu
  - Animal Gene Editing Laboratory, BRC, A*STAR, 20 Biopolis Way, Singapore, 138668, Republic of Singapore
  - Institute of Molecular and Cell Biology, A*STAR, 61 Biopolis Drive, Singapore, 138673, Republic of Singapore
- Xi-Ming Sun
  - MRC London Institute of Medical Sciences (LMC), Imperial College, London, W12 0NN, UK
- Elisabeth Li Sa Tan
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
- Paul J R Barton
  - National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
  - Cardiovascular Research Centre, Royal Brompton and Harefield NHS Trust, London, SW3 6NP, UK
- Leanne E Felkin
  - National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
  - Cardiovascular Research Centre, Royal Brompton and Harefield NHS Trust, London, SW3 6NP, UK
- Enrique Lara-Pezzi
  - Centro Nacional de Investigaciones Cardiovasculares - CNIC, 28029, Madrid, Spain
- Gianni Angelini
  - National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
  - Bristol Heart Institute, Bristol Medical School, University of Bristol, Bristol, BS2 89HW, UK
- Cristina Beltrami
  - National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
- Michal Pravenec
  - Institute of Physiology, Czech Academy of Sciences, 142 00, Praha 4, Czech Republic
- Sebastian Schafer
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
  - National Heart Centre Singapore, Singapore, 169609, Republic of Singapore
- Leonardo Bottolo
  - Department of Medical Genetics, University of Cambridge, Cambridge, CB2 0QQ, UK
  - The Alan Turing Institute, London, NW1 2DB, UK
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, CB2 0SR, UK
- Norbert Hubner
  - Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125, Berlin, Germany
  - DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, 13347, Berlin, Germany
  - Charité-Universitätsmedizin, 10117, Berlin, Germany
  - Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Costanza Emanueli
  - National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
  - Cardiovascular Research Centre, Royal Brompton and Harefield NHS Trust, London, SW3 6NP, UK
- Stuart A Cook
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
  - MRC London Institute of Medical Sciences (LMC), Imperial College, London, W12 0NN, UK
  - National Heart Centre Singapore, Singapore, 169609, Republic of Singapore
- Enrico Petretto
  - Programme in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, 169857, Republic of Singapore
  - MRC London Institute of Medical Sciences (LMC), Imperial College, London, W12 0NN, UK
10
Newcombe PJ, Connolly S, Seaman S, Richardson S, Sharp SJ. A two-step method for variable selection in the analysis of a case-cohort study. Int J Epidemiol 2019; 47:597-604. [PMID: 29136145 PMCID: PMC5913627 DOI: 10.1093/ije/dyx224] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 10/06/2017] [Indexed: 11/29/2022]
Abstract
Background: Accurate detection and estimation of true exposure-outcome associations is important in aetiological analysis; when there are multiple potential exposure variables of interest, methods for detecting the subset of variables most likely to have true associations with the outcome of interest are required. Case-cohort studies often collect data on a large number of variables which have not been measured in the entire cohort (e.g. panels of biomarkers). There is a lack of guidance on methods for variable selection in case-cohort studies. Methods: We describe and explore the application of three variable selection methods to data from a case-cohort study. These are: (i) selecting variables based on their level of significance in univariable (i.e. one-at-a-time) Prentice-weighted Cox regression models; (ii) stepwise selection applied to Prentice-weighted Cox regression; and (iii) a two-step method which applies a Bayesian variable selection algorithm to obtain posterior probabilities of selection for each variable using multivariable logistic regression followed by effect estimation using Prentice-weighted Cox regression. Results: Across nine different simulation scenarios, the two-step method demonstrated higher sensitivity and lower false discovery rate than the one-at-a-time and stepwise methods. In an application of the methods to data from the EPIC-InterAct case-cohort study, the two-step method identified an additional two fatty acids as being associated with incident type 2 diabetes, compared with the one-at-a-time and stepwise methods. Conclusions: The two-step method enables more powerful and accurate detection of exposure-outcome associations in case-cohort studies. An R package is available to enable researchers to apply this method.
Affiliation(s)
| | | | - S Seaman
- MRC Biostatistics Unit, Cambridge, UK
| | | | - S J Sharp
- MRC Epidemiology Unit, Cambridge, UK
|
11
|
Bagnati M, Moreno-Moral A, Ko JH, Nicod J, Harmston N, Imprialou M, Game L, Gil J, Petretto E, Behmoaras J. Systems genetics identifies a macrophage cholesterol network associated with physiological wound healing. JCI Insight 2019; 4:e125736. [PMID: 30674726 PMCID: PMC6413785 DOI: 10.1172/jci.insight.125736] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/22/2018] [Accepted: 12/18/2018] [Indexed: 01/18/2023] Open
Abstract
Among other cells, macrophages regulate the inflammatory and reparative phases during wound healing but genetic determinants and detailed molecular pathways that modulate these processes are not fully elucidated. Here, we took advantage of normal variation in wound healing in 1,378 genetically outbred mice, and carried out macrophage RNA-sequencing profiling of mice with extreme wound healing phenotypes (i.e., slow and fast healers, n = 146 in total). The resulting macrophage coexpression networks were genetically mapped and led to the identification of a unique module under strong trans-acting genetic control by the Runx2 locus. This macrophage-mediated healing network was specifically enriched for cholesterol and fatty acid biosynthetic processes. Pharmacological blockage of fatty acid synthesis with cerulenin resulted in delayed wound healing in vivo, and increased macrophage infiltration in the wounded skin, suggesting the persistence of an unresolved inflammation. We show how naturally occurring sequence variation controls transcriptional networks in macrophages, which in turn regulate specific metabolic pathways that could be targeted in wound healing.
Affiliation(s)
- Marta Bagnati
- Centre for Inflammatory Disease, Imperial College London, Hammersmith Hospital, London, United Kingdom (UK)
| | | | - Jeong-Hun Ko
- Centre for Inflammatory Disease, Imperial College London, Hammersmith Hospital, London, United Kingdom (UK)
| | - Jérôme Nicod
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | | | - Martha Imprialou
- Centre for Inflammatory Disease, Imperial College London, Hammersmith Hospital, London, United Kingdom (UK)
| | - Laurence Game
- Genomics Laboratory, Medical Research Council (MRC) London Institute of Medical Sciences, Imperial College London, Hammersmith Hospital, London, UK
| | - Jesus Gil
- Cell Proliferation Group, MRC London Institute of Medical Sciences (LMS), London, UK
| | - Enrico Petretto
- Duke-NUS Medical School, Singapore, Singapore
- MRC London Institute of Medical Sciences, Faculty of Medicine, Imperial College London, London, UK
| | - Jacques Behmoaras
- Centre for Inflammatory Disease, Imperial College London, Hammersmith Hospital, London, United Kingdom (UK)
|
12
|
Adriaens ME, Lodder EM, Moreno‐Moral A, Šilhavý J, Heinig M, Glinge C, Belterman C, Wolswinkel R, Petretto E, Pravenec M, Remme CA, Bezzina CR. Systems Genetics Approaches in Rat Identify Novel Genes and Gene Networks Associated With Cardiac Conduction. J Am Heart Assoc 2018; 7:e009243. [PMID: 30608189 PMCID: PMC6404199 DOI: 10.1161/jaha.118.009243] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/21/2018] [Accepted: 08/03/2018] [Indexed: 01/20/2023]
Abstract
Background: Electrocardiographic (ECG) parameters are regarded as intermediate phenotypes of cardiac arrhythmias. Insight into the genetic underpinnings of these parameters is expected to contribute to the understanding of cardiac arrhythmia mechanisms. Here we used HXB/BXH recombinant inbred rat strains to uncover genetic loci and candidate genes modulating ECG parameters. Methods and Results: RR interval, PR interval, QRS duration, and QTc interval were measured from ECGs obtained in 6 male rats from each of the 29 available HXB/BXH recombinant inbred strains. Genes at loci displaying significant quantitative trait loci (QTL) effects were prioritized by assessing the presence of protein-altering variants, and by assessment of cis expression QTL (eQTL) effects and correlation of transcript abundance to the respective trait in the heart. Cardiac RNA-seq data were additionally used to generate gene co-expression networks. QTL analysis of ECG parameters identified 2 QTL for PR interval, respectively, on chromosomes 10 and 17. At the chromosome 10 QTL, cis-eQTL effects were identified for Acbd4, Cd300lg, Fam171a2, and Arhgap27; the transcript abundance in the heart of these 4 genes was correlated with PR interval. At the chromosome 17 QTL, a cis-eQTL was uncovered for Nhlrc1 candidate gene; the transcript abundance of this gene was also correlated with PR interval. Co-expression analysis furthermore identified 50 gene networks, 6 of which were correlated with PR interval or QRS duration, both parameters of cardiac conduction. Conclusions: These newly identified genetic loci and gene networks associated with the ECG parameters of cardiac conduction provide a starting point for future studies with the potential of identifying novel mechanisms underlying cardiac electrical function.
Affiliation(s)
- Michiel E. Adriaens
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
- Maastricht Centre for Systems Biology, Maastricht University, Maastricht, The Netherlands
| | - Elisabeth M. Lodder
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
| | | | - Jan Šilhavý
- Institute of Physiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
| | - Matthias Heinig
- Institute of Computational Biology, Helmholtz Zentrum München, München, Germany
| | - Charlotte Glinge
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
| | - Charly Belterman
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
| | - Rianne Wolswinkel
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
| | - Enrico Petretto
- The MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
- Duke‐NUS Medical School, Singapore
| | - Michal Pravenec
- Institute of Physiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
| | - Carol Ann Remme
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
| | - Connie R. Bezzina
- Department of Experimental Cardiology, Heart Centre, Academic Medical Center Amsterdam, Amsterdam, The Netherlands
|
13
|
Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits. Biophys Rev 2018; 10:1053-1060. [PMID: 29934864 PMCID: PMC6082306 DOI: 10.1007/s12551-018-0435-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/18/2018] [Accepted: 06/13/2018] [Indexed: 12/31/2022] Open
Abstract
Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is anything but straightforward. Genomic technologies based on next-generation sequencing technology nowadays offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing, known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.
|
14
|
Vavoulis DV, Taylor JC, Schuh A. Hierarchical probabilistic models for multiple gene/variant associations based on next-generation sequencing data. Bioinformatics 2018; 33:3058-3064. [PMID: 28575251 PMCID: PMC5637939 DOI: 10.1093/bioinformatics/btx355] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/25/2017] [Accepted: 05/30/2017] [Indexed: 01/12/2023] Open
Abstract
Motivation: The identification of genetic variants influencing gene expression (known as expression quantitative trait loci or eQTLs) is important in unravelling the genetic basis of complex traits. Detecting multiple eQTLs simultaneously in a population based on paired DNA-seq and RNA-seq assays employs two competing types of models: models which rely on appropriate transformations of RNA-seq data (and are powered by a mature mathematical theory), or count-based models, which represent digital gene expression explicitly, thus rendering such transformations unnecessary. The latter constitutes an immensely popular methodology, which is however plagued by mathematical intractability. Results: We develop tractable count-based models, which are amenable to efficient estimation through the introduction of latent variables and the appropriate application of recent statistical theory in a sparse Bayesian modelling framework. Furthermore, we examine several transformation methods for RNA-seq read counts and we introduce arcsin, logit and Laplace smoothing as preprocessing steps for transformation-based models. Using natural and carefully simulated data from the 1000 Genomes and gEUVADIS projects, we benchmark both approaches under a variety of scenarios, including the presence of noise and violation of basic model assumptions. We demonstrate that an arcsin transformation of Laplace-smoothed data is at least as good as state-of-the-art models, particularly at small samples. Furthermore, we show that an over-dispersed Poisson model is comparable to the celebrated Negative Binomial, but much easier to estimate. These results provide strong support for transformation-based versus count-based (particularly Negative-Binomial-based) models for eQTL mapping. Availability and implementation: All methods are implemented in the free software eQTLseq: https://github.com/dvav/eQTLseq. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dimitrios V Vavoulis
- The Nuffield Division of Clinical Laboratory Sciences; The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK; National Health Service Translational Molecular Diagnostics Centre, Oxford University Hospitals, John Radcliffe Hospital, Oxford, OX3 9DU UK; National Institute for Health Research Oxford Biomedical Research Centre, Oxford, OX3 9DU UK
| | - Jenny C Taylor
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK; National Institute for Health Research Oxford Biomedical Research Centre, Oxford, OX3 9DU UK
| | - Anna Schuh
- National Health Service Translational Molecular Diagnostics Centre, Oxford University Hospitals, John Radcliffe Hospital, Oxford, OX3 9DU UK; National Institute for Health Research Oxford Biomedical Research Centre, Oxford, OX3 9DU UK; Department of Oncology, University of Oxford, Oxford, OX3 7DQ UK
|
15
|
Hanna MH, Dalla Gassa A, Mayer G, Zaza G, Brophy PD, Gesualdo L, Pesce F. The nephrologist of tomorrow: towards a kidney-omic future. Pediatr Nephrol 2017; 32:393-404. [PMID: 26961492 DOI: 10.1007/s00467-016-3357-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/23/2015] [Revised: 02/14/2016] [Accepted: 02/15/2016] [Indexed: 12/19/2022]
Abstract
Omics refers to the collective technologies used to explore the roles and relationships of the various types of molecules that make up the phenotype of an organism. Systems biology is a scientific discipline that endeavours to quantify all of the molecular elements of a biological system. Therefore, it reflects the knowledge acquired by omics in a meaningful manner by providing insights into functional pathways and regulatory networks underlying different diseases. The recent advances in biotechnological platforms and statistical tools to analyse such complex data have enabled scientists to connect the experimentally observed correlations to the underlying biochemical and pathological processes. We discuss in this review the current knowledge of different omics technologies in kidney diseases, specifically in the field of pediatric nephrology, including biomarker discovery, defining as yet unrecognized biologic therapeutic targets and linking omics to relevant standard indices and clinical outcomes. We also provide here a unique perspective on the field, taking advantage of the experience gained by the large-scale European research initiative called "Systems Biology towards Novel Chronic Kidney Disease Diagnosis and Treatment" (SysKid). Based on the integrative framework of Systems biology, SysKid demonstrated how omics are powerful yet complex tools to unravel the consequences of diabetes and hypertension on kidney function.
Affiliation(s)
- Mina H Hanna
- Department of Pediatrics, Kentucky Children's Hospital, University of Kentucky, Lexington, KY, USA
| | | | - Gert Mayer
- Department of Internal Medicine IV (Nephrology and Hypertension), Medical University Innsbruck, Innsbruck, Austria
| | - Gianluigi Zaza
- Renal Unit, Department of Medicine, Verona University Hospital, Verona, Italy
| | - Patrick D Brophy
- Pediatric Nephrology, University of Iowa Children's Hospital, Iowa City, IA, USA
| | - Loreto Gesualdo
- Dipartimento Emergenza e Trapianti di Organi (D.E.T.O), University of Bari, Bari, Italy
| | - Francesco Pesce
- Dipartimento Emergenza e Trapianti di Organi (D.E.T.O), University of Bari, Bari, Italy; Cardiovascular Genetics and Genomics, National Heart and Lung Institute, Royal Brompton Hospital, Imperial College London, London, UK
|
16
|
Abstract
The aim of expression Quantitative Trait Locus (eQTL) mapping is the identification of DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association analyses (GWAS), eQTL mapping has become a useful tool to understand the functional context where these variants operate and eventually narrow down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing for association of all transcript-marker pairs. However, these "one-at-a-time" strategies (1) are unable to control the number of false positives when an intricate Linkage Disequilibrium structure is present and (2) are often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
Affiliation(s)
- Martha Imprialou
- Centre for Complement and Inflammation Research, Imperial College London, Hammersmith Hospital, Du Cane Road, London, W12 0NN, UK
| | - Enrico Petretto
- Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore.
| | - Leonardo Bottolo
- Department of Medical Genetics, University of Cambridge, Box 238, Lv 6 Addenbrooke's Treatment Centre, Addenbrooke's Hospital, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK.
- Department of Mathematics, Imperial College London, 180 Queen's Gate, London, SW7 2AZ, UK.
|
17
|
Moreno-Moral A, Pesce F, Behmoaras J, Petretto E. Systems Genetics as a Tool to Identify Master Genetic Regulators in Complex Disease. Methods Mol Biol 2017; 1488:337-362. [PMID: 27933533 DOI: 10.1007/978-1-4939-6427-7_16] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Abstract
Systems genetics stems from systems biology and similarly employs integrative modeling approaches to describe the perturbations and phenotypic effects observed in a complex system. However, in the case of systems genetics the main source of perturbation is naturally occurring genetic variation, which can be analyzed at the systems-level to explain the observed variation in phenotypic traits. In contrast with conventional single-variant association approaches, the success of systems genetics has been in the identification of gene networks and molecular pathways that underlie complex disease. In addition, systems genetics has proven useful in the discovery of master trans-acting genetic regulators of functional networks and pathways, which in many cases revealed unexpected gene targets for disease. Here we detail the central components of a fully integrated systems genetics approach to complex disease, starting from assessment of genetic and gene expression variation, linking DNA sequence variation to mRNA (expression QTL mapping), gene regulatory network analysis and mapping the genetic control of regulatory networks. By summarizing a few illustrative (and successful) examples, we highlight how different data-modeling strategies can be effectively integrated in a systems genetics study.
Affiliation(s)
- Aida Moreno-Moral
- Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
| | - Francesco Pesce
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, Hammersmith Campus, Imperial Centre for Translational and Experimental Medicine, London, UK
| | - Jacques Behmoaras
- Centre for Complement and Inflammation Research, Imperial College London, Hammersmith Hospital, Du Cane Road, London, W12 0NN, UK
| | - Enrico Petretto
- Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore.
|
18
|
Moreno-Moral A, Petretto E. From integrative genomics to systems genetics in the rat to link genotypes to phenotypes. Dis Model Mech 2016; 9:1097-1110. [PMID: 27736746 PMCID: PMC5087832 DOI: 10.1242/dmm.026104] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/29/2022] Open
Abstract
Complementary to traditional gene mapping approaches used to identify the hereditary components of complex diseases, integrative genomics and systems genetics have emerged as powerful strategies to decipher the key genetic drivers of molecular pathways that underlie disease. Broadly speaking, integrative genomics aims to link cellular-level traits (such as mRNA expression) to the genome to identify their genetic determinants. With the characterization of several cellular-level traits within the same system, the integrative genomics approach evolved into a more comprehensive study design, called systems genetics, which aims to unravel the complex biological networks and pathways involved in disease, and in turn map their genetic control points. The first fully integrated systems genetics study was carried out in rats, and the results, which revealed conserved trans-acting genetic regulation of a pro-inflammatory network relevant to type 1 diabetes, were translated to humans. Many studies using different organisms subsequently stemmed from this example. The aim of this Review is to describe the most recent advances in the fields of integrative genomics and systems genetics applied in the rat, with a focus on studies of complex diseases ranging from inflammatory to cardiometabolic disorders. We aim to provide the genetics community with a comprehensive insight into how the systems genetics approach came to life, starting from the first integrative genomics strategies [such as expression quantitative trait loci (eQTLs) mapping] and concluding with the most sophisticated gene network-based analyses in multiple systems and disease states. Although not limited to studies that have been directly translated to humans, we will focus particularly on the successful investigations in the rat that have led to primary discoveries of genes and pathways relevant to human disease.
Affiliation(s)
- Aida Moreno-Moral
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore (NUS) Medical School, Singapore
| | - Enrico Petretto
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore (NUS) Medical School, Singapore
|
19
|
Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. Annual Review of Statistics and Its Application 2016; 3:181-209. [PMID: 27482531 PMCID: PMC4963036 DOI: 10.1146/annurev-statistics-041715-033506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Indexed: 05/28/2023]
Abstract
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
Affiliation(s)
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, CB2 0SR, United Kingdom
| | - George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261
| | - Wei Sun
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 27516
|
20
|
Newcombe PJ, Conti DV, Richardson S. JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects. Genet Epidemiol 2016; 40:188-201. [PMID: 27027514 PMCID: PMC4817278 DOI: 10.1002/gepi.21953] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 07/31/2015] [Revised: 12/03/2015] [Accepted: 12/15/2015] [Indexed: 01/06/2023]
Abstract
Recently, large scale genome-wide association study (GWAS) meta-analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one-at-a-time. This complicates the ability of fine-mapping to identify a small set of SNPs for further functional follow-up. We describe a new and scalable algorithm, joint analysis of marginal summary statistics (JAM), for the re-analysis of published marginal summary statistics under joint multi-SNP models. The correlation is accounted for according to estimates from a reference dataset, and models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework. We provide both enumerated and Reversible Jump MCMC implementations of JAM and present some comparisons of performance. In a series of realistic simulation studies, JAM demonstrated identical performance to various alternatives designed for single region settings. In multi-region settings, where the only multivariate alternative involves stepwise selection, JAM offered greater power and specificity. We also present an application to real published results from MAGIC (meta-analysis of glucose and insulin related traits consortium) - a GWAS meta-analysis of more than 15,000 people. We re-analysed several genomic regions that produced multiple significant signals with glucose levels 2 hr after oral stimulation. Through joint multivariate modelling, JAM was able to formally rule out many SNPs, and for one gene, ADCY5, suggests that an additional SNP, which transpired to be more biologically plausible, should be followed up with equal priority to the reported index.
Affiliation(s)
| | - David V. Conti
- Division of Biostatistics, Department of Preventive Medicine, Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California, United States of America
|
21
|
Stell L, Sabatti C. Genetic Variant Selection: Learning Across Traits and Sites. Genetics 2016; 202:439-55. [PMID: 26680660 PMCID: PMC4788227 DOI: 10.1534/genetics.115.184572] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/06/2015] [Accepted: 11/30/2015] [Indexed: 11/18/2022] Open
Abstract
We consider resequencing studies of associated loci and the problem of prioritizing sequence variants for functional follow-up. Working within the multivariate linear regression framework helps us to account for the joint effects of multiple genes; and adopting a Bayesian approach leads to posterior probabilities that coherently incorporate all information about the variants' function. We describe two novel prior distributions that facilitate learning the role of each variable site by borrowing evidence across phenotypes and across mutations in the same gene. We illustrate their potential advantages with simulations and reanalyzing a data set of sequencing variants.
Affiliation(s)
- Laurel Stell
- Department of Health Research and Policy, Stanford University, Stanford, California 94305
| | - Chiara Sabatti
- Department of Health Research and Policy, Stanford University, Stanford, California 94305; Department of Statistics, Stanford University, Stanford, California 94305
|
22
|
Liquet B, Bottolo L, Campanella G, Richardson S, Chadeau-Hyam M. R2GUESS: A Graphics Processing Unit-Based R Package for Bayesian Variable Selection Regression of Multivariate Responses. J Stat Softw 2016; 69. [PMID: 29568242 DOI: 10.18637/jss.v069.i02] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/03/2022] Open
Abstract
Technological advances in molecular biology over the past decade have given rise to high dimensional and complex datasets offering the possibility to investigate biological associations between a range of genomic features and complex phenotypes. The analysis of this novel type of data generated unprecedented computational challenges which ultimately led to the definition and implementation of computationally efficient statistical models that were able to scale to genome-wide data, including Bayesian variable selection approaches. While extensive methodological work has been carried out in this area, only a few methods capable of handling hundreds of thousands of predictors were implemented and distributed. Among these we recently proposed GUESS, a computationally optimised algorithm making use of graphics processing unit capabilities, which can accommodate multiple outcomes. In this paper we propose R2GUESS, an R package wrapping the original C++ source code. In addition to providing a user-friendly interface to the original code that automates its parametrisation and data handling, R2GUESS also incorporates many features to explore the data, to extend statistical inferences from the native algorithm (e.g., effect size estimation, significance assessment), and to visualize outputs from the algorithm. We first detail the model and its parametrisation, and describe in detail its optimised implementation. Based on two examples we finally illustrate its statistical performance and flexibility.
Affiliation(s)
- Benoît Liquet
- Laboratoire de Mathématiques et de leurs Applications, Université de Pau et des Pays de l'Adour, UMR CNRS 5142, Pau, France; ARC Centre of Excellence for Mathematical and Statistical Frontiers, Queensland University of Technology (QUT), Brisbane, Australia
| | | | | | | | - Marc Chadeau-Hyam
- Department of Epidemiology and Biostatistics, Imperial College London, St Mary's Hospital, Norfolk Place, London, W2 1PG, United Kingdom
|
23
|
Lewin A, Saadi H, Peters JE, Moreno-Moral A, Lee JC, Smith KGC, Petretto E, Bottolo L, Richardson S. MT-HESS: an efficient Bayesian approach for simultaneous association detection in OMICS datasets, with application to eQTL mapping in multiple tissues. Bioinformatics 2015; 32:523-32. [PMID: 26504141 PMCID: PMC4743623 DOI: 10.1093/bioinformatics/btv568] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 12/05/2014] [Accepted: 09/03/2015] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION: Analysing the joint association between a large set of responses and predictors is a fundamental statistical task in integrative genomics, exemplified by numerous expression Quantitative Trait Loci (eQTL) studies. Of particular interest are the so-called 'hotspots', important genetic variants that regulate the expression of many genes. Recently, attention has focussed on whether eQTLs are common to several tissues, cell-types or, more generally, conditions or whether they are specific to a particular condition. RESULTS: We have implemented MT-HESS, a Bayesian hierarchical model that analyses the association between a large set of predictors, e.g. SNPs, and many responses, e.g. gene expression, in multiple tissues, cells or conditions. Our Bayesian sparse regression algorithm goes beyond 'one-at-a-time' association tests between SNPs and responses and uses a fully multivariate model search across all linear combinations of SNPs, coupled with a model of the correlation between condition/tissue-specific responses. In addition, we use a hierarchical structure to leverage shared information across different genes, thus improving the detection of hotspots. We show the increase of power resulting from our new approach in an extensive simulation study. Our analysis of two case studies highlights new hotspots that would remain undetected by standard approaches and shows how greater prediction power can be achieved when several tissues are jointly considered. AVAILABILITY AND IMPLEMENTATION: C++ source code and documentation including compilation instructions are available under GNU licence at http://www.mrc-bsu.cam.ac.uk/software/.
Affiliation(s)
- Alex Lewin
- Department of Mathematics, Brunel University London
- Habib Saadi
- Department of Epidemiology and Biostatistics, Imperial College London, London
- James E Peters
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge, MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge
- James C Lee
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge
- Kenneth G C Smith
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge
- Enrico Petretto
- MRC Clinical Sciences Centre, Imperial College London, London, UK, Duke-NUS Graduate Medical School, Singapore, Singapore
- Leonardo Bottolo
- Department of Mathematics, Imperial College London, London, UK and Department of Medical Genetics, University of Cambridge
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge
24
Wallace C, Cutler AJ, Pontikos N, Pekalski ML, Burren OS, Cooper JD, García AR, Ferreira RC, Guo H, Walker NM, Smyth DJ, Rich SS, Onengut-Gumuscu S, Sawcer SJ, Ban M, Richardson S, Todd JA, Wicker LS. Dissection of a Complex Disease Susceptibility Region Using a Bayesian Stochastic Search Approach to Fine Mapping. PLoS Genet 2015; 11:e1005272. [PMID: 26106896 PMCID: PMC4481316 DOI: 10.1371/journal.pgen.1005272] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 10/02/2014] [Accepted: 05/12/2015] [Indexed: 12/15/2022] Open
Abstract
Identification of candidate causal variants in regions associated with risk of common diseases is complicated by linkage disequilibrium (LD) and multiple association signals. Nonetheless, accurate maps of these variants are needed, both to fully exploit detailed cell specific chromatin annotation data to highlight disease causal mechanisms and cells, and for design of the functional studies that will ultimately be required to confirm causal mechanisms. We adapted a Bayesian evolutionary stochastic search algorithm to the fine mapping problem, and demonstrated its improved performance over conventional stepwise and regularised regression through simulation studies. We then applied it to fine map the established multiple sclerosis (MS) and type 1 diabetes (T1D) associations in the IL-2RA (CD25) gene region. For T1D, both stepwise and stochastic search approaches identified four T1D association signals, with the major effect tagged by the single nucleotide polymorphism, rs12722496. In contrast, for MS, the stochastic search found two distinct competing models: a single candidate causal variant, tagged by rs2104286 and reported previously using stepwise analysis; and a more complex model with two association signals, one of which was tagged by the major T1D-associated rs12722496 and the other by rs56382813. There is low to moderate LD between rs2104286 and both rs12722496 and rs56382813 (r² ≃ 0.3) and our two SNP model could not be recovered through a forward stepwise search after conditioning on rs2104286. Both signals in the two variant model for MS affect CD25 expression on distinct subpopulations of CD4+ T cells, which are key cells in the autoimmune process. The results support a shared causal variant for T1D and MS. Our study illustrates the benefit of using a purposely designed model search strategy for fine mapping and the advantage of combining disease and protein expression data.
Genetic association studies have identified many DNA sequence variants that associate with disease risk. By exploiting the known correlation that exists between neighbouring variants in the genome, inference can be extended beyond those individual variants tested to identify sets within which a causal variant is likely to reside. However, this correlation, particularly in the presence of multiple disease causing variants in relative proximity, makes disentangling the specific causal variants difficult. Statistical approaches to this fine mapping problem have traditionally taken a stepwise search approach, beginning with the most associated variant in a region, then iteratively attempting to find additional associated variants. We adapted a stochastic search approach that avoids this stepwise process and is explicitly designed for dealing with highly correlated predictors to the fine mapping problem. We showed in simulated data that it outperforms its stepwise counterpart and other variable selection strategies such as the lasso. We applied our approach to understand the association of two immune-mediated diseases to a region on chromosome 10p15. We identified a model for multiple sclerosis containing two variants, neither of which was found through a stepwise search, and functionally linked both of these to the neighbouring candidate gene, IL2RA, in independent data. Our approach can be used to aid fine mapping of other disease-associated regions, which is critical for design of functional follow-up studies required to understand the mechanisms through which genetic variants influence disease.
Affiliation(s)
- Chris Wallace
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom; MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, United Kingdom
- Antony J Cutler
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Nikolas Pontikos
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Marcin L Pekalski
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Oliver S Burren
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Jason D Cooper
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Arcadio Rubio García
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Ricardo C Ferreira
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Hui Guo
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom; Centre for Biostatistics, Institute of Population Health, The University of Manchester, Manchester, United Kingdom
- Neil M Walker
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Deborah J Smyth
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Department of Medicine, Division of Endocrinology, University of Virginia, Charlottesville, Virginia, United States of America
- Suna Onengut-Gumuscu
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Department of Public Health Sciences, Division of Biostatistics and Epidemiology, University of Virginia, Charlottesville, Virginia, United States of America
- Stephen J Sawcer
- University of Cambridge, Department of Clinical Neurosciences, Cambridge, United Kingdom
- Maria Ban
- University of Cambridge, Department of Clinical Neurosciences, Cambridge, United Kingdom
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, United Kingdom
- John A Todd
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Linda S Wicker
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
25
Yin Z, Xia K, Chung W, Sullivan PF, Zou F. Fast eQTL Analysis for Twin Studies. Genet Epidemiol 2015; 39:357-65. [PMID: 25865703 DOI: 10.1002/gepi.21900] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/25/2014] [Revised: 01/17/2015] [Accepted: 02/23/2015] [Indexed: 12/29/2022]
Abstract
Twin data are commonly used for studying complex psychiatric disorders, and mixed effects models are one of the most popular tools for modeling dependence structures between twin pairs. However, for eQTL (expression quantitative trait loci) data, where associations between thousands of transcripts and millions of single nucleotide polymorphisms need to be tested, mixed effects models are computationally inefficient and often impractical. In this paper, we propose a fast eQTL analysis approach for twin eQTL data in which we randomly split twin pairs into two groups, so that within each group the samples are unrelated, and then apply a multiple linear regression analysis separately to each group. A score statistic that automatically adjusts for the (hidden) correlation between the two groups is constructed for combining the results from the two groups. The proposed method has well-controlled type I error. Compared to mixed effects models, the proposed method has similar power but drastically improved computational efficiency. We demonstrate the computational advantage of the proposed method via extensive simulations. The proposed method is also applied to a large twin eQTL dataset from the Netherlands Twin Register.
Affiliation(s)
- Zhaoyu Yin
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Kai Xia
- Department of Psychiatry, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Wonil Chung
- School of Public Health, Harvard, Boston, Massachusetts, United States of America
- Patrick F Sullivan
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Fei Zou
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States of America
26
Johnson MR, Behmoaras J, Bottolo L, Krishnan ML, Pernhorst K, Santoscoy PLM, Rossetti T, Speed D, Srivastava PK, Chadeau-Hyam M, Hajji N, Dabrowska A, Rotival M, Razzaghi B, Kovac S, Wanisch K, Grillo FW, Slaviero A, Langley SR, Shkura K, Roncon P, De T, Mattheisen M, Niehusmann P, O'Brien TJ, Petrovski S, von Lehe M, Hoffmann P, Eriksson J, Coffey AJ, Cichon S, Walker M, Simonato M, Danis B, Mazzuferi M, Foerch P, Schoch S, De Paola V, Kaminski RM, Cunliffe VT, Becker AJ, Petretto E. Systems genetics identifies Sestrin 3 as a regulator of a proconvulsant gene network in human epileptic hippocampus. Nat Commun 2015; 6:6031. [PMID: 25615886 DOI: 10.1038/ncomms7031] [Citation(s) in RCA: 133] [Impact Index Per Article: 13.3] [Received: 09/01/2014] [Accepted: 12/04/2014] [Indexed: 01/20/2023] Open
Abstract
Gene-regulatory network analysis is a powerful approach to elucidate the molecular processes and pathways underlying complex disease. Here we employ systems genetics approaches to characterize the genetic regulation of pathophysiological pathways in human temporal lobe epilepsy (TLE). Using surgically acquired hippocampi from 129 TLE patients, we identify a gene-regulatory network genetically associated with epilepsy that contains a specialized, highly expressed transcriptional module encoding proconvulsive cytokines and Toll-like receptor signalling genes. RNA sequencing analysis in a mouse model of TLE using 100 epileptic and 100 control hippocampi shows that the proconvulsive module is preserved across species, specific to the epileptic hippocampus and upregulated in chronic epilepsy. In the TLE patients, we map the trans-acting genetic control of this proconvulsive module to Sestrin 3 (SESN3), and demonstrate that SESN3 positively regulates the module in macrophages, microglia and neurons. Morpholino-mediated Sesn3 knockdown in zebrafish confirms the regulation of the transcriptional module, and attenuates chemically induced behavioural seizures in vivo.
Affiliation(s)
- Michael R Johnson
- Division of Brain Sciences, Imperial College London, Hammersmith Hospital Campus, Burlington Danes Building, London W12 0NN, UK
- Jacques Behmoaras
- Centre for Complement and Inflammation Research, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Leonardo Bottolo
- Department of Mathematics, Imperial College London, 180 Queen's Gate, London SW7 2AZ, UK
- Michelle L Krishnan
- Centre for the Developing Brain, Department of Perinatal Imaging and Health, St Thomas' Hospital, King's College London, London SE1 7EH, UK
- Katharina Pernhorst
- Section of Translational Epileptology, Department of Neuropathology, University of Bonn, Sigmund Freud Street 25, Bonn D-53127, Germany
- Paola L Meza Santoscoy
- Department of Biomedical Science, Bateson Centre, University of Sheffield, Firth Court, Western Bank, Sheffield S10 2TN, UK
- Tiziana Rossetti
- Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Doug Speed
- UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK
- Prashant K Srivastava
- Division of Brain Sciences, Imperial College London, Hammersmith Hospital Campus, Burlington Danes Building, London W12 0NN, UK; Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Marc Chadeau-Hyam
- Department of Epidemiology and Biostatistics, School of Public Health, MRC/PHE Centre for Environment and Health, Imperial College London, St Mary's Hospital, Norfolk Place, London W2 1PG, UK
- Nabil Hajji
- Department of Medicine, Centre for Pharmacology and Therapeutics, Imperial College London, Du Cane Road, London W12 0NN, UK
- Aleksandra Dabrowska
- Department of Medicine, Centre for Pharmacology and Therapeutics, Imperial College London, Du Cane Road, London W12 0NN, UK
- Maxime Rotival
- Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Banafsheh Razzaghi
- Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Stjepana Kovac
- Institute of Neurology, University College London, London WC1N 3BG, UK
- Klaus Wanisch
- Institute of Neurology, University College London, London WC1N 3BG, UK
- Federico W Grillo
- Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Anna Slaviero
- Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Sarah R Langley
- Division of Brain Sciences, Imperial College London, Hammersmith Hospital Campus, Burlington Danes Building, London W12 0NN, UK; Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Kirill Shkura
- Division of Brain Sciences, Imperial College London, Hammersmith Hospital Campus, Burlington Danes Building, London W12 0NN, UK; Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Paolo Roncon
- Department of Medical Sciences, Section of Pharmacology and Neuroscience Center, University of Ferrara, 44121 Ferrara, Italy; National Institute of Neuroscience, 44121 Ferrara, Italy
- Tisham De
- Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Manuel Mattheisen
- Department of Genomics, Life and Brain Center, University of Bonn, D-53127 Bonn, Germany; Institute of Human Genetics, University of Bonn, D-53127 Bonn, Germany; Institute for Genomic Mathematics, University of Bonn, D-53127 Bonn, Germany
- Pitt Niehusmann
- Section of Translational Epileptology, Department of Neuropathology, University of Bonn, Sigmund Freud Street 25, Bonn D-53127, Germany
- Terence J O'Brien
- Department of Medicine, RMH, University of Melbourne, Royal Melbourne Hospital, Royal Parade, Parkville, Victoria 3050, Australia
- Slave Petrovski
- Department of Neurology, Royal Melbourne Hospital, Melbourne, Parkville, Victoria 3050, Australia
- Marec von Lehe
- Department of Neurosurgery, University of Bonn Medical Center, Sigmund-Freud-Strasse 25, 53105 Bonn, Germany
- Per Hoffmann
- Institute of Human Genetics, University of Bonn, Sigmund-Freud-Strasse 25, 53127 Bonn, Germany; Department of Biomedicine, University of Basel, Hebelstrasse 20, 4056 Basel, Switzerland
- Johan Eriksson
- Folkhälsan Research Centre, Topeliusgatan 20, 00250 Helsinki, Finland; Helsinki University Central Hospital, Unit of General Practice, Haartmaninkatu 4, Helsinki 00290, Finland; Department of General Practice and Primary Health Care, University of Helsinki, 407, PO Box 20, Tukholmankatu 8 B, Helsinki 00014, Finland
- Alison J Coffey
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
- Sven Cichon
- Institute of Human Genetics, University of Bonn, Sigmund-Freud-Strasse 25, 53127 Bonn, Germany; Department of Biomedicine, University of Basel, Hebelstrasse 20, 4056 Basel, Switzerland
- Matthew Walker
- Institute of Neurology, University College London, London WC1N 3BG, UK
- Michele Simonato
- Department of Medical Sciences, Section of Pharmacology and Neuroscience Center, University of Ferrara, 44121 Ferrara, Italy; National Institute of Neuroscience, 44121 Ferrara, Italy; Laboratory for Technologies of Advanced Therapies (LTTA), University of Ferrara, 44121 Ferrara, Italy
- Bénédicte Danis
- Neuroscience TA, UCB Biopharma SPRL, Avenue de l'industrie, R9, B-1420 Braine l'Alleud, Belgium
- Manuela Mazzuferi
- Neuroscience TA, UCB Biopharma SPRL, Avenue de l'industrie, R9, B-1420 Braine l'Alleud, Belgium
- Patrik Foerch
- Neuroscience TA, UCB Biopharma SPRL, Avenue de l'industrie, R9, B-1420 Braine l'Alleud, Belgium
- Susanne Schoch
- Section of Translational Epileptology, Department of Neuropathology, University of Bonn, Sigmund Freud Street 25, Bonn D-53127, Germany; Department of Epileptology, University of Bonn Medical Center, Sigmund-Freud-Strasse 25, Bonn D-53127, Germany
- Vincenzo De Paola
- Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK
- Rafal M Kaminski
- Neuroscience TA, UCB Biopharma SPRL, Avenue de l'industrie, R9, B-1420 Braine l'Alleud, Belgium
- Vincent T Cunliffe
- Department of Biomedical Science, Bateson Centre, University of Sheffield, Firth Court, Western Bank, Sheffield S10 2TN, UK
- Albert J Becker
- Section of Translational Epileptology, Department of Neuropathology, University of Bonn, Sigmund Freud Street 25, Bonn D-53127, Germany
- Enrico Petretto
- Medical Research Council (MRC) Clinical Sciences Centre, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK; Duke-NUS Graduate Medical School, 8 College Road, Singapore 169857, Singapore
27
Novel distal eQTL analysis demonstrates effect of population genetic architecture on detecting and interpreting associations. Genetics 2014; 198:879-93. [PMID: 25230953 PMCID: PMC4224177 DOI: 10.1534/genetics.114.167791] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022] Open
Abstract
Mapping expression quantitative trait loci (eQTL) has identified genetic variants associated with transcription rates and has provided insight into genotype-phenotype associations obtained from genome-wide association studies (GWAS). Traditional eQTL mapping methods present significant challenges for the multiple-testing burden, resulting in a limited ability to detect eQTL that reside distal to the affected gene. To overcome this, we developed a novel eQTL testing approach, "NETwork-based, Large-scale Identification oF disTal eQTL" (NetLIFT), which performs eQTL testing based on the pairwise conditional dependencies between genes' expression levels. When applied to existing data from yeast segregants, NetLIFT replicated most previously identified distal eQTL and identified 46% more genes with distal effects compared to local effects. In liver data from mouse lines derived through the Collaborative Cross project, NetLIFT detected 5744 genes with local eQTL while 3322 genes had distal eQTL. This analysis revealed founder-of-origin effects for a subset of local eQTL that may contribute to previously described phenotypic differences in metabolic traits. In human lymphoblastoid cell lines, NetLIFT was able to detect 1274 transcripts with distal eQTL that had not been reported in previous studies, while 2483 transcripts with local eQTL were identified. In all species, we found no enrichment for transcription factors facilitating eQTL associations; instead, we found that most trans-acting factors were annotated for metabolic function, suggesting that genetic variation may indirectly regulate multigene pathways by targeting key components of feedback processes within regulatory networks. Furthermore, the unique genetic history of each population appears to influence the detection of genes with local and distal eQTL.
28
Kang H, Kerloc'h A, Rotival M, Xu X, Zhang Q, D'Souza Z, Kim M, Scholz JC, Ko JH, Srivastava PK, Genzen JR, Cui W, Aitman TJ, Game L, Melvin JE, Hanidu A, Dimock J, Zheng J, Souza D, Behera AK, Nabozny G, Cook HT, Bassett JHD, Williams GR, Li J, Vignery A, Petretto E, Behmoaras J. Kcnn4 is a regulator of macrophage multinucleation in bone homeostasis and inflammatory disease. Cell Rep 2014; 8:1210-24. [PMID: 25131209 PMCID: PMC4471813 DOI: 10.1016/j.celrep.2014.07.032] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 01/27/2014] [Revised: 05/19/2014] [Accepted: 07/20/2014] [Indexed: 12/29/2022] Open
Abstract
Macrophages can fuse to form osteoclasts in bone or multinucleate giant cells (MGCs) as part of the immune response. We use a systems genetics approach in rat macrophages to unravel their genetic determinants of multinucleation and investigate their role in both bone homeostasis and inflammatory disease. We identify a trans-regulated gene network associated with macrophage multinucleation and Kcnn4 as being the most significantly trans-regulated gene in the network and induced at the onset of fusion. Kcnn4 is required for osteoclast and MGC formation in rodents and humans. Genetic deletion of Kcnn4 reduces macrophage multinucleation through modulation of Ca2+ signaling, increases bone mass, and improves clinical outcome in arthritis. Pharmacological blockade of Kcnn4 reduces experimental glomerulonephritis. Our data implicate Kcnn4 in macrophage multinucleation, identifying it as a potential therapeutic target for inhibition of bone resorption and chronic inflammation.
- We identified a gene network that regulates macrophage multinucleation and includes Kcnn4
- Kcnn4 can be targeted in two inflammatory conditions with macrophage multinucleation
- Kcnn4 regulates bone mass under physiological conditions
- Kcnn4 is a drug target for which inhibitors reached phase III of clinical trials
Affiliation(s)
- Heeseog Kang
- Departments of Orthopaedics and Cell Biology, Yale University School of Medicine, New Haven, CT 06510, USA
- Audrey Kerloc'h
- Centre for Complement and Inflammation Research (CCIR), Imperial College London, London W12 0NN, UK
- Maxime Rotival
- Integrative Genomics and Medicine, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Xiaoqing Xu
- Departments of Orthopaedics and Cell Biology, Yale University School of Medicine, New Haven, CT 06510, USA
- Qing Zhang
- Departments of Orthopaedics and Cell Biology, Yale University School of Medicine, New Haven, CT 06510, USA
- Zelpha D'Souza
- Physiological Genomics and Medicine, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Michael Kim
- Departments of Orthopaedics and Cell Biology, Yale University School of Medicine, New Haven, CT 06510, USA
- Jodi Carlson Scholz
- Section of Comparative Medicine, Yale School of Medicine, New Haven, CT 06510, USA
- Jeong-Hun Ko
- Centre for Complement and Inflammation Research (CCIR), Imperial College London, London W12 0NN, UK
- Prashant K Srivastava
- Integrative Genomics and Medicine, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Jonathan R Genzen
- Department of Pathology, University of Utah and ARUP Laboratories, Salt Lake City, UT 84108, USA
- Weiguo Cui
- Blood Center of Wisconsin, Milwaukee, WI 53213, USA
- Timothy J Aitman
- Physiological Genomics and Medicine, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK
- Laurence Game
- Genomics Laboratory, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, London, UK
- James E Melvin
- National Institute of Dental and Craniofacial Research (NIDCR), National Institute of Health, Bethesda, MD 20892, USA
- Adedayo Hanidu
- Department of Immunology and Inflammation, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT 06877, USA
- Janice Dimock
- Department of Immunology and Inflammation, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT 06877, USA
- Jie Zheng
- Department of Immunology and Inflammation, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT 06877, USA
- Donald Souza
- Department of Immunology and Inflammation, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT 06877, USA
- Aruna K Behera
- Department of Immunology and Inflammation, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT 06877, USA
- Gerald Nabozny
- Department of Immunology and Inflammation, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT 06877, USA
- H Terence Cook
- Centre for Complement and Inflammation Research (CCIR), Imperial College London, London W12 0NN, UK
- J H Duncan Bassett
- Molecular Endocrinology Group, Department of Medicine, Imperial College London, London W12 0NN, UK
- Graham R Williams
- Molecular Endocrinology Group, Department of Medicine, Imperial College London, London W12 0NN, UK
- Jun Li
- Department of Immunology and Inflammation, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT 06877, USA
- Agnès Vignery
- Departments of Orthopaedics and Cell Biology, Yale University School of Medicine, New Haven, CT 06510, USA.
- Enrico Petretto
- Integrative Genomics and Medicine, MRC Clinical Sciences Centre, Imperial College London, London W12 0NN, UK.
- Jacques Behmoaras
- Centre for Complement and Inflammation Research (CCIR), Imperial College London, London W12 0NN, UK.
29
Kristensen VN, Lingjærde OC, Russnes HG, Vollan HKM, Frigessi A, Børresen-Dale AL. Principles and methods of integrative genomic analyses in cancer. Nat Rev Cancer 2014; 14:299-313. [PMID: 24759209 DOI: 10.1038/nrc3721] [Citation(s) in RCA: 249] [Impact Index Per Article: 22.6] [Indexed: 12/15/2022]
Abstract
Combined analyses of molecular data, such as DNA copy-number alteration, mRNA and protein expression, point to biological functions and molecular pathways being deregulated in multiple cancers. Genomic, metabolomic and clinical data from various solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring. The integrative genomics methodologies that are used to interpret these data require expertise in different disciplines, such as biology, medicine, mathematics, statistics and bioinformatics, and they can seem daunting. The objectives, methods and computational tools of integrative genomics that are available to date are reviewed here, as is their implementation in cancer research.
Affiliation(s)
- Vessela N Kristensen
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Clinical Molecular Oncology, Division of Medicine, Akershus University Hospital, 1478 Ahus, Norway
- Ole Christian Lingjærde
- [1] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [2] Division for Biomedical Informatics, Department of Computer Science, University of Oslo, 0316 Oslo, Norway
- Hege G Russnes
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Pathology, Oslo University Hospital, 0450 Oslo, Norway
- Hans Kristian M Vollan
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Oncology, Division of Cancer, Surgery and Transplantation, Oslo University Hospital, 0450 Oslo, Norway
- Arnoldo Frigessi
- [1] Statistics for Innovation, Norwegian Computing Center, 0314 Oslo, Norway. [2] Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, PO Box 1122 Blindern, 0317 Oslo, Norway
- Anne-Lise Børresen-Dale
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway
|
30
|
Rotival M, Petretto E. Leveraging gene co-expression networks to pinpoint the regulation of complex traits and disease, with a focus on cardiovascular traits. Brief Funct Genomics 2013; 13:66-78. [PMID: 23960099 DOI: 10.1093/bfgp/elt030] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 01/08/2023] Open
Abstract
Over the past decade, the number of genome-scale transcriptional datasets in publicly available databases has climbed to nearly one million, providing an unprecedented opportunity for extensive analyses of gene co-expression networks. In systems-genetic studies of complex diseases researchers increasingly focus on groups of highly interconnected genes within complex transcriptional networks (referred to as clusters, modules or subnetworks) to uncover specific molecular processes that can inform functional disease mechanisms and pathological pathways. Here, we outline the basic paradigms underlying gene co-expression network analysis and critically review the most commonly used computational methods. Finally, we discuss specific applications of network-based approaches to the study of cardiovascular traits, which highlight the power of integrated analyses of networks, genetic and gene-regulation data to elucidate the complex mechanisms underlying cardiovascular disease.
Affiliation(s)
- Maxime Rotival
- MRC-Clinical Sciences Centre, Hammersmith Hospital Campus, Imperial College Centre for Translational and Experimental Medicine (ICTEM Building), Du Cane Road, London, W12 0NN, UK. Tel.: +44-020-8383-1468; Fax: +44-208-383-8577
|
31
|
Chadeau-Hyam M, Campanella G, Jombart T, Bottolo L, Portengen L, Vineis P, Liquet B, Vermeulen RCH. Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers. Environmental and Molecular Mutagenesis 2013; 54:542-557. [PMID: 23918146 DOI: 10.1002/em.21797] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Received: 02/21/2013] [Revised: 05/21/2013] [Accepted: 05/28/2013] [Indexed: 05/28/2023]
Abstract
Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis imposes serious methodological challenges mainly relating to the size and complex structure of the data. Considerable experience in analyzing such data has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study era, and more recently in transcriptomics and metabolomics. Building upon the corresponding literature, we provide here a nontechnical overview of well-established methods used to analyze OMICS data within three main types of regression-based approaches: univariate models including multiple testing correction strategies, dimension reduction techniques, and variable selection models. Our methodological description focuses on methods for which ready-to-use implementations are available. We describe the main underlying assumptions, the main features, and advantages and limitations of each of the models. This descriptive summary constitutes a useful tool for driving methodological choices while analyzing OMICS data, especially in environmental epidemiology, where the emergence of the exposome concept clearly calls for unified methods to analyze marginally and jointly complex exposure and OMICS datasets.
Affiliation(s)
- Marc Chadeau-Hyam
- Department of Epidemiology and Biostatistics, MRC-HPA Centre for Environment and Health, School of Public Health, Imperial College London, Norfolk Place, London, W2 1PG, United Kingdom.
|
32
|
Zhu Y, Shen X, Pan W. Simultaneous grouping pursuit and feature selection over an undirected graph. J Am Stat Assoc 2013; 108:713-725. [PMID: 24098061 DOI: 10.1080/01621459.2013.770704] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 01/07/2023]
Abstract
In high-dimensional regression, grouping pursuit and feature selection have their own merits while complementing each other in battling the curse of dimensionality. To seek a parsimonious model, we perform simultaneous grouping pursuit and feature selection over an arbitrary undirected graph with each node corresponding to one predictor. When the corresponding nodes are reachable from each other over the graph, regression coefficients can be grouped, whose absolute values are the same or close. This is motivated from gene network analysis, where genes tend to work in groups according to their biological functionalities. Through a nonconvex penalty, we develop a computational strategy and analyze the proposed method. Theoretical analysis indicates that the proposed method reconstructs the oracle estimator, that is, the unbiased least squares estimator given the true grouping, leading to consistent reconstruction of grouping structures and informative features, as well as to optimal parameter estimation. Simulation studies suggest that the method combines the benefit of grouping pursuit with that of feature selection, and compares favorably against its competitors in selection accuracy and predictive performance. An application to eQTL data is used to illustrate the methodology, where a network is incorporated into analysis through an undirected graph.
|
33
|
Bhadra A, Mallick BK. Joint High-Dimensional Bayesian Variable and Covariance Selection with an Application to eQTL Analysis. Biometrics 2013; 69:447-57. [DOI: 10.1111/biom.12021] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 10/01/2011] [Revised: 10/01/2012] [Accepted: 12/01/2012] [Indexed: 01/29/2023]
Affiliation(s)
- Anindya Bhadra
- Department of Statistics, Purdue University, West Lafayette, Indiana 47907-2066, U.S.A.
- Bani K. Mallick
- Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A.
|
34
|
Curtis RE, Kim S, Woolford JL, Xu W, Xing EP. Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules. BMC Genomics 2013; 14:196. [PMID: 23514438 PMCID: PMC3616858 DOI: 10.1186/1471-2164-14-196] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/09/2012] [Accepted: 03/12/2013] [Indexed: 01/08/2023] Open
Abstract
Background: Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not leverage the availability of the structural information of the transcriptome, such as presence of gene networks that reveal correlation and potentially regulatory relationships among the study genes. We employ a new eQTL mapping algorithm, GFlasso, which we have previously developed for sparse structured regression, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal/noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes, which enables us to gain more power in detecting previously neglected signals that are marginally weak but pleiotropically significant.
Results: While eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes, our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results from in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis, telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots. Not only do we find that many of these candidate regulators contain mutations in the promoter and coding regions of the genes, in the case of the Ribi group, we provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso.
Conclusions: Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits due to correlation, regulation, or other forms of biological dependencies.
Affiliation(s)
- Ross E Curtis
- Joint Carnegie Mellon – University of Pittsburgh PhD Program in Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
|