1
Blatti C, de la Fuente J, Gao H, Marín-Goñi I, Chen Z, Zhao SD, Tan W, Weinshilboum R, Kalari KR, Wang L, Hernaez M. Bayesian Machine Learning Enables Identification of Transcriptional Network Disruptions Associated with Drug-Resistant Prostate Cancer. Cancer Res 2023; 83:1361-1380. [PMID: 36779846 PMCID: PMC10102853 DOI: 10.1158/0008-5472.can-22-1910] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/15/2022] [Revised: 07/29/2022] [Accepted: 02/08/2023] [Indexed: 02/14/2023]
Abstract
Survival rates of patients with metastatic castration-resistant prostate cancer (mCRPC) are low due to lack of response or acquired resistance to available therapies, such as abiraterone (Abi). A better understanding of the underlying molecular mechanisms is needed to identify effective targets to overcome resistance. Given the complexity of the transcriptional dynamics in cells, differential gene expression analysis of bulk transcriptomics data cannot provide sufficiently detailed insights into resistance mechanisms. Incorporating network structures could overcome this limitation to provide a global and functional perspective of Abi resistance in mCRPC. Here, we developed TraRe, a computational method using sparse Bayesian models to examine phenotypically driven transcriptional mechanistic differences at three distinct levels: transcriptional networks, specific regulons, and individual transcription factors (TF). TraRe was applied to transcriptomic data from 46 patients with mCRPC with Abi-response clinical data and uncovered abrogated immune response transcriptional modules that showed strong differential regulation in Abi-responsive compared with Abi-resistant patients. These modules were replicated in an independent mCRPC study. Furthermore, key rewiring predictions and their associated TFs were experimentally validated in two prostate cancer cell lines with different Abi-resistance features. Among them, ELK3, MXD1, and MYB played a differential role in cell survival in Abi-sensitive and Abi-resistant cells. Moreover, ELK3 regulated cell migration capacity, which could have a direct impact on mCRPC. Collectively, these findings shed light on the underlying transcriptional mechanisms driving Abi response, demonstrating that TraRe is a promising tool for generating novel hypotheses based on identified transcriptional network disruptions.
SIGNIFICANCE The computational method TraRe, built on Bayesian machine learning models for investigating transcriptional network structures, shows that disruption of ELK3, MXD1, and MYB signaling cascades impacts abiraterone resistance in prostate cancer.
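The abstract does not spell out TraRe's rewiring statistic, so the following is only a hedged illustration of the general idea of "rewiring": a regulator-module relationship that differs between Abi-responsive and Abi-resistant patients. It assumes a simple permutation test on the difference in Pearson correlation between two phenotype groups (labels "A" and "B"); the function names and the statistic are illustrative assumptions, not TraRe's method.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rewiring_stat(tf, module, labels):
    """|r_A - r_B|: how much the TF-module correlation differs between groups A and B."""
    idx_a = [i for i, g in enumerate(labels) if g == "A"]
    idx_b = [i for i, g in enumerate(labels) if g == "B"]
    r_a = pearson([tf[i] for i in idx_a], [module[i] for i in idx_a])
    r_b = pearson([tf[i] for i in idx_b], [module[i] for i in idx_b])
    return abs(r_a - r_b)


def rewiring_pvalue(tf, module, labels, n_perm=999, seed=0):
    """Permutation p-value: shuffle phenotype labels and count statistics at least as extreme."""
    observed = rewiring_stat(tf, module, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if rewiring_stat(tf, module, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With toy data in which the module follows the TF in group A but is anti-correlated with it in group B, the permutation p-value comes out small, because shuffling labels destroys the sign flip.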
Affiliation(s)
- Charles Blatti
- NCSA, University of Illinois at Urbana-Champaign, Champaign, Illinois
- Huanyao Gao
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota
- Irene Marín-Goñi
- Computational Biology Program, CIMA University of Navarra, Navarra, Spain
- Zikun Chen
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois
- Sihai D. Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, Illinois
- Winston Tan
- Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota
- Richard Weinshilboum
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota
- Krishna R. Kalari
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota
- Liewei Wang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota
- Mikel Hernaez
- Computational Biology Program, CIMA University of Navarra, Navarra, Spain
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, Illinois

2
Gockley J, Montgomery KS, Poehlman WL, Wiley JC, Liu Y, Gerasimov E, Greenwood AK, Sieberts SK, Wingo AP, Wingo TS, Mangravite LM, Logsdon BA. Multi-tissue neocortical transcriptome-wide association study implicates 8 genes across 6 genomic loci in Alzheimer's disease. Genome Med 2021; 13:76. [PMID: 33947463 PMCID: PMC8094491 DOI: 10.1186/s13073-021-00890-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/30/2020] [Accepted: 04/17/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Alzheimer's disease (AD) is an incurable neurodegenerative disease currently affecting 1.75% of the US population, with projected growth to 3.46% by 2050. Identifying common genetic variants driving differences in transcript expression that confer AD risk is necessary to elucidate AD mechanism and develop therapeutic interventions. We modify the FUSION transcriptome-wide association study (TWAS) pipeline to ingest gene expression values from multiple neocortical regions. METHODS A combined dataset of 2003 genotypes clustered to 1000 Genomes individuals from Utah with Northern and Western European ancestry (CEU) was used to construct a training set of 790 genotypes paired to 888 RNASeq profiles from temporal cortex (TCX = 248), prefrontal cortex (FP = 50), inferior frontal gyrus (IFG = 41), superior temporal gyrus (STG = 34), parahippocampal cortex (PHG = 34), and dorsolateral prefrontal cortex (DLPFC = 461). Following within-tissue normalization and covariate adjustment, predictive weights to impute expression components based on a gene's surrounding cis-variants were trained. The FUSION pipeline was modified to support input of pre-scaled expression values and support cross validation with a repeated measure design arising from the presence of multiple transcriptome samples from the same individual across different tissues. RESULTS Cis-variant architecture alone was informative to train weights and impute expression for 6780 (49.67%) autosomal genes, the majority of which significantly correlated with gene expression; FDR < 5%: N = 6775 (99.92%), Bonferroni: N = 6716 (99.06%). Validation of weights in 515 matched genotype to RNASeq profiles from the CommonMind Consortium (CMC) was (72.14%) in DLPFC profiles. Association of imputed expression components from all 2003 genotype profiles yielded 8 genes significantly associated with AD (FDR < 0.05): APOC1, EED, CD2AP, CEACAM19, CLPTM1, MTCH2, TREM2, and KNOP1. 
CONCLUSIONS We provide evidence of cis-genetic variation conferring AD risk through 8 genes across six distinct genomic loci. Moreover, we provide expression weights for 6780 genes as a valuable resource to the community, which can be abstracted across the neocortex and a wide range of neuronal phenotypes.
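A TWAS rests on two steps: imputing a gene's expression from its cis-variants using trained per-SNP weights, and testing that imputed expression for association with disease. The sketch below is a minimal illustration assuming the weights are already trained; it shows the imputation as a weighted sum of allele dosages, plus the summary-statistic z-score form commonly used by FUSION-style TWAS (w'z divided by the square root of w'Rw, with R the SNP-SNP LD matrix). It is not the authors' modified pipeline, and the function names are hypothetical.

```python
import math


def impute_expression(dosages, weights):
    """Predicted expression of one gene for one individual: sum_j w_j * g_j over its cis-SNPs.
    dosages are allele counts (0, 1, 2); weights are the trained per-SNP weights."""
    return sum(w * g for w, g in zip(weights, dosages))


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score, w'z / sqrt(w' R w), with R the SNP-SNP LD correlation matrix."""
    num = sum(w * z for w, z in zip(weights, gwas_z))
    k = len(weights)
    var = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    return num / math.sqrt(var)
```

Note how LD enters the denominator: two SNPs carrying the same signal in perfect LD give a smaller z than two independent SNPs with the same GWAS z-scores, because the redundant evidence is not double-counted.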
Affiliation(s)
- Yue Liu
- Department of Neurology, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Ekaterina Gerasimov
- Department of Neurology, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Aliza P Wingo
- Division of Mental Health, Atlanta VA Medical Center, Decatur, GA, USA
- Department of Psychiatry, Emory University School of Medicine, Atlanta, GA, USA
- Thomas S Wingo
- Department of Neurology, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Benjamin A Logsdon
- Cajal Neuroscience, 1616 Eastlake Avenue East, Suite 208, Seattle, WA, 98102, USA

3
Hernaez M, Blatti C, Gevaert O. Comparison of single and module-based methods for modeling gene regulatory networks. Bioinformatics 2020; 36:558-567. [PMID: 31287491 DOI: 10.1093/bioinformatics/btz549] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/05/2018] [Revised: 06/11/2019] [Accepted: 07/06/2019] [Indexed: 01/02/2023]
Abstract
MOTIVATION Gene regulatory networks describe the regulatory relationships among genes, and developing methods for reverse engineering these networks is an ongoing challenge in computational biology. The majority of the initially proposed methods for gene regulatory network discovery create a network of genes and then mine it in order to uncover previously unknown regulatory processes. More recent approaches have focused on inferring modules of co-regulated genes, linking these modules with regulatory genes and then mining them to discover new molecular biology. RESULTS In this work we analyze module-based network approaches to build gene regulatory networks, and compare their performance to single gene network approaches. In the process, we propose a novel approach to estimate gene regulatory networks drawing from the module-based methods. We show that generating modules of co-expressed genes which are predicted by a sparse set of regulators using a variational Bayes method, and then building a bipartite graph on the generated modules using sparse regression, yields more informative networks than previous single and module-based network approaches as measured by: (i) the rate of enriched gene sets, (ii) a network topology assessment, (iii) ChIP-Seq evidence and (iv) the KnowEnG Knowledge Network collection of previously characterized gene-gene interactions. AVAILABILITY AND IMPLEMENTATION The code is written in R and can be downloaded from https://github.com/mikelhernaez/linker. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
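The second stage described above, linking each module to a sparse set of regulators with sparse regression, can be sketched as follows. This is a minimal illustration assuming a Lasso penalty fitted by cyclic coordinate descent, a module summarized by a single centered expression profile, and centered regulator columns; it is not the code in the linker repository, and the function names are hypothetical. Regulators with nonzero coefficients become the module's edges in the bipartite graph.

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1.
    X: n rows (samples) of p regulator values; y: module profile of length n.
    No intercept is fitted, so X columns and y are assumed to be centered."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, kept up to date after each coordinate change
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # correlation of regulator j with the residual that excludes its own contribution
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Toy example: the module tracks regulator 0 only.
X = [[1.0, 1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [2.0, -2.0, 2.0, -2.0]
coef = lasso_cd(X, y, lam=0.5)
edges = [j for j, c in enumerate(coef) if c != 0.0]  # regulators linked to this module
```

In this orthogonal toy design the penalty shrinks the true coefficient from 2 to 1.5 and sets the two unrelated regulators exactly to zero, which is the sparsity that keeps the bipartite graph small.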
Affiliation(s)
- Mikel Hernaez
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Charles Blatti
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Olivier Gevaert
- The Stanford Center of Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA

4
Duan W, Zhang R, Zhao Y, Shen S, Wei Y, Chen F, Christiani DC. Bayesian variable selection for parametric survival model with applications to cancer omics data. Hum Genomics 2018; 12:49. [PMID: 30400837 PMCID: PMC6218990 DOI: 10.1186/s40246-018-0179-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 06/26/2018] [Accepted: 10/07/2018] [Indexed: 12/15/2022]
Abstract
Background Modeling thousands of markers simultaneously has been of great interest in testing association between genetic biomarkers and disease or disease-related quantitative traits. Recently, an expectation-maximization (EM) approach to Bayesian variable selection (EMVS), which uses a fast EM algorithm to facilitate Bayesian computation, was developed for continuous or binary outcomes. However, it is not suitable for analyzing the time-to-event outcomes available in many public databases such as The Cancer Genome Atlas (TCGA). Results We extended EMVS to a high-dimensional parametric survival regression framework (SurvEMVS). A variant of the cyclic coordinate descent (CCD) algorithm was used for efficient iteration in the M-step, and the extended Bayesian information criterion (EBIC) was used to tune the hyperparameters. We evaluated the performance of SurvEMVS using numerical simulations and illustrated its effectiveness on two real datasets. The simulations and the two real-data analyses show that SurvEMVS performs well in terms of both accuracy and computational cost. Several potential markers associated with survival in lung or stomach cancer were identified. Conclusions These results suggest that our model is effective and can cope with high-dimensional omics data. Electronic supplementary material The online version of this article (10.1186/s40246-018-0179-x) contains supplementary material, which is available to authorized users.
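Hyperparameter tuning by EBIC amounts to fitting the model under each candidate setting and keeping the fit with the smallest criterion. The sketch below uses the standard form of the extended BIC, -2 log L + k log n + 2 gamma log C(p, k), where k is the number of selected markers out of p candidates and n is the sample size. How SurvEMVS computes its log-likelihood is not given in the abstract; the (setting, log_lik, k) tuple format and the function names are hypothetical.

```python
import math


def ebic(log_lik, k, n, p, gamma=0.5):
    """Extended BIC: -2 log L + k log n + 2 * gamma * log C(p, k).
    gamma = 0 recovers ordinary BIC; larger gamma penalizes large model spaces more."""
    return -2.0 * log_lik + k * math.log(n) + 2.0 * gamma * math.log(math.comb(p, k))


def pick_by_ebic(fits, n, p, gamma=0.5):
    """fits: list of (hyperparameter, maximized log-likelihood, number of selected markers).
    Returns the hyperparameter whose fit has the smallest EBIC."""
    return min(fits, key=lambda f: ebic(f[1], f[2], n, p, gamma))[0]
```

The extra log C(p, k) term is what makes EBIC suitable for the high-dimensional setting: with thousands of candidate markers, a slightly better likelihood does not justify selecting many more of them.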
Affiliation(s)
- Weiwei Duan
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Key Laboratory of Biomedical Big Data of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Ruyang Zhang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Key Laboratory of Biomedical Big Data of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Yang Zhao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Key Laboratory of Biomedical Big Data of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Sipeng Shen
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Key Laboratory of Biomedical Big Data of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Yongyue Wei
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Key Laboratory of Biomedical Big Data of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Feng Chen
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Key Laboratory of Biomedical Big Data of Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- David C Christiani
- China International Cooperation Center for Environment and Human Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Joint Laboratory of Health and Environmental Risk Assessment (HERA), Nanjing Medical University School of Public Health / Harvard School of Public Health, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China
- Department of Environmental Health, Harvard School of Public Health, Boston, MA, USA
- Pulmonary and Critical Care Division, Department of Medicine, Massachusetts General Hospital/Harvard Medical School, Boston, MA, 02114, USA

5
Wittenburg D, Liebscher V. An approximate Bayesian significance test for genomic evaluations. Biom J 2018; 60:1096-1109. [PMID: 30101421 PMCID: PMC6282823 DOI: 10.1002/bimj.201700219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/01/2017] [Revised: 03/06/2018] [Accepted: 04/10/2018] [Indexed: 11/12/2022]
Abstract
Genomic information can be used to study the genetic architecture of a trait. Not only the size of the genetic effect captured by molecular markers and their position on the genome but also the mode of inheritance, which might be additive or dominant, and the presence of interactions are parameters of interest. When searching for interacting loci, estimating effect sizes and determining the significant marker pairs dramatically increases the computational burden in terms of both speed and memory allocation. This study revisits a rapid Bayesian approach (fastbayes). As a novel contribution, a measure of evidence is derived to select markers with effects significantly different from zero. It is based on the credibility of the highest posterior density interval next to zero in a marginalized manner. This methodology is applied to simulated data resembling a dairy cattle population in order to verify the sensitivity of testing for a given range of type-I error levels. A real data application complements this study. Sensitivity and specificity of fastbayes were similar to those of a variational Bayesian method, and a further reduction of computing time could be achieved. More than 50% of the simulated causative variants were identified. The most complex model, containing different kinds of genetic effects and their pairwise interactions, yielded the best outcome over a range of type-I error levels. The validation study showed that fastbayes is a dual-purpose tool for genomic inferences: it is applicable both to predict future outcomes of not-yet-phenotyped individuals with high precision and to estimate and test single-marker effects. Furthermore, it allows the estimation of billions of interaction effects.
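To make "the credibility of the highest posterior density interval next to zero" concrete, the sketch below assumes (this is not stated in the abstract) that a marker's marginal posterior is approximately Gaussian, N(m, s²). The HPD interval of a Gaussian is symmetric around m, so the widest one that does not cover zero is [m - |m|, m + |m|], whose credibility is 2Φ(|m|/s) - 1. The decision rule `is_significant` and both function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hpd_evidence(post_mean, post_sd):
    """Credibility of the HPD interval whose boundary touches zero, for a N(mean, sd^2) posterior:
    the interval is [mean - |mean|, mean + |mean|], so credibility = 2 * Phi(|mean| / sd) - 1."""
    return 2.0 * normal_cdf(abs(post_mean) / post_sd) - 1.0


def is_significant(post_mean, post_sd, alpha=0.05):
    """Hypothetical rule: call the effect nonzero if that interval holds at least 1 - alpha of the mass."""
    return hpd_evidence(post_mean, post_sd) >= 1.0 - alpha
```

Under this Gaussian assumption the measure behaves like a two-sided test on m/s, which is why it can be calibrated against type-I error levels.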
Affiliation(s)
- Dörte Wittenburg
- Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology, Wilhelm-Stahl-Allee 2, D-18196, Dummerstorf, Germany
- Volkmar Liebscher
- Department of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Str. 47, D-17489, Greifswald, Germany

6
A fast algorithm for Bayesian multi-locus model in genome-wide association studies. Mol Genet Genomics 2017; 292:923-934. [DOI: 10.1007/s00438-017-1322-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/21/2016] [Accepted: 04/18/2017] [Indexed: 12/27/2022]

7
Lu ZH, Zhu H, Knickmeyer RC, Sullivan PF, Williams SN, Zou F. Multiple SNP Set Analysis for Genome-Wide Association Studies Through Bayesian Latent Variable Selection. Genet Epidemiol 2015; 39:664-77. [PMID: 26515609 DOI: 10.1002/gepi.21932] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 03/25/2015] [Revised: 07/23/2015] [Accepted: 08/18/2015] [Indexed: 11/07/2022]
Abstract
The power of genome-wide association studies (GWAS) for mapping complex traits with single-SNP analysis (where SNP is single-nucleotide polymorphism) may be undermined by modest SNP effect sizes, unobserved causal SNPs, correlation among adjacent SNPs, and SNP-SNP interactions. Alternative approaches for testing the association between a single SNP set and individual phenotypes have been shown to be promising for improving the power of GWAS. We propose a Bayesian latent variable selection (BLVS) method to simultaneously model the joint association mapping between a large number of SNP sets and complex traits. Compared with single SNP set analysis, such joint association mapping not only accounts for the correlation among SNP sets but also is capable of detecting causal SNP sets that are marginally uncorrelated with traits. The spike-and-slab prior assigned to the effects of SNP sets can greatly reduce the dimension of effective SNP sets, while speeding up computation. An efficient Markov chain Monte Carlo algorithm is developed. Simulations demonstrate that BLVS outperforms several competing variable selection methods in some important scenarios.
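The dimension reduction attributed to the spike-and-slab prior comes from its two-part structure: each SNP set's effect is either exactly zero (the spike) or drawn from a diffuse distribution (the slab). The sketch below draws effects from a point-mass spike-and-slab prior; the point-mass form, the Gaussian slab, and the function name are assumptions for illustration, and it does not implement the BLVS MCMC sampler.

```python
import random


def sample_spike_and_slab(n_sets, pi, slab_sd, seed=0):
    """Draw effects for n_sets SNP sets from a point-mass spike-and-slab prior:
    gamma_k ~ Bernoulli(pi) is the inclusion indicator;
    beta_k = 0 if gamma_k == 0, else beta_k ~ N(0, slab_sd^2)."""
    rng = random.Random(seed)
    gammas = [1 if rng.random() < pi else 0 for _ in range(n_sets)]
    betas = [rng.gauss(0.0, slab_sd) if g else 0.0 for g in gammas]
    return gammas, betas
```

With a small inclusion probability pi, most SNP sets receive an exact zero effect, so the sampler only needs to track the few sets with gamma_k = 1 at any iteration.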
Affiliation(s)
- Zhao-Hua Lu
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Rebecca C Knickmeyer
- Department of Psychiatry, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Patrick F Sullivan
- Department of Genetics, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Stephanie N Williams
- Department of Genetics, University of North Carolina at Chapel Hill, North Carolina, United States of America
- Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, United States of America

8
Vilhjálmsson B, Yang J, Finucane H, Gusev A, Lindström S, Ripke S, Genovese G, Loh PR, Bhatia G, Do R, Hayeck T, Won HH, Kathiresan S, Pato M, Pato C, Tamimi R, Stahl E, Zaitlen N, Pasaniuc B, Belbin G, Kenny EE, Schierup MH, De Jager P, Patsopoulos NA, McCarroll S, Daly M, Purcell S, Chasman D, Neale B, Goddard M, Visscher PM, Kraft P, Patterson N, Price AL, Ripke S, Neale B, Corvin A, Walters J, Farh KH, Holmans P, Lee P, Bulik-Sullivan B, Collier D, Huang H, Pers T, Agartz I, Agerbo E, Albus M, Alexander M, Amin F, Bacanu S, Begemann M, Belliveau R, Bene J, Bergen S, Bevilacqua E, Bigdeli T, Black D, Bruggeman R, Buccola N, Buckner R, Byerley W, Cahn W, Cai G, Campion D, Cantor R, Carr V, Carrera N, Catts S, Chambert K, Chan R, Chen R, Chen E, Cheng W, Cheung E, Chong S, Cloninger C, Cohen D, Cohen N, Cormican P, Craddock N, Crowley J, Curtis D, Davidson M, Davis K, Degenhardt F, Del Favero J, DeLisi L, Demontis D, Dikeos D, Dinan T, Djurovic S, Donohoe G, Drapeau E, Duan J, Dudbridge F, Durmishi N, Eichhammer P, Eriksson J, Escott-Price V, Essioux L, Fanous A, Farrell M, Frank J, Franke L, Freedman R, Freimer N, Friedl M, Friedman J, Fromer M, Genovese G, Georgieva L, Gershon E, Giegling I, Giusti-Rodríguez P, Godard S, Goldstein J, Golimbet V, Gopal S, Gratten J, Grove J, de Haan L, Hammer C, Hamshere M, Hansen M, Hansen T, Haroutunian V, Hartmann A, Henskens F, Herms S, Hirschhorn J, Hoffmann P, Hofman A, Hollegaard M, Hougaard D, Ikeda M, Joa I, Julia A, Kahn R, Kalaydjieva L, Karachanak-Yankova S, Karjalainen J, Kavanagh D, Keller M, Kelly B, Kennedy J, Khrunin A, Kim Y, Klovins J, Knowles J, Konte B, Kucinskas V, Kucinskiene Z, Kuzelova-Ptackova H, Kahler A, Laurent C, Keong J, Lee S, Legge S, Lerer B, Li M, Li T, Liang KY, Lieberman J, Limborska S, Loughland C, Lubinski J, Lönnqvist J, Macek M, Magnusson P, Maher B, Maier W, Mallet J, Marsal S, Mattheisen M, Mattingsdal M, McCarley R, McDonald C, McIntosh A, Meier S, Meijer C, Melegh B, Melle I, Mesholam-Gately R, Metspalu A, Michie P, Milani L, Milanova V, Mokrab Y, Morris D, Mors O, Mortensen P, Murphy K, Murray R, Myin-Germeys I, Müller-Myhsok B, Nelis M, Nenadic I, Nertney D, Nestadt G, Nicodemus K, Nikitina-Zake L, Nisenbaum L, Nordin A, O’Callaghan E, O’Dushlaine C, O’Neill F, Oh SY, Olincy A, Olsen L, Van Os J, Pantelis C, Papadimitriou G, Papiol S, Parkhomenko E, Pato M, Paunio T, Pejovic-Milovancevic M, Perkins D, Pietiläinen O, Pimm J, Pocklington A, Powell J, Price A, Pulver A, Purcell S, Quested D, Rasmussen H, Reichenberg A, Reimers M, Richards A, Roffman J, Roussos P, Ruderfer D, Salomaa V, Sanders A, Schall U, Schubert C, Schulze T, Schwab S, Scolnick E, Scott R, Seidman L, Shi J, Sigurdsson E, Silagadze T, Silverman J, Sim K, Slominsky P, Smoller J, So HC, Spencer C, Stahl E, Stefansson H, Steinberg S, Stogmann E, Straub R, Strengman E, Strohmaier J, Stroup T, Subramaniam M, Suvisaari J, Svrakic D, Szatkiewicz J, Söderman E, Thirumalai S, Toncheva D, Tooney P, Tosato S, Veijola J, Waddington J, Walsh D, Wang D, Wang Q, Webb B, Weiser M, Wildenauer D, Williams N, Williams S, Witt S, Wolen A, Wong E, Wormley B, Wu J, Xi H, Zai C, Zheng X, Zimprich F, Wray N, Stefansson K, Visscher P, Adolfsson R, Andreassen O, Blackwood D, Bramon E, Buxbaum J, Børglum A, Cichon S, Darvasi A, Domenici E, Ehrenreich H, Esko T, Gejman P, Gill M, Gurling H, Hultman C, Iwata N, Jablensky A, Jonsson E, Kendler K, Kirov G, Knight J, Lencz T, Levinson D, Li Q, Liu J, Malhotra A, McCarroll S, McQuillin A, Moran J, Mortensen P, Mowry B, Nöthen M, Ophoff R, Owen M, Palotie A, Pato C, Petryshen T, Posthuma D, Rietschel M, Riley B, Rujescu D, Sham P, Sklar P, St. Clair D, Weinberger D, Wendland J, Werge T, Daly M, Sullivan P, O’Donovan M, Kraft P, Hunter DJ, Adank M, Ahsan H, Aittomäki K, Baglietto L, Berndt S, Blomquist C, Canzian F, Chang-Claude J, Chanock SJ, Crisponi L, Czene K, Dahmen N, Silva IDS, Easton D, Eliassen AH, Figueroa J, Fletcher O, Garcia-Closas M, Gaudet MM, Gibson L, Haiman CA, Hall P, Hazra A, Hein R, Henderson BE, Hofman A, Hopper JL, Irwanto A, Johansson M, Kaaks R, Kibriya MG, Lichtner P, Lindström S, Liu J, Lund E, Makalic E, Meindl A, Meijers-Heijboer H, Müller-Myhsok B, Muranen TA, Nevanlinna H, Peeters PH, Peto J, Prentice RL, Rahman N, Sánchez MJ, Schmidt DF, Schmutzler RK, Southey MC, Tamimi R, Travis R, Turnbull C, Uitterlinden AG, van der Luijt RB, Waisfisz Q, Wang Z, Whittemore AS, Yang R, Zheng W. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am J Hum Genet 2015; 97:576-92. [PMID: 26430803 DOI: 10.1016/j.ajhg.2015.09.001] [Citation(s) in RCA: 867] [Impact Index Per Article: 86.7] [Received: 03/26/2015] [Accepted: 09/01/2015] [Indexed: 11/24/2022]
Abstract
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
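The simplest case of "posterior mean effect size using a prior and LD" is the infinitesimal-prior variant (often called LDpred-inf), where the posterior mean within an LD region is approximately (M/(N h²) I + D)⁻¹ β̃, with β̃ the marginal standardized effect estimates, D the reference-panel LD matrix, M the total number of markers, N the GWAS sample size and h² the heritability. The sketch below computes that expression for one small region; it does not implement the full LDpred point-normal prior or its sampler, and the function names and toy inputs are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ldpred_inf(beta_marginal, ld, n_samples, n_markers_total, h2):
    """Infinitesimal-prior posterior mean for one LD region: (M/(N h2) I + D)^-1 beta_marginal."""
    shrink = n_markers_total / (n_samples * h2)
    k = len(ld)
    A = [[ld[i][j] + (shrink if i == j else 0.0) for j in range(k)] for i in range(k)]
    return solve(A, beta_marginal)
```

With no LD (D = I) this reduces to uniform shrinkage β̃ / (1 + M/(N h²)); with correlated markers the solve also splits a shared signal between them instead of counting it twice, which is the information that pruning and thresholding discard.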

9
Mansfeldt CB, Logsdon BA, Debs GE, Richardson RE. SPINE: SParse eIgengene NEtwork linking gene expression clusters in Dehalococcoides mccartyi to perturbations in experimental conditions. PLoS One 2015; 10:e0118404. [PMID: 25714365 PMCID: PMC4340931 DOI: 10.1371/journal.pone.0118404] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/16/2014] [Accepted: 01/15/2015] [Indexed: 11/18/2022]
Abstract
We present a statistical model designed to identify the effect of experimental perturbations on the aggregate behavior of the transcriptome expressed by the bacterium Dehalococcoides mccartyi strain 195. Strains of Dehalococcoides are used in sub-surface bioremediation applications because they organohalorespire tetrachloroethene and trichloroethene (common chlorinated solvents that contaminate the environment) to non-toxic ethene. However, the biochemical mechanism of this process remains incompletely described. Additionally, the response of Dehalococcoides to stress-inducing conditions that may be encountered at field-sites is not well understood. The constructed statistical model captured the aggregate behavior of gene expression phenotypes by modeling the distinct eigengenes of 100 transcript clusters, determining stable relationships among these clusters of gene transcripts with a sparse network-inference algorithm, and directly modeling the effect of changes in experimental conditions by constructing networks conditioned on the experimental state. Based on the model predictions, we discovered new response mechanisms for DMC, notably when the bacterium is exposed to solvent toxicity. The network identified a cluster containing thirteen gene transcripts directly connected to the solvent toxicity condition. Transcripts in this cluster include an iron-dependent regulator (DET0096-97) and a methylglyoxal synthase (DET0137). To validate these predictions, additional experiments were performed. Continuously fed cultures were exposed to saturating levels of tetrachloethene, thereby causing solvent toxicity, and transcripts that were predicted to be linked to solvent toxicity were monitored by quantitative reverse-transcription polymerase chain reaction. Twelve hours after being shocked with saturating levels of tetrachloroethene, the control transcripts (encoding for a key hydrogenase and the 16S rRNA) did not significantly change. 
By contrast, transcripts for DET0137 and DET0097 displayed a 46.8±11.5 and 14.6±9.3 fold up-regulation, respectively, supporting the model. This is the first study to identify transcripts in Dehalococcoides that potentially respond to tetrachloroethene solvent-toxicity conditions that may be encountered near contamination source zones in sub-surface environments.
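The abstract says SPINE models "the distinct eigengenes of 100 transcript clusters" but does not define the eigengene. A minimal sketch, assuming the common convention that a cluster's eigengene is the first principal component (first right singular vector) of its gene-wise standardized expression matrix; the function name `eigengene` and the sign convention are illustrative, not SPINE's code:

```python
import numpy as np


def eigengene(cluster_expr: np.ndarray) -> np.ndarray:
    """Return one eigengene for a transcript cluster.

    cluster_expr has shape (n_genes, n_samples). Assumed definition: the
    first right singular vector of the row-standardized matrix, i.e. the
    dominant shared expression profile across samples.
    """
    x = np.asarray(cluster_expr, dtype=float)
    # Standardize each gene (row) to mean 0 and unit variance across samples.
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    # Rows of vt are orthonormal directions in sample space.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    eg = vt[0]
    # SVD signs are arbitrary; orient the eigengene so it correlates
    # positively with the cluster's mean standardized profile.
    if np.dot(eg, x.mean(axis=0)) < 0:
        eg = -eg
    return eg
```

In a SPINE-like pipeline, one such vector per cluster would then be passed to the sparse network-inference step alongside the experimental-condition variables.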
Affiliation(s)
- Cresten B. Mansfeldt
- Department of Civil and Environmental Engineering, Cornell University, Ithaca, NY, United States of America
- Garrett E. Debs
- Department of Biological and Environmental Engineering, Cornell University, Ithaca, NY, United States of America
- Ruth E. Richardson
- Department of Civil and Environmental Engineering, Cornell University, Ithaca, NY, United States of America
10
Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, Chasman DI, Ridker PM, Neale BM, Berger B, Patterson N, Price AL. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet 2015; 47:284-90. [PMID: 25642633 PMCID: PMC4342297 DOI: 10.1038/ng.3190] [Citation(s) in RCA: 971] [Impact Index Per Article: 97.1] [Received: 07/22/2014] [Accepted: 12/16/2014] [Indexed: 12/15/2022]
Abstract
Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN²) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN)-time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
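To make the contrast between an infinitesimal prior and a "Bayesian mixture prior on marker effect sizes" concrete, here is a minimal sketch assuming a two-component, zero-mean Gaussian mixture; the parameter names `p`, `var_large`, and `var_small` are hypothetical, and BOLT-LMM's actual parameterization and fitting procedure are not reproduced:

```python
import math


def normal_pdf(x: float, var: float) -> float:
    """Density of a zero-mean Gaussian with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_prior_pdf(beta: float, p: float, var_large: float, var_small: float) -> float:
    """Density of an assumed two-component Gaussian mixture prior on one effect.

    With probability p an effect is drawn from the wide N(0, var_large)
    component, otherwise from the narrow N(0, var_small) component.
    Setting p = 1 (or var_large == var_small) gives back a single Gaussian,
    i.e. the infinitesimal model.
    """
    return p * normal_pdf(beta, var_large) + (1.0 - p) * normal_pdf(beta, var_small)
```

For a fixed total variance, such a mixture puts more mass both near zero and in the tails than a single Gaussian does, which is how it encodes a few larger effects among many near-zero ones.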
Affiliation(s)
- Po-Ru Loh
- [1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- George Tucker
- [1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. [3] Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA
- Brendan K Bulik-Sullivan
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Bjarni J Vilhjálmsson
- [1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Hilary K Finucane
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Rany M Salem
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Department of Endocrinology, Children's Hospital Boston, Boston, Massachusetts, USA
- Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Paul M Ridker
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Benjamin M Neale
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Bonnie Berger
- [1] Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. [2] Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA
- Nick Patterson
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Alkes L Price
- [1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [3] Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
11
Logsdon BA, Gentles AJ, Miller CP, Blau CA, Becker PS, Lee SI. Sparse expression bases in cancer reveal tumor drivers. Nucleic Acids Res 2015; 43:1332-44. [PMID: 25583238 PMCID: PMC4330344 DOI: 10.1093/nar/gku1290] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 12/16/2022]
Abstract
We define a new category of candidate tumor drivers in cancer genome evolution: ‘selected expression regulators’ (SERs)—genes driving dysregulated transcriptional programs in cancer evolution. The SERs are identified from genome-wide tumor expression data with a novel method, namely SPARROW (SPARse selected expRessiOn regulators identified With penalized regression). SPARROW uncovers a previously unknown connection between cancer expression variation and driver events, by using a novel sparse regression technique. Our results indicate that SPARROW is a powerful complementary approach to identify candidate genes containing driver events that are hard to detect from sequence data, due to a large number of passenger mutations and lack of comprehensive sequence information from a sufficiently large number of samples. SERs identified by SPARROW reveal known driver mutations in multiple human cancers, along with known cancer-associated processes and survival-associated genes, better than popular methods for inferring gene expression networks. We demonstrate that when applied to acute myeloid leukemia expression data, SPARROW identifies an apoptotic biomarker (PYCARD) for an investigational drug obatoclax. The PYCARD and obatoclax association is validated in 30 AML patient samples.
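SPARROW is described only as "a novel sparse regression technique", and the abstract does not give its penalty or model. As a generic illustration of how sparse penalized regression selects a few regulators from many candidates, the sketch below fits an ordinary lasso by cyclic coordinate descent; this is a swapped-in standard technique, not SPARROW itself, and `lasso_cd` is a hypothetical name:

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 200) -> np.ndarray:
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    Assumes the columns of X and the response y are already centered,
    so no intercept is fitted.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * x_j' x_j for each column
    r = y - X @ b                      # current residual
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual that
            # excludes column j's own current contribution.
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            # Update the residual incrementally instead of recomputing X @ b.
            r -= X[:, j] * (new - b[j])
            b[j] = new
    return b
```

Coefficients whose correlation with the residual stays below `lam` are set exactly to zero, which is what makes the selected set of candidate regulators sparse.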
Affiliation(s)
- Benjamin A Logsdon
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA; Sage Bionetworks, Seattle, WA, 98109, USA
- Andrew J Gentles
- Center for Cancer Systems Biology, Department of Radiology, Stanford University, CA, 94305, USA
- Chris P Miller
- Department of Medicine/Hematology, Center for Cancer Innovation, University of Washington, Seattle, WA, 98195, USA
- C Anthony Blau
- Department of Medicine/Hematology, Center for Cancer Innovation, University of Washington, Seattle, WA, 98195, USA
- Pamela S Becker
- Department of Medicine/Hematology, Center for Cancer Innovation, University of Washington, Seattle, WA, 98195, USA
- Su-In Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA; Department of Computer Science & Engineering, University of Washington, Seattle, WA, 98195, USA
12
Lin YC, Hsieh AR, Hsiao CL, Wu SJ, Wang HM, Lian IB, Fann CSJ. Identifying rare and common disease associated variants in genomic data using Parkinson's disease as a model. J Biomed Sci 2014; 21:88. [PMID: 25175702 PMCID: PMC4428531 DOI: 10.1186/s12929-014-0088-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/25/2014] [Accepted: 08/21/2014] [Indexed: 01/06/2023]
Abstract
BACKGROUND Genome-wide association studies have been successful in identifying common genetic variants for human diseases. However, much of the heritable variation associated with diseases such as Parkinson's disease remains unknown, suggesting that many more risk loci are yet to be identified. Rare variants have become important in disease association studies for explaining missing heritability. Methods for detecting this type of association require prior knowledge of candidate genes and the combining of variants within the region. These methods may suffer from power loss in situations with many neutral variants or causal variants with opposite effects. RESULTS We propose a method capable of scanning genetic variants to identify the region most likely harbouring a disease gene with rare and/or common causal variants. Our method assigns a score to each individual variant based on our scoring system and uses aggregate scores to identify the region with disease association. We evaluate performance by simulation based on 1000 Genomes sequencing data and compare with three commonly used methods (CMC, WSS, and SKAT-O). We use a Parkinson's disease case-control dataset as a model to demonstrate the application of our method. Under simulation based on 1000 Genomes sequencing data, our method has better power than CMC and WSS and similar power to SKAT-O, with well-controlled type I error. In real data analysis, we confirm the association of the α-synuclein gene (SNCA) with Parkinson's disease (p = 0.005). We further identify associations with hyaluronan synthase 2 (HAS2, p = 0.028) and kringle containing transmembrane protein 1 (KREMEN1, p = 0.006). KREMEN1 is associated with the Wnt signalling pathway, which has been shown to play an important role in neurodegeneration in Parkinson's disease. CONCLUSIONS Our method is time efficient and less sensitive to the inclusion of neutral variants and to the direction of effect of causal variants. It can narrow down a genomic region or a chromosome to a disease-associated region.
Using Parkinson's disease as a model, our method not only confirms association for a known gene but also identifies two genes previously found by other studies. In spite of many existing methods, we conclude that our method serves as an efficient alternative for exploring genomic data containing both rare and common variants.
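The abstract states that per-variant scores are aggregated to locate the most strongly associated region but does not describe the scoring system or how regions are delimited. A minimal sketch, assuming the scores are already computed (hypothetical inputs) and that candidate regions are fixed-width runs of adjacent variants scanned with a sliding window; `best_window` is an illustrative name, not the authors' implementation:

```python
from typing import List, Tuple


def best_window(scores: List[float], width: int) -> Tuple[int, float]:
    """Return (start_index, total) for the run of `width` consecutive
    variants whose summed score is largest (first such run on ties)."""
    if not 1 <= width <= len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    total = sum(scores[:width])
    best_start, best_total = 0, total
    for start in range(1, len(scores) - width + 1):
        # Slide the window one variant to the right in O(1):
        # add the entering score, drop the leaving one.
        total += scores[start + width - 1] - scores[start - 1]
        if total > best_total:
            best_start, best_total = start, total
    return best_start, best_total
```

For example, `best_window([0.1, 0.2, 3.0, 2.5, 0.1, 0.0, 1.0], 2)` selects the window starting at index 2 (scores 3.0 and 2.5). Because the whole scan is a single linear pass, it is consistent with the abstract's emphasis on a time-efficient search over a region or chromosome.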
Affiliation(s)
- Ying-Chao Lin
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.
- Ai-Ru Hsieh
- Graduate Institute of Biostatistics, China Medical University, Taichung, Taiwan.
- Ching-Lin Hsiao
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.
- Shang-Jung Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.
- Hui-Min Wang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.
- Ie-Bin Lian
- Graduate Institute of Statistics and Information Science, National Changhua University of Education, Changhua, Taiwan.
- Cathy S J Fann
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan; Institute of Public Health, National Yang-Ming University, Taipei, Taiwan.
13
Lee SH, Wray NR. Novel genetic analysis for case-control genome-wide association studies: quantification of power and genomic prediction accuracy. PLoS One 2013; 8:e71494. [PMID: 23977056 PMCID: PMC3747270 DOI: 10.1371/journal.pone.0071494] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 05/07/2013] [Accepted: 07/05/2013] [Indexed: 11/19/2022]
Abstract
Genome-wide association studies (GWAS) are routinely conducted for both quantitative and binary (disease) traits. We present two analytical tools for use in the experimental design of GWAS. Firstly, we present power calculations quantifying power in a unified framework for a range of scenarios. In this context we consider the utility of quantitative scores (e.g. endophenotypes) that may be available for cases only or for both cases and controls. Secondly, we consider the accuracy of prediction of genetic risk from genome-wide SNPs and derive an expression for genomic prediction accuracy using a liability threshold model for disease traits in a case-control design. The expected values based on our derived equations for both power and prediction accuracy agree well with observed estimates from simulations.
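As a point of reference for the kind of power calculation described above, here is a minimal sketch of the standard textbook approximation for a single-SNP, 1-df test on a quantitative trait, assuming the non-centrality parameter is n·q²/(1 − q²) with q² the fraction of trait variance explained by the SNP. This is not the authors' unified framework, which also covers case-control designs and the liability scale; `gwas_power` is a hypothetical name:

```python
from statistics import NormalDist


def gwas_power(n: int, q2: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df association test.

    n: sample size; q2: fraction of trait variance explained by the SNP;
    alpha: significance threshold (5e-8 is the usual genome-wide level).
    """
    z = NormalDist()
    ncp = n * q2 / (1.0 - q2)              # assumed non-centrality parameter
    crit = z.inv_cdf(1.0 - alpha / 2.0)    # two-sided critical z value
    shift = ncp ** 0.5
    # A 1-df noncentral chi-square is the square of N(sqrt(ncp), 1), so
    # power is the probability that this normal falls outside +/- crit.
    return z.cdf(shift - crit) + z.cdf(-shift - crit)
```

With `q2 = 0` the function returns `alpha`, as a power calculation under the null should, and power rises steeply with `n` for a fixed effect size.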
Affiliation(s)
- Sang Hong Lee
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
- Naomi R. Wray
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia
14
Hoffman GE, Logsdon BA, Mezey JG. PUMA: a unified framework for penalized multiple regression analysis of GWAS data. PLoS Comput Biol 2013; 9:e1003101. [PMID: 23825936 PMCID: PMC3694815 DOI: 10.1371/journal.pcbi.1003101] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 11/29/2012] [Accepted: 05/02/2013] [Indexed: 01/25/2023]
Abstract
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests, and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods, including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium.
Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one novel association implicating a gene involved in apoptosis pathways in rheumatoid arthritis. We provide software for applying our PUMA analysis framework.
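Among the penalties PUMA lists, the lasso and MCP have well-known closed forms, and comparing them shows why concave penalties can help with weak-but-real effects: MCP shrinks small coefficients like the lasso but stops penalizing large ones. A minimal sketch of the two per-coefficient penalty functions using their standard definitions (NEG, LOG, adaptive lasso, and PUMA's MM fitting algorithm are not reproduced here):

```python
def lasso_penalty(beta: float, lam: float) -> float:
    """Lasso (L1) penalty for one coefficient: lam * |beta|."""
    return lam * abs(beta)


def mcp_penalty(beta: float, lam: float, gamma: float) -> float:
    """Minimax concave penalty (MCP) for one coefficient, with gamma > 1.

    For |beta| <= gamma * lam it equals lam * |beta| - beta**2 / (2 * gamma),
    so it starts out like the lasso; beyond that it is constant at
    gamma * lam**2 / 2, so large effects receive no additional shrinkage.
    """
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2.0 * gamma)
    return gamma * lam * lam / 2.0
```

The two branches of `mcp_penalty` meet continuously at |beta| = gamma * lam, and the MCP value never exceeds the lasso value for the same `lam`.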
Affiliation(s)
- Gabriel E. Hoffman
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Benjamin A. Logsdon
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Jason G. Mezey
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Department of Genetic Medicine, Weill Cornell Medical College, New York, New York, United States of America