1
Zhang Q, Yang Z, Yang J. Dissecting the colocalized GWAS and eQTLs with mediation analysis for high-dimensional exposures and confounders. Biometrics 2024; 80:ujae050. [PMID: 38801257 DOI: 10.1093/biomtc/ujae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 03/14/2024] [Accepted: 05/14/2024] [Indexed: 05/29/2024]
Abstract
To leverage the advancements in genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping for traits and molecular phenotypes to gain mechanistic understanding of the genetic regulation, biological researchers often investigate the expression QTLs (eQTLs) that colocalize with QTL or GWAS peaks. Our research is inspired by 2 such studies. One aims to identify the causal single nucleotide polymorphisms that are responsible for the phenotypic variation and whose effects can be explained by their impacts at the transcriptomic level in maize. The other study in mouse focuses on uncovering the cis-driver genes that induce phenotypic changes by regulating trans-regulated genes. Both studies can be formulated as mediation problems with potentially high-dimensional exposures, confounders, and mediators that seek to estimate the overall indirect effect (IE) for each exposure. In this paper, we propose MedDiC, a novel procedure to estimate the overall IE based on difference-in-coefficients approach. Our simulation studies find that MedDiC offers valid inference for the IE with higher power, shorter confidence intervals, and faster computing time than competing methods. We apply MedDiC to the 2 aforementioned motivating datasets and find that MedDiC yields reproducible outputs across the analysis of closely related traits, with results supported by external biological evidence. The code and additional information are available on our GitHub page (https://github.com/QiZhangStat/MedDiC).
Affiliation(s)
- Qi Zhang
- Department of Mathematics and Statistics, University of New Hampshire, Durham, NH 03824, United States
- Zhikai Yang
- Complex Biosystems Program and Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, United States
- Jinliang Yang
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, United States
2
Bettencourt C, Skene N, Bandres-Ciga S, Anderson E, Winchester LM, Foote IF, Schwartzentruber J, Botia JA, Nalls M, Singleton A, Schilder BM, Humphrey J, Marzi SJ, Toomey CE, Kleifat AA, Harshfield EL, Garfield V, Sandor C, Keat S, Tamburin S, Frigerio CS, Lourida I, Ranson JM, Llewellyn DJ. Artificial intelligence for dementia genetics and omics. Alzheimers Dement 2023; 19:5905-5921. [PMID: 37606627 PMCID: PMC10841325 DOI: 10.1002/alz.13427] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/05/2023] [Revised: 07/14/2023] [Accepted: 07/18/2023] [Indexed: 08/23/2023]
Abstract
Genetics and omics studies of Alzheimer's disease and other dementia subtypes enhance our understanding of underlying mechanisms and pathways that can be targeted. We identified key remaining challenges: First, can we enhance genetic studies to address missing heritability? Can we identify reproducible omics signatures that differentiate between dementia subtypes? Can high-dimensional omics data identify improved biomarkers? How can genetics inform our understanding of causal status of dementia risk factors? And which biological processes are altered by dementia-related genetic variation? Artificial intelligence (AI) and machine learning approaches give us powerful new tools in helping us to tackle these challenges, and we review possible solutions and examples of best practice. However, their limitations also need to be considered, as well as the need for coordinated multidisciplinary research and diverse deeply phenotyped cohorts. Ultimately AI approaches improve our ability to interrogate genetics and omics data for precision dementia medicine. HIGHLIGHTS: We have identified five key challenges in dementia genetics and omics studies. AI can enable detection of undiscovered patterns in dementia genetics and omics data. Enhanced and more diverse genetics and omics datasets are still needed. Multidisciplinary collaborative efforts using AI can boost dementia research.
Affiliation(s)
- Conceicao Bettencourt
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
- Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
- Nathan Skene
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Sara Bandres-Ciga
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Emma Anderson
- Department of Mental Health of Older People, Division of Psychiatry, University College London, London, UK
- Isabelle F Foote
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, USA
- Jeremy Schwartzentruber
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
- Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, California, USA
- Juan A Botia
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Mike Nalls
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Data Tecnica International LLC, Washington, DC, USA
- Andrew Singleton
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
- Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Jack Humphrey
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Christina E Toomey
- Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
- Department of Clinical and Movement Neuroscience, UCL Queen Square Institute of Neurology, London, UK
- The Francis Crick Institute, London, UK
- Ahmad Al Kleifat
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Eric L Harshfield
- Stroke Research Group, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
- Victoria Garfield
- MRC Unit for Lifelong Health and Ageing, Institute of Cardiovascular Science, University College London, London, UK
- Cynthia Sandor
- UK Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
- Samuel Keat
- UK Dementia Research Institute, School of Medicine, Cardiff University, Cardiff, UK
- Stefano Tamburin
- Department of Neurosciences, Biomedicine and Movement Sciences, Neurology Section, University of Verona, Verona, Italy
- Carlo Sala Frigerio
- UK Dementia Research Institute, Queen Square Institute of Neurology, University College London, London, UK
- David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- The Alan Turing Institute, London, UK
3
Van Buren E, Radicioni G, Lester S, O’Neal WK, Dang H, Kasela S, Garudadri S, Curtis JL, Han MK, Krishnan JA, Wan ES, Silverman EK, Hastie A, Ortega VE, Lappalainen T, Nawijn MC, van den Berge M, Christenson SA, Li Y, Cho MH, Kesimer M, Kelada SNP. Genetic regulators of sputum mucin concentration and their associations with COPD phenotypes. PLoS Genet 2023; 19:e1010445. [PMID: 37352370 PMCID: PMC10325042 DOI: 10.1371/journal.pgen.1010445] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/27/2022] [Revised: 07/06/2023] [Accepted: 04/26/2023] [Indexed: 06/25/2023] [Open Access]
Abstract
Hyper-secretion and/or hyper-concentration of mucus is a defining feature of multiple obstructive lung diseases, including chronic obstructive pulmonary disease (COPD). Mucus itself is composed of a mixture of water, ions, salt and proteins, of which the gel-forming mucins, MUC5AC and MUC5B, are the most abundant. Recent studies have linked the concentrations of these proteins in sputum to COPD phenotypes, including chronic bronchitis (CB) and acute exacerbations (AE). We sought to determine whether common genetic variants influence sputum mucin concentrations and whether these variants are also associated with COPD phenotypes, specifically CB and AE. We performed a GWAS to identify quantitative trait loci for sputum mucin protein concentration (pQTL) in the Sub-Populations and InteRmediate Outcome Measures in COPD Study (SPIROMICS, n = 708 for total mucin, n = 215 for MUC5AC, MUC5B). Subsequently, we tested for associations of mucin pQTL with CB and AE using regression modeling (n = 822-1300). Replication analysis was conducted using data from COPDGene (n = 5740) and by examining results from the UK Biobank. We identified one genome-wide significant pQTL for MUC5AC (rs75401036) and two for MUC5B (rs140324259, rs10001928). The strongest association for MUC5B, with rs140324259 on chromosome 11, explained 14% of variation in sputum MUC5B. Despite being associated with lower MUC5B, the C allele of rs140324259 conferred increased risk of CB (odds ratio (OR) = 1.42; 95% confidence interval (CI): 1.10-1.80) as well as AE ascertained over three years of follow up (OR = 1.41; 95% CI: 1.02-1.94). Associations between rs140324259 and CB or AE did not replicate in COPDGene. However, in the UK Biobank, rs140324259 was associated with phenotypes that define CB, namely chronic mucus production and cough, again with the C allele conferring increased risk. 
We conclude that sputum MUC5AC and MUC5B concentrations are associated with common genetic variants, and the top locus for MUC5B may influence COPD phenotypes, in particular CB.
Affiliation(s)
- Eric Van Buren
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Giorgia Radicioni
- Marsico Lung Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Sarah Lester
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Wanda K. O’Neal
- Marsico Lung Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Hong Dang
- Marsico Lung Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Silva Kasela
- New York Genome Center, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Suresh Garudadri
- Division of Pulmonary, Critical Care, Allergy, & Sleep Medicine, Department of Medicine, University of California San Francisco, San Francisco, California, United States of America
- Jeffrey L. Curtis
- Pulmonary & Critical Care Medicine Division, University of Michigan, Ann Arbor, Michigan, United States of America
- Medical Service, VA Ann Arbor Healthcare System, Ann Arbor, Michigan, United States of America
- MeiLan K. Han
- Pulmonary & Critical Care Medicine Division, University of Michigan, Ann Arbor, Michigan, United States of America
- Jerry A. Krishnan
- Breathe Chicago Center, University of Illinois, Chicago, Illinois, United States of America
- Emily S. Wan
- Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- VA Boston Healthcare System, Jamaica Plain, Massachusetts, United States of America
- Edwin K. Silverman
- Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Annette Hastie
- Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Victor E. Ortega
- Department of Internal Medicine, Division of Respiratory Medicine, Mayo Clinic, Scottsdale, Arizona, United States of America
- Tuuli Lappalainen
- New York Genome Center, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Martijn C. Nawijn
- Department of Pathology and Medical Biology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, Groningen, the Netherlands
- Maarten van den Berge
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, Groningen, the Netherlands
- Department of Pulmonary Diseases, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Stephanie A. Christenson
- Division of Pulmonary, Critical Care, Allergy, & Sleep Medicine, Department of Medicine, University of California San Francisco, San Francisco, California, United States of America
- Yun Li
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Mehmet Kesimer
- Marsico Lung Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Samir N. P. Kelada
- Marsico Lung Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
4
Zhao Q, Han B, Xu Q, Wang T, Fang C, Li R, Zhang L, Pei Y. Proteome and genome integration analysis of obesity. Chin Med J (Engl) 2023; 136:910-921. [PMID: 37000968 PMCID: PMC10278747 DOI: 10.1097/cm9.0000000000002644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Indexed: 04/03/2023] [Open Access]
Abstract
The prevalence of obesity has increased worldwide in recent decades. Genetic factors are now known to play a substantial role in the predisposition to obesity and may contribute up to 70% of the risk for obesity. Technological advancements during the last decades have allowed the identification of many hundreds of genetic markers associated with obesity. However, the transformation of current genetic variant-obesity associations into biological knowledge has been proven challenging. Genomics and proteomics are complementary fields, as proteomics extends functional analyses. Integrating genomic and proteomic data can help to bridge a gap in knowledge regarding genetic variant-obesity associations and to identify new drug targets for the treatment of obesity. We provide an overview of the published papers on the integrated analysis of proteomic and genomic data in obesity and summarize four mainstream strategies: overlap, colocalization, Mendelian randomization, and proteome-wide association studies. The integrated analyses identified many obesity-associated proteins, such as leptin, follistatin, and adenylate cyclase 3. Despite great progress, integrative studies focusing on obesity are still limited. There is an increased demand for large prospective cohort studies to identify and validate findings, and further apply these findings to the prevention, intervention, and treatment of obesity. In addition, we also discuss several other potential integration methods.
Affiliation(s)
- Qigang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215123, China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215213, China
- Baixue Han
- Department of Epidemiology and Biostatistics, School of Public Health, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215123, China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215213, China
- Qian Xu
- Department of Epidemiology and Biostatistics, School of Public Health, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215123, China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215213, China
- Tao Wang
- Department of Endocrinology, The Second Affiliated Hospital, Soochow University, Suzhou, Jiangsu 215004, China
- Chen Fang
- Department of Endocrinology, The Second Affiliated Hospital, Soochow University, Suzhou, Jiangsu 215004, China
- Rui Li
- Department of Gastroenterology, The First Affiliated Hospital, Soochow University, Suzhou, Jiangsu 215006, China
- Lei Zhang
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215213, China
- Center for Genetic Epidemiology and Genomics, School of Public Health, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215123, China
- Yufang Pei
- Department of Epidemiology and Biostatistics, School of Public Health, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215123, China
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Medical College of Soochow University, Suzhou, Jiangsu 215213, China
5
Zhong W, Liu W, Chen J, Sun Q, Hu M, Li Y. Understanding the function of regulatory DNA interactions in the interpretation of non-coding GWAS variants. Front Cell Dev Biol 2022; 10:957292. [PMID: 36060805 PMCID: PMC9437546 DOI: 10.3389/fcell.2022.957292] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/30/2022] [Accepted: 07/21/2022] [Indexed: 01/11/2023] [Open Access]
Abstract
Genome-wide association studies (GWAS) have identified a vast number of variants associated with various complex human diseases and traits. However, most of these GWAS variants reside in non-coding regions producing no proteins, making the interpretation of these variants a daunting challenge. Prior evidence indicates that a subset of non-coding variants detected within or near cis-regulatory elements (e.g., promoters, enhancers, silencers, and insulators) might play a key role in disease etiology by regulating gene expression. Advanced sequencing- and imaging-based technologies, together with powerful computational methods, enabling comprehensive characterization of regulatory DNA interactions, have substantially improved our understanding of the three-dimensional (3D) genome architecture. Recent literature witnesses plenty of examples where using chromosome conformation capture (3C)-based technologies successfully links non-coding variants to their target genes and prioritizes relevant tissues or cell types. These examples illustrate the critical capability of 3D genome organization in annotating non-coding GWAS variants. This review discusses how 3D genome organization information contributes to elucidating the potential roles of non-coding GWAS variants in disease etiology.
Affiliation(s)
- Wujuan Zhong
- Biostatistics and Research Decision Sciences, Merck & Co, Inc, Rahway, NJ, United States
- Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, United States
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
6
Crouse WL, Keele GR, Gastonguay MS, Churchill GA, Valdar W. A Bayesian model selection approach to mediation analysis. PLoS Genet 2022; 18:e1010184. [PMID: 35533209 PMCID: PMC9129027 DOI: 10.1371/journal.pgen.1010184] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/03/2021] [Revised: 05/24/2022] [Accepted: 04/04/2022] [Indexed: 01/09/2023] [Open Access]
Abstract
Genetic studies often seek to establish a causal chain of events originating from genetic variation through to molecular and clinical phenotypes. When multiple phenotypes share a common genetic association, one phenotype may act as an intermediate for the genetic effects on the other. Alternatively, the phenotypes may be causally unrelated but share genetic loci. Mediation analysis represents a class of causal inference approaches used to determine which of these scenarios is most plausible. We have developed a general approach to mediation analysis based on Bayesian model selection and have implemented it in an R package, bmediatR. Bayesian model selection provides a flexible framework that can be tailored to different analyses. Our approach can incorporate prior information about the likelihood of models and the strength of causal effects. It can also accommodate multiple genetic variants or multi-state haplotypes. Our approach reports posterior probabilities that can be useful in interpreting uncertainty among competing models. We compared bmediatR with other popular methods, including the Sobel test, Mendelian randomization, and Bayesian network analysis using simulated data. We found that bmediatR performed as well or better than these alternatives in most scenarios. We applied bmediatR to proteome data from Diversity Outbred (DO) mice, a multi-parent population, and demonstrate the power of mediation with multi-state haplotypes. We also applied bmediatR to data from human cell lines to identify transcripts that are mediated through or are expressed independently from local chromatin accessibility. We demonstrate that Bayesian model selection provides a powerful and versatile approach to identify causal relationships in genetic studies using model organism or human data.
Affiliation(s)
- Wesley L. Crouse
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Gregory R. Keele
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- William Valdar
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
7
Neumann A, Küçükali F, Bos I, Vos SJB, Engelborghs S, De Pooter T, Joris G, De Rijk P, De Roeck E, Tsolaki M, Verhey F, Martinez-Lage P, Tainta M, Frisoni G, Blin O, Richardson J, Bordet R, Scheltens P, Popp J, Peyratout G, Johannsen P, Frölich L, Vandenberghe R, Freund-Levi Y, Streffer J, Lovestone S, Legido-Quigley C, Ten Kate M, Barkhof F, Strazisar M, Zetterberg H, Bertram L, Visser PJ, van Broeckhoven C, Sleegers K. Rare variants in IFFO1, DTNB, NLRC3 and SLC22A10 associate with Alzheimer's disease CSF profile of neuronal injury and inflammation. Mol Psychiatry 2022; 27:1990-1999. [PMID: 35173266 PMCID: PMC9126805 DOI: 10.1038/s41380-022-01437-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/12/2021] [Revised: 11/04/2021] [Accepted: 01/05/2022] [Indexed: 11/30/2022]
Abstract
Alzheimer's disease (AD) biomarkers represent several neurodegenerative processes, such as synaptic dysfunction, neuronal inflammation and injury, as well as amyloid pathology. We performed an exome-wide rare variant analysis of six AD biomarkers (β-amyloid, total/phosphorylated tau, NfL, YKL-40, and Neurogranin) to discover genes associated with these markers. Genetic and biomarker information was available for 480 participants from two studies: EMIF-AD and ADNI. We applied a principal component (PC) analysis to derive biomarker combinations, which represent statistically independent biological processes. We then tested whether rare variants in 9576 protein-coding genes associate with these PCs using a Meta-SKAT test. We also tested whether the PCs are intermediary to gene effects on AD symptoms with a SMUT test. One PC loaded on NfL and YKL-40, indicators of neuronal injury and inflammation. Four genes were associated with this PC: IFFO1, DTNB, NLRC3, and SLC22A10. Mediation tests suggest that these genes also affect dementia symptoms via inflammation/injury. We also observed an association between a PC loading on Neurogranin, a marker for synaptic functioning, and GABBR2 and CASZ1, but no mediation effects. The results suggest that rare variants in IFFO1, DTNB, NLRC3, and SLC22A10 heighten susceptibility to neuronal injury and inflammation, potentially by altering cytoskeleton structure and immune activity disinhibition, resulting in an elevated dementia risk. GABBR2 and CASZ1 were associated with synaptic functioning, but mediation analyses suggest that the effect of these two genes on synaptic functioning is not consequential for AD development.
Affiliation(s)
- Alexander Neumann
- Complex Genetics of Alzheimer's Disease Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Fahri Küçükali
- Complex Genetics of Alzheimer's Disease Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Isabelle Bos
- Netherlands Institute for Health Services Research, Utrecht, the Netherlands
- Stephanie J B Vos
- Alzheimer Centrum Limburg, Maastricht University, Maastricht, the Netherlands
- Sebastiaan Engelborghs
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Department of Neurology and Memory Clinic, Universitair Ziekenhuis Brussel (UZ Brussel) and Center for Neurosciences (C4N), Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Tim De Pooter
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Geert Joris
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Peter De Rijk
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Ellen De Roeck
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Department of Neurology and Memory Clinic, Hospital Network Antwerp (ZNA) Middelheim and Hoge Beuken, Antwerp, Belgium
- Magda Tsolaki
- 1st Department of Neurology, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Makedonia, Thessaloniki, Greece
- Frans Verhey
- Alzheimer Centrum Limburg, Maastricht University, Maastricht, the Netherlands
- Department of Psychiatry and Neuropsychology, Maastricht University, Maastricht, the Netherlands
- School for Mental Health and Neuroscience, Maastricht University, Maastricht, the Netherlands
- Pablo Martinez-Lage
- Center for Research and Advanced Therapies, CITA-Alzheimer Foundation, San Sebastian, Spain
- Mikel Tainta
- Center for Research and Advanced Therapies, CITA-Alzheimer Foundation, San Sebastian, Spain
- Giovanni Frisoni
- Department of Psychiatry, Faculty of Medicine, Geneva University Hospitals, Geneva, Switzerland
- IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Oliver Blin
- Clinical Pharmacology & Pharmacovigilance Department, Marseille University Hospital, Marseille, France
- Jill Richardson
- Neurosciences Therapeutic Area, GlaxoSmithKline R&D, Stevenage, UK
- Régis Bordet
- Neuroscience & Cognition, CHU de Lille, University of Lille, Inserm, France
- Philip Scheltens
- Alzheimer Center and Department of Neurology, VU University Medical Center, Amsterdam, the Netherlands
- Julius Popp
- Department of Geriatric Psychiatry, University Hospital of Psychiatry Zürich, Zürich, Switzerland
- Old Age Psychiatry, Department of Psychiatry, University Hospital of Lausanne, Lausanne, Switzerland
- Gwendoline Peyratout
- Department of Psychiatry, University Hospital of Lausanne, Lausanne, Switzerland
- Peter Johannsen
- Clinical Drug Development, Novo Nordisk, Copenhagen, Denmark
- Lutz Frölich
- Department of Geriatric Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Rik Vandenberghe
- Laboratory for Cognitive Neurology, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Yvonne Freund-Levi
- Center for Alzheimer Research, Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden
- School of Medical Sciences, Örebro University, Örebro, Sweden
- Johannes Streffer
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Simon Lovestone
- Department of Psychiatry, University of Oxford, Oxford, UK
- Janssen Medical Ltd, High Wycombe, UK
- Cristina Legido-Quigley
- Steno Diabetes Center, Copenhagen, Denmark
- Institute of Pharmaceutical Sciences, King's College London, London, UK
| | - Mara Ten Kate
- Alzheimer Center and Department of Neurology, VU University Medical Center, Amsterdam, the Netherlands
- Department of Radiology and Nuclear Medicine, VU University Medical Center, Amsterdam, the Netherlands
| | - Frederik Barkhof
- Department of Radiology and Nuclear Medicine, VU University Medical Center, Amsterdam, the Netherlands
- Institutes of Neurology and Healthcare Engineering, University College London, London, UK
| | - Mojca Strazisar
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
| | - Henrik Zetterberg
- Department of Psychiatry and Neurochemistry, University of Gothenburg, Gothenburg, Sweden
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, UK
- Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden
- UK Dementia Research Institute, University College London, London, UK
- Hong Kong Center for Neurodegenerative Diseases, Hong Kong, China
| | - Lars Bertram
- Lübeck Interdisciplinary Platform for Genome Analytics, University of Lübeck, Lübeck, Germany
- Centre for Lifespan Changes in Brain and Cognition, University of Oslo, Oslo, Norway
| | - Pieter Jelle Visser
- Alzheimer Centrum Limburg, Maastricht University, Maastricht, the Netherlands
- Alzheimer Center and Department of Neurology, VU University Medical Center, Amsterdam, the Netherlands
| | - Christine van Broeckhoven
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Neurodegenerative Brain Diseases Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
| | - Kristel Sleegers
- Complex Genetics of Alzheimer's Disease Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
|
8
|
Liyanage JSS, Estepp JH, Srivastava K, Li Y, Mori M, Kang G. GMEPS: a fast and efficient likelihood approach for genome-wide mediation analysis under extreme phenotype sequencing. Stat Appl Genet Mol Biol 2022; 21:sagmb-2021-0071. [PMID: 35266368 DOI: 10.1515/sagmb-2021-0071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Accepted: 02/17/2022] [Indexed: 11/15/2022]
Abstract
Owing to advantages such as cost savings and higher statistical power for detecting associations of genetic variants with human disorders, extreme phenotype sequencing (EPS) is a rapidly emerging study design in epidemiological and clinical studies investigating how genetic variations associate with complex phenotypes. However, the investigation of the mediation effect of genetic variants on phenotypes is severely restricted under the EPS design because existing methods cannot adequately accommodate the non-random extreme-tail sampling process incurred by the EPS design. In this paper, we propose a likelihood approach for testing the mediation effect of genetic variants through continuous and binary mediators on a continuous phenotype under the EPS design (GMEPS). Besides its use under the EPS design, it can also be utilized as a general mediation analysis procedure. Extensive simulations and two real data applications, a genome-wide association study of benign ethnic neutropenia under the EPS design and a candidate-gene study of neurocognitive performance in patients with sickle cell disease under a random sampling design, demonstrate the superiority of GMEPS under the EPS design over widely used mediation analysis procedures, while demonstrating comparable capabilities under the general random sampling framework.
Affiliation(s)
- Janaka S S Liyanage
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
| | - Jeremie H Estepp
- Departments of Global Pediatric Medicine and Hematology, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
| | - Kumar Srivastava
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
| | - Yun Li
- Department of Biostatistics, Department of Genetics, Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill 27599, NC, USA
| | - Motomi Mori
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
| | - Guolian Kang
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
|
9
|
Mary-Huard T, Das S, Mukhopadhyay I, Robin S. Querying multiple sets of P-values through composed hypothesis testing. Bioinformatics 2021; 38:141-148. [PMID: 34478490 DOI: 10.1093/bioinformatics/btab592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2020] [Revised: 07/16/2021] [Accepted: 07/27/2021] [Indexed: 02/05/2023]
Abstract
MOTIVATION: Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis often comes as a set of P-values resulting from previous analyses, that need to be combined flexibly to explore complex hypotheses, while guaranteeing a low proportion of false discoveries.
RESULTS: We introduce the generic concept of composed hypothesis, which corresponds to an arbitrary complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task and show that finding items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be efficiently performed and provide a thorough classification rule to control for type I error. The performance and the usefulness of the approach are illustrated in simulations and on two different applications. The method is scalable, does not require any parameter tuning, and provided valuable biological insight on the considered application cases.
AVAILABILITY AND IMPLEMENTATION: The QCH methodology is available in the qch package hosted on CRAN. Additionally, R codes to reproduce the Einkorn example are available on the personal webpage of the first author: https://www6.inrae.fr/mia-paris/Equipes/Membres/Tristan-Mary-Huard.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tristan Mary-Huard
- Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France
- Génétique Quantitative et Evolution (GQE)-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France
| | - Sarmistha Das
- Human Genetics Unit, Indian Statistical Institute, Kolkata 700108, India
| | | | - Stéphane Robin
- Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France
- Centre d'Écologie et des Sciences de la Conservation (CESCO), MNHN, CNRS, Sorbonne Université, Paris 75005, France
|
10
|
Wang T, Lu H, Zeng P. Identifying pleiotropic genes for complex phenotypes with summary statistics from a perspective of composite null hypothesis testing. Brief Bioinform 2021; 23:6375058. [PMID: 34571531 DOI: 10.1093/bib/bbab389] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/05/2021] [Revised: 08/06/2021] [Accepted: 08/28/2021] [Indexed: 12/13/2022]
Abstract
Pleiotropy has important implications for genetic connections among complex phenotypes and facilitates our understanding of disease etiology. Genome-wide association studies provide an unprecedented opportunity to detect pleiotropic associations; however, efficient pleiotropy test methods are still lacking. We here consider pleiotropy identification from a methodological perspective of high-dimensional composite null hypothesis testing and propose a powerful gene-based method called MAIUP. MAIUP is constructed based on the traditional intersection-union test with two sets of independent P-values as input and follows a novel idea that was originally proposed under the high-dimensional mediation analysis framework. The key improvement of MAIUP is that it takes the composite null nature of the pleiotropy test into account by fitting a three-component mixture null distribution, which can ultimately generate well-calibrated P-values for effective control of the family-wise error rate and false discovery rate. Another attractive advantage of MAIUP is its ability to effectively address the issue of overlapping subjects commonly encountered in association studies. Simulation studies demonstrate that, compared with other methods, only MAIUP can maintain correct type I error control and has higher power across a wide range of scenarios. We apply MAIUP to detect shared associated genes among 14 psychiatric disorders with summary statistics and discover many new pleiotropic genes that would otherwise not be identified when failing to account for the issue of composite null hypothesis testing. Functional and enrichment analyses offer additional evidence supporting the validity of these identified pleiotropic genes associated with psychiatric disorders. Overall, MAIUP represents an efficient method for pleiotropy identification.
Affiliation(s)
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
|
11
|
Shao Z, Wang T, Zhang M, Jiang Z, Huang S, Zeng P. IUSMMT: Survival mediation analysis of gene expression with multiple DNA methylation exposures and its application to cancers of TCGA. PLoS Comput Biol 2021; 17:e1009250. [PMID: 34464378 PMCID: PMC8437300 DOI: 10.1371/journal.pcbi.1009250] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/13/2021] [Revised: 09/13/2021] [Accepted: 07/06/2021] [Indexed: 02/07/2023]
Abstract
Effective and powerful survival mediation models are currently lacking. To partly fill this knowledge gap, we focus on mediation analysis that includes multiple DNA methylations acting as exposures, one gene expression as the mediator and one survival time as the outcome. We propose IUSMMT (intersection-union survival mixture-adjusted mediation test) to effectively examine the existence of a mediation effect by fitting an empirical three-component mixture null distribution. With extensive simulation studies, we demonstrate the advantage of IUSMMT over existing methods. We applied IUSMMT to ten TCGA cancers and identified multiple genes that exhibited mediating effects. We further revealed that most of the identified regions, in which genes behaved as active mediators, were cancer type-specific and exhibited full mediation from DNA methylation CpG sites to the survival risk of various types of cancers. Overall, IUSMMT represents an effective and powerful alternative for survival mediation analysis; our results also provide new insights into the functional role of DNA methylation and gene expression in cancer progression/prognosis and suggest potential therapeutic targets for future clinical practice.
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Meng Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
|
12
|
Zhu A, Matoba N, Wilson EP, Tapia AL, Li Y, Ibrahim JG, Stein JL, Love MI. MRLocus: Identifying causal genes mediating a trait through Bayesian estimation of allelic heterogeneity. PLoS Genet 2021; 17:e1009455. [PMID: 33872308 PMCID: PMC8084342 DOI: 10.1371/journal.pgen.1009455] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 08/19/2020] [Revised: 04/29/2021] [Accepted: 02/26/2021] [Indexed: 11/18/2022]
Abstract
Expression quantitative trait loci (eQTL) studies are used to understand the regulatory function of non-coding genome-wide association study (GWAS) risk loci, but colocalization alone does not demonstrate a causal relationship of gene expression affecting a trait. Evidence for mediation, that perturbation of gene expression in a given tissue or developmental context will induce a change in the downstream GWAS trait, can be provided by two-sample Mendelian Randomization (MR). Here, we introduce a new statistical method, MRLocus, for Bayesian estimation of the gene-to-trait effect from eQTL and GWAS summary data for loci with evidence of allelic heterogeneity, that is, containing multiple causal variants. MRLocus makes use of a colocalization step applied to each nearly-LD-independent eQTL, followed by an MR analysis step across eQTLs. Additionally, our method involves estimation of the extent of allelic heterogeneity through a dispersion parameter, indicating variable mediation effects from each individual eQTL on the downstream trait. Our method is evaluated against other state-of-the-art methods for estimation of the gene-to-trait mediation effect, using an existing simulation framework. In simulation, MRLocus often has the highest accuracy among competing methods, and in each case provides more accurate estimation of uncertainty as assessed through interval coverage. MRLocus is then applied to five candidate causal genes for mediation of particular GWAS traits, where gene-to-trait effects are concordant with those previously reported. We find that MRLocus's estimation of the causal effect across eQTLs within a locus provides useful information for determining how perturbation of gene expression or individual regulatory elements will affect downstream traits. The MRLocus method is implemented as an R package available at https://mikelove.github.io/mrlocus.
Affiliation(s)
- Anqi Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Emma P. Wilson
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Amanda L. Tapia
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Joseph G. Ibrahim
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Jason L. Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
|
13
|
Zhong W, Darville T, Zheng X, Fine J, Li Y. Generalized multi-SNP mediation intersection-union test. Biometrics 2020; 78:364-375. [PMID: 33316078 DOI: 10.1111/biom.13418] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/24/2020] [Revised: 11/23/2020] [Accepted: 12/04/2020] [Indexed: 12/15/2022]
Abstract
To elucidate the molecular mechanisms underlying genetic variants identified from genome-wide association studies (GWAS) for a variety of phenotypic traits encompassing binary, continuous, count, and survival outcomes, we propose a novel and flexible method to test for mediation that can simultaneously accommodate multiple genetic variants and different types of outcome variables. Specifically, we employ the intersection-union test approach combined with the likelihood ratio test to detect the mediation effect of multiple genetic variants via some mediator (e.g., the expression of a neighboring gene) on the outcome. We fit high-dimensional generalized linear mixed models under the mediation framework, separately under the null and alternative hypotheses. We leverage the Laplace approximation to compute the marginal likelihood of the outcome and use a coordinate descent algorithm to estimate the corresponding parameters. Our extensive simulations demonstrate the validity of our proposed methods and substantial power gains, up to 97%, over alternative methods. Applications to real data for the study of Chlamydia trachomatis infection further showcase the advantages of our methods. We believe our proposed methods will be of value and general interest in this post-GWAS era to disentangle the potential causal mechanism from DNA to phenotype for new drug discovery and personalized medicine.
Affiliation(s)
- Wujuan Zhong
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Toni Darville
- Department of Pediatrics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Xiaojing Zheng
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Pediatrics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Jason Fine
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Yun Li
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
|
14
|
Howey R, Shin SY, Relton C, Davey Smith G, Cordell HJ. Bayesian network analysis incorporating genetic anchors complements conventional Mendelian randomization approaches for exploratory analysis of causal relationships in complex data. PLoS Genet 2020; 16:e1008198. [PMID: 32119656 PMCID: PMC7067488 DOI: 10.1371/journal.pgen.1008198] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 05/12/2019] [Revised: 03/12/2020] [Accepted: 01/21/2020] [Indexed: 12/26/2022]
Abstract
Mendelian randomization (MR) implemented through instrumental variables analysis is an increasingly popular causal inference tool used in genetic epidemiology. But it can have limitations for evaluating simultaneous causal relationships in complex data sets that include, for example, multiple genetic predictors and multiple potential risk factors associated with the same genetic variant. Here we use real and simulated data to investigate Bayesian network analysis (BN) with the incorporation of directed arcs, representing genetic anchors, as an alternative approach. A Bayesian network describes the conditional dependencies/independencies of variables using a graphical model (a directed acyclic graph) with an accompanying joint probability. In real data, we found BN could be used to infer simultaneous causal relationships that confirmed the individual causal relationships suggested by bi-directional MR, while allowing for the existence of potential horizontal pleiotropy (that would violate MR assumptions). In simulated data, BN with two directional anchors (mimicking genetic instruments) had greater power for a fixed type 1 error than bi-directional MR, while BN with a single directional anchor performed better than or as well as bi-directional MR. Both BN and MR could be adversely affected by violations of their underlying assumptions (such as genetic confounding due to unmeasured horizontal pleiotropy). BN with no directional anchor generated inference that was no better than by chance, emphasizing the importance of directional anchors in BN (as in MR). Under highly pleiotropic simulated scenarios, BN outperformed both MR (and its recent extensions) and two recently-proposed alternative approaches: a multi-SNP mediation intersection-union test (SMUT) and a latent causal variable (LCV) test. 
We conclude that BN incorporating genetic anchors is a useful complementary method to conventional MR for exploring causal relationships in complex data sets such as those generated from modern "omics" technologies.
Affiliation(s)
- Richard Howey
- Institute of Genetic Medicine, Newcastle University, Newcastle, United Kingdom
| | - So-Youn Shin
- Institute of Genetic Medicine, Newcastle University, Newcastle, United Kingdom
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
| | - Caroline Relton
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
- Population Health Sciences, University of Bristol, Bristol, United Kingdom
| | - George Davey Smith
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
- Population Health Sciences, University of Bristol, Bristol, United Kingdom
| | - Heather J. Cordell
- Institute of Genetic Medicine, Newcastle University, Newcastle, United Kingdom
|
15
|
Zhong W, Dong L, Poston TB, Darville T, Spracklen CN, Wu D, Mohlke KL, Li Y, Li Q, Zheng X. Inferring Regulatory Networks From Mixed Observational Data Using Directed Acyclic Graphs. Front Genet 2020; 11:8. [PMID: 32127796 PMCID: PMC7038820 DOI: 10.3389/fgene.2020.00008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/11/2019] [Accepted: 01/06/2020] [Indexed: 02/02/2023]
Abstract
Construction of regulatory networks using cross-sectional expression profiling of genes is desired, but challenging. The Directed Acyclic Graph (DAG) provides a general framework to infer causal effects from observational data. However, most existing DAG methods assume that all nodes follow the same type of distribution, which prohibits joint modeling of continuous gene expression and categorical variables. We present a new mixed DAG (mDAG) algorithm to infer the regulatory pathway from mixed observational data containing both continuous variables (e.g. expression of genes) and categorical variables (e.g. categorical phenotypes or single nucleotide polymorphisms). Our method can identify upstream causal factors and downstream effectors closely linked to a variable and generate hypotheses for the causal direction of regulatory pathways. We propose a new permutation method to test the conditional independence of variables of mixed types, which is the key for mDAG. We also utilize an L1 regularization in mDAG to ensure it can recover a large sparse DAG with a limited sample size. We demonstrate through extensive simulations that mDAG outperforms two well-known methods in recovering the true underlying DAG. We apply mDAG to a cross-sectional immunological study of Chlamydia trachomatis infection and successfully infer the regulatory network of cytokines. We also apply mDAG to a large cohort study, generating sensible mechanistic hypotheses underlying plasma adiponectin level. The R package mDAG is publicly available from CRAN at https://CRAN.R-project.org/package=mDAG.
Affiliation(s)
- Wujuan Zhong
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Li Dong
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Taylor B Poston
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Toni Darville
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Cassandra N Spracklen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Di Wu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Oral and Craniofacial Health Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Karen L Mohlke
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Quefeng Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Xiaojing Zheng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
|