1. McCaw ZR, Gao J, Lin X, Gronsbell J. Synthetic surrogates improve power for genome-wide association studies of partially missing phenotypes in population biobanks. Nat Genet 2024:10.1038/s41588-024-01793-9. [PMID: 38872030 DOI: 10.1038/s41588-024-01793-9] [Received: 01/14/2023] [Accepted: 05/08/2024] [Indexed: 06/15/2024]
Abstract
Within population biobanks, incomplete measurement of certain traits limits the power for genetic discovery. Machine learning is increasingly used to impute the missing values from the available data. However, performing genome-wide association studies (GWAS) on imputed traits can introduce spurious associations, identifying genetic variants that are not associated with the original trait. Here we introduce a new method, synthetic surrogate (SynSurr) analysis, which makes GWAS on imputed phenotypes robust to imputation errors. Rather than replacing missing values, SynSurr jointly analyzes the original and imputed traits. We show that SynSurr estimates the same genetic effect as standard GWAS and improves power in proportion to the quality of the imputations. SynSurr requires a commonly made missing-at-random assumption but relaxes the requirements of existing imputation methods by not requiring correct model specification. We present extensive simulations and ablation analyses to validate SynSurr and apply it to empower the GWAS of dual-energy X-ray absorptiometry traits within the UK Biobank.
Affiliation(s)
- Zachary R McCaw
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
  - Department of Statistics, Harvard University, Cambridge, MA, USA.
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
  - Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada.
2. Pazokitoroudi A, Liu Z, Dahl A, Zaitlen N, Rosset S, Sankararaman S. A scalable and robust variance components method reveals insights into the architecture of gene-environment interactions underlying complex traits. Am J Hum Genet 2024:S0002-9297(24)00178-2. [PMID: 38866020 DOI: 10.1016/j.ajhg.2024.05.015] [Received: 09/20/2023] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/14/2024]
Abstract
Understanding the contribution of gene-environment interactions (GxE) to complex trait variation can provide insights into disease mechanisms, explain sources of heritability, and improve genetic risk prediction. While large biobanks with genetic and deep phenotypic data hold promise for obtaining novel insights into GxE, our understanding of GxE architecture in complex traits remains limited. We introduce a method to estimate the proportion of trait variance explained by GxE (GxE heritability) and additive genetic effects (additive heritability) across the genome and within specific genomic annotations. We show that our method is accurate in simulations and computationally efficient for biobank-scale datasets. We applied our method to common array SNPs (MAF ≥1%), fifty quantitative traits, and four environmental variables (smoking, sex, age, and statin usage) in unrelated white British individuals in the UK Biobank. We found 68 trait-E pairs with significant genome-wide GxE heritability (p<0.05/200) with a ratio of GxE to additive heritability of ≈6.8% on average. Analyzing ≈8 million imputed SNPs (MAF ≥0.1%), we documented an approximate 28% increase in genome-wide GxE heritability compared to array SNPs. We partitioned GxE heritability across minor allele frequency (MAF) and local linkage disequilibrium (LD) values, revealing that, like additive allelic effects, GxE allelic effects tend to increase with decreasing MAF and LD. Analyzing GxE heritability near genes highly expressed in specific tissues, we find significant brain-specific enrichment for body mass index (BMI) and basal metabolic rate in the context of smoking and adipose-specific enrichment for waist-hip ratio (WHR) in the context of sex.
Affiliation(s)
- Ali Pazokitoroudi
  - Department of Computer Science, UCLA, Los Angeles, CA, USA.
  - Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA.
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Zhengtong Liu
  - Department of Computer Science, UCLA, Los Angeles, CA, USA.
- Andrew Dahl
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA.
- Noah Zaitlen
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
  - Department of Neurology, UCLA, Los Angeles, CA, USA.
- Saharon Rosset
  - Department of Statistics, Tel-Aviv University, Tel-Aviv, Israel.
- Sriram Sankararaman
  - Department of Computer Science, UCLA, Los Angeles, CA, USA.
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
3. Kunkel D, Sørensen P, Shankar V, Morgante F. Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes. bioRxiv 2024:2024.05.06.592745. [PMID: 38766136 PMCID: PMC11100663 DOI: 10.1101/2024.05.06.592745] [Indexed: 05/22/2024]
Abstract
Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, Morgante et al. introduced mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goal of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data has smaller sample size.
Affiliation(s)
- Deborah Kunkel
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, United States of America.
- Peter Sørensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark.
- Vijay Shankar
  - Center for Human Genetics, Clemson University, Greenwood, SC, United States of America.
- Fabio Morgante
  - Center for Human Genetics, Clemson University, Greenwood, SC, United States of America.
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States of America.
4. Tian J, Bai X, Quek C. Single-Cell Informatics for Tumor Microenvironment and Immunotherapy. Int J Mol Sci 2024; 25:4485. [PMID: 38674070 PMCID: PMC11050520 DOI: 10.3390/ijms25084485] [Received: 03/08/2024] [Revised: 04/12/2024] [Accepted: 04/16/2024] [Indexed: 04/28/2024]
Abstract
Cancer comprises malignant cells surrounded by the tumor microenvironment (TME), a dynamic ecosystem composed of heterogeneous cell populations that exert unique influences on tumor development. The immune community within the TME plays a substantial role in tumorigenesis and tumor evolution. The innate and adaptive immune cells "talk" to the tumor through ligand-receptor interactions and signaling molecules, forming a complex communication network to influence the cellular and molecular basis of cancer. Such intricate intratumoral immune composition and interactions foster the application of immunotherapies, which empower the immune system against cancer to elicit durable long-term responses in cancer patients. Single-cell technologies have allowed for the dissection and characterization of the TME to an unprecedented level, while recent advancements in bioinformatics tools have expanded the horizon and depth of high-dimensional single-cell data analysis. This review will unravel the intertwined networks between malignancy and immunity, explore the utilization of computational tools for a deeper understanding of tumor-immune communications, and discuss the application of these approaches to aid in diagnosis or treatment decision making in the clinical setting, as well as the current challenges faced by the researchers with their potential future improvements.
Affiliation(s)
- Camelia Quek
  - Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia; (J.T.); (X.B.)
5. Balliu B, Douglas C, Seok D, Shenhav L, Wu Y, Chatzopoulou D, Kaiser W, Chen V, Kim J, Deverasetty S, Arnaudova I, Gibbons R, Congdon E, Craske MG, Freimer N, Halperin E, Sankararaman S, Flint J. Personalized mood prediction from patterns of behavior collected with smartphones. NPJ Digit Med 2024; 7:49. [PMID: 38418551 PMCID: PMC10902386 DOI: 10.1038/s41746-024-01035-6] [Received: 10/18/2022] [Accepted: 02/09/2024] [Indexed: 03/01/2024]
Abstract
Over the last ten years, there has been considerable progress in using digital behavioral phenotypes, captured passively and continuously from smartphones and wearable devices, to infer depressive mood. However, most digital phenotype studies suffer from poor replicability, often fail to detect clinically relevant events, and use measures of depression that are not validated or suitable for collecting large and longitudinal data. Here, we report high-quality longitudinal validated assessments of depressive mood from computerized adaptive testing paired with continuous digital assessments of behavior from smartphone sensors for up to 40 weeks on 183 individuals experiencing mild to severe symptoms of depression. We apply a combination of cubic spline interpolation and idiographic models to generate individualized predictions of future mood from the digital behavioral phenotypes, achieving high prediction accuracy of depression severity up to three weeks in advance (R² ≥ 80%) and a 65.7% reduction in the prediction error over a baseline model which predicts future mood based on past depression severity alone. Finally, our study verified the feasibility of obtaining high-quality longitudinal assessments of mood from a clinical population and predicting symptom severity weeks in advance using passively collected digital behavioral data. Our results indicate the possibility of expanding the repertoire of patient-specific behavioral measures to enable future psychiatric research.
Affiliation(s)
- Brunilda Balliu
  - Departments of Computational Medicine, University of California Los Angeles, Los Angeles, USA.
  - Departments of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, USA.
  - Department of Biostatistics, University of California Los Angeles, Los Angeles, USA.
- Chris Douglas
  - Department of Psychiatry and Biobehavioral Science, University of California Los Angeles, Los Angeles, USA.
- Darsol Seok
  - Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, USA.
- Liat Shenhav
  - Department of Computer Science, University of California Los Angeles, Los Angeles, USA.
- Yue Wu
  - Department of Computer Science, University of California Los Angeles, Los Angeles, USA.
- Doxa Chatzopoulou
  - Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, USA.
- William Kaiser
  - Department of Electrical Engineering, University of California Los Angeles, Los Angeles, USA.
- Victor Chen
  - Department of Electrical Engineering, University of California Los Angeles, Los Angeles, USA.
- Jennifer Kim
  - Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, USA.
- Sandeep Deverasetty
  - Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, USA.
- Inna Arnaudova
  - Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, USA.
- Robert Gibbons
  - Departments of Medicine, Public Health Sciences and Comparative Human Development, University of Chicago, Chicago, USA.
- Eliza Congdon
  - Department of Psychiatry and Biobehavioral Science, University of California Los Angeles, Los Angeles, USA.
  - Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, USA.
- Michelle G Craske
  - Department of Psychiatry and Biobehavioral Science, University of California Los Angeles, Los Angeles, USA.
  - Department of Psychology, University of California Los Angeles, Los Angeles, USA.
- Nelson Freimer
  - Department of Psychiatry and Biobehavioral Science, University of California Los Angeles, Los Angeles, USA.
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, USA.
- Eran Halperin
  - Department of Computer Science, University of California Los Angeles, Los Angeles, USA.
- Sriram Sankararaman
  - Departments of Computational Medicine, University of California Los Angeles, Los Angeles, USA.
  - Department of Computer Science, University of California Los Angeles, Los Angeles, USA.
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, USA.
- Jonathan Flint
  - Department of Psychiatry and Biobehavioral Science, University of California Los Angeles, Los Angeles, USA.
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, USA.
6. Dahl A, Thompson M, An U, Krebs M, Appadurai V, Border R, Bacanu SA, Werge T, Flint J, Schork AJ, Sankararaman S, Kendler KS, Cai N. Phenotype integration improves power and preserves specificity in biobank-based genetic studies of major depressive disorder. Nat Genet 2023; 55:2082-2093. [PMID: 37985818 PMCID: PMC10703686 DOI: 10.1038/s41588-023-01559-9] [Received: 08/01/2022] [Accepted: 09/18/2023] [Indexed: 11/22/2023]
Abstract
Biobanks often contain several phenotypes relevant to diseases such as major depressive disorder (MDD), with partly distinct genetic architectures. Researchers face complex tradeoffs between shallow (large sample size, low specificity/sensitivity) and deep (small sample size, high specificity/sensitivity) phenotypes, and the optimal choices are often unclear. Here we propose to integrate these phenotypes to combine the benefits of each. We use phenotype imputation to integrate information across hundreds of MDD-relevant phenotypes, which significantly increases genome-wide association study (GWAS) power and polygenic risk score (PRS) prediction accuracy of the deepest available MDD phenotype in UK Biobank, LifetimeMDD. We demonstrate that imputation preserves specificity in its genetic architecture using a novel PRS-based pleiotropy metric. We further find that integration via summary statistics also enhances GWAS power and PRS predictions, but can introduce nonspecific genetic effects depending on input. Our work provides a simple and scalable approach to improve genetic studies in large biobanks by integrating shallow and deep phenotypes.
Affiliation(s)
- Andrew Dahl
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA.
- Michael Thompson
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA.
- Ulzee An
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA.
- Morten Krebs
  - Institute of Biological Psychiatry, Mental Health Center-Sct Hans, Copenhagen University Hospital-Mental Health Services CPH, Copenhagen, Denmark.
- Vivek Appadurai
  - Institute of Biological Psychiatry, Mental Health Center-Sct Hans, Copenhagen University Hospital-Mental Health Services CPH, Copenhagen, Denmark.
- Richard Border
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA.
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Silviu-Alin Bacanu
  - Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA.
- Thomas Werge
  - Institute of Biological Psychiatry, Mental Health Center-Sct Hans, Copenhagen University Hospital-Mental Health Services CPH, Copenhagen, Denmark.
  - Lundbeck Foundation GeoGenetics Centre, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark.
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
- Jonathan Flint
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Andrew J Schork
  - Institute of Biological Psychiatry, Mental Health Center-Sct Hans, Copenhagen University Hospital-Mental Health Services CPH, Copenhagen, Denmark.
  - Neurogenomics Division, The Translational Genomics Research Institute (TGEN), Phoenix, AZ, USA.
  - Section for Geogenetics, GLOBE Institute, Faculty of Health and Medical Sciences, Copenhagen University, Copenhagen, Denmark.
- Sriram Sankararaman
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA.
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Kenneth S Kendler
  - Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA.
- Na Cai
  - Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany.
  - Computational Health Centre, Helmholtz Zentrum München, Neuherberg, Germany.
  - School of Medicine, Technical University of Munich, Munich, Germany.