1
Hong SC, Muyas F, Cortés-Ciriano I, Hormoz S. scAI-SNP: a method for inferring ancestry from single-cell data. BMC METHODS 2025; 2:10. [PMID: 40401145 PMCID: PMC12089154 DOI: 10.1186/s44330-025-00029-4] [Received: 09/26/2024] [Accepted: 05/01/2025] [Indexed: 05/23/2025]
Abstract
Background: Collaborative efforts, such as the Human Cell Atlas, are rapidly accumulating large amounts of single-cell data. To ensure that single-cell atlases are representative of human genetic diversity, we need to determine the ancestry of the donors from whom single-cell data are generated. Self-reporting of race and ethnicity, although important, can be biased and is not always available for the datasets already collected. Methods: Here, we introduce scAI-SNP, a tool to infer ancestry directly from single-cell genomics data. To train scAI-SNP, we identified 4.5 million ancestry-informative single-nucleotide polymorphisms (SNPs) in the 1000 Genomes Project dataset across 3201 individuals from 26 population groups. For a query single-cell dataset, scAI-SNP uses these ancestry-informative SNPs to compute the contribution of each of the 26 population groups to the ancestry of the donor from whom the cells were obtained. Results: Using diverse single-cell datasets with matched whole-genome sequencing data, we show that scAI-SNP is robust to the sparsity of single-cell data, can accurately and consistently infer ancestry from samples derived from diverse types of tissues and cancer cells, and can be applied to different modalities of single-cell profiling assays, such as single-cell RNA-seq and single-cell ATAC-seq. Discussion: Finally, we argue that ensuring that single-cell atlases represent diverse ancestry, ideally alongside race and ethnicity, is ultimately important for improved and equitable health outcomes by accounting for human diversity. Supplementary Information: The online version contains supplementary material available at 10.1186/s44330-025-00029-4.
Affiliation(s)
- Sung Chul Hong
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215 USA
- Francesc Muyas
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Isidro Cortés-Ciriano
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Sahand Hormoz
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215 USA
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115 USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
2
German J, Cordioli M, Tozzo V, Urbut S, Arumäe K, Smit RAJ, Lee J, Li JH, Janucik A, Ding Y, Akinkuolie A, Heyne HO, Eoli A, Saad C, Al-Sarraj Y, Abdel-Latif R, Mohammed S, Hail MA, Barry A, Wang Z, Cajuso T, Corbetta A, Natarajan P, Ripatti S, Philippakis A, Szczerbinski L, Pasaniuc B, Kutalik Z, Mbarek H, Loos RJF, Vainik U, Ganna A. Association between plausible genetic factors and weight loss from GLP1-RA and bariatric surgery. Nat Med 2025:10.1038/s41591-025-03645-3. [PMID: 40251273 DOI: 10.1038/s41591-025-03645-3] [Received: 09/03/2024] [Accepted: 03/07/2025] [Indexed: 04/20/2025]
Abstract
Obesity is a major public health challenge. Glucagon-like peptide-1 receptor agonists (GLP1-RA) and bariatric surgery (BS) are effective weight loss interventions; however, the genetic factors influencing treatment response remain largely unexplored. Moreover, most previous studies have focused on race and ethnicity rather than genetic ancestry. Here we analyzed 10,960 individuals from 9 multiancestry biobank studies across 6 countries to assess the impact of known genetic factors on weight loss. Between 6 and 12 months, GLP1-RA users had an average weight change of -3.93% or -6.00%, depending on the outcome definition, with modest ancestry-based differences. BS patients experienced -21.17% weight change between 6 and 48 months. We found no significant associations between GLP1-RA-induced weight loss and polygenic scores for body mass index or type 2 diabetes, nor with missense variants in GLP1R. A higher body mass index polygenic score was modestly linked to lower weight loss after BS (+0.7% per s.d., P = 1.24 × 10⁻⁴), but the effect attenuated in sensitivity analyses. Our findings suggest known genetic factors have limited impact on GLP1-RA effectiveness with respect to weight change and confirm treatment efficacy across ancestry groups.
Affiliation(s)
- Jakob German
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mattia Cordioli
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Veronica Tozzo
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Sarah Urbut
- Division of Cardiovascular Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Kadri Arumäe
- Institute of Psychology, University of Tartu, Tartu, Estonia
- Roelof A J Smit
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Environmental Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jiwoo Lee
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Josephine H Li
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Adrian Janucik
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Digital Medicine, Medical University of Bialystok, Bialystok, Poland
- Yi Ding
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Akintunde Akinkuolie
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Henrike O Heyne
- Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Andrea Eoli
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chadi Saad
- Qatar Genome Program, Qatar Precision Health Institute, Qatar Foundation, Doha, Qatar
- Yasser Al-Sarraj
- Qatar Genome Program, Qatar Precision Health Institute, Qatar Foundation, Doha, Qatar
- Rania Abdel-Latif
- Qatar Genome Program, Qatar Precision Health Institute, Qatar Foundation, Doha, Qatar
- Shaban Mohammed
- Department of Pharmacy, Hamad Medical Corporation, Doha, Qatar
- Moza Al Hail
- Department of Pharmacy, Hamad Medical Corporation, Doha, Qatar
- Alexandra Barry
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Tatiana Cajuso
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Pathology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
- Andrea Corbetta
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Health Data Science Centre, Human Technopole, Milan, Italy
- MOX - Laboratory for Modeling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Milan, Italy
- Pradeep Natarajan
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Personalized Medicine, Mass General Brigham, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Samuli Ripatti
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Program in Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Public Health, Clinicum, University of Helsinki, Helsinki, Finland
- Analytic & Translational Genetics Unit, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Anthony Philippakis
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lukasz Szczerbinski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Endocrinology, Diabetology and Internal Medicine, Medical University of Bialystok, Bialystok, Poland
- Clinical Research Centre, Medical University of Bialystok, Bialystok, Poland
- Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Institute of Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Zoltán Kutalik
- University Center for Primary Care and Public Health, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Hamdi Mbarek
- Qatar Genome Program, Qatar Precision Health Institute, Qatar Foundation, Doha, Qatar
- Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Environmental Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Uku Vainik
- Institute of Psychology, University of Tartu, Tartu, Estonia
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Department of Neurology and Neurosurgery, McGill University, Montreal, Quebec, Canada
- Andrea Ganna
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
3
Freeman K, Zwicker A, Fullerton JM, Hafeman DM, van Haren NEM, Merranko J, Goldstein BI, Stapp EK, de la Serna E, Moreno D, Sugranyes G, Mas S, Roberts G, Toma C, Schofield PR, Edenberg HJ, Wilcox HC, McInnis MG, Propper L, Pavlova B, Stewart SA, Denovan-Wright EM, Rouleau GA, Castro-Fornieles J, Hillegers MHJ, Birmaher B, Mitchell PB, Alda M, Nurnberger JI, Uher R. Polygenic Scores and Mood Disorder Onsets in the Context of Family History and Early Psychopathology. JAMA Netw Open 2025; 8:e255331. [PMID: 40238098 PMCID: PMC12004201 DOI: 10.1001/jamanetworkopen.2025.5331] [Received: 11/19/2024] [Accepted: 02/12/2025] [Indexed: 04/18/2025]
Abstract
Importance: Bipolar disorder (BD) and major depressive disorder (MDD) aggregate within families, with risk often first manifesting as early psychopathology, including attention-deficit/hyperactivity disorder (ADHD) and anxiety disorders. Objective: To determine whether polygenic scores (PGS) are associated with mood disorder onset independent of familial high risk for BD (FHR-BD) and early psychopathology. Design, Setting, and Participants: This cohort study used data from 7 prospective cohorts enriched in FHR-BD from Australia, Canada, the Netherlands, Spain, and the US. Participants with FHR-BD, defined as having at least 1 first-degree relative with BD, were compared with participants without FHR for any mood disorder. Participants were repeatedly assessed with variable follow-up intervals from July 1992 to July 2023. Data were analyzed from August 2023 to August 2024. Exposures: PGS indexed genetic liability for MDD, BD, anxiety, neuroticism, subjective well-being, ADHD, self-regulation, and addiction risk factor. Semistructured diagnostic interviews with relatives established FHR-BD. ADHD or anxiety disorder diagnoses before mood disorder onset constituted early psychopathology. Main Outcomes and Measures: The outcome of interest, mood disorder onset, was defined as a consensus-confirmed new diagnosis of MDD or BD. Cox regression examined associations of PGS, FHR-BD, ADHD, and anxiety with mood disorder onset. Kaplan-Meier curves and log-rank tests evaluated the probability of onset by PGS quartile and familial risk status. Results: A total of 1064 participants (546 [51.3%] female; mean [SD] age at last assessment, 21.7 [5.1] years), including 660 with FHR-BD and 404 without FHR for any mood disorder, were repeatedly assessed for mental disorders. A total of 399 mood disorder onsets occurred over a variable mean (SD) follow-up interval of 6.3 (5.7) years. Multiple PGS were associated with onset after correcting for FHR-BD and early psychopathology, including PGS for ADHD (hazard ratio [HR], 1.19; 95% CI, 1.06-1.34), self-regulation (HR, 1.19; 95% CI, 1.06-1.34), neuroticism (HR, 1.18; 95% CI, 1.06-1.32), MDD (HR, 1.17; 95% CI, 1.04-1.31), addiction risk factor (HR, 1.16; 95% CI, 1.04-1.30), anxiety (HR, 1.15; 95% CI, 1.02-1.28), BD (HR, 1.14; 95% CI, 1.02-1.28), and subjective well-being (HR, 0.89; 95% CI, 0.79-0.99). High PGS for addiction risk factor, anxiety, BD, and MDD were associated with increased probability of onset in the control group. High PGS for ADHD and self-regulation increased rates of onset among participants with FHR-BD. PGS for self-regulation, ADHD, and addiction risk factors showed stronger associations with onsets of BD than MDD. Conclusions and Relevance: In this cohort study, multiple PGS were associated with mood disorder onset independent of family history of BD and premorbid diagnoses of ADHD or anxiety. The association between PGS and mood disorder risk varied depending on family history status.
Affiliation(s)
- Kathryn Freeman
- Department of Medical Neuroscience, Dalhousie University, Halifax, Nova Scotia, Canada
- Nova Scotia Health Authority, Halifax, Nova Scotia, Canada
- Alyson Zwicker
- Nova Scotia Health Authority, Halifax, Nova Scotia, Canada
- Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada
- Dalhousie Medicine New Brunswick, St John, New Brunswick, Canada
- Janice M. Fullerton
- Neuroscience Research Australia, Randwick, New South Wales, Australia
- School of Biomedical Sciences, Faculty of Medicine & Health, University of New South Wales, Sydney, New South Wales, Australia
- Danella M. Hafeman
- Western Psychiatric Hospital, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Neeltje E. M. van Haren
- Department of Child and Adolescent Psychiatry/Psychology, Erasmus University Medical Center, Sophia Children’s Hospital, Rotterdam, the Netherlands
- Department of Psychiatry, University Medical Center Utrecht Brain Center, Utrecht, the Netherlands
- John Merranko
- Western Psychiatric Hospital, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Benjamin I. Goldstein
- Centre for Addiction and Mental Health, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Emma K. Stapp
- Milken Institute School of Public Health, George Washington University, Washington, District of Columbia
- Elena de la Serna
- Fundacio Clínic per la Recerca Biomedica, Institut d'Investigacions Biomèdiques d'August Pi i Sunye, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Madrid, Spain
- Department of Child and Adolescent Psychiatry and Psychology, 2021 SGR 01319, Hospital Clinic of Barcelona, Barcelona, Spain
- Dolores Moreno
- Centro de Investigación Biomédica en Red de Salud Mental, Madrid, Spain
- Department of Child and Adolescent Psychiatry, Hospital General Universitario Gregorio Marañón, Madrid, Spain
- Gisela Sugranyes
- Fundacio Clínic per la Recerca Biomedica, Institut d'Investigacions Biomèdiques d'August Pi i Sunye, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Madrid, Spain
- Department of Child and Adolescent Psychiatry and Psychology, 2021 SGR 01319, Hospital Clinic of Barcelona, Barcelona, Spain
- Sergi Mas
- Fundacio Clínic per la Recerca Biomedica, Institut d'Investigacions Biomèdiques d'August Pi i Sunye, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Madrid, Spain
- Department of Clinical Foundations, Universitat de Barcelona, Barcelona, Spain
- Gloria Roberts
- Discipline of Psychiatry and Mental Health, School of Clinical Medicine, University of New South Wales, Randwick, New South Wales, Australia
- Claudio Toma
- Neuroscience Research Australia, Randwick, New South Wales, Australia
- School of Biomedical Sciences, Faculty of Medicine & Health, University of New South Wales, Sydney, New South Wales, Australia
- Centro de Biología Molecular “Severo Ochoa”, Universidad Autónoma de Madrid, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Peter R. Schofield
- Neuroscience Research Australia, Randwick, New South Wales, Australia
- School of Biomedical Sciences, Faculty of Medicine & Health, University of New South Wales, Sydney, New South Wales, Australia
- Howard J. Edenberg
- Department of Biochemistry and Molecular Biology, Indiana University, Indianapolis
- Holly C. Wilcox
- Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Johns Hopkins School of Medicine, Baltimore, Maryland
- Lukas Propper
- Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada
- IWK Health Centre, Halifax, Nova Scotia, Canada
- Barbara Pavlova
- Nova Scotia Health Authority, Halifax, Nova Scotia, Canada
- Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada
- Samuel A. Stewart
- Department of Community Health and Epidemiology, Dalhousie University, Halifax, Nova Scotia, Canada
- Guy A. Rouleau
- Montreal Neurological Institute and Department of Neurology, McGill University, Montreal, Quebec, Canada
- Josefina Castro-Fornieles
- Fundacio Clínic per la Recerca Biomedica, Institut d'Investigacions Biomèdiques d'August Pi i Sunye, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Salud Mental, Madrid, Spain
- Department of Child and Adolescent Psychiatry and Psychology, 2021 SGR 01319, Hospital Clinic of Barcelona, Barcelona, Spain
- Department of Medicine, Neurosciences Institute, University of Barcelona, Barcelona, Spain
- Manon H. J. Hillegers
- Department of Child and Adolescent Psychiatry/Psychology, Erasmus University Medical Center, Sophia Children’s Hospital, Rotterdam, the Netherlands
- Boris Birmaher
- Western Psychiatric Hospital, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Philip B. Mitchell
- Discipline of Psychiatry and Mental Health, School of Clinical Medicine, University of New South Wales, Randwick, New South Wales, Australia
- Martin Alda
- Nova Scotia Health Authority, Halifax, Nova Scotia, Canada
- Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada
- John I. Nurnberger
- Department of Psychiatry, Indiana University School of Medicine, Indianapolis
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis
- Rudolf Uher
- Department of Medical Neuroscience, Dalhousie University, Halifax, Nova Scotia, Canada
- Nova Scotia Health Authority, Halifax, Nova Scotia, Canada
- Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada
4
Gallagher CS, Ginsburg GS, Musick A. Biobanking with genetics shapes precision medicine and global health. Nat Rev Genet 2025; 26:191-202. [PMID: 39567741 DOI: 10.1038/s41576-024-00794-y] [Accepted: 10/14/2024] [Indexed: 11/22/2024]
Abstract
Precision medicine provides patients with access to personally tailored treatments based on individual-level data. However, developing personalized therapies requires analyses with substantial statistical power to map genetic and epidemiologic associations that ultimately create models informing clinical decisions. As one solution, biobanks have emerged as large-scale, longitudinal cohort studies with long-term storage of biological specimens and health information, including electronic health records and participant survey responses. By providing access to individual-level data for genotype-phenotype mapping efforts, pharmacogenomic studies, polygenic risk score assessments and rare variant analyses, biobanks support ongoing and future precision medicine research. Notably, due in part to the geographical enrichment of biobanks in Western Europe and North America, European ancestries have become disproportionately over-represented in precision medicine research. Herein, we provide a genetics-focused review of biobanks from around the world that are in pursuit of supporting precision medicine. We discuss the limitations of their designs, ongoing efforts to diversify genomics research and strategies to maximize the benefits of research leveraging biobanks for all.
Affiliation(s)
- C Scott Gallagher
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Geoffrey S Ginsburg
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Anjené Musick
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
5
Stoneman HR, Price AM, Trout NS, Lamont R, Tifour S, Pozdeyev N, Crooks K, Lin M, Rafaels N, Gignoux CR, Marker KM, Hendricks AE. Characterizing substructure via mixture modeling in large-scale genetic summary statistics. Am J Hum Genet 2025; 112:235-253. [PMID: 39824191 PMCID: PMC11866976 DOI: 10.1016/j.ajhg.2024.12.007] [Received: 06/12/2024] [Revised: 12/09/2024] [Accepted: 12/09/2024] [Indexed: 01/20/2025]
Abstract
Genetic summary data are broadly accessible and highly useful, including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into summary data, such as allele frequencies, masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted-for substructure limits summary data usability, especially for understudied or admixed populations. There is a need for methods to enable the harmonization of summary data where the underlying substructure is matched between datasets. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to enable the harmonization of genetic summary data by estimating and adjusting for substructure. In extensive simulations and application to public data, we show that Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and scans for potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse, publicly available summary data, resulting in improved and more equitable research.
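The mixture-model idea named in this abstract can be illustrated with a small sketch. It is not the Summix2 implementation: it only assumes that observed allele frequencies are a weighted average of reference-group allele frequencies, and it recovers the weights with projected gradient descent onto the probability simplex, a technique swapped in here for simplicity. The function names, the toy reference matrix, and the 70%/30% proportions are all hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_j >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        running_sum += u_j
        candidate = (running_sum - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, reference, steps=2000):
    """Least-squares fit of observed[s] ~ sum_j reference[s][j] * pi[j].

    observed  : allele frequency per SNP in the query summary data
    reference : per-SNP list of allele frequencies, one per reference group
    Returns pi, the estimated group proportions (non-negative, summing to 1).
    """
    n, k = len(observed), len(reference[0])
    pi = [1.0 / k] * k  # start from equal proportions
    step = 1.0 / (2.0 * k)  # conservative step size, valid because frequencies lie in [0, 1]
    for _ in range(steps):
        # Residual of the current mixture at every SNP.
        residual = [
            sum(reference[s][j] * pi[j] for j in range(k)) - observed[s]
            for s in range(n)
        ]
        # Gradient of the mean squared residual with respect to each proportion.
        grad = [
            2.0 * sum(reference[s][j] * residual[s] for s in range(n)) / n
            for j in range(k)
        ]
        pi = project_to_simplex([pi[j] - step * grad[j] for j in range(k)])
    return pi


# Toy data (hypothetical): 6 SNPs, 2 reference groups, true mix 70% / 30%.
reference = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.4], [0.3, 0.7], [0.9, 0.1], [0.2, 0.6]]
true_pi = [0.7, 0.3]
observed = [sum(r * p for r, p in zip(row, true_pi)) for row in reference]
estimate = estimate_proportions(observed, reference)
```

On noise-free toy data such as this, `estimate` should land close to `true_pi`; real summary statistics carry sampling noise and unmodeled reference groups, so the fitted proportions there are only approximate.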
Affiliation(s)
- Hayley R Stoneman
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Adelle M Price
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Nikole Scribner Trout
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Riley Lamont
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Souha Tifour
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Nikita Pozdeyev
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kristy Crooks
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Department of Pathology, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Meng Lin
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Nicholas Rafaels
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christopher R Gignoux
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Katie M Marker
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Audrey E Hendricks
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA; Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA; Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
6
Nurm M, Reigo A, Annilo T, Toomsoo T, Nõukas M, Nikopensius T, Pankratov V, Reisberg T, Hudjashov G, Haller T, Tõnisson N. Use of Estonian Biobank data and participant recall to improve Wilson's disease management. Eur J Hum Genet 2024:10.1038/s41431-024-01767-9. [PMID: 39674827 DOI: 10.1038/s41431-024-01767-9] [Received: 03/20/2024] [Revised: 11/05/2024] [Accepted: 12/03/2024] [Indexed: 12/16/2024]
Abstract
Population-based biobanks enable genomic screening to support initiatives that prevent disease onset or slow its progression and to estimate the prevalence of genetic diseases in the population. Wilson's disease (WD) is a rare genetic copper-accumulation disorder for which timely intervention is crucial, as treatment is readily available. We studied WD in the Estonian Biobank population to advance patient screening, swift diagnosis, and subsequent treatment. Combined analysis of genotype and phenotype data from electronic health records (EHRs) consolidated at the Estonian biobank led to the identification of 17 individuals at high risk of developing WD, who were recalled for further examination and deep phenotyping. All recall study participants, regardless of phenotype, age, and prior WD diagnosis, had low serum ceruloplasmin and copper levels, and 87% also exhibited signs of early to late neurodegeneration. The p.His1069Gln variant in ATP7B, a prevalent pathogenic mutation, showed a striking four- to five-fold enrichment in Estonians compared with other populations. Based on our analysis of genetic and nationwide health registry data, we estimate that WD remains underdiagnosed and undertreated in Estonia. Our study demonstrates that personalized medicine, implemented with the collaboration of medical professionals, has the potential to reduce the healthcare burden by facilitating the accurate diagnosis of rare genetic diseases. To our knowledge, this report is the first to describe a large-scale national biobank-based study of WD.
Affiliation(s)
- Miriam Nurm
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Anu Reigo
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Tarmo Annilo
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Toomas Toomsoo
- Confido Medical Center, Tartu, Estonia
- School of Natural Sciences and Health, Tallinn University, Tallinn, Estonia
- Margit Nõukas
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Tiit Nikopensius
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Vasili Pankratov
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Tuuli Reisberg
- Core Facility of Genomics, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Georgi Hudjashov
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Toomas Haller
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Neeme Tõnisson
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Genetics and Personalized Medicine Clinic, Tartu University Hospital, Tartu, Estonia
| |
|
7
|
Grinde KE, Browning BL, Reiner AP, Thornton TA, Browning SR. Adjusting for principal components can induce collider bias in genome-wide association studies. PLoS Genet 2024; 20:e1011242. [PMID: 39680601 PMCID: PMC11684764 DOI: 10.1371/journal.pgen.1011242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 12/30/2024] [Accepted: 11/14/2024] [Indexed: 12/18/2024] Open
Abstract
Principal component analysis (PCA) is widely used to control for population structure in genome-wide association studies (GWAS). Top principal components (PCs) typically reflect population structure, but challenges arise in deciding how many PCs are needed and ensuring that PCs do not capture other artifacts such as regions with atypical linkage disequilibrium (LD). In response to the latter, many groups suggest performing LD pruning or excluding known high LD regions prior to PCA. However, these suggestions are not universally implemented and the implications for GWAS are not fully understood, especially in the context of admixed populations. In this paper, we investigate the impact of pre-processing and the number of PCs included in GWAS models in African American samples from the Women's Health Initiative SNP Health Association Resource and two Trans-Omics for Precision Medicine Whole Genome Sequencing Project contributing studies (Jackson Heart Study and Genetic Epidemiology of Chronic Obstructive Pulmonary Disease Study). In all three samples, we find the first PC is highly correlated with genome-wide ancestry whereas later PCs often capture local genomic features. The pattern of which, and how many, genetic variants are highly correlated with individual PCs differs from what has been observed in prior studies focused on European populations and leads to distinct downstream consequences: adjusting for such PCs yields biased effect size estimates and elevated rates of spurious associations due to the phenomenon of collider bias. Excluding high LD regions identified in previous studies does not resolve these issues. LD pruning proves more effective, but the optimal choice of thresholds varies across datasets. 
Altogether, our work highlights unique issues that arise when using PCA to control for ancestral heterogeneity in admixed populations and demonstrates the importance of careful pre-processing and diagnostics to ensure that PCs capturing multiple local genomic features are not included in GWAS models.
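One diagnostic in the spirit of this abstract is to check how many variants are strongly correlated with each PC: a PC driven by a handful of variants in one region is more likely to reflect a local genomic feature than genome-wide ancestry. The sketch below is only an illustration under assumptions. The function name `pc_variant_correlations`, the simulated genotypes, and the 0.8 cutoff are hypothetical and are not taken from the paper.

```python
import numpy as np

def pc_variant_correlations(genotypes, n_pcs=2):
    """Correlate every variant with each of the top principal components.

    genotypes: (n_samples, n_variants) array of 0/1/2 allele counts.
    Returns an (n_kept_variants, n_pcs) array of Pearson correlations.
    """
    g = np.asarray(genotypes, dtype=float)
    keep = g.std(axis=0) > 0              # drop monomorphic variants
    g = g[:, keep]
    z = (g - g.mean(axis=0)) / g.std(axis=0)   # standardize each variant
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]        # sample scores on the top PCs
    pcs_z = (pcs - pcs.mean(axis=0)) / pcs.std(axis=0)
    # Both inputs are standardized, so the scaled product is Pearson's r.
    return z.T @ pcs_z / z.shape[0]

# Toy data only: 50 samples, 20 independent variants (assumed values).
rng = np.random.default_rng(0)
geno = rng.binomial(2, 0.3, size=(50, 20))
corr = pc_variant_correlations(geno, n_pcs=2)
# Number of variants with |r| above an arbitrary cutoff, per PC.
high_per_pc = (np.abs(corr) > 0.8).sum(axis=0)
print(high_per_pc)
```

On real data the same table would typically be inspected per chromosome region, before and after LD pruning, to see whether later PCs are dominated by a single locus.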
Affiliation(s)
- Kelsey E. Grinde
- Department of Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, Minnesota, United States of America
| | - Brian L. Browning
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America
| | - Alexander P. Reiner
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
| | - Timothy A. Thornton
- Regeneron Genetics Center, Tarrytown, New York, United States of America
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
| | - Sharon R. Browning
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
| |
|
8
|
Xu H, Zhang G, Chen J. A novel method for cell deconvolution using DNA methylation in PCA space. BMC Genomics 2024; 25:798. [PMID: 39179972 PMCID: PMC11344294 DOI: 10.1186/s12864-024-10652-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 07/22/2024] [Indexed: 08/26/2024] Open
Abstract
BACKGROUND In this study, we present a novel method for reference-based cell deconvolution using data from DNA methylation arrays. Different from existing methods like IDOL-Ext, which operate on probe-level data, our approach represents features in the principal component analysis (PCA) space for cell type deconvolution. RESULTS Our method's accuracy in estimating cell compositions is validated across various public datasets, including blood samples from glioma patients. It demonstrates precision comparable to IDOL-Ext, with R² values ranging from 0.73 to 0.99 for most cell types, while offering improved discrimination between similar cell types, particularly T cell subtypes in glioma patient samples (R² 0.42-0.75 vs. 0.36-0.66 for IDOL-Ext). However, both methods showed lower accuracy for certain cell types, such as memory CD8 T cells in glioma patients (R² 0.42 vs. 0.36 for IDOL-Ext), highlighting the challenges in distinguishing closely related cell populations. We have made this method available as an R package "BloodCellDecon" on GitHub. CONCLUSIONS Our study confirms the efficacy of cell type deconvolution in PCA space. The results indicate wide-ranging applicability and potential for adaptation to other forms of genomic data.
Affiliation(s)
- Huan Xu
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Ge Zhang
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center and March of Dimes Prematurity Research Center Ohio Collaborative, Cincinnati, OH, USA
| | - Jing Chen
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
| |
|
9
|
Hong SC, Muyas F, Cortés-Ciriano I, Hormoz S. scAI-SNP: a method for inferring ancestry from single-cell data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.14.594208. [PMID: 38798590 PMCID: PMC11118306 DOI: 10.1101/2024.05.14.594208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Collaborative efforts, such as the Human Cell Atlas, are rapidly accumulating large amounts of single-cell data. To ensure that single-cell atlases are representative of human genetic diversity, we need to determine the ancestry of the donors from whom single-cell data are generated. Self-reporting of race and ethnicity, although important, can be biased and is not always available for the datasets already collected. Here, we introduce scAI-SNP, a tool to infer ancestry directly from single-cell genomics data. To train scAI-SNP, we identified 4.5 million ancestry-informative single-nucleotide polymorphisms (SNPs) in the 1000 Genomes Project dataset across 3201 individuals from 26 population groups. For a query single-cell data set, scAI-SNP uses these ancestry-informative SNPs to compute the contribution of each of the 26 population groups to the ancestry of the donor from whom the cells were obtained. Using diverse single-cell data sets with matched whole-genome sequencing data, we show that scAI-SNP is robust to the sparsity of single-cell data, can accurately and consistently infer ancestry from samples derived from diverse types of tissues and cancer cells, and can be applied to different modalities of single-cell profiling assays, such as single-cell RNA-seq and single-cell ATAC-seq. Finally, we argue that ensuring that single-cell atlases represent diverse ancestry, ideally alongside race and ethnicity, is ultimately important for improved and equitable health outcomes by accounting for human diversity.
Affiliation(s)
- Sung Chul Hong
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Francesc Muyas
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
| | - Isidro Cortés-Ciriano
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
| | - Sahand Hormoz
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| |
|
10
|
Stoneman HR, Price A, Trout NS, Lamont R, Tifour S, Pozdeyev N, Crooks K, Lin M, Rafaels N, Gignoux CR, Marker KM, Hendricks AE. Characterizing substructure via mixture modeling in large-scale genetic summary statistics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.29.577805. [PMID: 38766180 PMCID: PMC11100604 DOI: 10.1101/2024.01.29.577805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Genetic summary data are broadly accessible and highly useful including for risk prediction, causal inference, fine mapping, and incorporation of external controls. However, collapsing individual-level data into groups masks intra- and inter-sample heterogeneity, leading to confounding, reduced power, and bias. Ultimately, unaccounted substructure limits summary data usability, especially for understudied or admixed populations. Here, we present Summix2, a comprehensive set of methods and software based on a computationally efficient mixture model to estimate and adjust for substructure in genetic summary data. In extensive simulations and application to public data, Summix2 characterizes finer-scale population structure, identifies ascertainment bias, and identifies potential regions of selection due to local substructure deviation. Summix2 increases the robust use of diverse publicly available summary data resulting in improved and more equitable research.
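The general idea of a mixture model on summary data can be shown with a deliberately simplified case: if observed allele frequencies are a weighted average of two reference-group frequencies, the mixing proportion has a closed-form least-squares solution. This is a sketch under assumptions, not Summix2's implementation. The function name `two_group_mixture` and all frequencies below are hypothetical, and the real method handles more than two reference groups.

```python
def two_group_mixture(observed, ref_a, ref_b):
    """Least-squares proportion of reference group A in a two-group mixture.

    Model (assumed for illustration): observed[i] = pi * ref_a[i] + (1 - pi) * ref_b[i]
    Rearranged: observed[i] - ref_b[i] = pi * (ref_a[i] - ref_b[i]),
    so pi is the slope of a regression through the origin.
    """
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference groups have identical allele frequencies")
    pi = num / den
    return min(1.0, max(0.0, pi))  # clamp to a valid proportion

# Hypothetical allele frequencies at three SNPs.
ref_a = [0.10, 0.80, 0.50]
ref_b = [0.60, 0.20, 0.50]
# Observed frequencies generated as a 30% / 70% mixture of A and B.
observed = [0.3 * a + 0.7 * b for a, b in zip(ref_a, ref_b)]
print(round(two_group_mixture(observed, ref_a, ref_b), 6))
```

SNPs where the two reference groups share the same frequency (the third SNP here) contribute nothing to the estimate, which is why ancestry-informative variants matter for this kind of model.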
Affiliation(s)
- Hayley R Stoneman
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Adelle Price
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| | - Nikole Scribner Trout
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| | - Riley Lamont
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| | - Souha Tifour
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| | - Nikita Pozdeyev
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Kristy Crooks
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Department of Pathology, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Meng Lin
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Nicholas Rafaels
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Christopher R Gignoux
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Katie M Marker
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Audrey E Hendricks
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
| |
|
11
|
Lee S, Hecker J, Hahn G, Mullin K, Alzheimer's Disease Neuroimaging Initiative (ADNI), Lutz SM, Tanzi RE, Lange C, Prokopenko D. On the effect heterogeneity of established disease susceptibility loci for Alzheimer's disease across different genetic ancestries. Alzheimers Dement 2024; 20:3397-3405. [PMID: 38563508 PMCID: PMC11095441 DOI: 10.1002/alz.13796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 02/14/2024] [Accepted: 02/23/2024] [Indexed: 04/04/2024]
Abstract
INTRODUCTION Genome-wide association studies have identified numerous disease susceptibility loci (DSLs) for Alzheimer's disease (AD). However, only a limited number of studies have investigated the dependence of the genetic effect size of established DSLs on genetic ancestry. METHODS We utilized the whole genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) including 35,569 participants. A total of 25,459 subjects in four distinct populations (African ancestry, non-Hispanic White, admixed Hispanic, and Asian) were analyzed. RESULTS We found that nine DSLs showed significant heterogeneity across populations. Single nucleotide polymorphism (SNP) rs2075650 in translocase of outer mitochondrial membrane 40 (TOMM40) showed the largest heterogeneity (Cochran's Q = 0.00, I² = 90.08), followed by other SNPs in apolipoprotein C1 (APOC1) and apolipoprotein E (APOE). Two additional loci, signal-induced proliferation-associated 1 like 2 (SIPA1L2) and solute carrier 24 member 4 (SLC24A4), showed significant heterogeneity across populations. DISCUSSION We observed substantial heterogeneity for the APOE-harboring 19q13.32 region with TOMM40/APOE/APOC1 genes. The largest risk effect was seen among African Americans, while Asians showed a surprisingly small risk effect.
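For readers unfamiliar with the heterogeneity statistics named above, the following sketch computes Cochran's Q and I² from per-population effect estimates using standard inverse-variance weights. It is a generic textbook calculation, not the study's pipeline. The function name `cochran_q_i2` and the input values are hypothetical.

```python
def cochran_q_i2(effects, std_errors):
    """Return (pooled effect, Cochran's Q, I-squared in percent).

    effects: per-population effect estimates (e.g. log odds ratios).
    std_errors: their standard errors, in the same order.
    """
    weights = [1.0 / se ** 2 for se in std_errors]      # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical log odds ratios for one SNP in two populations.
pooled, q, i2 = cochran_q_i2([0.1, 0.5], [0.1, 0.1])
print(round(pooled, 3), round(q, 3), round(i2, 1))  # 0.3 8.0 87.5
```

Under the null hypothesis of a common effect, Q is compared against a chi-squared distribution with one fewer degree of freedom than the number of populations.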
Affiliation(s)
- Sanghun Lee
- Department of Medical Consilience, Division of Medicine, Graduate School, Dankook University, Yongin-si, Gyeonggi-do, South Korea
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Julian Hecker
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Georg Hahn
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Kristina Mullin
- Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Charlestown, Massachusetts, USA
| | | | - Sharon M. Lutz
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Healthcare Institute, Boston, Massachusetts, USA
| | - Rudolph E. Tanzi
- Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Charlestown, Massachusetts, USA
| | - Christoph Lange
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Dmitry Prokopenko
- Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Charlestown, Massachusetts, USA
| |
|
12
|
Bonfiglio F, Lasorsa VA, Aievola V, Cantalupo S, Morini M, Ardito M, Conte M, Fragola M, Eva A, Corrias MV, Iolascon A, Capasso M. Exploring the role of HLA variants in neuroblastoma susceptibility through whole exome sequencing. HLA 2024; 103:e15515. [PMID: 38747019 DOI: 10.1111/tan.15515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 04/18/2024] [Accepted: 04/25/2024] [Indexed: 10/24/2024]
Abstract
Although a number of susceptibility loci for neuroblastoma (NB) have been identified by genome-wide association studies, it is still unclear whether variants in the HLA region contribute to NB susceptibility. In this study, we conducted a comprehensive genetic analysis of variants in the HLA region among 724 NB patients and 2863 matched controls from different cohorts. We exploited whole-exome sequencing (WES) data to accurately type HLA alleles with an ensemble approach on the results from three different typing tools, and carried out rigorous sample quality control to ensure fine-scale ancestry matching. The frequencies of common HLA alleles were compared between cases and controls by logistic regression under additive and non-additive models. Population stratification was taken into account by adjusting for ancestry-informative principal components. We detected significant HLA associations with NB. In particular, the HLA-DQB1*05:02 (OR = 1.61; p_adj = 5.4 × 10⁻³) and HLA-DRB1*16:01 (OR = 1.60; p_adj = 2.3 × 10⁻²) alleles were associated with a higher risk of developing NB. Conditional analysis highlighted the HLA-DQB1*05:02 allele and its residue Ser57 as key to this association. The DQB1*05:02 allele was not associated with clinical features or worse outcomes in the NB cohort. Nevertheless, a risk score derived from the allelic combinations of five HLA variants showed a substantial predictive value for patient survival (HR = 1.53; p = 0.032) that was independent of established NB prognostic factors. Our study leveraged powerful computational methods to explore WES data and HLA variants and to reveal complex genetic associations. Further studies are needed to validate the mechanisms of these interactions that contribute to the multifaceted pattern of factors underlying disease initiation and progression.
Affiliation(s)
- Ferdinando Bonfiglio
- Department of Molecular Medicine and Medical Biotechnology, University of Naples "Federico II", Naples, Italy
- CEINGE Biotecnologie Avanzate s.c.a r.l., Naples, Italy
| | | | - Vincenzo Aievola
- Department of Molecular Medicine and Medical Biotechnology, University of Naples "Federico II", Naples, Italy
- CEINGE Biotecnologie Avanzate s.c.a r.l., Naples, Italy
| | - Sueva Cantalupo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples "Federico II", Naples, Italy
- CEINGE Biotecnologie Avanzate s.c.a r.l., Naples, Italy
| | - Martina Morini
- Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Genova, Italy
| | - Martina Ardito
- Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Genova, Italy
| | - Massimo Conte
- U.O.C. Oncologia Pediatrica, IRCCS Istituto Giannina Gaslini, Genova, Italy
| | - Martina Fragola
- Servizio di Epidemiologia e Biostatistica, Direzione Scientifica, IRCCS Istituto Giannina Gaslini, Genova, Italy
| | - Alessandra Eva
- Direzione Scientifica, IRCCS Istituto Giannina Gaslini, Genova, Italy
| | - Maria Valeria Corrias
- Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Genova, Italy
| | - Achille Iolascon
- Department of Molecular Medicine and Medical Biotechnology, University of Naples "Federico II", Naples, Italy
- CEINGE Biotecnologie Avanzate s.c.a r.l., Naples, Italy
| | - Mario Capasso
- Department of Molecular Medicine and Medical Biotechnology, University of Naples "Federico II", Naples, Italy
- CEINGE Biotecnologie Avanzate s.c.a r.l., Naples, Italy
| |
|
13
|
Grinde KE, Browning BL, Reiner AP, Thornton TA, Browning SR. Adjusting for principal components can induce spurious associations in genome-wide association studies in admixed populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.02.587682. [PMID: 38617337 PMCID: PMC11014513 DOI: 10.1101/2024.04.02.587682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2024]
Abstract
Principal component analysis (PCA) is widely used to control for population structure in genome-wide association studies (GWAS). Top principal components (PCs) typically reflect population structure, but challenges arise in deciding how many PCs are needed and ensuring that PCs do not capture other artifacts such as regions with atypical linkage disequilibrium (LD). In response to the latter, many groups suggest performing LD pruning or excluding known high LD regions prior to PCA. However, these suggestions are not universally implemented and the implications for GWAS are not fully understood, especially in the context of admixed populations. In this paper, we investigate the impact of pre-processing and the number of PCs included in GWAS models in African American samples from the Women's Health Initiative SNP Health Association Resource and two Trans-Omics for Precision Medicine Whole Genome Sequencing Project contributing studies (Jackson Heart Study and Genetic Epidemiology of Chronic Obstructive Pulmonary Disease Study). In all three samples, we find the first PC is highly correlated with genome-wide ancestry whereas later PCs often capture local genomic features. The pattern of which, and how many, genetic variants are highly correlated with individual PCs differs from what has been observed in prior studies focused on European populations and leads to distinct downstream consequences: adjusting for such PCs yields biased effect size estimates and elevated rates of spurious associations due to the phenomenon of collider bias. Excluding high LD regions identified in previous studies does not resolve these issues. LD pruning proves more effective, but the optimal choice of thresholds varies across datasets. 
Altogether, our work highlights unique issues that arise when using PCA to control for ancestral heterogeneity in admixed populations and demonstrates the importance of careful pre-processing and diagnostics to ensure that PCs capturing multiple local genomic features are not included in GWAS models.
Affiliation(s)
- Kelsey E. Grinde
- Department of Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, Minnesota, 55105, USA
| | - Brian L. Browning
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, 98195, USA
| | - Alexander P. Reiner
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109, USA
- Department of Epidemiology, University of Washington, Seattle, Washington, 98195, USA
| | - Timothy A. Thornton
- Regeneron Genetics Center, Tarrytown, New York, 10591, USA
- Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
| | - Sharon R. Browning
- Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
| |
|
14
|
Padilla-Iglesias C, Derkx I. Hunter-gatherer genetics research: Importance and avenues. EVOLUTIONARY HUMAN SCIENCES 2024; 6:e15. [PMID: 38516374 PMCID: PMC10955370 DOI: 10.1017/ehs.2024.7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/14/2023] [Revised: 01/17/2024] [Accepted: 02/02/2024] [Indexed: 03/23/2024] Open
Abstract
Major developments in the field of genetics in the past few decades have revolutionised notions of what it means to be human. Although currently only a few populations around the world practise a hunting and gathering lifestyle, this mode of subsistence has characterised members of our species since its very origins and allowed us to migrate across the planet. Therefore, the geographical distribution of hunter-gatherer populations, dependence on local ecosystems and connections to past populations and neighbouring groups have provided unique insights into our evolutionary origins. However, given the vulnerable status of hunter-gatherers worldwide, the development of the field of anthropological genetics requires that we reevaluate how we conduct research with these communities. Here, we review how the inclusion of hunter-gatherer populations in genetics studies has advanced our understanding of human origins, ancient population migrations and interactions as well as phenotypic adaptations and adaptability to different environments, and the important scientific and medical applications of these advancements. At the same time, we highlight the necessity to address yet unresolved questions and identify areas in which the field may benefit from improvements.
Affiliation(s)
| | - Inez Derkx
- Department of Evolutionary Anthropology, University of Zurich, Zurich, Switzerland
| |
|
15
|
Privé F, Albiñana C, Arbel J, Pasaniuc B, Vilhjálmsson BJ. Inferring disease architecture and predictive ability with LDpred2-auto. Am J Hum Genet 2023; 110:2042-2055. [PMID: 37944514 PMCID: PMC10716363 DOI: 10.1016/j.ajhg.2023.10.010] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 06/14/2023] [Revised: 10/15/2023] [Accepted: 10/17/2023] [Indexed: 11/12/2023] Open
Abstract
LDpred2 is a widely used Bayesian method for building polygenic scores (PGSs). LDpred2-auto can infer the two parameters from the LDpred model, the SNP heritability h² and polygenicity p, so that it does not require an additional validation dataset to choose best-performing parameters. The main aim of this paper is to properly validate the use of LDpred2-auto for inferring multiple genetic parameters. Here, we present a new version of LDpred2-auto that adds an optional third parameter α to its model, for modeling negative selection. We then validate the inference of these three parameters (or two, when using the previous model). We also show that LDpred2-auto provides per-variant probabilities of being causal that are well calibrated and can therefore be used for fine-mapping purposes. We also introduce a formula to infer the out-of-sample predictive performance r² of the resulting PGS directly from the Gibbs sampler of LDpred2-auto. Finally, we extend the set of HapMap3 variants recommended to use with LDpred2 with 37% more variants to improve the coverage of this set, and we show that this new set of variants captures 12% more heritability and provides 6% more predictive performance, on average, in UK Biobank analyses.
Affiliation(s)
- Florian Privé
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark.
| | - Clara Albiñana
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
| | - Julyan Arbel
- University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Bjarni J Vilhjálmsson
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark; Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark; Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
| |
|
16
|
Wu Y, Hao X, Zhu K, Zheng C, Guan F, Zeng P, Wang T. Long-term adverse influence of smoking during pregnancy on height and body size of offspring at ten years old in the UK Biobank cohort. SSM Popul Health 2023; 24:101506. [PMID: 37692834 PMCID: PMC10492214 DOI: 10.1016/j.ssmph.2023.101506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Revised: 08/27/2023] [Accepted: 08/29/2023] [Indexed: 09/12/2023] Open
Abstract
Background To explore the long-term relationship between maternal smoking during pregnancy and early childhood growth in the UK Biobank cohort. Methods To estimate the effect of maternal smoking during pregnancy on offspring height and body size at ten years old, we performed binary logistic analyses and reported odds ratios (OR) as well as 95% confidence intervals (95% CIs). We also implemented a cross-contextual comparison study to examine whether such influence could be repeatedly observed among three different ethnicities in the UK Biobank cohort (n = 22,140 for White, n = 7094 for South Asian, and n = 5000 for Black). In particular, we conducted a sibling cohort study in the White sibling cohort (n = 9953 for height and n = 7239 for body size) to control for unmeasured familial confounders. Results We discovered that children whose mothers smoked during pregnancy had greater risk of being shorter or plumper at age ten in the full UK Biobank White cohort, with 15.3% (95% CIs: 13.0%∼17.7%) higher risk for height and 32.4% (95% CIs: 29.5%∼35.4%) larger risk for body size. Similar associations were identified in the South Asian and Black ethnicities. These associations were robust and remained significant in the White sibling cohort (12.6% [95% CIs: 5.0%∼20.3%] for height and 36.1% [95% CIs: 26.3%∼45.9%] for body size) after controlling for family factors. Conclusion This study robustly confirms that maternal smoking during pregnancy can promote height deficit and obesity for offspring at ten years old. Our findings strongly encourage mothers to quit smoking during pregnancy for improving growth and development of offspring.
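The odds ratios and 95% confidence intervals reported above come from logistic regression. The simplest unadjusted analogue is an odds ratio from a 2×2 table with a Wald interval on the log scale, sketched below. This is an illustration only: the function name `odds_ratio_wald_ci` and the counts are hypothetical and do not come from the UK Biobank data, and it ignores the covariate adjustment a regression model would provide.

```python
import math

def odds_ratio_wald_ci(exposed_cases, exposed_controls,
                       unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    z = 1.96 gives an approximate 95% interval. All four cell
    counts must be positive for the standard error to be defined.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed children affected.
or_, lo, hi = odds_ratio_wald_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that includes 1 indicates that the unadjusted association is not significant at the chosen level.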
Affiliation(s)
- Yuxuan Wu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Xingjie Hao
  - Department of Biostatistics and Epidemiology, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Kexuan Zhu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Chu Zheng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Fengjun Guan
  - Department of Pediatrics, Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
17
Gichoya JW, Thomas K, Celi LA, Safdar N, Banerjee I, Banja JD, Seyyed-Kalantari L, Trivedi H, Purkayastha S. AI pitfalls and what not to do: mitigating bias in AI. Br J Radiol 2023; 96:20230023. [PMID: 37698583 PMCID: PMC10546443 DOI: 10.1259/bjr.20230023] [Citation(s) in RCA: 63] [Impact Index Per Article: 31.5] [Received: 01/08/2023] [Revised: 08/10/2023] [Accepted: 08/14/2023] [Indexed: 09/13/2023]
Abstract
Various forms of artificial intelligence (AI) applications are being deployed and used in many healthcare systems. As the use of these applications increases, we are learning the failures of these models and how they can perpetuate bias. With these new lessons, we need to prioritize bias evaluation and mitigation for radiology applications; all the while not ignoring the impact of changes in the larger enterprise AI deployment which may have downstream impact on performance of AI models. In this paper, we provide an updated review of known pitfalls causing AI bias and discuss strategies for mitigating these biases within the context of AI deployment in the larger healthcare enterprise. We describe these pitfalls by framing them in the larger AI lifecycle from problem definition, data set selection and curation, model training and deployment emphasizing that bias exists across a spectrum and is a sequela of a combination of both human and machine factors.
Affiliation(s)
- Kaesha Thomas
  - Department of Radiology, Emory University, Atlanta, United States
- Nabile Safdar
  - Department of Radiology, Emory University, Atlanta, United States
- Imon Banerjee
  - School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, United States
- John D Banja
  - Emory University Center for Ethics, Emory University, Atlanta, United States
- Laleh Seyyed-Kalantari
  - Department of Electrical Engineering and Computer Science, Lassonde School of Engineering, York University, North York, Canada
- Hari Trivedi
  - Department of Radiology, Emory University, Atlanta, United States
- Saptarshi Purkayastha
  - School of Informatics and Computing, Indiana University Purdue University, Indianapolis, United States
18
Mantes AD, Montserrat DM, Bustamante CD, Giró-i-Nieto X, Ioannidis AG. Neural ADMIXTURE for rapid genomic clustering. Nature Computational Science 2023; 3:621-629. [PMID: 37600116 PMCID: PMC10438426 DOI: 10.1038/s43588-023-00482-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/25/2022] [Accepted: 06/06/2023] [Indexed: 08/22/2023]
Abstract
Characterizing the genetic structure of large cohorts has become increasingly important as genetic studies extend to massive, increasingly diverse biobanks. Popular methods decompose individual genomes into fractional cluster assignments with each cluster representing a vector of DNA variant frequencies. However, with rapidly increasing biobank sizes, these methods have become computationally intractable. Here we present Neural ADMIXTURE, a neural network autoencoder that follows the same modeling assumptions as the current standard algorithm, ADMIXTURE, while reducing the compute time by orders of magnitude surpassing even the fastest alternatives. One month of continuous compute using ADMIXTURE can be reduced to just hours with Neural ADMIXTURE. A multi-head approach allows Neural ADMIXTURE to offer even further acceleration by calculating multiple cluster numbers in a single run. Furthermore, the models can be stored, allowing cluster assignment to be performed on new data in linear time without needing to share the training samples.
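The modeling assumption mentioned in this abstract (individual genomes decomposed into fractional cluster assignments, with each cluster being a vector of variant frequencies) can be sketched as follows. This is only an illustration of the standard ADMIXTURE-style likelihood on simulated data, not the Neural ADMIXTURE implementation; the names `Q`, `F`, `G`, and `admixture_log_likelihood`, as well as the array sizes, are assumptions made for this example.

```python
import numpy as np

def admixture_log_likelihood(G, Q, F, eps=1e-9):
    """Binomial log-likelihood (up to an additive constant) of genotypes.

    G: (n, m) genotype matrix with entries in {0, 1, 2}
    Q: (n, K) fractional cluster assignments, each row sums to 1
    F: (K, m) per-cluster variant frequencies in (0, 1)
    """
    # Expected variant frequency for each individual at each site
    P = np.clip(Q @ F, eps, 1 - eps)
    return float(np.sum(G * np.log(P) + (2 - G) * np.log(1 - P)))

# Simulated toy data (hypothetical sizes: 5 individuals, 8 variants, 3 clusters)
rng = np.random.default_rng(0)
n, m, K = 5, 8, 3
Q = rng.dirichlet(np.ones(K), size=n)      # rows are ancestry fractions
F = rng.uniform(0.05, 0.95, size=(K, m))   # cluster variant frequencies
G = rng.binomial(2, Q @ F)                 # diploid genotypes drawn from the model

ll = admixture_log_likelihood(G, Q, F)
print(f"log-likelihood: {ll:.3f}")
```

Fitting the model means choosing `Q` and `F` to maximize this quantity; the abstract describes doing so with a neural network autoencoder rather than the optimizer used by ADMIXTURE.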
Affiliation(s)
- Albert Dominguez Mantes
  - Department of Biomedical Data Science, Stanford Medical School, Stanford, CA, United States
  - Signal Theory and Communications Department, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Vaud, Switzerland
- Daniel Mas Montserrat
  - Department of Biomedical Data Science, Stanford Medical School, Stanford, CA, United States
- Xavier Giró-i-Nieto
  - Signal Theory and Communications Department, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain
- Alexander G. Ioannidis
  - Department of Biomedical Data Science, Stanford Medical School, Stanford, CA, United States
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, United States
19
Grandi E, Navedo MF, Saucerman JJ, Bers DM, Chiamvimonvat N, Dixon RE, Dobrev D, Gomez AM, Harraz OF, Hegyi B, Jones DK, Krogh-Madsen T, Murfee WL, Nystoriak MA, Posnack NG, Ripplinger CM, Veeraraghavan R, Weinberg S. Diversity of cells and signals in the cardiovascular system. J Physiol 2023; 601:2547-2592. [PMID: 36744541 PMCID: PMC10313794 DOI: 10.1113/jp284011] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/28/2022] [Accepted: 01/19/2023] [Indexed: 02/07/2023]
Abstract
This white paper is the outcome of the seventh UC Davis Cardiovascular Research Symposium on Systems Approach to Understanding Cardiovascular Disease and Arrhythmia. This biannual meeting aims to bring together leading experts in subfields of cardiovascular biomedicine to focus on topics of importance to the field. The theme of the 2022 Symposium was 'Cell Diversity in the Cardiovascular System, cell-autonomous and cell-cell signalling'. Experts in the field contributed their experimental and mathematical modelling perspectives and discussed emerging questions, controversies, and challenges in examining cell and signal diversity, co-ordination and interrelationships involved in cardiovascular function. This paper originates from the topics of formal presentations and informal discussions from the Symposium, which aimed to develop a holistic view of how the multiple cell types in the cardiovascular system integrate to influence cardiovascular function, disease progression and therapeutic strategies. The first section describes the major cell types (e.g. cardiomyocytes, vascular smooth muscle and endothelial cells, fibroblasts, neurons, immune cells, etc.) and the signals involved in cardiovascular function. The second section emphasizes the complexity at the subcellular, cellular and system levels in the context of cardiovascular development, ageing and disease. Finally, the third section surveys the technological innovations that allow the interrogation of this diversity and advancing our understanding of the integrated cardiovascular function and dysfunction.
Affiliation(s)
- Eleonora Grandi
  - Department of Pharmacology, University of California Davis, Davis, CA, USA
- Manuel F. Navedo
  - Department of Pharmacology, University of California Davis, Davis, CA, USA
- Jeffrey J. Saucerman
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Donald M. Bers
  - Department of Pharmacology, University of California Davis, Davis, CA, USA
- Nipavan Chiamvimonvat
  - Department of Pharmacology, University of California Davis, Davis, CA, USA
  - Department of Internal Medicine, University of California Davis, Davis, CA, USA
- Rose E. Dixon
  - Department of Physiology and Membrane Biology, University of California Davis, Davis, CA, USA
- Dobromir Dobrev
  - Institute of Pharmacology, West German Heart and Vascular Center, University Duisburg-Essen, Essen, Germany
  - Department of Medicine, Montreal Heart Institute and Université de Montréal, Montréal, Canada
  - Department of Molecular Physiology & Biophysics, Baylor College of Medicine, Houston, TX, USA
- Ana M. Gomez
  - Signaling and Cardiovascular Pathophysiology-UMR-S 1180, INSERM, Université Paris-Saclay, Orsay, France
- Osama F. Harraz
  - Department of Pharmacology, Larner College of Medicine, and Vermont Center for Cardiovascular and Brain Health, University of Vermont, Burlington, VT, USA
- Bence Hegyi
  - Department of Pharmacology, University of California Davis, Davis, CA, USA
- David K. Jones
  - Department of Pharmacology, University of Michigan Medical School, Ann Arbor, MI, USA
  - Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Trine Krogh-Madsen
  - Department of Physiology & Biophysics, Weill Cornell Medicine, New York, New York, USA
- Walter Lee Murfee
  - J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, FL, USA
- Matthew A. Nystoriak
  - Department of Medicine, Division of Environmental Medicine, Center for Cardiometabolic Science, University of Louisville, Louisville, KY, 40202, USA
- Nikki G. Posnack
  - Department of Pediatrics, Department of Pharmacology and Physiology, The George Washington University, Washington, DC, USA
  - Sheikh Zayed Institute for Pediatric and Surgical Innovation, Children’s National Heart Institute, Children’s National Hospital, Washington, DC, USA
- Rengasayee Veeraraghavan
  - Department of Biomedical Engineering, The Ohio State University, Columbus, OH, USA
  - Dorothy M. Davis Heart & Lung Research Institute, The Ohio State University – Wexner Medical Center, Columbus, OH, USA
- Seth Weinberg
  - Department of Biomedical Engineering, The Ohio State University, Columbus, OH, USA
  - Dorothy M. Davis Heart & Lung Research Institute, The Ohio State University – Wexner Medical Center, Columbus, OH, USA
20
Jacobs BM, Schalk L, Dunne A, Scalfari A, Nandoskar A, Gran B, Mein CA, Sellers C, Spilker C, Rog D, Visentin E, Bezzina EL, Uzochukwu E, Tallantyre E, Wozniak E, Sacre E, Hassan-Smith G, Ford HL, Harris J, Bradley J, Breedon J, Brooke J, Kreft KL, Tuite Dalton K, George K, Papachatzaki M, O'Malley M, Peter M, Mattoscio M, Rhule N, Evangelou N, Vinod N, Quinn O, Shamji R, Kaimal R, Boulton R, Tanveer R, Middleton R, Murray R, Bellfield R, Hoque S, Patel S, Raj S, Gumus S, Mitchell S, Sawcer S, Arun T, Pogreban T, Brown TL, Begum T, Antoine V, Rashid W, Noyce AJ, Silber E, Morris H, Giovannoni G, Dobson R. ADAMS project: a genetic Association study in individuals from Diverse Ancestral backgrounds with Multiple Sclerosis based in the UK. BMJ Open 2023; 13:e071656. [PMID: 37197821 PMCID: PMC10193065 DOI: 10.1136/bmjopen-2023-071656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 04/14/2023] [Indexed: 05/19/2023]
Abstract
PURPOSE Genetic studies of multiple sclerosis (MS) susceptibility and severity have focused on populations of European ancestry. Studying MS genetics in other ancestral groups is necessary to determine the generalisability of these findings. The genetic Association study in individuals from Diverse Ancestral backgrounds with Multiple Sclerosis (ADAMS) project aims to gather genetic and phenotypic data on a large cohort of ancestrally-diverse individuals with MS living in the UK. PARTICIPANTS Adults with self-reported MS from diverse ancestral backgrounds. Recruitment is via clinical sites, online (https://app.mantal.co.uk/adams) or the UK MS Register. We are collecting demographic and phenotypic data using a baseline questionnaire and subsequent healthcare record linkage. We are collecting DNA from participants using saliva kits (Oragene-600) and genotyping using the Illumina Global Screening Array V.3. FINDINGS TO DATE As of 3 January 2023, we have recruited 682 participants (n=446 online, n=55 via sites, n=181 via the UK MS Register). Of this initial cohort, 71.2% of participants are female, with a median age of 44.9 years at recruitment. Over 60% of the cohort are non-white British, with 23.5% identifying as Asian or Asian British, 16.2% as Black, African, Caribbean or Black British and 20.9% identifying as having mixed or other backgrounds. The median age at first symptom is 28 years, and median age at diagnosis is 32 years. 76.8% have relapsing-remitting MS, and 13.5% have secondary progressive MS. FUTURE PLANS Recruitment will continue over the next 10 years. Genotyping and genetic data quality control are ongoing. Within the next 3 years, we aim to perform initial genetic analyses of susceptibility and severity with a view to replicating the findings from European-ancestry studies. In the long term, genetic data will be combined with other datasets to further cross-ancestry genetic discoveries.
Affiliation(s)
- Benjamin M Jacobs
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Luisa Schalk
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Angie Dunne
  - Leeds Centre for Neurosciences, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Antonio Scalfari
  - Centre of Neuroscience, Department of Medicine, Imperial College London, London, UK
- Bruno Gran
  - Department of Neurology, Nottingham University Hospitals NHS Trust, Mental Health and Clinical Neuroscience Academic Unit, University of Nottingham School of Medicine, Nottingham, UK
- Charles A Mein
  - Barts and the London Genome Centre, Queen Mary University of London, London, UK
- Charlotte Sellers
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Cord Spilker
  - Bradford Teaching Hospital Foundation Trust, Bradford, UK
- David Rog
  - Manchester Centre for Clinical Neurosciences, Northern Care Alliance NHS Trust, Manchester, UK
- Elisa Visentin
  - Research and Innovation, Queen's Hospital, BHRUT, London, UK
- Emeka Uzochukwu
  - Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
- Emma Tallantyre
  - Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
  - Department of Clinical Neurology, University Hospital of Wales, Cardiff, UK
- Eva Wozniak
  - Barts and the London Genome Centre, Queen Mary University of London, London, UK
- Eve Sacre
  - Leeds Centre for Neurosciences, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Helen L Ford
  - Leeds Centre for Neurosciences, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Jade Harris
  - Northern Care Alliance NHS Trust, Manchester, UK
- Joshua Breedon
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Karim L Kreft
  - Department of Neurology, University Hospital of Wales, Cardiff, UK
- Katila George
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Martin O'Malley
  - Leeds Centre for Neurosciences, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Michelle Peter
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Miriam Mattoscio
  - Department of Neuroscience, Queen's Hospital, BHRUT NHS Trust, Romford, UK
- Neisha Rhule
  - Queen Elizabeth Hospital (Lewisham and Greenwich NHS Trust), London, UK
- Nikos Evangelou
  - Department of Neurology, Nottingham University Hospitals NHS Trust; Mental Health and Clinical Neuroscience Academic Unit, University of Nottingham School of Medicine, Nottingham, UK
- Outi Quinn
  - Bradford Teaching Hospital Foundation Trust, Bradford, UK
- Ramya Shamji
  - Research and Innovation, Queen's Hospital, BHRUT, London, UK
- Rashmi Kaimal
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Rebecca Boulton
  - Department of Neurology, Nottingham University Hospitals NHS Trust; Mental Health and Clinical Neuroscience Academic Unit, University of Nottingham School of Medicine, Nottingham, UK
- Riffat Tanveer
  - Lancashire Teaching Hospital NHS Foundation Trust, Preston, UK
- Rod Middleton
  - Population Data Science, Swansea University Medical School, Swansea, UK
- Roxanne Murray
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Ruth Bellfield
  - Bradford Teaching Hospital Foundation Trust, Bradford, UK
- Sadid Hoque
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Shakeelah Patel
  - Lancashire Teaching Hospital NHS Foundation Trust, Preston, UK
- Sonia Raj
  - Lancashire Teaching Hospital NHS Foundation Trust, Preston, UK
- Stephanie Gumus
  - Mid and South Essex NHS Foundation Trust, Southend-on-Sea, UK
- Stephen Sawcer
  - University of Cambridge, Department of Clinical Neuroscience, Addenbrookes Hospital, Hills Road, Cambridge, UK
- Tarunya Arun
  - University Hospitals of Coventry and Warwickshire, Coventry, UK
- Terri-Louise Brown
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Thamanna Begum
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Waqar Rashid
  - St George's University Hospitals NHS Foundation Trust, London, UK
- Alastair J Noyce
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Eli Silber
  - Kings College Hospital and Lewisham and Greenwich NHS Trusts, London, UK
- Huw Morris
  - Department of Clinical and Movement Neuroscience, UCL Queen Square Institute of Neurology, London, UK
- Gavin Giovannoni
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Ruth Dobson
  - Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
21
Levin MG, Tsao NL, Singhal P, Liu C, Vy HMT, Paranjpe I, Backman JD, Bellomo TR, Bone WP, Biddinger KJ, Hui Q, Dikilitas O, Satterfield BA, Yang Y, Morley MP, Bradford Y, Burke M, Reza N, Charest B, Judy RL, Puckelwartz MJ, Hakonarson H, Khan A, Kottyan LC, Kullo I, Luo Y, McNally EM, Rasmussen-Torvik LJ, Day SM, Do R, Phillips LS, Ellinor PT, Nadkarni GN, Ritchie MD, Arany Z, Cappola TP, Margulies KB, Aragam KG, Haggerty CM, Joseph J, Sun YV, Voight BF, Damrauer SM. Genome-wide association and multi-trait analyses characterize the common genetic architecture of heart failure. Nat Commun 2022; 13:6914. [PMID: 36376295 PMCID: PMC9663424 DOI: 10.1038/s41467-022-34216-6] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 08/17/2021] [Accepted: 10/17/2022] [Indexed: 11/16/2022]
Abstract
Heart failure is a leading cause of cardiovascular morbidity and mortality. However, the contribution of common genetic variation to heart failure risk has not been fully elucidated, particularly in comparison to other common cardiometabolic traits. We report a multi-ancestry genome-wide association study meta-analysis of all-cause heart failure including up to 115,150 cases and 1,550,331 controls of diverse genetic ancestry, identifying 47 risk loci. We also perform multivariate genome-wide association studies that integrate heart failure with related cardiac magnetic resonance imaging endophenotypes, identifying 61 risk loci. Gene-prioritization analyses including colocalization and transcriptome-wide association studies identify known and previously unreported candidate cardiomyopathy genes and cellular processes, which we validate in gene-expression profiling of failing and healthy human hearts. Colocalization, gene expression profiling, and Mendelian randomization provide convergent evidence for the roles of BCKDHA and circulating branch-chain amino acids in heart failure and cardiac structure. Finally, proteome-wide Mendelian randomization identifies 9 circulating proteins associated with heart failure or quantitative imaging traits. These analyses highlight similarities and differences among heart failure and associated cardiovascular imaging endophenotypes, implicate common genetic variation in the pathogenesis of heart failure, and identify circulating proteins that may represent cardiomyopathy treatment targets.
Affiliation(s)
- Michael G Levin
  - Division of Cardiovascular Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- Noah L Tsao
  - Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Pankhuri Singhal
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Chang Liu
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Ha My T Vy
  - The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ishan Paranjpe
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Tiffany R Bellomo
  - Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- William P Bone
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Kiran J Biddinger
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Qin Hui
  - Emory University School of Public Health, Atlanta, GA, USA
  - Atlanta VA Health Care System, Decatur, GA, USA
- Ozan Dikilitas
  - Departments of Internal Medicine and Cardiovascular Medicine, and Mayo Clinician-Investigator Training Program, Mayo Clinic, Rochester, MN, USA
- Yifan Yang
  - Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Michael P Morley
  - Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yuki Bradford
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Megan Burke
  - Division of Cardiovascular Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Nosheen Reza
  - Division of Cardiovascular Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Brian Charest
  - Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, MA, USA
- Renae L Judy
  - Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Megan J Puckelwartz
  - Department of Pharmacology, Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Hakon Hakonarson
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Atlas Khan
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- Leah C Kottyan
  - Department of Pediatrics, Division of Human Genetics and Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Iftikhar Kullo
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Elizabeth M McNally
  - Center for Genetic Medicine, Bluhm Cardiovascular Institute, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Laura J Rasmussen-Torvik
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Sharlene M Day
  - Division of Cardiovascular Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ron Do
  - The Charles Bronfman Institute for Personalized Medicine, BioMe Phenomics Center, and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lawrence S Phillips
  - Atlanta VA Health Care System, Decatur, GA, USA
  - Division of Endocrinology, Emory University School of Medicine, Atlanta, GA, USA
- Patrick T Ellinor
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, MA, USA
- Girish N Nadkarni
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Zoltan Arany
  - Division of Cardiovascular Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Thomas P Cappola
  - Division of Cardiovascular Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Kenneth B Margulies
  - Division of Cardiovascular Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Krishna G Aragam
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Christopher M Haggerty
  - Department of Translational Data Science and Informatics and Heart Institute, Geisinger, Danville, PA, USA
- Jacob Joseph
  - Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston, MA, USA
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Yan V Sun
  - Emory University School of Public Health, Atlanta, GA, USA
  - Atlanta VA Health Care System, Decatur, GA, USA
- Benjamin F Voight
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Institute of Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Scott M Damrauer
  - Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
  - Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
22
Privé F, Arbel J, Aschard H, Vilhjálmsson BJ. Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. HGG Advances 2022; 3:100136. [PMID: 36105883 PMCID: PMC9465343 DOI: 10.1016/j.xhgg.2022.100136] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 04/21/2022] [Accepted: 08/11/2022] [Indexed: 11/18/2022]
Abstract
Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power.
Affiliation(s)
- Florian Privé
  - National Centre for Register-Based Research, Aarhus University, 8210 Aarhus, Denmark
- Julyan Arbel
  - Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Hugues Aschard
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France
  - Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Bjarni J. Vilhjálmsson
  - National Centre for Register-Based Research, Aarhus University, 8210 Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark