1
Wu Q, Dai J, Liu J, Wu L. Bridging Genomic Research Disparities in Osteoporosis GWAS: Insights for Diverse Populations. Curr Osteoporos Rep 2025; 23:24. [PMID: 40411668 DOI: 10.1007/s11914-025-00917-2] [Accepted: 04/23/2025] [Indexed: 05/26/2025]
Abstract
PURPOSE OF REVIEW: Genome-wide association studies (GWAS) have significantly advanced osteoporosis research by identifying genetic loci associated with bone mineral density (BMD) and fracture risk. However, disparities persist because non-European populations are underrepresented, which limits the applicability of polygenic risk scores (PRS). This review examines recent advances in osteoporosis genetics, highlights existing disparities, and explores strategies for more inclusive research. RECENT FINDINGS: European-focused GWAS have identified key osteoporosis loci, including loci in WNT signaling genes (SOST, LRP5) and loci involved in RUNX2-mediated transcriptional regulation. However, fewer than 40% of these variants replicate in Asian and African populations. Emerging studies in non-European groups reveal population-specific loci, sex-specific associations, and gene-environment interactions. Advances in machine learning (ML)-assisted GWAS and multi-omics integration are improving genetic discovery. Expanding GWAS in diverse populations, integrating multi-omics data, refining ML-based risk models, and standardizing biobank data are essential for equitable osteoporosis research. Future efforts must prioritize clinical translation to enhance personalized osteoporosis prevention and treatment.
Affiliation(s)
- Qing Wu
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH, 43210, USA
- Jingyuan Dai
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH, 43210, USA
- Jianing Liu
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH, 43210, USA
- Lang Wu
- Pacific Center for Genome Research, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Population Sciences in the Pacific Program, University of Hawai'i Cancer Center, University of Hawai'i at Mānoa, Honolulu, HI, USA
2
Lehmann B, Bräuninger L, Cho Y, Falck F, Jayadeva S, Katell M, Nguyen T, Perini A, Tallman S, Mackintosh M, Silver M, Kuchenbäcker K, Leslie D, Chatterjee N, Holmes C. Methodological opportunities in genomic data analysis to advance health equity. Nat Rev Genet 2025:10.1038/s41576-025-00839-w. [PMID: 40369311 DOI: 10.1038/s41576-025-00839-w] [Accepted: 03/27/2025] [Indexed: 05/16/2025]
Abstract
The causes and consequences of inequities in genomic research and medicine are complex and widespread. However, it is widely acknowledged that underrepresentation of diverse populations in human genetics research risks exacerbating existing health disparities. Efforts to improve diversity are ongoing, but an often-overlooked source of inequity is the choice of analytical methods used to process, analyse and interpret genomic data. This choice can influence all areas of genomic research, from genome-wide association studies and polygenic score development to variant prioritization and functional genomics. New statistical and machine learning techniques to understand, quantify and correct for the impact of biases in genomic data are emerging within the wider genomic research and genomic medicine ecosystems. At this crucial time point, it is important to clarify where improvements in methods and practices can, or cannot, have a role in improving equity in genomics. Here, we review existing approaches to promote equity and fairness in statistical analysis for genomics, and propose future methodological developments that are likely to yield the most impact for equity.
Affiliation(s)
- Brieuc Lehmann
- Department of Statistical Science, University College London, London, UK
- Leandra Bräuninger
- Department of Statistical Science, University College London, London, UK
- The Alan Turing Institute, London, UK
- Yoonsu Cho
- Genomics England, London, UK
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Fabian Falck
- The Alan Turing Institute, London, UK
- Department of Statistics, University of Oxford, Oxford, UK
- Matt Silver
- Genomics England, London, UK
- Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, Banjul, The Gambia
- Karoline Kuchenbäcker
- Genomics England, London, UK
- Division of Psychiatry, University College London, London, UK
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Chris Holmes
- Department of Statistics, University of Oxford, Oxford, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
3
Xu L, Zhou G, Jiang W, Zhang H, Dong Y, Guan L, Zhao H. JointPRS: A data-adaptive framework for multi-population genetic risk prediction incorporating genetic correlation. Nat Commun 2025; 16:3841. [PMID: 40268942 PMCID: PMC12019179 DOI: 10.1038/s41467-025-59243-x] [Received: 11/11/2024] [Accepted: 04/16/2025] [Indexed: 04/25/2025]
Abstract
Genetic risk prediction for non-European populations is hindered by limited Genome-Wide Association Study (GWAS) sample sizes and small tuning datasets. We propose JointPRS, a data-adaptive framework that leverages genetic correlations across multiple populations using GWAS summary statistics. It achieves accurate predictions without individual-level tuning data and remains effective in the presence of a small tuning set thanks to its data-adaptive approach. Through extensive simulations and real data applications to 22 quantitative and four binary traits in five continental populations evaluated using the UK Biobank (UKBB) and All of Us (AoU), JointPRS consistently outperforms six state-of-the-art methods across three data scenarios: no tuning data, same-cohort tuning and testing, and cross-cohort tuning and testing. Notably, in the Admixed American population, JointPRS improves lipid trait prediction in AoU by 6.46%-172.00% compared to the other existing methods.
Affiliation(s)
- Leqi Xu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Geyu Zhou
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Wei Jiang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Department of Mathematics, University of Texas at Arlington, Arlington, Texas, USA
- Division of Data Science, College of Science, University of Texas at Arlington, Arlington, Texas, USA
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Yikai Dong
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Leying Guan
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
4
Pei YB, Yu ZY, Shen JS. Transfer learning for accelerated failure time model with microarray data. BMC Bioinformatics 2025; 26:84. [PMID: 40098088 PMCID: PMC11917065 DOI: 10.1186/s12859-025-06056-w] [Received: 09/05/2024] [Accepted: 01/17/2025] [Indexed: 03/19/2025]
Abstract
BACKGROUND: In microarray prognostic studies, researchers aim to identify genes associated with disease progression. However, because some diseases are rare and sample collection is costly, researchers often face limited sample sizes, which can prevent accurate estimation and risk assessment. This challenge calls for methods that leverage information from external data (i.e., source cohorts) to improve gene selection and risk assessment based on the current sample (i.e., target cohort). METHOD: We propose a transfer learning method for the accelerated failure time (AFT) model that improves the fit on the target cohort by adaptively borrowing information from the source cohorts. We use a leave-one-out cross-validation-based procedure to evaluate the relative stability of the selected genes and the overall predictive power. CONCLUSION: In simulation studies, the transfer learning method for the AFT model can correctly identify a small number of genes, and its estimation error is smaller than that obtained without using the source cohorts. Furthermore, compared with directly combining the target and source cohorts in the AFT model, the proposed method shows satisfactory accuracy and robustness in addressing heterogeneity across cohorts. We analyze the GSE88770 and GSE25055 datasets using the proposed method. The selected genes are relatively stable, and the proposed method achieves satisfactory overall risk prediction.
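For readers unfamiliar with the model named in this abstract, the standard accelerated failure time (AFT) model treats log survival time as linear in the covariates (here, gene expression values). The block below states only that textbook form, with $T_i$ the survival time of subject $i$, $\mathbf{x}_i$ the expression covariates, $\boldsymbol{\beta}$ the coefficients, $\sigma$ a scale parameter, and $\varepsilon_i$ i.i.d. errors; the transfer-learning estimator of Pei et al., including how it handles censoring and source-cohort heterogeneity, is not reproduced here.

```latex
\begin{aligned}
\log T_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \sigma\,\varepsilon_i, \qquad i = 1,\dots,n,\\
T_i &= \exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right) T_{0i}, \qquad T_{0i} = \exp(\sigma\,\varepsilon_i).
\end{aligned}
```

The second line follows by exponentiating the first: a one-unit increase in $x_{ij}$ multiplies $T_i$ by $e^{\beta_j}$, so $\beta_j < 0$ shortens and $\beta_j > 0$ lengthens survival time. In sparse gene-selection settings, the selected genes are typically those with nonzero estimated $\beta_j$.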
Affiliation(s)
- Yan-Bo Pei
- School of Statistics, Capital University of Economics and Business, Beijing, China
- Zheng-Yang Yu
- School of Statistics, Capital University of Economics and Business, Beijing, China
- Jun-Shan Shen
- School of Statistics, Capital University of Economics and Business, Beijing, China
5
Gunn S, Wang X, Posner DC, Cho K, Huffman JE, Gaziano M, Wilson PW, Sun YV, Peloso G, Lunetta KL. Comparison of methods for building polygenic scores for diverse populations. HGG Adv 2025; 6:100355. [PMID: 39323095 PMCID: PMC11532986 DOI: 10.1016/j.xhgg.2024.100355] [Received: 02/13/2024] [Revised: 09/22/2024] [Accepted: 09/22/2024] [Indexed: 09/27/2024]
Abstract
Polygenic scores (PGSs) are a promising tool for estimating individual-level genetic risk of disease based on the results of genome-wide association studies (GWASs). However, their promise has yet to be fully realized because most currently available PGSs were built with genetic data from predominantly European-ancestry populations, and PGS performance declines when scores are applied to target populations that differ from the populations in which they were derived. Thus, there is a great need to improve PGS performance in currently understudied populations. In this work we leverage data from two large and diverse cohorts, the Million Veteran Program (MVP) and All of Us (AoU), which provide a unique opportunity to compare methods for building PGSs for multi-ancestry populations across multiple traits. We build PGSs for five continuous traits and five binary traits using both multi-ancestry and single-ancestry approaches with popular Bayesian PGS methods, and using both MVP META GWAS results and population-specific GWAS results from the respective African, European, and Hispanic MVP populations. We evaluate these scores in three AoU populations genetically similar to the respective African, Admixed American, and European 1000 Genomes Project superpopulations. Using correlation-based tests, we make formal comparisons of PGS performance across the multiple AoU populations. We conclude that approaches combining GWAS data from multiple populations produce PGSs that perform better than approaches using smaller single-population GWAS results matched to the target population, and specifically that multi-ancestry scores built with PRS-CSx outperform the other approaches in the three AoU populations.
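The abstract takes the construction of a PGS for granted. In its most common form, a PGS is a weighted sum of an individual's effect-allele dosages with one weight per variant; Bayesian methods such as PRS-CSx mainly differ in how they estimate those weights. The Python sketch below assumes the weights are already estimated and shows only the scoring step, plus a Pearson correlation between scores and an observed continuous trait as one simple performance measure. The function names and toy data are hypothetical, and the correlation shown is not the formal correlation-comparison test the authors used.

```python
from math import sqrt
from typing import Sequence


def polygenic_score(dosages: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Weighted-sum PGS: score_i = sum_j weights[j] * dosages[i][j].

    dosages: one row per individual, one effect-allele count (0, 1, or 2,
             or a fractional imputed dosage) per variant.
    weights: one per-variant effect size, e.g. posterior effects produced by
             a Bayesian PGS method (estimating them is not shown here).
    """
    m = len(weights)
    scores = []
    for row in dosages:
        if len(row) != m:
            raise ValueError("each dosage row needs exactly one value per weight")
        scores.append(sum(w * g for w, g in zip(weights, row)))
    return scores


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between scores and an observed continuous trait."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy data: 3 individuals x 3 variants (hypothetical values).
    dosages = [[0, 1, 2], [2, 0, 1], [1, 1, 0]]
    weights = [0.10, -0.05, 0.20]
    scores = polygenic_score(dosages, weights)
    trait = [1.2, 1.5, 0.3]  # hypothetical phenotype values
    print([round(s, 2) for s in scores], round(pearson_r(scores, trait), 3))
```

Keeping the scoring step separate from weight estimation mirrors the point of the comparison above: single-ancestry and multi-ancestry approaches change the weights, while the dosage-weighted sum stays the same.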
Affiliation(s)
- Sophia Gunn
- Biostatistics, Boston University School of Public Health, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
- Xin Wang
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Daniel C Posner
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA, USA
- Kelly Cho
- Department of Medicine, Harvard Medical School, Boston, MA, USA; MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, USA; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA 02115, USA
- Jennifer E Huffman
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Palo Alto Veterans Institute for Research (PAVIR), Palo Alto Health Care System, Palo Alto, CA, USA
- Michael Gaziano
- Department of Medicine, Harvard Medical School, Boston, MA, USA; MVP Boston Coordinating Center, VA Boston Healthcare System, Boston, MA, USA; Department of Medicine, Division of Aging, Brigham and Women's Hospital, Boston, MA 02115, USA
- Peter W Wilson
- VA Atlanta Healthcare System, Decatur, GA, USA; Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Yan V Sun
- VA Atlanta Healthcare System, Decatur, GA, USA; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Gina Peloso
- Biostatistics, Boston University School of Public Health, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
- Kathryn L Lunetta
- Biostatistics, Boston University School of Public Health, Boston, MA, USA
6
Jayasinghe D, Eshetie S, Beckmann K, Benyamin B, Lee SH. Advancements and limitations in polygenic risk score methods for genomic prediction: a scoping review. Hum Genet 2024; 143:1401-1431. [PMID: 39542907 DOI: 10.1007/s00439-024-02716-8] [Received: 05/22/2024] [Accepted: 10/31/2024] [Indexed: 11/17/2024]
Abstract
This scoping review aims to identify and evaluate the landscape of polygenic risk score (PRS)-based methods for genomic prediction from 2013 to 2023, highlighting their advancements, key concepts, and existing gaps in knowledge, research, and technology. Over the past decade, various PRS-based methods have emerged, each employing a different statistical framework aimed at improving prediction accuracy, processing speed, and memory efficiency. Despite notable advancements, challenges persist, including unrealistic assumptions about the sample sizes and trait polygenicity needed for accurate prediction, as well as limited exploration of hyper-parameter spaces and of environmental interactions. We included studies of PRS-based methods for risk prediction that underwent methodological evaluation using valid approaches and released computational tools or software. We further restricted our selection to studies involving human participants that were published in English. This review followed the standard protocol recommended by the Joanna Briggs Institute Reviewer's Manual, systematically searching the Ovid MEDLINE, Ovid Embase, Scopus, and Web of Science databases. Searches also covered grey literature, such as preprint servers (e.g., bioRxiv), and articles recommended by experts, to ensure comprehensive and diverse coverage of relevant records. This study identified 34 studies describing 37 genomic prediction methods, most of which rely on linkage disequilibrium (LD) information and require hyper-parameter tuning. Nine methods integrate functional or gene annotation, 12 are suitable for cross-ancestry genomic prediction, and only one considers gene-environment (GxE) interaction. While some methods require individual-level data, most use summary statistics, which offers flexibility. Despite this progress, challenges remain. These include computational complexity and the need for large sample sizes to achieve high prediction accuracy. Furthermore, recent methods vary in effectiveness across traits, and their absolute accuracies often fall short of clinical utility. Transferability across ancestries varies with trait heritability and the diversity of the training data, and handling admixed populations remains challenging. In addition, the absence of standard error estimates for individual PRSs, which are crucial in clinical settings, is a critical gap. Another issue is the lack of customizable graphical visualization tools in current software packages. Although genomic prediction methods have advanced significantly, there is still room for improvement. Addressing current challenges and pursuing future research directions will lead to more universally applicable, robust, and clinically relevant genomic prediction tools.
Affiliation(s)
- Dovini Jayasinghe
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- Setegn Eshetie
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Kerri Beckmann
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- Beben Benyamin
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- S Hong Lee
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
7
Zhu Y, Chen W, Zhu K, Liu Y, Huang S, Zeng P. Polygenic prediction for underrepresented populations through transfer learning by utilizing genetic similarity shared with European populations. Brief Bioinform 2024; 26:bbaf048. [PMID: 39905953 PMCID: PMC11794457 DOI: 10.1093/bib/bbaf048] [Received: 08/20/2024] [Revised: 01/10/2025] [Accepted: 01/21/2025] [Indexed: 02/06/2025]
Abstract
Because current genome-wide association studies are conducted primarily in individuals of European ancestry, and information disparities exist among populations, polygenic scores derived from Europeans transfer poorly to other populations. Borrowing the idea of transfer learning, which uses knowledge acquired from auxiliary samples to improve learning in target samples, we propose transPGS, a novel polygenic score method for genetic prediction in underrepresented populations that leverages the genetic similarity shared between European and non-European populations while accounting for trans-ethnic differences in linkage disequilibrium (LD) and effect sizes. We demonstrate the usefulness and robustness of transPGS, which improves prediction accuracy in individual-level and summary-level simulations, and apply it to seven continuous phenotypes and three diseases in the African, Chinese, and East Asian populations of the UK Biobank and Genetic Epidemiology Research Study on Adult Health and Aging cohorts. We further show that distinct LD and minor allele frequency patterns across ancestral groups are responsible for the poor portability of PGS.
Affiliation(s)
- Yiyang Zhu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Wenying Chen
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Kexuan Zhu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yuxin Liu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
8
Borda V, Loesch DP, Guo B, Laboulaye R, Veliz-Otani D, French JN, Leal TP, Gogarten SM, Ikpe S, Gouveia MH, Mendes M, Abecasis GR, Alvim I, Arboleda-Bustos CE, Arboleda G, Arboleda H, Barreto ML, Barwick L, Bezzera MA, Blangero J, Borges V, Caceres O, Cai J, Chana-Cuevas P, Chen Z, Custer B, Dean M, Dinardo C, Domingos I, Duggirala R, Dieguez E, Fernandez W, Ferraz HB, Gilliland F, Guio H, Horta B, Curran JE, Johnsen JM, Kaplan RC, Kelly S, Kenny EE, Konkle BA, Kooperberg C, Lescano A, Lima-Costa MF, Loos RJF, Manichaikul A, Meyers DA, Naslavsky MS, Nickerson DA, North KE, Padilla C, Preuss M, Raggio V, Reiner AP, Rich SS, Rieder CR, Rienstra M, Rotter JI, Rundek T, Sacco RL, Sanchez C, Sankaran VG, Santos-Lobato BL, Schumacher-Schuh AF, Scliar MO, Silverman EK, Sofer T, Lasky-Su J, Tumas V, Weiss ST, Mata IF, Hernandez RD, Tarazona-Santos E, O'Connor TD. Genetics of Latin American Diversity Project: Insights into population genetics and association studies in admixed groups in the Americas. Cell Genom 2024; 4:100692. [PMID: 39486408 PMCID: PMC11605695 DOI: 10.1016/j.xgen.2024.100692] [Received: 03/22/2024] [Revised: 08/14/2024] [Accepted: 10/09/2024] [Indexed: 11/04/2024]
Abstract
Latin Americans are underrepresented in genetic studies, which increases disparities in personalized genomic medicine. Although genetic data are available from thousands of Latin Americans, navigating the bureaucratic hurdles of consent and data access remains challenging. To address this, we introduce the Genetics of Latin American Diversity (GLAD) Project, which compiles genome-wide information from 53,738 Latin Americans across 39 studies representing 46 geographical regions. Through GLAD, we identified heterogeneous ancestry composition and recent gene flow across the Americas. Additionally, we developed GLAD-match, a simulated annealing-based algorithm that matches the genetic background of external samples to our database, sharing summary statistics (i.e., allele and haplotype frequencies) rather than transferring individual-level genotypes. Finally, we demonstrate the potential of GLAD as a critical resource for evaluating statistical genetic software in the presence of admixture. By providing this resource, we promote genomic research in Latin Americans and help extend the promise of personalized medicine to more people.
Affiliation(s)
- Victor Borda
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; University of Maryland Institute for Health Computing, University of Maryland School of Medicine, North Bethesda, MD 20852, USA
- Douglas P Loesch
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Bing Guo
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Roland Laboulaye
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Diego Veliz-Otani
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Jennifer N French
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Thiago Peixoto Leal
- Lerner Research Institute, Genomic Medicine, Cleveland Clinic, Cleveland, OH, USA
- Sunday Ikpe
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Mateus H Gouveia
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Marla Mendes
- Department of Genetics, Ecology, and Evolution, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Gonçalo R Abecasis
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Isabela Alvim
- Department of Genetics, Ecology, and Evolution, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Carlos E Arboleda-Bustos
- Neuroscience and Cell Death Research Groups, Medical School and Genetic Institute, Universidad Nacional de Colombia, Bogota, Colombia
- Gonzalo Arboleda
- Neuroscience and Cell Death Research Groups, Medical School and Genetic Institute, Universidad Nacional de Colombia, Bogota, Colombia
- Humberto Arboleda
- Neuroscience and Cell Death Research Groups, Medical School and Genetic Institute, Universidad Nacional de Colombia, Bogota, Colombia
- Mauricio L Barreto
- Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador, BA 40110-040, Brazil
- Lucas Barwick
- LTRC Data Coordinating Center, The Emmes Company, Rockville, MD, USA
- Marcos A Bezzera
- Department of Genetics, Federal University of Pernambuco, Av. Prof. Moraes Rego, 1235, Recife, PE 50670-901, Brazil
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Vanderci Borges
- Movement Disorders Unit, Department of Neurology and Neurosurgery, Universidade Federal de São Paulo, São Paulo, Brazil
- Omar Caceres
- Instituto Nacional de Salud, Lima, Peru; Facultad de Ciencias de la Salud, Universidad Científica del Sur, Lima, Peru
- Jianwen Cai
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Pedro Chana-Cuevas
- CETRAM, Facultad de Ciencias Médicas, Universidad de Santiago de Chile, Santiago, Chile
- Zhanghua Chen
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Brian Custer
- Vitalant Research Institute, San Francisco, CA, USA
- Michael Dean
- Laboratory of Genomic Diversity, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Carla Dinardo
- Instituto de Medicina Tropical, University of São Paulo, São Paulo, Brazil
- Igor Domingos
- Department of Genetics, Federal University of Pernambuco, Av. Prof. Moraes Rego, 1235, Recife, PE 50670-901, Brazil
- Ravindranath Duggirala
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Elena Dieguez
- Neurology Institute, Universidad de la República, Montevideo, Uruguay
- Willian Fernandez
- Neuroscience and Cell Death Research Groups, Medical School and Genetic Institute, Universidad Nacional de Colombia, Bogota, Colombia
- Henrique B Ferraz
- Movement Disorders Unit, Department of Neurology and Neurosurgery, Universidade Federal de São Paulo, São Paulo, Brazil
- Frank Gilliland
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Heinner Guio
- Instituto Nacional de Salud, Lima, Peru; INBIOMEDIC Research Center, Lima, Peru; Universidad de Huánuco, Huánuco, Peru
- Bernardo Horta
- Faculdade de Medicina, Departamento de Medicina Social, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Jill M Johnsen
- Bloodworks Northwest Research Institute, Seattle, WA, USA
- Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Shannon Kelly
- Vitalant Research Institute, San Francisco, CA, USA; UCSF Benioff Children's Hospital, University of California, San Francisco, Oakland, CA, USA
- Eimear E Kenny
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Barbara A Konkle
- Department of Medicine, University of Washington, Seattle, WA, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Andres Lescano
- Neurology Institute, Universidad de la República, Montevideo, Uruguay
- M Fernanda Lima-Costa
- Instituto de Pesquisas René Rachou, Fundação Oswaldo Cruz, Belo Horizonte, MG, Brazil
- Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Deborah A Meyers
- Division of Genetics, Genomics, and Precision Medicine, University of Arizona, Tucson, AZ, USA
- Michel S Naslavsky
- Human Genome and Stem Cell Research Center, University of São Paulo, São Paulo, SP, Brazil
- Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Michael Preuss
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Victor Raggio
- Genetics Department, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Alexander P Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington, Seattle, WA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Carlos R Rieder
- Departamento de Neurologia, Universidade Federal de Ciências da Saúde de Porto Alegre, Porto Alegre, Brazil
- Michiel Rienstra
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Tatjana Rundek
- Department of Neurology, Miller School of Medicine, and The Evelyn F. McKnight Brain Institute, University of Miami, Miami, FL, USA
- Ralph L Sacco
- Department of Neurology, Miller School of Medicine, and The Evelyn F. McKnight Brain Institute, University of Miami, Miami, FL, USA
- Vijay G Sankaran
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Artur Francisco Schumacher-Schuh
- Departamento de Farmacologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; Serviço de Neurologia, Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil
- Marilia O Scliar
- Human Genome and Stem Cell Research Center, University of São Paulo, São Paulo, SP, Brazil
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Tamar Sofer
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Vitor Tumas
- Ribeirão Preto Medical School, Universidade de São Paulo, Ribeirão Preto, Brazil
| | - Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Ignacio F Mata
- University of Maryland Institute for Health Computing, University of Maryland School of Medicine, North Bethesda, MD 20852, USA
| | - Ryan D Hernandez
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Eduardo Tarazona-Santos
- Department of Genetics, Ecology, and Evolution, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Facultad de Salud Pública y Administración. Universidad Peruana Cayetano Heredia, Lima, Peru
| | - Timothy D O'Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA; Program in Health Equity and Population Health, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Program in Personalized Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
9
Cataldo-Ramirez CC, Lin M, Mcmahon A, Gignoux CR, Weaver TD, Henn BM. Improving GWAS performance in underrepresented groups by appropriate modeling of genetics, environment, and sociocultural factors. bioRxiv 2024:2024.10.28.620716. [PMID: 39553939 PMCID: PMC11565798 DOI: 10.1101/2024.10.28.620716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Genome-wide association studies (GWAS) and polygenic score (PGS) development are typically constrained by the data available in biobank repositories, in which European cohorts are vastly overrepresented. Here, we increase the utility of non-European participant data within the UK Biobank (UKB) by characterizing the genetic affinities of UKB participants who self-identify as Bangladeshi, Indian, Pakistani, "White and Asian" (WA), and "Any Other Asian" (AOA), with the aim of creating a more robust South Asian sample for future genetic analyses. We assess the relationships between genetic structure and self-selected ethnic identities, which yield consistent clustering patterns that we use to train a support vector machine (SVM). We used the SVM model to reassign n = 1,853 AOA and WA participants at the subcontinental level, increasing the size of the UKB South Asian group by 1,381 participants. We then leverage these samples to assess GWAS performance and PGS development. We further include environmental covariates in the height GWAS by implementing a rigorous covariate selection procedure, and compare the outputs of two GWAS models: GWASnull and GWASenv. We show that a PGS derived from the environmentally adjusted GWAS yields prediction comparable to PGS models developed with a training dataset an order of magnitude larger (R² = 0.021 vs 0.026). Models with 7–8 environmental covariates double the variance explained by the PGS alone. In summary, we demonstrate how GWAS performance can be improved by leveraging ambiguous ethnicity codes, ancestry-matched imputation panels, and environmental covariates.
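The comparison above is between the variance explained (R²) by a PGS alone and by models that also include environmental covariates. A minimal sketch of that comparison, assuming ordinary least squares on simulated data; the covariates and effect sizes are hypothetical, and this is not the authors' covariate-selection procedure or GWAS pipeline.

```python
import random


def ols_r2(X, y):
    """R^2 of an ordinary-least-squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented normal equations [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting leaves A diagonal
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: every effect size below is made up for illustration.
random.seed(1)
n = 2000
pgs = [random.gauss(0, 1) for _ in range(n)]
env = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]  # hypothetical covariates
height = [0.15 * g + 0.2 * e[0] + 0.1 * e[1] + random.gauss(0, 1)
          for g, e in zip(pgs, env)]

r2_pgs = ols_r2([[g] for g in pgs], height)
r2_full = ols_r2([[g] + e for g, e in zip(pgs, env)], height)
print(f"PGS alone: R^2 = {r2_pgs:.3f}; PGS + covariates: R^2 = {r2_full:.3f}")
```

Because the two models are nested and both include an intercept, the covariate model's R² cannot fall below the PGS-only R²; how much it rises here depends entirely on the simulated effects.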
Affiliation(s)
- Chelsea C Cataldo-Ramirez
- Department of Anthropology, University of California Davis, Davis, CA, 95616, USA
- Department of Population and Public Health Sciences, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, CA 91001, USA
- Meng Lin
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Aislinn Mcmahon
- Department of Anthropology, University of California Davis, Davis, CA, 95616, USA
- Christopher R Gignoux
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Timothy D Weaver
- Department of Anthropology, University of California Davis, Davis, CA, 95616, USA
- Brenna M Henn
- Department of Anthropology, University of California Davis, Davis, CA, 95616, USA
- UC Davis Genome Center, University of California Davis, Davis, CA, 95616, USA
10
Jee YH, Thibord F, Dominguez A, Sept C, Boulier K, Venkateswaran V, Ding Y, Cherlin T, Verma SS, Faro VL, Bartz TM, Boland A, Brody JA, Deleuze JF, Emmerich J, Germain M, Johnson AD, Kooperberg C, Morange PE, Pankratz N, Psaty BM, Reiner AP, Smadja DM, Sitlani CM, Suchon P, Tang W, Trégouët DA, Zöllner S, Pasaniuc B, Damrauer SM, Sanna S, Snieder H, Kabrhel C, Smith NL, Kraft P, INVENT Consortium. Multi-ancestry polygenic risk scores for venous thromboembolism. Hum Mol Genet 2024; 33:1584-1591. [PMID: 38879759 PMCID: PMC11373328 DOI: 10.1093/hmg/ddae097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 05/29/2024] [Accepted: 06/03/2024] [Indexed: 06/25/2024] [Open access]
Abstract
Venous thromboembolism (VTE) is a significant contributor to morbidity and mortality, with large disparities in incidence rates between Black and White Americans. Polygenic risk scores (PRSs) limited to variants discovered in genome-wide association studies in European-ancestry samples can identify European-ancestry individuals at high risk of VTE. However, there is limited evidence on whether high-dimensional PRSs constructed using more sophisticated methods and more diverse training data can improve predictive ability and utility across diverse populations. We developed PRSs for VTE using summary statistics from the International Network against Venous Thrombosis (INVENT) consortium genome-wide association study meta-analyses of European-ancestry (71 771 cases and 1 059 740 controls) and African-ancestry samples (7482 cases and 129 975 controls). We used LDpred2 and PRS-CSx to construct ancestry-specific and multi-ancestry PRSs and evaluated their performance in independent European-ancestry (6781 cases and 103 016 controls) and African-ancestry samples (1385 cases and 12 569 controls). Multi-ancestry PRSs with weights tuned in European-ancestry samples slightly outperformed ancestry-specific PRSs in European-ancestry test samples (e.g. the area under the receiver operating characteristic curve [AUC] was 0.609 for PRS-CSx_combined_EUR and 0.608 for PRS-CSx_EUR [P = 0.00029]). Multi-ancestry PRSs with weights tuned in African-ancestry samples also outperformed ancestry-specific PRSs in African-ancestry test samples (PRS-CSx_AFR: AUC = 0.58; PRS-CSx_combined_AFR: AUC = 0.59), although this difference was not statistically significant (P = 0.34). The highest fifth percentile of the best-performing PRS was associated with 1.9-fold and 1.68-fold increased risk of VTE among European- and African-ancestry subjects, respectively, relative to those in the middle stratum.
These findings suggest that the multi-ancestry PRS might be used to improve performance across diverse populations to identify individuals at highest risk for VTE.
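AUC, the metric reported above, is the probability that a randomly chosen case has a higher PRS than a randomly chosen control. A minimal sketch computing it from the Mann-Whitney rank statistic, with ties given average ranks; the score values are hypothetical and this does not reproduce the INVENT evaluation.

```python
def auc(case_scores, control_scores):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    pooled = sorted([(s, 1) for s in case_scores] + [(s, 0) for s in control_scores])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # extend j over a run of tied scores
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1  # average 1-based rank of the tied run
        i = j + 1
    n1, n0 = len(case_scores), len(control_scores)
    rank_sum_cases = sum(r for r, (_, is_case) in zip(ranks, pooled) if is_case)
    u = rank_sum_cases - n1 * (n1 + 1) / 2  # number of case-control pairs won by cases
    return u / (n1 * n0)


cases = [0.9, 0.7, 0.4]          # hypothetical PRS values in cases
controls = [0.8, 0.3, 0.2, 0.1]  # hypothetical PRS values in controls
print(round(auc(cases, controls), 4))
```

Here the cases outrank the controls in 10 of the 12 case-control pairs, so the AUC is 10/12. The rank-based form avoids comparing every pair explicitly, which matters at cohort scale.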
Affiliation(s)
- Yon Ho Jee
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, United States
- Florian Thibord
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, 31 Center Drive, Bethesda, MD 20892, United States
- Framingham Heart Study, Boston University and National Heart, Lung, and Blood Institute, Framingham, 73 Mt. Wayte Ave, Suite #2, Framingham, MA 01702, United States
- Alicia Dominguez
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, United States
- Corriene Sept
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, United States
- Kristin Boulier
- Bioinformatics Interdepartmental Program, University of California Los Angeles, 611 Charles E. Young Drive East, Los Angeles, CA 90095-1570, United States
- Vidhya Venkateswaran
- Department of Oral Biology, University of California Los Angeles School of Dentistry, 13-089 CHS, Box 951668, Box 951570, Los Angeles, CA 90095-1668, United States
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California Los Angeles, 611 Charles E. Young Drive East, Los Angeles, CA 90095-1570, United States
- Tess Cherlin
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Spruce St. Philadelphia, PA 19104-4238, United States
- Shefali Setia Verma
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Spruce St. Philadelphia, PA 19104-4238, United States
- Valeria Lo Faro
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, PO Box 30.001, 9700 RB Groningen, The Netherlands
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Dag Hammarskjölds väg 20, 751 85 Uppsala, Sweden
- Traci M Bartz
- Cardiovascular Health Research Unit, Departments of Biostatistics and Medicine, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Anne Boland
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057 Evry, France
- Laboratory of Excellence in Medical Genomics, GENMED, F-91057 Evry, France
- Jennifer A Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Jean-Francois Deleuze
- Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057 Evry, France
- Laboratory of Excellence in Medical Genomics, GENMED, F-91057 Evry, France
- Centre d’Etude du Polymorphisme Humain, Fondation Jean Dausset, 27 rue Juliette Dodu, 75010 Paris, France
- Joseph Emmerich
- Department of Vascular Medicine, Paris Saint-Joseph Hospital Group, University of Paris, 75014 Paris, France
- INSERM CRESS UMR 1153, F-75005, Paris, France
- Marine Germain
- Bordeaux Population Health Research Center, University of Bordeaux, INSERM, UMR 1219, Bordeaux, France
- Andrew D Johnson
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, 31 Center Drive, Bethesda, MD 20892, United States
- Framingham Heart Study, Boston University and National Heart, Lung, and Blood Institute, Framingham, 73 Mt. Wayte Ave, Suite #2, Framingham, MA 01702, United States
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, PO Box 19024, Seattle, WA 98109, United States
- Pierre-Emmanuel Morange
- Aix-Marseille University, INSERM, INRAE, Centre de Recherche en CardioVasculaire et Nutrition, Laboratory of Haematology, CRB Assistance Publique – Hôpitaux de Marseille, HemoVasc, 27, boulevard Jean Moulin, 13005 Marseille, France
- Nathan Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota, 420 Delaware Street SE, Minneapolis, MN 55455, United States
- Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Department of Epidemiology, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Department of Health Systems and Population Health, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Alexander P Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, PO Box 19024, Seattle, WA 98109, United States
- Department of Epidemiology, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- David M Smadja
- Innovative Therapies in Hemostasis, Université de Paris, INSERM, F-75006, Paris, France
- Hematology Department and Biosurgical Research Lab (Carpentier Foundation), Assistance Publique Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), F-75015, Paris, France
- Colleen M Sitlani
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Pierre Suchon
- Aix-Marseille University, INSERM, INRAE, Centre de Recherche en CardioVasculaire et Nutrition, Laboratory of Haematology, CRB Assistance Publique – Hôpitaux de Marseille, HemoVasc, 27, boulevard Jean Moulin, 13005 Marseille, France
- Weihong Tang
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, 1300 S. 2nd St., Minneapolis, MN 55454, United States
- David-Alexandre Trégouët
- Bordeaux Population Health Research Center, University of Bordeaux, INSERM, UMR 1219, Bordeaux, France
- Sebastian Zöllner
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, United States
- Bogdan Pasaniuc
- Department of Oral Biology, University of California Los Angeles School of Dentistry, 13-089 CHS, Box 951668, Box 951570, Los Angeles, CA 90095-1668, United States
- Scott M Damrauer
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, 415 Curie Blvd, Philadelphia, PA 19104, United States
- Department of Surgery, Department of Genetics, and Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Boulevard, Building 421, Philadelphia, PA 19104, United States
- Department of Surgery, Corporal Michael Crescenz VA Medical Center, 3900 Woodland Ave, Philadelphia, PA 19104, United States
- Serena Sanna
- Department of Genetics, University of Groningen, University Medical Center Groningen (UMCG), PO Box 30.001, 9700 RB Groningen, The Netherlands
- Institute for Genetics and Biomedical Research, National Research Council, SS 554 Km 4,500, 09042 Monserrato CA, Italy
- Harold Snieder
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, PO Box 30.001, 9700 RB Groningen, The Netherlands
- Christopher Kabrhel
- Center for Vascular Emergencies, Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114, United States
- Nicholas L Smith
- Department of Health Systems and Population Health, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, 1730 Minor Ave, Seattle, WA 98101, United States
- Department of Veterans Affairs Office of Research and Development, Seattle Epidemiologic Research and Information Center, 1660 S Columbian Way, S-152-E, Seattle, WA 98108, United States
- Peter Kraft
- Transdivisional Research Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 9609 Medical Center Dr, Rockville, MD 20850, United States
11
Joglekar MV, Kaur S, Pociot F, Hardikar AA. Prediction of progression to type 1 diabetes with dynamic biomarkers and risk scores. Lancet Diabetes Endocrinol 2024; 12:483-492. [PMID: 38797187 DOI: 10.1016/s2213-8587(24)00103-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 03/31/2024] [Accepted: 04/02/2024] [Indexed: 05/29/2024]
Abstract
Identifying biomarkers of functional β-cell loss is an important step in the risk stratification of type 1 diabetes. Genetic risk scores (GRS), generated by profiling an array of single nucleotide polymorphisms, are a widely used type 1 diabetes risk-prediction tool. Type 1 diabetes screening studies have relied on a combination of biochemical (autoantibody) and GRS screening methodologies to identify individuals at high risk of type 1 diabetes. A limitation of these screening tools is that the presence of autoantibodies marks the initiation of β-cell loss and is therefore not the best biomarker of progression to early-stage type 1 diabetes. GRS, on the other hand, is a static biomarker offering a single risk score over an individual's lifetime. In this Personal View, we explore the challenges and opportunities of static and dynamic biomarkers in the prediction of progression to type 1 diabetes. We discuss future directions in which newer dynamic risk scores could be used to predict type 1 diabetes risk, assess the efficacy of new and emerging drugs to delay or prevent type 1 diabetes, and possibly replace or further enhance the predictive ability offered by static biomarkers such as GRS.
Affiliation(s)
- Mugdha V Joglekar
- School of Medicine, Western Sydney University, Sydney, NSW, Australia
- Flemming Pociot
- Steno Diabetes Center Copenhagen, Herlev, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
12
Gao Y, Cui Y. Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement. Genome Med 2024; 16:76. [PMID: 38835075 PMCID: PMC11149372 DOI: 10.1186/s13073-024-01345-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 05/17/2024] [Indexed: 06/06/2024] [Open access]
Abstract
BACKGROUND Accurate prediction of an individual's predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets. METHODS We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer's disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups. RESULTS Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations. CONCLUSIONS This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Yan Cui
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Center for Cancer Research, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
13
Jin J, Zhan J, Zhang J, Zhao R, O'Connell J, Jiang Y, Buyske S, Gignoux C, Haiman C, Kenny EE, Kooperberg C, North K, Koelsch BL, Wojcik G, Zhang H, Chatterjee N. MUSSEL: Enhanced Bayesian polygenic risk prediction leveraging information across multiple ancestry groups. Cell Genom 2024; 4:100539. [PMID: 38604127 PMCID: PMC11019365 DOI: 10.1016/j.xgen.2024.100539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 09/07/2023] [Accepted: 03/14/2024] [Indexed: 04/13/2024]
Abstract
Polygenic risk scores (PRSs) are now showing promising predictive performance for a wide variety of complex traits and diseases, but there is a substantial performance gap across populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in summary statistics from genome-wide association studies (GWASs) across multiple ancestry groups via Bayesian hierarchical modeling and ensemble learning. In our simulation studies and data analyses across four distinct studies, totaling 5.7 million participants with substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. For example, in the African ancestry population, MUSSEL has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively. The best-performing method, however, varies by GWAS sample size, target ancestry, trait architecture, and linkage disequilibrium reference samples; thus, a combination of methods may ultimately be needed to generate the most robust PRSs across diverse populations.
Affiliation(s)
- Jin Jin
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19103, USA
- Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Steven Buyske
- Department of Statistics, Rutgers University, New Brunswick, NJ 08854, USA
- Christopher Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christopher Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA
- Eimear E Kenny
- Icahn Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Kari North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
- Genevieve Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Haoyu Zhang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
14
Shah Y, Kulm S, Nauseef JT, Chen Z, Elemento O, Kensler KH, Sharaf RN. Benchmarking multi-ancestry prostate cancer polygenic risk scores in a real-world cohort. PLoS Comput Biol 2024; 20:e1011990. [PMID: 38598551 PMCID: PMC11034641 DOI: 10.1371/journal.pcbi.1011990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 04/22/2024] [Accepted: 03/11/2024] [Indexed: 04/12/2024] [Open access]
Abstract
Prostate cancer is a heritable disease with ancestry-biased incidence and mortality. Polygenic risk scores (PRSs) offer promising advancements in predicting disease risk, including prostate cancer risk. While their accuracy continues to improve, research aimed at enhancing their effectiveness within African and Asian populations remains key for equitable use. Recent algorithmic developments for PRS derivation have improved pan-ancestral risk prediction for several diseases. In this study, we benchmark the predictive power of six widely used PRS derivation algorithms, four of which adjust for ancestry, against prostate cancer cases and controls from the UK Biobank and All of Us cohorts. We find modest improvement in discriminatory ability compared with a simple variant-prioritization method (clumping) and with published polygenic risk scores. Our findings underscore the importance of improving risk prediction algorithms and of sampling diverse cohorts.
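The simple baseline mentioned above appears to be variant prioritization by clumping. A minimal sketch of greedy clumping followed by p-value thresholding (often called C+T), assuming precomputed pairwise LD r² values. Real pipelines also restrict clumping to a physical window and take LD from a reference panel; the variant IDs, thresholds, and values here are hypothetical, not those used in the benchmark.

```python
def clump_and_threshold(pvalues, ld_r2, r2_max=0.1, p_max=5e-8):
    """Greedy LD clumping, then p-value thresholding (a basic C+T sketch).

    pvalues: dict mapping variant ID -> GWAS p-value
    ld_r2:   dict mapping frozenset({id_a, id_b}) -> r^2; missing pairs count as r^2 = 0
    """
    remaining = sorted(pvalues, key=pvalues.get)  # most significant first
    index_variants = []
    while remaining:
        lead = remaining.pop(0)
        if pvalues[lead] > p_max:
            break  # every later variant is even less significant
        index_variants.append(lead)
        # drop variants in LD with the lead variant
        remaining = [v for v in remaining
                     if ld_r2.get(frozenset((lead, v)), 0.0) <= r2_max]
    return index_variants


pvals = {"rs1": 1e-10, "rs2": 1e-9, "rs3": 1e-12, "rs4": 0.01, "rs5": 3e-8}
ld = {frozenset(("rs1", "rs3")): 0.8, frozenset(("rs2", "rs5")): 0.5}
print(clump_and_threshold(pvals, ld))
```

In this toy input rs3 absorbs rs1 and rs2 absorbs rs5, while rs4 never passes the p-value threshold. The surviving index variants would then be weighted by their GWAS effect sizes to form the score.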
Affiliation(s)
- Yajas Shah
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York City, New York, United States of America
- Scott Kulm
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York City, New York, United States of America
- Jones T. Nauseef
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Medicine—Hematology and Medical Oncology, Weill Cornell Medicine, New York City, New York, United States of America
- Zhengming Chen
- Department of Population Health Sciences, Weill Cornell Medicine, New York City, New York, United States of America
- Olivier Elemento
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York City, New York, United States of America
- Kevin H. Kensler
- Department of Population Health Sciences, Weill Cornell Medicine, New York City, New York, United States of America
- Ravi N. Sharaf
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Population Health Sciences, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Medicine–Gastroenterology and Hepatology, Weill Cornell Medicine, New York City, New York, United States of America
15
Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, Kenny EE, Pasaniuc B, Witte JS, Ge T. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024; 25:8-25. [PMID: 37620596 PMCID: PMC10961971 DOI: 10.1038/s41576-023-00637-2] [Citation(s) in RCA: 103] [Impact Index Per Article: 103.0] [Accepted: 07/11/2023] [Indexed: 08/26/2023]
Abstract
Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
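In its simplest additive form, a PRS for individual i is a weighted sum over variants, PRS_i = Σ_j β_j x_ij, where x_ij is the effect-allele dosage (0, 1 or 2) at variant j and β_j is a GWAS-derived weight. A minimal sketch under that additive-model assumption; the variant IDs and weights are hypothetical, and none of the construction methods reviewed above is implemented here. Since both the weights and the standardization reference come from a training sample, reusing them unchanged in another population is one point where the transferability issues described above enter.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Additive PRS: sum of effect weights times effect-allele dosages (0, 1 or 2).

    Variants missing from either input are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


def standardize(scores):
    """Convert raw scores to z-scores relative to the sample they come from."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


weights = {"rs1": 0.2, "rs2": -0.1, "rs3": 0.05}  # hypothetical per-allele effects
people = [
    {"rs1": 2, "rs2": 1, "rs3": 0},
    {"rs1": 0, "rs2": 2, "rs3": 2},
    {"rs1": 1, "rs2": 0, "rs4": 1},  # rs4 has no weight and is ignored
]
raw = [polygenic_score(p, weights) for p in people]
print([round(s, 3) for s in raw])
print([round(z, 3) for z in standardize(raw)])
```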
Affiliation(s)
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Jibril Hirbo
- Department of Medicine Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Iman Martin
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- John S Witte
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA.
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University, Stanford, CA, USA.
- Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
16
Zhai S, Mehrotra DV, Shen J. Applying polygenic risk score methods to pharmacogenomics GWAS: challenges and opportunities. Brief Bioinform 2023; 25:bbad470. [PMID: 38152980 PMCID: PMC10782924 DOI: 10.1093/bib/bbad470] [Received: 07/14/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 12/29/2023]
Abstract
Polygenic risk scores (PRSs) have emerged as promising tools for predicting human diseases and complex traits in disease genome-wide association studies (GWAS). Applying PRSs to pharmacogenomics (PGx) studies has begun to show great potential for improving patient stratification and drug response prediction. However, applying PRSs to PGx GWAS raises unique challenges beyond those typically encountered in disease GWAS (e.g. Eurocentric or trans-ethnic bias). These challenges include: (i) uncertainty about whether PGx or disease GWAS/variants should be used in the base cohort (BC); (ii) the small sample sizes of PGx GWAS and their correspondingly low power; and (iii) the more complex PRS statistical modeling required to handle prognostic and predictive effects simultaneously. To gain insight into general trends, challenges and possible solutions in this landscape, we first conduct a systematic review of both PRS applications and PRS method development in PGx GWAS. To further address the challenges, we propose (i) a novel PRS application strategy that leverages both PGx and disease GWAS summary statistics in the BC for PRS construction and (ii) a new Bayesian method (PRS-PGx-Bayesx) to reduce Eurocentric or cross-population PRS prediction bias. Extensive simulations demonstrate their advantages over existing PRS methods applied in PGx GWAS. Our systematic review and methodology research not only highlight current gaps and key considerations in applying PRS methods to PGx GWAS, but also provide possible solutions for better PGx PRS applications and future research.
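The distinction between prognostic and predictive effects in challenge (iii) can be made concrete with a generic outcome model. The block below is an illustration only, assuming a single score $\mathrm{PRS}_i$, a binary treatment indicator $T_i$ and a continuous outcome $Y_i$; it is not the PRS-PGx-Bayesx model. Here $\beta_G$ is the prognostic effect (association with the outcome regardless of treatment) and $\beta_{G \times T}$ is the predictive effect (how the treatment effect changes with the score).

```latex
Y_i = \beta_0 + \beta_{G}\,\mathrm{PRS}_i + \beta_{T}\,T_i
      + \beta_{G \times T}\,(\mathrm{PRS}_i \times T_i) + \varepsilon_i,
\qquad
\mathbb{E}[Y_i \mid T_i = 1, \mathrm{PRS}_i] - \mathbb{E}[Y_i \mid T_i = 0, \mathrm{PRS}_i]
  = \beta_{T} + \beta_{G \times T}\,\mathrm{PRS}_i
```

Only a nonzero $\beta_{G \times T}$ makes the expected treatment benefit vary with the score, which is why a PGx PRS has to estimate both kinds of effect together rather than the main effect alone.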
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
17
Zhang H, Zhan J, Jin J, Zhang J, Lu W, Zhao R, Ahearn TU, Yu Z, O'Connell J, Jiang Y, Chen T, Okuhara D, Garcia-Closas M, Lin X, Koelsch BL, Chatterjee N. A new method for multiancestry polygenic prediction improves performance across diverse populations. Nat Genet 2023; 55:1757-1768. [PMID: 37749244 PMCID: PMC10923245 DOI: 10.1038/s41588-023-01501-z] [Received: 03/31/2022] [Accepted: 08/16/2023] [Indexed: 09/27/2023]
Abstract
Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank, involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across 13 complex traits. Results demonstrated that CT-SLEB significantly improves PRS performance in non-European populations compared with simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offered insights into sample size requirements and SNP density effects on multiancestry risk prediction.
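CT-SLEB builds on clumping and thresholding (C+T). The Python sketch below shows only a generic, PLINK-style C+T step, not CT-SLEB's empirical-Bayes or superlearning stages. SNPs are visited from most to least significant. A SNP becomes an index SNP unless it lies within a window of an already-kept index SNP and is in linkage disequilibrium with it, and only SNPs below the p-value threshold are retained. The SNP positions, p-values, r² values, window size and thresholds are hypothetical assumptions.

```python
# Generic clumping-and-thresholding (C+T) sketch; CT-SLEB's empirical-Bayes
# and super-learning steps are NOT implemented here.
# SNP positions, p-values and pairwise LD (r^2) values are hypothetical inputs.

from typing import Dict, FrozenSet, List, Tuple


def clump_and_threshold(
    snps: List[Tuple[str, int, float]],   # (snp_id, base-pair position, p-value)
    ld_r2: Dict[FrozenSet[str], float],    # pairwise r^2; a missing pair counts as 0
    p_threshold: float = 5e-8,
    r2_threshold: float = 0.1,
    window_bp: int = 500_000,
) -> List[str]:
    """Return the index SNPs kept after LD clumping and p-value thresholding."""
    kept: List[Tuple[str, int, float]] = []
    # Visit SNPs from most to least significant.
    for snp_id, pos, p in sorted(snps, key=lambda s: s[2]):
        if p >= p_threshold:
            break  # list is sorted, so every later SNP also fails the threshold
        # Drop the SNP if it is near, and in LD with, a more significant kept SNP.
        in_ld = any(
            abs(pos - k_pos) <= window_bp
            and ld_r2.get(frozenset((snp_id, k_id)), 0.0) > r2_threshold
            for k_id, k_pos, _ in kept
        )
        if not in_ld:
            kept.append((snp_id, pos, p))
    return [snp_id for snp_id, _, _ in kept]


snps = [
    ("rsA", 1_000_000, 1e-10),
    ("rsB", 1_050_000, 1e-9),   # 50 kb from rsA and in strong LD with it
    ("rsC", 3_000_000, 2e-8),
    ("rsD", 5_000_000, 0.01),   # fails the p-value threshold
]
ld = {frozenset(("rsA", "rsB")): 0.8}
selected = clump_and_threshold(snps, ld)
print(selected)
```

In practice, C+T is usually repeated over a grid of p-value and r² thresholds. CT-SLEB, as described in the abstract, goes further by combining such candidates across ancestries with empirical Bayes and superlearning.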
Affiliation(s)
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Jin Jin
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Wenxuan Lu
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
- Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Thomas U Ahearn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Zhi Yu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tony Chen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA.
18
Jin J, Zhan J, Zhang J, Zhao R, O'Connell J, Jiang Y, 23andMe Research Team, Buyske S, Gignoux C, Haiman C, Kenny EE, Kooperberg C, North K, Koelsch BL, Wojcik G, Zhang H, Chatterjee N. MUSSEL: Enhanced Bayesian Polygenic Risk Prediction Leveraging Information across Multiple Ancestry Groups. bioRxiv 2023:2023.04.12.536510. [PMID: 37090648 PMCID: PMC10120638 DOI: 10.1101/2023.04.12.536510] [Indexed: 04/25/2023]
Abstract
Polygenic risk scores (PRS) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across different populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in the summary statistics from genome-wide association studies (GWAS) across multiple ancestry groups. MUSSEL conducts Bayesian hierarchical modeling under a MUltivariate Spike-and-Slab model for effect-size distribution and incorporates an Ensemble Learning step using super learner to combine information across different tuning parameter settings and ancestry groups. In our simulation studies and data analyses of 16 traits across four distinct studies, totaling 5.7 million participants with substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. The method, for example, has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, underlying trait architecture, and the choice of reference samples for LD estimation, and thus ultimately, a combination of methods may be needed to generate the most robust PRS across diverse populations.
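The ensemble step described above combines several candidate scores, for example from different tuning-parameter settings or ancestry-specific fits, into one predictor. The Python sketch below uses a deliberately simplified stand-in: ordinary least-squares stacking of candidate PRSs on a tuning set, solved with the normal equations. MUSSEL's actual super learner draws on a library of learners, so the function names, the OLS choice and the toy data here are all assumptions for illustration.

```python
# Simplified stand-in for an ensemble ("super learner") step: fit linear weights
# that combine K candidate PRSs on a tuning set by ordinary least squares.
# This is NOT MUSSEL's implementation; names and toy data are hypothetical.

from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def stack_prs(candidates: List[List[float]], y: List[float]) -> List[float]:
    """Return [intercept, w_1, ..., w_K] minimizing squared error of y on the candidate PRSs."""
    rows = [[1.0] + [prs[i] for prs in candidates] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Toy tuning set: the trait is an exact linear mix of two candidate scores.
prs_1 = [0.0, 1.0, 2.0, 3.0, 4.0]
prs_2 = [1.0, 0.0, 1.0, 0.0, 2.0]
y = [1.0 + 2.0 * a + 0.5 * b for a, b in zip(prs_1, prs_2)]
weights = stack_prs([prs_1, prs_2], y)
print(weights)
```

A real super learner would typically use cross-validation and often constrains the weights (for example to be non-negative), but the core idea, learning how much to trust each candidate score on held-out data, is the same.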
Affiliation(s)
- Jin Jin
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Steven Buyske
- Department of Statistics, Rutgers University, New Brunswick, NJ, USA
- Christopher Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Christopher Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Eimear E. Kenny
- Icahn Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Kari North
- Department of Epidemiology, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Genevieve Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Haoyu Zhang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
19
Alatrany AS, Khan W, Hussain AJ, Mustafina J, Al-Jumeily D. Transfer Learning for Classification of Alzheimer's Disease Based on Genome Wide Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2700-2711. [PMID: 37018274 DOI: 10.1109/tcbb.2022.3233869] [Indexed: 06/19/2023]
Abstract
Alzheimer's disease (AD) is a brain disorder regarded as degenerative because its symptoms worsen over time. Single nucleotide polymorphisms (SNPs) have been identified as relevant biomarkers for this condition. This study aims to identify SNP biomarkers associated with AD in order to perform a reliable classification of AD. In contrast to existing related works, we use deep transfer learning with varied experimental analyses for reliable classification of AD. For this purpose, convolutional neural networks (CNNs) are first trained on the genome-wide association studies (GWAS) dataset requested from the Alzheimer's Disease Neuroimaging Initiative. We then use deep transfer learning to further train our CNN (as the base model) on a different AD GWAS dataset to extract the final set of features. The extracted features are then fed into a support vector machine for classification of AD. Detailed experiments are performed using multiple datasets and varying experimental configurations. The statistical outcomes indicate an accuracy of 89%, which is a significant improvement over existing related works.
20
Sofer T, Kurniansyah N, Granot-Hershkovitz E, Goodman MO, Tarraf W, Broce I, Lipton RB, Daviglus M, Lamar M, Wassertheil-Smoller S, Cai J, DeCarli CS, Gonzalez HM, Fornage M. A polygenic risk score for Alzheimer's disease constructed using APOE-region variants has stronger association than APOE alleles with mild cognitive impairment in Hispanic/Latino adults in the U.S. Alzheimers Res Ther 2023; 15:146. [PMID: 37649099 PMCID: PMC10469805 DOI: 10.1186/s13195-023-01298-3] [Received: 05/10/2023] [Accepted: 08/24/2023] [Indexed: 09/01/2023]
Abstract
INTRODUCTION Polygenic Risk Scores (PRSs) are summaries of genetic risk alleles for an outcome. METHODS We used summary statistics from five GWASs of Alzheimer's disease (AD) to construct PRSs in 4,189 diverse Hispanics/Latinos (mean age 63 years) from the Study of Latinos-Investigation of Neurocognitive Aging (SOL-INCA). We assessed the PRS associations with mild cognitive impairment (MCI) in the combined set of people and in diverse subgroups, both including and excluding the APOE gene region. We also assessed PRS associations with MCI in an independent dataset from the Mass General Brigham Biobank. RESULTS A simple sum of 5 PRSs ("PRSsum"), each constructed based on a different AD GWAS, was associated with MCI (OR = 1.28, 95% CI [1.14, 1.41]) in a model adjusted for counts of the APOE-ε4 and APOE-ε2 alleles. Associations of single-GWAS PRSs were weaker. When SNPs from the APOE region were removed from the PRSs, the association of PRSsum with MCI was weaker (OR = 1.17, 95% CI [1.04, 1.31] with adjustment for APOE alleles). In all association analyses, the APOE-ε4 and APOE-ε2 alleles were not associated with MCI. DISCUSSION A sum of AD PRSs is associated with MCI in Hispanic/Latino older adults. Despite no association of the APOE-ε4 and APOE-ε2 alleles with MCI, the association of the AD PRS with MCI is stronger when the APOE region is included. Thus, APOE variants other than the classic APOE alleles may be important predictors of MCI in Hispanic/Latino adults.
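A "PRSsum" combines several scores, each built from a different GWAS. The abstract says only that it is a simple sum, so the Python sketch below rests on an assumption not stated there: each component PRS is z-scored across the sample before summing, so that scores on different numeric scales contribute comparably, and the sum is z-scored again so that an odds ratio can be read per standard deviation. The function names and toy values are hypothetical.

```python
# Sketch of a PRSsum-style combination of several PRSs, each from a different GWAS.
# Assumption (not stated in the abstract): components are z-scored before summing,
# and the sum is z-scored again. Names and toy values are hypothetical.

from statistics import mean, pstdev
from typing import List


def zscore(values: List[float]) -> List[float]:
    """Standardize to mean 0 and (population) SD 1; input must not be constant."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def prs_sum(component_scores: List[List[float]]) -> List[float]:
    """component_scores[k][i] is PRS k for individual i; returns the standardized sum per individual."""
    standardized = [zscore(scores) for scores in component_scores]
    summed = [sum(per_person) for per_person in zip(*standardized)]
    return zscore(summed)


# Two hypothetical component PRSs on very different numeric scales.
prs_a = [0.1, 0.2, 0.3, 0.4]
prs_b = [10.0, 20.0, 30.0, 40.0]
combined = prs_sum([prs_a, prs_b])
print(combined)
```

Without the per-component standardization, a PRS whose weights happen to be on a larger numeric scale would dominate the sum regardless of its predictive value.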
Affiliation(s)
- Tamar Sofer
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
- CardioVascular Institute, Beth Israel Deaconess Medical Center, Boston, MA, USA.
- Nuzulul Kurniansyah
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Einat Granot-Hershkovitz
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Matthew O Goodman
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Wassim Tarraf
- Institute of Gerontology, Wayne State University, Detroit, MI, USA
- Iris Broce
- Department of Neurosciences, University of California San Diego, San Diego, CA, USA
- Martha Daviglus
- Department of Medicine, Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
- Melissa Lamar
- Department of Medicine, Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
- Rush Alzheimer's Disease Research Center, Rush University Medical Center, Chicago, IL, USA
- Sylvia Wassertheil-Smoller
- Department of Epidemiology & Population Health, Department of Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Charles S DeCarli
- Department of Neurology, University of California at Davis, Sacramento, CA, USA
- Hector M Gonzalez
- Department of Neurosciences, University of California San Diego, San Diego, CA, USA
- Shiley-Marcos Alzheimer's Disease Center, University of California San Diego, La Jolla, CA, USA
- Myriam Fornage
- Institute of Molecular Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA
21
Gao Y, Sharma T, Cui Y. Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective. Annu Rev Biomed Data Sci 2023; 6:153-171. [PMID: 37104653 PMCID: PMC10529864 DOI: 10.1146/annurev-biodatasci-020722-020704] [Indexed: 04/29/2023]
Abstract
Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Teena Sharma
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Yan Cui
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
22
Lu H, Zhang S, Jiang Z, Zeng P. Leveraging trans-ethnic genetic risk scores to improve association power for complex traits in underrepresented populations. Brief Bioinform 2023:bbad232. [PMID: 37332016 DOI: 10.1093/bib/bbad232] [Received: 12/29/2022] [Revised: 05/06/2023] [Accepted: 06/04/2023] [Indexed: 06/20/2023]
Abstract
Trans-ethnic genome-wide association studies have revealed that many loci identified in European populations are reproducible in non-European populations, indicating widespread trans-ethnic genetic similarity. However, how to leverage such shared information more efficiently in association analysis has been less investigated for traits in underrepresented populations. We here propose a statistical framework, trans-ethnic genetic risk score informed gene-based association mixed model (GAMM), by hierarchically modeling single-nucleotide polymorphism effects in the target population as a function of effects of the same trait in well-studied populations. GAMM powerfully integrates genetic similarity across distinct ancestral groups to enhance power in understudied populations, as confirmed by extensive simulations. We illustrate the usefulness of GAMM via the application to 13 blood cell traits (i.e. basophil count, eosinophil count, hematocrit, hemoglobin concentration, lymphocyte count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, monocyte count, neutrophil count, platelet count, red blood cell count and total white blood cell count) in Africans of the UK Biobank (n = 3204) while utilizing genetic overlap shared in Europeans (n = 746 667) and East Asians (n = 162 255). We discovered multiple new associated genes, which had otherwise been missed by existing methods, and revealed that the trans-ethnic information indirectly contributed much to the phenotypic variance. Overall, GAMM represents a flexible and powerful statistical framework of association analysis for complex traits in underrepresented populations by integrating trans-ethnic genetic similarity across well-studied populations, and helps attenuate health inequities in current genetics research for people of minority populations.
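One generic way to write the hierarchical idea described above, for the SNPs $j$ in a single gene of the target (here African-ancestry) sample, is sketched below. The symbols are illustrative assumptions rather than GAMM's exact specification. $X$ is the target genotype matrix for the gene, $\hat{\beta}_j^{\mathrm{EUR}}$ and $\hat{\beta}_j^{\mathrm{EAS}}$ are effect estimates from the well-studied populations, the $\alpha$ terms scale how much each source population informs the target, and $u_j$ is a target-specific deviation.

```latex
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
\beta_j = \alpha_{\mathrm{EUR}}\,\hat{\beta}_j^{\mathrm{EUR}} + \alpha_{\mathrm{EAS}}\,\hat{\beta}_j^{\mathrm{EAS}} + u_j,
\quad u_j \sim \mathcal{N}(0, \sigma_u^2), \\
\mathbf{y} &= \alpha_{\mathrm{EUR}}\,X\hat{\boldsymbol{\beta}}^{\mathrm{EUR}}
  + \alpha_{\mathrm{EAS}}\,X\hat{\boldsymbol{\beta}}^{\mathrm{EAS}}
  + X\mathbf{u} + \boldsymbol{\varepsilon}.
\end{aligned}
```

Substituting the second equation into the first shows where the "genetic risk score informed mixed model" structure can come from. $X\hat{\boldsymbol{\beta}}^{\mathrm{EUR}}$ and $X\hat{\boldsymbol{\beta}}^{\mathrm{EAS}}$ are trans-ethnic genetic risk scores entering as fixed covariates, while $X\mathbf{u}$ is a random effect, so a gene-level test can ask whether the $\alpha$ terms and $\sigma_u^2$ are jointly zero.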
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China