1
Giusti-Rodríguez P, Okewole N, Jain S, Montalvo-Ortiz JL, Peterson RE. Diversifying Psychiatric Genomics: Globally Inclusive Strategies Toward Health Equity. Psychiatr Clin North Am 2025; 48:241-256. [PMID: 40348415 DOI: 10.1016/j.psc.2025.01.003]
Abstract
The underrepresentation of non-European researchers, participants, and datasets in psychiatric genetics hinders the understanding of mental health conditions and perpetuates health inequities. Ancestral diversity in research is crucial for advancing insights into disease etiology and achieving equity in precision medicine. Key strategies include optimizing data use, fostering global collaboration for capacity building, and adopting best practices in research methods. Ensuring clinical impact, accountability, and multi-agency commitment is vital. A more inclusive approach will enhance understanding of genetic and environmental factors in mental health, leading to equitable and accessible health care outcomes for all populations.
Collapse
Affiliation(s)
- Paola Giusti-Rodríguez
- Department of Psychiatry, University of Florida College of Medicine, Gainesville, FL, USA. https://twitter.com/GiustiLab
- Sanjeev Jain
- Department of Psychiatry, National Institute of Mental Health and Neurosciences, Bengaluru, India
- Janitza L Montalvo-Ortiz
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA. https://twitter.com/JanitzaMontalvo
- Roseann E Peterson
- Department of Psychiatry and Behavioral Sciences, Institute for Genomics in Health, State University of New York Downstate Health Sciences University, Brooklyn, NY, USA.
2
Lerga-Jaso J, Novković B, Unnikrishnan D, Bamunusinghe V, Hatorangan MR, Manson C, Pedersen H, Osama A, Terpolovsky A, Bohn S, De Marino A, Mahmoud AA, Bircan KO, Khan U, Grabherr MG, Yazdi PG. Tracing human genetic histories and natural selection with precise local ancestry inference. Nat Commun 2025; 16:4576. [PMID: 40379651 PMCID: PMC12084304 DOI: 10.1038/s41467-025-59936-3] [Received: 12/14/2023] [Accepted: 05/06/2025]
Abstract
Local ancestry inference is crucial for unraveling demographic histories, discovering selection signals, and including admixed individuals in genomic studies for improved equity and portability. To date, the precision and resolution of local ancestry inference have been limited by technical and dataset issues. To address them, we present Orchestra, a model we train on over 10,000 single-origin individuals from 35 worldwide populations that demonstrates superior accuracy in benchmarking analyses. We employ Orchestra to shed light on the demographic history of Latin Americans, finding trace ancestries supported by historical records. We then deploy it to offer insight into the debated origins of Ashkenazi Jews, highlighting their Southern European heritage. Finally, Orchestra enables us to map selection signatures, identifying trace Scandinavian ancestry in British samples and unveiling an immune-rich region linked to respiratory infections passed down from the Viking conquests. Our work significantly advances the field of local ancestry inference, highlighting its use in admixed populations.
Affiliation(s)
- Alex Osama
- Research & Development, Omicsedge, Miami, FL, USA
- Sandra Bohn
- Research & Development, Omicsedge, Miami, FL, USA
- Umar Khan
- Research & Development, Omicsedge, Miami, FL, USA
- Puya G Yazdi
- Research & Development, Omicsedge, Miami, FL, USA.
3
Lehmann B, Bräuninger L, Cho Y, Falck F, Jayadeva S, Katell M, Nguyen T, Perini A, Tallman S, Mackintosh M, Silver M, Kuchenbäcker K, Leslie D, Chatterjee N, Holmes C. Methodological opportunities in genomic data analysis to advance health equity. Nat Rev Genet 2025:10.1038/s41576-025-00839-w. [PMID: 40369311 DOI: 10.1038/s41576-025-00839-w] [Accepted: 03/27/2025]
Abstract
The causes and consequences of inequities in genomic research and medicine are complex and widespread. However, it is widely acknowledged that underrepresentation of diverse populations in human genetics research risks exacerbating existing health disparities. Efforts to improve diversity are ongoing, but an often-overlooked source of inequity is the choice of analytical methods used to process, analyse and interpret genomic data. This choice can influence all areas of genomic research, from genome-wide association studies and polygenic score development to variant prioritization and functional genomics. New statistical and machine learning techniques to understand, quantify and correct for the impact of biases in genomic data are emerging within the wider genomic research and genomic medicine ecosystems. At this crucial time point, it is important to clarify where improvements in methods and practices can, or cannot, have a role in improving equity in genomics. Here, we review existing approaches to promote equity and fairness in statistical analysis for genomics, and propose future methodological developments that are likely to yield the most impact for equity.
Affiliation(s)
- Brieuc Lehmann
- Department of Statistical Science, University College London, London, UK.
- Leandra Bräuninger
- Department of Statistical Science, University College London, London, UK
- The Alan Turing Institute, London, UK
- Yoonsu Cho
- Genomics England, London, UK
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Fabian Falck
- The Alan Turing Institute, London, UK
- Department of Statistics, University of Oxford, Oxford, UK
- Matt Silver
- Genomics England, London, UK
- Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, Banjul, The Gambia
- Karoline Kuchenbäcker
- Genomics England, London, UK
- Division of Psychiatry, University College London, London, UK
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Chris Holmes
- Department of Statistics, University of Oxford, Oxford, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
4
Sun Q, Horimoto ARVR, Chen B, Ockerman F, Mohlke KL, Blue E, Raffield LM, Li Y. Opportunities and challenges of local ancestry in genetic association analyses. Am J Hum Genet 2025; 112:727-740. [PMID: 40185073 DOI: 10.1016/j.ajhg.2025.03.004] [Received: 11/24/2024] [Revised: 03/05/2025] [Accepted: 03/05/2025]
Abstract
Admixed populations make up an increasing percentage of the US and global populations, and admixture is not uniform over space, over time, or across genomes. It has therefore become indispensable to evaluate local ancestry in addition to global ancestry to improve genetic epidemiological studies. Recent advances in representing human genome diversity, coupled with large-scale whole-genome sequencing initiatives and improved tools for local ancestry inference, have enabled studies to demonstrate that incorporating local ancestry information enhances both genetic association analyses and polygenic risk prediction. Alongside the opportunities that local ancestry provides, challenges remain that prevent its full use in genetic analyses. In this review, we first summarize methods for local ancestry inference and illustrate how local ancestry can be used in various analyses, including admixture mapping, association testing, and polygenic risk score construction. We then discuss current challenges in research involving local ancestry, both in the inference itself and in its role in genetic association studies. Finally, we pinpoint future study directions and methodology development opportunities to help incorporate local ancestry more effectively in genetic analyses. These directions are worth pursuing, and these analytical challenges worth addressing, because the appropriate use of local ancestry estimates could help mitigate inequality in genomic medicine and improve our understanding of health and disease outcomes.
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
- Andrea R V R Horimoto
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Brian Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Frank Ockerman
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Karen L Mohlke
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Elizabeth Blue
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Brotman Baty Institute, Seattle, WA 98195, USA
- Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
5
Bruxel EM, Rovaris DL, Belangero SI, Chavarría-Soley G, Cuellar-Barboza AB, Martínez-Magaña JJ, Nagamatsu ST, Nievergelt CM, Núñez-Ríos DL, Ota VK, Peterson RE, Sloofman LG, Adams AM, Albino E, Alvarado AT, Andrade-Brito D, Arguello-Pascualli PY, Bandeira CE, Bau CHD, Bulik CM, Buxbaum JD, Cappi C, Corral-Frias NS, Corrales A, Corsi-Zuelli F, Crowley JJ, Cupertino RB, da Silva BS, De Almeida SS, De la Hoz JF, Forero DA, Fries GR, Gelernter J, González-Giraldo Y, Grevet EH, Grice DE, Hernández-Garayua A, Hettema JM, Ibáñez A, Ionita-Laza I, Lattig MC, Lima YC, Lin YS, López-León S, Loureiro CM, Martínez-Cerdeño V, Martínez-Levy GA, Melin K, Moreno-De-Luca D, Muniz Carvalho C, Olivares AM, Oliveira VF, Ormond R, Palmer AA, Panzenhagen AC, Passos-Bueno MR, Peng Q, Pérez-Palma E, Prieto ML, Roussos P, Sanchez-Roige S, Santamaría-García H, Shansis FM, Sharp RR, Storch EA, Tavares MEA, Tietz GE, Torres-Hernández BA, Tovo-Rodrigues L, Trelles P, Trujillo-ChiVacuan EM, Velásquez MM, Vera-Urbina F, Voloudakis G, Wegman-Ostrosky T, Zhen-Duan J, Zhou H, Santoro ML, Nicolini H, Atkinson EG, Giusti-Rodríguez P, Montalvo-Ortiz JL. Psychiatric genetics in the diverse landscape of Latin American populations. Nat Genet 2025:10.1038/s41588-025-02127-z. [PMID: 40175716 DOI: 10.1038/s41588-025-02127-z] [Received: 09/06/2024] [Accepted: 02/14/2025]
Abstract
Psychiatric disorders are highly heritable and polygenic, influenced by environmental factors and often comorbid. Large-scale genome-wide association studies (GWASs) through consortium efforts have identified genetic risk loci and revealed the underlying biology of psychiatric disorders and traits. However, over 85% of psychiatric GWAS participants are of European ancestry, limiting the applicability of these findings to non-European populations. Latin America and the Caribbean, regions marked by diverse genetic admixture, distinct environments and healthcare disparities, remain critically understudied in psychiatric genomics. This threatens access to precision psychiatry, where diversity is crucial for innovation and equity. This Review evaluates the current state and advancements in psychiatric genomics within Latin America and the Caribbean, discusses the prevalence and burden of psychiatric disorders, explores contributions to psychiatric GWASs from these regions and highlights methods that account for genetic diversity. We also identify existing gaps and challenges and propose recommendations to promote equity in psychiatric genomics.
Affiliation(s)
- Estela M Bruxel
- Department of Translational Medicine, School of Medical Sciences, University of Campinas, Campinas, Brazil
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Diego L Rovaris
- Department of Physiology and Biophysics, Instituto de Ciencias Biomedicas, Universidade de São Paulo, São Paulo, Brazil
- Sintia I Belangero
- Department of Morphology and Genetics, Universidade Federal de São Paulo, São Paulo, Brazil
- Laboratory of Integrative Neuroscience, Universidade Federal de São Paulo, São Paulo, Brazil
- Gabriela Chavarría-Soley
- Escuela de Biología y Centro de Investigación en Biología Celular y Molecular, Universidad de Costa Rica, San Pedro, Costa Rica
- Alfredo B Cuellar-Barboza
- Department of Psychiatry, School of Medicine, Universidad Autónoma de Nuevo León, San Nicolás de los Garza, México
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
- José J Martínez-Magaña
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Psychiatry Division, VA Connecticut Healthcare Center, West Haven, CT, USA
- Sheila T Nagamatsu
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Psychiatry Division, VA Connecticut Healthcare Center, West Haven, CT, USA
- Caroline M Nievergelt
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Diana L Núñez-Ríos
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Psychiatry Division, VA Connecticut Healthcare Center, West Haven, CT, USA
- Vanessa K Ota
- Department of Morphology and Genetics, Universidade Federal de São Paulo, São Paulo, Brazil
- Laboratory of Integrative Neuroscience, Universidade Federal de São Paulo, São Paulo, Brazil
- Roseann E Peterson
- Department of Psychiatry and Behavioral Sciences, Institute for Genomics in Health, State University of New York Downstate Health Sciences University, Brooklyn, NY, USA
- Laura G Sloofman
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Amy M Adams
- Department of Psychiatry and Behavioral Sciences, Texas A&M University, College Station, TX, USA
- Elinette Albino
- School of Health Professions, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico
- Angel T Alvarado
- Research Unit in Molecular Pharmacology and Genomic Medicine, VRI, San Ignacio de Loyola University, La Molina, Perú
- Paola Y Arguello-Pascualli
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Cibele E Bandeira
- Department of Physiology and Biophysics, Instituto de Ciencias Biomedicas, Universidade de São Paulo, São Paulo, Brazil
- Claiton H D Bau
- Department of Genetics, Institute of Biosciences, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Laboratory of Developmental Psychiatry, Center of Experimental Research, Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil
- Cynthia M Bulik
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Joseph D Buxbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Carolina Cappi
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alejo Corrales
- Departamento de Psiquiatría, Universidad Nacional de Tucumán, San Miguel de Tucumán, Argentina
- Fabiana Corsi-Zuelli
- Department of Neuroscience, Ribeirão Preto Medical School, Universidade de São Paulo, São Paulo, Brazil
- James J Crowley
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Renata B Cupertino
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Bruna S da Silva
- Department of Basic Health Sciences, Federal University of Health Sciences of Porto Alegre, Porto Alegre, Brazil
- Suzannah S De Almeida
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Juan F De la Hoz
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Diego A Forero
- School of Health and Sport Sciences, Fundación Universitaria del Área Andina, Bogotá, Colombia
- Gabriel R Fries
- Faillace Department of Psychiatry and Behavioral Sciences, the University of Texas Health Science Center at Houston, Houston, TX, USA
- Joel Gelernter
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Psychiatry Division, VA Connecticut Healthcare Center, West Haven, CT, USA
- Yeimy González-Giraldo
- Biomedical Sciences Research Group, School of Medicine, Universidad Antonio Nariño, Bogotá, Colombia
- Eugenio H Grevet
- Department of Psychiatry and Legal Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Dorothy E Grice
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Adriana Hernández-Garayua
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Psychiatry Division, VA Connecticut Healthcare Center, West Haven, CT, USA
- John M Hettema
- Department of Psychiatry and Behavioral Sciences, Texas A&M University, College Station, TX, USA
- Agustín Ibáñez
- Latin American Brain Health Institute, Universidad Adolfo Ibañez, Santiago de Chile, Chile
- Global Brain Health Institute, Trinity College Dublin, Dublin, Ireland
- Iuliana Ionita-Laza
- Department of Biostatistics, Columbia University, New York, NY, USA
- Department of Statistics, Lund University, Lund, Sweden
- Yago C Lima
- Department of Physiology and Biophysics, Instituto de Ciencias Biomedicas, Universidade de São Paulo, São Paulo, Brazil
- Yi-Sian Lin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Sandra López-León
- Quantitative Safety Epidemiology, Novartis Pharma, East Hanover, NJ, USA
- Rutgers Center for Pharmacoepidemiology and Treatment Science, Rutgers University, New Brunswick, NJ, USA
- Camila M Loureiro
- Department of Neuroscience, Ribeirão Preto Medical School, Universidade de São Paulo, São Paulo, Brazil
- Gabriela A Martínez-Levy
- Department of Genetics, Subdirectorate of Clinical Research, National Institute of Psychiatry, México City, México
- Department of Cell and Tissular Biology, Medicine Faculty, National Autonomous University of Mexico, México City, México
- Kyle Melin
- School of Pharmacy, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico
- Daniel Moreno-De-Luca
- Precision Medicine in Autism Group, Division of Child and Adolescent Psychiatry, Department of Psychiatry, Faculty of Medicine and Dentistry, University of Alberta, Alberta Health Services, CASA Mental Health, Edmonton, Alberta, Canada
- Ana Maria Olivares
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Victor F Oliveira
- Department of Physiology and Biophysics, Instituto de Ciencias Biomedicas, Universidade de São Paulo, São Paulo, Brazil
- Rafaella Ormond
- Disciplina de Biologia Molecular, Universidade Federal de São Paulo, São Paulo, Brazil
- Abraham A Palmer
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Alana C Panzenhagen
- Science for Life Laboratory, Department of Oncology-Pathology, Karolinska Institutet, Solna, Sweden
- Laboratório de Pesquisa Translacional em Comportamento Suicida, Universidade do Vale do Taquari, Lajeado, Brazil
- Maria Rita Passos-Bueno
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Qian Peng
- Department of Neuroscience, the Scripps Research Institute, La Jolla, CA, USA
- Eduardo Pérez-Palma
- Facultad de Medicina Clínica Alemana, Centro de Genética y Genómica, Universidad del Desarrollo, Santiago, Chile
- Miguel L Prieto
- Mental Health Service, Clínica Universidad de los Andes, Santiago, Chile
- Department of Psychiatry, Faculty of Medicine, Universidad de los Andes, Santiago, Chile
- Panos Roussos
- Center for Disease Neurogenomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sandra Sanchez-Roige
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Hernando Santamaría-García
- PhD Program of Neuroscience, Pontificia Universidad Javeriana, Hospital San Ignacio, Center for Memory and Cognition, Intellectus, Bogotá, Colombia
- Flávio M Shansis
- Graduate Program of Medical Sciences, Universidade do Vale do Taquari, Lajeado, Brazil
- Universidade Federal de Ciências da Saúde de Porto Alegre, Porto Alegre, Brazil
- Rachel R Sharp
- Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Eric A Storch
- Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, TX, USA
- Maria Eduarda A Tavares
- Department of Genetics, Institute of Biosciences, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Grace E Tietz
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pilar Trelles
- Department of Psychiatry and Behavioral Sciences, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Eva M Trujillo-ChiVacuan
- Research Department, Comenzar de Nuevo Eating Disorders Treatment Center, Monterrey, México
- Escuela de Medicina y Ciencias de la Salud Tecnológico de Monterrey, Monterrey, México
- Maria M Velásquez
- Instituto de Genética Humana, Facultad de Medicina, Pontificia Universidad Javeriana, Bogotá, Colombia
- Fernando Vera-Urbina
- School of Pharmacy, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico
- Georgios Voloudakis
- Center for Disease Neurogenomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jenny Zhen-Duan
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Hang Zhou
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Psychiatry Division, VA Connecticut Healthcare Center, West Haven, CT, USA
- Marcos L Santoro
- Disciplina de Biologia Molecular, Universidade Federal de São Paulo, São Paulo, Brazil
- Humberto Nicolini
- Laboratorio de Enfermedades Psiquiátricas, Neurodegenerativas y Adicciones, Instituto Nacional de Medicina Genómica, Mexico City, México
- Elizabeth G Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Jan and Dan Duncan Neurological Research Center, Texas Children's Hospital, Houston, TX, USA.
- Paola Giusti-Rodríguez
- Department of Psychiatry, University of Florida College of Medicine, Gainesville, FL, USA.
- Janitza L Montalvo-Ortiz
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA.
- Psychiatry Division, VA Connecticut Healthcare Center, West Haven, CT, USA.
- Department of Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, USA.
6
Sun Q, Du J, Tang Y, Best LG, Haack K, Zhang Y, Cole SA, Franceschini N. Polygenic Scores of Cardiometabolic Risk Factors in American Indian Adults. JAMA Netw Open 2025; 8:e250535. [PMID: 40072435 PMCID: PMC11904716 DOI: 10.1001/jamanetworkopen.2025.0535] [Received: 10/22/2024] [Accepted: 01/06/2025]
Abstract
Importance: Numerous efforts have been made to include diverse populations in genetic studies, but American Indian populations are still severely underrepresented. Polygenic scores derived from genetic data have been proposed for use in clinical care, but how polygenic scores perform in American Indian individuals, and whether they can predict disease risk in this population, remains unknown.
Objective: To study the performance of polygenic scores for cardiometabolic risk factors (lipid traits and C-reactive protein) in American Indian adults and to determine whether such scores are helpful in clinical prediction of cardiometabolic diseases.
Design, Setting, and Participants: The Strong Heart Study (SHS) is a large American Indian cohort recruited from 1989 to 1991, with ongoing follow-up (phase VII). In this genetic association study, data from SHS American Indian participants were used in addition to data from 2 large-scale, external, ancestry-mismatched genome-wide association studies (GWASs; 450 865 individuals from a European GWAS and 33 096 individuals from a multi-ancestry GWAS) and 1 small-scale, internal, ancestry-matched American Indian GWAS (2000 individuals). Analyses were conducted from February 2023 to August 2024.
Exposure: Genetic risk scores for cardiometabolic disease risk factors from 6 traits: 5 lipids (apolipoprotein A, apolipoprotein B, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and triglycerides) and an inflammatory biomarker (C-reactive protein [CRP]).
Main Outcomes and Measures: Data from SHS participants and the 2 GWASs were used to construct 8 polygenic scores. The association of polygenic scores with cardiometabolic disease was assessed using 2-sided z tests and 1-sided likelihood ratio tests.
Results: In the 3157 SHS participants (mean [SD] age, 56.44 [8.12] years; 1845 female [58.4%]), a large European-based polygenic score had the most robust performance (mean [SD] R² = 5.0% [1.7%]), but adding a small-scale ancestry-matched GWAS using American Indian data helped improve polygenic score prediction for 5 of 6 traits (all but CRP; mean [SD] R², 7.6% [3.2%]). Lipid polygenic scores developed in American Indian individuals improved prediction of diabetes compared with baseline clinical risk factors (absolute improvement in area under the curve, 0.86%; 95% CI, 0.78%-0.93%; likelihood ratio test P = 3.8 × 10⁻³).
Conclusions and Relevance: In this genetic association study of lipids and CRP among American Indian individuals, polygenic scores of lipid traits improved prediction of diabetes when added to clinical risk factors, although the magnitude of improvement was small. The transferability of polygenic scores derived from other populations remains a concern, with implications for the advancement of precision medicine and the potential to perpetuate health disparities, particularly in this underrepresented population.
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Now with: Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- Jiawen Du
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Yihan Tang
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Lyle G. Best
- Missouri Breaks Industries Research Inc, Eagle Butte, South Dakota
- Karin Haack
- Texas Biomedical Research Institute, San Antonio
- Ying Zhang
- Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City
- Nora Franceschini
- Department of Epidemiology, University of North Carolina at Chapel Hill
7
Chen X, Wang H, Broce I, Dale A, Yu B, Zhou LY, Li X, Argos M, Daviglus ML, Cai J, Franceschini N, Sofer T. Old vs. New Local Ancestry Inference in HCHS/SOL: A Comparative Study. bioRxiv 2025:2025.02.04.636481. [PMID: 39975339 PMCID: PMC11838596 DOI: 10.1101/2025.02.04.636481]
Abstract
Hispanic/Latino populations are admixed, with genetic contributions from multiple ancestral populations. Studies of genetic association in these admixed populations often use methods such as admixture mapping, which relies on inferred counts of "local" ancestry, i.e., of the source ancestral population at a locus. Local ancestries are inferred using external reference panels that represent ancestral populations, making the choice of inference method and reference panel critical. This study used a dataset of Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) to evaluate the "old" local ancestry inference performed using the state-of-the-art inference method, RFMix, alongside "new" inferences performed using Fast Local Ancestry Estimation (FLARE), which also used an updated reference panel. We compared their performance in terms of global and local ancestry correlations, as well as admixture mapping-based associations. Overall, the old RFMix and new FLARE inferences were highly similar for both global and local ancestries, and FLARE-inferred datasets yielded admixture mapping results consistent with those computed from RFMix. However, in some genomic regions the old and new local ancestries had relatively lower correlations (Pearson R < 0.9). Most of these regions (86.42%) mapped to either ENCODE blacklist regions or gene clusters, compared with 7.67% of randomly matched regions with high correlations (Pearson R > 0.97) between old and new local ancestries.
Affiliation(s)
- Xueying Chen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA
- Hao Wang
- Department of Radiology, University of California San Diego, La Jolla, CA, USA
| | - Iris Broce
- Department of Neurosciences, University of California, San Diego, San Diego, California, USA
- Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, UCSF, San Francisco, California, USA
| | - Anders Dale
- Department of Radiology, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California, San Diego, San Diego, California, USA
| | - Bing Yu
- Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Laura Y Zhou
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Xihao Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Maria Argos
- Department of Environmental Health, School of Public Health, Boston University, Boston, MA, USA
- Department of Epidemiology and Biostatistics, School of Public Health, University of Illinois Chicago, Chicago, IL, USA
| | - Martha L Daviglus
- Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
| | - Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Nora Franceschini
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Tamar Sofer
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
| |
Collapse
|
8
|
Ruan Y, Bhukar R, Patel A, Koyama S, Hull L, Truong B, Hornsby W, Zhang H, Chatterjee N, Natarajan P. Leveraging genetic ancestry continuum information to interpolate PRS for admixed populations. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2025:2024.11.09.24316996. [PMID: 39867390 PMCID: PMC11759244 DOI: 10.1101/2024.11.09.24316996] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/28/2025]
Abstract
The relatively low representation of admixed populations in both discovery and fine-tuning individual-level datasets limits polygenic risk score (PRS) development and equitable clinical translation for admixed populations. Under the assumption that the most informative PRS weight for a homogeneous sample varies linearly in an ancestry continuum space, we introduce a Genetic Distance-assisted PRS Combination Pipeline for Diverse Genetic Ancestries (DiscoDivas) to interpolate a harmonized PRS for diverse, especially admixed, ancestries, leveraging multiple PRS weights fine-tuned within single-ancestry samples and genetic distance. DiscoDivas treats ancestry as a continuous variable and does not require shifting between different models when calculating PRS for different ancestries. We generated PRS with DiscoDivas and the current conventional method, i.e., fine-tuning multiple GWAS PRS using the matched or similar ancestry samples. DiscoDivas generated a harmonized PRS with accuracy comparable to or higher than the conventional approach, with the greatest advantage exhibited in admixed individuals.
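As a rough illustration of interpolating along an ancestry continuum, the sketch below blends one individual's ancestry-specific PRS values using inverse-distance weights, so that with two reference ancestries the result is a linear interpolation between them. The inverse-distance rule, the function name, the "EUR"/"AFR" labels and the toy distances are assumptions chosen for illustration; the abstract does not specify DiscoDivas's exact combination formula.

```python
def interpolate_prs(prs_by_ancestry, distance_by_ancestry, eps=1e-12):
    """Blend ancestry-specific PRS values for one individual (hypothetical sketch).

    prs_by_ancestry: dict ancestry -> PRS computed with weights fine-tuned in that ancestry.
    distance_by_ancestry: dict ancestry -> genetic distance from the individual to that
    ancestry's reference centroid (e.g. in PC space). Both layouts are assumptions.
    """
    # An individual sitting on a reference centroid simply gets that ancestry's PRS.
    for anc, d in distance_by_ancestry.items():
        if d <= eps:
            return prs_by_ancestry[anc]
    inv = {anc: 1.0 / d for anc, d in distance_by_ancestry.items()}
    total = sum(inv.values())
    # Inverse-distance weights sum to 1; with two references this is linear interpolation.
    return sum(prs_by_ancestry[anc] * w / total for anc, w in inv.items())


# An admixed individual 1/4 of the way from the EUR to the AFR centroid
# receives 3/4 of the EUR-tuned score and 1/4 of the AFR-tuned score.
score = interpolate_prs({"EUR": 1.0, "AFR": 2.0}, {"EUR": 0.25, "AFR": 0.75})
print(score)
```

The appeal of this kind of rule, in line with the abstract's framing, is that a single formula covers every position on the continuum, so no discrete ancestry label is needed to pick a model.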
Affiliation(s)
- Yunfeng Ruan
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Rohan Bhukar
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Aniruddh Patel
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Satoshi Koyama
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Leland Hull
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Buu Truong
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Cambridge, MA, USA
- Whitney Hornsby
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Pradeep Natarajan
- Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
9
Wang C, Markus H, Diwadkar AR, Khunsriraksakul C, Carrel L, Li B, Zhong X, Wang X, Zhan X, Foulke GT, Olsen NJ, Liu DJ, Jiang B. Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages. Nat Commun 2025; 16:180. [PMID: 39747168 PMCID: PMC11695684 DOI: 10.1038/s41467-024-55636-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 12/18/2024] [Indexed: 01/04/2025]
Abstract
Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR)-based biobanks contain genetic data and diagnostic information, which can identify preclinical individuals at risk for progression. Biobanks typically have small numbers of cases, which are not sufficient to construct accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may have a shared genetic basis, which we can exploit to improve prediction accuracy. We propose a novel method, Genetic Progression Score (GPS), that integrates biobank and case-control studies to predict the disease progression risk. Via penalized regression, GPS incorporates PRS weights for case-control studies as a prior and forces model parameters to be similar to the prior if the prior improves prediction accuracy. In simulations, GPS consistently yields better prediction accuracy than alternative strategies relying on biobank or case-control samples only and those combining biobank and case-control samples. The improvement is particularly evident when the biobank sample is smaller or the genetic correlation is lower. We derive PRS for the progression from preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us. For both diseases, GPS achieves the highest prediction R² and the resulting PRS yields the strongest correlation with progression prevalence.
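The idea of pulling biobank estimates toward case-control PRS weights can be illustrated with a ridge-type penalty centred on the prior weights instead of on zero. The sketch below solves the closed-form minimiser of ||y − Xb||²/n + λ||b − b₀||² in plain Python. This specific loss, the function names and the toy data are assumptions for illustration, not the GPS objective itself, which the abstract does not spell out (GPS also adapts how strongly the prior is trusted).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_toward_prior(X, y, prior, lam):
    """Minimise ||y - X b||^2 / n + lam * ||b - prior||^2 in closed form.

    Normal equations: (X'X / n + lam * I) b = X'y / n + lam * prior.
    X: list of rows (individuals x variants); prior: hypothetical case-control PRS weights.
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) / n for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(p)]
    for j in range(p):
        xtx[j][j] += lam
    rhs = [xty[j] + lam * prior[j] for j in range(p)]
    return solve_linear(xtx, rhs)


X = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 2]]  # toy dosages for 2 variants
y = [2 * a - b for a, b in X]                 # exact effects (2, -1) in this toy "biobank"
prior = [1.0, 0.0]                            # hypothetical case-control PRS weights
print(ridge_toward_prior(X, y, prior, lam=0.0))   # approximately the data-only fit (2, -1)
print(ridge_toward_prior(X, y, prior, lam=1e6))   # approximately the prior (1, 0)
```

Intermediate values of `lam` trade off the two sources; choosing it by validation is one plausible way to mimic "use the prior only if it helps", but that choice is an assumption here.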
Affiliation(s)
- Chen Wang
- Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University, Hershey, PA, USA
- Department of Public Health Sciences, College of Medicine, Penn State University, Hershey, PA, USA
- Havell Markus
- Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University, Hershey, PA, USA
- Avantika R Diwadkar
- Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University, Hershey, PA, USA
- Department of Public Health Sciences, College of Medicine, Penn State University, Hershey, PA, USA
- Chachrit Khunsriraksakul
- Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University, Hershey, PA, USA
- Laura Carrel
- Department of Biochemistry and Molecular Biology, College of Medicine, Penn State University, Hershey, PA, USA
- Bingshan Li
- Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN, USA
- Xue Zhong
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Xingyan Wang
- Department of Public Health Sciences, College of Medicine, Penn State University, Hershey, PA, USA
- Xiaowei Zhan
- Department of Statistical Science, Southern Methodist University, Dallas, TX, USA
- Department of Population and Data Sciences, Quantitative Biomedical Research Center, Southwestern Medical Center University of Texas, Dallas, TX, USA
- Center for Genetics of Host Defense, Southwestern Medical Center University of Texas, Dallas, TX, USA
- Galen T Foulke
- Department of Public Health Sciences, College of Medicine, Penn State University, Hershey, PA, USA
- Department of Dermatology, College of Medicine, Penn State University, Hershey, PA, USA
- Nancy J Olsen
- Department of Medicine, College of Medicine, Penn State University, Hershey, PA, USA
- Dajiang J Liu
- Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University, Hershey, PA, USA.
- Department of Public Health Sciences, College of Medicine, Penn State University, Hershey, PA, USA.
- Bibo Jiang
- Department of Public Health Sciences, College of Medicine, Penn State University, Hershey, PA, USA.
10
Khan A, Kiryluk K. Polygenic scores and their applications in kidney disease. Nat Rev Nephrol 2025; 21:24-38. [PMID: 39271761 DOI: 10.1038/s41581-024-00886-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/06/2024] [Indexed: 09/15/2024]
Abstract
Genome-wide association studies (GWAS) have uncovered thousands of risk variants that individually have small effects on the risk of human diseases, including chronic kidney disease, type 2 diabetes, heart diseases and inflammatory disorders, but cumulatively explain a substantial fraction of disease risk, underscoring the complexity and pervasive polygenicity of common disorders. This complexity poses unique challenges to the clinical translation of GWAS findings. Polygenic scores combine small effects of individual GWAS risk variants across the genome to improve personalized risk prediction. Several polygenic scores have now been developed that exhibit sufficiently large effects to be considered clinically actionable. However, their clinical use is limited by their partial transferability across ancestries and a lack of validated models that combine polygenic, monogenic, family history and clinical risk factors. Moreover, prospective studies are still needed to demonstrate the clinical utility and cost-effectiveness of polygenic scores in clinical practice. Here, we discuss evolving methods for developing polygenic scores, best practices for validating and reporting their performance, and the study designs that will empower their clinical implementation. We specifically focus on the polygenic scores relevant to nephrology and other chronic, complex diseases and review their key limitations, necessary refinements and potential clinical applications.
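Combining the small effects of individual GWAS risk variants, as described above, amounts in its simplest form to a weighted sum of effect-allele dosages. A minimal sketch, assuming effect sizes (e.g. log odds ratios) are already aligned to the counted allele; the variant IDs and numbers are made up for illustration.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0-2) over variants present in both inputs.

    dosages: dict variant_id -> effect-allele dosage for one individual.
    weights: dict variant_id -> per-allele effect size from GWAS (e.g. log odds ratio).
    Variants missing from the individual's genotypes are skipped (a simplifying assumption).
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}  # hypothetical variants
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
print(polygenic_score(person, weights))
```

Production pipelines additionally handle allele alignment, missing genotypes and score standardization; the methods the review discusses differ mainly in how the weights are estimated.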
Affiliation(s)
- Atlas Khan
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- Krzysztof Kiryluk
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA.
11
Kullo IJ, Conomos MP, Nelson SC, Adebamowo SN, Choudhury A, Conti D, Fullerton SM, Gogarten SM, Heavner B, Hornsby WE, Kenny EE, Khan A, Khera AV, Li Y, Martin I, Mercader JM, Ng M, Raffield LM, Reiner A, Rowley R, Schaid D, Stilp A, Wiley K, Wilson R, Witte JS, Natarajan P. The PRIMED Consortium: Reducing disparities in polygenic risk assessment. Am J Hum Genet 2024; 111:2594-2606. [PMID: 39561770 PMCID: PMC11639095 DOI: 10.1016/j.ajhg.2024.10.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 10/16/2024] [Accepted: 10/16/2024] [Indexed: 11/21/2024]
Abstract
By improving disease risk prediction, polygenic risk scores (PRSs) could have a significant impact on health promotion and disease prevention. Due to the historical oversampling of populations with European ancestry for genome-wide association studies, PRSs perform less well in other, understudied populations, leading to concerns that clinical use in their current forms could widen health care disparities. The PRIMED Consortium was established to develop methods to improve the performance of PRSs in global populations and individuals of diverse genetic ancestry. To this end, PRIMED is aggregating and harmonizing multiple phenotype and genotype datasets on AnVIL, an interoperable secure cloud-based platform, to perform individual- and summary-level analyses using population and statistical genetics approaches. Study sites, the coordinating center, and representatives from the NIH work alongside other NHGRI and global consortia to achieve these goals. PRIMED is also evaluating ethical and social implications of PRS implementation and investigating the joint modeling of social determinants of health and PRS in computing disease risk. The phenotypes of interest are primarily cardiometabolic diseases and cancer, the leading causes of death and disability worldwide. Early deliverables of the consortium include methods for data sharing on AnVIL, development of a common data model to harmonize phenotype and genotype data from cohort studies as well as electronic health records, adaptation of recent guidelines for population descriptors to global cohorts, and sharing of PRS methods/tools. As a multisite collaboration, PRIMED aims to foster equity in the development and use of polygenic risk assessment.
Affiliation(s)
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA.
- Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Sarah C Nelson
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Sally N Adebamowo
- Department of Epidemiology and Public Health, University of Maryland, Baltimore, MD, USA
- Ananyo Choudhury
- Sydney Brenner Institute of Molecular Bioscience, University of Witwatersrand, Johannesburg, South Africa
- David Conti
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
- Stephanie M Fullerton
- Department of Bioethics and Humanities, University of Washington School of Medicine, Seattle, WA, USA
- Ben Heavner
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Whitney E Hornsby
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Eimear E Kenny
- Institute of Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alyna Khan
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Amit V Khera
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Yun Li
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Iman Martin
- National Human Genome Research Institute, National Institutes of Health, Baltimore, MD, USA
- Josep M Mercader
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Maggie Ng
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Laura M Raffield
- Department of Genetics, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Alex Reiner
- Department of Epidemiology, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Robb Rowley
- National Human Genome Research Institute, National Institutes of Health, Baltimore, MD, USA
- Daniel Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Adrienne Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ken Wiley
- National Human Genome Research Institute, National Institutes of Health, Baltimore, MD, USA
- Riley Wilson
- National Human Genome Research Institute, National Institutes of Health, Baltimore, MD, USA
- John S Witte
- Department of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
- Pradeep Natarajan
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
12
Zhao Z, Dorn S, Wu Y, Yang X, Jin J, Lu Q. One score to rule them all: regularized ensemble polygenic risk prediction with GWAS summary statistics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.27.625748. [PMID: 39677614 PMCID: PMC11642782 DOI: 10.1101/2024.11.27.625748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
Ensemble learning has become increasingly popular for boosting the predictive power of polygenic risk scores (PRS), with almost every recent multi-ancestry PRS approach employing ensemble learning as a final step. Existing ensemble approaches rely on individual-level data for model training, which severely limits their real-world applications, especially in non-European populations without sufficient genomic samples. Here, we introduce a statistical framework to construct regularized ensemble PRS, which allows us to combine a large number of candidate PRS models using only summary statistics from genome-wide association studies. We demonstrate its robust and substantial improvement over many existing PRS models in both within- and cross-ancestry applications. We believe this is truly "one score to rule them all" due to its capability to continuously combine newly developed PRS models with existing models to improve prediction performance, which makes it a universal approach that should always be employed in future PRS applications.
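One property behind ensembling PRS: because each candidate score is linear in genotypes, a linear ensemble of candidate PRS is itself a single PRS whose per-variant weight is the coefficient-weighted sum of the candidates' weights. The sketch below shows only that collapsing step with hypothetical ensemble coefficients; how the coefficients are estimated and regularized from summary statistics is the paper's contribution and is not reproduced here.

```python
def collapse_ensemble(candidate_weights, alphas):
    """Combine K candidate PRS weight sets into one set of per-variant weights.

    candidate_weights: list of dicts variant_id -> weight (one dict per candidate PRS).
    alphas: ensemble coefficients, one per candidate (hypothetical values here).
    A variant absent from a candidate contributes weight 0 for that candidate.
    """
    if len(candidate_weights) != len(alphas):
        raise ValueError("one coefficient per candidate PRS is required")
    combined = {}
    for weights, alpha in zip(candidate_weights, alphas):
        for v, w in weights.items():
            combined[v] = combined.get(v, 0.0) + alpha * w
    return combined


def score(dosages, weights):
    """Weighted dosage sum; variants missing from `dosages` count as dosage 0."""
    return sum(w * dosages.get(v, 0) for v, w in weights.items())


cands = [{"rs1": 0.2, "rs2": 0.1}, {"rs1": 0.05, "rs3": -0.3}]  # hypothetical candidates
alphas = [0.6, 0.4]
person = {"rs1": 1, "rs2": 2, "rs3": 1}
combined = collapse_ensemble(cands, alphas)
# Scoring with the collapsed weights equals the alpha-weighted sum of candidate scores.
print(score(person, combined), sum(a * score(person, c) for a, c in zip(alphas, cands)))
```

Collapsing the ensemble this way means deployment needs only one weight file, whatever the number of candidate models.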
Affiliation(s)
- Zijie Zhao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Stephen Dorn
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Yuchang Wu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Xiaoyu Yang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Jin Jin
- Department of Biostatistics, Epidemiology and Bioinformatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Qiongshi Lu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Department of Statistics, University of Wisconsin-Madison, Madison, WI
13
Ndong Sima CAA, Step K, Swart Y, Schurz H, Uren C, Möller M. Methodologies underpinning polygenic risk scores estimation: a comprehensive overview. Hum Genet 2024; 143:1265-1280. [PMID: 39425790 PMCID: PMC11522080 DOI: 10.1007/s00439-024-02710-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 10/06/2024] [Indexed: 10/21/2024]
Abstract
Polygenic risk scores (PRS) have emerged as a promising tool for predicting disease risk and treatment outcomes using genomic data. Thousands of genome-wide association studies (GWAS), primarily involving populations of European ancestry, have supported the development of PRS models. However, these models have not been adequately evaluated in non-European populations, raising concerns about their clinical validity and predictive power across diverse groups. Addressing this issue requires developing novel risk prediction frameworks that leverage genetic characteristics across diverse populations, considering host-microbiome interactions and a broad range of health measures. One of the key aspects in evaluating PRS is understanding the strengths and limitations of various methods for constructing them. In this review, we analyze strengths and limitations of different methods for constructing PRS, including traditional weighted approaches and new methods such as Bayesian and Frequentist penalized regression approaches. Finally, we summarize recent advances in PRS calculation methods development, and highlight key areas for future research, including development of models robust across diverse populations by underlining the complex interplay between genetic variants across diverse ancestral backgrounds in disease risk as well as treatment response prediction. PRS hold great promise for improving disease risk prediction and personalized medicine; therefore, their implementation must be guided by careful consideration of their limitations, biases, and ethical implications to ensure that they are used in a fair, equitable, and responsible manner.
Affiliation(s)
- Carene Anne Alene Ndong Sima
- Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
- Kathryn Step
- Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
- Yolandi Swart
- Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
- Haiko Schurz
- Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
- Caitlin Uren
- Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Cape Town, South Africa
- Centre for Bioinformatics and Computational Biology, Stellenbosch University, Cape Town, South Africa
- Marlo Möller
- Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, South African Medical Research Council Centre for Tuberculosis Research, Stellenbosch University, Cape Town, South Africa.
- Centre for Bioinformatics and Computational Biology, Stellenbosch University, Cape Town, South Africa.
14
Chen T, Zhang H, Mazumder R, Lin X. SPLENDID incorporates continuous genetic ancestry in biobank-scale data to improve polygenic risk prediction across diverse populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.14.618256. [PMID: 39464044 PMCID: PMC11507800 DOI: 10.1101/2024.10.14.618256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Polygenic risk scores are widely used in disease risk stratification, but their accuracy varies across diverse populations. Recent methods leverage large-scale multi-ancestry data to improve accuracy in under-represented populations but require labelling individuals by ancestry for prediction. This poses challenges for practical use, as clinical practices are typically not based on ancestry. We propose SPLENDID, a novel penalized regression framework for diverse biobank-scale data. Our method utilizes ancestry principal component interactions to model genetic ancestry as a continuum within a single prediction model for all ancestries, eliminating the need for discrete labels. In extensive simulations and analyses of 9 traits from the All of Us Research Program (N=224,364) and UK Biobank (N=340,140), SPLENDID significantly outperformed existing methods in prediction accuracy and model sparsity. By directly incorporating continuous genetic ancestry in model training, SPLENDID stands as a valuable tool for robust risk prediction across diverse populations and fairer clinical implementation.
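Modelling ancestry as a continuum through principal component interactions can be pictured as letting each variant's effect vary linearly with an individual's ancestry PCs, β_j + Σ_l γ_jl·PC_il, inside one model, so no discrete ancestry label is needed. The sketch below only evaluates such a score for already-fitted parameters; the data layout, names and numbers are hypothetical, and the penalized fitting that SPLENDID performs is not shown.

```python
def pc_interaction_score(dosages, pcs, main_effects, interactions):
    """Score = sum_j g_j * (beta_j + sum_l gamma_jl * PC_l) for one individual.

    dosages: dict variant -> dosage; pcs: list of the individual's ancestry PCs.
    main_effects: dict variant -> beta_j; interactions: dict variant -> list of gamma_jl
    (one entry per PC). All structures are hypothetical illustrations.
    """
    total = 0.0
    for v, beta in main_effects.items():
        gammas = interactions.get(v, [0.0] * len(pcs))
        effect = beta + sum(g * pc for g, pc in zip(gammas, pcs))  # ancestry-dependent effect
        total += dosages.get(v, 0) * effect
    return total


beta = {"rs1": 0.10, "rs2": 0.05}
gamma = {"rs1": [0.02, 0.0], "rs2": [-0.01, 0.03]}
person = {"rs1": 2, "rs2": 1}
print(pc_interaction_score(person, [1.0, -0.5], beta, gamma))
```

At PCs of zero the score reduces to an ordinary weighted sum of dosages; a sparsity penalty on the γ terms is one way such a model could stay compact, which matches the abstract's emphasis on sparsity but is an assumption about the details.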
15
Akamatsu K, Golzari S, Amariuta T. Powerful mapping of cis-genetic effects on gene expression across diverse populations reveals novel disease-critical genes. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.09.25.24314410. [PMID: 39399015 PMCID: PMC11469471 DOI: 10.1101/2024.09.25.24314410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
While disease-associated variants identified by genome-wide association studies (GWAS) most likely regulate gene expression levels, linking variants to target genes is critical to determining the functional mechanisms of these variants. Genetic effects on gene expression have been extensively characterized by expression quantitative trait loci (eQTL) studies, yet data from non-European populations is limited. This restricts our understanding of disease to genes whose regulatory variants are common in European populations. While previous work has leveraged data from multiple populations to improve GWAS power and polygenic risk score (PRS) accuracy, multi-ancestry data has not yet been used to better estimate cis-genetic effects on gene expression. Here, we present a new method, Multi-Ancestry Gene Expression Prediction Regularized Optimization (MAGEPRO), which constructs robust genetic models of gene expression in understudied populations or cell types by fitting a regularized linear combination of eQTL summary data across diverse cohorts. In simulations, our tool generates more accurate models of gene expression than widely-used LASSO and the state-of-the-art multi-ancestry PRS method, PRS-CSx, adapted to gene expression prediction. We attribute this improvement to MAGEPRO's ability to more accurately estimate causal eQTL effect sizes (p < 3.98 × 10⁻⁴, two-sided paired t-test). With real data, we applied MAGEPRO to 8 eQTL cohorts representing 3 ancestries (average n = 355) and consistently outperformed each of 6 competing methods in gene expression prediction tasks. Integration with GWAS summary statistics across 66 complex traits (representing 22 phenotypes and 3 ancestries) resulted in 2,331 new gene-trait associations, many of which replicate across multiple ancestries, including PHTF1 linked to white blood cell count, a gene which is overexpressed in leukemia patients.
MAGEPRO also identified biologically plausible novel findings, such as PIGB, an essential component of GPI biosynthesis, associated with heart failure, which has been previously evidenced by clinical outcome data. Overall, MAGEPRO is a powerful tool to enhance inference of gene regulatory effects in underpowered datasets and has improved our understanding of population-specific and shared genetic effects on complex traits.
Affiliation(s)
- Kai Akamatsu
- School of Biological Sciences, UC San Diego, La Jolla, CA, USA
- Department of Medicine, Division of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, UC San Diego, La Jolla, CA, USA
- Stephen Golzari
- Department of Medicine, Division of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, UC San Diego, La Jolla, CA, USA
- Shu Chien-Gene Lay Department of Bioengineering, UC San Diego, La Jolla, CA, USA
- Tiffany Amariuta
- Department of Medicine, Division of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, UC San Diego, La Jolla, CA, USA
16
Wen J, Sun Q, Huang L, Zhou L, Doyle MF, Ekunwe L, Durda P, Olson NC, Reiner AP, Li Y, Raffield LM. Gene expression and splicing QTL analysis of blood cells in African American participants from the Jackson Heart Study. Genetics 2024; 228:iyae098. [PMID: 39056362 PMCID: PMC11373511 DOI: 10.1093/genetics/iyae098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 06/05/2024] [Indexed: 07/28/2024]
Abstract
Most gene expression and alternative splicing quantitative trait loci (eQTL/sQTL) studies have been biased toward European ancestry individuals. Here, we performed eQTL and sQTL analyses using TOPMed whole-genome sequencing-derived genotype data and RNA-sequencing data from stored peripheral blood mononuclear cells in 1,012 African American participants from the Jackson Heart Study (JHS). At a false discovery rate of 5%, we identified 17,630 unique eQTL credible sets covering 16,538 unique genes; and 24,525 unique sQTL credible sets covering 9,605 unique genes, with lead QTL at P < 5e-8. About 24% of independent eQTLs and independent sQTLs with a minor allele frequency > 1% in JHS were rare (minor allele frequency < 0.1%), and therefore unlikely to be detected, in European ancestry individuals. Finally, we created an open database, which is freely available online, allowing fast query and bulk download of our QTL results.
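The 5% false discovery rate criterion mentioned above can be applied in several ways; the Benjamini-Hochberg step-up procedure is one standard option and is sketched below as a generic illustration. The abstract does not state which FDR procedure the authors used, and the p-values are made up.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return sorted indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr;
    # every hypothesis ranked at or below k is rejected.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank
    return sorted(order[:k_max])


pvals = [0.001, 0.2, 0.012, 0.045, 0.025]  # hypothetical QTL p-values
print(benjamini_hochberg(pvals))
```

Because it is a step-up rule, a p-value that fails its own rank threshold can still be rejected if a larger p-value later in the ordering passes.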
Collapse
Affiliation(s)
- Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Le Huang
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lingbo Zhou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Margaret F Doyle
- Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT 05405, USA
- Lynette Ekunwe
- Department of Medicine, University of Mississippi Medical Center (UMMC), Jackson, MS 39213, USA
- Peter Durda
- Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT 05405, USA
- Nels C Olson
- Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT 05405, USA
- Alexander P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Yun Li
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
17
Tang C, Sun Q, Zeng X, Yang X, Liu F, Zhao J, Shen Y, Liu B, Wen J, Li Y. Cell-type specific inference from bulk RNA-sequencing data by integrating single cell reference profiles via EPIC-unmix. bioRxiv: The Preprint Server for Biology 2024:2024.05.23.595514. [PMID: 38826297 PMCID: PMC11142188 DOI: 10.1101/2024.05.23.595514] [Indexed: 06/04/2024]
Abstract
Cell-type-specific (CTS) analysis is essential to reveal biological insights obscured in bulk tissue data. However, single-cell (sc) or single-nucleus (sn) resolution data remain cost-prohibitive for large samples. Computational methods that deconvolve bulk tissue data are therefore highly valuable. We present EPIC-unmix, a novel two-step empirical Bayesian method that integrates reference sc/sn RNA-seq data with bulk RNA-seq data from target samples to improve the accuracy of CTS inference. Through comprehensive simulations across three tissues, we demonstrate that EPIC-unmix achieved 4.6%-109.8% higher accuracy than alternative methods. By applying EPIC-unmix to human bulk brain RNA-seq data from the ROSMAP and MSBB cohorts, we identified multiple genes differentially expressed between Alzheimer's disease (AD) cases and controls in a CTS manner, 57.4% of which were novel genes not identified using sc/snRNA-seq data of similar sample size, indicating the power of our in silico approach. Of the 6%-69% of genes that overlapped with sc/snRNA-seq findings, 83%-100% showed a consistent direction of effect, supporting the reliability of our findings. CTS expression profiles inferred by EPIC-unmix similarly empower CTS eQTL analysis. Among the novel eQTLs, we highlight a microglia eQTL for the AD risk gene AP3B2 that was obscured in bulk data and missed by sc/snRNA-seq-based eQTL analysis. The variant resides in a microglia-specific candidate cis-regulatory element (cCRE) that forms a chromatin loop with the AP3B2 promoter region in microglia. Taken together, we believe EPIC-unmix will be a valuable tool for more powerful CTS analysis.
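EPIC-unmix itself is a two-step empirical Bayesian method, and the sketch below does not implement it. It only illustrates the simpler, related problem of reference-based deconvolution: estimating non-negative cell-type proportions p so that a bulk expression vector is approximately a reference signature matrix S times p. The function name `estimate_proportions` and the projected-gradient solver are assumptions chosen for illustration.

```python
import numpy as np


def estimate_proportions(bulk, signature, n_iter=5000):
    """Estimate cell-type proportions p >= 0 with bulk ~= signature @ p.

    bulk: shape (genes,), bulk expression of one sample.
    signature: shape (genes, cell_types), mean expression per cell type
        from a single-cell reference (hypothetical input).
    """
    # Gradient of 0.5 * ||bulk - S p||^2 is S^T S p - S^T bulk.
    gram = signature.T @ signature
    target = signature.T @ bulk
    # Step size 1/L, where L is the largest eigenvalue of S^T S.
    step = 1.0 / np.linalg.eigvalsh(gram).max()
    # Start from equal proportions.
    p = np.full(signature.shape[1], 1.0 / signature.shape[1])
    for _ in range(n_iter):
        grad = gram @ p - target
        # Gradient step, then projection onto the non-negative orthant.
        p = np.maximum(p - step * grad, 0.0)
    # Rescale so the proportions sum to 1.
    total = p.sum()
    return p / total if total > 0 else p
```

A projected-gradient loop keeps the sketch dependent only on NumPy. `scipy.optimize.nnls` solves the same non-negative least-squares problem directly.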
Affiliation(s)
- Chenwei Tang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Xinyue Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Xiaoyu Yang
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Fei Liu
- Department of Pharmacy and Pharmaceutical Sciences, Faculty of Science, National University of Singapore, Singapore
- Jinying Zhao
- Department of Epidemiology, College of Public Health & Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Genetic Epidemiology and Bioinformatics, University of Florida, Gainesville, FL, USA
- Yin Shen
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Bixiang Liu
- Department of Pharmacy and Pharmaceutical Sciences, Faculty of Science, National University of Singapore, Singapore
- Department of Biomedical Informatics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
18
Zhang J, Zhan J, Jin J, Ma C, Zhao R, O'Connell J, Jiang Y, Koelsch BL, Zhang H, Chatterjee N. An ensemble penalized regression method for multi-ancestry polygenic risk prediction. Nat Commun 2024; 15:3238. [PMID: 38622117 PMCID: PMC11271575 DOI: 10.1038/s41467-024-47357-7] [Received: 03/21/2023] [Accepted: 03/28/2024] [Indexed: 04/17/2024]
Abstract
Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, which limits their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on an enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association study (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step that combines PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared with alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased the out-of-sample prediction R² for continuous traits by an average of 70% compared with a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable to analyses involving large numbers of SNPs and many diverse populations.
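The abstract refers to a combination of L1 (lasso) and L2 (ridge) penalties, followed by an ensemble step. For orientation only, the textbook elastic-net objective for individual-level data and a weighted ensemble of the resulting scores can be written as shown below. These are generic forms, not PROSPER's exact specification. According to the abstract, PROSPER works from GWAS summary statistics and specifies its penalty parameters parsimoniously across populations. In the notation here, which is introduced for illustration, y is the phenotype vector, X is the n × p genotype matrix, and the weights w_j for the J candidate penalty settings are fitted on tuning data.

```latex
\hat{\boldsymbol\beta}(\lambda_1,\lambda_2)
  = \arg\min_{\boldsymbol\beta}\;
    \frac{1}{2n}\lVert \mathbf{y}-\mathbf{X}\boldsymbol\beta\rVert_2^2
    + \lambda_1\lVert\boldsymbol\beta\rVert_1
    + \frac{\lambda_2}{2}\lVert\boldsymbol\beta\rVert_2^2,
\qquad
\mathrm{PRS}_{\mathrm{ens}}
  = \sum_{j=1}^{J} w_j\,\mathbf{X}\,\hat{\boldsymbol\beta}\bigl(\lambda_1^{(j)},\lambda_2^{(j)}\bigr)
```

Setting λ2 = 0 recovers the lasso, and setting λ1 = 0 recovers ridge regression. The ensemble combines scores across penalty settings rather than committing to a single tuned pair.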
Affiliation(s)
- Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jin Jin
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Cheng Ma
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
19
Hou K, Gogarten S, Kim J, Hua X, Dias JA, Sun Q, Wang Y, Tan T, Atkinson EG, Martin A, Shortt J, Hirbo J, Li Y, Pasaniuc B, Zhang H. Admix-kit: an integrated toolkit and pipeline for genetic analyses of admixed populations. Bioinformatics 2024; 40:btae148. [PMID: 38490256 PMCID: PMC10980565 DOI: 10.1093/bioinformatics/btae148] [Received: 09/30/2023] [Revised: 02/08/2024] [Accepted: 03/13/2024] [Indexed: 03/17/2024]
Abstract
Summary: Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored to the particular challenges of genetic studies in admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods for genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations.
Availability and implementation: The admix-kit package is open source and available at https://github.com/KangchengHou/admix-kit. A pipeline for admixed genotype simulation is also available at https://github.com/UW-GAC/admix-kit_workflow.
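Admix-kit includes admixed genotype simulation. Its own simulator and API are not reproduced here. The sketch below only illustrates the basic idea behind simulating admixed genotypes under a deliberately simplified model, which is an assumption: each haplotype independently draws its ancestry from the global ancestry proportions, then carries the alternative allele with that ancestry's allele frequency. It ignores ancestry tracts and linkage disequilibrium. The function name `simulate_admixed_genotypes` is hypothetical.

```python
import random


def simulate_admixed_genotypes(n_individuals, anc_props, allele_freqs, seed=0):
    """Simulate diploid genotypes (0/1/2) under a simplified admixture model.

    anc_props: global ancestry proportions, e.g. [0.8, 0.2] (sum to 1).
    allele_freqs: allele_freqs[s][k] is the alt-allele frequency of SNP s
        in ancestry k.
    Returns (genotypes, local_ancestry), where genotypes[i][s] is the alt-allele
    count and local_ancestry[i][s] is the pair of haplotype ancestries.
    """
    rng = random.Random(seed)
    ancestries = list(range(len(anc_props)))
    genotypes, local_ancestry = [], []
    for _ in range(n_individuals):
        g_row, a_row = [], []
        for freqs in allele_freqs:
            # Simplification: each haplotype's ancestry is drawn independently
            # at every SNP (no ancestry tracts, no LD).
            haps = rng.choices(ancestries, weights=anc_props, k=2)
            # Each haplotype carries the alt allele with its ancestry's frequency.
            g_row.append(sum(rng.random() < freqs[k] for k in haps))
            a_row.append(tuple(haps))
        genotypes.append(g_row)
        local_ancestry.append(a_row)
    return genotypes, local_ancestry
```

Setting an allele frequency of 0 in one ancestry and 1 in the other makes each genotype equal to the number of haplotypes from the second ancestry, which is a quick sanity check of the model.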
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, United States
- Stephanie Gogarten
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, United States
- Joohyun Kim
- Vanderbilt Genetics Institute and Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, United States
- Xing Hua
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, United States
- Julie-Alexia Dias
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02120, United States
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, United States
- Ying Wang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Taotao Tan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, United States
- Elizabeth G Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, United States
- Alicia Martin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Jonathan Shortt
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, United States
- Jibril Hirbo
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, United States
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, United States
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, United States
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, United States