1. Ye L, Zhang L, Tang B, Liang J, Tan R, Jiang H, Peng W, Lin N, Li K, Xue C, Li M. Ge-SAND: an explainable deep learning-driven framework for disease risk prediction by uncovering complex genetic interactions in parallel. BMC Genomics 2025; 26:432. [PMID: 40312319 DOI: 10.1186/s12864-025-11588-9] [Received: 12/24/2024] [Accepted: 04/09/2025] [Indexed: 05/03/2025]
Abstract
BACKGROUND Accurate genetic risk prediction and understanding the mechanisms underlying complex diseases are essential for effective intervention and precision medicine. However, current methods often struggle to capture the intricate and subtle genetic interactions contributing to disease risk. This challenge may be further exacerbated by the curse of dimensionality when considering large-scale pairwise genetic combinations with limited samples. Overcoming these limitations could transform biomedicine by providing deeper insights into disease mechanisms, moving beyond black-box models and single-locus analyses, and enabling a more comprehensive understanding of cross-disease patterns. RESULTS We developed Ge-SAND (Genomic Embedding Self-Attention Neurodynamic Decoder), an explainable deep learning-driven framework designed to uncover complex genetic interactions at scales exceeding 10⁶ in parallel for accurate disease risk prediction. Ge-SAND leverages genotype and genomic positional information to identify both intra- and interchromosomal interactions associated with disease phenotypes, providing comprehensive insights into pathogenic mechanisms crucial for disease risk prediction. Applied to simulated datasets and UK Biobank cohorts for Crohn's disease, schizophrenia, and Alzheimer's disease, Ge-SAND achieved up to a 20% improvement in AUC-ROC compared to mainstream methods. Beyond its predictive accuracy, through self-attention-based interaction networks, Ge-SAND provided insights into large-scale genotype relationships and revealed genetic mechanisms underlying these complex diseases. For instance, Ge-SAND identified potential genetic interaction pairs, including novel relationships such as that between ISOC1 and HOMER2, potentially implicating the brain-gut axis in Crohn's and Alzheimer's diseases. CONCLUSION Ge-SAND is a novel deep-learning approach designed to address the challenges of capturing large-scale genetic interactions. By integrating disease risk prediction with interpretable insights into genetic mechanisms, Ge-SAND offers a valuable tool for advancing genomic research and precision medicine.
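The abstract describes reading pairwise genetic interactions out of self-attention over genotype and genomic-position information. The sketch below is not Ge-SAND's implementation; it only illustrates the generic mechanism under stated assumptions: each SNP is a token whose vector is a genotype embedding (one per dosage 0/1/2) plus a Transformer-style sinusoidal position encoding, and one scaled dot-product attention head yields an L×L weight matrix whose symmetrized entries are treated as candidate interaction scores. The embedding table, dimensions, random projection weights, and the helper names `attention_map` and `top_pairs` are hypothetical stand-ins for what a trained model would learn.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def attention_map(genotypes, positions, dim=4, seed=0):
    """Return an L x L single-head self-attention weight matrix for one individual.

    genotypes: allele dosages (0, 1 or 2), one token per SNP.
    positions: integer genomic position index per SNP (hypothetical encoding).
    All weight matrices are random stand-ins for learned parameters.
    """
    rng = random.Random(seed)

    def rand_mat(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    geno_table = rand_mat(3, dim)  # one embedding vector per dosage value
    w_q, w_k = rand_mat(dim, dim), rand_mat(dim, dim)

    tokens = []
    for g, p in zip(genotypes, positions):
        # Sinusoidal position encoding added to the genotype embedding.
        pos_enc = [math.sin(p / 10000 ** (i / dim)) if i % 2 == 0
                   else math.cos(p / 10000 ** ((i - 1) / dim))
                   for i in range(dim)]
        tokens.append([e + q for e, q in zip(geno_table[g], pos_enc)])

    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    scale = math.sqrt(dim)
    # Row i: how strongly SNP i attends to each SNP j; every row sums to 1.
    return [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
            for q in queries]


def top_pairs(weights, k=3):
    """Rank SNP pairs (i < j) by symmetrized attention as candidate interactions."""
    n = len(weights)
    scores = [((i, j), (weights[i][j] + weights[j][i]) / 2)
              for i in range(n) for j in range(i + 1, n)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


weights = attention_map([0, 1, 2, 1], [10, 250, 900, 4000])
print(top_pairs(weights))
```

Because every token attends to every other token in one matrix product, all L(L−1)/2 pairs are scored at once, which is the sense in which attention evaluates interactions "in parallel"; how Ge-SAND aggregates attention across layers, heads, and individuals is not specified here.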
Affiliation(s)
- Lihang Ye
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Liubin Zhang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Bin Tang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Junhao Liang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Ruijie Tan
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Hui Jiang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Department of Medical Genetics and Prenatal Diagnosis, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China
- Wenjie Peng
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Nan Lin
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Kun Li
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Chao Xue
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Miaoxin Li
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
2. Le A, Paré G, Devereaux PJ, Quazi I, Mao S, Chong M, Heels-Ansdell D, Duceppe E, Wang MK, Patel A, Tiboni M, Magloire P, Garg AX, Ofori SN, Conen D, Spence J, Belley-Côté E, Beck C, McIntyre WF, Whitlock R, Healey JS, Pettit S, Borges FK. Polygenic Risk Scores in Myocardial Injury After Noncardiac Surgery: A VISION Substudy. JACC. ADVANCES 2025; 4:101680. [PMID: 40147046 PMCID: PMC11992376 DOI: 10.1016/j.jacadv.2025.101680] [Received: 10/18/2024] [Revised: 02/20/2025] [Accepted: 02/21/2025] [Indexed: 03/29/2025]
Abstract
BACKGROUND Myocardial injury after noncardiac surgery (MINS) is the most prevalent vascular complication following surgical procedures. Although the revised cardiac risk index (RCRI) is widely used to predict postoperative cardiovascular complications, its predictive accuracy is suboptimal. OBJECTIVES Considering genetic influences may improve risk prediction. The authors propose integrating polygenic risk scores (PRS) with the RCRI to enhance MINS prediction. Identification of PRS associated with MINS could provide pathophysiological insights. METHODS This is a case-control study nested within the Vascular Events in Noncardiac Surgery Participants Cohort Evaluation cohort, including patients aged 45 years or older who underwent noncardiac surgery. Daily troponin levels were measured preoperatively and on days 1, 2, and 3 postoperatively. PRSs were computed for MINS risk factors using publicly available summary statistics. Logistic regression models were used to assess the association between each PRS and MINS. PRS discrimination was assessed independently and in combination with RCRI. RESULTS A total of 253 MINS cases were matched with 253 controls, adjusted for age, sex, and limited to individuals of European ancestry (total n = 506). The type II diabetes (T2D) PRS (OR: 1.26; 95% CI: 1.00-1.58; P = 0.047) and the HbA1c PRS (OR: 1.26; 95% CI: 1.03-1.54; P = 0.026) were associated with MINS. No other PRS, including those for coronary artery disease, stroke, and lipid biomarkers, showed significant associations. CONCLUSIONS The T2D PRS and the HbA1c PRS were associated with an increased risk of MINS. The findings may reflect the multifactorial pathophysiology of MINS. Larger genetic studies and trials evaluating perioperative glucose management warrant consideration.
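A PRS built from summary statistics is commonly a weighted sum of effect-allele dosages, with per-SNP GWAS effect sizes as weights, standardized before entering a logistic regression. The sketch below shows that generic construction on made-up weights and genotypes; it assumes, without confirmation from the abstract, that the reported ORs are per standard deviation of the standardized score. It then shows how an OR of 1.26 maps to a logistic-regression coefficient and to the odds multiplier for someone 2 SD above the mean.

```python
import math
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0/1/2) using per-SNP GWAS effect sizes."""
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Center scores and scale them to unit sample standard deviation."""
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical per-SNP log-odds weights, as if read from GWAS summary statistics.
snp_weights = [0.12, -0.05, 0.30]
# Hypothetical dosages for five individuals (rows) at three SNPs (columns).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 0], [1, 0, 1]]

raw_scores = [polygenic_score(g, snp_weights) for g in genotypes]
z_scores = standardize(raw_scores)

# An OR of 1.26 per SD of the score is a logistic-regression coefficient of ln(1.26).
beta_per_sd = math.log(1.26)
# Odds multiplier for a person 2 SD above the mean versus one at the mean.
or_at_plus_2sd = math.exp(2 * beta_per_sd)  # equals 1.26 ** 2, about 1.59
print(raw_scores, or_at_plus_2sd)
```

Because the model is linear on the log-odds scale, ORs per SD multiply: 2 SD gives 1.26², not 2 × 1.26.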
Affiliation(s)
- Ann Le
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Medical Sciences, McMaster University, Faculty of Health Sciences, Hamilton, Ontario, Canada
- Guillaume Paré
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Medical Sciences, McMaster University, Faculty of Health Sciences, Hamilton, Ontario, Canada
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Faculty of Health Sciences, Hamilton, Ontario, Canada
  - Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, Ontario, Canada
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- P J Devereaux
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Ibrahim Quazi
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Shihong Mao
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Michael Chong
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Medical Sciences, McMaster University, Faculty of Health Sciences, Hamilton, Ontario, Canada
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Faculty of Health Sciences, Hamilton, Ontario, Canada
- Diane Heels-Ansdell
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Emmanuelle Duceppe
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Centre hospitalier de l'Université de Montréal, Université de Montréal, Montréal, Quebec, Canada
- Michael Ke Wang
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Ameen Patel
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Maria Tiboni
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Patrick Magloire
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Amit X Garg
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  - Division of Nephrology, London Health Sciences Centre, London, Ontario, Canada
  - Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada
- Sandra N Ofori
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- David Conen
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Jessica Spence
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  - Department of Anesthesia and Critical Care, McMaster University, Hamilton, Ontario, Canada
- Emilie Belley-Côté
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Caleb Beck
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Ecology and Evolution, University of Lausanne, Faculty of Biology and Medicine, Quartier Centre, Lausanne, Switzerland
- William F McIntyre
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Richard Whitlock
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Jeff S Healey
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Shirley Pettit
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
- Flavia K Borges
  - Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Ontario, Canada
  - Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada
3. DeCarli C, Rajan KB, Jin LW, Hinman J, Johnson DK, Harvey D, Fornage M. WMH Contributions to Cognitive Impairment: Rationale and Design of the Diverse VCID Study. Stroke 2025; 56:758-776. [PMID: 39545328 PMCID: PMC11850211 DOI: 10.1161/strokeaha.124.045903] [Indexed: 11/17/2024]
Abstract
As awareness of dementia increases, more individuals with minor cognitive complaints are requesting clinical assessment. Neuroimaging studies frequently identify incidental white matter hyperintensities, raising patient concerns about their brain health and future risk for dementia. Moreover, current US demographics indicate that ≈50% of these individuals will be from diverse backgrounds by 2060. Racial and ethnic minority populations bear a disproportionate burden of vascular risk factors, magnifying dementia risk. Despite established associations between white matter hyperintensities and cognitive impairment, including dementia, no study has comprehensively and prospectively examined the impact of individual and combined magnetic resonance imaging measures of white matter injury, their risk factors, and comorbidities on cognitive performance among a diverse, nondemented, stroke-free population with cognitive complaints over an extended period of observation. The Diverse VCID (Diverse Vascular Cognitive Impairment and Dementia) study is designed to fill this knowledge gap through 3 assessments of clinical, behavioral, and risk factors; neurocognitive and magnetic resonance imaging measures; fluid biomarkers of Alzheimer disease, vascular inflammation, angiogenesis, and endothelial dysfunction; and measures of genetic risk collected prospectively over a minimum of 3 years in a cohort of 2250 individuals evenly distributed among Americans of Black/African, Latino/Hispanic, and non-Hispanic White backgrounds. The goal of this study is to investigate the basic mechanisms of small vessel cerebrovascular injury, emphasizing clinically relevant assessment tools, and to develop a risk score that will accurately identify at-risk individuals for possible treatment or clinical therapeutic trials, particularly individuals of diverse backgrounds where vascular risk factors and disease are more prevalent.
Affiliation(s)
- Charles DeCarli
  - Department of Neurology, University of California at Davis, Sacramento, CA, USA
- Kumar B. Rajan
  - Rush Institute for Healthy Aging, Rush University Medical Center, Chicago, IL, USA
- Lee-Way Jin
  - Department of Pathology and Laboratory Medicine, University of California, Davis, California, USA
- Jason Hinman
  - Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
- David K. Johnson
  - Department of Neurology, University of California at Davis, Sacramento, CA, USA
- Danielle Harvey
  - Department of Public Health Sciences, University of California, Davis, California, USA
- Myriam Fornage
  - Brown Foundation Institute of Molecular Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
4. Li X, Pang M, Wen J, Zhou LY, Raffield LM, Zhou H, Yao H, Chen C, Sun Q, Li Y. Variational Autoencoder-based Model Improves Polygenic Prediction in Blood Cell Traits. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.13.632820. [PMID: 39868173 PMCID: PMC11760740 DOI: 10.1101/2025.01.13.632820] [Indexed: 01/28/2025]
Abstract
Genetic prediction of complex traits, enabled by large-scale genomic studies, has created new ways to understand individual genetic predisposition. Polygenic Risk Scores (PRS) offer a way to aggregate information across the genome, enabling personalized risk prediction for complex traits and diseases. However, conventional PRS calculation methods that rely on linear models are limited in their ability to capture complex patterns and interaction effects in high-dimensional genomic data. In this study, we seek to improve the predictive power of PRS by applying advanced deep learning techniques. We show that the Variational AutoEncoder-based model for PRS construction (VAE-PRS) outperforms current state-of-the-art methods for biobank-level data in 14 out of 16 blood cell traits, while being computationally efficient. Through comprehensive experiments, we found that the VAE-PRS model offers the ability to capture interaction effects in high-dimensional data and shows robust performance across different pre-screened variant sets. Furthermore, VAE-PRS is easily interpretable via assessing the contribution of each individual marker to the final prediction score through the SHapley Additive exPlanations (SHAP) method, providing potential new insights in identifying trait-associated genetic variants. In summary, VAE-PRS presents a novel approach to genetic risk prediction by harnessing the power of deep learning methods, which could further facilitate the development of personalized medicine and genetic research.
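SHAP attributes a prediction to individual markers via Shapley values. For a handful of features these can be computed exactly by enumerating feature subsets; SHAP libraries approximate the same quantity at scale. The sketch below uses a hypothetical toy predictor with one SNP×SNP interaction term (not VAE-PRS itself) and, as a simplifying assumption, a single all-zero reference genotype to stand in for "absent" features instead of an expectation over a background dataset. It illustrates two properties relevant to the abstract: an interaction's contribution is split between the interacting markers, and the attributions sum to the prediction minus the reference prediction.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical predictor: additive effects plus a SNP0 x SNP1 interaction."""
    return 0.2 * x[0] + 0.1 * x[1] + 0.5 * x[0] * x[1] + 0.3 * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


x = [1, 2, 1]      # genotype dosages for one individual
base = [0, 0, 0]   # hypothetical reference genotype
phi = shapley_values(toy_model, x, base)
print(phi)  # additive parts 0.2, 0.2, 0.3 plus half of the 1.0 interaction on SNP0 and SNP1
```

Exact enumeration costs O(2ⁿ) model evaluations per feature, which is why genome-scale applications rely on SHAP's approximations rather than this direct formula.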
Affiliation(s)
- Xiaoqi Li
  - Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, USA
- Minxing Pang
  - Applied Mathematics & Computational Science Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
- Jia Wen
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Laura Y. Zhou
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
- Laura M. Raffield
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Haibo Zhou
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Huaxiu Yao
  - Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
- Can Chen
  - Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, USA
  - School of Data Science and Society, University of North Carolina, Chapel Hill, NC, USA
  - Department of Mathematics, University of North Carolina, Chapel Hill, NC, USA
- Quan Sun
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
  - Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
  - These authors jointly supervised this work
- Yun Li
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
  - These authors jointly supervised this work
5. Rao H, Weiss MC, Moon JY, Perreira KM, Daviglus ML, Kaplan R, North KE, Argos M, Fernández-Rhodes L, Sofer T. Advancements in genetic research by the Hispanic Community Health Study/Study of Latinos: A 10-year retrospective review. HGG ADVANCES 2025; 6:100376. [PMID: 39473183 PMCID: PMC11754138 DOI: 10.1016/j.xhgg.2024.100376] [Received: 04/18/2024] [Revised: 10/24/2024] [Accepted: 10/24/2024] [Indexed: 11/14/2024]
Abstract
The Hispanic Community Health Study/Study of Latinos (HCHS/SOL) is a multicenter, longitudinal cohort study designed to evaluate environmental, lifestyle, and genetic risk factors as they relate to cardiometabolic and other chronic diseases among Hispanic/Latino populations in the United States. Since its inception in 2008, and owing to its robust genetic measures, HCHS/SOL has facilitated major contributions to the field of genetic research. This 10-year retrospective review highlights the major findings for genotype-phenotype relationships and advancements in statistical methods owing to the HCHS/SOL. Furthermore, we discuss the ethical and societal challenges of genetic research, especially among Hispanic/Latino adults in the United States. Continued genetic research, ancillary study expansion, and consortia collaboration through HCHS/SOL will further drive knowledge and advancements in human genetics research.
Affiliation(s)
- Hridya Rao
  - Department of Biobehavioral Health, Pennsylvania State University, University Park, PA, USA
- Margaret C Weiss
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Illinois Chicago, Chicago, IL, USA
- Jee Young Moon
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Krista M Perreira
  - Department of Social Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, USA
- Martha L Daviglus
  - Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
- Robert Kaplan
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Kari E North
  - Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Maria Argos
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Illinois Chicago, Chicago, IL, USA
  - Department of Environmental Health, School of Public Health, Boston University, Boston, MA, USA
- Tamar Sofer
  - Cardiovascular Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
6. Zhao Z, Dorn S, Wu Y, Yang X, Jin J, Lu Q. One score to rule them all: regularized ensemble polygenic risk prediction with GWAS summary statistics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.27.625748. [PMID: 39677614 PMCID: PMC11642782 DOI: 10.1101/2024.11.27.625748] [Indexed: 12/17/2024]
Abstract
Ensemble learning has become increasingly popular for boosting the predictive power of polygenic risk scores (PRS), with almost every recent multi-ancestry PRS approach employing ensemble learning as a final step. Existing ensemble approaches rely on individual-level data for model training, which severely limits their real-world applications, especially in non-European populations without sufficient genomic samples. Here, we introduce a statistical framework to construct regularized ensemble PRS, which allows us to combine a large number of candidate PRS models using only summary statistics from genome-wide association studies. We demonstrate its robust and substantial improvement over many existing PRS models in both within- and cross-ancestry applications. We believe this is truly "one score to rule them all" due to its capability to continuously combine newly developed PRS models with existing models to improve prediction performance, which makes it a universal approach that should always be employed in future PRS applications.
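The central constraint in the abstract is combining candidate PRS models with only GWAS summary statistics. A standard identity makes this possible, shown here as a generic illustration rather than the authors' estimator: for standardized genotypes with LD matrix R and a standardized trait, a linear score with SNP weights b has covariance bᵀr with the trait (r holds the marginal SNP-trait correlations) and variance bᵀRb, so its R² is (bᵀr)² / (bᵀRb). An ensemble Bw of K candidate weight vectors is itself a linear score, and a ridge-style choice of mixing weights solves the K×K system (BᵀRB + λI)w = Bᵀr. All correlations, LD values, candidate weights, and λ below are hypothetical; in practice they would come from separate tuning summary statistics and an LD reference panel.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(mat, vec):
    return [dot(row, vec) for row in mat]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def summary_r2(b, r, ld):
    """Estimated R^2 of the score with weights b, from correlations r and LD matrix ld."""
    return dot(b, r) ** 2 / dot(b, matvec(ld, b))


def ridge_ensemble(candidates, r, ld, lam):
    """Mixing weights w solving (B^T R B + lam I) w = B^T r; candidates are B's columns."""
    k = len(candidates)
    gram = [[dot(candidates[i], matvec(ld, candidates[j])) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    return solve(gram, [dot(c, r) for c in candidates])


r = [0.05, 0.03, 0.02]                                    # hypothetical marginal correlations
ld = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.1], [0.0, 0.1, 1.0]]  # hypothetical LD matrix
cands = [[0.04, 0.0, 0.02], [0.01, 0.03, 0.0]]            # two hypothetical PRS weight vectors

w = ridge_ensemble(cands, r, ld, lam=1e-4)
combined = [sum(wk * c[j] for wk, c in zip(w, cands)) for j in range(len(r))]
print([summary_r2(c, r, ld) for c in cands], summary_r2(combined, r, ld))
```

With λ = 0 the mixing weights maximize the summary-statistic R² over all linear combinations of the candidates, so the ensemble can never score below its best member on the data used to fit it; the penalty λ trades some of that in-sample gain for stability when many correlated candidates are combined.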
Affiliation(s)
- Zijie Zhao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Stephen Dorn
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Yuchang Wu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Xiaoyu Yang
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Jin Jin
  - Department of Biostatistics, Epidemiology and Bioinformatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI
7. Sears TJ, Pagadala MS, Castro A, Lee KH, Kong J, Tanaka K, Lippman SM, Zanetti M, Carter H. Integrated Germline and Somatic Features Reveal Divergent Immune Pathways Driving Response to Immune Checkpoint Blockade. Cancer Immunol Res 2024; 12:1780-1795. [PMID: 39255339 PMCID: PMC11612627 DOI: 10.1158/2326-6066.cir-24-0164] [Received: 02/16/2024] [Revised: 06/13/2024] [Accepted: 09/06/2024] [Indexed: 09/12/2024]
Abstract
Immune checkpoint blockade (ICB) has revolutionized cancer treatment; however, the mechanisms determining patient response remain poorly understood. Here, we used machine learning to predict ICB response from germline and somatic biomarkers and interpreted the learned model to uncover putative mechanisms driving superior outcomes. Patients with higher infiltration of T follicular helper cells responded even in the presence of defects in MHC class I (MHC-I). Further investigation revealed that ICB responses differed depending on whether they were reliant on MHC-I or MHC-II neoantigens. Despite similar response rates, MHC II-reliant responses were associated with significantly longer durable clinical benefit (discovery: median overall survival of 63.6 vs. 34.5 months; P = 0.0074; validation: median overall survival of 37.5 vs. 33.1 months; P = 0.040). Characteristics of the tumor immune microenvironment reflected MHC neoantigen reliance, and analysis of immune checkpoints revealed LAG3 as a potential target in MHC II-reliant but not MHC I-reliant responses. This study highlights the value of interpretable machine learning models in elucidating the biological basis of therapy responses.
Affiliation(s)
- Timothy J. Sears
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California
- Meghana S. Pagadala
  - Biomedical Sciences Program, University of California San Diego, La Jolla, California
- Andrea Castro
  - Tumour Immunogenomics and Immunosurveillance Laboratory, University College London Cancer Institute, London, United Kingdom
- Ko-han Lee
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California
- JungHo Kong
  - Division of Genomics and Precision Medicine, Department of Medicine, University of California San Diego, La Jolla, California
- Kairi Tanaka
  - School of Biological Sciences, University of California San Diego, La Jolla, California
- Scott M. Lippman
  - Moores Cancer Center, University of California San Diego, La Jolla, California
- Maurizio Zanetti
  - Moores Cancer Center, University of California San Diego, La Jolla, California
  - The Laboratory of Immunology, Moores Cancer Center and Department of Medicine, University of California San Diego, La Jolla, California
- Hannah Carter
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California
  - Moores Cancer Center, University of California San Diego, La Jolla, California
  - The Laboratory of Immunology, Moores Cancer Center and Department of Medicine, University of California San Diego, La Jolla, California
8. Rajagopalan RM, D'Antonio M, Fujimura JH. Enhancing Equity in Genomics: Incorporating Measures of Structural Racism, Discrimination, and Social Determinants of Health. Hastings Cent Rep 2024; 54 Suppl 2:S31-S40. [PMID: 39707937 DOI: 10.1002/hast.4927] [Indexed: 12/23/2024]
Abstract
The everyday harms of structural racism and discrimination, perpetuated through institutions, laws, policies, and practices, constitute social determinants of health, but measures that account for their debilitating effects are largely missing in genetic studies of complex diseases. Drawing on insights from the social sciences and public health, we propose critical methodologies for incorporating tools that measure structural racism and discrimination within genetic analyses. We illustrate how including these measures may strengthen the accuracy and utility of findings for diverse communities, clarify elusive relationships between genetics and environment in a racialized society, and support greater equity within genomics and precision health research. This approach may also support efforts to build and sustain vital partnerships with communities and with other fields of research inquiry, centering community expertise and lived experiences and drawing on valuable knowledge from practitioners in the social sciences and public health to innovate biomedical and genomic study designs aimed at community health priorities.
9. Liu K, Liao C. Examining the importance of neighborhood natural, and built environment factors in predicting older adults' mental well-being: An XGBoost-SHAP approach. ENVIRONMENTAL RESEARCH 2024; 262:119929. [PMID: 39251179 DOI: 10.1016/j.envres.2024.119929] [Received: 07/05/2024] [Revised: 09/01/2024] [Accepted: 09/03/2024] [Indexed: 09/11/2024]
Abstract
BACKGROUND Previous studies have shown that urban neighborhood environmental factors significantly influence the health outcomes of urban older adults. However, most cross-sectional studies exploring the health effects of these factors have failed to quantify the relative importance of each factor. METHODS We use the XGBoost machine learning technique and SHapley Additive exPlanations (SHAP) to rank the importance of urban neighborhood environmental factors in shaping the mental health of urban older adults. To address self-selection bias in housing choice, we distinguish older adults living in private housing from those living in public housing, as residents in private housing have more freedom to choose where to live. RESULTS The results show that both natural and built environmental factors in urban neighborhoods are important predictors of mental well-being scores. Five natural environmental factors (blue space, perceived greenery quantity, NDVI, street view greenness, aesthetic quality) and three built environmental factors (physical activity facilities quality, physical activity facilities quantity, neighborhood disorder) had considerable predictive power for mental well-being scores in both groups. Among them, blue space, perceived greenery quantity, and street view greenness quantity became less important after controlling for self-selection bias, possibly because of the unequal distribution of quantity and quality; in addition, neighborhood disorder, aesthetic quality, and physical activity facilities quality were more sensitive predictors in public housing. CONCLUSIONS These results highlight the nuanced and differential effects of neighborhood environmental exposures on mental well-being outcomes, depending on housing preferences. The results of this study can provide support for decision makers in urban planning, landscape design, and environmental management seeking to improve the mental well-being of urban older adults.
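XGBoost-SHAP studies usually rank predictors globally by averaging each feature's absolute SHAP value over all individuals. The sketch below shows only that aggregation step. The SHAP matrices are hypothetical (real ones would come from a SHAP explainer applied to the fitted XGBoost model), and the feature names are borrowed from the abstract purely as labels. Computing the ranking separately for the private-housing and public-housing subsamples is one simple way to compare importance between groups, in the spirit of the study's stratified design.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across individuals (rows)."""
    n = len(shap_rows)
    means = [sum(abs(row[j]) for row in shap_rows) / n
             for j in range(len(feature_names))]
    return sorted(zip(feature_names, means), key=lambda t: t[1], reverse=True)


features = ["blue space", "NDVI", "neighborhood disorder"]
# Hypothetical SHAP values (rows = older adults, columns = features), one matrix
# per housing group.
shap_private = [[0.30, -0.10, -0.05], [-0.20, 0.15, 0.00], [0.10, 0.05, -0.10]]
shap_public = [[0.05, -0.10, -0.40], [-0.02, 0.12, 0.35], [0.01, 0.08, -0.30]]

rankings = {
    "private": rank_by_mean_abs_shap(shap_private, features),
    "public": rank_by_mean_abs_shap(shap_public, features),
}
for group, ranked in rankings.items():
    print(group, [name for name, _ in ranked])
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down (as "blue space" does in the toy private-housing rows) would otherwise look unimportant because its signed SHAP values cancel.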
Affiliation(s)
- Kaijun Liu
- Institute of Chengdu-Chongqing Economic Zone Development, Chongqing Technology and Business University, Chongqing, 400067, China.
- Changni Liao
- Chongqing Nursing Vocational College, Chongqing, 402760, China
10
Lück S, Scholz U, Douchkov D. Introducing GWAStic: a user-friendly, cross-platform solution for genome-wide association studies and genomic prediction. BIOINFORMATICS ADVANCES 2024; 4:vbae177. [PMID: 39678203 PMCID: PMC11643344 DOI: 10.1093/bioadv/vbae177] [Received: 06/18/2024] [Revised: 10/23/2024] [Accepted: 11/09/2024] [Indexed: 12/17/2024]
Abstract
Motivation Advances in genomics have created a pressing need for accessible tools that simplify complex genetic data analysis, enabling researchers across fields to harness genome-wide association studies and genomic prediction. GWAStic was developed to bridge this gap, providing an intuitive platform that combines artificial intelligence with traditional statistical methods and makes sophisticated genomic analysis accessible without requiring deep expertise in statistical software. Results We present GWAStic, an intuitive, cross-platform desktop application designed to streamline genome-wide association studies and genomic prediction for biological and medical researchers. Through a user-friendly graphical interface, GWAStic integrates machine learning and traditional statistical approaches to support genetic analysis. The application accepts standard text-based Variant Call Format (VCF) files and PLINK binary files as input and generates clear graphical outputs, including Manhattan plots, quantile-quantile plots, and genomic prediction correlation plots, to aid data visualization and analysis. Availability and implementation Project page: https://github.com/snowformatics/gwastic_desktop; GWAStic documentation: https://snowformatics.gitbook.io/product-docs; PyPI: https://pypi.org/project/gwastic-desktop/.
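The Manhattan and quantile-quantile plots mentioned above are built from per-variant p-values. The sketch below computes the usual plot coordinates and the genomic inflation factor λGC from them. It is a generic illustration using only the Python standard library, not GWAStic's implementation, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, median


def manhattan_y(pvalues):
    """Manhattan-plot heights: -log10(p) for each variant, in input order."""
    return [-math.log10(p) for p in pvalues]


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs, sorted from smallest to largest p.

    Expected values are the uniform quantiles (i + 0.5) / n under the null.
    """
    n = len(pvalues)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(sorted(pvalues))]


def genomic_inflation(pvalues):
    """lambda_GC: median 1-df chi-square statistic over its null median (~0.455).

    A two-sided p-value maps to chi-square = z^2 with z = Phi^-1(1 - p/2).
    """
    z = NormalDist().inv_cdf(1 - median(pvalues) / 2)
    null_median = NormalDist().inv_cdf(0.75) ** 2
    return z * z / null_median


# Perfectly uniform p-values should give lambda_GC = 1 and a diagonal QQ plot
null_p = [(i + 0.5) / 10 for i in range(10)]
print(round(genomic_inflation(null_p), 6))  # 1.0
```

Values of λGC well above 1 usually point to population stratification or other confounding rather than true polygenic signal, which is why QQ plots are inspected alongside Manhattan plots.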
Affiliation(s)
- Stefanie Lück
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, D-06466 Seeland, Germany
- Uwe Scholz
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, D-06466 Seeland, Germany
- Dimitar Douchkov
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, D-06466 Seeland, Germany
11
He R, Fu J, Ren J, Pan W. Trait imputation enhances nonlinear genetic prediction for some traits. Genetics 2024; 228:iyae148. [PMID: 39255064 DOI: 10.1093/genetics/iyae148] [Received: 07/10/2024] [Revised: 08/29/2024] [Accepted: 09/03/2024] [Indexed: 09/12/2024]
Abstract
The vast collections of genetic and phenotypic data in biobanks offer an unprecedented opportunity for biomedical research. However, frequently missing phenotypes present a significant barrier to fully leveraging this potential. In our target application, we have, on the one hand, only a small complete dataset with both genotypes and phenotypes for building a genetic prediction model, commonly called a polygenic (risk) score (PGS or PRS); on the other hand, we have a large dataset of genotypes (e.g. from a biobank) without the phenotype of interest. Our goal is to use the large genotype dataset (which lacks the phenotype) together with a separate genome-wide association study (GWAS) summary dataset for the phenotype to impute phenotypes, which are then used as an individual-level dataset, along with the small complete dataset, to build a nonlinear model as a PGS. Specifically, we trained several nonlinear models on 7 imputed and observed phenotypes from the UK Biobank. We then trained an ensemble model to integrate these models for each trait, obtaining higher prediction R² values than when using only the small complete (observed) dataset. Additionally, for 2 of the 7 traits, the nonlinear model trained with the imputed traits achieved a higher R² than using the imputed traits directly as the PGS, whereas no improvement was found for the remaining 5 traits. These findings demonstrate the potential of leveraging existing genetic data and accounting for nonlinear genetic relationships to improve prediction accuracy for some traits.
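The ensemble step described above combines several models' predictions for each trait and compares the results by R². The following is a minimal sketch that assumes just two hypothetical prediction vectors (for example, one from a model trained on the small observed dataset and one from a model trained on imputed phenotypes), blended with a single convex weight chosen by grid search. The abstract does not specify the form of the authors' ensemble model.

```python
def r_squared(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, yhat))
    return 1.0 - ss_res / ss_tot


def blend_two(y, pred_a, pred_b, steps=100):
    """Grid-search w in [0, 1] maximizing R^2 of w * pred_a + (1 - w) * pred_b."""
    best_w, best_r2 = 0.0, float("-inf")
    for k in range(steps + 1):
        w = k / steps
        blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        r2 = r_squared(y, blended)
        if r2 > best_r2:
            best_w, best_r2 = w, r2
    return best_w, best_r2


# Hypothetical phenotypes and two models that err in opposite directions
y = [1.0, 2.0, 3.0, 4.0]
pred_observed_only = [v + 1.0 for v in y]
pred_imputed = [v - 1.0 for v in y]
print(round(r_squared(y, pred_observed_only), 6))  # 0.2
print(blend_two(y, pred_observed_only, pred_imputed))  # (0.5, 1.0)
```

Choosing the weight on the same data used to report R² would overstate performance. In practice, the weight would be tuned on a validation split and R² reported on a separate test split.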
Affiliation(s)
- Ruoyu He
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, USA
- School of Statistics, University of Minnesota, Minneapolis, MN 55414, USA
- Jinwen Fu
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, USA
- School of Statistics, University of Minnesota, Minneapolis, MN 55414, USA
- Jingchen Ren
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, USA
- School of Statistics, University of Minnesota, Minneapolis, MN 55414, USA
- Wei Pan
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, USA
12
Robertson AM, Piggott JJ, Penk MR. Improving multiple stressor-response models through the inclusion of nonlinearity and interactions among stressor gradients. ENVIRONMENTAL MONITORING AND ASSESSMENT 2024; 196:1026. [PMID: 39373764 DOI: 10.1007/s10661-024-13169-x] [Received: 06/04/2024] [Accepted: 09/24/2024] [Indexed: 10/08/2024]
Abstract
Stressor-response models are used to detect and predict changes within ecosystems in response to anthropogenic and naturally occurring stressors. Although nonlinear stressor-response relationships and interactions between stressors are common in nature, predictive models often omit them because their results are perceived as difficult to interpret. We used Irish river monitoring data from 177 river sites to investigate whether multiple stressor-response models can be improved by accounting for nonlinearity, interactions in stressor-response relationships, and environmental context dependencies. Of the six models of distinct biological responses, five benefited from the inclusion of nonlinearity, and all six benefited from the inclusion of interactions. Including nonlinearity reveals the exponential increase in the Trophic Diatom Index (TDI3) as phosphorus increases, suggesting that ecological conditions deteriorate faster as phosphorus rises. Furthermore, our results show that a stressor-response relationship can depend on other variables, as seen in the interaction of elevation with both siltation and nutrients in relation to Ephemeroptera, Plecoptera and Trichoptera (EPT) richness. Both relationships weakened at higher elevations, perhaps indicating a decreased capacity for resilience to stressors at lower elevations due to greater cumulative effects. Understanding such interactions is vital to managing ecosystems. Our findings provide empirical support for further developing and employing more complex modelling techniques in environmental assessment and management.
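An interaction between a stressor and a modifier such as elevation means that the stressor's effect is not a single number. The sketch below assumes a linear model with one product term and made-up coefficients (not estimates from the Irish river data) to show how a negative siltation effect on EPT richness can weaken as elevation rises.

```python
from dataclasses import dataclass


@dataclass
class InteractionModel:
    """response = b0 + b1*stressor + b2*modifier + b3*stressor*modifier."""
    b0: float
    b1: float
    b2: float
    b3: float

    def predict(self, stressor: float, modifier: float) -> float:
        return (self.b0 + self.b1 * stressor + self.b2 * modifier
                + self.b3 * stressor * modifier)

    def stressor_slope(self, modifier: float) -> float:
        # d(response)/d(stressor) = b1 + b3 * modifier, so it varies with the modifier
        return self.b1 + self.b3 * modifier


# Hypothetical coefficients: EPT richness vs. siltation, elevation in units of 100 m
model = InteractionModel(b0=20.0, b1=-2.0, b2=0.5, b3=0.5)
for elevation in (0.0, 2.0, 4.0):
    print(elevation, model.stressor_slope(elevation))
# 0.0 -2.0
# 2.0 -1.0
# 4.0 0.0
```

A nonlinear stressor effect can be added in the same way, for example with a squared stressor term, at the cost of one more coefficient to interpret.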
Affiliation(s)
- Aoife M Robertson
- School of Natural Sciences, Trinity College Dublin, The University of Dublin, Dublin, Ireland.
- Jeremy J Piggott
- School of Natural Sciences, Trinity College Dublin, The University of Dublin, Dublin, Ireland
- Marcin R Penk
- School of Natural Sciences, Trinity College Dublin, The University of Dublin, Dublin, Ireland
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
13
Hrytsenko Y, Spitzer BW, Wang H, Bertisch SM, Taylor K, Garcia-Bedoya O, Ramos AR, Daviglus ML, Gallo LC, Isasi C, Cai J, Qi Q, Alcantara C, Redline S, Sofer T. Obstructive sleep apnea mediates genetic risk of Diabetes Mellitus: The Hispanic Community Health Study/Study of Latinos. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.09.10.24313336. [PMID: 39314966 PMCID: PMC11419195 DOI: 10.1101/2024.09.10.24313336] [Indexed: 09/25/2024]
Abstract
Objective We sought to evaluate whether obstructive sleep apnea (OSA) and other sleep disorders increase the genetic risk of developing diabetes mellitus (DM). Research Design and Methods Using GWAS summary statistics from the DIAGRAM consortium and the Million Veteran Program, we developed multi-ancestry type 2 diabetes (T2D) polygenic risk scores (T2D-PRSs) suitable for admixed Hispanic/Latino individuals. We estimated the association of the T2D-PRS with cross-sectional and incident DM in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We conducted a mediation analysis with the T2D-PRS as the exposure, incident DM as the outcome, and OSA as the mediator. Additionally, we performed Mendelian randomization (MR) analysis to assess the causal relationship between T2D and OSA. Results Of 12,342 HCHS/SOL participants, at baseline 48.4% were normoglycemic, 36.6% were hyperglycemic, and 15% had diabetes; 50.9% identified as female. Mean age was 41.5 years, and mean BMI was 29.4. The T2D-PRS was strongly associated with both baseline and incident DM. A 1 SD increase in the primary T2D-PRS was associated with baseline DM (adjusted odds ratio (OR) = 2.67, 95% CI [2.40; 2.97]) and with a higher incident DM rate (incidence rate ratio (IRR) = 2.02, 95% CI [1.75; 2.33]). In an analysis stratified by OSA severity category, the associations were stronger in individuals with mild OSA than in those with moderate to severe OSA. Mediation analysis suggested that OSA mediates the association of the T2D-PRS with DM. In two-sample MR analysis, T2D-PRS had a causal effect on OSA (OR = 1.03, 95% CI [1.01; 1.05]), and OSA had a causal effect on T2D (OR = 2.34, 95% CI [1.59; 3.44]). Conclusions OSA likely mediates genetic effects on T2D.
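The two-sample MR results above are reported as odds ratios. The abstract does not name the estimator. The sketch below assumes the common fixed-effect inverse-variance-weighted (IVW) estimator, which combines per-variant Wald ratios from summary statistics, and uses invented per-variant effects purely for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate (log-odds scale for a binary outcome).

    beta = sum(bx * by / se^2) / sum(bx^2 / se^2) and se = sqrt(1 / sum(bx^2 / se^2)),
    i.e. a weighted average of the per-variant Wald ratios by / bx.
    """
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(bx * by / (se * se)
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    total = sum(weights)
    return numerator / total, math.sqrt(1.0 / total)


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an OR with a 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical per-variant effects; every variant has Wald ratio 0.5
bx = [0.10, 0.20, 0.30]
by = [0.05, 0.10, 0.15]
se_y = [0.01, 0.02, 0.01]
beta, se = ivw_estimate(bx, by, se_y)
print(round(beta, 6), round(se, 6))  # 0.5 0.030151
print(odds_ratio_ci(beta, se))
```

Read in reverse, the reported OR of 2.34 for OSA on T2D corresponds to a log-odds estimate of ln 2.34 ≈ 0.85.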
Affiliation(s)
- Yana Hrytsenko
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA
- Brian W. Spitzer
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA
- Heming Wang
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Suzanne M. Bertisch
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Kent Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Olga Garcia-Bedoya
- Division of Academic Internal Medicine and Geriatrics, College of Medicine, University of Illinois Chicago, Chicago, Illinois, USA
- Alberto R Ramos
- Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, USA
- Martha L. Daviglus
- Institute for Minority Health Research, Department of Medicine, College of Medicine, University of Illinois Chicago, Chicago, IL, USA
- Linda C Gallo
- Department of Psychology, San Diego State University, San Diego, CA, USA
- Carmen Isasi
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Qibin Qi
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
- Susan Redline
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Tamar Sofer
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
14
Bhatt IS, Raygoza Garay JA, Bhagavan SG, Ingalls V, Dias R, Torkamani A. Polygenic Risk Score-Based Association Analysis Identifies Genetic Comorbidities Associated with Age-Related Hearing Difficulty in Two Independent Samples. J Assoc Res Otolaryngol 2024; 25:387-406. [PMID: 38782831 PMCID: PMC11349729 DOI: 10.1007/s10162-024-00947-0] [Received: 01/16/2024] [Accepted: 04/18/2024] [Indexed: 05/25/2024]
Abstract
PURPOSE Age-related hearing loss is the most common form of permanent hearing loss and is associated with various health traits, including Alzheimer's disease, cognitive decline, and depression. The present study aims to identify genetic comorbidities of age-related hearing loss. Past genome-wide association studies have identified multiple genomic loci involved in common adult-onset health traits. Polygenic risk scores (PRS) can summarize polygenic inheritance and quantify genetic susceptibility to complex traits independent of trait expression. The present study conducted a PRS-based association analysis of age-related hearing difficulty in the UK Biobank sample (N = 425,240), followed by a replication analysis using hearing thresholds (HTs) and distortion-product otoacoustic emissions (DPOAEs) in 242 young adults with self-reported normal hearing. We hypothesized that young adults with genetic comorbidities associated with age-related hearing difficulty would exhibit subclinical decline in HTs and DPOAEs in both ears. METHODS A total of 111,243 participants in the UK Biobank sample (> 40 years) reported age-related hearing difficulty. PRS models were derived from the polygenic risk score catalog, yielding 2627 PRS predictors across the health spectrum. HTs (0.25-16 kHz) and DPOAEs (1-16 kHz, L1/L2 = 65/55 dB SPL, F2/F1 = 1.22) were measured in 242 young adults. Saliva-derived DNA samples underwent low-pass whole genome sequencing, followed by genome-wide imputation and PRS calculation. Logistic regression analyses were performed to identify PRS predictors of age-related hearing difficulty in the UK Biobank cohort, and linear mixed model analyses were performed to identify PRS predictors of HTs and DPOAEs. RESULTS The PRS-based association analysis identified 977 PRS predictors across the health spectrum associated with age-related hearing difficulty. Hearing difficulty and hearing aid use PRS predictors showed the strongest association with the age-related hearing difficulty phenotype. Young adults with a higher genetic predisposition to hearing difficulty showed a subclinical elevation in HTs and a decline in DPOAEs in both ears. PRS predictors associated with age-related hearing difficulty were enriched for mental health, lifestyle, metabolic, sleep, reproductive, digestive, respiratory, hematopoietic, and immune traits. Fifty PRS predictors belonging to various trait categories were replicated for HTs and DPOAEs in both ears. CONCLUSION The study identified genetic comorbidities associated with age-related hearing loss across the health spectrum. Young adults with a high genetic predisposition to age-related hearing difficulty and other related complex traits could exhibit subclinical decline in HTs and DPOAEs decades before clinically meaningful age-related hearing loss is observed. We posit that effective communication of genetic risk, promotion of a healthy lifestyle, and reduced exposure to environmental risk factors at younger ages could help prevent or delay the onset of age-related hearing difficulty at older ages.
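In its simplest form, the PRS calculation step above is a weighted sum of effect-allele dosages. The following is a minimal sketch that assumes a scoring file already reduced to a mapping from hypothetical variant IDs to per-allele weights, together with imputed dosages in [0, 2]. Real pipelines must also align alleles and strands between the score file and the genotype data, which is omitted here.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Sum of weight * effect-allele dosage over variants present in the score.

    Variants missing from an individual's data are skipped (a simplification).
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def standardize(scores):
    """Z-score PRS values so that effects can be reported per 1 SD increase."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical score file and three individuals' imputed dosages
weights = {"var1": 0.20, "var2": -0.10, "var3": 0.05}
people = [
    {"var1": 2.0, "var2": 1.0, "var3": 0.0},
    {"var1": 1.0, "var2": 0.0, "var3": 1.0},
    {"var1": 0.0, "var2": 2.0, "var3": 1.0},
]
raw = [polygenic_score(p, weights) for p in people]
print([round(s, 3) for s in raw])  # [0.3, 0.25, -0.15]
print(standardize(raw))
```

The standardized scores are what would enter a downstream regression, such as the logistic and linear mixed models described in the abstract, as a per-SD predictor.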
Affiliation(s)
- Ishan Sunilkumar Bhatt
- Department of Communication Sciences & Disorders, University of Iowa, 250 Hawkins Dr, Iowa City, IA, 52242, USA.
- Juan Antonio Raygoza Garay
- Department of Communication Sciences & Disorders, University of Iowa, 250 Hawkins Dr, Iowa City, IA, 52242, USA
- Holden Comprehensive Cancer Center, University of Iowa, Iowa City, IA, 52242, USA
- Srividya Grama Bhagavan
- Department of Communication Sciences & Disorders, University of Iowa, 250 Hawkins Dr, Iowa City, IA, 52242, USA
- Valerie Ingalls
- Department of Communication Sciences & Disorders, University of Iowa, 250 Hawkins Dr, Iowa City, IA, 52242, USA
- Raquel Dias
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32608, USA
- Ali Torkamani
- Department of Integrative Structural and Computational Biology, Scripps Science Institute, La Jolla, CA, 92037, USA
15
Elias P, Jain SS, Poterucha T, Randazzo M, Lopez Jimenez F, Khera R, Perez M, Ouyang D, Pirruccello J, Salerno M, Einstein AJ, Avram R, Tison GH, Nadkarni G, Natarajan V, Pierson E, Beecy A, Kumaraiah D, Haggerty C, Avari Silva JN, Maddox TM. Artificial Intelligence for Cardiovascular Care-Part 1: Advances: JACC Review Topic of the Week. J Am Coll Cardiol 2024; 83:2472-2486. [PMID: 38593946 DOI: 10.1016/j.jacc.2024.03.400] [Received: 03/01/2024] [Accepted: 03/14/2024] [Indexed: 04/11/2024]
Abstract
Recent artificial intelligence (AI) advancements in cardiovascular care offer potential enhancements in diagnosis, treatment, and outcomes. Innovations to date focus on automating measurements, enhancing image quality, and detecting diseases using novel methods. Applications span wearables, electrocardiograms, echocardiography, angiography, genetics, and more. AI models detect diseases from electrocardiograms, including reduced ejection fraction, valvular heart disease, and other cardiomyopathies, with an accuracy not previously achieved by technology or human experts. However, AI's unique characteristics necessitate rigorous validation that addresses training methods, real-world efficacy, equity concerns, and long-term reliability. Despite an exponentially growing number of studies in cardiovascular AI, trials showing improved outcomes remain lacking, although a number are currently underway. Embracing this rapidly evolving technology while setting a high evaluation benchmark will be crucial for cardiology to leverage AI to enhance patient care and the provider experience.
Affiliation(s)
- Pierre Elias
- Seymour, Paul and Gloria Milstein Division of Cardiology, Columbia University Irving Medical Center, New York, New York, USA; Department of Biomedical Informatics Columbia University Irving Medical Center, New York, New York, USA
- Sneha S Jain
- Division of Cardiology, Stanford University School of Medicine, Palo Alto, California, USA
- Timothy Poterucha
- Seymour, Paul and Gloria Milstein Division of Cardiology, Columbia University Irving Medical Center, New York, New York, USA
- Michael Randazzo
- Division of Cardiology, University of Chicago Medical Center, Chicago, Illinois, USA
- Rohan Khera
- Division of Cardiology, Yale School of Medicine, New Haven, Connecticut, USA
- Marco Perez
- Division of Cardiology, Stanford University School of Medicine, Palo Alto, California, USA
- David Ouyang
- Division of Cardiology, Cedars-Sinai Medical Center, Los Angeles, California, USA
- James Pirruccello
- Division of Cardiology, University of California-San Francisco, San Francisco, California, USA
- Michael Salerno
- Division of Cardiology, Stanford University School of Medicine, Palo Alto, California, USA
- Andrew J Einstein
- Seymour, Paul and Gloria Milstein Division of Cardiology, Columbia University Irving Medical Center, New York, New York, USA
- Robert Avram
- Division of Cardiology, Montreal Heart Institute, Montreal, Quebec, Canada
- Geoffrey H Tison
- Division of Cardiology, University of California-San Francisco, San Francisco, California, USA
- Girish Nadkarni
- Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Emma Pierson
- Department of Computer Science, Cornell Tech, New York, New York, USA
- Ashley Beecy
- NewYork-Presbyterian Health System, New York, New York, USA; Division of Cardiology, Weill Cornell Medical College, New York, New York, USA
- Deepa Kumaraiah
- Seymour, Paul and Gloria Milstein Division of Cardiology, Columbia University Irving Medical Center, New York, New York, USA; NewYork-Presbyterian Health System, New York, New York, USA
- Chris Haggerty
- Department of Biomedical Informatics Columbia University Irving Medical Center, New York, New York, USA; NewYork-Presbyterian Health System, New York, New York, USA
- Jennifer N Avari Silva
- Division of Cardiology, Washington University School of Medicine, St Louis, Missouri, USA
- Thomas M Maddox
- Division of Cardiology, Washington University School of Medicine, St Louis, Missouri, USA.
16
Gao Y, Cui Y. Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement. Genome Med 2024; 16:76. [PMID: 38835075 PMCID: PMC11149372 DOI: 10.1186/s13073-024-01345-0] [Received: 01/08/2024] [Accepted: 05/17/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND Accurate prediction of an individual's predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets. METHODS We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer's disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups. RESULTS Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations. CONCLUSIONS This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Yan Cui
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
- Center for Cancer Research, University of Tennessee Health Science Center, Memphis, TN, 38163, USA.
17
Hrytsenko Y, Shea B, Elgart M, Kurniansyah N, Lyons G, Morrison AC, Carson AP, Haring B, Mitchell BD, Psaty BM, Jaeger BC, Gu CC, Kooperberg C, Levy D, Lloyd-Jones D, Choi E, Brody JA, Smith JA, Rotter JI, Moll M, Fornage M, Simon N, Castaldi P, Casanova R, Chung RH, Kaplan R, Loos RJF, Kardia SLR, Rich SS, Redline S, Kelly T, O'Connor T, Zhao W, Kim W, Guo X, Ida Chen YD, Sofer T. Machine learning models for predicting blood pressure phenotypes by combining multiple polygenic risk scores. Sci Rep 2024; 14:12436. [PMID: 38816422 PMCID: PMC11139858 DOI: 10.1038/s41598-024-62945-9] [Received: 01/22/2024] [Accepted: 05/22/2024] [Indexed: 06/01/2024]
Abstract
We construct non-linear machine learning (ML) prediction models for systolic and diastolic blood pressure (SBP, DBP) using demographic and clinical variables and polygenic risk scores (PRSs). We develop a two-model ensemble consisting of a baseline model, in which prediction is based on demographic and clinical variables only, and a genetic model, which also includes PRSs. We evaluate a linear versus a non-linear model at both the baseline and genetic model levels and assess the improvement in performance when incorporating multiple PRSs. We report the ensemble model's performance as percentage variance explained (PVE) on a held-out test dataset. Compared with a linear baseline model, a non-linear baseline model improved the PVE from 28.1% to 30.1% (SBP) and from 14.3% to 17.4% (DBP). Including seven PRSs, computed from the largest available GWAS of SBP/DBP, in the genetic model improved the genetic model PVE from 4.8% to 5.1% (SBP) and from 4.7% to 5% (DBP) compared with using a single PRS. Adding 14 more PRSs computed from two independent GWASs further increased the genetic model PVE to 6.3% (SBP) and 5.7% (DBP). PVE differed across self-reported race/ethnicity groups, with non-White groups primarily benefiting from the inclusion of additional PRSs. In summary, non-linear ML models improve BP prediction in models incorporating diverse populations.
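The abstract reports a two-model ensemble scored by PVE but does not spell out how the two models are combined. The sketch below assumes one plausible arrangement: a hypothetical genetic model regresses the baseline model's residuals on a single PRS, and PVE is then computed for the genetic part and for the ensemble. The data are invented and the fit is in-sample, whereas the study reports PVE on held-out data.

```python
from statistics import mean


def pve(y, yhat):
    """Percentage of variance in y explained by predictions yhat."""
    mu = mean(y)
    ss_tot = sum((v - mu) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, yhat))
    return 100.0 * (1.0 - ss_res / ss_tot)


def fit_slope(x, r):
    """Least-squares slope and intercept of residuals r on a single score x."""
    mx, mr = mean(x), mean(r)
    sxx = sum((a - mx) ** 2 for a in x)
    sxr = sum((a - mx) * (b - mr) for a, b in zip(x, r))
    slope = sxr / sxx
    return slope, mr - slope * mx


def residual_ensemble(y, baseline_pred, prs):
    """Hypothetical two-stage ensemble: a PRS model is fit to what the baseline misses."""
    residuals = [v - p for v, p in zip(y, baseline_pred)]
    slope, intercept = fit_slope(prs, residuals)
    genetic_part = [intercept + slope * s for s in prs]
    ensemble = [p + g for p, g in zip(baseline_pred, genetic_part)]
    return pve(residuals, genetic_part), pve(y, ensemble)


# Invented data: blood pressure = baseline prediction + 2 * PRS, with no noise
baseline_pred = [10.0, 12.0, 11.0, 13.0]
prs = [0.0, 1.0, -1.0, 0.5]
y = [b + 2.0 * s for b, s in zip(baseline_pred, prs)]
print(round(pve(y, baseline_pred), 1), residual_ensemble(y, baseline_pred, prs))
# 56.6 (100.0, 100.0)
```

With several PRSs, the single-score slope would be replaced by a multiple regression or a non-linear learner fit on all scores.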
Collapse
Affiliation(s)
- Yana Hrytsenko
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Benjamin Shea
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Michael Elgart
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | | | - Genevieve Lyons
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Alanna C Morrison
- Department of Epidemiology, School of Public Health, Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - April P Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Bernhard Haring
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Department of Medicine III, Saarland University, Homburg, Saarland, Germany
| | - Braxton D Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Bruce M Psaty
- Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Health Systems and Population Health, University of Washington, Seattle, WA, USA
| | - Byron C Jaeger
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - C Charles Gu
- The Center for Biostatistics and Data Science, Washington University, St. Louis, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Daniel Levy
- The Population Sciences Branch of the National Heart, Lung and Blood Institute, Bethesda, MD, USA
- The Framingham Heart Study, Framingham, MA, USA
| | - Donald Lloyd-Jones
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
| | - Eunhee Choi
- Columbia Hypertension Laboratory, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
| | - Jennifer A Brody
- Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
| | - Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Jerome I Rotter
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Matthew Moll
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- VA Boston Healthcare System, West Roxbury, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
| | - Myriam Fornage
- Department of Epidemiology, School of Public Health, Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Noah Simon
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
| | - Peter Castaldi
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Ramon Casanova
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Taipei City, Taiwan
| | - Robert Kaplan
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty for Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Susan Redline
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Tanika Kelly
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Timothy O'Connor
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Health Equity and Population Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Wonji Kim
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
- Xiuqing Guo
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Yii-Der Ida Chen
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Tamar Sofer
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Center for Life Sciences CLS-934, 3 Blackfan St., Boston, MA, 02115, USA.
18
Ohta R, Tanigawa Y, Suzuki Y, Kellis M, Morishita S. A polygenic score method boosted by non-additive models. Nat Commun 2024; 15:4433. [PMID: 38811555 PMCID: PMC11522481 DOI: 10.1038/s41467-024-48654-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 05/08/2024] [Indexed: 05/31/2024]
Abstract
Dominance heritability in complex traits has received increasing recognition. However, most polygenic score (PGS) approaches do not incorporate non-additive effects. Here, we present GenoBoost, a flexible PGS modeling framework capable of considering both additive and non-additive effects, specifically focusing on genetic dominance. Building on statistical boosting theory, we derive provably optimal GenoBoost scores and provide an efficient implementation for analyzing large-scale cohorts. We benchmark it against seven commonly used PGS methods and demonstrate its competitive predictive performance. GenoBoost is ranked the best for four traits and second-best for three traits among twelve tested disease outcomes in UK Biobank. We reveal that GenoBoost improves prediction for autoimmune diseases by incorporating non-additive effects localized in the MHC locus and, more broadly, works best in less polygenic traits. We further demonstrate that GenoBoost can infer the mode of genetic inheritance without requiring prior knowledge. For example, GenoBoost finds non-zero genetic dominance effects for 602 of 900 selected genetic variants, resulting in 2.5% improvements in predicting psoriasis cases. Lastly, we show that GenoBoost can prioritize genetic loci with genetic dominance not previously reported in the GWAS catalog. Our results highlight the increased accuracy and biological insights from incorporating non-additive effects in PGS models.
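The difference between additive and dominance effects can be illustrated with a toy per-variant scoring rule. This is a minimal sketch, not GenoBoost itself (which is fitted by boosting); it assumes a common additive-plus-dominance parameterization in which each variant contributes an effect per effect allele plus a deviation applied only to heterozygotes. The function names and example effect sizes are hypothetical.

```python
from typing import Sequence


def variant_score(genotype: int, additive: float, dominance: float) -> float:
    """Contribution of one biallelic variant (hypothetical helper).

    genotype  -- number of effect alleles carried (0, 1 or 2)
    additive  -- effect per effect allele
    dominance -- extra effect for heterozygotes; 0.0 gives a purely additive model
    """
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    heterozygous = 1.0 if genotype == 1 else 0.0
    return additive * genotype + dominance * heterozygous


def non_additive_score(
    genotypes: Sequence[int],
    additive: Sequence[float],
    dominance: Sequence[float],
) -> float:
    """Sum the variant contributions across all variants in the score."""
    if not len(genotypes) == len(additive) == len(dominance):
        raise ValueError("genotypes and effect vectors must have equal length")
    return sum(
        variant_score(g, a, d) for g, a, d in zip(genotypes, additive, dominance)
    )


# Hypothetical example: only the second variant has a dominance deviation.
print(non_additive_score([0, 1, 2], [0.2, 0.1, -0.05], [0.0, 0.3, 0.0]))
```

Setting every dominance term to zero recovers an ordinary additive score, which is the sense in which a non-additive model extends, rather than replaces, the standard PGS.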
Affiliation(s)
- Rikifumi Ohta
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.
- Yosuke Tanigawa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.
19
Acosta A, Cifuentes L, Anazco D, O'Connor T, Hurtado M, Ghusn W, Campos A, Fansa S, McRae A, Madhusudhan S, Kolkin E, Ryks M, Harmsen W, Abu Dayyeh B, Hensrud D, Camilleri M. Unraveling the Variability of Human Satiation: Implications for Precision Obesity Management. RESEARCH SQUARE 2024:rs.3.rs-4402499. [PMID: 38826309 PMCID: PMC11142367 DOI: 10.21203/rs.3.rs-4402499/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Satiation is the physiologic process that regulates meal size and termination, and it is quantified by the calories consumed to reach satiation. Given its role in energy intake, changes in satiation contribute to obesity's pathogenesis. Our study employed a protocolized approach to study the components of food intake regulation including a standardized breakfast, a gastric emptying study, appetite sensation testing, and a satiation measurement by an ad libitum meal test. These studies revealed that satiation is highly variable among individuals, and while baseline characteristics, anthropometrics, body composition and hormones contribute to this variability, these factors do not fully account for it. To address this gap, we explored the role of a germline polygenic risk score, which demonstrated a robust association with satiation. Furthermore, we developed a machine-learning-assisted gene risk score to predict satiation and leveraged this prediction to anticipate responses to anti-obesity medications. Our findings underscore the significance of satiation, its inherent variability, and the potential of a genetic risk score to forecast it, ultimately allowing us to predict responses to different anti-obesity interventions.
20
Huang YJ, Chen CH, Yang HC. AI-enhanced integration of genetic and medical imaging data for risk assessment of Type 2 diabetes. Nat Commun 2024; 15:4230. [PMID: 38762475 PMCID: PMC11102564 DOI: 10.1038/s41467-024-48618-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 05/08/2024] [Indexed: 05/20/2024]
Abstract
Type 2 diabetes (T2D) presents a formidable global health challenge, highlighted by its escalating prevalence, underscoring the critical need for precision health strategies and early detection initiatives. Leveraging artificial intelligence, particularly eXtreme Gradient Boosting (XGBoost), we devise robust risk assessment models for T2D. Drawing upon comprehensive genetic and medical imaging datasets from 68,911 individuals in the Taiwan Biobank, our models integrate Polygenic Risk Scores (PRS), Multi-image Risk Scores (MRS), and demographic variables, such as age, sex, and T2D family history. Here, we show that our model achieves an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.94, effectively identifying high-risk T2D subgroups. A streamlined model featuring eight key variables also maintains a high AUC of 0.939. This high accuracy for T2D risk assessment promises to catalyze early detection and preventive strategies. Moreover, we introduce an accessible online risk assessment tool for T2D, facilitating broader applicability and dissemination of our findings.
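An AUC such as the 0.94 reported above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. A minimal, dependency-free sketch of that definition follows; it uses an O(n·m) pairwise comparison for clarity, whereas production code would typically use a rank-based or library implementation. The example labels and scores are made up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels -- 1 for cases, 0 for controls
    scores -- predicted risk; higher means more likely to be a case
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5  # ties count as half a correct ranking
    return concordant / (len(cases) * len(controls))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the AUC depends only on how cases and controls are ordered, any monotone transformation of the risk score leaves it unchanged.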
Affiliation(s)
- Yi-Jia Huang
- Institute of Public Health, National Yang-Ming Chiao-Tung University, Taipei, Taiwan
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Chun-Houh Chen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Hsin-Chou Yang
- Institute of Public Health, National Yang-Ming Chiao-Tung University, Taipei, Taiwan.
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan.
- Biomedical Translation Research Center, Academia Sinica, Taipei, Taiwan.
- Department of Statistics, National Cheng Kung University, Tainan, Taiwan.
21
Armoundas AA, Narayan SM, Arnett DK, Spector-Bagdady K, Bennett DA, Celi LA, Friedman PA, Gollob MH, Hall JL, Kwitek AE, Lett E, Menon BK, Sheehan KA, Al-Zaiti SS. Use of Artificial Intelligence in Improving Outcomes in Heart Disease: A Scientific Statement From the American Heart Association. Circulation 2024; 149:e1028-e1050. [PMID: 38415358 PMCID: PMC11042786 DOI: 10.1161/cir.0000000000001201] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Indexed: 02/29/2024]
Abstract
A major focus of academia, industry, and global governmental agencies is to develop and apply artificial intelligence and other advanced analytical tools to transform health care delivery. The American Heart Association supports the creation of tools and services that would further the science and practice of precision medicine by enabling more precise approaches to cardiovascular and stroke research, prevention, and care of individuals and populations. Nevertheless, several challenges exist, and few artificial intelligence tools have been shown to improve cardiovascular and stroke care sufficiently to be widely adopted. This scientific statement outlines the current state of the art on the use of artificial intelligence algorithms and data science in the diagnosis, classification, and treatment of cardiovascular disease. It also sets out to advance this mission, focusing on how digital tools and, in particular, artificial intelligence may provide clinical and mechanistic insights, address bias in clinical studies, and facilitate education and implementation science to improve cardiovascular and stroke outcomes. Last, a key objective of this scientific statement is to further the field by identifying best practices, gaps, and challenges for interested stakeholders.
22
Shah Y, Kulm S, Nauseef JT, Chen Z, Elemento O, Kensler KH, Sharaf RN. Benchmarking multi-ancestry prostate cancer polygenic risk scores in a real-world cohort. PLoS Comput Biol 2024; 20:e1011990. [PMID: 38598551 PMCID: PMC11034641 DOI: 10.1371/journal.pcbi.1011990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 04/22/2024] [Accepted: 03/11/2024] [Indexed: 04/12/2024]
Abstract
Prostate cancer is a heritable disease with ancestry-biased incidence and mortality. Polygenic risk scores (PRSs) offer promising advancements in predicting disease risk, including prostate cancer. While their accuracy continues to improve, research aimed at enhancing their effectiveness within African and Asian populations remains key for equitable use. Recent algorithmic developments for PRS derivation have resulted in improved pan-ancestral risk prediction for several diseases. In this study, we benchmark the predictive power of six widely used PRS derivation algorithms, four of which adjust for ancestry, against prostate cancer cases and controls from the UK Biobank and All of Us cohorts. We find modest improvement in discriminatory ability when compared with a simple method that prioritizes variants, clumping, and published polygenic risk scores. Our findings underscore the importance of improving upon risk prediction algorithms and the sampling of diverse cohorts.
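Clumping, one of the simple baselines mentioned above, usually refers to greedy LD clumping: variants passing a p-value threshold are visited from most to least significant, and each retained index variant removes nearby variants in linkage disequilibrium with it. The sketch below assumes the caller supplies a pairwise r² function (in practice estimated from a reference panel); the `Variant` type, the `ld` lookup and the threshold defaults are illustrative assumptions, not the settings used in this study.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Variant:
    vid: str       # variant identifier
    chrom: str     # chromosome
    pos: int       # base-pair position
    pvalue: float  # GWAS association p-value


def clump(
    variants: Iterable[Variant],
    r2: Callable[[str, str], float],
    p_threshold: float = 5e-8,
    r2_threshold: float = 0.1,
    window_bp: int = 250_000,
) -> List[Variant]:
    """Greedy LD clumping; returns the retained index variants."""
    candidates = sorted(
        (v for v in variants if v.pvalue <= p_threshold), key=lambda v: v.pvalue
    )
    removed: Set[str] = set()
    index_variants: List[Variant] = []
    for i, lead in enumerate(candidates):
        if lead.vid in removed:
            continue
        index_variants.append(lead)
        # Drop less significant variants in LD with the lead within the window.
        for other in candidates[i + 1:]:
            if (
                other.vid not in removed
                and other.chrom == lead.chrom
                and abs(other.pos - lead.pos) <= window_bp
                and r2(lead.vid, other.vid) >= r2_threshold
            ):
                removed.add(other.vid)
    return index_variants


def ld(a: str, b: str) -> float:
    # Hypothetical LD table: only v1 and v2 are correlated.
    return 0.8 if {a, b} == {"v1", "v2"} else 0.0


variants = [
    Variant("v1", "1", 1_000, 1e-10),
    Variant("v2", "1", 2_000, 1e-9),    # in LD with v1, removed
    Variant("v3", "1", 600_000, 1e-8),  # outside the window, kept
    Variant("v4", "2", 1_000, 1e-6),    # fails the p-value threshold
]
print([v.vid for v in clump(variants, ld)])  # ['v1', 'v3']
```

A clumping-and-thresholding score would then weight the retained index variants by their GWAS effect sizes.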
Affiliation(s)
- Yajas Shah
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York City, New York, United States of America
- Scott Kulm
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York City, New York, United States of America
- Jones T. Nauseef
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Medicine—Hematology and Medical Oncology, Weill Cornell Medicine, New York City, New York, United States of America
- Zhengming Chen
- Department of Population Health Sciences, Weill Cornell Medicine, New York City, New York, United States of America
- Olivier Elemento
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York City, New York, United States of America
- Kevin H. Kensler
- Department of Population Health Sciences, Weill Cornell Medicine, New York City, New York, United States of America
- Ravi N. Sharaf
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Population Health Sciences, Weill Cornell Medicine, New York City, New York, United States of America
- Department of Medicine–Gastroenterology and Hepatology, Weill Cornell Medicine, New York City, New York, United States of America
23
Fong WJ, Tan HM, Garg R, Teh AL, Pan H, Gupta V, Krishna B, Chen ZH, Purwanto NY, Yap F, Tan KH, Chan KYJ, Chan SY, Goh N, Rane N, Tan ESE, Jiang Y, Han M, Meaney M, Wang D, Keppo J, Tan GCY. Comparing feature selection and machine learning approaches for predicting CYP2D6 methylation from genetic variation. Front Neuroinform 2024; 17:1244336. [PMID: 38449836 PMCID: PMC10915285 DOI: 10.3389/fninf.2023.1244336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 10/18/2023] [Indexed: 03/08/2024]
Abstract
Introduction Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and consequently prescribing decisions. Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to CYP2D6 in children from the GUSTO cohort. Methods Buffy coat DNA methylation was quantified using the Illumina Infinium MethylationEPIC BeadChip. CpG sites associated with CYP2D6 were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 Mb of the CYP2D6 gene, and the impact of adding demographic data. The samples were split into training (75%) and test (25%) sets for validation. In the Elastic Net and XGBoost models, the hyperparameter search was done using 10-fold cross validation. Root Mean Square Error and R-squared values were obtained to investigate each model's performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified, several of which appeared to influence multiple CpG sites. Results Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites and a number of top variables were identified for each model.
Discussion The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.
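The two performance measures named in the Methods, Root Mean Square Error and R-squared, can be computed on the held-out test split as in this minimal sketch; the example methylation values are invented. Test-set R² is computed against the test-set mean, so it can be negative when a model predicts worse than that mean.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant outcome")
    return 1.0 - ss_res / ss_tot


# Invented methylation beta values for four test samples.
observed = [0.62, 0.70, 0.55, 0.81]
predicted = [0.60, 0.72, 0.58, 0.77]
print(round(rmse(observed, predicted), 4), round(r_squared(observed, predicted), 4))
```

RMSE is reported in the units of the outcome (here, methylation beta values), whereas R² is unitless, which is why the two are often reported together.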
Affiliation(s)
- Wei Jing Fong
- Computational Biology, National University of Singapore, Singapore, Singapore
- Hong Ming Tan
- Computational Biology, National University of Singapore, Singapore, Singapore
- Rishabh Garg
- Computational Biology, National University of Singapore, Singapore, Singapore
- Ai Ling Teh
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Hong Pan
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Varsha Gupta
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Bernadus Krishna
- Computational Biology, National University of Singapore, Singapore, Singapore
- Zou Hui Chen
- Computational Biology, National University of Singapore, Singapore, Singapore
- Fabian Yap
- KK Women's and Children's Hospital, Singapore, Singapore
- Kok Hian Tan
- KK Women's and Children's Hospital, Singapore, Singapore
- Duke NUS Medical School, Singapore, Singapore
- Kok Yen Jerry Chan
- KK Women's and Children's Hospital, Singapore, Singapore
- Duke NUS Medical School, Singapore, Singapore
- Shiao-Yng Chan
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- National University Hospital, Singapore, Singapore
- Nikita Rane
- Institute of Mental Health, Singapore, Singapore
- Mei Han
- Computational Biology, National University of Singapore, Singapore, Singapore
- Michael Meaney
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Dennis Wang
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Jussi Keppo
- Computational Biology, National University of Singapore, Singapore, Singapore
- Geoffrey Chern-Yee Tan
- Computational Biology, National University of Singapore, Singapore, Singapore
- Institute of Mental Health, Singapore, Singapore
24
Gunter NB, Gebre RK, Graff-Radford J, Heckman MG, Jack CR, Lowe VJ, Knopman DS, Petersen RC, Ross OA, Vemuri P, Ramanan VK. Machine Learning Models of Polygenic Risk for Enhanced Prediction of Alzheimer Disease Endophenotypes. Neurol Genet 2024; 10:e200120. [PMID: 38250184 PMCID: PMC10798228 DOI: 10.1212/nxg.0000000000200120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 11/01/2023] [Indexed: 01/23/2024]
Abstract
Background and Objectives Alzheimer disease (AD) has a polygenic architecture, for which genome-wide association studies (GWAS) have helped elucidate sequence variants (SVs) influencing susceptibility. Polygenic risk score (PRS) approaches show promise for generating summary measures of inherited risk for clinical AD based on the effects of APOE and other GWAS hits. However, existing PRS approaches, based on traditional regression models, explain only modest variation in AD dementia risk and AD-related endophenotypes. We hypothesized that machine learning (ML) models of polygenic risk (ML-PRS) could outperform standard regression-based PRS methods and therefore have the potential for greater clinical utility. Methods We analyzed combined data from the Mayo Clinic Study of Aging (n = 1,791) and the Alzheimer's Disease Neuroimaging Initiative (n = 864). An AD PRS was computed for each participant using the top common SVs obtained from a large AD dementia GWAS. In parallel, ML models were trained using those SV genotypes, with amyloid PET burden as the primary outcome. Secondary outcomes included amyloid PET positivity and clinical diagnosis (cognitively unimpaired vs impaired). We compared performance between ML-PRS and standard PRS across 100 training sessions with different data splits. In each session, data were split into 80% training and 20% testing, and then five-fold cross-validation was used within the training set to ensure the best model was produced for testing. We also applied permutation importance techniques to assess which genetic factors contributed most to outcome prediction. Results ML-PRS models outperformed the AD PRS (r² = 0.28 vs r² = 0.24 in test set) in explaining variation in amyloid PET burden. Among ML approaches, methods accounting for nonlinear genetic influences were superior to linear methods.
ML-PRS models were also more accurate when predicting amyloid PET positivity (area under the curve [AUC] = 0.80 vs AUC = 0.63) and the presence of cognitive impairment (AUC = 0.75 vs AUC = 0.54) compared with the standard PRS. Discussion We found that ML-PRS approaches improved upon standard PRS for prediction of AD endophenotypes, partly related to improved accounting for nonlinear effects of genetic susceptibility alleles. Further adaptations of the ML-PRS framework could help to close the gap of remaining unexplained heritability for AD and therefore facilitate more accurate presymptomatic and early-stage risk stratification for clinical decision-making.
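Permutation importance, mentioned in the Methods, scores a fitted model by how much a performance metric drops when one feature's values are shuffled across individuals, breaking that feature's link to the outcome. The sketch below is a generic, library-free version and not the implementation used in the study; `predict` and `score` are caller-supplied placeholders (higher score means better), and the toy data are invented.

```python
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def permutation_importance(
    predict: Callable[[Matrix], List[float]],
    X: Matrix,
    y: Sequence[float],
    score: Callable[[Sequence[float], Sequence[float]], float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in `score` when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy rows, replacing only feature j with its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def neg_mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Negative mean squared error, so that higher is better."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy model that uses only feature 0; feature 1 should get zero importance.
X = [[float(i), float(i % 3)] for i in range(10)]
y = [float(i) for i in range(10)]
print(permutation_importance(lambda M: [row[0] for row in M], X, y, neg_mse))
```

Because the model is never refitted, the method works with any trained predictor; evaluating it on held-out data, as in the 80/20 splits described above, avoids rewarding features the model merely overfit.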
Affiliation(s)
- Nathaniel B Gunter, Robel K Gebre, Jonathan Graff-Radford, Michael G Heckman, Clifford R Jack, Val J Lowe, David S Knopman, Ronald C Petersen, Owen A Ross, Prashanthi Vemuri, Vijay K Ramanan
- From the Departments of Radiology (N.B.G., R.K.G., C.R.J., V.J.L., P.V.), Neurology (J.G.-R., D.S.K., R.C.P., V.K.R.), and Quantitative Health Sciences (R.C.P.), Mayo Clinic Rochester, MN; and Departments of Quantitative Health Sciences (M.G.H.), Neuroscience (O.A.R.), and Clinical Genomics (O.A.R.), Mayo Clinic Florida, Jacksonville
25
Laufer I, Mizrahi D, Zuckerman I. Enhancing EEG-based attachment style prediction: unveiling the impact of feature domains. Front Psychol 2024; 15:1326791. [PMID: 38318079 PMCID: PMC10838989 DOI: 10.3389/fpsyg.2024.1326791] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/07/2023] [Accepted: 01/04/2024] [Indexed: 02/07/2024]
Abstract
Introduction Attachment styles are crucial in human relationships and have been explored through neurophysiological responses and EEG data analysis. This study investigates the potential of EEG data in predicting and differentiating secure and insecure attachment styles, contributing to the understanding of the neural basis of interpersonal dynamics. Methods We engaged 27 participants in our study, employing an XGBoost classifier to analyze EEG data across various feature domains, including time-domain, complexity-based, and frequency-based attributes. Results The study found significant differences in the precision of attachment style prediction: a high precision rate of 96.18% for predicting insecure attachment, and a lower precision of 55.34% for secure attachment. Balanced accuracy metrics indicated an overall model accuracy of approximately 84.14%, taking into account dataset imbalances. Discussion These results highlight the challenges in using EEG patterns for attachment style prediction due to the complex nature of attachment insecurities. Individuals with heightened perceived insecurity predominantly aligned with the insecure attachment category, suggesting a link to their increased emotional reactivity and sensitivity to social cues. The study underscores the importance of time-domain features in prediction accuracy, followed by complexity-based features, while noting the lesser impact of frequency-based features. Our findings advance the understanding of the neural correlates of attachment and pave the way for future research, including expanding demographic diversity and integrating multimodal data to refine predictive models.
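Because the secure and insecure groups are unbalanced, per-class precision and balanced accuracy (the mean of per-class recall) can tell a different story from plain accuracy. The sketch below computes both from true and predicted labels; the ten labels are invented and do not reproduce the study's data.

```python
from typing import Dict, List, Sequence, Tuple


def precision_and_balanced_accuracy(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[Dict[str, float], float]:
    """Per-class precision and balanced accuracy (mean recall over true classes)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    precision: Dict[str, float] = {}
    recalls: List[float] = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        n_true = sum(1 for t in y_true if t == cls)
        precision[cls] = tp / n_pred if n_pred else 0.0
        if n_true:
            recalls.append(tp / n_true)
    return precision, sum(recalls) / len(recalls)


# Invented labels: 8 insecure and 2 secure participants.
truth = ["insecure"] * 8 + ["secure"] * 2
pred = ["insecure"] * 7 + ["secure"] + ["secure", "insecure"]
prec, bal_acc = precision_and_balanced_accuracy(truth, pred)
print(prec, bal_acc)  # {'insecure': 0.875, 'secure': 0.5} 0.6875
```

Plain accuracy on the same labels is 0.8 (8 of 10 correct), higher than the balanced accuracy, because the majority class dominates the count.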
Affiliation(s)
- Dor Mizrahi
- Department of Industrial Engineering and Management, Ariel University, Ariel, Israel
26
Sears T, Pagadala M, Castro A, Lee KH, Kong J, Tanaka K, Lippman S, Zanetti M, Carter H. Integrated germline and somatic features reveal divergent immune pathways driving ICB response. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.12.575430. [PMID: 38293085 PMCID: PMC10827124 DOI: 10.1101/2024.01.12.575430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Immune Checkpoint Blockade (ICB) has revolutionized cancer treatment; however, mechanisms determining patient response remain poorly understood. Here we used machine learning to predict ICB response from germline and somatic biomarkers and interpreted the learned model to uncover putative mechanisms driving superior outcomes. Patients with higher T follicular helper infiltrates were robust to defects in the class-I Major Histocompatibility Complex (MHC-I). Further investigation uncovered different ICB responses in MHC-I versus MHC-II neoantigen reliant tumors across patients. Despite similar response rates, MHC-II reliant responses were associated with significantly longer durable clinical benefit (Discovery: Median OS=63.6 vs. 34.5 months P=0.0074; Validation: Median OS=37.5 vs. 33.1 months, P=0.040). Characteristics of the tumor immune microenvironment reflected MHC neoantigen reliance, and analysis of immune checkpoints revealed LAG3 as a potential target in MHC-II but not MHC-I reliant responses. This study highlights the value of interpretable machine learning models in elucidating the biological basis of therapy responses.
Affiliation(s)
- Timothy Sears
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA USA
- Meghana Pagadala
- Biomedical Sciences Program, University of California San Diego, La Jolla, CA, USA
- Andrea Castro
- Tumour Immunogenomics and Immunosurveillance Laboratory, University College London Cancer Institute, London, UK
- Ko-han Lee
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA USA
- JungHo Kong
- Division of Genomics and Precision Medicine, Department of Medicine, University of California San Diego, La Jolla, CA USA
- Kairi Tanaka
- School of Biological Sciences, University of California San Diego, La Jolla, CA USA
- Scott Lippman
- Moores Cancer Center, University of California San Diego, La Jolla, CA USA
- Maurizio Zanetti
- Moores Cancer Center, University of California San Diego, La Jolla, CA USA
- The Laboratory of Immunology, Moores Cancer Center and Department of Medicine, University of California San Diego, La Jolla, CA USA
- Hannah Carter
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA USA
- The Laboratory of Immunology, Moores Cancer Center and Department of Medicine, University of California San Diego, La Jolla, CA USA
27
Hrytsenko Y, Shea B, Elgart M, Kurniansyah N, Lyons G, Morrison AC, Carson AP, Haring B, Mitchel BD, Psaty BM, Jaeger BC, Gu CC, Kooperberg C, Levy D, Lloyd-Jones D, Choi E, Brody JA, Smith JA, Rotter JI, Moll M, Fornage M, Simon N, Castaldi P, Casanova R, Chung RH, Kaplan R, Loos RJ, Kardia SLR, Rich SS, Redline S, Kelly T, O’Connor T, Zhao W, Kim W, Guo X, Der Ida Chen Y, Sofer T. Machine learning models for blood pressure phenotypes combining multiple polygenic risk scores. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.13.23299909. [PMID: 38168328 PMCID: PMC10760279 DOI: 10.1101/2023.12.13.23299909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
We construct non-linear machine learning (ML) prediction models for systolic and diastolic blood pressure (SBP, DBP) using demographic and clinical variables and polygenic risk scores (PRSs). We developed a two-model ensemble consisting of a baseline model, in which prediction is based on demographic and clinical variables only, and a genetic model, in which PRSs are also included. We evaluate the use of a linear versus a non-linear model at both the baseline and the genetic model levels and assess the improvement in performance when incorporating multiple PRSs. We report the ensemble model's performance as percentage variance explained (PVE) on a held-out test dataset. Compared with a linear baseline model, a non-linear baseline model improved the PVE from 28.1% to 30.1% (SBP) and from 14.3% to 17.4% (DBP). Compared with using a single PRS, including in the genetic model seven PRSs computed from the largest available GWAS of SBP/DBP improved the genetic model PVE from 4.8% to 5.1% (SBP) and from 4.7% to 5% (DBP). Adding 14 more PRSs computed from two independent GWASs further increased the genetic model PVE to 6.3% (SBP) and 5.7% (DBP). PVE differed across self-reported race/ethnicity groups, with nearly all non-White groups benefiting from the inclusion of additional PRSs.
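PVE on a held-out test set is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below makes two assumptions:

- It uses that definition of PVE.
- Purely for illustration, the genetic model is fit to the residuals of the baseline model, so the ensemble prediction is the sum of the two.

The abstract does not say how the two models are actually combined. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def pve(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Percentage variance explained: 100 * (1 - SS_residual / SS_total)."""
    y_bar = mean(y_true)
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 100.0 * (1.0 - ss_res / ss_tot)


def ensemble_predict(baseline_pred: Sequence[float],
                     genetic_residual_pred: Sequence[float]) -> List[float]:
    """Hypothetical additive ensemble: the baseline (demographic/clinical)
    prediction plus the genetic model's prediction of what the baseline
    model left unexplained."""
    return [b + g for b, g in zip(baseline_pred, genetic_residual_pred)]
```

For example, with observed values `[1, 2, 3, 4]` and baseline predictions `[1.5, 2, 3, 3.5]`, `pve` gives 90. A genetic component that exactly recovers the leftover residuals `[-0.5, 0, 0, 0.5]` would lift the ensemble to 100.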
Affiliation(s)
- Yana Hrytsenko
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA
- Benjamin Shea
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA
- Michael Elgart
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- Genevieve Lyons
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- April P. Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Bernhard Haring
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Department of Medicine III, Saarland University, Homburg, Saarland, Germany
- Braxton D. Mitchel
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Bruce M. Psaty
- Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Byron C. Jaeger
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- C Charles Gu
- The Center for Biostatistics and Data Science, Washington University, St. Louis, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Daniel Levy
- The Population Sciences Branch of the National Heart, Lung and Blood Institute, Bethesda, MD, USA
- The Framingham Heart Study, Framingham, MA, USA
- Donald Lloyd-Jones
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Eunhee Choi
- Columbia Hypertension Laboratory, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Jennifer A Brody
- Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Matthew Moll
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- VA Boston Healthcare System, West Roxbury, MA, USA
- Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Noah Simon
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA
- Peter Castaldi
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- Ramon Casanova
- Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Taipei City, Taiwan
- Robert Kaplan
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty for Health and Medical Sciences, University of Copenhagen, Denmark, DK
- Sharon L. R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Susan Redline
- Department of Medicine, Harvard Medical School, Boston, MA
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
- Tanika Kelly
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Timothy O’Connor
- Department of Medicine III, Saarland University, Homburg, Saarland, Germany
- Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Wonji Kim
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Yii Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Tamar Sofer
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- CardioVascular Institute (CVI), Beth Israel Deaconess Medical Center, Boston, MA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
28
Chen SF, Loguercio S, Chen KY, Lee SE, Park JB, Liu S, Sadaei HJ, Torkamani A. Artificial Intelligence for Risk Assessment on Primary Prevention of Coronary Artery Disease. CURRENT CARDIOVASCULAR RISK REPORTS 2023; 17:215-231. [DOI: 10.1007/s12170-023-00731-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 10/09/2023] [Indexed: 01/04/2025]
Abstract
Purpose of Review
Coronary artery disease (CAD) is a common and etiologically complex disease worldwide. Current guidelines for primary prevention, or the prevention of a first acute event, include relatively simple risk assessment and leave substantial room for improvement both for risk ascertainment and selection of prevention strategies. Here, we review how advances in big data and predictive modeling foreshadow a promising future of improved risk assessment and precision medicine for CAD.
Recent Findings
Artificial intelligence (AI) has improved the utility of high dimensional data, providing an opportunity to better understand the interplay between numerous CAD risk factors. Beyond applications of AI in cardiac imaging, the vanguard application of AI in healthcare, recent translational research is also revealing a promising path for AI in multi-modal risk prediction using standard biomarkers, genetic and other omics technologies, a variety of biosensors, and unstructured data from electronic health records (EHRs). However, gaps remain in clinical validation of AI models, most notably in the actionability of complex risk prediction for more precise therapeutic interventions.
Summary
The recent availability of nation-scale biobank datasets has provided a tremendous opportunity to richly characterize longitudinal health trajectories using health data collected at home, at laboratories, and through clinic visits. The ever-growing availability of deep genotype-phenotype data is poised to drive a transition from simple risk prediction algorithms to complex, “data-hungry,” AI models in clinical decision-making. While AI models provide the means to incorporate essentially all risk factors into comprehensive risk prediction frameworks, there remains a need to wrap these predictions in interpretable frameworks that map to our understanding of underlying biological mechanisms and associated personalized intervention. This review explores recent advances in the role of machine learning and AI in CAD primary prevention and highlights current strengths as well as limitations mediating potential future applications.
29
Castelli P, De Ruvo A, Bucciacchio A, D'Alterio N, Cammà C, Di Pasquale A, Radomski N. Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data. BMC Genomics 2023; 24:560. [PMID: 37736708 PMCID: PMC10515079 DOI: 10.1186/s12864-023-09667-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 09/10/2023] [Indexed: 09/23/2023]
Abstract
BACKGROUND Genomic data-based machine learning tools are promising for real-time surveillance activities that perform source attribution of foodborne bacteria such as Listeria monocytogenes. Given the heterogeneity of machine learning practices, our aim was to identify those influencing the source prediction performance of the usual holdout method combined with the repeated k-fold cross-validation method. METHODS A large collection of 1,100 L. monocytogenes genomes with known sources was built according to several genomic metrics to ensure the authenticity and completeness of genomic profiles. Based on these genomic profiles (i.e. 7-locus alleles, core alleles, accessory genes, core SNPs and pan kmers), we developed a versatile workflow assessing the prediction performance of different combinations of training dataset splitting (i.e. 50, 60, 70, 80 and 90%), data preprocessing (i.e. with or without near-zero variance removal), and learning models (i.e. BLR, ERT, RF, SGB, SVM and XGB). The performance metrics included accuracy, Cohen's kappa, F1-score, the areas under the receiver operating characteristic, precision-recall, and precision-recall-gain curves, and execution time. RESULTS The average testing accuracies from accessory genes and pan kmers were significantly higher than those from core alleles or core SNPs. While the accuracies from 70% and 80% training dataset splitting were not significantly different, those from 80% were significantly higher than those from the other tested proportions. Near-zero variance removal did not produce results for 7-locus alleles, did not significantly affect accuracy for core alleles, accessory genes and pan kmers, and significantly decreased accuracy for core SNPs. The SVM and XGB models did not differ significantly in accuracy from each other and reached significantly higher accuracies than BLR, SGB, ERT and RF, in that order. However, the SVM model required more computing power than the XGB model, especially for large numbers of descriptors such as core SNPs and pan kmers. CONCLUSIONS In addition to recommendations about machine learning practices for L. monocytogenes source attribution based on genomic data, the present study also provides a freely available workflow to solve other balanced or unbalanced multiclass phenotypes from binary and categorical genomic profiles of other microorganisms without source code modifications.
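Near-zero variance removal drops descriptors that are almost constant across genomes. The minimal sketch below assumes a widely used caret-style rule. A descriptor is flagged when either condition holds:

- It has a single distinct value.
- The ratio of its most frequent to its second most frequent value exceeds 95/5 **and** its distinct values make up at most 10% of the samples.

The thresholds and function names are assumptions. They are not necessarily what the authors' workflow uses.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def near_zero_variance(column: Sequence[Hashable],
                       freq_cut: float = 95 / 5,
                       unique_cut: float = 10.0) -> bool:
    """Return True if a descriptor is (near-)constant under a caret-style rule."""
    counts = Counter(column).most_common()
    if len(counts) <= 1:
        return True  # zero variance: a single distinct value
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(column)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


def drop_near_zero_variance(matrix: List[List[Hashable]]) -> List[int]:
    """Indices of the columns to keep in a samples x descriptors matrix
    (e.g. presence/absence of accessory genes per genome)."""
    n_cols = len(matrix[0])
    return [j for j in range(n_cols)
            if not near_zero_variance([row[j] for row in matrix])]
```

Consider 100 genomes and three descriptors. One descriptor is present in a single genome, one is split 50/50, and one is constant. In that case only the 50/50 descriptor (index 1) is kept.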
Affiliation(s)
- Pierluigi Castelli
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea De Ruvo
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea Bucciacchio
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicola D'Alterio
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Cesare Cammà
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Adriano Di Pasquale
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicolas Radomski
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy.
30
Sofer T, Kurniansyah N, Granot-Hershkovitz E, Goodman MO, Tarraf W, Broce I, Lipton RB, Daviglus M, Lamar M, Wassertheil-Smoller S, Cai J, DeCarli CS, Gonzalez HM, Fornage M. A polygenic risk score for Alzheimer's disease constructed using APOE-region variants has stronger association than APOE alleles with mild cognitive impairment in Hispanic/Latino adults in the U.S. Alzheimers Res Ther 2023; 15:146. [PMID: 37649099 PMCID: PMC10469805 DOI: 10.1186/s13195-023-01298-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 08/24/2023] [Indexed: 09/01/2023]
Abstract
INTRODUCTION Polygenic Risk Scores (PRSs) are summaries of genetic risk alleles for an outcome. METHODS We used summary statistics from five GWASs of Alzheimer's disease (AD) to construct PRSs in 4,189 diverse Hispanics/Latinos (mean age 63 years) from the Study of Latinos-Investigation of Neurocognitive Aging (SOL-INCA). We assessed PRS associations with mild cognitive impairment (MCI) in the combined set of people and in diverse subgroups, both including and excluding the APOE gene region. We also assessed PRS associations with MCI in an independent dataset from the Mass General Brigham Biobank. RESULTS A simple sum of 5 PRSs ("PRSsum"), each constructed based on a different AD GWAS, was associated with MCI (OR = 1.28, 95% CI [1.14, 1.41]) in a model adjusted for counts of the APOE-ε4 and APOE-ε2 alleles. Associations of single-GWAS PRSs were weaker. When SNPs from the APOE region were removed from the PRSs, the association of PRSsum with MCI was weaker (OR = 1.17, 95% CI [1.04, 1.31] with adjustment for APOE alleles). In all association analyses, APOE-ε4 and APOE-ε2 alleles were not associated with MCI. DISCUSSION A sum of AD PRSs is associated with MCI in Hispanic/Latino older adults. Despite no association of APOE-ε4 and APOE-ε2 alleles with MCI, the association of the AD PRS with MCI is stronger when the APOE region is included. Thus, APOE variants other than the classic APOE alleles may be important predictors of MCI in Hispanic/Latino adults.
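A PRS is typically a weighted sum of effect-allele dosages, with the weights (effect sizes) taken from GWAS summary statistics. The sketch below builds one PRS per GWAS and then a "PRSsum". Two assumptions apply:

- Each PRS is standardized to mean 0 and unit variance before summing. The abstract says only that the PRSs were simply summed.
- All names and numbers are hypothetical.

Excluding the APOE region would amount to dropping those SNPs from each weight dictionary before scoring.

```python
from statistics import mean, pstdev
from typing import Dict, List


def prs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) over SNPs present in both."""
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def standardize(values: List[float]) -> List[float]:
    """Center to mean 0 and scale to unit (population) SD; assumes SD > 0."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def prs_sum(people: List[Dict[str, float]],
            gwas_weights: List[Dict[str, float]]) -> List[float]:
    """Per-person sum of PRSs, one PRS per GWAS, each standardized across
    the cohort (the standardization step is an assumption)."""
    per_gwas = [standardize([prs(p, w) for p in people]) for w in gwas_weights]
    return [sum(scores) for scores in zip(*per_gwas)]
```

Standardizing first keeps a GWAS with larger effect-size units from dominating the sum. Whether that matches the authors' construction is not stated in the abstract.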
Affiliation(s)
- Tamar Sofer
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
- CardioVascular Institute, Beth Israel Deaconess Medical Center, Boston, MA, USA.
- Nuzulul Kurniansyah
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Einat Granot-Hershkovitz
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Matthew O Goodman
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Wassim Tarraf
- Institute of Gerontology, Wayne State University, Detroit, MI, USA
- Iris Broce
- Department of Neurosciences, University of California San Diego, San Diego, CA, USA
- Martha Daviglus
- Department of Medicine, Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
- Melissa Lamar
- Department of Medicine, Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
- Rush Alzheimer's Disease Research Center, Rush University Medical Center, Chicago, IL, USA
- Sylvia Wassertheil-Smoller
- Department of Epidemiology & Population Health, Department of Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Charles S DeCarli
- Department of Neurology, University of California at Davis, Sacramento, CA, USA
- Hector M Gonzalez
- Department of Neurosciences, University of California San Diego, San Diego, CA, USA
- Shiley-Marcos Alzheimer's Disease Center, University of California San Diego, La Jolla, CA, USA
- Myriam Fornage
- Institute of Molecular Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA
31
Moon J, Posada-Quintero HF, Chon KH. Genetic data visualization using literature text-based neural networks: Examples associated with myocardial infarction. Neural Netw 2023; 165:562-595. [PMID: 37364469 DOI: 10.1016/j.neunet.2023.05.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 04/11/2023] [Accepted: 05/09/2023] [Indexed: 06/28/2023]
Abstract
Data visualization is critical to unraveling hidden information in complex, high-dimensional data. Interpretable visualization methods are especially important in the biological and medical fields; however, few effective visualization methods exist for large genetic data. Current visualization methods are limited to lower-dimensional data, and their performance suffers when data are missing. In this study, we propose a literature-based visualization method that reduces the dimensionality of high-dimensional data without compromising the dynamics of single nucleotide polymorphisms (SNPs) or textual interpretability. Our method is innovative because it is shown to (1) preserve both the global and local structures of SNPs while reducing the dimension of the data using literature text representations, and (2) enable interpretable visualizations using textual information. For performance evaluation, we applied the proposed approach to several classification tasks, including race, myocardial infarction event age group, and sex, using several machine learning models on the literature-derived SNP data. We used visualization approaches to examine the clustering of the data, as well as quantitative performance metrics for classifying the risk factors above. Our method outperformed all popular dimensionality reduction and visualization methods for both classification and visualization, and it is robust against missing and higher-dimensional data. Moreover, we found it feasible to incorporate both genetic and other risk information obtained from the literature with our method.
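The abstract does not spell out how literature text representations are attached to SNPs or combined per person. Purely as a hypothetical illustration of the general idea, the sketch below makes these assumptions:

- Each SNP already has a fixed-length vector derived from literature text. The vectors and SNP IDs used here are made up.
- Each person is represented as the genotype-weighted mean of those vectors.
- Missing genotypes are skipped.

This is a simplification for illustration, not the authors' method.

```python
from typing import Dict, List, Optional


def embed_person(genotypes: Dict[str, Optional[int]],
                 snp_vectors: Dict[str, List[float]]) -> List[float]:
    """Genotype-weighted mean of literature-derived SNP vectors.

    genotypes: SNP id -> risk-allele count (0, 1, 2), or None if missing.
    Missing genotypes and SNPs without a text vector are skipped, so the
    output length is fixed by the vector size, not by the number of SNPs.
    """
    dim = len(next(iter(snp_vectors.values())))
    total = [0.0] * dim
    weight = 0.0
    for snp, g in genotypes.items():
        if g is None or snp not in snp_vectors:
            continue
        vec = snp_vectors[snp]
        for k in range(dim):
            total[k] += g * vec[k]
        weight += g
    # With no risk alleles observed, return the zero vector.
    return [t / weight for t in total] if weight > 0 else total
```

Because every person maps into the same low-dimensional text space regardless of how many SNPs are genotyped or missing, this toy version shows one way such a representation could tolerate missing data.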
Affiliation(s)
- Jihye Moon
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT 06269, USA.
- Ki H Chon
- Department of Biomedical Engineering, University of Connecticut, Storrs, CT 06269, USA.
32
Sigurdsson AI, Louloudis I, Banasik K, Westergaard D, Winther O, Lund O, Ostrowski S, Erikstrup C, Pedersen O, Nyegaard M, Brunak S, Vilhjálmsson B, Rasmussen S. Deep integrative models for large-scale human genomics. Nucleic Acids Res 2023; 51:e67. [PMID: 37224538 PMCID: PMC10325897 DOI: 10.1093/nar/gkad373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 04/18/2023] [Accepted: 04/28/2023] [Indexed: 05/26/2023]
Abstract
Polygenic risk scores (PRSs) are expected to play a critical role in precision medicine. Currently, PRS predictors are generally based on linear models using summary statistics, and more recently individual-level data. However, these predictors mainly capture additive relationships and are limited in the data modalities they can use. We developed a deep learning framework (EIR) for PRS prediction that includes a model, genome-local-net (GLN), specifically designed for large-scale genomics data. The framework supports multi-task learning, automatic integration of other clinical and biochemical data, and model explainability. When applied to individual-level data from the UK Biobank, the GLN model performed competitively against established neural network architectures, particularly for certain traits, showcasing its potential for modeling complex genetic relationships. Furthermore, the GLN model outperformed linear PRS methods for type 1 diabetes (T1D), likely because it models non-additive genetic effects and epistasis. This was supported by our identification of widespread non-additive genetic effects and epistasis in the context of T1D. Finally, we constructed PRS models that integrated genotype, blood, urine, and anthropometric data and found that this improved performance for 93% of the 290 diseases and disorders considered. EIR is available at https://github.com/arnor-sigurdsson/EIR.
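This sketch assumes, as the name genome-local-net suggests, that GLN relies on local connectivity over the genotype input. It is a hedged illustration of that general idea only, not EIR's implementation, which lives in the linked repository. It shows a locally connected layer:

- The flattened input is split into non-overlapping windows.
- Each window gets its own, unshared weights. This differs from a convolution, which reuses one kernel at every position.

```python
from typing import List, Sequence


def locally_connected(x: Sequence[float],
                      weights: List[List[float]],
                      bias: List[float]) -> List[float]:
    """One output per non-overlapping window; window i uses weights[i] only.

    x:       flattened input (e.g. one-hot encoded genotypes),
             length = n_windows * window_size
    weights: n_windows rows, each of length window_size (no weight sharing)
    bias:    one bias term per window
    """
    window = len(weights[0])
    if len(x) != len(weights) * window:
        raise ValueError("input length must equal n_windows * window size")
    out = []
    for i, (w, b) in enumerate(zip(weights, bias)):
        segment = x[i * window:(i + 1) * window]
        out.append(sum(wi * xi for wi, xi in zip(w, segment)) + b)
    return out
```

Unshared weights let each genomic segment learn its own pattern. The parameter count grows with input length, however, which is one reason such layers are paired with modest window sizes on large genotype arrays.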
Affiliation(s)
- Arnór I Sigurdsson
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ioannis Louloudis
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Karina Banasik
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- David Westergaard
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Ole Winther
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen 2100, Denmark
- Ole Lund
- Danish National Genome Center, Ørestads Boulevard 5, 2300 Copenhagen S, Denmark
- DTU Health Tech, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Sisse Rye Ostrowski
- Department of Clinical Immunology, Rigshospitalet, University of Copenhagen, 2200 Copenhagen N, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Christian Erikstrup
- Department of Clinical Immunology, Aarhus University Hospital, 8000 Aarhus C, Denmark
- Department of Clinical Medicine, Aarhus University, 8000 Aarhus C, Denmark
- Ole Birger Vesterager Pedersen
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Department of Clinical Immunology, Zealand University Hospital, 4600 Køge, Denmark
- Mette Nyegaard
- Department of Health Science and Technology, Aalborg University, DK- 9260 Gistrup, Denmark
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Bjarni J Vilhjálmsson
- National Centre for Register-Based Research (NCRR), Aarhus University, 8000 Aarhus C, Denmark
- Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH), 8210 Aarhus V, Denmark
- Bioinformatics Research Centre (BiRC), Aarhus University, 8000 Aarhus C, Denmark
- Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
33
Badré A, Pan C. Explainable multi-task learning improves the parallel estimation of polygenic risk scores for many diseases through shared genetic basis. PLoS Comput Biol 2023; 19:e1011211. [PMID: 37418352 DOI: 10.1371/journal.pcbi.1011211] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/15/2023] [Accepted: 05/23/2023] [Indexed: 07/09/2023]
Abstract
Many complex diseases share common genetic determinants and are comorbid in a population. We hypothesized that the co-occurrences of diseases and their overlapping genetic etiology can be exploited to simultaneously improve multiple diseases' polygenic risk scores (PRS). This hypothesis was tested using a multi-task learning (MTL) approach based on an explainable neural network architecture. We found that parallel estimations of the PRS for 17 prevalent cancers in a pan-cancer MTL model were generally more accurate than independent estimations for individual cancers in comparable single-task learning (STL) models. Such performance improvement conferred by positive transfer learning was also observed consistently for 60 prevalent non-cancer diseases in a pan-disease MTL model. Interpretation of the MTL models revealed significant genetic correlations between the important sets of single nucleotide polymorphisms used by the neural network for PRS estimation. This suggested a well-connected network of diseases with shared genetic basis.
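In multi-task learning, several prediction tasks share part of a network. The minimal sketch below shows hard parameter sharing for the forward pass only. It assumes:

- one shared ReLU hidden layer over SNP inputs;
- one logistic output head per disease.

The layer sizes, weights, and names are hypothetical, and the authors' explainable architecture may differ.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Fully connected layer: w has one row of input weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mtl_forward(snps: Sequence[float], shared_w: Matrix, shared_b: Sequence[float],
                heads: List[Tuple[List[float], float]]) -> List[float]:
    """The shared hidden representation feeds every task-specific head.

    heads: one (weight_row, bias) pair per disease.
    Returns one predicted risk (probability) per disease.
    """
    h = relu(dense(snps, shared_w, shared_b))
    return [sigmoid(sum(w * a for w, a in zip(row, h)) + b) for row, b in heads]
```

During training (not shown), gradients from every disease would flow into `shared_w`. This is one way genetically related diseases could shape a common representation, giving the kind of positive transfer the abstract reports.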
Affiliation(s)
- Adrien Badré
- School of Computer Science, University of Oklahoma, Norman, Oklahoma, United States of America
- Chongle Pan
- School of Computer Science, University of Oklahoma, Norman, Oklahoma, United States of America
- Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, United States of America
34
Ko C, Brody JP. Evaluation of a genetic risk score computed using human chromosomal-scale length variation to predict breast cancer. Hum Genomics 2023; 17:53. [PMID: 37328908 PMCID: PMC10273758 DOI: 10.1186/s40246-023-00482-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 03/30/2023] [Indexed: 06/18/2023]
Abstract
INTRODUCTION The ability to accurately predict whether a woman will develop breast cancer later in life should reduce the number of breast cancer deaths. Different predictive models for breast cancer exist based on family history, BRCA status, and SNP analysis. The best of these models has an accuracy (area under the receiver operating characteristic curve, AUC) of about 0.65. We have developed computational methods to characterize a genome by a small set of numbers that represent the length of segments of the chromosomes, called chromosomal-scale length variation (CSLV). METHODS We built machine learning models to differentiate between women who had breast cancer and women who did not based on their CSLV characterization. We applied this procedure to two different datasets: the UK Biobank (1,534 women with breast cancer and 4,391 without) and the Cancer Genome Atlas (TCGA; 874 women with breast cancer and 3,381 without). RESULTS We found a machine learning model that could predict breast cancer with an AUC of 0.836 (95% CI 0.830, 0.843) in the UK Biobank data. Using a similar approach with the TCGA data, we obtained a model with an AUC of 0.704 (95% CI 0.702, 0.706). Variable importance analysis indicated that no single chromosomal region was responsible for a significant fraction of the model results. CONCLUSION In this retrospective study, chromosomal-scale length variation could effectively predict whether or not a woman enrolled in the UK Biobank study developed breast cancer.
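AUC is the accuracy measure quoted above. It equals the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counting one half. The small sketch below implements that rank-based definition directly. The scores are hypothetical, and confidence intervals (for example by bootstrapping) are not shown.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC by comparing every case (label 1) with every control (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to random ordering of cases and controls. The pairwise loop is quadratic, so cohorts of UK Biobank size would normally use an equivalent sort-based (Mann-Whitney U) computation instead.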
Affiliation(s)
- Charmeine Ko
- Department of Biomedical Engineering, University of California, Irvine, USA
- James P Brody
- Department of Biomedical Engineering, University of California, Irvine, USA.
35
Li C, Pan Y, Zhang R, Huang Z, Li D, Han Y, Larkin C, Rao V, Sun X, Kelly TN. Genomic Innovation in Early Life Cardiovascular Disease Prevention and Treatment. Circ Res 2023; 132:1628-1647. [PMID: 37289909 PMCID: PMC10328558 DOI: 10.1161/circresaha.123.321999] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
Cardiovascular disease (CVD) is a leading cause of morbidity and mortality globally. Although CVD events do not typically manifest until older adulthood, CVD develops gradually across the life-course, beginning with the elevation of risk factors observed as early as childhood or adolescence and the emergence of subclinical disease that can occur in young adulthood or midlife. Genomic background, which is determined at zygote formation, is among the earliest risk factors for CVD. With major advances in molecular technology, including the emergence of gene-editing techniques, along with deep whole-genome sequencing and high-throughput array-based genotyping, scientists now have the opportunity not only to discover genomic mechanisms underlying CVD but also to use this knowledge for the life-course prevention and treatment of these conditions. The current review focuses on innovations in the field of genomics and their applications to monogenic and polygenic CVD prevention and treatment. With respect to monogenic CVD, we discuss how the emergence of whole-genome sequencing technology has accelerated the discovery of disease-causing variants, allowing comprehensive screening and early, aggressive CVD mitigation strategies in patients and their families. We further describe advances in gene editing technology, which might soon make possible cures for CVD conditions once thought untreatable. In relation to polygenic CVD, we focus on recent innovations that leverage findings of genome-wide association studies to identify druggable gene targets and develop predictive genomic models of disease, which are already facilitating breakthroughs in the life-course treatment and prevention of CVD. Gaps in current research and future directions of genomics studies are also discussed. 
In aggregate, we hope to underline the value of leveraging genomics and broader multiomics information for characterizing CVD conditions, work which promises to expand precision approaches for the life-course prevention and treatment of CVD.
Affiliation(s)
- Changwei Li
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA (C. Li, R.Z., Z.H., X.S.)
- Yang Pan
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
- Ruiyuan Zhang
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA (C. Li, R.Z., Z.H., X.S.)
- Zhijie Huang
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA (C. Li, R.Z., Z.H., X.S.)
- Davey Li
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
- Yunan Han
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
- Claire Larkin
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
- Varun Rao
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
- Xiao Sun
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA (C. Li, R.Z., Z.H., X.S.)
- Tanika N Kelly
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
36
Pagadala M, Sears TJ, Wu VH, Pérez-Guijarro E, Kim H, Castro A, Talwar JV, Gonzalez-Colin C, Cao S, Schmiedel BJ, Goudarzi S, Kirani D, Au J, Zhang T, Landi T, Salem RM, Morris GP, Harismendy O, Patel SP, Alexandrov LB, Mesirov JP, Zanetti M, Day CP, Fan CC, Thompson WK, Merlino G, Gutkind JS, Vijayanand P, Carter H. Germline modifiers of the tumor immune microenvironment implicate drivers of cancer risk and immunotherapy response. Nat Commun 2023; 14:2744. [PMID: 37173324 PMCID: PMC10182072 DOI: 10.1038/s41467-023-38271-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/12/2022] [Accepted: 04/24/2023] [Indexed: 05/15/2023]
Abstract
With the continued promise of immunotherapy for treating cancer, understanding how host genetics contributes to the tumor immune microenvironment (TIME) is essential to tailoring cancer screening and treatment strategies. Here, we study 1084 eQTLs affecting the TIME found through analysis of The Cancer Genome Atlas and literature curation. These TIME eQTLs are enriched in areas of active transcription, and associate with gene expression in specific immune cell subsets, such as macrophages and dendritic cells. Polygenic score models built with TIME eQTLs reproducibly stratify cancer risk, survival and immune checkpoint blockade (ICB) response across independent cohorts. To assess whether an eQTL-informed approach could reveal potential cancer immunotherapy targets, we inhibit CTSS, a gene implicated by cancer risk and ICB response-associated polygenic models; CTSS inhibition results in slowed tumor growth and extended survival in vivo. These results validate the potential of integrating germline variation and TIME characteristics for uncovering potential targets for immunotherapy.
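Polygenic score models such as the ones described above generally reduce to a weighted sum of effect-allele counts. The sketch below assumes the per-variant weights (for example GWAS effect sizes) are already available; it is not the TIME eQTL model from the paper, and the helper names and any variant IDs used with them are hypothetical. Individuals are then assigned to score quantile groups, one common way to stratify risk.

```python
from typing import Dict, List, Sequence


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele counts (0, 1 or 2) over the scored variants.

    A variant missing from `dosages` contributes 0 here; this is a simplification,
    since real pipelines usually impute missing genotypes first.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())


def quantile_groups(scores: Sequence[float], n_groups: int = 4) -> List[int]:
    """Assign each individual to a score group by rank, 0 = lowest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores)
    return groups
```

Ranking rather than fixed cut-offs keeps the groups equal in size regardless of the score's scale.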
Affiliation(s)
- Meghana Pagadala
- Biomedical Sciences Program, University of California San Diego, La Jolla, CA, 92093, USA
- Timothy J Sears
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Victoria H Wu
- Department of Pharmacology, UCSD Moores Cancer Center, La Jolla, CA, 92093, USA
- Eva Pérez-Guijarro
- Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health (NIH), Bethesda, MD, 20892, USA
- Hyo Kim
- Undergraduate Bioengineering Program, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, 92093, USA
- Andrea Castro
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- James V Talwar
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Steven Cao
- Division of Epidemiology, Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, 92093, USA
- Divya Kirani
- Undergraduate Biology and Bioinformatics Program, University of California San Diego, La Jolla, CA, 92093, USA
- Jessica Au
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Tongwu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health (NIH), Bethesda, MD, 20892, USA
- Teresa Landi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health (NIH), Bethesda, MD, 20892, USA
- Rany M Salem
- Division of Epidemiology, Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, 92093, USA
- Gerald P Morris
- Department of Pathology, University of California San Diego, La Jolla, CA, 92093, USA
- Olivier Harismendy
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Sandip Pravin Patel
- Center for Personalized Cancer Therapy, Division of Hematology and Oncology, UC San Diego Moores Cancer Center, San Diego, CA, 92037, USA
- Ludmil B Alexandrov
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Jill P Mesirov
- Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, USA
- Maurizio Zanetti
- Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA
- The Laboratory of Immunology and Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Chi-Ping Day
- Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health (NIH), Bethesda, MD, 20892, USA
- Chun Chieh Fan
- Center for Population Neuroscience and Genetics, Laureate Institute for Brain Research, Tulsa, OK, 74136, USA
- Department of Radiology, University of California San Diego, La Jolla, CA, 92093, USA
- Wesley K Thompson
- Division of Biostatistics, Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, 92093, USA
- Glenn Merlino
- Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health (NIH), Bethesda, MD, 20892, USA
- J Silvio Gutkind
- Department of Pharmacology, UCSD Moores Cancer Center, La Jolla, CA, 92093, USA
- Hannah Carter
- Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA.
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, USA.
37
Alzoubi H, Alzubi R, Ramzan N. Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations. Sensors (Basel) 2023; 23:s23094439. [PMID: 37177642 PMCID: PMC10181706 DOI: 10.3390/s23094439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/05/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Genome-wide association studies have proven their ability to improve human health outcomes by identifying genotypes associated with phenotypes. Various works have attempted to predict individuals' disease risk from genotype data. Such a prediction can be treated either as an analysis model that leads to a better understanding of the gene functions underlying human disease, or as a black box for use in decision support systems and early disease detection. Deep learning techniques have recently gained popularity. In this work, we propose a deep-learning framework for disease risk prediction that employs a multilayer perceptron (MLP) to predict individuals' disease status. The proposed framework was applied to the Wellcome Trust Case-Control Consortium (WTCCC), the UK National Blood Service (NBS) Control Group, and the 1958 British Birth Cohort (58C) datasets. In performance comparisons, the proposed approach outperformed the other methods in predicting disease risk, achieving an area under the curve (AUC) of up to 0.94.
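The abstract does not give the network's architecture. As a hedged illustration of how a multilayer perceptron maps a genotype to a disease probability, the sketch below assumes additive 0/1/2 dosage encoding, a single ReLU hidden layer and a sigmoid output unit. Training is omitted, the weights are whatever the caller supplies, and `mlp_risk` is a hypothetical name.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp_risk(genotype: Sequence[int],
             w_hidden: List[List[float]], b_hidden: List[float],
             w_out: List[float], b_out: float) -> float:
    """Forward pass of a one-hidden-layer perceptron.

    genotype: additive dosages (0, 1 or 2 copies of the effect allele), one per SNP.
    Each hidden unit applies ReLU to a weighted sum of dosages; the output unit
    turns a weighted sum of hidden activations into a probability with a sigmoid.
    """
    hidden = [max(0.0, sum(w * g for w, g in zip(row, genotype)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(logit)
```

The hidden layer is what lets such a model represent non-additive combinations of SNPs, which a single weighted sum cannot.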
Affiliation(s)
- Hadeel Alzoubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Raid Alzubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Naeem Ramzan
- School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley PA1 2BE, UK
38
Johnsen PV, Strümke I, Langaas M, DeWan AT, Riemer-Sørensen S. Inferring feature importance with uncertainties with application to large genotype data. PLoS Comput Biol 2023; 19:e1010963. [PMID: 36917581 PMCID: PMC10038287 DOI: 10.1371/journal.pcbi.1010963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 03/24/2023] [Accepted: 02/20/2023] [Indexed: 03/16/2023]
Abstract
Estimating feature importance, that is, the contribution of a feature to one or several predictions, is an essential aspect of explaining data-based models. Besides explaining the model itself, an equally relevant question is which features are important in the underlying data-generating process. We present a Shapley-value-based framework for inferring the importance of individual features, including uncertainty in the estimator. We build upon the recently published model-agnostic feature importance score SAGE (Shapley additive global importance) and introduce Sub-SAGE. For tree-based models, Sub-SAGE has the advantage that it can be estimated without computationally expensive resampling. We argue that for all model types the uncertainties in our Sub-SAGE estimator can be estimated using bootstrapping, and we demonstrate the approach for tree ensemble methods. The framework is exemplified on synthetic data as well as on large genotype data for estimating feature importance with respect to obesity.
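SAGE-style importance scores are Shapley values of a set function v(S) that measures how much knowing the features in S improves the model. The sketch below computes exact Shapley values by enumerating coalitions, assuming the caller supplies v. It is not Sub-SAGE, which estimates these quantities efficiently for tree models, and exact enumeration is only feasible for a handful of features. By construction, the values sum to v(all features) - v(empty set). The name `shapley_values` is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(features: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values of a set function `value` over `features`.

    phi_i = sum over S not containing i of |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S)).
    The loops visit every coalition, so cost grows as 2**n.
    """
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition S that f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi
```

Uncertainty of the kind the abstract describes could, in principle, be attached by recomputing v on bootstrap resamples of the data and repeating this calculation.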
Affiliation(s)
- Pål Vegard Johnsen
- SINTEF DIGITAL, Oslo, Norway
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Inga Strümke
- Department of Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Holistic Systems, SimulaMet, Oslo, Norway
- Mette Langaas
- Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Andrew Thomas DeWan
- Department of Chronic Disease Epidemiology and Center for Perinatal, Pediatric and Environmental Epidemiology, Yale School of Public Health, New Haven, Connecticut, United States of America
39
Fritzsche MC, Akyüz K, Cano Abadía M, McLennan S, Marttinen P, Mayrhofer MT, Buyx AM. Ethical layering in AI-driven polygenic risk scores-New complexities, new challenges. Front Genet 2023; 14:1098439. [PMID: 36816027 PMCID: PMC9933509 DOI: 10.3389/fgene.2023.1098439] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/14/2022] [Accepted: 01/04/2023] [Indexed: 01/27/2023]
Abstract
Researchers aim to develop polygenic risk scores as a tool to prevent and more effectively treat serious diseases, disorders and conditions such as breast cancer, type 2 diabetes mellitus and coronary heart disease. Recently, machine learning techniques, in particular deep neural networks, have been increasingly developed to create polygenic risk scores using electronic health records as well as genomic and other health data. While the use of artificial intelligence for polygenic risk scores may enable greater accuracy, performance and prediction, it also presents a range of increasingly complex ethical challenges. The ethical and social issues of many polygenic risk score applications in medicine have been widely discussed. However, in the literature and in practice, the ethical implications of their confluence with the use of artificial intelligence have not yet been sufficiently considered. Based on a comprehensive review of the existing literature, we argue that this stands in need of urgent consideration for research and subsequent translation into the clinical setting. Considering the many ethical layers involved, we will first give a brief overview of the development of artificial intelligence-driven polygenic risk scores, associated ethical and social implications, challenges in artificial intelligence ethics, and finally, explore potential complexities of polygenic risk scores driven by artificial intelligence. We point out emerging complexity regarding fairness, challenges in building trust, explaining and understanding artificial intelligence and polygenic risk scores as well as regulatory uncertainties and further challenges. We strongly advocate taking a proactive approach to embedding ethics in research and implementation processes for polygenic risk scores driven by artificial intelligence.
Affiliation(s)
- Marie-Christine Fritzsche
- Institute of History and Ethics in Medicine, TUM School of Medicine, Technical University of Munich, Munich, Germany
- Department of Science, Technology and Society (STS), School of Social Sciences and Technology, Technical University of Munich, Munich, Germany
- Kaya Akyüz
- Biobanking and Biomolecular Resources Research Infrastructure Consortium - European Research Infrastructure Consortium (BBMRI-ERIC), Graz, Austria
- Department of Science and Technology Studies, University of Vienna, Vienna, Austria
- Mónica Cano Abadía
- Biobanking and Biomolecular Resources Research Infrastructure Consortium - European Research Infrastructure Consortium (BBMRI-ERIC), Graz, Austria
- Stuart McLennan
- Institute of History and Ethics in Medicine, TUM School of Medicine, Technical University of Munich, Munich, Germany
- Department of Science, Technology and Society (STS), School of Social Sciences and Technology, Technical University of Munich, Munich, Germany
- Pekka Marttinen
- Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Michaela Th. Mayrhofer
- Biobanking and Biomolecular Resources Research Infrastructure Consortium - European Research Infrastructure Consortium (BBMRI-ERIC), Graz, Austria
- Alena M. Buyx
- Institute of History and Ethics in Medicine, TUM School of Medicine, Technical University of Munich, Munich, Germany
- Department of Science, Technology and Society (STS), School of Social Sciences and Technology, Technical University of Munich, Munich, Germany
40
Zhang Y, Elgart M, Kurniansyah N, Spitzer BW, Wang H, Kim D, Shah N, Daviglus M, Zee PC, Cai J, Gottlieb DJ, Cade BE, Redline S, Sofer T. Genetic determinants of cardiometabolic and pulmonary phenotypes and obstructive sleep apnoea in HCHS/SOL. EBioMedicine 2022; 84:104288. [PMID: 36174398 PMCID: PMC9515437 DOI: 10.1016/j.ebiom.2022.104288] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/11/2022] [Revised: 08/24/2022] [Accepted: 09/08/2022] [Indexed: 01/21/2023]
Abstract
BACKGROUND Obstructive Sleep Apnoea (OSA) often co-occurs with cardiometabolic and pulmonary diseases. This study applies genetic analysis methods to explain the associations between OSA and related phenotypes. METHODS In the Hispanic Community Health Study/Study of Latinos, we estimated genetic correlations ρg between the respiratory event index (REI) and 54 anthropometric, glycaemic, cardiometabolic, and pulmonary phenotypes. We used summary statistics from published genome-wide association studies to construct Polygenic Risk Scores (PRSs) representing the genetic basis of each correlated phenotype (ρg > 0.2 and p-value < 0.05), and of OSA. We studied the association of the PRSs of the correlated phenotypes with both REI and OSA (REI ≥ 5), and the association of the OSA PRS with the correlated phenotypes. Causal relationships were tested using Mendelian Randomization (MR) analysis. FINDINGS The dataset included 11,155 participants, 31.03% of whom had OSA. Twenty-two phenotypes were genetically correlated with REI. Ten PRSs covering obesity and fat distribution (BMI, WHR, WHRadjBMI), blood pressure (DBP, PP, MAP), glycaemic control (fasting insulin, HbA1c, HOMA-B) and insomnia were associated with REI and/or OSA. The OSA PRS was associated with BMI, WHR, DBP and glycaemic traits (fasting insulin, HbA1c, HOMA-B and HOMA-IR). MR analysis identified robust causal effects of BMI and WHR on OSA, and probable causal effects of DBP, PP, and HbA1c on OSA/REI. INTERPRETATION Anthropometric, blood pressure, and glycaemic phenotypes share genetic underpinnings with OSA, with evidence for causal relationships between some phenotypes. FUNDING Described in Acknowledgments.
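The abstract does not name the MR estimators used. One widely used estimator is the fixed-effect inverse-variance-weighted (IVW) estimate, which combines per-variant Wald ratios computed from GWAS summary statistics (variant-exposure effects, variant-outcome effects and the outcome standard errors). A minimal sketch follows; the function and argument names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Each variant j gives a Wald ratio beta_outcome[j] / beta_exposure[j]; these are
    averaged with first-order weights beta_exposure[j]**2 / se_outcome[j]**2, i.e.
    estimate = sum(bx * by / se**2) / sum(bx**2 / se**2), SE = sqrt(1 / sum(bx**2 / se**2)).
    """
    num = 0.0
    den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / se ** 2
        num += bx * by * w
        den += bx * bx * w
    if den == 0.0:
        raise ValueError("instruments carry no information about the exposure")
    return num / den, math.sqrt(1.0 / den)
```

The IVW estimate assumes every variant is a valid instrument; robustness checks such as MR-Egger or median-based estimators relax that assumption.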
Affiliation(s)
- Yuan Zhang
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Respiratory Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Michael Elgart
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Nuzulul Kurniansyah
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brian W. Spitzer
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Heming Wang
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Doyoon Kim
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Neomi Shah
- Department of Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Martha Daviglus
- Institute for Minority Health Research, University of Illinois at Chicago, Chicago, IL, USA
- Phyllis C. Zee
- Center for Circadian and Sleep Medicine, Department of Neurology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Daniel J. Gottlieb
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brian E. Cade
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Tamar Sofer
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Corresponding author at: Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, 221 Longwood Avenue, Boston, MA 02115, USA.
41
Elgart M, Lyons G, Romero-Brufau S, Kurniansyah N, Brody JA, Guo X, Lin HJ, Raffield L, Gao Y, Chen H, de Vries P, Lloyd-Jones DM, Lange LA, Peloso GM, Fornage M, Rotter JI, Rich SS, Morrison AC, Psaty BM, Levy D, Redline S, Sofer T. Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations. Commun Biol 2022; 5:856. [PMID: 35995843 PMCID: PMC9395509 DOI: 10.1038/s42003-022-03812-z] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 07/19/2021] [Accepted: 08/05/2022] [Indexed: 01/03/2023]
Abstract
Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.
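The gains listed above are relative increases in variance explained. Assuming "variance explained" is the R² of predictions on held-out data (the abstract does not state the exact definition), the arithmetic can be sketched as follows; both helper names are hypothetical.

```python
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Proportion of variance in y explained by predictions y_hat: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    if ss_tot == 0.0:
        raise ValueError("y has no variance")
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


def relative_increase(r2_new: float, r2_baseline: float) -> float:
    """Relative gain in variance explained, in percent, of a new model over a baseline."""
    if r2_baseline <= 0.0:
        raise ValueError("baseline R^2 must be positive")
    return 100.0 * (r2_new - r2_baseline) / r2_baseline
```

Under this definition, a 100% relative increase, as reported for diastolic blood pressure, means the variance explained doubled; for instance, from 0.05 to 0.10 (illustrative numbers, not taken from the paper).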
Affiliation(s)
- Michael Elgart
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Genevieve Lyons
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Santiago Romero-Brufau
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Nuzulul Kurniansyah
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Jennifer A Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Henry J Lin
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Laura Raffield
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Yan Gao
- The Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Paul de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Leslie A Lange
- Department of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA
- Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
- Daniel Levy
- The Population Sciences Branch of the National Heart, Lung and Blood Institute, Bethesda, MD, USA
- The Framingham Heart Study, Framingham, MA, USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Tamar Sofer
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.