1
Salvatore M, Mondul AM, Friese CR, Hanauer D, Xu H, Pearce CL, Mukherjee B. Impacts of sample weighting on transferability of risk prediction models across EHR-Linked biobanks with different recruitment strategies. J Biomed Inform 2025:104853. [PMID: 40398830 DOI: 10.1016/j.jbi.2025.104853] [Received: 01/24/2025] [Revised: 04/15/2025] [Accepted: 05/18/2025] [Indexed: 05/23/2025]
Abstract
OBJECTIVE To evaluate whether using poststratification (PS) weights when training risk prediction models enhances transferability when the external test cohort has a different sampling strategy, a scenario commonly encountered when analyzing electronic health record (EHR)-linked biobanks. METHODS PS weights were calculated to align a health system-based biobank, the Michigan Genomics Initiative (MGI; n = 76,757), with a nationally recruited biobank, All of Us (AOU; n = 226,764), which oversamples underrepresented groups. Basic PS weights (PSBASIC) captured age, sex, and race/ethnicity; full PS weights (PSFULL) additionally included smoking, alcohol consumption, BMI, depression, hypertension, and the Charlson Comorbidity Index. Models for esophageal, liver, and pancreatic cancers were developed using EHR data from MGI at 0, 1, 2, and 5 years prior to diagnosis. Phenotype risk scores (PheRS) were constructed using six methods (e.g., regularized regression, random forest) and evaluated alongside covariates, risk factors, and symptoms. Evaluation metrics included the odds ratio (OR) for the top decile vs. the middle 40th-60th percentiles of the risk score distribution and the area under the receiver operating characteristic curve (AUC), both evaluated in the AOU test cohort for models trained with and without weighting. RESULTS Elastic net and random forest methods generally performed well in risk stratification, but no single PheRS construction method consistently outperformed the others. Applying PS weights did not consistently improve risk stratification performance. For example, in liver cancer risk stratification at t = 1, the unweighted random forest PheRS yielded an OR of 13.73 (95% CI: 8.97, 21.01), compared to 14.55 (95% CI: 9.45, 22.42) with PSBASIC and 13.62 (95% CI: 8.90, 20.85) with PSFULL. CONCLUSION PS weights do not significantly enhance risk model transferability between biobanks. EHR-based PheRS are crucial for risk stratification and should be integrated with other multimodal data for improved risk prediction. Identifying high-risk populations for diseases like liver cancer early through health history mining shows promise.
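The poststratification step described under METHODS can be illustrated with a minimal sketch. It assumes the simplest form of PS weighting, in which each record is weighted by the ratio of its stratum's share in the target cohort to its share in the training cohort. The abstract does not give the exact procedure, and the function name, strata, and proportions below are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_strata, target_proportions):
    """Return one weight per record: target share / sample share of its stratum."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    weights = []
    for stratum in sample_strata:
        sample_share = counts[stratum] / n
        weights.append(target_proportions[stratum] / sample_share)
    return weights


# Hypothetical toy cohort: three "young" records and one "old" record,
# reweighted toward a target population that is half young, half old.
sample = ["young", "young", "young", "old"]
target = {"young": 0.5, "old": 0.5}
weights = poststratification_weights(sample, target)
```

In this form the weights sum to the sample size, so the weighted training cohort keeps its nominal size while its stratum mix matches the target.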
Affiliation(s)
- Maxwell Salvatore
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA; Center for Precision Health Data Science, University of Michigan, Ann Arbor, MI, USA
- Alison M Mondul
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA; Rogel Cancer Center, University of Michigan, Ann Arbor, MI, USA
- Christopher R Friese
  - Rogel Cancer Center, University of Michigan, Ann Arbor, MI, USA; Department of Systems, Populations, and Leadership, School of Nursing, University of Michigan, Ann Arbor, MI, USA; Department of Health Management and Policy, University of Michigan, Ann Arbor, MI, USA
- David Hanauer
  - Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
- Hua Xu
  - Department of Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA
- Celeste Leigh Pearce
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA; Rogel Cancer Center, University of Michigan, Ann Arbor, MI, USA
2
Tay JRH, Okada Y, Nadarajan GD, Siddiqui FJ, Barry T, Ong MEH. Pragmatic Risk Stratification Method to Identify Emergency Department Presentations for Alternative Care Service Pathways: Registry-Based Retrospective Study Over 5 Years. J Med Internet Res 2025; 27:e73758. [PMID: 40354643 DOI: 10.2196/73758] [Received: 03/11/2025] [Revised: 04/10/2025] [Accepted: 04/17/2025] [Indexed: 05/14/2025] Open
Abstract
BACKGROUND Redirecting avoidable presentations to alternative care service pathways (ACSPs) may lead to better resource allocation for prehospital emergency care. Stratifying emergency department (ED) presentations by admission risk using diagnosis codes might be useful in identifying patients suitable for ACSPs. OBJECTIVE We aim to cluster ICD-10 (International Statistical Classification of Diseases, Tenth Revision) diagnosis codes based on hospital admission risk, identify ED presentation characteristics associated with these clusters, and develop an exploratory classification to identify groups potentially suitable for ACSPs. METHODS Retrospective observational data from a database of all visits to the ED of a tertiary care institution for over 5 years (2016-2020) were analyzed. K-means clustering grouped diagnosis codes according to admission outcomes. Multivariable logistic regression was performed to determine the association of characteristics with cluster membership. ICD-10 codes were grouped into blocks and analyzed for cumulative coverage to identify dominant groups associated with lower hospital admission risk. RESULTS A total of 215,477 ambulatory attendances classified as priority levels 3 (ambulatory) and 4 (nonemergency) under the Patient Acuity Category Scale were selected, with a 17.3% (0.4%) overall admission rate. The mean presentation age was 46.2 (SD 19.4) years. Four clusters with varying hospital admission risks were identified. Cluster 1 (n=131,531, 61%) had the lowest admission rate at 4.7% (0.2%), followed by cluster 2 (n=44,347, 20.6%) at 19.5% (0.4%), cluster 3 (n=27,829, 12.9%) at 47.8% (0.5%), and cluster 4 (n=11,770, 5.5%) with the highest admission rate at 78% (0.4%). The four-cluster solution achieved a silhouette score of 0.65, a Calinski-Harabasz Index of 3649.5, and a Davies-Bouldin Index of 0.46. Compared to clustering based on ICD-10 blocks, clustering based on individual ICD-10 codes demonstrated better separation. 
Mild (odds ratio [OR] 2.55, 95% CI 2.48-2.62), moderate (OR 2.40, 95% CI 2.28-2.51), and severe (OR 3.29, 95% CI 3.13-3.45) Charlson Comorbidity Index scores increased the odds of admission. Tachycardia (OR 1.46, 95% CI 1.43-1.49), hyperthermia (OR 2.32, 95% CI 2.25-2.40), recent surgery (OR 1.31, 95% CI 1.27-1.36), and recent inpatient admission (OR 1.16, 95% CI 1.13-1.18) also increased the odds of higher cluster membership. Among 132 ICD-10 blocks, 17 blocks accounted for 80% of cluster 1 cases, including musculoskeletal or connective tissue disorders and injuries to the head or lower limbs. Higher-risk categories included respiratory tract infections such as influenza and pneumonia, and infections of the skin and subcutaneous tissue. CONCLUSIONS Most ambulatory presentations at the ED were categorized into low-risk clusters with a minimal likelihood of hospital admission. Stratifying ICD-10 diagnosis codes by admission outcomes and ranking them based on frequency provides a structured approach to potentially stratify admission risk.
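As a rough illustration of grouping diagnosis codes by admission risk, the sketch below runs a one-dimensional k-means (Lloyd's algorithm) over per-code admission rates. The clustering feature, the deterministic initialization, and the admission rates attached to the ICD-10 codes are assumptions made for illustration, not the study's data or configuration.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start from evenly spaced order statistics
    # (an assumption for reproducibility, not the study's choice).
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical per-code admission rates (fraction of visits admitted).
admission_rate = {
    "M54": 0.04, "S93": 0.05, "J06": 0.06,
    "N39": 0.20, "L03": 0.22,
    "J18": 0.75, "I21": 0.80,
}
codes = list(admission_rate)
centroids, labels = kmeans_1d([admission_rate[c] for c in codes], k=3)
cluster_of = dict(zip(codes, labels))
```

Cluster quality measures such as the silhouette score reported in the abstract would then be computed on the resulting labels.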
Affiliation(s)
- John Rong Hao Tay
  - Health Services and Systems Research Programme, Duke-NUS Medical School, Singapore, Singapore
  - Department of Restorative Dentistry, National Dental Centre of Singapore, Singapore, Singapore
- Yohei Okada
  - Health Services and Systems Research Programme, Duke-NUS Medical School, Singapore, Singapore
  - Department of Preventive Services, Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Pre-hospital & Emergency Research Centre, Duke-NUS Medical School, Singapore, Singapore
- Fahad Javaid Siddiqui
  - Health Services and Systems Research Programme, Duke-NUS Medical School, Singapore, Singapore
  - Pre-hospital & Emergency Research Centre, Duke-NUS Medical School, Singapore, Singapore
- Tomás Barry
  - School of Medicine, University College Dublin, Dublin, Ireland
- Marcus Eng Hock Ong
  - Health Services and Systems Research Programme, Duke-NUS Medical School, Singapore, Singapore
  - Pre-hospital & Emergency Research Centre, Duke-NUS Medical School, Singapore, Singapore
  - Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore
3
Bastarache L, Tinker RJ, Schuler BA, Richter L, Phillips JA, Stead WW, Hooker GW, Peterson JF, Ruderfer DM. Characterizing trends in clinical genetic testing: A single-center analysis of EHR data from 1.8 million patients over two decades. Am J Hum Genet 2025; 112:1029-1038. [PMID: 40245861 DOI: 10.1016/j.ajhg.2025.03.009] [Received: 10/21/2024] [Revised: 03/11/2025] [Accepted: 03/12/2025] [Indexed: 04/19/2025] Open
Abstract
A lack of structured data in electronic health records (EHRs) makes assessing the impact of genetic testing on clinical practice challenging. We extracted clinical genetic tests from the EHRs of more than 1.8 million patients seen at Vanderbilt University Medical Center from 2002 to 2022. With these data, we quantified the use of clinical genetic testing in healthcare and described how testing patterns and results changed over time. We assessed trends in types of genetic tests, tracked usage across medical specialties, and introduced a new measure, the genetically attributable fraction (GAF), to quantify the proportion of observed phenotypes attributable to a genetic diagnosis over time. We identified 104,392 tests and 19,032 molecularly confirmed diagnoses. The proportion of patients with genetic testing in their EHRs increased from 1.0% in 2002 to 6.1% in 2022, and testing became more comprehensive with the growing use of multi-gene panels. The number of unique diseases diagnosed with genetic testing increased from 51 in 2002 to 509 in 2022, and there was a rise in the number of variants of uncertain significance. The phenome-wide GAF for 6,505,620 diagnoses made in 2022 was 0.46%, and the GAF was greater than 5% for 74 phenotypes, including pancreatic insufficiency (67%), chorea (64%), atrial septal defect (24%), microcephaly (17%), paraganglioma (17%), and ovarian cancer (6.8%). Our study provides a comprehensive quantification of the increasing role of genetic testing at a major academic medical institution and demonstrates its growing utility in explaining the observed medical phenome.
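The genetically attributable fraction can be read as a simple ratio, sketched below. This follows only the abstract's wording (the proportion of observed phenotypes attributable to a genetic diagnosis); the exact counting rules are not given here, so the function name and the counts are hypothetical.

```python
def genetically_attributable_fraction(attributable, total):
    """Share of observed diagnoses attributed to a confirmed genetic diagnosis."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= attributable <= total:
        raise ValueError("attributable must lie between 0 and total")
    return attributable / total


# Hypothetical counts per phenotype: (attributable diagnoses, all diagnoses).
counts = {"phenotype_a": (3, 12), "phenotype_b": (1, 88)}

per_phenotype = {
    name: genetically_attributable_fraction(a, t) for name, (a, t) in counts.items()
}
# A phenome-wide value pools the counts across all phenotypes.
phenome_wide = genetically_attributable_fraction(
    sum(a for a, _ in counts.values()),
    sum(t for _, t in counts.values()),
)
```

Pooling before dividing means common phenotypes dominate the phenome-wide value, which is one plausible reason it can be small even when some individual phenotypes have a high GAF.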
Affiliation(s)
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
- Rory J Tinker
  - Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Bryce A Schuler
  - Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
- Lucas Richter
  - Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- John A Phillips
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
- William W Stead
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Gillian W Hooker
  - Division of Medical Genetics and Genomic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Concert Genetics, Nashville, TN, USA
- Josh F Peterson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Douglas M Ruderfer
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
4
Arif M, Lehoczki A, Haskó G, Lohoff FW, Ungvari Z, Pacher P. Global and tissue-specific transcriptomic dysregulation in human aging: Pathways and predictive biomarkers. GeroScience 2025:10.1007/s11357-025-01672-z. [PMID: 40295347 DOI: 10.1007/s11357-025-01672-z] [Received: 03/06/2025] [Accepted: 04/10/2025] [Indexed: 04/30/2025] Open
Abstract
Aging is a universal biological process that impacts all tissues, leading to functional decline and increased susceptibility to age-related diseases, particularly cardiometabolic disorders. While aging is characterized by hallmarks such as mitochondrial dysfunction, chronic inflammation, and dysregulated metabolism, the molecular mechanisms driving these processes remain incompletely understood, particularly in a tissue-specific context. To address this gap, we conducted a comprehensive transcriptomic analysis across 40 human tissues using data from the Genotype-Tissue Expression (GTEx) project, comparing individuals younger than 40 years with those older than 65 years. We identified over 17,000 differentially expressed genes (DEGs) across tissues, with distinct patterns of up- and down-regulation. Enrichment analyses revealed that up-regulated DEGs were associated with inflammation, immune responses, and apoptosis, while down-regulated DEGs were linked to mitochondrial function, oxidative phosphorylation, and metabolic processes. Using gene co-expression network (GCN) analyses, we identified 1,099 genes as dysregulated nodes (DNs) shared across tissues, reflecting global aging-associated transcriptional shifts. Integrating machine learning approaches, we pinpointed key aging biomarkers, including GDF15 and EDA2R, which demonstrated strong predictive power for aging and were particularly relevant in cardiometabolic tissues such as the heart, liver, skeletal muscle, and adipose tissue. These genes were also validated in plasma proteomics studies and exhibited significant correlations with clinical cardiometabolic health indicators. This study provides a multi-tissue, integrative perspective on aging, uncovering both systemic and tissue-specific molecular signatures. 
Our findings advance understanding of the molecular underpinnings of aging and identify novel biomarkers that may serve as therapeutic targets for promoting healthy aging and mitigating age-related diseases.
Affiliation(s)
- Muhammad Arif
  - Laboratory of Cardiovascular Physiology and Tissue Injury, National Institute On Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA.
  - Section On Fibrotic Disorders, National Institute On Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA.
  - Department of Molecular and Clinical Medicine, SciLifeLab, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden.
- Andrea Lehoczki
  - Doctoral College/Institute of Preventive Medicine and Public Health, International Training Program in Geroscience, Semmelweis University, Budapest, Hungary
- György Haskó
  - Department of Anesthesiology, Columbia University, New York, NY, USA
- Falk W Lohoff
  - Section On Clinical Genomics and Experimental Therapeutics, National Institute On Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA
- Zoltan Ungvari
  - Vascular Cognitive Impairment, Neurodegeneration and Healthy Brain Aging Program, Department of Neurosurgery, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
  - Oklahoma Center for Geroscience and Healthy Brain Aging, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
  - Department of Health Promotion Sciences, College of Public Health, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
  - The Peggy and Charles Stephenson Cancer Center, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Pal Pacher
  - Laboratory of Cardiovascular Physiology and Tissue Injury, National Institute On Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA.
5
Brokamp E, Miller-Fleming T, Scalici A, Hooker G, Hamid R, Velez Edwards D, Chung WK, Luo Y, Kiryluk K, Limidi NA, Khankari NK, Cox NJ, Bastarache L, Shuey MM. Systematic method for classifying multiple congenital anomaly cases in electronic health records. Genet Med 2025; 27:101415. [PMID: 40116291 DOI: 10.1016/j.gim.2025.101415] [Received: 05/29/2024] [Revised: 03/06/2025] [Accepted: 03/13/2025] [Indexed: 03/23/2025] Open
Abstract
PURPOSE Congenital anomalies (CAs) affect approximately 3% of live births and are the leading cause of infant morbidity and mortality. Many individuals have multiple CAs (MCA), a constellation of 2 or more unrelated CAs; yet, there is no consensus on how to systematically identify these individuals in electronic health records (EHRs). We developed a scalable method to characterize MCA in the EHR, allowing for the dramatic improvement of our understanding of the genetic and epidemiologic underpinnings of MCA. METHODS From the Vanderbilt University Medical Center's anonymized EHR database, we evaluated 3 different approaches for classifying MCA, including a novel approach that removed minor vs major differentiation and their associated clinical utilization and population characteristics. Using phenome-wide association studies, we assessed the phenome associated with previously classified minor CAs. RESULTS Our proposed universal method for MCA identification in the EHR is accurate (positive predictive value = 97.1%), associated with heightened hospital utilization (41% receiving inpatient care), and captures granular patterns of CAs. A secondary application of our method was done in 2 separate cohorts. CONCLUSION We developed a method to comprehensively identify individuals with MCA in the EHR, allowing researchers to better investigate the genetic etiologies of MCA. This method can be applied across EHR databases with billing codes.
Affiliation(s)
- Elly Brokamp
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Tyne Miller-Fleming
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Alexandra Scalici
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Gillian Hooker
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Rizwan Hamid
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN
- Digna Velez Edwards
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Wendy K Chung
  - Department of Pediatrics, Boston Children's Hospital, Boston, MA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Nita A Limidi
  - Department of Neurology, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL
- Nikhil K Khankari
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Nancy J Cox
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Lisa Bastarache
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Megan M Shuey
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN.
6
Colbert SMC, Lepow L, Fennessy B, Iwata N, Ikeda M, Saito T, Terao C, Preuss M, Pathak J, Mann JJ, Coon H, Mullins N. Distinguishing clinical and genetic risk factors for suicidal ideation and behavior in a diverse hospital population. Transl Psychiatry 2025; 15:63. [PMID: 39979244 PMCID: PMC11842747 DOI: 10.1038/s41398-025-03287-6] [Received: 09/19/2024] [Revised: 01/13/2025] [Accepted: 02/12/2025] [Indexed: 02/22/2025] Open
Abstract
Suicidal ideation (SI) and behavior (SB) are major public health concerns, but risk factors for their development and progression are poorly understood. We used ICD codes and a natural language processing algorithm to identify individuals in a hospital biobank with SI-only or SB, and controls with neither. We compared the profiles of SB and SI-only patients to controls, and to each other, using phenome-wide association studies (PheWAS) and polygenic risk scores (PRS). PheWAS identified many risk factors for SB and SI-only, plus specific psychiatric disorders that may be involved in progression from SI-only to SB. PRS for suicide attempt were associated only with SB, even after accounting for psychiatric disorder PRS. SI PRS were associated only with SI-only, although not after accounting for psychiatric disorder PRS. These findings advance understanding of distinct genetic and clinical risk factors for SB and SI-only, which will aid in early detection and intervention efforts.
Affiliation(s)
- Sarah M C Colbert
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Lauren Lepow
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Brian Fennessy
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nakao Iwata
  - Department of Psychiatry, Fujita Health University School of Medicine, Toyoake, Japan
- Masashi Ikeda
  - Department of Psychiatry, Fujita Health University School of Medicine, Toyoake, Japan
  - Department of Psychiatry, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Takeo Saito
  - Department of Psychiatry, Fujita Health University School of Medicine, Toyoake, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
- Michael Preuss
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jyotishman Pathak
  - Department of Population Health Sciences, Weill Cornell Medicine/NewYork-Presbyterian, New York, NY, USA
- J John Mann
  - Department of Psychiatry, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
  - Department of Radiology, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
  - Division of Molecular Imaging and Neuropathology, New York State Psychiatric Institute, New York, NY, USA
- Hilary Coon
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah School of Medicine, Salt Lake City, UT, USA
- Niamh Mullins
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
7
Sullivan AE, Granger H, Dupuis L, Napper J, Tran L, Laws JL, Wells QS, Farber-Eger E, Alvis BD, O’Leary JM, Bommareddi S, Amancherla KV, Rali AS. Patient-specific Predictors of Haemolysis with Percutaneous Ventricular Assist Devices. Card Fail Rev 2025; 11:e04. [PMID: 40083653 PMCID: PMC11904419 DOI: 10.15420/cfr.2024.30] [Received: 10/01/2024] [Accepted: 11/29/2024] [Indexed: 03/16/2025] Open
Abstract
Introduction Percutaneous ventricular assist devices (pVADs) are increasingly used in cardiogenic shock but are associated with complications including haemolysis. The aim of this study was to investigate patient characteristics associated with haemolysis in a cardiogenic shock patient population. Methods Consecutive patients were identified using Current Procedural Terminology (CPT) codes for pVAD insertion. Patient characteristics, laboratory and imaging data, and patient outcomes were abstracted manually and using validated automated methods. Laboratory-defined haemolysis required a drop in haemoglobin ≥2 mg/dl with either lactate dehydrogenase ≥250 units/l or undetectable haptoglobin. Clinically significant haemolysis was defined as laboratory-defined haemolysis necessitating transfusion. The primary outcome was the association between haemolysis and on-device and 30-day mortality. Results A total of 196 patients underwent pVAD insertion for cardiogenic shock during the study period and were included. Laboratory-defined haemolysis occurred in 46 patients (23.5%), of whom 12 (6.1%) had clinically significant haemolysis. Haemolysis occurred more often following emergency rather than elective insertion (84.8% versus 40.0%, p<0.001), in patients with elevated lactic acid levels (median 2.5 versus 1.6, p=0.016), and in patients with elevated heart rates (92.5 BPM versus 86.5 BPM, p=0.023). After multivariable adjustment, there was no association between laboratory-defined haemolysis and on-device (OR 0.6; 95% CI [0.1-3.4]; p=0.565) or 30-day mortality (OR 2.1; 95% CI [0.4-13.0]; p=0.391). Conclusion Laboratory-defined haemolysis was common in patients with cardiogenic shock and pVAD, but clinically significant haemolysis was not. There was no association between haemolysis and on-device or 30-day mortality.
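The two haemolysis definitions in Methods amount to a small decision rule, sketched below. The thresholds are taken from the abstract as written; the record layout, the field names, and the use of a simple transfusion flag to stand in for "necessitating transfusion" are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabSummary:
    """Hypothetical per-patient summary of labs during device support."""
    haemoglobin_drop: float          # in the units used by the abstract's threshold
    ldh_units_per_l: float           # lactate dehydrogenase
    haptoglobin_undetectable: bool
    transfused: bool                 # proxy for "necessitating transfusion"


def laboratory_defined_haemolysis(labs: LabSummary) -> bool:
    """Haemoglobin drop >= 2 plus either LDH >= 250 or undetectable haptoglobin."""
    marker = labs.ldh_units_per_l >= 250 or labs.haptoglobin_undetectable
    return labs.haemoglobin_drop >= 2 and marker


def clinically_significant_haemolysis(labs: LabSummary) -> bool:
    """Laboratory-defined haemolysis that also required transfusion."""
    return laboratory_defined_haemolysis(labs) and labs.transfused


# Hypothetical patient: meets the laboratory definition but was not transfused.
patient = LabSummary(
    haemoglobin_drop=2.4,
    ldh_units_per_l=310.0,
    haptoglobin_undetectable=False,
    transfused=False,
)
lab_positive = laboratory_defined_haemolysis(patient)
clinically_significant = clinically_significant_haemolysis(patient)
```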
Affiliation(s)
- Alexander E Sullivan
  - Division of Cardiovascular Diseases, Vanderbilt University Medical Center, Nashville, TN, US
- Hannah Granger
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, US
- Leonie Dupuis
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, US
- Jonathan Napper
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, US
- Lena Tran
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, US
- J Lukas Laws
  - Division of Cardiovascular Diseases, Vanderbilt University Medical Center, Nashville, TN, US
- Quinn S Wells
  - Division of Cardiovascular Diseases, Vanderbilt University Medical Center, Nashville, TN, US
- Eric Farber-Eger
  - Division of Cardiovascular Diseases, Vanderbilt University Medical Center, Nashville, TN, US
- Bret D Alvis
  - Department of Anaesthesiology, Vanderbilt University Medical Center, Nashville, TN, US
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, US
- Jared M O’Leary
  - Division of Cardiovascular Diseases, Vanderbilt University Medical Center, Nashville, TN, US
- Swaroop Bommareddi
  - Department of Cardiac Surgery, Vanderbilt University Medical Center, Nashville, TN, US
- Kaushik V Amancherla
  - Division of Cardiovascular Diseases, Vanderbilt University Medical Center, Nashville, TN, US
- Aniket S Rali
  - Division of Cardiovascular Diseases, Vanderbilt University Medical Center, Nashville, TN, US
  - Department of Anaesthesiology, Vanderbilt University Medical Center, Nashville, TN, US
8
Wheless L, Mosley D, Dochtermann D, Pyarajan S, Gonzalez K, Weiss R, Maas K, Zhang S, Yao L, Xu Y, Madden C, Ike J, Smith IT, Grossarth S, Wilson O, Hung A, Fillmore NR, Brown K, Landi MT, Hartman RI. The impact of imprecise case definitions in electronic health record research: a melanoma case-study from the Million Veteran Program. Arch Dermatol Res 2025; 317:308. [PMID: 39853431 DOI: 10.1007/s00403-024-03780-w] [Received: 09/27/2024] [Revised: 12/26/2024] [Accepted: 12/29/2024] [Indexed: 01/26/2025]
Abstract
Cases for a disease can be defined broadly using diagnostic codes, or narrowly using gold-standard confirmation that often is not available in large administrative datasets. These different definitions can have significant impacts on the results and conclusions of studies. We conducted this study to assess how using melanoma phecodes versus histologic confirmation for invasive or in situ melanoma impacts the results of a genome-wide association study (GWAS) using the Million Veteran Program. Melanoma status was determined three ways: (1) by the presence of two or more phecodes, (2) histologically-confirmed invasive melanoma, and (3) histologically-confirmed melanoma in situ. We conducted a GWAS for variants with minor allele frequencies of 1% or greater. There were 45,665 cases in the phecode cohort, 5364 cases in the confirmed invasive melanoma cohort, and 4792 cases in the confirmed melanoma in situ cohort. There were 20,457 variants significant at the genome-wide level in the phecode cohort, 2582 in the invasive melanoma cohort, and 1989 in the melanoma in situ cohort. Most of the variants identified in the phecode cohort did not replicate in the histologically-confirmed cohorts. The different case definitions led to large differences in sample size and variants associated at the genome-wide level. Unvalidated and imprecise case definitions can lead to less accurate results. Investigators should use validated phenotypes when gold-standard definitions are not available.
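The broad, code-based case definition (two or more phecodes) can be sketched as a simple count per patient. The abstract does not say whether the codes must fall on distinct dates, so this version counts every occurrence. The record layout, patient identifiers, and the phecode value used for melanoma are assumptions for illustration.

```python
from collections import Counter


def phecode_cases(records, target_phecodes, min_count=2):
    """Return patient IDs with at least `min_count` occurrences of a target phecode.

    `records` is an iterable of (patient_id, phecode) pairs, one per coded event.
    """
    counts = Counter(pid for pid, code in records if code in target_phecodes)
    return {pid for pid, n in counts.items() if n >= min_count}


# Hypothetical coded events; "172.11" stands in for a melanoma phecode.
records = [
    ("p1", "172.11"),
    ("p1", "172.11"),
    ("p2", "172.11"),
    ("p3", "250.2"),
]
cases = phecode_cases(records, target_phecodes={"172.11"})
```

A histologically confirmed definition would replace the count with a lookup against pathology or registry data, which is why it yields far fewer cases than the code-based rule.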
Affiliation(s)
- Lee Wheless
  - Tennessee Valley Healthcare System VA Medical Center, 719 Thompson Lane, Suite 26300, Nashville, TN, 37215, USA
  - Division of Epidemiology, Vanderbilt University Medical Center Department of Medicine, Vanderbilt University Medical Center, Nashville, USA
  - Department of Dermatology, Nashville, TN, USA
- Katlyn Gonzalez
  - Vanderbilt University School of Medicine, Nashville, TN, USA
- Kyle Maas
  - Vanderbilt University School of Medicine, Nashville, TN, USA
- Siwei Zhang
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Lydia Yao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Yaomin Xu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Christopher Madden
  - State University of New York Downstate College of Medicine, Brooklyn, NY, USA
- Sarah Grossarth
  - Quillen College of Medicine, East Tennessee State University, Johnson City, TN, USA
- Otis Wilson
  - Tennessee Valley Healthcare System VA Medical Center, 719 Thompson Lane, Suite 26300, Nashville, TN, 37215, USA
  - Department of Medicine, Division of Nephrology and Hypertension, Vanderbilt University Medical Center, Nashville, TN, USA
- Adriana Hung
  - Tennessee Valley Healthcare System VA Medical Center, 719 Thompson Lane, Suite 26300, Nashville, TN, 37215, USA
  - Department of Medicine, Division of Nephrology and Hypertension, Vanderbilt University Medical Center, Nashville, TN, USA
- Nathanael R Fillmore
  - VA Boston Healthcare System, Boston, MA, USA
  - Department of Medicine, Veterans Affairs Boston Healthcare System, Boston, MA, USA
  - Dana Farber Cancer Institute, Boston, MA, USA
  - Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, USA
- Kevin Brown
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Maria Teresa Landi
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Rebecca I Hartman
  - VA Boston Healthcare System, Boston, MA, USA
  - Brigham and Women's Hospital Department of Dermatology, Boston, MA, USA
9
Jones SC, Cardone KM, Bradford Y, Tishkoff SA, Ritchie MD. The Impact of Ancestry on Genome-Wide Association Studies. Pacific Symposium on Biocomputing 2025; 30:251-267. [PMID: 39670375 PMCID: PMC11694900 DOI: 10.1142/9789819807024_0019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2025]
Abstract
Genome-wide association studies (GWAS) are an important tool for the study of complex disease genetics. Decisions regarding the quality control (QC) procedures employed as part of a GWAS can have important implications on the results and their biological interpretation. Many GWAS have been conducted predominantly in cohorts of European ancestry, but many initiatives aim to increase the representation of diverse ancestries in genetic studies. The question of how these data should be combined and the consequences that genetic variation across ancestry groups might have on GWAS results warrant further investigation. In this study, we focus on several commonly used methods for combining genetic data across diverse ancestry groups and the impact these decisions have on the outcome of GWAS summary statistics. We ran GWAS on two binary phenotypes using ancestry-specific, multi-ancestry mega-analysis, and meta-analysis approaches. We found that while multi-ancestry mega-analysis and meta-analysis approaches can aid in identifying signals shared across ancestries, they can diminish the signal of ancestry-specific associations and modify their effect sizes. These results demonstrate the potential impact on downstream post-GWAS analyses and follow-up studies. Decisions regarding how the genetic data are combined have the potential to mask important findings that might serve individuals of ancestries that have been historically underrepresented in genetic studies. New methods that consider ancestry-specific variants in conjunction with the shared variants need to be developed.
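The dilution of ancestry-specific signals described in this abstract can be illustrated with a standard fixed-effect, inverse-variance-weighted meta-analysis. This is a minimal sketch under stated assumptions: the abstract does not say which meta-analysis model was used, and the effect sizes and standard errors below are invented for illustration, not taken from the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    Returns the pooled effect estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se


# Hypothetical variant: an effect in one ancestry group, none in the other.
betas = [0.30, 0.00]
ses = [0.05, 0.05]

beta_meta, se_meta = fixed_effect_meta(betas, ses)
z_specific = betas[0] / ses[0]  # z-score in the ancestry-specific analysis
z_meta = beta_meta / se_meta    # z-score after pooling both groups
```

With these invented inputs the pooled effect is half the ancestry-specific effect and the pooled z-score is smaller than the ancestry-specific one, which is the pattern of attenuated effect size and weakened signal that the abstract reports.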
Affiliation(s)
- Steven Christopher Jones
  - Genomics and Computational Biology Graduate Group, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Katie M Cardone
  - Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Yuki Bradford
  - Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Sarah A Tishkoff
  - Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
10
Dasgupta S. Multiplexed Molecular Endophenotypes Help Identify Hub Genes in Non-Small Cell Lung Cancer: Unlocking Next-Generation Cancer Phenomics. OMICS: A Journal of Integrative Biology 2025; 29:8-17. [PMID: 39817717 DOI: 10.1089/omi.2024.0179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2025]
Abstract
Next-generation cancer phenomics by deployment of multiple molecular endophenotypes coupled with high-throughput analyses of gene expression offers veritable opportunities for triangulation of discovery findings in non-small cell lung cancer (NSCLC) research. This study reports differentially expressed genes in NSCLC using publicly available datasets (GSE18842 and GSE229253), uncovering 130 common genes that may potentially represent crucial molecular signatures of NSCLC. Additionally, network analyses by GeneMANIA and STRING revealed significant coexpression and interaction patterns among these genes, with four notable hub genes (GRK5, CAV1, PPARG, and CXCR2) identified as pivotal in NSCLC progression. Validation of these hub genes indicated their consistent downregulation in tumor tissues compared to normal counterparts. Gene expression across the endophenotypes representing pathological stages revealed distinct downregulation trends, emphasizing their putative roles as biomarkers for cancer progression. Moreover, three miRNAs (hsa-miR-429, hsa-miR-335-5p, and hsa-miR-126-3p) showed strong associations with these hub genes, while SREBF1 emerged as a relevant transcription factor. Pathway enrichment analysis identified the chemokine signaling pathway as significantly associated with these genes, highlighting its role in tumor progression and immune evasion. Cell-type enrichment analysis indicated that endothelial cells may play a significant role in NSCLC pathogenesis. Finally, survival analysis demonstrated that GRK5 is a potential oncogenic marker, whereas CAV1 may have a protective effect. These findings collectively underscore the critical molecular interactions in NSCLC and suggest novel paths for translational research, targeted therapies, and prognostic markers in clinical settings. They also attest to the promises of next-generation cancer phenomics using multiple endophenotypes for discovery and triangulation of novel findings.
Affiliation(s)
- Sanjukta Dasgupta
  - Department of Biotechnology, Brainware University, Barasat, West Bengal, India
  - Center for Multidisciplinary Research & Innovations, Brainware University, Barasat, West Bengal, India
11
Tran TC, Schlueter DJ, Zeng C, Mo H, Carroll RJ, Denny JC. PheWAS analysis on large-scale biobank data with PheTK. Bioinformatics 2024; 41:btae719. [PMID: 39657951 PMCID: PMC11709244 DOI: 10.1093/bioinformatics/btae719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 10/16/2024] [Accepted: 12/06/2024] [Indexed: 12/12/2024] [Open Access]
Abstract
SUMMARY With the rapid growth of genetic data linked to electronic health record (EHR) data in huge cohorts, large-scale phenome-wide association studies (PheWAS) have become powerful discovery tools in biomedical research. PheWAS is an analysis method to study phenotype associations utilizing longitudinal EHR data. Previous PheWAS packages were developed mostly with smaller datasets and with earlier PheWAS approaches. PheTK was designed to simplify analysis and efficiently handle biobank-scale data. PheTK uses multithreading and supports a full PheWAS workflow including extraction of data from OMOP databases and Hail matrix tables as well as PheWAS analysis for both phecode version 1.2 and phecodeX. Benchmarking results showed PheTK took 64% less time than the R PheWAS package to complete the same workflow. PheTK can be run locally or on cloud platforms such as the All of Us Researcher Workbench (All of Us) or the UK Biobank (UKB) Research Analysis Platform (RAP). AVAILABILITY AND IMPLEMENTATION The PheTK package is freely available on the Python Package Index, on GitHub under GNU General Public License (GPL-3) at https://github.com/nhgritctran/PheTK, and on Zenodo, DOI 10.5281/zenodo.14217954, at https://doi.org/10.5281/zenodo.14217954. PheTK is implemented in Python and platform independent.
Affiliation(s)
- Tam C Tran
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, United States
- David J Schlueter
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, United States
  - University of Toronto, ON, M5S 1A1, Canada
- Chenjie Zeng
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, United States
- Huan Mo
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, United States
- Robert J Carroll
  - Vanderbilt University School of Medicine, Nashville, TN, 37240, United States
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, United States
  - All of Us Research Program, National Institutes of Health, Bethesda, MD, 20892, United States
12
Johnson R, Gottlieb U, Shaham G, Eisen L, Waxman J, Devons-Sberro S, Ginder CR, Hong P, Sayeed R, Reis BY, Balicer RD, Dagan N, Zitnik M. Unified Clinical Vocabulary Embeddings for Advancing Precision Medicine. medRxiv 2024:2024.12.03.24318322. [PMID: 39677476 PMCID: PMC11643188 DOI: 10.1101/2024.12.03.24318322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
Integrating clinical knowledge into AI remains challenging despite numerous medical guidelines and vocabularies. Medical codes, central to healthcare systems, often reflect operational patterns shaped by geographic factors, national policies, insurance frameworks, and physician practices rather than the precise representation of clinical knowledge. This disconnect hampers AI in representing clinical relationships, raising concerns about bias, transparency, and generalizability. Here, we developed a resource of 67,124 clinical vocabulary embeddings derived from a clinical knowledge graph tailored to electronic health record vocabularies, spanning over 1.3 million edges. Using graph transformer neural networks, we generated clinical vocabulary embeddings that provide a new representation of clinical knowledge by unifying seven medical vocabularies. These embeddings were validated through a phenotype risk score analysis involving 4.57 million patients from Clalit Healthcare Services, effectively stratifying individuals based on survival outcomes. Inter-institutional panels of clinicians evaluated the embeddings for alignment with clinical knowledge across 90 diseases and 3,000 clinical codes, confirming their robustness and transferability. This resource addresses gaps in integrating clinical vocabularies into AI models and training datasets, paving the way for knowledge-grounded population and patient-level models.
Affiliation(s)
- Ruth Johnson
  - The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Uri Gottlieb
  - Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Galit Shaham
  - Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Lihi Eisen
  - Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Jacob Waxman
  - Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Stav Devons-Sberro
  - Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Curtis R. Ginder
  - Cardiovascular Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Peter Hong
  - Division of General Pediatrics, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
  - Information Technology, Enterprise Data Analytics and Reporting, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Raheel Sayeed
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Ben Y. Reis
  - The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Predictive Medicine Group, Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
- Ran D. Balicer
  - The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
  - Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
  - Faculty of Health Sciences, School of Public Health, Ben Gurion University of the Negev, Be’er Sheva, Israel
- Noa Dagan
  - The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
  - Software and Information Systems Engineering, Ben Gurion University, Be’er Sheva, Israel
- Marinka Zitnik
  - The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
13
Stead WW, Lewis A, Giuse NB, Williams AM, Biaggioni I, Bastarache L. Disentangling the phenotypic patterns of hypertension and chronic hypotension. J Biomed Inform 2024; 159:104743. [PMID: 39486471 PMCID: PMC11722018 DOI: 10.1016/j.jbi.2024.104743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 10/03/2024] [Accepted: 10/29/2024] [Indexed: 11/04/2024]
Abstract
OBJECTIVE 2017 blood pressure (BP) categories focus on cardiac risk. We hypothesize that studying the balance between mechanisms that increase or decrease BP across the medical phenome will lead to new insights. We devised a classifier that uses BP measures to assign individuals to mutually exclusive categories centered in the upper (Htn), lower (Hotn) and middle (Naf) zones of the BP spectrum; and examined the epidemiologic and phenotypic patterns of these BP-categories. METHODS We classified a cohort of 832,560 deidentified electronic health records by BP-category; compared the frequency of BP-categories and four subtypes of Htn and Hotn by sex and age-decade; visualized the distributions of systolic, diastolic, mean arterial and pulse pressures stratified by BP-category; and ran Phenome-wide Association Studies (PheWAS) for Htn and Hotn. We paired knowledgebases for hypertension and hypotension and computed aggregate knowledgebase status (KB-status) indicating known associations. We assessed alignment of PheWAS results with KB-status for phecodes in the knowledgebase, and paired PheWAS correlations with KB-status to surface phenotypic patterns. RESULTS BP-categories represent distinct distributions within the multimodal distributions of systolic and diastolic pressure. They are centered in the upper, lower, and middle zones of mean arterial pressure and provide a different signal than pulse pressure. For phecodes in the knowledgebase, 85% of positive correlations align with KB-status. Phenotypic patterns for Htn and Hotn overlap for several phecodes and are separate for others. Our analysis suggests five candidates for hypothesis testing research: two where the prevalence of the association with Htn or Hotn may be underappreciated, and three where mechanisms that increase and decrease blood pressure may be affecting one another's expression. CONCLUSION PairedPheWAS methods may open a phenome-wide path to disentangling hypertension and chronic hypotension.
Our classifier provides a starting point for assigning individuals to BP-categories representing the upper, lower, and middle zones of the BP spectrum. Of the individuals matching the 2017 BP categories for normal BP, elevated BP, or isolated hypertension, 4.7% have diastolic pressure < 60. Research is needed to fine-tune the classifier, provide external validation, evaluate the clinical significance of diastolic pressure < 60, and test the candidate hypotheses.
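The derived measures named in this abstract, mean arterial pressure and pulse pressure, can be computed from a systolic/diastolic pair. The sketch below is an illustration under stated assumptions: it uses the common resting-state approximation for mean arterial pressure (diastolic plus one third of pulse pressure), which the abstract does not confirm as the authors' formula, and the function names are hypothetical. It does not reproduce the Htn/Hotn/Naf classifier, whose thresholds are not given here.

```python
def pulse_pressure(sbp, dbp):
    """Pulse pressure: systolic minus diastolic pressure (mmHg)."""
    return sbp - dbp


def mean_arterial_pressure(sbp, dbp):
    """Approximate mean arterial pressure (mmHg).

    Assumption: the common estimate DBP + (SBP - DBP) / 3.
    """
    return dbp + pulse_pressure(sbp, dbp) / 3.0


def has_low_diastolic(dbp, threshold=60):
    """Flag diastolic pressure below the < 60 cutoff mentioned in the abstract."""
    return dbp < threshold


# Example reading (invented for illustration).
sbp, dbp = 120, 80
pp = pulse_pressure(sbp, dbp)
map_estimate = mean_arterial_pressure(sbp, dbp)
low_dbp = has_low_diastolic(dbp)
```

A reading such as 118/55 would fall in a "normal" systolic range yet be flagged by `has_low_diastolic`, which is the kind of individual the abstract's 4.7% figure refers to.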
Affiliation(s)
- William W Stead
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Adam Lewis
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Nunzia B Giuse
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Center for Knowledge Management, Vanderbilt University Medical Center, Nashville, TN, USA
- Annette M Williams
  - Center for Knowledge Management, Vanderbilt University Medical Center, Nashville, TN, USA
- Italo Biaggioni
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
14
Chen R, Duffy Á, Petrazzini BO, Vy HM, Stein D, Mort M, Park JK, Schlessinger A, Itan Y, Cooper DN, Jordan DM, Rocheleau G, Do R. Expanding drug targets for 112 chronic diseases using a machine learning-assisted genetic priority score. Nat Commun 2024; 15:8891. [PMID: 39406732 PMCID: PMC11480483 DOI: 10.1038/s41467-024-53333-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 10/09/2024] [Indexed: 10/19/2024] [Open Access]
Abstract
Identifying genetic drivers of chronic diseases is necessary for drug discovery. Here, we develop a machine learning-assisted genetic priority score, which we call ML-GPS, that incorporates genetic associations with predicted disease phenotypes to enhance target discovery. First, we construct gradient boosting models to predict 112 chronic disease phecodes in the UK Biobank and analyze associations of predicted and observed phenotypes with common, rare, and ultra-rare variants to model the allelic series. We integrate these associations with existing evidence using gradient boosting with continuous feature encoding to construct ML-GPS, training it to predict drug indications in Open Targets and externally testing it in SIDER. We then generate ML-GPS predictions for 2,362,636 gene-phecode pairs. We find that the use of predicted phenotypes, which identify substantially more genetic associations than observed phenotypes across the allele frequency spectrum, significantly improves the performance of ML-GPS. ML-GPS increases coverage of drug targets, with the top 1% of all scores providing support for 15,077 gene-phecode pairs that previously had no support. ML-GPS can also identify well-known target-disease relationships, promising targets without indicated drugs, and targets for several drugs in clinical trials, including LRRK2 inhibitors for Parkinson's disease and olpasiran for cardiovascular disease.
Affiliation(s)
- Robert Chen
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Áine Duffy
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ben O Petrazzini
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Genomic Data Analytics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ha My Vy
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Genomic Data Analytics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David Stein
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Matthew Mort
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- Joshua K Park
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Avner Schlessinger
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yuval Itan
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
- Daniel M Jordan
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Genomic Data Analytics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ghislain Rocheleau
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Genomic Data Analytics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Center for Genomic Data Analytics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
15
Rivière JG, Soler Palacín P, Butte MJ. Proceedings from the inaugural Artificial Intelligence in Primary Immune Deficiencies (AIPID) conference. J Allergy Clin Immunol 2024; 153:637-642. [PMID: 38224784 PMCID: PMC11402388 DOI: 10.1016/j.jaci.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 01/09/2024] [Accepted: 01/11/2024] [Indexed: 01/17/2024]
Abstract
Here, we summarize the proceedings of the inaugural Artificial Intelligence in Primary Immune Deficiencies conference, during which experts and advocates gathered to advance research into the applications of artificial intelligence (AI), machine learning, and other computational tools in the diagnosis and management of inborn errors of immunity (IEIs). The conference focused on the key themes of expediting IEI diagnoses, challenges in data collection, roles of natural language processing and large language models in interpreting electronic health records, and ethical considerations in implementation. Innovative AI-based tools trained on electronic health records and claims databases have discovered new patterns of warning signs for IEIs, facilitating faster diagnoses and enhancing patient outcomes. Challenges in training AIs persist on account of data limitations, especially in cases of rare diseases, overlapping phenotypes, and biases inherent in current data sets. Furthermore, experts highlighted the significance of ethical considerations, data protection, and the necessity for open science principles. The conference delved into regulatory frameworks, equity in access, and the imperative for collaborative efforts to overcome these obstacles and harness the transformative potential of AI. Concerted efforts to successfully integrate AI into daily clinical immunology practice are still needed.
Affiliation(s)
- Jacques G Rivière
  - Infection and Immunity in Pediatric Patients Research Group, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
  - Pediatric Infectious Diseases and Immunodeficiencies Unit, Hospital Infantil i de la Dona, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
  - Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
  - Universitat Autònoma de Barcelona, Barcelona, Spain
- Pere Soler Palacín
  - Infection and Immunity in Pediatric Patients Research Group, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
  - Pediatric Infectious Diseases and Immunodeficiencies Unit, Hospital Infantil i de la Dona, Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
  - Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
  - Universitat Autònoma de Barcelona, Barcelona, Spain
- Manish J Butte
  - Division of Immunology, Allergy, and Rheumatology, Department of Pediatrics, University of California Los Angeles, Los Angeles, Calif
  - Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, Los Angeles, Calif
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, Calif
16
Tran TC, Schlueter DJ, Zeng C, Mo H, Carroll RJ, Denny JC. PheWAS analysis on large-scale biobank data with PheTK. medRxiv 2024:2024.02.12.24302720. [PMID: 38410487 PMCID: PMC10896413 DOI: 10.1101/2024.02.12.24302720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Summary With the rapid growth of genetic data linked to electronic health record data in huge cohorts, large-scale phenome-wide association studies (PheWAS) have become powerful discovery tools in biomedical research. PheWAS is an analysis method to study phenotype associations utilizing longitudinal electronic health record (EHR) data. Previous PheWAS packages were developed mostly in the days of smaller biobanks and with earlier PheWAS approaches. PheTK was designed to simplify analysis and efficiently handle biobank-scale data. PheTK uses multithreading and supports a full PheWAS workflow including extraction of data from OMOP databases and Hail matrix tables as well as PheWAS analysis for both phecode version 1.2 and phecodeX. Benchmarking results showed PheTK took 64% less time than the R PheWAS package to complete the same workflow. PheTK can be run locally or on cloud platforms such as the All of Us Researcher Workbench (All of Us) or the UK Biobank (UKB) Research Analysis Platform (RAP). Availability and implementation The PheTK package is freely available on the Python Package Index (PyPI) and on GitHub under GNU Public License (GPL-3) at https://github.com/nhgritctran/PheTK. It is implemented in Python and platform independent. The demonstration workspace for All of Us will be made available in the future as a featured workspace. Contact PheTK@mail.nih.gov.