1
Gu Z, Jia W, Piccardi M, Yu P. Empowering large language models for automated clinical assessment with generation-augmented retrieval and hierarchical chain-of-thought. Artif Intell Med 2025; 162:103078. [PMID: 39978047 DOI: 10.1016/j.artmed.2025.103078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 01/18/2025] [Accepted: 02/03/2025] [Indexed: 02/22/2025]
Abstract
BACKGROUND Understanding and extracting valuable information from electronic health records (EHRs) is important for improving healthcare delivery and health outcomes. Large language models (LLMs) have demonstrated significant proficiency in natural language understanding and processing, offering promise for automating the typically labor-intensive and time-consuming analytical tasks performed on EHRs. Despite the active application of LLMs in healthcare settings, many foundation models lack real-world healthcare relevance, and the application of LLMs to EHRs is still at an early stage. To advance this field, we pioneer a generation-augmented prompting paradigm, "GAPrompt", to empower generic LLMs for automated clinical assessment, in particular quantitative stroke severity assessment, using data extracted from EHRs.
METHODS The GAPrompt paradigm comprises five components: (i) prompt-driven selection of LLMs, (ii) generation-augmented construction of a knowledge base, (iii) summary-based generation-augmented retrieval (SGAR), (iv) inference with a hierarchical chain-of-thought (HCoT), and (v) ensembling of multiple generations.
RESULTS GAPrompt addresses the limitations of generic LLMs in clinical applications in a progressive manner. It efficiently evaluates the applicability of LLMs to specific tasks through LLM selection prompting, enhances their understanding of task-specific knowledge through the constructed knowledge base, improves the accuracy of knowledge and demonstration retrieval via SGAR, elevates LLM inference precision through HCoT, and enhances generation robustness and reduces LLM hallucinations via ensembling. Experimental results demonstrate the capability of our method to empower LLMs to automatically assess EHRs and generate quantitative clinical assessment results.
CONCLUSION Our study highlights that the capabilities of foundation LLMs can be enhanced for medical domain-specific tasks, i.e., automated quantitative analysis of EHRs, addressing the challenge that quantitative assessment of stroke in clinical practice and research is labor-intensive and often conducted manually. The GAPrompt paradigm offers a practical and accessible approach for researchers and industry practitioners seeking to leverage the power of LLMs in domain-specific applications. Its utility extends beyond the medical domain to a wide range of fields.
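Component (v), ensembling of multiple generations, can be illustrated with a minimal sketch. The abstract does not state how generations are combined, so the rule below (majority vote over integer scores, with the lower median as a tie-break) and the function name `ensemble_scores` are assumptions made for illustration only, not the authors' method.

```python
from collections import Counter
from statistics import median_low


def ensemble_scores(scores):
    """Combine integer scores parsed from several LLM generations.

    Assumed rule: return the most frequent score; if several scores
    tie for most frequent, fall back to the lower median of all scores.
    """
    if not scores:
        raise ValueError("need at least one generation")
    counts = Counter(scores)
    top = max(counts.values())
    winners = [s for s, c in counts.items() if c == top]
    if len(winners) == 1:
        return winners[0]
    # Tie between modes: use the lower median so the result is an observed score.
    return median_low(scores)


if __name__ == "__main__":
    # Three hypothetical generations for the same record.
    print(ensemble_scores([4, 4, 5]))
```

Voting over several generations is one common way to damp the effect of a single outlying or hallucinated output; other rules (mean, weighted vote) would fit the same interface.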
Affiliation(s)
- Zhanzhong Gu
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Wenjing Jia
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Massimo Piccardi
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Ping Yu
  - School of Computing and Information Technology, University of Wollongong, NSW, 2522, Australia
2
Wang Y, Hilsman J, Li C, Morris M, Heider PM, Fu S, Kwak MJ, Wen A, Applegate JR, Wang L, Bernstam E, Liu H, Chang J, Harris DR, Corbeau A, Henderson D, Osborne JD, Kennedy RE, Garduno-Rapp NE, Rousseau JF, Yan C, Chen Y, Patel MB, Murphy TJ, Malin BA, Park CM, Fan JW, Sohn S, Pagali S, Peng Y, Pathak A, Wu Y, Xia Z, Loguercio S, Reis SE, Visweswaran S. Development and Validation of Natural Language Processing Algorithms in the ENACT National Electronic Health Record Research Network. medRxiv 2025:2025.01.24.25321096. [PMID: 39974073 PMCID: PMC11839006 DOI: 10.1101/2025.01.24.25321096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Electronic health record (EHR) data are a rich and invaluable source of real-world clinical information, enabling detailed insights into patient populations, treatment outcomes, and healthcare practices. The availability of large volumes of EHR data is critical for advancing translational research and developing innovative technologies such as artificial intelligence. The Evolve to Next-Gen Accrual to Clinical Trials (ENACT) network, established in 2015 with funding from the National Center for Advancing Translational Sciences (NCATS), aims to accelerate translational research by democratizing access to EHR data for all Clinical and Translational Science Awards (CTSA) hub investigators. The present ENACT network provides access to structured EHR data, enabling cohort discovery and translational research across the network. However, a substantial amount of critical information is contained in clinical narratives, and natural language processing (NLP) is required to extract this information to support research. To address this need, the ENACT NLP Working Group was formed to make NLP-derived clinical information accessible and queryable across the network. This article describes the implementation and deployment of NLP infrastructure across ENACT. First, we describe the formation and goals of the Working Group, the practices and logistics involved in implementation and deployment, and the specific NLP tools and technologies used. Then, we describe how we extended the ENACT ontology to standardize and query NLP-derived data, as well as how we conducted multisite evaluations of the NLP algorithms. Finally, we reflect on the experience and lessons learnt, which may be useful for other national data networks that are deploying NLP to unlock the potential of clinical text for research.
Affiliation(s)
- Yanshan Wang
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Jordan Hilsman
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Chenyu Li
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Michele Morris
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Paul M Heider
  - Biomedical Informatics Center and Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Sunyang Fu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Min Ji Kwak
  - McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Andrew Wen
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Joseph R Applegate
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Liwei Wang
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Elmer Bernstam
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
  - Division of General Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Hongfang Liu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Jack Chang
  - Clinical and Translational Science Institute, University of Rochester Medical Center, Rochester, NY, USA
- Daniel R Harris
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA
- Alexandria Corbeau
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA
- Darren Henderson
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA
- John D Osborne
  - Department of Biomedical Informatics and Data Science, University of Alabama at Birmingham, Birmingham, AL, USA
- Richard E Kennedy
  - Division of Gerontology, Geriatrics, and Palliative Care, Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Justin F Rousseau
  - Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Neurology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Chao Yan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- You Chen
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Mayur B Patel
  - Department of Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Tyler J Murphy
  - Department of Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Bradley A Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Chan Mi Park
  - Department of Gerontology, Hebrew SeniorLife, Marcus Institute for Aging Research, Boston, MA, USA
- Jungwei W Fan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
  - Center for Clinical and Translational Science, Mayo Clinic, Rochester, MN, USA
- Sunghwan Sohn
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
  - Clinical & Translational Science Center, Weill Cornell Medicine, New York, NY, USA
- Aman Pathak
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Zongqi Xia
  - Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
- Salvatore Loguercio
  - Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA
- Steven E Reis
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Shyam Visweswaran
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
3
Gu Z, He X, Yu P, Jia W, Yang X, Peng G, Hu P, Chen S, Chen H, Lin Y. Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model. Artif Intell Med 2024; 150:102822. [PMID: 38553162 DOI: 10.1016/j.artmed.2024.102822] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/22/2023] [Revised: 01/28/2024] [Accepted: 02/21/2024] [Indexed: 04/02/2024]
Abstract
BACKGROUND Stroke is a prevalent disease with a significant global impact. Effective assessment of stroke severity is vital for accurate diagnosis, appropriate treatment, and optimal clinical outcomes. The National Institutes of Health Stroke Scale (NIHSS) is a widely used scale for quantitatively assessing stroke severity. However, the current manual scoring of the NIHSS is labor-intensive, time-consuming, and sometimes unreliable. Applying artificial intelligence (AI) techniques to automate the quantitative assessment of stroke on vast amounts of electronic health records (EHRs) has attracted much interest.
OBJECTIVE This study aims to develop an automatic, quantitative stroke severity assessment framework by automating the entire NIHSS scoring process on Chinese clinical EHRs.
METHODS Our approach consists of two major parts: Chinese clinical named entity recognition (CNER) with a domain-adaptive pre-trained large language model (LLM), and automated NIHSS scoring. To build a high-performing CNER model, we first construct a stroke-specific, densely annotated dataset, "Chinese Stroke Clinical Records" (CSCR), from EHRs provided by our partner hospital, based on a stroke ontology that defines semantically related entities for stroke assessment. We then pre-train a Chinese clinical LLM, coined "CliRoberta", through domain-adaptive transfer learning and construct a deep learning-based CNER model that can accurately extract entities directly from Chinese EHRs. Finally, an automated, end-to-end NIHSS scoring pipeline is proposed that maps the extracted entities to relevant NIHSS items and values to quantitatively assess stroke severity.
RESULTS Results obtained on the benchmark dataset CCKS2019 and our newly created CSCR dataset demonstrate the superior performance of our domain-adaptive pre-trained LLM and the CNER model compared with existing benchmark LLMs and CNER models. The high F1 score of 0.990 ensures the reliability of our model in accurately extracting the entities for the subsequent automatic NIHSS scoring. Our automated, end-to-end NIHSS scoring approach achieved excellent inter-rater agreement (0.823) and intraclass consistency (0.986) with the ground truth and reduced the processing time from minutes to a few seconds.
CONCLUSION Our proposed automatic and quantitative framework for assessing stroke severity demonstrates exceptional performance and reliability by directly scoring the NIHSS from diagnostic notes in Chinese clinical EHRs. This study also contributes a new clinical dataset, a pre-trained clinical LLM, and an effective deep learning-based CNER model. The deployment of these algorithms can improve the accuracy and efficiency of clinical assessment and help improve the quality, affordability, and productivity of healthcare services.
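The final step, mapping extracted entities to NIHSS items and values, can be sketched as a rule lookup followed by a sum. The rule table, item keys, and entity strings below are hypothetical placeholders (the abstract does not give the authors' mapping), and keeping the highest value per item is an assumption of this sketch.

```python
# Hypothetical rule table: normalized entity text -> (NIHSS item, points).
# A real mapping would cover every NIHSS item and come from a clinical ontology.
RULES = {
    "mild facial palsy": ("4_facial_palsy", 1),
    "complete facial palsy": ("4_facial_palsy", 3),
    "left arm drift": ("5a_motor_left_arm", 1),
    "mild dysarthria": ("10_dysarthria", 1),
}


def score_nihss(entities):
    """Map extracted entity strings to item scores and return (total, per-item)."""
    item_scores = {}
    for entity in entities:
        rule = RULES.get(entity.lower())
        if rule is None:
            continue  # entity is not relevant to any NIHSS item in this table
        item, points = rule
        # Assumption: if several entities hit the same item, keep the highest value.
        item_scores[item] = max(item_scores.get(item, 0), points)
    return sum(item_scores.values()), item_scores


if __name__ == "__main__":
    total, items = score_nihss(["Mild facial palsy", "left arm drift", "headache"])
    print(total, items)
```

Separating entity extraction from rule-based scoring keeps the scoring step auditable: each point in the total traces back to a named entity and a named item.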
Affiliation(s)
- Zhanzhong Gu
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Xiangjian He
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia; School of Computer Science, University of Nottingham Ningbo China, Ningbo, China
- Ping Yu
  - School of Computing and Information Technology, University of Wollongong, NSW, 2522, Australia
- Wenjing Jia
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Xiguang Yang
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Gang Peng
  - Intergenepharm Pty Ltd, Sydney, NSW, 2000, Australia
- Penghui Hu
  - Department of Oncology, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Shiyan Chen
  - Department of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Hongjie Chen
  - Department of Traditional Chinese Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yiguang Lin
  - Department of Traditional Chinese Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China; Department of Immuno-Oncology, The First Affiliated Hospital of Guangdong Pharmaceutical University, China; School of Life Sciences, University of Technology Sydney, NSW, 2007, Australia
4
Liu Y, Bi D. Quantitative risk analysis of treatment plans for patients with tumor by mining historical similar patients from electronic health records using federated learning. Risk Anal 2023; 43:2422-2449. [PMID: 36906293 DOI: 10.1111/risa.14124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Revised: 12/11/2022] [Accepted: 02/06/2023] [Indexed: 06/18/2023]
Abstract
Determining a treatment plan for a target patient with a tumor is difficult because of heterogeneity in patients' responses, incomplete information about tumor states, and asymmetric knowledge between doctors and patients, among other factors. In this paper, a method for quantitative risk analysis of treatment plans for patients with tumors is proposed. To reduce the impact of heterogeneity in patients' responses on the analysis results, the method conducts risk analysis by mining similar historical patients from Electronic Health Records (EHRs) in multiple hospitals using federated learning (FL). For this, Recursive Feature Elimination based on the Support Vector Machine (SVM) and Deep Learning Important FeaTures (DeepLIFT) are extended into the FL framework to select key features and determine key feature weights for identifying similar historical patients. Then, in the database of each collaborating hospital, the similarities between the target patient and all historical patients are calculated, and the similar historical patients are determined. From the statistics of tumor states and treatment outcomes of similar historical patients in all collaborating hospitals, the data needed for risk analysis of the alternative treatment plans (including the probabilities of different tumor states and the possible outcomes of different treatment plans) can be obtained, which can eliminate the asymmetric knowledge between doctors and patients. These data are valuable for the doctor and patient in making their decisions. Experimental studies have been conducted to verify the feasibility and effectiveness of the proposed method.
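The per-hospital similarity step can be illustrated with a small sketch. The abstract does not specify the similarity measure, so the weighted Euclidean distance, the `1 / (1 + distance)` transform, and all names below are assumptions chosen for illustration; the weights stand in for the key feature weights the paper derives within the FL framework.

```python
import math


def weighted_similarity(a, b, weights):
    """Similarity in (0, 1] from a weighted Euclidean distance (assumed measure)."""
    dist = math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
    return 1.0 / (1.0 + dist)


def top_k_similar(target, patients, weights, k=2):
    """Return the ids of the k historical patients most similar to the target."""
    scored = [
        (weighted_similarity(target, features, weights), pid)
        for pid, features in patients.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]


if __name__ == "__main__":
    # Hypothetical feature vectors held locally by one hospital.
    local_patients = {"p1": [1.0, 0.0], "p2": [0.0, 0.0], "p3": [5.0, 5.0]}
    print(top_k_similar([0.0, 0.0], local_patients, weights=[1.0, 1.0]))
```

In a federated setting, each hospital would run a search like this on its own database and share only aggregate statistics about the matched patients, not the records themselves.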
Affiliation(s)
- Yang Liu
  - School of Economics and Management, Dalian University of Technology, Dalian, China
- Donghai Bi
  - School of Economics and Management, Dalian University of Technology, Dalian, China
5
Li J, Chaudhary D, Sharma V, Sharma V, Avula V, Ssentongo P, Wolk DM, Zand R, Abedi V. An integrated pipeline for prediction of Clostridioides difficile infection. Sci Rep 2023; 13:16532. [PMID: 37783691 PMCID: PMC10545794 DOI: 10.1038/s41598-023-41753-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 08/31/2023] [Indexed: 10/04/2023] Open
Abstract
With the expansion of electronic health record (EHR)-linked genomic data comes the development of machine learning-enabled models. There is a pressing need to develop robust pipelines to evaluate the performance of integrated models and minimize systemic bias. We developed a prediction model of symptomatic Clostridioides difficile infection (CDI) by integrating common EHR-based and genetic risk factors (rs2227306/IL8). Our pipeline includes (1) leveraging a phenotyping algorithm to minimize temporal bias, (2) performing simulation studies to determine the predictive power in samples without genetic information, (3) propensity score matching to control for confounding, (4) selecting machine learning algorithms to capture complex feature interactions, (5) performing oversampling to address data imbalance, and (6) optimizing models and ensuring a proper bias-variance trade-off. We evaluate the performance of prediction models of CDI when including common clinical risk factors and the benefit of incorporating genetic feature(s) into the models. We emphasize the importance of building a robust integrated pipeline to avoid systemic bias and of thoroughly evaluating genetic features when they are integrated into prediction models in the general population and subgroups.
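Step (5), oversampling to address data imbalance, can be illustrated as follows. The abstract does not name the oversampling technique, so this sketch assumes plain random oversampling with replacement; the function name and the toy data are hypothetical.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are the same size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.9]]
    y = [0, 0, 0, 1]  # class 1 (e.g. CDI cases) is the minority
    X_bal, y_bal = random_oversample(X, y)
    print(y_bal.count(0), y_bal.count(1))
```

Oversampling should be applied to the training split only; resampling before the train/test split leaks duplicated rows into the evaluation set and inflates performance estimates.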
Affiliation(s)
- Jiang Li
  - Department of Molecular and Functional Genomics, Geisinger Health System, Danville, PA, USA
- Durgesh Chaudhary
  - Neuroscience Institute, Geisinger Health System, Danville, PA, USA
  - Department of Neurology, College of Medicine, The Pennsylvania State University, Hershey, PA, 17033, USA
- Vaibhav Sharma
  - Geisinger Commonwealth School of Medicine, Danville, PA, USA
- Vishakha Sharma
  - College of Osteopathic Medicine, Kansas City University, Kansas City, MO, USA
- Venkatesh Avula
  - Department of Molecular and Functional Genomics, Geisinger Health System, Danville, PA, USA
- Paddy Ssentongo
  - Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
- Donna M Wolk
  - Molecular and Microbial Diagnostics and Development, Geisinger Medical Center, Danville, PA, USA
- Ramin Zand
  - Neuroscience Institute, Geisinger Health System, Danville, PA, USA
  - Department of Neurology, College of Medicine, The Pennsylvania State University, Hershey, PA, 17033, USA
- Vida Abedi
  - Department of Molecular and Functional Genomics, Geisinger Health System, Danville, PA, USA
  - Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
6
He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023; 140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1% (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4% (N = 77) of all studies; however, only 25.9% (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
Affiliation(s)
- Ting He
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Anas Belouali
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jessica Patricoski
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Harold Lehmann
  - Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert Ball
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, USA
- Valsamo Anagnostou
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Taxiarchis Botsis
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
7
Forrest IS, Petrazzini BO, Duffy Á, Park JK, Marquez-Luna C, Jordan DM, Rocheleau G, Cho JH, Rosenson RS, Narula J, Nadkarni GN, Do R. Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. Lancet 2023; 401:215-225. [PMID: 36563696 PMCID: PMC10069625 DOI: 10.1016/s0140-6736(22)02079-7] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 07/11/2022] [Revised: 10/05/2022] [Accepted: 10/18/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Binary diagnosis of coronary artery disease does not preserve the complexity of disease or quantify its severity or its associated risk of death; hence, a quantitative marker of coronary artery disease is warranted. We evaluated a quantitative marker of coronary artery disease derived from probabilities of a machine learning model.
METHODS In this cohort study, we developed and validated a coronary artery disease-predictive machine learning model using 95 935 electronic health records and assessed its probabilities as in-silico scores for coronary artery disease (ISCAD; range 0 [lowest probability] to 1 [highest probability]) in participants in two longitudinal biobank cohorts. We measured the association of ISCAD with clinical outcomes, namely coronary artery stenosis, obstructive coronary artery disease, multivessel coronary artery disease, all-cause death, and coronary artery disease sequelae.
FINDINGS Among 95 935 participants, 35 749 were from the BioMe Biobank (median age 61 years [IQR 18]; 14 599 [41%] were male and 21 150 [59%] were female; 5130 [14%] had diagnosed coronary artery disease) and 60 186 were from the UK Biobank (median age 62 [15] years; 25 031 [42%] male and 35 155 [58%] female; 8128 [14%] with diagnosed coronary artery disease). The model predicted coronary artery disease with an area under the receiver operating characteristic curve of 0·95 (95% CI 0·94-0·95; sensitivity of 0·94 [0·94-0·95] and specificity of 0·82 [0·81-0·83]) and 0·93 (0·92-0·93; sensitivity of 0·90 [0·89-0·90] and specificity of 0·88 [0·87-0·88]) in the BioMe validation and holdout sets, respectively, and 0·91 (0·91-0·91; sensitivity of 0·84 [0·83-0·84] and specificity of 0·83 [0·82-0·83]) in the UK Biobank external test set. ISCAD captured coronary artery disease risk from known risk factors, pooled cohort equations, and polygenic risk scores. Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles (increase per quartile of 12 percentage points), including risk of obstructive coronary artery disease, multivessel coronary artery disease, and stenosis of major coronary arteries. Hazard ratios (HRs) and prevalence of all-cause death increased stepwise over ISCAD deciles (decile 1: HR 1·0 [95% CI 1·0-1·0], 0·2% prevalence; decile 6: 11 [3·9-31], 3·1% prevalence; and decile 10: 56 [20-158], 11% prevalence). A similar trend was observed for recurrent myocardial infarction. 12 (46%) undiagnosed individuals with high ISCAD (≥0·9) had clinical evidence of coronary artery disease according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines.
INTERPRETATION Electronic health record-based machine learning was used to generate an in-silico marker for coronary artery disease that can non-invasively quantify atherosclerosis and risk of death on a continuous spectrum, and identify underdiagnosed individuals.
FUNDING National Institutes of Health.
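The abstract reports outcomes across ISCAD quartiles and flags undiagnosed individuals with ISCAD ≥ 0·9. A minimal sketch of how model probabilities could be binned and thresholded is given below; the rank-based binning rule, the function names, and the example scores are assumptions for illustration and are not taken from the paper.

```python
def quartile_bins(scores):
    """Assign each score a quartile (1 = lowest, 4 = highest) by rank (assumed rule)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = 1 + (4 * rank) // n
    return bins


def flag_high_score(scores, diagnosed, threshold=0.9):
    """Indices of individuals with a score at or above the threshold but no diagnosis."""
    return [
        i for i, (s, d) in enumerate(zip(scores, diagnosed))
        if s >= threshold and not d
    ]


if __name__ == "__main__":
    # Hypothetical model probabilities and diagnosis flags for eight people.
    iscad = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.95, 0.6]
    dx = [True, False, False, False, False, False, False, False]
    print(quartile_bins(iscad))
    print(flag_high_score(iscad, dx))
```

Treating the probability as a continuous score, rather than thresholding it once into a binary label, is what allows the stepwise quartile and decile comparisons described in the findings.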
Affiliation(s)
- Iain S Forrest
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ben O Petrazzini
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Áine Duffy
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Joshua K Park
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Carla Marquez-Luna
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel M Jordan
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ghislain Rocheleau
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Judy H Cho
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Robert S Rosenson
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Metabolism and Lipids Unit, Zena and Michael A Wiener Cardiovascular Institute, Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York
- Jagat Narula
  - Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York
- Girish N Nadkarni
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
8
Auwerx C, Sadler MC, Reymond A, Kutalik Z. From pharmacogenetics to pharmaco-omics: Milestones and future directions. HGG Adv 2022; 3:100100. [PMID: 35373152 PMCID: PMC8971318 DOI: 10.1016/j.xhgg.2022.100100] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 11/04/2022] Open
Abstract
The origins of pharmacogenetics date back to the 1950s, when it was established that inter-individual differences in drug response are partially determined by genetic factors. Since then, pharmacogenetics has grown into its own field, motivated by the translation of identified gene-drug interactions into therapeutic applications. Despite numerous challenges ahead, our understanding of the human pharmacogenetic landscape has greatly improved thanks to the integration of tools originating from disciplines as diverse as biochemistry, molecular biology, statistics, and computer sciences. In this review, we discuss past, present, and future developments of pharmacogenetics methodology, focusing on three milestones: how early research established the genetic basis of drug responses, how technological progress made it possible to assess the full extent of pharmacological variants, and how multi-dimensional omics datasets can improve the identification, functional validation, and mechanistic understanding of the interplay between genes and drugs. We outline novel strategies to repurpose and integrate molecular and clinical data originating from biobanks to gain insights analogous to those obtained from randomized controlled trials. Emphasizing the importance of increased diversity, we envision future directions for the field that should pave the way to the clinical implementation of pharmacogenetics.
Affiliation(s)
- Chiara Auwerx
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - University Center for Primary Care and Public Health, Lausanne, Switzerland
- Marie C. Sadler
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - University Center for Primary Care and Public Health, Lausanne, Switzerland
- Alexandre Reymond
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Zoltán Kutalik
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - University Center for Primary Care and Public Health, Lausanne, Switzerland