1
Hama T, Alsaleh MM, Allery F, Choi JW, Tomlinson C, Wu H, Lai A, Pontikos N, Thygesen JH. Enhancing Patient Outcome Prediction Through Deep Learning With Sequential Diagnosis Codes From Structured Electronic Health Record Data: Systematic Review. J Med Internet Res 2025; 27:e57358. [PMID: 40100249 PMCID: PMC11962322 DOI: 10.2196/57358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 12/14/2024] [Accepted: 02/18/2025] [Indexed: 03/20/2025] Open
Abstract
BACKGROUND The use of structured electronic health records in health care systems has grown rapidly. These systems collect huge amounts of patient information, including diagnosis codes representing temporal medical history. Sequential diagnostic information has proven valuable for predicting patient outcomes. However, the extent to which these types of data have been incorporated into deep learning (DL) models has not been examined. OBJECTIVE This systematic review aims to describe the use of sequential diagnostic data in DL models, specifically to understand how these data are integrated, whether sample size improves performance, and whether the identified models are generalizable. METHODS Relevant studies published up to May 15, 2023, were identified using 4 databases: PubMed, Embase, IEEE Xplore, and Web of Science. We included all studies using DL algorithms trained on sequential diagnosis codes to predict patient outcomes. We excluded review articles and non-peer-reviewed papers. We evaluated the following aspects in the included papers: DL techniques, characteristics of the dataset, prediction tasks, performance evaluation, generalizability, and explainability. We also assessed the risk of bias and applicability of the studies using the Prediction Model Study Risk of Bias Assessment Tool (PROBAST). We used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist to report our findings. RESULTS Of the 740 identified papers, 84 (11.4%) met the eligibility criteria. Publications in this area increased yearly. Recurrent neural networks (and their derivatives; 47/84, 56%) and transformers (22/84, 26%) were the most commonly used architectures in DL-based models. Most studies (45/84, 54%) presented their input features as sequences of visit embeddings. Medications (38/84, 45%) were the most common additional feature. Of the 128 predictive outcome tasks, the most frequent was next-visit diagnosis (n=30, 23%), followed by heart failure (n=18, 14%) and mortality (n=17, 13%). Only 7 (8%) of the 84 studies evaluated their models in terms of generalizability. A positive correlation was observed between training sample size and model performance (area under the receiver operating characteristic curve; P=.02). However, 59 (70%) of the 84 studies had a high risk of bias. CONCLUSIONS The application of DL for advanced modeling of sequential medical codes has demonstrated remarkable promise in predicting patient outcomes. The main limitation of this study was the heterogeneity of methods and outcomes. However, our analysis found that using multiple types of features, integrating time intervals, and including larger sample sizes were generally related to an improved predictive performance. This review also highlights that very few studies (7/84, 8%) reported on challenges related to generalizability and less than half (38/84, 45%) of the studies reported on challenges related to explainability. Addressing these shortcomings will be instrumental in unlocking the full potential of DL for enhancing health care outcomes and patient care. TRIAL REGISTRATION PROSPERO CRD42018112161; https://tinyurl.com/yc6h9rwu.
Affiliation(s)
- Tuankasfee Hama
  - Institute of Health Informatics, University College London, London, United Kingdom
- Mohanad M Alsaleh
  - Institute of Health Informatics, University College London, London, United Kingdom
  - Department of Health Informatics, College of Applied Medical Sciences, Qassim University, Buraydah, Saudi Arabia
- Freya Allery
  - Institute of Health Informatics, University College London, London, United Kingdom
- Jung Won Choi
  - Institute of Health Informatics, University College London, London, United Kingdom
- Honghan Wu
  - Institute of Health Informatics, University College London, London, United Kingdom
- Alvina Lai
  - Institute of Health Informatics, University College London, London, United Kingdom
- Nikolas Pontikos
  - UCL Institute of Ophthalmology, University College London, London, United Kingdom
- Johan H Thygesen
  - Institute of Health Informatics, University College London, London, United Kingdom
2
Qiu J, Hu Y, Li L, Erzurumluoglu AM, Braenne I, Whitehurst C, Schmitz J, Arora J, Bartholdy BA, Gandhi S, Khoueiry P, Mueller S, Noyvert B, Ding Z, Jensen JN, de Jong J. Deep representation learning for clustering longitudinal survival data from electronic health records. Nat Commun 2025; 16:2534. [PMID: 40087274 PMCID: PMC11909183 DOI: 10.1038/s41467-025-56625-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 01/21/2025] [Indexed: 03/17/2025] Open
Abstract
Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data as extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn's disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn's disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.
Affiliation(s)
- Jiajun Qiu
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Yao Hu
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Li Li
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Abdullah Mesut Erzurumluoglu
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Ingrid Braenne
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Charles Whitehurst
  - Immunology & Respiratory Diseases, Boehringer-Ingelheim, Ridgefield, CT, USA
- Jochen Schmitz
  - Immunology & Respiratory Diseases, Boehringer-Ingelheim, Ridgefield, CT, USA
- Jatin Arora
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Boris Alexander Bartholdy
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Shrey Gandhi
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Pierre Khoueiry
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Stefanie Mueller
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Boris Noyvert
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Zhihao Ding
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Jan Nygaard Jensen
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
- Johann de Jong
  - Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany
3
Velez-Arce A, Li MM, Gao W, Lin X, Huang K, Fu T, Pentelute BL, Kellis M, Zitnik M. Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics. bioRxiv 2024:2024.06.12.598655. [PMID: 38948789 PMCID: PMC11212894 DOI: 10.1101/2024.06.12.598655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Drug discovery AI datasets and benchmarks have not traditionally included single-cell analysis biomarkers. While benchmarking efforts in single-cell analysis have recently released collections of single-cell tasks, they have yet to comprehensively release datasets, models, and benchmarks that integrate a broad range of therapeutic discovery tasks with cell-type-specific biomarkers. Therapeutics Commons (TDC-2) presents datasets, tools, models, and benchmarks integrating cell-type-specific contextual features with ML tasks across therapeutics. We present four tasks for contextual learning at single-cell resolution: drug-target nomination, genetic perturbation response prediction, chemical perturbation response prediction, and protein-peptide interaction prediction. We introduce datasets, models, and benchmarks for these four tasks. Finally, we detail the advancements and challenges in machine learning and biology that drove the implementation of TDC-2 and how they are reflected in its architecture, datasets and benchmarks, and foundation model tooling.
Affiliation(s)
- Michelle M. Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
- Wenhao Gao
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Xiang Lin
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
- Kexin Huang
  - Department of Computer Science, Stanford School of Engineering, Stanford, CA 94305
- Tianfan Fu
  - Department of Computational Science, Rensselaer Polytechnic Institute, Troy, NY 12180
- Bradley L. Pentelute
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139
- Manolis Kellis
  - Broad Institute of MIT and Harvard, Computer Science and Artificial Intelligence Laboratory, MIT, Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, MA 02139
- Marinka Zitnik
  - Broad Institute of MIT and Harvard, Harvard Data Science Initiative, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215
4
Chae S, Street WN, Ramaraju N, Gilbertson-White S. Prediction of Cancer Symptom Trajectory Using Longitudinal Electronic Health Record Data and Long Short-Term Memory Neural Network. JCO Clin Cancer Inform 2024; 8:e2300039. [PMID: 38471054 PMCID: PMC10948138 DOI: 10.1200/cci.23.00039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 12/08/2023] [Accepted: 01/26/2024] [Indexed: 03/14/2024] Open
Abstract
PURPOSE The ability to predict symptom severity and progression across treatment trajectories would allow clinicians to provide timely intervention and treatment planning. However, such predictions are difficult because of sparse and inconsistent assessment, and simplistic measures such as the last observed symptom severity are often used. The purpose of this study is to develop a model for predicting future cancer symptom experiences on the basis of past symptom experiences. PATIENTS AND METHODS We performed a retrospective, longitudinal analysis using records of patients with cancer (n = 208) hospitalized between 2008 and 2014. A long short-term memory (LSTM)-based recurrent neural network, a linear regression, and random forest models were trained on previous symptoms experienced and used to predict future symptom trajectories. RESULTS We found that at least one of the three tested models (LSTM, linear regression, and random forest) outperforms predictions based solely on the previous clinical observation. LSTM models significantly outperformed linear regression and random forest models in predicting nausea (P < .1) and psychosocial status (P < .01). Linear regression outperformed all models when predicting oral health (P < .01), while random forest outperformed all models when predicting mobility (P < .01) and nutrition (P < .01). CONCLUSION We can successfully predict patients' symptom trajectories with a prediction model, built with sparse assessment data, using routinely collected nursing documentation. The results of this project can be applied to better individualize symptom management to support cancer patients' quality of life.
Affiliation(s)
- Sena Chae
  - The University of Iowa College of Nursing, Iowa City, IA
- W. Nick Street
  - The University of Iowa Tippie College of Business, Iowa City, IA
- Naveenkumar Ramaraju
  - University of Illinois Urbana-Champaign, Gies College of Business, Champaign, IL
5
Yang Z, Mitra A, Liu W, Berlowitz D, Yu H. TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records. Nat Commun 2023; 14:7857. [PMID: 38030638 PMCID: PMC10687211 DOI: 10.1038/s41467-023-43715-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 11/17/2023] [Indexed: 12/01/2023] Open
Abstract
Deep learning transformer-based models using longitudinal electronic health records (EHRs) have shown great success in predicting clinical diseases or outcomes. Pretraining on a large dataset can help such models map the input space better and boost their performance on relevant tasks through finetuning with limited data. In this study, we present TransformEHR, a generative encoder-decoder model with transformer that is pretrained using a new pretraining objective: predicting all diseases and outcomes of a patient at a future visit from previous visits. TransformEHR's encoder-decoder framework, paired with the novel pretraining objective, helps it achieve new state-of-the-art performance on multiple clinical prediction tasks. Compared with the previous model, TransformEHR improves area under the precision-recall curve by 2% (p < 0.001) for pancreatic cancer onset and by 24% (p = 0.007) for intentional self-harm in patients with post-traumatic stress disorder. The high performance in predicting intentional self-harm shows the potential of TransformEHR in building effective clinical intervention systems. TransformEHR is also generalizable and can be easily finetuned for clinical prediction tasks with limited data.
Affiliation(s)
- Zhichao Yang
  - College of Information and Computer Science, University of Massachusetts Amherst, Amherst, MA, USA
- Avijit Mitra
  - College of Information and Computer Science, University of Massachusetts Amherst, Amherst, MA, USA
- Weisong Liu
  - School of Computer & Information Sciences, University of Massachusetts Lowell, Lowell, MA, USA
  - Center for Healthcare Organization and Implementation Research, VA Bedford Health Care System, Bedford, MA, USA
- Dan Berlowitz
  - Center for Healthcare Organization and Implementation Research, VA Bedford Health Care System, Bedford, MA, USA
  - Department of Public Health, University of Massachusetts Lowell, Lowell, MA, USA
- Hong Yu
  - College of Information and Computer Science, University of Massachusetts Amherst, Amherst, MA, USA
  - School of Computer & Information Sciences, University of Massachusetts Lowell, Lowell, MA, USA
  - Center for Healthcare Organization and Implementation Research, VA Bedford Health Care System, Bedford, MA, USA
  - Center for Biomedical and Health Research in Data Sciences, University of Massachusetts Lowell, Lowell, MA, USA
6
Iwase S, Nakada TA, Shimada T, Oami T, Shimazui T, Takahashi N, Yamabe J, Yamao Y, Kawakami E. Prediction algorithm for ICU mortality and length of stay using machine learning. Sci Rep 2022; 12:12912. [PMID: 35902633 PMCID: PMC9334583 DOI: 10.1038/s41598-022-17091-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/11/2022] [Accepted: 07/20/2022] [Indexed: 11/09/2022] Open
Abstract
Machine learning can predict outcomes and determine variables contributing to precise prediction, and can thus classify patients with different risk factors of outcomes. This study aimed to investigate the predictive accuracy for mortality and length of stay in intensive care unit (ICU) patients using machine learning, and to identify the variables contributing to the precise prediction or classification of patients. Patients (n = 12,747) admitted to the ICU at Chiba University Hospital were randomly assigned to the training and test cohorts. After learning using the variables on admission in the training cohort, the area under the curve (AUC) was analyzed in the test cohort to evaluate the predictive accuracy of the supervised machine learning classifiers, including random forest (RF) for outcomes (primary outcome, mortality; secondary outcome, length of ICU stay). The rank of the variables that contributed to the machine learning prediction was confirmed, and cluster analysis of the patients with risk factors of mortality was performed to identify the important variables associated with patient outcomes. Machine learning using RF revealed a high predictive value for mortality, with an AUC of 0.945 (95% confidence interval [CI] 0.922–0.977). In addition, RF showed high predictive value for short and long ICU stays, with AUCs of 0.881 (95% CI 0.876–0.908) and 0.889 (95% CI 0.849–0.936), respectively. Lactate dehydrogenase (LDH) was identified as a variable contributing to the precise prediction in machine learning for both mortality and length of ICU stay. LDH was also identified as a contributing variable to classify patients into sub-populations based on different risk factors of mortality. The machine learning algorithm could predict mortality and length of stay in ICU patients with high accuracy. LDH was identified as a contributing variable in mortality and length of ICU stay prediction and could be used to classify patients based on mortality risk.
Affiliation(s)
- Shinya Iwase
  - Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo-ku, Chiba, Chiba, 260-8677, Japan
- Taka-Aki Nakada
  - Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo-ku, Chiba, Chiba, 260-8677, Japan
  - Smart 119 Inc., 7th floor, Chiba Chuo Twin Building No. 2, 2-5-1 Chuo, Chiba, Japan
- Tadanaga Shimada
  - Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo-ku, Chiba, Chiba, 260-8677, Japan
- Takehiko Oami
  - Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo-ku, Chiba, Chiba, 260-8677, Japan
- Takashi Shimazui
  - Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo-ku, Chiba, Chiba, 260-8677, Japan
- Nozomi Takahashi
  - Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo-ku, Chiba, Chiba, 260-8677, Japan
- Jun Yamabe
  - Smart 119 Inc., 7th floor, Chiba Chuo Twin Building No. 2, 2-5-1 Chuo, Chiba, Japan
- Yasuo Yamao
  - Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo-ku, Chiba, Chiba, 260-8677, Japan
  - Smart 119 Inc., 7th floor, Chiba Chuo Twin Building No. 2, 2-5-1 Chuo, Chiba, Japan
- Eiryo Kawakami
  - Department of Artificial Intelligence Medicine, Chiba University Graduate School of Medicine, Chiba, Japan
  - Medical Sciences Innovation Hub Program, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, Japan
7
Oliver LD, Hawco C, Viviano JD, Voineskos AN. From the Group to the Individual in Schizophrenia Spectrum Disorders: Biomarkers of Social Cognitive Impairments and Therapeutic Translation. Biol Psychiatry 2022; 91:699-708. [PMID: 34799097 DOI: 10.1016/j.biopsych.2021.09.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/01/2021] [Revised: 08/11/2021] [Accepted: 09/11/2021] [Indexed: 12/23/2022]
Abstract
People with schizophrenia spectrum disorders (SSDs) often experience persistent social cognitive impairments, associated with poor functional outcome. There are currently no approved treatment options for these debilitating symptoms, highlighting the need for novel therapeutic strategies. Work to date has elucidated differential social processes and underlying neural circuitry affected in SSDs, which may be amenable to modulation using neurostimulation. Further, advances in functional connectivity mapping and electric field modeling may be used to identify individualized treatment targets to maximize the impact of brain stimulation on social cognitive networks. Here, we review literature supporting a roadmap for translating functional connectivity biomarker discovery to individualized treatment development for social cognitive impairments in SSDs. First, we outline the relevance of social cognitive impairments in SSDs. We review machine learning approaches for dimensional brain-behavior biomarker discovery, emphasizing the importance of individual differences. We synthesize research showing that brain stimulation techniques, such as repetitive transcranial magnetic stimulation, can be used to target relevant networks. Further, functional connectivity-based individualized targeting may enhance treatment response. We then outline recent approaches to account for neuroanatomical variability and optimize coil positioning to individually maximize target engagement. Overall, the synthesized literature provides support for the utility and feasibility of this translational approach to precision treatment. The proposed roadmap to translate biomarkers of social cognitive impairments to individualized treatment is currently under evaluation in precision-guided trials. Such a translational approach may also be applicable across conditions and generalizable for the development of individualized neurostimulation targeting other behavioral deficits.
Affiliation(s)
- Lindsay D Oliver
  - Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Colin Hawco
  - Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
  - Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
- Joseph D Viviano
  - Mila-Quebec Artificial Intelligence Institute, Montreal, Quebec, Canada
- Aristotle N Voineskos
  - Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
  - Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
8
Huang Y, Zheng Z, Ma M, Xin X, Liu H, Fei X, Wei L, Chen H. Improving Performance of Outcome Prediction for In-patients with Acute Myocardial Infarction Based on Embedding Representation Learned from Electronic Medical Records: Development and Validation Study (Preprint). J Med Internet Res 2022; 24:e37486. [PMID: 35921141 PMCID: PMC9386580 DOI: 10.2196/37486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 06/02/2022] [Accepted: 07/18/2022] [Indexed: 11/18/2022] Open
Abstract
Background The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention. Objective We aimed to propose a patient representation with more feature associations and task-specific feature importance to improve the outcome prediction performance for inpatients with acute myocardial infarction (AMI). Methods Medical concepts, including patients’ age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-value vectors using the improved skip-gram algorithm, where concepts in the context windows were selected by feature association strengths measured by association rule confidence. Then, each patient was represented as the sum of the feature embeddings weighted by the task-specific feature importance, which was applied to facilitate predictive model prediction from global and local perspectives. We finally applied the proposed patient representation into mortality risk prediction for 3010 and 1671 AMI inpatients from a public data set and a private data set, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score. Results Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively, while the greatest AUROCs, AUPRCs, and F1-scores among the reference methods were 0.847 and 0.939, 0.196 and 0.283, and 0.344 and 0.361 for the public and private data sets, respectively. Feature importance integrated in patient representation reflected features that were also critical in prediction tasks and clinical practice. Conclusions The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation.
Affiliation(s)
- Yanqun Huang
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Zhimin Zheng
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Moxuan Ma
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Xin Xin
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Honglei Liu
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
- Xiaolu Fei
  - Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
- Lan Wei
  - Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
- Hui Chen
  - School of Biomedical Engineering, Capital Medical University, Beijing, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China
9
Hayes CJ, Cucciare MA, Martin BC, Hudson TJ, Bush K, Lo-Ciganic W, Yu H, Charron E, Gordon AJ. Using data science to improve outcomes for persons with opioid use disorder. Subst Abus 2022; 43:956-963. [PMID: 35420927 PMCID: PMC9705076 DOI: 10.1080/08897077.2022.2060446] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/03/2023]
Abstract
Medication treatment for opioid use disorder (MOUD) is an effective evidence-based therapy for decreasing opioid-related adverse outcomes. Effective strategies for retaining persons on MOUD, an essential step to improving outcomes, are needed as roughly half of all persons initiating MOUD discontinue within a year. Data science may be valuable and promising for improving MOUD retention by using "big data" (e.g., electronic health record data, claims data, mobile/sensor data, social media data) and specific machine learning techniques (e.g., predictive modeling, natural language processing, reinforcement learning) to individualize patient care. Maximizing the utility of data science to improve MOUD retention requires a three-pronged approach: (1) increasing funding for data science research for OUD, (2) integrating data from multiple sources including treatment for OUD and general medical care as well as data not specific to medical care (e.g., mobile, sensor, and social media data), and (3) applying multiple data science approaches with integrated big data to provide insights and optimize advances in the OUD and overall addiction fields.
Affiliation(s)
- Corey J Hayes
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Central Arkansas Veterans Healthcare System, Center for Mental Healthcare and Outcomes Research, North Little Rock, Arkansas, USA
| | - Michael A Cucciare
- Central Arkansas Veterans Healthcare System, Center for Mental Healthcare and Outcomes Research, North Little Rock, Arkansas, USA
- Center for Health Services Research, Department of Psychiatry, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Veterans Affairs South Central Mental Illness Research, Education and Clinical Center, Central Arkansas Veterans Healthcare System, North Little Rock, Arkansas, USA
| | - Bradley C Martin
- Division of Pharmaceutical Evaluation and Policy, College of Pharmacy, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - Teresa J Hudson
- Central Arkansas Veterans Healthcare System, Center for Mental Healthcare and Outcomes Research, North Little Rock, Arkansas, USA
- Center for Health Services Research, Department of Psychiatry, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - Keith Bush
- Brain Imaging Research Center, Department of Psychiatry, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - Weihsuan Lo-Ciganic
- Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, Florida, USA
- Center for Drug Evaluation and Safety (CoDES), College of Pharmacy, University of Florida, Gainesville, Florida, USA
| | - Hong Yu
- Department of Computer Science, Kennedy College of Sciences, University of Massachusetts Lowell, Lowell, Massachusetts, USA
- Center for Healthcare Organization and Implementation Research, VA Bedford Healthcare System, Bedford, MA
| | - Elizabeth Charron
- Program for Addiction Research, Clinical Care, Knowledge, and Advocacy (PARCKA), Division of Epidemiology, Department of Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA
| | - Adam J Gordon
- Program for Addiction Research, Clinical Care, Knowledge, and Advocacy (PARCKA), Division of Epidemiology, Department of Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA
- Informatics, Decision-Enhancement and Analytic Sciences (IDEAS) Center, VA Salt Lake City Healthcare System, Salt Lake City, Utah, USA
|
10
|
Xie F, Yuan H, Ning Y, Ong MEH, Feng M, Hsu W, Chakraborty B, Liu N. Deep learning for temporal data representation in electronic health records: A systematic review of challenges and methodologies. J Biomed Inform 2021; 126:103980. [PMID: 34974189 DOI: 10.1016/j.jbi.2021.103980] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2021] [Revised: 11/07/2021] [Accepted: 12/20/2021] [Indexed: 12/21/2022]
Abstract
OBJECTIVE Temporal electronic health records (EHRs) contain a wealth of information for secondary uses, such as clinical events prediction and chronic disease management. However, challenges exist for temporal data representation. We therefore sought to identify these challenges and evaluate novel methodologies for addressing them through a systematic examination of deep learning solutions. METHODS We searched five databases (PubMed, Embase, the Institute of Electrical and Electronics Engineers [IEEE] Xplore Digital Library, the Association for Computing Machinery [ACM] Digital Library, and Web of Science) complemented with hand-searching in several prestigious computer science conference proceedings. We sought articles that reported deep learning methodologies on temporal data representation in structured EHR data from January 1, 2010, to August 30, 2020. We summarized and analyzed the selected articles from three perspectives: nature of time series, methodology, and model implementation. RESULTS We included 98 articles related to temporal data representation using deep learning. Four major challenges were identified, including data irregularity, heterogeneity, sparsity, and model opacity. We then studied how deep learning techniques were applied to address these challenges. Finally, we discuss some open challenges arising from deep learning. CONCLUSION Temporal EHR data present several major challenges for clinical prediction modeling and data utilization. To some extent, current deep learning solutions can address these challenges. Future studies may consider designing comprehensive and integrated solutions. Moreover, researchers should incorporate clinical domain knowledge into study designs and enhance model interpretability to facilitate clinical implementation.
Affiliation(s)
- Feng Xie
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Han Yuan
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore
| | - Mengling Feng
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Wynne Hsu
- School of Computing, National University of Singapore, Singapore; Institute of Data Science, National University of Singapore, Singapore
| | - Bibhas Chakraborty
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
| | - Nan Liu
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Institute of Data Science, National University of Singapore, Singapore; SingHealth AI Health Program, Singapore Health Services, Singapore
|
11
|
Syed M, Syed S, Sexton K, Syeda HB, Garza M, Zozus M, Syed F, Begum S, Syed AU, Sanford J, Prior F. Application of Machine Learning in Intensive Care Unit (ICU) Settings Using MIMIC Dataset: Systematic Review. INFORMATICS-BASEL 2021; 8. [PMID: 33981592 DOI: 10.3390/informatics8010016] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Modern Intensive Care Units (ICUs) provide continuous monitoring of critically ill patients susceptible to many complications affecting morbidity and mortality. ICU settings require a high staff-to-patient ratio and generate a large volume of data. For clinicians, the real-time interpretation of data and decision-making is a challenging task. Machine Learning (ML) techniques in ICUs are making headway in the early detection of high-risk events due to increased processing power and freely available datasets such as the Medical Information Mart for Intensive Care (MIMIC). We conducted a systematic literature review to evaluate the effectiveness of applying ML in ICU settings using the MIMIC dataset. A total of 322 articles were reviewed and a quantitative descriptive analysis was performed on 61 qualified articles that applied ML techniques in ICU settings using MIMIC data. We assembled the qualified articles to provide insights into the areas of application, clinical variables used, and treatment outcomes that can pave the way for further adoption of this promising technology and possible use in routine clinical decision-making. The lessons learned from our review can provide guidance to researchers on the application of ML techniques to increase their rate of adoption in healthcare.
Affiliation(s)
- Mahanazuddin Syed
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Shorabuddin Syed
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Kevin Sexton
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
- Department of Surgery, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
- Department of Health Policy and Management, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Hafsa Bareen Syeda
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Maryam Garza
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Meredith Zozus
- Department of Population Health Sciences, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229, USA
| | - Farhanuddin Syed
- Shadan Institute of Medical Sciences, College of Medicine, Hyderabad, Telangana 500086, India
| | - Salma Begum
- Department of Information Technology, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Abdullah Usama Syed
- Department of Information Science, University of Arkansas at Little Rock (UALR), Little Rock, Arkansas 72205, USA
| | - Joseph Sanford
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
- Department of Anesthesiology, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Fred Prior
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
|