1
Heumos L, Ehmele P, Treis T, Upmeier Zu Belzen J, Roellin E, May L, Namsaraeva A, Horlava N, Shitov VA, Zhang X, Zappia L, Knoll R, Lang NJ, Hetzel L, Virshup I, Sikkema L, Curion F, Eils R, Schiller HB, Hilgendorff A, Theis FJ. An open-source framework for end-to-end analysis of electronic health record data. Nat Med 2024:10.1038/s41591-024-03214-0. [PMID: 39266748 DOI: 10.1038/s41591-024-03214-0] [Received: 12/11/2023] [Accepted: 07/25/2024] [Indexed: 09/14/2024]
Abstract
With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy's features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.
Affiliation(s)
- Lukas Heumos
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Philipp Ehmele
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Tim Treis
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Eljas Roellin
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Lilly May
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Altana Namsaraeva
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA), Darmstadt, Germany
- Nastassya Horlava
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Vladimir A Shitov
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Xinyue Zhang
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Luke Zappia
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Rainer Knoll
  - Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), Bonn, Germany
- Niklas J Lang
  - Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany
- Leon Hetzel
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Isaac Virshup
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Lisa Sikkema
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Fabiola Curion
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Roland Eils
  - Health Data Science Unit, Heidelberg University and BioQuant, Heidelberg, Germany
  - Center for Digital Health, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Herbert B Schiller
  - Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany
  - Research Unit, Precision Regenerative Medicine (PRM), Helmholtz Munich, Munich, Germany
- Anne Hilgendorff
  - Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany
  - Center for Comprehensive Developmental Care (CDeCLMU) at the Social Pediatric Center, Dr. von Hauner Children's Hospital, LMU Hospital, Ludwig Maximilian University, Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
  - Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
2
Naumova K, Devos A, Karimireddy SP, Jaggi M, Hartley MA. MyThisYourThat for interpretable identification of systematic bias in federated learning for biomedical images. NPJ Digit Med 2024; 7:238. [PMID: 39242810 PMCID: PMC11379706 DOI: 10.1038/s41746-024-01226-1] [Received: 01/30/2024] [Accepted: 08/14/2024] [Indexed: 09/09/2024] [Open Access]
Abstract
Distributed collaborative learning is a promising approach for building predictive models for privacy-sensitive biomedical images. Here, several data owners (clients) train a joint model without sharing their original data. However, concealed systematic biases can compromise model performance and fairness. This study presents the MyThisYourThat (MyTH) approach, which adapts an interpretable prototypical part learning network to a distributed setting, enabling each client to visualize feature differences learned by others on their own image: comparing one client's 'This' with others' 'That'. Our setting demonstrates four clients collaboratively training two diagnostic classifiers on a benchmark X-ray dataset. Without data bias, the global model reaches 74.14% balanced accuracy for cardiomegaly and 74.08% for pleural effusion. We show that with systematic visual bias in one client, the performance of global models drops to near-random. We demonstrate how differences between local and global prototypes reveal biases and allow their visualization on each client's data without compromising privacy.
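For readers unfamiliar with the metric quoted in this abstract, balanced accuracy is commonly defined as the mean of per-class recall, so a classifier that always predicts the majority class scores 0.5 on a binary task. The following is a minimal sketch under that standard definition; the function name and the toy labels are illustrative assumptions, not code or data from the cited study.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (1 = finding present, 0 = absent); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

score = balanced_accuracy(y_true, y_pred)

# A degenerate model that always predicts "absent" sits at chance level.
chance = balanced_accuracy(y_true, [0] * len(y_true))
```

Under this definition, the "near-random" performance reported for the biased setting would correspond to values close to 0.5 for each binary classifier.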
Affiliation(s)
- Klavdiia Naumova
  - Laboratory for Intelligent Global Health and Humanitarian Response Technologies (LiGHT), Machine Learning and Optimization Laboratory, Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland
- Arnout Devos
  - ETH AI Center, Swiss Federal Institute of Technology Zurich (ETH Zurich), Zurich, Switzerland
- Sai Praneeth Karimireddy
  - Berkeley AI Research Laboratory, University of California, Berkeley, CA, USA
  - Department of Computer Science, University of Southern California, Los Angeles, CA, USA
- Martin Jaggi
  - Machine Learning and Optimization Laboratory, Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland
- Mary-Anne Hartley
  - Laboratory for Intelligent Global Health and Humanitarian Response Technologies (LiGHT), Machine Learning and Optimization Laboratory, Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland
  - Laboratory for Intelligent Global Health and Humanitarian Response Technologies (LiGHT), School of Medicine, Yale University, New Haven, CT, USA
3
Pavia G, Branda F, Ciccozzi A, Romano C, Locci C, Azzena I, Pascale N, Marascio N, Quirino A, Matera G, Giovanetti M, Casu M, Sanna D, Ceccarelli G, Ciccozzi M, Scarpa F. Integrating Digital Health Solutions with Immunization Strategies: Improving Immunization Coverage and Monitoring in the Post-COVID-19 Era. Vaccines (Basel) 2024; 12:847. [PMID: 39203973 PMCID: PMC11359052 DOI: 10.3390/vaccines12080847] [Received: 06/26/2024] [Revised: 07/22/2024] [Accepted: 07/26/2024] [Indexed: 09/03/2024] [Open Access]
Abstract
The COVID-19 pandemic underscored the critical importance of vaccination to global health security and highlighted the potential of digital health solutions to improve immunization strategies. This article explores integrating digital health technologies with immunization programs to improve coverage, monitoring, and public health outcomes. It examines the current landscape of digital tools used in immunization initiatives, such as mobile health apps, electronic health records, and data analytics platforms. Case studies from different regions demonstrate the effectiveness of these technologies in addressing challenges such as vaccine hesitancy, logistics, and real-time monitoring of vaccine distribution and adverse events. The paper also examines ethical considerations, data privacy issues, and the need for a robust digital infrastructure to support these innovations. By analyzing the successes and limitations of digital health interventions in immunization campaigns during and after the COVID-19 pandemic, we provide recommendations for future integration strategies to ensure resilient and responsive immunization systems. This research aims to guide policymakers, health professionals, and technologists in leveraging digital health to strengthen immunization efforts and prepare for future public health emergencies.
Affiliation(s)
- Grazia Pavia
  - Unit of Clinical Microbiology, Department of Health Sciences, “Magna Græcia” University of Catanzaro - “Renato Dulbecco” Teaching Hospital, 88100 Catanzaro, Italy
- Francesco Branda
  - Unit of Medical Statistics and Molecular Epidemiology, Università Campus Bio-Medico di Roma, 00128 Rome, Italy
- Alessandra Ciccozzi
  - Department of Biomedical Sciences, University of Sassari, 07100 Sassari, Italy
- Chiara Romano
  - Unit of Medical Statistics and Molecular Epidemiology, Università Campus Bio-Medico di Roma, 00128 Rome, Italy
- Chiara Locci
  - Department of Biomedical Sciences, University of Sassari, 07100 Sassari, Italy
  - Department of Veterinary Medicine, University of Sassari, 07100 Sassari, Italy
- Ilenia Azzena
  - Department of Veterinary Medicine, University of Sassari, 07100 Sassari, Italy
- Noemi Pascale
  - Department of Veterinary Medicine, University of Sassari, 07100 Sassari, Italy
- Nadia Marascio
  - Unit of Clinical Microbiology, Department of Health Sciences, “Magna Græcia” University of Catanzaro - “Renato Dulbecco” Teaching Hospital, 88100 Catanzaro, Italy
- Angela Quirino
  - Unit of Clinical Microbiology, Department of Health Sciences, “Magna Græcia” University of Catanzaro - “Renato Dulbecco” Teaching Hospital, 88100 Catanzaro, Italy
- Giovanni Matera
  - Unit of Clinical Microbiology, Department of Health Sciences, “Magna Græcia” University of Catanzaro - “Renato Dulbecco” Teaching Hospital, 88100 Catanzaro, Italy
- Marta Giovanetti
  - Department of Sciences and Technologies for Sustainable Development and One Health, Università Campus Bio-Medico di Roma, 00128 Rome, Italy
  - Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Minas Gerais, Brazil
  - Climate Amplified Diseases and Epidemics (CLIMADE), Brasilia 70070-130, Goias, Brazil
- Marco Casu
  - Department of Veterinary Medicine, University of Sassari, 07100 Sassari, Italy
- Daria Sanna
  - Department of Biomedical Sciences, University of Sassari, 07100 Sassari, Italy
- Giancarlo Ceccarelli
  - Department of Public Health and Infectious Diseases, University Hospital Policlinico Umberto I, Sapienza University of Rome, 00161 Rome, Italy
- Massimo Ciccozzi
  - Unit of Medical Statistics and Molecular Epidemiology, Università Campus Bio-Medico di Roma, 00128 Rome, Italy
- Fabio Scarpa
  - Department of Biomedical Sciences, University of Sassari, 07100 Sassari, Italy
4
Meng W, Xu J, Huang Y, Wang C, Song Q, Ma A, Song L, Bian J, Ma Q, Yin R. Autoencoder to Identify Sex-Specific Sub-phenotypes in Alzheimer's Disease Progression Using Longitudinal Electronic Health Records. medRxiv 2024:2024.07.07.24310055. [PMID: 39040206 PMCID: PMC11261930 DOI: 10.1101/2024.07.07.24310055] [Indexed: 07/24/2024]
Abstract
Alzheimer's Disease (AD) is a complex neurodegenerative disorder significantly influenced by sex differences, with approximately two-thirds of AD patients being women. Characterizing the sex-specific AD progression and identifying its progression trajectory is a crucial step to developing effective risk stratification and prevention strategies. In this study, we developed an autoencoder to uncover sex-specific sub-phenotypes in AD progression leveraging longitudinal electronic health record (EHR) data from OneFlorida+ Clinical Research Consortium. Specifically, we first constructed temporal patient representation using longitudinal EHRs from a sex-stratified AD cohort. We used a long short-term memory (LSTM)-based autoencoder to extract and generate latent representation embeddings from sequential clinical records of patients. We then applied hierarchical agglomerative clustering to the learned representations, grouping patients based on their progression sub-phenotypes. The experimental results show we successfully identified five primary sex-based AD sub-phenotypes with corresponding progression pathways with high confidence. These sex-specific sub-phenotypes not only illustrated distinct AD progression patterns but also revealed differences in clinical characteristics and comorbidities between females and males in AD development. These findings could provide valuable insights for advancing personalized AD intervention and treatment strategies.
Affiliation(s)
- Weimin Meng
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, 32610, USA
- Jie Xu
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, 32610, USA
- Yu Huang
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, 32610, USA
- Cankun Wang
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH, 43210, USA
- Qianqian Song
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, 32610, USA
- Anjun Ma
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH, 43210, USA
- Lixin Song
  - School of Nursing, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA
- Jiang Bian
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, 32610, USA
- Qin Ma
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH, 43210, USA
- Rui Yin
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, 32610, USA
5
Tang AS, Woldemariam SR, Miramontes S, Norgeot B, Oskotsky TT, Sirota M. Harnessing EHR data for health research. Nat Med 2024; 30:1847-1855. [PMID: 38965433 DOI: 10.1038/s41591-024-03074-8] [Received: 01/03/2024] [Accepted: 05/17/2024] [Indexed: 07/06/2024]
Abstract
With the increasing availability of rich, longitudinal, real-world clinical data recorded in electronic health records (EHRs) for millions of patients, there is a growing interest in leveraging these records to improve the understanding of human health and disease and translate these insights into clinical applications. However, there is also a need to consider the limitations of these data due to various biases and to understand the impact of missing information. Recognizing and addressing these limitations can inform the design and interpretation of EHR-based informatics studies that avoid confusing or incorrect conclusions, particularly when applied to population or precision medicine. Here we discuss key considerations in the design, implementation and interpretation of EHR-based informatics studies, drawing from examples in the literature across hypothesis generation, hypothesis testing and machine learning applications. We outline the growing opportunities for EHR-based informatics studies, including association studies and predictive modeling, enabled by evolving AI capabilities-while addressing limitations and potential pitfalls to avoid.
Affiliation(s)
- Alice S Tang
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Sarah R Woldemariam
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Silvia Miramontes
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Tomiko T Oskotsky
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Marina Sirota
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
6
Wang R, Kuo PC, Chen LC, Seastedt KP, Gichoya JW, Celi LA. Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images. EBioMedicine 2024; 102:105047. [PMID: 38471396 PMCID: PMC10945176 DOI: 10.1016/j.ebiom.2024.105047] [Received: 12/04/2023] [Revised: 02/15/2024] [Accepted: 02/21/2024] [Indexed: 03/14/2024] [Open Access]
Abstract
BACKGROUND It has been shown that AI models can learn race from medical images, leading to algorithmic bias. Our aim in this study was to enhance the fairness of medical image models by eliminating bias related to race, age, and sex. We hypothesise that models may be learning demographics via shortcut learning and combat this using image augmentation. METHODS This study included 44,953 patients who identified as Asian, Black, or White (mean age, 60.68 years ±18.21; 23,499 women) for a total of 194,359 chest X-rays (CXRs) from the MIMIC-CXR database. CheXpert images from 45,095 patients (mean age 63.10 years ±18.14; 20,437 women), for a total of 134,300 CXRs, were used for external validation. We also collected 1195 3D brain magnetic resonance imaging (MRI) scans from the ADNI database, which included 273 participants with an average age of 76.97 years ±14.22, of whom 142 were female. DL models were trained on either non-augmented or augmented images and assessed using disparity metrics. The features learned by the models were analysed using task transfer experiments and model visualisation techniques. FINDINGS In the detection of radiological findings, training a model using augmented CXR images was shown to reduce disparities in error rate among racial groups (-5.45%), age groups (-13.94%), and sex (-22.22%). For AD detection, the model trained with augmented MRI images showed 53.11% and 31.01% reductions in error-rate disparity among age and sex groups, respectively. Image augmentation led to a reduction in the model's ability to identify demographic attributes and resulted in the model trained for clinical purposes incorporating fewer demographic features. INTERPRETATION The model trained using the augmented images was less likely to be influenced by demographic information in detecting image labels. These results demonstrate that the proposed augmentation scheme could enhance the fairness of interpretations by DL models when dealing with data from patients with different demographic backgrounds. FUNDING National Science and Technology Council (Taiwan), National Institutes of Health.
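The abstract does not spell out which disparity metric was used. As a rough illustration only, one simple way to quantify a disparity in error rate across demographic groups is the gap between the highest and lowest per-group error rate. The sketch below assumes that max-minus-min definition; the function name, group labels, and toy data are all hypothetical and are not taken from the cited study.

```python
def error_rate_disparity(y_true, y_pred, groups):
    """Return (gap, per-group error rates), where gap = max rate - min rate.

    The max-minus-min gap is an assumed, illustrative definition of
    "disparity"; the cited study may use a different metric.
    """
    errors, counts = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g] = counts.get(g, 0) + 1
        errors[g] = errors.get(g, 0) + (1 if t != p else 0)
    rates = {g: errors[g] / counts[g] for g in counts}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy data: two demographic groups, four images each.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gap, rates = error_rate_disparity(y_true, y_pred, groups)
```

Comparing this gap for a model trained on augmented images against one trained on non-augmented images would give a relative change of the kind the abstract reports, under the stated assumption about the metric.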
Affiliation(s)
- Ryan Wang
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Po-Chih Kuo
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Li-Ching Chen
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Kenneth Patrick Seastedt
  - Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA; Department of Thoracic Surgery, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Leo Anthony Celi
  - Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Division of Pulmonary Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
7
AlSaad R, Malluhi Q, Abd-Alrazaq A, Boughorbel S. Temporal self-attention for risk prediction from electronic health records using non-stationary kernel approximation. Artif Intell Med 2024; 149:102802. [PMID: 38462292 DOI: 10.1016/j.artmed.2024.102802] [Received: 11/03/2022] [Revised: 09/27/2023] [Accepted: 02/03/2024] [Indexed: 03/12/2024]
Abstract
Effective modeling of patient representation from electronic health records (EHRs) is increasingly becoming a vital research topic. Yet, modeling the non-stationarity in EHR data has received less attention. Most existing studies follow a strong assumption of stationarity in patient representation from EHRs. However, in practice, a patient's visits are irregularly spaced over a relatively long period of time, and disease progression patterns exhibit non-stationarity. Furthermore, the time gaps between patient visits often encapsulate significant domain knowledge, potentially revealing undiscovered patterns that characterize specific medical conditions. To address these challenges, we introduce a new method which combines the self-attention mechanism with non-stationary kernel approximation to capture both contextual information and temporal relationships between patient visits in EHRs. To assess the effectiveness of our proposed approach, we use two real-world EHR datasets, comprising a total of 76,925 patients, for the task of predicting the next diagnosis code for a patient, given their EHR history. The first dataset is a general EHR cohort and consists of 11,451 patients with a total of 3,485 unique diagnosis codes. The second dataset is a disease-specific cohort that includes 65,474 pregnant patients and encompasses a total of 9,782 unique diagnosis codes. Our experimental evaluation involved nine prediction models, categorized into three distinct groups. Group 1 comprises the baselines: original self-attention with positional encoding model, RETAIN model, and LSTM model. Group 2 includes models employing self-attention with stationary kernel approximations, specifically incorporating three variations of Bochner's feature maps. Lastly, Group 3 consists of models utilizing self-attention with non-stationary kernel approximations, including quadratic, cubic, and bi-quadratic polynomials. 
The experimental results demonstrate that non-stationary kernels significantly outperformed baseline methods for NDCG@10 and Hit@10 metrics in both datasets. The performance boost was more substantial in dataset 1 for the NDCG@10 metric. On the other hand, stationary kernels showed significant but smaller gains over baselines and were nearly as effective as non-stationary kernels for Hit@10 in dataset 2. These findings robustly validate the efficacy of employing non-stationary kernels for temporal modeling of EHR data, and emphasize the importance of modeling non-stationary temporal information in healthcare prediction tasks.
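The two ranking metrics named in this abstract can be illustrated with a short sketch. It assumes the common binary-relevance definitions: Hit@k is 1 when any true next-visit code appears in the top k predictions, and NDCG@k discounts each hit by the log of its rank and normalises by the best achievable score. The function names and the example diagnosis codes are illustrative assumptions; the cited paper's exact formulation may differ.

```python
import math


def hit_at_k(ranked_codes, true_codes, k=10):
    """1.0 if at least one true code appears among the top-k predictions."""
    return 1.0 if any(code in true_codes for code in ranked_codes[:k]) else 0.0


def ndcg_at_k(ranked_codes, true_codes, k=10):
    """Binary-relevance NDCG@k; assumes ranked_codes contains no duplicates."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, code in enumerate(ranked_codes[:k], start=1)
        if code in true_codes
    )
    # Ideal ranking places every true code at the top of the list.
    ideal_hits = min(k, len(true_codes))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical model output: diagnosis codes ranked by predicted probability.
ranked = ["I10", "E11", "O24", "J45", "N39"]
truth = {"O24"}  # hypothetical true next diagnosis code

hit = hit_at_k(ranked, truth)
ndcg = ndcg_at_k(ranked, truth)
```

Because NDCG rewards placing the correct code near the top while Hit@k only checks membership, two models can tie on Hit@10 yet differ on NDCG@10, which is consistent with the pattern the abstract describes for dataset 2.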
Affiliation(s)
- Rawan AlSaad
  - AI Center for Precision Health, Weill Cornell Medicine-Qatar, Qatar
- Alaa Abd-Alrazaq
  - AI Center for Precision Health, Weill Cornell Medicine-Qatar, Qatar
- Sabri Boughorbel
  - Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar
8
Lu HY, Ding X, Hirst JE, Yang Y, Yang J, Mackillop L, Clifton DA. Digital Health and Machine Learning Technologies for Blood Glucose Monitoring and Management of Gestational Diabetes. IEEE Rev Biomed Eng 2024; 17:98-117. [PMID: 37022834 PMCID: PMC7615520 DOI: 10.1109/rbme.2023.3242261] [Indexed: 02/10/2023]
Abstract
Innovations in digital health and machine learning are changing the path of clinical health and care. People from different geographical locations and cultural backgrounds can benefit from the mobility of wearable devices and smartphones to monitor their health ubiquitously. This paper focuses on reviewing the digital health and machine learning technologies used in gestational diabetes - a subtype of diabetes that occurs during pregnancy. This paper reviews sensor technologies used in blood glucose monitoring devices, digital health innovations and machine learning models for gestational diabetes monitoring and management, in clinical and commercial settings, and discusses future directions. Despite one in six mothers having gestational diabetes, digital health applications were underdeveloped, especially the techniques that can be deployed in clinical practice. There is an urgent need to (1) develop clinically interpretable machine learning methods for patients with gestational diabetes, assisting health professionals with treatment, monitoring, and risk stratification before, during and after their pregnancies; (2) adapt and develop clinically-proven devices for patient self-management of health and well-being at home settings ("virtual ward" and virtual consultation), thereby improving clinical outcomes by facilitating timely intervention; and (3) ensure innovations are affordable and sustainable for all women with different socioeconomic backgrounds and clinical resources.
9
Li Z, Yan C, Zhang X, Gharibi G, Yin Z, Jiang X, Malin BA. Split Learning for Distributed Collaborative Training of Deep Learning Models in Health Informatics. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2024; 2023:1047-1056. [PMID: 38222326 PMCID: PMC10785879] [Indexed: 01/16/2024]
Abstract
Deep learning continues to rapidly evolve and is now demonstrating remarkable potential for numerous medical prediction tasks. However, realizing deep learning models that generalize across healthcare organizations is challenging. This is due, in part, to the inherent siloed nature of these organizations and patient privacy requirements. To address this problem, we illustrate how split learning can enable collaborative training of deep learning models across disparate and privately maintained health datasets, while keeping the original records and model parameters private. We introduce a new privacy-preserving distributed learning framework that offers a higher level of privacy compared to conventional federated learning. We use several biomedical imaging and electronic health record (EHR) datasets to show that deep learning models trained via split learning can achieve highly similar performance to their centralized and federated counterparts while greatly improving computational efficiency and reducing privacy risks.
Affiliation(s)
- Chao Yan
  - Vanderbilt University Medical Center, Nashville, TN
- Zhijun Yin
  - Vanderbilt University, Nashville, TN
  - Vanderbilt University Medical Center, Nashville, TN
- Bradley A Malin
  - Vanderbilt University, Nashville, TN
  - Vanderbilt University Medical Center, Nashville, TN
10
Xu J, Yin R, Huang Y, Gao H, Wu Y, Guo J, Smith GE, DeKosky ST, Wang F, Guo Y, Bian J. Identification of Outcome-Oriented Progression Subtypes from Mild Cognitive Impairment to Alzheimer's Disease Using Electronic Health Records. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2024; 2023:764-773. [PMID: 38222396 PMCID: PMC10785946] [Indexed: 01/16/2024]
Abstract
Alzheimer's disease (AD) is a complex, heterogeneous neurodegenerative disease that requires an in-depth understanding of its progression pathways and contributing factors to develop effective risk stratification and prevention strategies. In this study, we proposed an outcome-oriented model to identify progression pathways from mild cognitive impairment (MCI) to AD using electronic health records (EHRs) from the OneFlorida+ Clinical Research Consortium. To achieve this, we employed a long short-term memory (LSTM) network to extract relevant information from the sequential records of each patient. Hierarchical agglomerative clustering was then applied to the learned representation to group patients based on their progression subtypes. Our approach identified multiple progression pathways, each of which represented a distinct pattern of disease progression from MCI to AD. These pathways can serve as a valuable resource for researchers to understand the factors influencing AD progression and to develop personalized interventions to delay or prevent the onset of the disease.
Affiliation(s)
- Jie Xu
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
| | - Rui Yin
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
| | - Yu Huang
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
| | - Hannah Gao
- Hamilton Southeastern High School, Fishers, Indiana, IN, USA
| | - Yonghui Wu
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
| | - Jingchuan Guo
- Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA
| | - Glenn E Smith
- Department of Clinical and Health Psychology, University of Florida, Gainesville, FL, USA
| | - Steven T DeKosky
- Department of Neurology, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
| | - Yi Guo
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
| | - Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
11
Su L, Liu S, Long Y, Chen C, Chen K, Chen M, Chen Y, Cheng Y, Cui Y, Ding Q, Ding R, Duan M, Gao T, Gu X, He H, He J, Hu B, Hu C, Huang R, Huang X, Jiang H, Jiang J, Lan Y, Li J, Li L, Li L, Li W, Li Y, Lin J, Luo X, Lyu F, Mao Z, Miao H, Shang X, Shang X, Shang Y, Shen Y, Shi Y, Sun Q, Sun W, Tang Z, Wang B, Wang H, Wang H, Wang L, Wang L, Wang S, Wang Z, Wang Z, Wei D, Wu J, Wu Q, Xing X, Yang J, Yang X, Yu J, Yu W, Yu Y, Yuan H, Zhai Q, Zhang H, Zhang L, Zhang M, Zhang Z, Zhao C, Zheng R, Zhong L, Zhou F, Zhu W. Chinese experts' consensus on the application of intensive care big data. Front Med (Lausanne) 2024; 10:1174429. [PMID: 38264049 PMCID: PMC10804886 DOI: 10.3389/fmed.2023.1174429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 11/09/2023] [Indexed: 01/25/2024] Open
Abstract
The development of intensive care medicine is inseparable from diversified monitoring data, and the field has been closely integrated with data since its birth. Critical care research requires an integrative approach that embraces the complexity of critical illness and the computational technology and algorithms that can make it possible. Considering the need to standardize the application of big data in intensive care, the Intensive Care Medicine Branch of the China Health Information and Health Care Big Data Society, Standard Committee, has convened an expert group, a secretary group and an external audit expert group to formulate the Chinese Experts' Consensus on the Application of Intensive Care Big Data (2022). This consensus makes 29 recommendations in five parts: the concept of intensive care big data; important scientific issues; standards and principles of databases; methodology for solving big data problems; and clinical application and safety considerations of intensive care big data. The consensus group believes this consensus is a starting step in applying big data in the field of intensive care. More exploration and big data-based retrospective research should be carried out to enhance the safety and reliability of big data-based models in critical care.
Affiliation(s)
- Longxiang Su
- Department of Critical Care Medicine, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Shengjun Liu
- Department of Critical Care Medicine, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Yun Long
- Department of Critical Care Medicine, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Chaodong Chen
- Department of Surgical Intensive Critical Unit, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
| | - Kai Chen
- Department of Critical Care Medicine, Fujian Provincial Key Laboratory of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fujian Provincial Center for Critical Care Medicine, Fuzhou, Fujian, China
| | - Ming Chen
- Department of Critical Care Medicine, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, Jiangsu, China
| | - Yaolong Chen
- Evidence-based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Lanzhou, China
| | - Yisong Cheng
- Department of Critical Care Medicine, West China Hospital of Sichuan University, Chengdu, China
| | - Yating Cui
- Department of Critical Care Medicine, The First Medical Center, Chinese PLA General Hospital, Beijing, China
| | - Qi Ding
- Department of Surgical Intensive Critical Unit, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
| | - Renyu Ding
- Department of Intensive Care Unit, The First Hospital of China Medical University, Shenyang, Liaoning, China
| | - Meili Duan
- Department of Critical Care Medicine, Beijing Friendship Hospital, Capital Medical University, Beijing, China
| | - Tao Gao
- Department of Critical Care Medicine, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, Jiangsu, China
| | - Xiaohua Gu
- Department of Critical Care Medicine, Northern Jiangsu People’s Hospital; Clinical Medical College, Yangzhou University, Yangzhou, China
| | - Hongli He
- Intensive Care Unit, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, School of Medicine of University of Electronic Science and Technology, Chengdu, China
| | - Jiawei He
- Department of Critical Care Medicine, Beijing Friendship Hospital, Capital Medical University, Beijing, China
| | - Bo Hu
- Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
| | - Chang Hu
- Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
| | - Rui Huang
- Department of Critical Care Medicine, The Second Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China
| | - Xiaobo Huang
- Intensive Care Unit, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, School of Medicine of University of Electronic Science and Technology, Chengdu, China
| | - Huizhen Jiang
- Department of Information Center, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Jing Jiang
- Department of Critical Care Medicine, Chongqing General Hospital, Chongqing, China
| | - Yunping Lan
- Intensive Care Unit, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, School of Medicine of University of Electronic Science and Technology, Chengdu, China
| | - Jun Li
- Department of Critical Care Medicine, Fujian Provincial Key Laboratory of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fujian Provincial Center for Critical Care Medicine, Fuzhou, Fujian, China
| | - Linfeng Li
- Medical Data Research Institute, Chongqing Medical University, Chongqing, China
| | - Lu Li
- Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
| | - Wenxiong Li
- Department of Surgical Intensive Critical Unit, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
| | - Yongzai Li
- Information Network Center, QiLu Hospital, ShanDong University, Jinan, China
| | - Jin Lin
- Department of Critical Care Medicine, Beijing Friendship Hospital, Capital Medical University, Beijing, China
| | - Xufei Luo
- Evidence-based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Lanzhou, China
| | - Feng Lyu
- Department of Computer Science and Engineering, Central South University, Changsha, China
| | - Zhi Mao
- Department of Critical Care Medicine, The First Medical Center, Chinese PLA General Hospital, Beijing, China
| | - He Miao
- Department of Intensive Care Unit, The First Hospital of China Medical University, Shenyang, Liaoning, China
| | - Xiaopu Shang
- Department of Information Management, Beijing Jiaotong University, Beijing, China
| | - Xiuling Shang
- Department of Critical Care Medicine, Fujian Provincial Key Laboratory of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fujian Provincial Center for Critical Care Medicine, Fuzhou, Fujian, China
| | - You Shang
- Department of Critical Care Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yuwen Shen
- Intensive Care Unit of Cardiovascular Surgery Department, Qilu Hospital of Shandong University, Jinan, China
| | - Yinghuan Shi
- National Institute of Healthcare Data Science, Nanjing University, Nanjing, China
| | - Qihang Sun
- British Chinese Society of Health Informatics, Beijing, China
| | - Weijun Sun
- Faculty of Automation, Guangdong University of Technology, Guangzhou, China
| | - Zhiyun Tang
- Department of Intensive Care Unit, Zhejiang Provincial People’s Hospital, Affiliated People’s Hospital, Emergency and Intensive Care Unit Center, Hangzhou Medical College, Hangzhou, Zhejiang, China
| | - Bo Wang
- Department of Critical Care Medicine, West China Hospital of Sichuan University, Chengdu, China
| | - Haijun Wang
- Department of Intensive Care Unit, National Cancer Center/National Clinical Research Center, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Hongliang Wang
- Department of Critical Care Medicine, The Second Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China
| | - Li Wang
- Department of Epidemiology and Biostatistics, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences; School of Basic Medicine Peking Union Medical College, Beijing, China
| | - Luhao Wang
- Department of Critical Care Medicine, Sun Yat-Sen University First Affiliated Hospital, Guangzhou, China
| | - Sicong Wang
- Department of Critical Care Medicine, The Second Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China
| | - Zhanwen Wang
- Intensive Care Unit, XiangYa Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiang Ya Hospital, Central South University, Changsha, China
- Hunan Provincial Clinical Research Center for Critical Care Medicine, Xiang Ya Hospital, Central South University, Changsha, China
| | - Zhong Wang
- Department of Intensive Care Unit, The First Hospital of China Medical University, Shenyang, Liaoning, China
| | - Dong Wei
- National Institute of Healthcare Data Science, Nanjing University, Nanjing, China
| | - Jianfeng Wu
- Intensive Care Unit, XiangYa Hospital, Central South University, Changsha, China
| | - Qin Wu
- Department of Critical Care Medicine, West China Hospital of Sichuan University, Chengdu, China
| | - Xuezhong Xing
- Department of Epidemiology and Biostatistics, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences; School of Basic Medicine Peking Union Medical College, Beijing, China
| | - Jin Yang
- Department of Critical Care Medicine, Chongqing General Hospital, Chongqing, China
| | - Xianghong Yang
- Department of Intensive Care Unit, National Cancer Center/National Clinical Research Center, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Jiangquan Yu
- Department of Critical Care Medicine, Northern Jiangsu People’s Hospital; Clinical Medical College, Yangzhou University, Yangzhou, China
| | - Wenkui Yu
- Department of Critical Care Medicine, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, Jiangsu, China
| | - Yuan Yu
- Intensive Care Unit of Cardiovascular Surgery Department, Qilu Hospital of Shandong University, Jinan, China
| | - Hao Yuan
- Department of Critical Care Medicine, Sun Yat-Sen University First Affiliated Hospital, Guangzhou, China
| | - Qian Zhai
- National Institute of Healthcare Data Science, Nanjing University, Nanjing, China
| | - Hao Zhang
- Department of Intensive Care Unit, National Cancer Center/National Clinical Research Center, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Lina Zhang
- Intensive Care Unit, XiangYa Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiang Ya Hospital, Central South University, Changsha, China
- Hunan Provincial Clinical Research Center for Critical Care Medicine, Xiang Ya Hospital, Central South University, Changsha, China
| | - Meng Zhang
- Department of Critical Care Medicine, Chongqing General Hospital, Chongqing, China
| | - Zhongheng Zhang
- Department of Emergency Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Chunguang Zhao
- Intensive Care Unit, XiangYa Hospital, Central South University, Changsha, China
- National Clinical Research Center for Geriatric Disorders, Xiang Ya Hospital, Central South University, Changsha, China
- Hunan Provincial Clinical Research Center for Critical Care Medicine, Xiang Ya Hospital, Central South University, Changsha, China
| | - Ruiqiang Zheng
- Department of Critical Care Medicine, Northern Jiangsu People’s Hospital; Clinical Medical College, Yangzhou University, Yangzhou, China
| | - Lei Zhong
- Department of Intensive Care Unit, National Cancer Center/National Clinical Research Center, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Feihu Zhou
- Department of Critical Care Medicine, The First Medical Center, Chinese PLA General Hospital, Beijing, China
| | - Weiguo Zhu
- Department of General Medicine, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
12
Ho M, Levy TJ, Koulas I, Founta K, Coppa K, Hirsch JS, Davidson KW, Spyropoulos AC, Zanos TP. Longitudinal dynamic clinical phenotypes of in-hospital COVID-19 patients across three dominant virus variants in New York. Int J Med Inform 2024; 181:105286. [PMID: 37956643 PMCID: PMC10843635 DOI: 10.1016/j.ijmedinf.2023.105286] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/26/2023] [Revised: 10/20/2023] [Accepted: 11/03/2023] [Indexed: 11/15/2023]
Abstract
BACKGROUND COVID-19 is a challenging disease to characterize given its wide-ranging heterogeneous symptomatology. Several studies have attempted to extract clinical phenotypes but often relied on data from small patient cohorts, usually limited to only one viral variant and utilizing a static snapshot of patient data. OBJECTIVE This study aimed to identify clinical phenotypes of hospitalized COVID-19 patients and investigate their longitudinal dynamics throughout the pandemic, with the goal of relating these phenotypes to clinical outcomes and treatment strategies. METHODS We utilized routinely collected demographic and clinical data throughout the hospitalization of 38,077 patients admitted between 3/2020 and 5/2022 in 12 New York hospitals. Uniform Manifold Approximation and Projection and agglomerative hierarchical clustering were used to derive the clusters, followed by exploratory data analysis to compare the prevalence of comorbidities and treatments per cluster. RESULTS Four distinct clinical phenotypes remained robust in multi-site validation and were associated with different mortality rates. The temporal progression of these phenotypes throughout the COVID-19 pandemic demonstrated increased variability across the waves of the three dominant viral variants (alpha, delta, omicron). Longitudinal analysis evaluating changes in the clinical phenotypes of each patient throughout the course of a 4-week hospital stay exemplified the dynamic nature of the disease progression. Factors such as sex, race/ethnicity and specific treatment modalities revealed significant and clinically relevant differences between the observed phenotypes. CONCLUSIONS Our proposed methodology has the potential to enable clinicians and policy makers to draw evidence-based conclusions for guiding treatment modalities in a dynamic fashion.
Affiliation(s)
- Matthew Ho
- Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549
| | - Todd J Levy
- Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030
| | - Ioannis Koulas
- Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030
| | - Kyriaki Founta
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549
| | - Kevin Coppa
- Department of Clinical Digital Solutions, Northwell Health, New Hyde Park, NY 11042
| | - Jamie S Hirsch
- Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549; Department of Clinical Digital Solutions, Northwell Health, New Hyde Park, NY 11042
| | - Karina W Davidson
- Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549
| | - Alex C Spyropoulos
- Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549
| | - Theodoros P Zanos
- Institute of Health Systems Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY 11549.
13
Papanastasiou G, Yang G, Fotiadis DI, Dikaios N, Wang C, Huda A, Sobolevsky L, Raasch J, Perez E, Sidhu G, Palumbo D. Large-scale deep learning analysis to identify adult patients at risk for combined and common variable immunodeficiencies. Communications Medicine 2023; 3:189. [PMID: 38123736 PMCID: PMC10733406 DOI: 10.1038/s43856-023-00412-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 11/21/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND Primary immunodeficiency (PI) is a group of heterogeneous disorders resulting from immune system defects. Over 70% of PI is undiagnosed, leading to increased mortality, co-morbidity and healthcare costs. Among PI disorders, combined immunodeficiencies (CID) are characterized by complex immune defects. Common variable immunodeficiency (CVID) is among the most common types of PI. In light of available treatments, it is critical to identify adult patients at risk for CID and CVID before the development of serious morbidity and mortality. METHODS We developed a deep learning-based method (named "TabMLPNet") to analyze clinical history from nationally representative medical claims from electronic health records (Optum® data, covering the entire US), evaluated in the setting of identifying CID/CVID in adults. Further, we revealed the most important CID/CVID-associated antecedent phenotype combinations. Four large cohorts were generated: a total of 47,660 PI cases and (1:1 matched) controls. RESULTS The sensitivity/specificity of TabMLPNet modeling ranges from 0.82-0.88/0.82-0.85 across cohorts. Distinctive combinations of antecedent phenotypes associated with CID/CVID are identified, consisting of respiratory infections/conditions, genetic anomalies, cardiac defects, autoimmune diseases, blood disorders and malignancies, which may help systematize the identification of CID and CVID. CONCLUSIONS We demonstrated a method that accurately detects CID and CVID, evaluated on large-scale medical claims data. Our predictive scheme can potentially lead to the development of new clinical insights and expanded guidelines for identifying adult patients at risk for CID and CVID, and may be used to improve patient outcomes at the population level.
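The sensitivity and specificity figures reported in this abstract follow the standard confusion-matrix definitions. As a minimal sketch on invented toy labels (the function name and the data are hypothetical and unrelated to TabMLPNet or the Optum cohorts), they can be computed as:

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels encoded as 0/1.

    Assumes at least one positive and one negative case in y_true;
    otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 cases and 4 matched controls.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With 1:1 matched cases and controls, as in the cohorts described, both quantities are estimated from equally sized groups.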
Affiliation(s)
| | - Guang Yang
- National Heart and Lung Institute, Imperial College London, London, UK
- Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
| | - Dimitris I Fotiadis
- Department of Biomedical Research, Institute of Molecular Biology and Biotechnology, FORTH, Ioannina, Greece
- Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, Ioannina, Greece
| | - Chengjia Wang
- School of Mathematical and Computer Sciences, Heriot Watt, Edinburgh, UK
- Edinburgh Centre for Robotics, Edinburgh, UK
| | - Elena Perez
- Allergy Associates of the Palm Beaches, North Palm Beach, FL, USA
14
Lhoste VPF, Zhou B, Mishra A, Bennett JE, Filippi S, Asaria P, Gregg EW, Danaei G, Ezzati M. Cardiometabolic and renal phenotypes and transitions in the United States population. Nature Cardiovascular Research 2023; 3:46-59. [PMID: 38314318 PMCID: PMC7615595 DOI: 10.1038/s44161-023-00391-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 11/13/2023] [Indexed: 02/06/2024]
Abstract
Cardiovascular and renal conditions have both shared and distinct determinants. In this study, we applied unsupervised clustering to multiple rounds of the National Health and Nutrition Examination Survey from 1988 to 2018, and identified 10 cardiometabolic and renal phenotypes. These included a 'low risk' phenotype; two groups with average risk factor levels but different heights; one group with low body-mass index and high levels of high-density lipoprotein cholesterol; five phenotypes with high levels of one or two related risk factors ('high heart rate', 'high cholesterol', 'high blood pressure', 'severe obesity' and 'severe hyperglycemia'); and one phenotype with low diastolic blood pressure (DBP) and low estimated glomerular filtration rate (eGFR). Prevalence of the 'high blood pressure' and 'high cholesterol' phenotypes decreased over time, contrasted by a rise in the 'severe obesity' and 'low DBP, low eGFR' phenotypes. The cardiometabolic and renal traits of the US population have shifted from phenotypes with high blood pressure and cholesterol toward poor kidney function, hyperglycemia and severe obesity.
Affiliation(s)
- Victor P. F. Lhoste
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
| | - Bin Zhou
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
- Abdul Latif Jameel Institute for Disease and Emergency Analytics, Imperial College London, London, UK
| | - Anu Mishra
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
| | - James E. Bennett
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
| | - Sarah Filippi
- Department of Mathematics, Imperial College London, London, UK
| | - Perviz Asaria
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
| | - Edward W. Gregg
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
- Abdul Latif Jameel Institute for Disease and Emergency Analytics, Imperial College London, London, UK
- School of Population Health, Royal College of Surgeons in Ireland, Dublin, Ireland
| | - Goodarz Danaei
- Department of Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Majid Ezzati
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
- Abdul Latif Jameel Institute for Disease and Emergency Analytics, Imperial College London, London, UK
- Regional Institute for Population Studies, University of Ghana, Accra, Ghana
15
Lanotte F, O’Brien MK, Jayaraman A. AI in Rehabilitation Medicine: Opportunities and Challenges. Ann Rehabil Med 2023; 47:444-458. [PMID: 38093518 PMCID: PMC10767220 DOI: 10.5535/arm.23131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 11/23/2023] [Indexed: 01/03/2024] Open
Abstract
Artificial intelligence (AI) tools are increasingly able to learn from larger and more complex data, thus allowing clinicians and scientists to gain new insights from the information they collect about their patients every day. In rehabilitation medicine, AI can be used to find patterns in huge amounts of healthcare data. These patterns can then be leveraged at the individual level, to design personalized care strategies and interventions to optimize each patient's outcomes. However, building effective AI tools requires many careful considerations about how we collect and handle data, how we train the models, and how we interpret results. In this perspective, we discuss some of the current opportunities and challenges for AI in rehabilitation. We first review recent trends in AI for the screening, diagnosis, treatment, and continuous monitoring of disease or injury, with a special focus on the different types of healthcare data used for these applications. We then examine potential barriers to designing and integrating AI into the clinical workflow, and we propose an end-to-end framework to address these barriers and guide the development of effective AI for rehabilitation. Finally, we present ideas for future work to pave the way for AI implementation in real-world rehabilitation practices.
Affiliation(s)
- Francesco Lanotte
- Max Nader Lab for Rehabilitation Technologies and Outcomes Research, Shirley Ryan AbilityLab, Chicago, IL, United States
- Department of Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL, United States
| | - Megan K. O’Brien
- Max Nader Lab for Rehabilitation Technologies and Outcomes Research, Shirley Ryan AbilityLab, Chicago, IL, United States
- Department of Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL, United States
| | - Arun Jayaraman
- Max Nader Lab for Rehabilitation Technologies and Outcomes Research, Shirley Ryan AbilityLab, Chicago, IL, United States
- Department of Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL, United States
16
Sivarajkumar S, Huang Y, Wang Y. Fair patient model: Mitigating bias in the patient representation learned from the electronic health records. J Biomed Inform 2023; 148:104544. [PMID: 37995843 PMCID: PMC10850918 DOI: 10.1016/j.jbi.2023.104544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 10/02/2023] [Accepted: 11/10/2023] [Indexed: 11/25/2023]
Abstract
OBJECTIVE To pre-train fair and unbiased patient representations from Electronic Health Records (EHRs) using a novel weighted loss function that reduces bias and improves fairness in deep representation learning models. METHODS We defined a new loss function, called weighted loss function, in the deep representation learning model to balance the importance of different groups of patients and features. We applied the proposed model, called Fair Patient Model (FPM), to a sample of 34,739 patients from the MIMIC-III dataset and learned patient representations for four clinical outcome prediction tasks. RESULTS FPM outperformed the baseline models in terms of three fairness metrics: demographic parity, equality of opportunity difference, and equalized odds ratio. FPM also achieved comparable predictive performance with the baselines, with an average accuracy of 0.7912. Feature analysis revealed that FPM captured more information from clinical features than the baselines. CONCLUSION FPM is a novel method to pre-train fair and unbiased patient representations from the EHR data using a weighted loss function. The learned representations can be used for various downstream tasks in healthcare and can be extended to other domains where fairness is important.
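Two of the fairness metrics named in this abstract can be illustrated on toy data. The following is a minimal sketch, not the FPM authors' implementation: the function names, the 0/1 label encoding, the group labels and the example arrays are all hypothetical, and the paper's exact metric definitions are not given in the abstract. Demographic parity is taken here as the gap in positive-prediction rates between two groups, and equal opportunity as the gap in true-positive rates.

```python
from typing import Sequence


def positive_rate(y_pred: Sequence[int], group: Sequence[str], value: str) -> float:
    """Share of positive predictions among members of one group."""
    preds = [p for p, g in zip(y_pred, group) if g == value]
    return sum(preds) / len(preds)  # assumes the group is non-empty


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int],
                       group: Sequence[str], value: str) -> float:
    """Share of actual positives in one group that were predicted positive."""
    hits = [p for t, p, g in zip(y_true, y_pred, group) if g == value and t == 1]
    return sum(hits) / len(hits)  # assumes the group has at least one positive


def demographic_parity_difference(y_pred, group, a: str, b: str) -> float:
    """Positive-prediction rate of group a minus that of group b (0 means parity)."""
    return positive_rate(y_pred, group, a) - positive_rate(y_pred, group, b)


def equal_opportunity_difference(y_true, y_pred, group, a: str, b: str) -> float:
    """True-positive rate of group a minus that of group b (0 means equal opportunity)."""
    return (true_positive_rate(y_true, y_pred, group, a)
            - true_positive_rate(y_true, y_pred, group, b))


# Hypothetical toy data: eight patients in two groups, "A" and "B".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

dpd = demographic_parity_difference(y_pred, group, "A", "B")
eod = equal_opportunity_difference(y_true, y_pred, group, "A", "B")
print(f"demographic parity difference = {dpd:.2f}")
print(f"equal opportunity difference  = {eod:.2f}")
```

Values closer to zero indicate more similar treatment of the two groups under these definitions; a model can score well on one metric and poorly on the other, which is one reason several metrics are usually reported together.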
Affiliation(s)
- Sonish Sivarajkumar
- Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
| | - Yufei Huang
- Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States; Department of Medicine, University of Pittsburgh, Pittsburgh, PA, United States; Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, United States; University of Pittsburgh Medical Center Hillman Cancer Center, Pittsburgh, PA, United States
| | - Yanshan Wang
- Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States; Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, United States; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, United States; University of Pittsburgh Medical Center Hillman Cancer Center, Pittsburgh, PA, United States.
| |
|
17
|
Keszthelyi D, Gaudet-Blavignac C, Bjelogrlic M, Lovis C. Patient Information Summarization in Clinical Settings: Scoping Review. JMIR Med Inform 2023; 11:e44639. [PMID: 38015588 DOI: 10.2196/44639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 03/15/2023] [Accepted: 07/25/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Information overflow, a common problem in the present clinical environment, can be mitigated by summarizing clinical data. Although there are several solutions for clinical summarization, there is a lack of a complete overview of the research relevant to this field. OBJECTIVE This study aims to identify state-of-the-art solutions for clinical summarization, to analyze their capabilities, and to identify their properties. METHODS A scoping review of articles published between 2005 and 2022 was conducted. With a clinical focus, PubMed and Web of Science were queried to find an initial set of reports, later extended by articles found through a chain of citations. The included reports were analyzed to answer the questions of where, what, and how medical information is summarized; whether summarization conserves temporality, uncertainty, and medical pertinence; and how the propositions are evaluated and deployed. To answer how information is summarized, methods were compared through a new framework "collect-synthesize-communicate" referring to information gathering from data, its synthesis, and communication to the end user. RESULTS Overall, 128 articles were included, representing various medical fields. Exclusively structured data were used as input in 46.1% (59/128) of papers, text in 41.4% (53/128) of articles, and both in 10.2% (13/128) of papers. Using the proposed framework, 42.2% (54/128) of the records contributed to information collection, 27.3% (35/128) contributed to information synthesis, and 46.1% (59/128) presented solutions for summary communication. Numerous summarization approaches have been presented, including extractive (n=13) and abstractive summarization (n=19); topic modeling (n=5); summary specification (n=11); concept and relation extraction (n=30); visual design considerations (n=59); and complete pipelines (n=7) using information extraction, synthesis, and communication. 
Graphical displays (n=53), short texts (n=41), static reports (n=7), and problem-oriented views (n=7) were the most common types in terms of summary communication. Although temporality and uncertainty information were usually not conserved in most studies (74/128, 57.8% and 113/128, 88.3%, respectively), some studies presented solutions to treat this information. Overall, 115 (89.8%) articles showed results of an evaluation, and methods included evaluations with human participants (median 15, IQR 24 participants): measurements in experiments with human participants (n=31), real situations (n=8), and usability studies (n=28). Methods without human involvement included intrinsic evaluation (n=24), performance on a proxy (n=10), or domain-specific tasks (n=11). Overall, 11 (8.6%) reports described a system deployed in clinical settings. CONCLUSIONS The scientific literature contains many propositions for summarizing patient information but reports very few comparisons of these proposals. This work proposes to compare these algorithms through how they conserve essential aspects of clinical information and through the "collect-synthesize-communicate" framework. We found that current propositions usually address these 3 steps only partially. Moreover, they conserve and use temporality, uncertainty, and pertinent medical aspects to varying extents, and solutions are often preliminary.
Affiliation(s)
- Daniel Keszthelyi
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
| | - Christophe Gaudet-Blavignac
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
| | - Mina Bjelogrlic
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
| | - Christian Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
| |
|
18
|
Misra S, Wagner R, Ozkan B, Schön M, Sevilla-Gonzalez M, Prystupa K, Wang CC, Kreienkamp RJ, Cromer SJ, Rooney MR, Duan D, Thuesen ACB, Wallace AS, Leong A, Deutsch AJ, Andersen MK, Billings LK, Eckel RH, Sheu WHH, Hansen T, Stefan N, Goodarzi MO, Ray D, Selvin E, Florez JC, Meigs JB, Udler MS. Precision subclassification of type 2 diabetes: a systematic review. COMMUNICATIONS MEDICINE 2023; 3:138. [PMID: 37798471 PMCID: PMC10556101 DOI: 10.1038/s43856-023-00360-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/09/2023] [Accepted: 09/15/2023] [Indexed: 10/07/2023] Open
Abstract
BACKGROUND Heterogeneity in type 2 diabetes presentation and progression suggests that precision medicine interventions could improve clinical outcomes. We undertook a systematic review to determine whether strategies to subclassify type 2 diabetes were associated with high quality evidence, reproducible results and improved outcomes for patients. METHODS We searched PubMed and Embase for publications that used 'simple subclassification' approaches using simple categorisation of clinical characteristics, or 'complex subclassification' approaches which used machine learning or 'omics approaches in people with established type 2 diabetes. We excluded other diabetes subtypes and those predicting incident type 2 diabetes. We assessed quality, reproducibility and clinical relevance of extracted full-text articles and qualitatively synthesised a summary of subclassification approaches. RESULTS Here we show data from 51 studies that demonstrate many simple stratification approaches, but none have been replicated and many are not associated with meaningful clinical outcomes. Complex stratification was reviewed in 62 studies and produced reproducible subtypes of type 2 diabetes that are associated with outcomes. Both approaches require a higher grade of evidence but support the premise that type 2 diabetes can be subclassified into clinically meaningful subtypes. CONCLUSION Critical next steps toward clinical implementation are to test whether subtypes exist in more diverse ancestries and whether tailoring interventions to subtypes will improve outcomes.
Affiliation(s)
- Shivani Misra
- Department of Metabolism, Digestion and Reproduction, Imperial College London, London, UK.
- Department of Diabetes and Endocrinology, Imperial College Healthcare NHS Trust, London, UK.
| | - Robert Wagner
- Department of Endocrinology and Diabetology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Moorenstr. 5, 40225, Düsseldorf, Germany
- Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Auf'm Hennekamp 65, 40225, Düsseldorf, Germany
- German Center for Diabetes Research (DZD), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| | - Bige Ozkan
- Welch Center for Prevention, Epidemiology, and Clinical Research, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Ciccarone Center for the Prevention of Cardiovascular Disease, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Martin Schön
- Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Auf'm Hennekamp 65, 40225, Düsseldorf, Germany
- German Center for Diabetes Research (DZD), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Institute of Experimental Endocrinology, Biomedical Research Center, Slovak Academy of Sciences, Bratislava, Slovakia
| | - Magdalena Sevilla-Gonzalez
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital, Boston, MA, USA
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Katsiaryna Prystupa
- Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Auf'm Hennekamp 65, 40225, Düsseldorf, Germany
- German Center for Diabetes Research (DZD), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| | - Caroline C Wang
- Welch Center for Prevention, Epidemiology, and Clinical Research, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Raymond J Kreienkamp
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Diabetes Unit, Division of Endocrinology, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Pediatrics, Division of Endocrinology, Boston Children's Hospital, Boston, MA, USA
| | - Sara J Cromer
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Diabetes Unit, Division of Endocrinology, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Mary R Rooney
- Welch Center for Prevention, Epidemiology, and Clinical Research, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Daisy Duan
- Division of Endocrinology, Diabetes and Metabolism, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Anne Cathrine Baun Thuesen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Amelia S Wallace
- Welch Center for Prevention, Epidemiology, and Clinical Research, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Aaron Leong
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Diabetes Unit, Division of Endocrinology, Massachusetts General Hospital, Boston, MA, USA
- Division of General Internal Medicine, Massachusetts General Hospital, 100 Cambridge St 16th Floor, Boston, MA, USA
| | - Aaron J Deutsch
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Diabetes Unit, Division of Endocrinology, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Mette K Andersen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Liana K Billings
- Division of Endocrinology, Diabetes and Metabolism, NorthShore University Health System, Skokie, IL, USA
- Department of Medicine, Pritzker School of Medicine, University of Chicago, Chicago, IL, USA
| | - Robert H Eckel
- Division of Endocrinology, Metabolism and Diabetes, University of Colorado School of Medicine, Aurora, CO, USA
| | - Wayne Huey-Herng Sheu
- Institute of Molecular and Genomic Medicine, National Health Research Institute, Miaoli County, Taiwan, ROC
- Division of Endocrinology and Metabolism, Taichung Veterans General Hospital, Taichung, Taiwan, ROC
- Division of Endocrinology and Metabolism, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
| | - Torben Hansen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Norbert Stefan
- German Center for Diabetes Research (DZD), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- University Hospital of Tübingen, Tübingen, Germany
- Institute of Diabetes Research and Metabolic Diseases (IDM), Helmholtz Center Munich, Neuherberg, Germany
| | - Mark O Goodarzi
- Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Debashree Ray
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Elizabeth Selvin
- Welch Center for Prevention, Epidemiology, and Clinical Research, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Jose C Florez
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Diabetes Unit, Division of Endocrinology, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - James B Meigs
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Division of General Internal Medicine, Massachusetts General Hospital, 100 Cambridge St 16th Floor, Boston, MA, USA
| | - Miriam S Udler
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Diabetes Unit, Division of Endocrinology, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| |
|
19
|
Pellegrini C, Navab N, Kazi A. Unsupervised pre-training of graph transformers on patient population graphs. Med Image Anal 2023; 89:102895. [PMID: 37473609 DOI: 10.1016/j.media.2023.102895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 07/06/2023] [Accepted: 07/07/2023] [Indexed: 07/22/2023]
Abstract
Pre-training has shown success in different areas of machine learning, such as Computer Vision, Natural Language Processing (NLP), and medical imaging. However, it has not been fully explored for clinical data analysis. An immense amount of clinical records are recorded, but still, data and labels can be scarce for data collected in small hospitals or dealing with rare diseases. In such scenarios, pre-training on a larger set of unlabeled clinical data could improve performance. In this paper, we propose novel unsupervised pre-training techniques designed for heterogeneous, multi-modal clinical data for patient outcome prediction inspired by masked language modeling (MLM), by leveraging graph deep learning over population graphs. To this end, we further propose a graph-transformer-based network, designed to handle heterogeneous clinical data. By combining masking-based pre-training with a transformer-based network, we translate the success of masking-based pre-training in other domains to heterogeneous clinical data. We show the benefit of our pre-training method in a self-supervised and a transfer learning setting, utilizing three medical datasets TADPOLE, MIMIC-III, and a Sepsis Prediction Dataset. We find that our proposed pre-training methods help in modeling the data at a patient and population level and improve performance in different fine-tuning tasks on all datasets.
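The masking-based objective described in this abstract can be sketched on a single flat feature vector. This is an illustrative simplification, not the paper's graph-transformer method: it assumes numeric features, a fixed mask value, and a mean-squared reconstruction error computed only over masked positions. All names and values are hypothetical.

```python
import random


def mask_features(record, mask_rate, rng, mask_value=0.0):
    """Replace a random subset of features with mask_value.

    Returns the masked copy and the list of masked positions.
    """
    masked = list(record)
    positions = []
    for i in range(len(record)):
        if rng.random() < mask_rate:
            masked[i] = mask_value
            positions.append(i)
    return masked, positions


def masked_mse(original, reconstructed, positions):
    """Mean squared error over the masked positions only."""
    if not positions:
        return 0.0
    return sum((original[i] - reconstructed[i]) ** 2 for i in positions) / len(positions)


# Hypothetical patient record: six numeric clinical features.
record = [72.0, 1.2, 98.6, 140.0, 4.5, 0.9]
rng = random.Random(0)
masked_record, masked_positions = mask_features(record, mask_rate=0.5, rng=rng)

# A model would be trained to reconstruct `record` from `masked_record`;
# a perfect reconstruction yields zero loss.
perfect_loss = masked_mse(record, record, masked_positions)
```

Restricting the loss to masked positions forces the model to infer hidden values from the visible context rather than copy its input.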
Affiliation(s)
- Chantal Pellegrini
- Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany.
| | - Nassir Navab
- Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany; Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA
| | - Anees Kazi
- Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany; Massachusetts General Hospital, Harvard Medical School, Cambridge, MA, USA
| |
|
20
|
Liu P, Wang Z, Liu N, Peres MA. A scoping review of the clinical application of machine learning in data-driven population segmentation analysis. J Am Med Inform Assoc 2023; 30:1573-1582. [PMID: 37369006 PMCID: PMC10436153 DOI: 10.1093/jamia/ocad111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 06/08/2023] [Accepted: 06/16/2023] [Indexed: 06/29/2023] Open
Abstract
OBJECTIVE Data-driven population segmentation is commonly used in clinical settings to separate the heterogeneous population into multiple relatively homogenous groups with similar healthcare features. In recent years, machine learning (ML) based segmentation algorithms have garnered interest for their potential to speed up and improve algorithm development across many phenotypes and healthcare situations. This study evaluates ML-based segmentation with respect to (1) the populations applied, (2) the segmentation details, and (3) the outcome evaluations. MATERIALS AND METHODS MEDLINE, Embase, Web of Science, and Scopus were used following the PRISMA-ScR criteria. Peer-reviewed studies in the English language that used data-driven population segmentation analysis on structured data from January 2000 to October 2022 were included. RESULTS We identified 6077 articles and included 79 for the final analysis. Data-driven population segmentation analysis was employed in various clinical settings. K-means clustering is the most prevalent unsupervised ML paradigm. The most common settings were healthcare institutions. The most common targeted population was the general population. DISCUSSION Although all the studies did internal validation, only 11 papers (13.9%) did external validation, and 23 papers (29.1%) conducted methods comparison. The existing papers rarely discussed validating the robustness of ML modeling. CONCLUSION Existing ML applications on population segmentation need more evaluation of whether they give tailored, efficient integrated healthcare solutions compared to traditional segmentation analysis. Future ML applications in the field should emphasize methods comparison and external validation and investigate approaches to evaluate individual consistency using different methods.
Affiliation(s)
- Pinyan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Ziwen Wang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Institute of Data Science, National University of Singapore, Singapore, Singapore
| | - Marco Aurélio Peres
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore, Singapore
| |
|
21
|
Xu J, Yin R, Huang Y, Gao H, Wu Y, Guo J, Smith GE, DeKosky ST, Wang F, Guo Y, Bian J. Identification of Outcome-Oriented Progression Subtypes from Mild Cognitive Impairment to Alzheimer's Disease Using Electronic Health Records. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.27.23293270. [PMID: 37577594 PMCID: PMC10418300 DOI: 10.1101/2023.07.27.23293270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Alzheimer's disease (AD) is a complex heterogeneous neurodegenerative disease that requires an in-depth understanding of its progression pathways and contributing factors to develop effective risk stratification and prevention strategies. In this study, we proposed an outcome-oriented model to identify progression pathways from mild cognitive impairment (MCI) to AD using electronic health records (EHRs) from the OneFlorida+ Clinical Research Consortium. To achieve this, we employed the long short-term memory (LSTM) network to extract relevant information from the sequential records of each patient. The hierarchical agglomerative clustering was then applied to the learned representation to group patients based on their progression subtypes. Our approach identified multiple progression pathways, each of which represented distinct patterns of disease progression from MCI to AD. These pathways can serve as a valuable resource for researchers to understand the factors influencing AD progression and to develop personalized interventions to delay or prevent the onset of the disease.
|
22
|
van der Haar D, Moustafa A, Warren SL, Alashwal H, van Zyl T. An Alzheimer's disease category progression sub-grouping analysis using manifold learning on ADNI. Sci Rep 2023; 13:10483. [PMID: 37380746 DOI: 10.1038/s41598-023-37569-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 06/23/2023] [Indexed: 06/30/2023] Open
Abstract
Many current statistical and machine learning methods have been used to explore Alzheimer's disease (AD) and its associated patterns that contribute to the disease. However, there has been limited success in understanding the relationship between cognitive tests, biomarker data, and patient AD category progressions. In this work, we perform exploratory data analysis of AD health record data by analyzing various learned lower dimensional manifolds to separate early-stage AD categories further. Specifically, we used Spectral embedding, Multidimensional scaling, Isomap, t-Distributed Stochastic Neighbour Embedding, Uniform Manifold Approximation and Projection, and sparse denoising autoencoder based manifolds on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. We then determine the clustering potential of the learned embeddings and whether category sub-groupings or sub-categories can be found. We then used a Kruskal-Wallis H test to determine the statistical significance of the discovered AD subcategories. Our results show that the existing AD categories do exhibit sub-groupings, especially in mild cognitive impairment transitions in many of the tested manifolds, showing there may be a need for further subcategories to describe AD progression.
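The Kruskal-Wallis H test mentioned in this abstract compares the rank distributions of several groups. The sketch below computes only the H statistic (with the standard tie correction) from first principles, using the textbook formula H = 12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1). It is not the authors' analysis code, and the group values are hypothetical; in practice a statistics library would also supply the p-value.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.

    `groups` is a list of lists of numeric observations.
    Assumes the pooled observations are not all identical
    (otherwise the tie correction is zero and H is undefined).
    """
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over tie-group sizes t.
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h / correction


# Hypothetical scores for three candidate sub-groups.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h_stat:.2f}")
```

Under the null hypothesis, H approximately follows a chi-squared distribution with (number of groups − 1) degrees of freedom, which is how significance would be assessed.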
Affiliation(s)
- Dustin van der Haar
- Academy of Computer Science and Software Engineering, University of Johannesburg, Gauteng, South Africa.
| | - Ahmed Moustafa
- Department of Human Anatomy and Physiology, University of Johannesburg, Gauteng, South Africa
- School of Psychology, Faculty of Society and Design, Bond University, Gold Coast, QLD, Australia
| | - Samuel L Warren
- School of Psychology, Faculty of Society and Design, Bond University, Gold Coast, QLD, Australia
| | - Hany Alashwal
- College of Information Technology, United Arab Emirates University, Al-Ain, United Arab Emirates
| | - Terence van Zyl
- Institute for Intelligent Systems, University of Johannesburg, Gauteng, South Africa
| |
|
23
|
Boussina A, Wardi G, Shashikumar SP, Malhotra A, Zheng K, Nemati S. Representation Learning and Spectral Clustering for the Development and External Validation of Dynamic Sepsis Phenotypes: Observational Cohort Study. J Med Internet Res 2023; 25:e45614. [PMID: 37351927 PMCID: PMC10337434 DOI: 10.2196/45614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 02/28/2023] [Accepted: 05/22/2023] [Indexed: 06/24/2023] Open
Abstract
BACKGROUND Recent attempts at clinical phenotyping for sepsis have shown promise in identifying groups of patients with distinct treatment responses. Nonetheless, the replicability and actionability of these phenotypes remain an issue because the patient trajectory is a function of both the patient's physiological state and the interventions they receive. OBJECTIVE We aimed to develop a novel approach for deriving clinical phenotypes using unsupervised learning and transition modeling. METHODS Forty commonly used clinical variables from the electronic health record were used as inputs to a feed-forward neural network trained to predict the onset of sepsis. Using spectral clustering on the representations from this network, we derived and validated consistent phenotypes across a diverse cohort of patients with sepsis. We modeled phenotype dynamics as a Markov decision process with transitions as a function of the patient's current state and the interventions they received. RESULTS Four consistent and distinct phenotypes were derived from over 11,500 adult patients who were admitted from the University of California, San Diego emergency department (ED) with sepsis between January 1, 2016, and January 31, 2020. Over 2000 adult patients admitted from the University of California, Irvine ED with sepsis between November 4, 2017, and August 4, 2022, were involved in the external validation. We demonstrate that sepsis phenotypes are not static and evolve in response to physiological factors and based on interventions. We show that roughly 45% of patients change phenotype membership within the first 6 hours of ED arrival. We observed consistent trends in patient dynamics as a function of interventions including early administration of antibiotics. CONCLUSIONS We derived and describe 4 sepsis phenotypes present within 6 hours of triage in the ED. 
We observe that the administration of a 30 mL/kg fluid bolus may be associated with worse outcomes in certain phenotypes, whereas prompt antimicrobial therapy is associated with improved outcomes.
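The idea of phenotype transitions that depend on both the current state and the intervention received can be illustrated by estimating empirical transition probabilities from labelled trajectories. This is a simplified sketch, not the study's Markov decision process: it has no rewards or policy, and the phenotype labels, intervention names, and trajectories are hypothetical.

```python
from collections import defaultdict


def estimate_transitions(trajectories):
    """Estimate P(next_state | state, action) from observed trajectories.

    Each trajectory is a list of (state, action) tuples ordered in time;
    the action is the intervention given while the patient was in that state.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trajectory in trajectories:
        # Pair each time step with the following one.
        for (state, action), (next_state, _) in zip(trajectory, trajectory[1:]):
            counts[(state, action)][next_state] += 1

    probabilities = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        probabilities[key] = {s: c / total for s, c in next_counts.items()}
    return probabilities


# Hypothetical phenotype trajectories for three patients.
trajectories = [
    [("A", "antibiotics"), ("B", "none"), ("B", "none")],
    [("A", "antibiotics"), ("A", "fluids"), ("C", "none")],
    [("A", "none"), ("A", "none")],
]

transition_probs = estimate_transitions(trajectories)
for (state, action), dist in sorted(transition_probs.items()):
    print(state, action, dist)
```

Conditioning on the action is what lets such a model separate the patient's physiological course from the effect of the interventions.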
Affiliation(s)
- Aaron Boussina
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, United States
| | - Gabriel Wardi
- Department of Emergency Medicine, University of California San Diego, San Diego, CA, United States
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA, United States
| | | | - Atul Malhotra
- Division of Pulmonary, Critical Care and Sleep Medicine, University of California San Diego, San Diego, CA, United States
| | - Kai Zheng
- Department of Informatics, University of California, Irvine, Irvine, CA, United States
| | - Shamim Nemati
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, United States
| |
|
24
|
Wang M, Sushil M, Miao BY, Butte AJ. Bottom-up and top-down paradigms of artificial intelligence research approaches to healthcare data science using growing real-world big data. J Am Med Inform Assoc 2023; 30:1323-1332. [PMID: 37187158 PMCID: PMC10280344 DOI: 10.1093/jamia/ocad085] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Revised: 04/03/2023] [Accepted: 05/04/2023] [Indexed: 05/17/2023] Open
Abstract
OBJECTIVES As the real-world electronic health record (EHR) data continue to grow exponentially, novel methodologies involving artificial intelligence (AI) are becoming increasingly applied to enable efficient data-driven learning and, ultimately, to advance healthcare. Our objective is to provide readers with an understanding of evolving computational methods and help in deciding on methods to pursue. TARGET AUDIENCE The sheer diversity of existing methods presents a challenge for health scientists who are beginning to apply computational methods to their research. Therefore, this tutorial is aimed at scientists working with EHR data who are early entrants into the field of applying AI methodologies. SCOPE This manuscript describes the diverse and growing AI research approaches in healthcare data science and categorizes them into 2 distinct paradigms, the bottom-up and top-down paradigms, to provide health scientists venturing into artificial intelligence research with an understanding of the evolving computational methods and help in deciding on methods to pursue through the lens of real-world healthcare data.
Affiliation(s)
- Michelle Wang
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
| | - Madhumita Sushil
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
| | - Brenda Y Miao
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
| | - Atul J Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, California, USA
| |
|
25
|
Xu R, Ali MK, Ho JC, Yang C. Hypergraph Transformers for EHR-based Clinical Predictions. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2023; 2023:582-591. [PMID: 37350881 PMCID: PMC10283128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]
Abstract
Electronic health records (EHR) data contain rich information about patients' health conditions, including diagnoses, procedures, medications, etc., which have been widely used to facilitate digital medicine. Despite its importance, it is often non-trivial to learn useful representations for patients' visits that support downstream clinical predictions, as each visit contains massive and diverse medical codes. As a result, the complex interactions among medical codes are often not captured, which leads to substandard predictions. To better model these complex relations, we leverage hypergraphs, which go beyond pairwise relations to jointly learn the representations for visits and medical codes. We also propose to use the self-attention mechanism to automatically identify the most relevant medical codes for each visit based on the downstream clinical predictions with better generalization power. Experiments on two EHR datasets show that our proposed method not only yields superior performance, but also provides reasonable insights towards the target tasks.
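The hypergraph view described in this abstract treats each visit as a hyperedge that joins all medical codes recorded in it, rather than linking codes only in pairs. A minimal sketch of the corresponding incidence matrix follows; it illustrates the data structure only, not the paper's transformer model, and the visit identifiers and code sets are hypothetical toy values.

```python
def build_incidence(visits):
    """Build a code-by-visit incidence matrix.

    `visits` maps a visit id to the set of medical codes recorded in it.
    Entry [i][j] is 1 when code i appears in visit j, else 0.
    """
    codes = sorted({code for code_set in visits.values() for code in code_set})
    visit_ids = sorted(visits)
    matrix = [
        [1 if code in visits[visit_id] else 0 for visit_id in visit_ids]
        for code in codes
    ]
    return codes, visit_ids, matrix


# Hypothetical visits: each one is a hyperedge over diagnosis and medication codes.
visits = {
    "visit_1": {"E11.9", "I10", "metformin"},
    "visit_2": {"I10", "lisinopril"},
    "visit_3": {"E11.9", "I10"},
}

codes, visit_ids, incidence = build_incidence(visits)

# Node degree: number of visits (hyperedges) containing each code.
node_degree = {code: sum(row) for code, row in zip(codes, incidence)}
# Hyperedge size: number of codes in each visit.
edge_size = {
    visit_id: sum(row[j] for row in incidence)
    for j, visit_id in enumerate(visit_ids)
}
```

Because one hyperedge can hold any number of codes, co-occurrence of three or more codes in a visit is represented directly instead of being broken into pairwise edges.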
Affiliation(s)
- Ran Xu
- Department of Computer Science, Emory University, Atlanta, GA
| | - Mohammed K Ali
- Hubert Department of Global Health, Emory University, Atlanta, GA
| | - Joyce C Ho
- Department of Computer Science, Emory University, Atlanta, GA
| | - Carl Yang
- Department of Computer Science, Emory University, Atlanta, GA
|
26
|
Penrod N, Okeh C, Velez Edwards DR, Barnhart K, Senapati S, Verma SS. Leveraging electronic health record data for endometriosis research. Front Digit Health 2023; 5:1150687. [PMID: 37342866 PMCID: PMC10278662 DOI: 10.3389/fdgth.2023.1150687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 05/10/2023] [Indexed: 06/23/2023]
Abstract
Endometriosis is a chronic, complex disease for which there are vast disparities in diagnosis and treatment between sociodemographic groups. Clinical presentation of endometriosis can vary from asymptomatic disease (often identified during (in)fertility consultations) to dysmenorrhea and debilitating pelvic pain. Because of this complexity, delayed diagnosis (mean time to diagnosis is 1.7-3.6 years) and misdiagnosis are common. Early and accurate diagnosis of endometriosis remains a research priority for patient advocates and healthcare providers. Electronic health records (EHRs) have been widely adopted as a data source in biomedical research. However, they remain a largely untapped source of data for endometriosis research. EHRs capture diverse, real-world patient populations and care trajectories and can be used to learn patterns of underlying risk factors for endometriosis. These patterns, in turn, can inform screening guidelines that help clinicians efficiently and effectively recognize and diagnose the disease in all patient populations, reducing inequities in care. Here, we provide an overview of the advantages and limitations of using EHR data to study endometriosis. We describe the prevalence of endometriosis observed in diverse populations from multiple healthcare institutions, examples of variables that can be extracted from EHRs to enhance the accuracy of endometriosis prediction, and opportunities to leverage longitudinal EHR data to improve our understanding of long-term health consequences for all patients.
Affiliation(s)
- Nadia Penrod
- College of Agriculture and Life Sciences, Texas A&M University, College Station, TX, United States
| | - Chelsea Okeh
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, Philadelphia, PA, United States
| | - Digna R. Velez Edwards
- Department of Obstetrics and Gynecology, Vanderbilt University, Nashville, TN, United States
| | - Kurt Barnhart
- Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Suneeta Senapati
- Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Shefali S. Verma
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, Philadelphia, PA, United States
|
27
|
Soman K, Nelson CA, Cerono G, Goldman SM, Baranzini SE, Brown EG. Early detection of Parkinson's disease through enriching the electronic health record using a biomedical knowledge graph. Front Med (Lausanne) 2023; 10:1081087. [PMID: 37250641 PMCID: PMC10217780 DOI: 10.3389/fmed.2023.1081087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 04/18/2023] [Indexed: 05/31/2023]
Abstract
Introduction Early diagnosis of Parkinson's disease (PD) is important to identify treatments to slow neurodegeneration. People who develop PD often have symptoms before the disease manifests, and these symptoms may be coded as diagnoses in the electronic health record (EHR). Methods To predict PD diagnosis, we embedded EHR data of patients onto a biomedical knowledge graph called Scalable Precision medicine Open Knowledge Engine (SPOKE) and created patient embedding vectors. We trained and validated a classifier using these vectors from 3,004 PD patients, restricting records to 1, 3, and 5 years before diagnosis, and from a non-PD group of 457,197 individuals. Results The classifier predicted PD diagnosis with moderate accuracy (AUC = 0.77 ± 0.06, 0.74 ± 0.05, 0.72 ± 0.05 at 1, 3, and 5 years) and performed better than other benchmark methods. Nodes in the SPOKE graph, among cases, revealed novel associations, while SPOKE patient vectors revealed the basis for individual risk classification. Discussion The proposed method was able to explain the clinical predictions using the knowledge graph, thereby making the predictions clinically interpretable. Through enriching EHR data with biomedical associations, SPOKE may be a cost-efficient and personalized way to predict PD diagnosis years before its occurrence.
Affiliation(s)
- Karthik Soman
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States
| | - Charlotte A. Nelson
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States
| | - Gabriel Cerono
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States
| | - Samuel M. Goldman
- Division of Occupational and Environmental Medicine, University of California, San Francisco, San Francisco, CA, United States
| | - Sergio E. Baranzini
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States
| | - Ethan G. Brown
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States
|
28
|
Tariq A, Tang S, Sakhi H, Celi LA, Newsome JM, Rubin DL, Trivedi H, Gichoya JW, Banerjee I. Fusion of imaging and non-imaging data for disease trajectory prediction for coronavirus disease 2019 patients. J Med Imaging (Bellingham) 2023; 10:034004. [PMID: 37388280 PMCID: PMC10306115 DOI: 10.1117/1.jmi.10.3.034004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 06/07/2023] [Accepted: 06/13/2023] [Indexed: 07/01/2023]
Abstract
Purpose Our study investigates whether graph-based fusion of imaging data with non-imaging electronic health records (EHR) data can improve the prediction of the disease trajectories for patients with coronavirus disease 2019 (COVID-19) beyond the prediction performance of only imaging or non-imaging EHR data. Approach We present a fusion framework for fine-grained clinical outcome prediction [discharge, intensive care unit (ICU) admission, or death] that fuses imaging and non-imaging information using a similarity-based graph structure. Node features are represented by image embedding, and edges are encoded with clinical or demographic similarity. Results Experiments on data collected from the Emory Healthcare Network indicate that our fusion modeling scheme performs consistently better than predictive models developed using only imaging or non-imaging features, with area under the receiver operating characteristic curve of 0.76, 0.90, and 0.75 for discharge from hospital, mortality, and ICU admission, respectively. External validation was performed on data collected from the Mayo Clinic. Our scheme highlights known biases in the model prediction, such as bias against patients with alcohol abuse history and bias based on insurance status. Conclusions Our study signifies the importance of the fusion of multiple data modalities for the accurate prediction of clinical trajectories. The proposed graph structure can model relationships between patients based on non-imaging EHR data, and graph convolutional networks can fuse this relationship information with imaging data to predict future disease trajectory more effectively than models employing only imaging or non-imaging data. Our graph-based fusion modeling frameworks can be easily extended to other prediction tasks to efficiently combine imaging data with non-imaging clinical data.
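As a rough illustration of the similarity-based graph described in the abstract above, the sketch below connects two patients whenever their non-imaging feature vectors are sufficiently similar. It is a minimal sketch under stated assumptions, not the authors' pipeline: the cosine measure, the 0.9 threshold, and the patient vectors are all hypothetical, and the image embeddings that would serve as node features are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_edges(features, threshold=0.9):
    """Link every pair of patients whose similarity meets the (assumed) threshold."""
    ids = sorted(features)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(features[a], features[b]) >= threshold:
                edges.append((a, b))
    return edges

# Hypothetical clinical/demographic feature vectors, one per patient.
patients = {
    "p1": [1.0, 0.0, 1.0],
    "p2": [1.0, 0.1, 0.9],
    "p3": [0.0, 1.0, 0.0],
}
edges = build_edges(patients)
```

A graph convolutional network would then propagate each node's image embedding along these edges, so that predictions for one patient draw on clinically similar patients.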
Affiliation(s)
- Amara Tariq
- Mayo Clinic, Department of Administration, Phoenix, Arizona, United States
| | - Siyi Tang
- Stanford University, Department of Electrical Engineering, Stanford, California, United States
| | - Hifza Sakhi
- Philadelphia College of Osteopathic Medicine - Georgia Campus, Suwanee, Georgia, United States
| | - Leo Anthony Celi
- Massachusetts Institute of Technology, Boston, Massachusetts, United States
| | - Janice M. Newsome
- Emory University, School of Medicine, Department of Radiology and Imaging Sciences, Atlanta, Georgia, United States
| | - Daniel L. Rubin
- Stanford University, Department of Biomedical Data Science, Stanford, California, United States
- Stanford University, Department of Radiology, Stanford, California, United States
| | - Hari Trivedi
- Emory University, School of Medicine, Department of Radiology and Imaging Sciences, Atlanta, Georgia, United States
| | - Judy Wawira Gichoya
- Emory University, School of Medicine, Department of Radiology and Imaging Sciences, Atlanta, Georgia, United States
| | - Imon Banerjee
- Mayo Clinic, Department of Radiology, Phoenix, Arizona, United States
- Arizona State University, Ira A. Fulton School of Engineering, Department of Computer Engineering, Tempe, Arizona, United States
|
29
|
Misra S, Wagner R, Ozkan B, Schön M, Sevilla-Gonzalez M, Prystupa K, Wang CC, Kreienkamp RJ, Cromer SJ, Rooney MR, Duan D, Thuesen ACB, Wallace AS, Leong A, Deutsch AJ, Andersen MK, Billings LK, Eckel RH, Sheu WHH, Hansen T, Stefan N, Goodarzi MO, Ray D, Selvin E, Florez JC, Meigs JB, Udler MS. Systematic review of precision subclassification of type 2 diabetes. medRxiv 2023:2023.04.19.23288577. [PMID: 37131632 PMCID: PMC10153304 DOI: 10.1101/2023.04.19.23288577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Heterogeneity in type 2 diabetes presentation, progression and treatment has the potential for precision medicine interventions that can enhance care and outcomes for affected individuals. We undertook a systematic review to ascertain whether strategies to subclassify type 2 diabetes are associated with improved clinical outcomes, show reproducibility and have high quality evidence. We reviewed publications that deployed 'simple subclassification' using clinical features, biomarkers, imaging or other routinely available parameters or 'complex subclassification' approaches that used machine learning and/or genomic data. We found that simple stratification approaches, for example, stratification based on age, body mass index or lipid profiles, had been widely used, but no strategy had been replicated and many lacked association with meaningful outcomes. Complex stratification using clustering of simple clinical data with and without genetic data did show reproducible subtypes of diabetes that had been associated with outcomes such as cardiovascular disease and/or mortality. Both approaches require a higher grade of evidence but support the premise that type 2 diabetes can be subclassified into meaningful groups. More studies are needed to test these subclassifications in more diverse ancestries and prove that they are amenable to interventions.
|
30
|
Alfalahi H, Dias SB, Khandoker AH, Chaudhuri KR, Hadjileontiadis LJ. A scoping review of neurodegenerative manifestations in explainable digital phenotyping. NPJ Parkinsons Dis 2023; 9:49. [PMID: 36997573 PMCID: PMC10063633 DOI: 10.1038/s41531-023-00494-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/20/2022] [Accepted: 03/16/2023] [Indexed: 04/03/2023]
Abstract
Neurologists nowadays no longer view neurodegenerative diseases, like Parkinson's and Alzheimer's disease, as single entities, but rather as a spectrum of multifaceted symptoms with heterogeneous progression courses and treatment responses. The definition of the naturalistic behavioral repertoire of early neurodegenerative manifestations is still elusive, impeding early diagnosis and intervention. Central to this view is the role of artificial intelligence (AI) in reinforcing the depth of phenotypic information, thereby supporting the paradigm shift to precision medicine and personalized healthcare. This suggestion advocates the definition of disease subtypes in a new biomarker-supported nosology framework, yet without empirical consensus on standardization, reliability and interpretability. Although the well-defined neurodegenerative processes, linked to a triad of motor and non-motor preclinical symptoms, are detected by clinical intuition, we undertake an unbiased data-driven approach to identify different patterns of neuropathology distribution based on the naturalistic behavior data inherent to populations in-the-wild. We appraise the role of remote technologies in the definition of digital phenotyping specific to brain-, body- and social-level neurodegenerative subtle symptoms, emphasizing inter- and intra-patient variability powered by deep learning. As such, the present review endeavors to exploit digital technologies and AI to create disease-specific phenotypic explanations, facilitating the understanding of neurodegenerative diseases as "bio-psycho-social" conditions. Not only does this translational effort within explainable digital phenotyping foster the understanding of disease-induced traits, but it also enhances diagnostic and, eventually, treatment personalization.
Affiliation(s)
- Hessa Alfalahi
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.
- Healthcare Engineering Innovation Center (HEIC), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.
| | - Sofia B Dias
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Healthcare Engineering Innovation Center (HEIC), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- CIPER, Faculdade de Motricidade Humana, University of Lisbon, Lisbon, Portugal
| | - Ahsan H Khandoker
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Healthcare Engineering Innovation Center (HEIC), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Kallol Ray Chaudhuri
- Parkinson Foundation, International Center of Excellence, King's College London, Denmark Hill, London, UK
- Department of Basic and Clinical Neurosciences, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, London, UK
| | - Leontios J Hadjileontiadis
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Healthcare Engineering Innovation Center (HEIC), Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
31
|
Chen A, Lu R, Han R, Huang R, Qin G, Wen J, Li Q, Zhang Z, Jiang W. Building Practical Risk Prediction Models for Nasopharyngeal Carcinoma Screening with Patient Graph Analysis and Machine Learning. Cancer Epidemiol Biomarkers Prev 2023; 32:274-280. [PMID: 36480263 DOI: 10.1158/1055-9965.epi-22-0792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 09/07/2022] [Accepted: 12/06/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND To expand nasopharyngeal carcinoma (NPC) screening to larger populations, more practical NPC risk prediction models independent of Epstein-Barr virus (EBV) and other lab tests are necessary. METHODS Patient data before diagnosis of NPC were collected from hospital electronic medical records (EMR) and used to develop machine learning (ML) models for NPC risk prediction using XGBoost. NPC risk factor distributions were generated through connection delta ratio (CDR) analysis of patient graphs. By combining EMR-wide ML with patient graph analysis, the number of variables in these risk models was reduced, allowing for more practical NPC risk prediction ML models. RESULTS Using data collected from 1,357 patients with NPC and 1,448 control patients, an optimal set of 100 variables (ov100) was determined for building NPC risk prediction ML models that had the following performance metrics: 0.93-0.96 recall, 0.80-0.92 precision, and 0.83-0.94 AUC. Aided by the analysis of top CDR-ranked risk factors, the models were further refined to contain only 20 practical variables (pv20), excluding EBV. The pv20 NPC risk XGBoost model achieved 0.79 recall, 0.94 precision, 0.96 specificity, and 0.87 AUC. CONCLUSIONS This study demonstrated the feasibility of developing practical NPC risk prediction models using EMR-wide ML and patient graph CDR analysis, without requiring EBV data. These models could enable broader implementation of NPC risk evaluation and screening recommendations for larger populations in urban community health centers and rural clinics. IMPACT These more practical NPC risk models could help increase the NPC screening rate and identify more patients with early-stage NPC.
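For readers comparing the recall, precision, and specificity figures reported above, the following sketch shows how the three metrics derive from confusion-matrix counts. The counts are hypothetical and are not taken from the study; the function name is likewise made up for illustration.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute recall, precision, and specificity from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),        # share of true cases that were flagged
        "precision": tp / (tp + fp),     # share of flagged patients who are true cases
        "specificity": tn / (tn + fp),   # share of controls correctly left unflagged
    }

# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=80, fp=5, tn=95, fn=20)
```

High precision with lower recall, as in the pv20 model, means few false alarms at the cost of missing some true cases.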
Affiliation(s)
- Anjun Chen
- Guilin Medical University, Guilin, Guangxi, China; West China Hospital, Chengdu, Sichuan, China
| | - Roufeng Lu
- Guilin Medical University, Guilin, Guangxi, China
| | - Ruobing Han
- Guilin Medical University, Guilin, Guangxi, China
| | - Ran Huang
- West China Hospital, Chengdu, Sichuan, China
| | - Guanjie Qin
- Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
| | - Jian Wen
- Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
| | - Qinghua Li
- Guilin Medical University, Guilin, Guangxi, China; Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
| | | | - Wei Jiang
- Guilin Medical University, Guilin, Guangxi, China; Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
|
32
|
Li F, Wu P, Ong HH, Peterson JF, Wei WQ, Zhao J. Evaluating and mitigating bias in machine learning models for cardiovascular disease prediction. J Biomed Inform 2023; 138:104294. [PMID: 36706849 PMCID: PMC11104322 DOI: 10.1016/j.jbi.2023.104294] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 07/20/2022] [Revised: 01/16/2023] [Accepted: 01/21/2023] [Indexed: 01/26/2023]
Abstract
OBJECTIVE The study aims to investigate whether machine learning-based predictive models for cardiovascular disease (CVD) risk assessment show equivalent performance across demographic groups (such as race and gender) and whether bias mitigation methods can reduce any bias present in the models. This is important because systematic bias may be introduced when collecting and preprocessing health data, which could affect the performance of the models on certain demographic sub-cohorts. The study investigates this using electronic health records data and various machine learning models. METHODS The study used large de-identified Electronic Health Records data from Vanderbilt University Medical Center. Machine learning (ML) algorithms including logistic regression, random forest, gradient-boosting trees, and long short-term memory were applied to build multiple predictive models. Model bias and fairness were evaluated using equal opportunity difference (EOD, 0 indicates fairness) and disparate impact (DI, 1 indicates fairness). In our study, we also evaluated the fairness of a non-ML baseline model, the American Heart Association (AHA) Pooled Cohort Risk Equations (PCEs). Moreover, we compared the performance of three different de-biasing methods: removing protected attributes (e.g., race and gender), resampling the imbalanced training dataset by sample size, and resampling by the proportion of people with CVD outcomes. RESULTS The study cohort included 109,490 individuals (mean [SD] age 47.4 [14.7] years; 64.5% female; 86.3% White; 13.7% Black). The experimental results suggested that most ML models had smaller EOD and DI than PCEs. For ML models, the mean EOD ranged from -0.001 to 0.018 and the mean DI ranged from 1.037 to 1.094 across race groups. There was a larger EOD and DI across gender groups, with EOD ranging from 0.131 to 0.136 and DI ranging from 1.535 to 1.587. For debiasing methods, removing protected attributes did not significantly reduce the bias for most ML models. Resampling by sample size also did not consistently decrease bias. Resampling by case proportion reduced the EOD and DI for gender groups but slightly reduced accuracy in many cases. CONCLUSIONS Among the VUMC cohort, both PCEs and ML models were biased against women, suggesting the need to investigate and correct gender disparities in CVD risk prediction. Resampling by proportion reduced the bias for gender groups but not for race groups.
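The two fairness metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the study's code: it assumes EOD is the difference in true positive rates between two groups and DI is the ratio of their predicted-positive rates, which is one common convention. Which group is subtracted from or divided by which is an assumption here, and the labels and predictions are toy data.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that the model predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(predictions_for_positives) / len(predictions_for_positives)

def positive_rate(y_pred):
    """Fraction of all individuals predicted as positive."""
    return sum(y_pred) / len(y_pred)

def fairness_metrics(group_a, group_b):
    """Each group is (y_true, y_pred). Returns (EOD, DI) of group_a relative to group_b."""
    eod = true_positive_rate(*group_a) - true_positive_rate(*group_b)  # 0 indicates fairness
    di = positive_rate(group_a[1]) / positive_rate(group_b[1])         # 1 indicates fairness
    return eod, di

# Hypothetical labels and predictions for two demographic groups.
group_a = ([1, 1, 0, 0], [1, 0, 0, 0])  # TPR 0.5, predicted-positive rate 0.25
group_b = ([1, 1, 0, 0], [1, 1, 1, 0])  # TPR 1.0, predicted-positive rate 0.75
eod, di = fairness_metrics(group_a, group_b)
```

In this toy case the model finds only half of group A's true cases and flags group A a third as often as group B, so both metrics sit far from their fair values of 0 and 1.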
Affiliation(s)
- Fuchen Li
- College of Art and Science, Vanderbilt University, Nashville, TN, USA
| | - Patrick Wu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Henry H. Ong
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Josh F. Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Juan Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
|
33
|
Explaining predictive factors in patient pathways using autoencoders. PLoS One 2022; 17:e0277135. [DOI: 10.1371/journal.pone.0277135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Accepted: 10/20/2022] [Indexed: 11/12/2022]
Abstract
This paper introduces an end-to-end methodology to predict a pathway-related outcome and identify predictive factors using autoencoders. A formal description of autoencoders for explainable binary predictions is presented, along with two objective functions that allow for filtering and inverting negative examples during training. A methodology to model and transform complex medical event logs is also proposed, which keeps the pathway information in terms of events and time, as well as the hierarchy information carried in medical codes. A case study is presented, in which the short-term mortality after the implantation of an Implantable Cardioverter-Defibrillator is predicted. The proposed methodologies have been tested and compared to other predictive methods, both explainable and not explainable. Results show the competitiveness of the method in terms of performance, particularly the use of a Variational Auto Encoder with an inverse objective function. Finally, the explainability of the method has been demonstrated, allowing for the identification of interesting predictive factors validated using relative risks.
|
34
|
Carruthers R, Straw I, Ruffle JK, Herron D, Nelson A, Bzdok D, Fernandez-Reyes D, Rees G, Nachev P. Representational ethical model calibration. NPJ Digit Med 2022; 5:170. [PMID: 36333390 PMCID: PMC9636204 DOI: 10.1038/s41746-022-00716-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022]
Abstract
Equity is widely held to be fundamental to the ethics of healthcare. In the context of clinical decision-making, it rests on the comparative fidelity of the intelligence - evidence-based or intuitive - guiding the management of each individual patient. Though brought to recent attention by the individuating power of contemporary machine learning, such epistemic equity arises in the context of any decision guidance, whether traditional or innovative. Yet no general framework for its quantification, let alone assurance, currently exists. Here we formulate epistemic equity in terms of model fidelity evaluated over learnt multidimensional representations of identity crafted to maximise the captured diversity of the population, introducing a comprehensive framework for Representational Ethical Model Calibration. We demonstrate the use of the framework on large-scale multimodal data from UK Biobank to derive diverse representations of the population, quantify model performance, and institute responsive remediation. We offer our approach as a principled solution to quantifying and assuring epistemic equity in healthcare, with applications across the research, clinical, and regulatory domains.
Affiliation(s)
- Robert Carruthers
- Department of Computer Science, University College London, London, UK.
| | - Isabel Straw
- UCL Queen Square Institute of Neurology, University College London, London, UK
| | - James K Ruffle
- UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Daniel Herron
- Research and Development, NIHR University College London Hospitals Biomedical Research Centre, London, UK
| | - Amy Nelson
- UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Danilo Bzdok
- Department of Biomedical Engineering, Faculty of Medicine, McGill University, Montreal, Canada
| | | | - Geraint Rees
- UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Parashkev Nachev
- UCL Queen Square Institute of Neurology, University College London, London, UK.
|
35
|
Chen A, Chen DO. Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data. Sci Rep 2022; 12:17917. [PMID: 36289292 PMCID: PMC9606301 DOI: 10.1038/s41598-022-23011-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Accepted: 10/21/2022] [Indexed: 01/20/2023]
Abstract
When enabled by machine learning (ML), Learning Health Systems (LHS) hold promise for improving the effectiveness of healthcare delivery to patients. One major barrier to LHS research and development is the lack of access to EHR patient data. To overcome this challenge, this study demonstrated the feasibility of developing a simulated ML-enabled LHS using synthetic patient data. The ML-enabled LHS was initialized using a dataset of 30,000 synthetic Synthea patients and a risk prediction XGBoost base model for lung cancer. Four additional datasets of 30,000 patients each were generated and added sequentially to the previously updated dataset to simulate the addition of new patients, resulting in datasets of 60,000, 90,000, 120,000 and 150,000 patients. New XGBoost models were built in each instance, and performance improved as data size increased, attaining 0.936 recall and 0.962 AUC (area under curve) on the 150,000-patient dataset. The effectiveness of the new ML-enabled LHS process was verified by implementing XGBoost models for stroke risk prediction on the same Synthea patient populations. By making the ML code and synthetic patient data publicly available for testing and training, this first synthetic LHS process paves the way for more researchers to start developing LHS with real patient data.
Affiliation(s)
- Anjun Chen
- LHS Technology Forum Initiative, Learning Health Community, 748 Matadero Ave, Palo Alto, CA, 94306, USA.
| | - Drake O Chen
- LHS Technology Forum Initiative, Learning Health Community, 748 Matadero Ave, Palo Alto, CA, 94306, USA
|
36
|
Zou Y, Pesaranghader A, Song Z, Verma A, Buckeridge DL, Li Y. Modeling electronic health record data using an end-to-end knowledge-graph-informed topic model. Sci Rep 2022; 12:17868. [PMID: 36284225 PMCID: PMC9596500 DOI: 10.1038/s41598-022-22956-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/16/2022] [Accepted: 10/21/2022] [Indexed: 01/20/2023]
Abstract
The rapid growth of electronic health record (EHR) datasets opens up promising opportunities to understand human diseases in a systematic way. However, effective extraction of clinical knowledge from EHR data has been hindered by the sparse and noisy information. We present Graph ATtention-Embedded Topic Model (GAT-ETM), an end-to-end taxonomy-knowledge-graph-based multimodal embedded topic model. GAT-ETM distills latent disease topics from EHR data by learning the embedding from a constructed medical knowledge graph. We applied GAT-ETM to a large-scale EHR dataset consisting of over 1 million patients. We evaluated its performance based on topic quality, drug imputation, and disease diagnosis prediction. GAT-ETM demonstrated superior performance over the alternative methods on all tasks. Moreover, GAT-ETM learned clinically meaningful graph-informed embedding of the EHR codes and discovered interpretable and accurate patient representations for patient stratification and drug recommendations. GAT-ETM code is available at https://github.com/li-lab-mcgill/GAT-ETM .
Affiliation(s)
- Yuesong Zou
- School of Computer Science, McGill University, Montreal, Canada
| | - Ahmad Pesaranghader
- School of Computer Science, McGill University, Montreal, Canada
| | - Ziyang Song
- School of Computer Science, McGill University, Montreal, Canada
| | - Aman Verma
- School of Population and Global Health, McGill University, Montreal, Canada
| | - David L. Buckeridge
- School of Population and Global Health, McGill University, Montreal, Canada
| | - Yue Li
- School of Computer Science, McGill University, Montreal, Canada
|
37
|
Dileep G, Gianchandani Gyani SG. Artificial Intelligence in Breast Cancer Screening and Diagnosis. Cureus 2022; 14:e30318. [PMID: 36381716 PMCID: PMC9650950 DOI: 10.7759/cureus.30318] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/20/2022] [Accepted: 10/15/2022] [Indexed: 11/05/2022]
Abstract
Cancer is a disease that continues to plague our modern society. Among all types of cancer, breast cancer is now the most common type of cancer occurring in women worldwide. Various factors, including genetics, lifestyle, and the environment, have contributed to the rise in the prevalence of breast cancer among women of all socioeconomic strata. Therefore, proper screening for early diagnosis and treatment becomes a major factor when fighting the disease. Artificial intelligence (AI) continues to revolutionize various spheres of our lives with its numerous applications. Using AI in the existing screening process makes obtaining results even easier and more convenient. Faster, more accurate results are some of the benefits of AI methods in breast cancer screening. Nonetheless, there are many challenges in integrating AI that need to be addressed systematically. The following is a review of the application of AI in breast cancer screening.
|
38
|
Zucco AG, Agius R, Svanberg R, Moestrup KS, Marandi RZ, MacPherson CR, Lundgren J, Ostrowski SR, Niemann CU. Personalized survival probabilities for SARS-CoV-2 positive patients by explainable machine learning. Sci Rep 2022; 12:13879. [PMID: 35974050 PMCID: PMC9380679 DOI: 10.1038/s41598-022-17953-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 08/03/2022] [Indexed: 01/08/2023] Open
Abstract
Interpretable risk assessment of SARS-CoV-2 positive patients can aid clinicians to implement precision medicine. Here we trained a machine learning model to predict mortality within 12 weeks of a first positive SARS-CoV-2 test. By leveraging data on 33,938 confirmed SARS-CoV-2 cases in eastern Denmark, we considered 2723 variables extracted from electronic health records (EHR) including demographics, diagnoses, medications, laboratory test results and vital parameters. A discrete-time framework for survival modelling enabled us to predict personalized survival curves and explain individual risk factors. Performance on the test set was measured with a weighted concordance index of 0.95 and an area under the curve for precision-recall of 0.71. Age, sex, number of medications, previous hospitalizations and lymphocyte counts were identified as top mortality risk factors. Our explainable survival model developed on EHR data also revealed temporal dynamics of the 22 selected risk factors. Upon further validation, this model may allow direct reporting of personalized survival probabilities in routine care.
Affiliation(s)
- Adrian G Zucco
- PERSIMUNE Center of Excellence, Rigshospitalet, Copenhagen, Denmark.
| | - Rudi Agius
- Department of Hematology, Rigshospitalet, Copenhagen, Denmark
| | | | | | - Ramtin Z Marandi
- PERSIMUNE Center of Excellence, Rigshospitalet, Copenhagen, Denmark
| | | | - Jens Lundgren
- PERSIMUNE Center of Excellence, Rigshospitalet, Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Sisse R Ostrowski
- Department of Clinical Immunology, Rigshospitalet, Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Carsten U Niemann
- Department of Hematology, Rigshospitalet, Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| |
|
39
|
Preparing for the next pandemic via transfer learning from existing diseases with hierarchical multi-modal BERT: a study on COVID-19 outcome prediction. Sci Rep 2022; 12:10748. [PMID: 35750878 PMCID: PMC9232529 DOI: 10.1038/s41598-022-13072-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Accepted: 05/20/2022] [Indexed: 11/14/2022] Open
Abstract
Developing prediction models for emerging infectious diseases from relatively small numbers of cases is a critical need for improving pandemic preparedness. Using COVID-19 as an exemplar, we propose a transfer learning methodology for developing predictive models from multi-modal electronic healthcare records by leveraging information from more prevalent diseases with shared clinical characteristics. Our novel hierarchical, multi-modal model (TransMED) integrates baseline risk factors from the natural language processing of clinical notes at admission, time-series measurements of biomarkers obtained from laboratory tests, and discrete diagnostic, procedure and drug codes. We demonstrate the alignment of TransMED’s predictions with well-established clinical knowledge about COVID-19 through univariate and multivariate risk factor driven sub-cohort analysis. TransMED’s superior performance over state-of-the-art methods shows that leveraging patient data across modalities and transferring prior knowledge from similar disorders is critical for accurate prediction of patient outcomes, and this approach may serve as an important tool in the early response to future pandemics.
|
40
|
Maurits MP, Korsunsky I, Raychaudhuri S, Murphy SN, Smoller JW, Weiss ST, Petukhova LM, Weng C, Wei WQ, Huizinga TWJ, Reinders MJT, Karlson EW, van den Akker EB, Knevel R. A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history. J Am Med Inform Assoc 2022; 29:761-769. [PMID: 35139533 PMCID: PMC9122640 DOI: 10.1093/jamia/ocac008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/28/2021] [Revised: 11/24/2021] [Accepted: 01/27/2022] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. MATERIAL AND METHODS We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. RESULTS We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. DISCUSSION Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data. CONCLUSION We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
Affiliation(s)
- Marc P Maurits
- Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
| | - Ilya Korsunsky
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Soumya Raychaudhuri
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Shawn N Murphy
- Research Information Science and Computing, Mass General Brigham, Boston, MA, USA
| | - Jordan W Smoller
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Scott T Weiss
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Lynn M Petukhova
- Department of Dermatology, NewYork-Presbyterian/Columbia University Medical Center (CUMC)
| | - Chunhua Weng
- Biomedical Informatics, Columbia University
| | - Wei-Qi Wei
- Biomedical Informatics, School of Medicine, Vanderbilt University
| | - Thomas W J Huizinga
- Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
| | - Marcel J T Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- The Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
| | - Elizabeth W Karlson
- Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Erik B van den Akker
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Section of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Rachel Knevel
- Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
- Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| |
|
41
|
Zeng X, Linwood SL, Liu C. Pretrained transformer framework on pediatric claims data for population specific tasks. Sci Rep 2022; 12:3651. [PMID: 35256645 PMCID: PMC8901645 DOI: 10.1038/s41598-022-07545-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2021] [Accepted: 01/28/2022] [Indexed: 11/09/2022] Open
Abstract
The adoption of electronic health records (EHR) has become universal during the past decade, which has afforded in-depth data-based research. By learning from the large amount of healthcare data, various data-driven models have been built to predict future events for different medical tasks, such as auto diagnosis and heart-attack prediction. Although EHR data are abundant, the population that satisfies specific criteria for learning population-specific tasks is scarce, making it challenging to train data-hungry deep learning models. This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset, followed by a discriminative fine-tuning on each population-specific task. The semantic meaning of medical events can be captured in the pre-training stage, and the effective knowledge transfer is completed through the task-aware fine-tuning stage. The fine-tuning process requires minimal parameter modification without changing the model architecture, which mitigates the data scarcity issue and helps train the deep learning model adequately on small patient cohorts. We conducted experiments on a real-world pediatric dataset with more than one million patient records. Experimental results on two downstream tasks demonstrated the effectiveness of our method: our general task-agnostic pre-training framework outperformed tailored task-specific models, achieving more than 10% higher model performance than the baselines. In addition, our framework showed the potential to transfer learned knowledge from one institution to another, which may pave the way for future healthcare model pre-training across institutions.
Affiliation(s)
- Xianlong Zeng
- Electrical Engineering and Computer Science, Ohio University, Athens, 45701, USA
| | - Simon L Linwood
- Research Information Solutions and Innovation, Nationwide Children's Hospital, Columbus, OH, 43205, USA
| | - Chang Liu
- Electrical Engineering and Computer Science, Ohio University, Athens, 45701, USA.
| |
|
42
|
Predictive structured-unstructured interactions in EHR models: A case study of suicide prediction. NPJ Digit Med 2022; 5:15. [PMID: 35087182 PMCID: PMC8795240 DOI: 10.1038/s41746-022-00558-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/06/2021] [Accepted: 12/13/2021] [Indexed: 11/20/2022] Open
Abstract
Clinical risk prediction models powered by electronic health records (EHRs) are becoming increasingly widespread in clinical practice. With suicide-related mortality rates rising in recent years, it is becoming increasingly urgent to understand, predict, and prevent suicidal behavior. Here, we compare the predictive value of structured and unstructured EHR data for predicting suicide risk. We find that Naive Bayes Classifier (NBC) and Random Forest (RF) models trained on structured EHR data perform better than those based on unstructured EHR data. An NBC model trained on both structured and unstructured data yields similar performance (AUC = 0.743) to an NBC model trained on structured data alone (0.742, p = 0.668), while an RF model trained on both data types yields significantly better results (AUC = 0.903) than an RF model trained on structured data alone (0.887, p < 0.001), likely due to the RF model’s ability to capture interactions between the two data types. To investigate these interactions, we propose and implement a general framework for identifying specific structured-unstructured feature pairs whose interactions differ between case and non-case cohorts, and thus have the potential to improve predictive performance and increase understanding of clinical risk. We find that such feature pairs tend to capture heterogeneous pairs of general concepts, rather than homogeneous pairs of specific concepts. These findings and this framework can be used to improve current and future EHR-based clinical modeling efforts.
|
43
|
Xie F, Yuan H, Ning Y, Ong MEH, Feng M, Hsu W, Chakraborty B, Liu N. Deep learning for temporal data representation in electronic health records: A systematic review of challenges and methodologies. J Biomed Inform 2021; 126:103980. [PMID: 34974189 DOI: 10.1016/j.jbi.2021.103980] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 07/23/2021] [Revised: 11/07/2021] [Accepted: 12/20/2021] [Indexed: 12/21/2022]
Abstract
OBJECTIVE Temporal electronic health records (EHRs) contain a wealth of information for secondary uses, such as clinical events prediction and chronic disease management. However, challenges exist for temporal data representation. We therefore sought to identify these challenges and evaluate novel methodologies for addressing them through a systematic examination of deep learning solutions. METHODS We searched five databases (PubMed, Embase, the Institute of Electrical and Electronics Engineers [IEEE] Xplore Digital Library, the Association for Computing Machinery [ACM] Digital Library, and Web of Science) complemented with hand-searching in several prestigious computer science conference proceedings. We sought articles that reported deep learning methodologies on temporal data representation in structured EHR data from January 1, 2010, to August 30, 2020. We summarized and analyzed the selected articles from three perspectives: nature of time series, methodology, and model implementation. RESULTS We included 98 articles related to temporal data representation using deep learning. Four major challenges were identified, including data irregularity, heterogeneity, sparsity, and model opacity. We then studied how deep learning techniques were applied to address these challenges. Finally, we discuss some open challenges arising from deep learning. CONCLUSION Temporal EHR data present several major challenges for clinical prediction modeling and data utilization. To some extent, current deep learning solutions can address these challenges. Future studies may consider designing comprehensive and integrated solutions. Moreover, researchers should incorporate clinical domain knowledge into study designs and enhance model interpretability to facilitate clinical implementation.
Affiliation(s)
- Feng Xie
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Han Yuan
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore
| | - Mengling Feng
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
| | - Wynne Hsu
- School of Computing, National University of Singapore, Singapore; Institute of Data Science, National University of Singapore, Singapore
| | - Bibhas Chakraborty
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
| | - Nan Liu
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Institute of Data Science, National University of Singapore, Singapore; SingHealth AI Health Program, Singapore Health Services, Singapore.
| |
|
44
|
Alexander N, Alexander DC, Barkhof F, Denaxas S. Identifying and evaluating clinical subtypes of Alzheimer's disease in care electronic health records using unsupervised machine learning. BMC Med Inform Decis Mak 2021; 21:343. [PMID: 34879829 PMCID: PMC8653614 DOI: 10.1186/s12911-021-01693-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/22/2021] [Accepted: 11/15/2021] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Alzheimer's disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in terms of data breadth and sample size. Here we examine the clinical heterogeneity of Alzheimer's disease patients using electronic health records (EHR) to identify and characterise disease subgroups using multiple clustering methods, identifying clusters which are clinically actionable. METHODS We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted and included a range of comorbidities, symptoms and demographic features as patient features. We evaluated four different clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) to cluster Alzheimer's disease patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets. RESULTS We identified 7,913 AD patients, with a mean age of 82 and 66.2% female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters in k-means, kernel k-means, affinity propagation and latent class analysis respectively. K-means was found to produce the most consistent results based on four evaluative measures. We discovered a consistent cluster found in three of the four methods composed of predominantly female, younger disease onset (43% between ages 42-73) diagnosed with depression and anxiety, with a quicker rate of progression compared to the average across other clusters. 
CONCLUSION Each clustering approach produced substantially different clusters and K-Means performed the best out of the four methods based on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods potentially suggests the presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of the results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.
Affiliation(s)
- Nonie Alexander
- Institute of Health Informatics, University College London, London, UK; Health Data Research UK, London, UK
| | - Daniel C Alexander
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
| | - Frederik Barkhof
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK; UCL Institute of Neurology, University College London, London, UK; Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands
| | - Spiros Denaxas
- Institute of Health Informatics, University College London, London, UK; Health Data Research UK, London, UK; Alan Turing Institute, London, UK
| |
|
45
|
Chen J, Sun L, Yu K, Batmanghelich K. Extracting Disease-Relevant Features with Adversarial Regularization. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2021; 2021:3464-3471. [PMID: 35198261 PMCID: PMC8863436 DOI: 10.1109/bibm52615.2021.9669878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Extracting hidden phenotypes is essential in medical data analysis because it facilitates disease subtyping, diagnosis, and understanding of disease etiology. Since the hidden phenotype is usually a low-dimensional representation that comprehensively describes the disease, we require a dimensionality-reduction method that captures as much disease-relevant information as possible. However, most unsupervised or self-supervised methods cannot achieve the goal because they learn a holistic representation containing both disease-relevant and disease-irrelevant information. Supervised methods can capture information that is predictive to the target clinical variable only, but the learned representation is usually not generalizable for the various aspects of the disease. Hence, we develop a dimensionality-reduction approach to extract Disease Relevant Features (DRFs) based on information theory. We propose to use clinical variables that weakly define the disease as so-called anchors. We derive a formulation that makes the DRF predictive of the anchors while forcing the remaining representation to be irrelevant to the anchors via adversarial regularization. We apply our method to a large-scale study of Chronic Obstructive Pulmonary Disease (COPD). Our experiment shows: (1) Learned DRFs are as predictive as the original representation in predicting the anchors, although it is in a significantly lower dimension. (2) Compared to supervised representation, the learned DRFs are more predictive to other relevant disease metrics that are not used during the training. (3) The learned DRFs are related to non-imaging biological measurements such as gene expressions, suggesting the DRFs include information related to the underlying biology of the disease.
Affiliation(s)
- Junxiang Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Li Sun
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Ke Yu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Kayhan Batmanghelich
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
| |
|
46
|
Searle T, Ibrahim Z, Teo J, Dobson R. Estimating redundancy in clinical text. J Biomed Inform 2021; 124:103938. [PMID: 34695581 DOI: 10.1016/j.jbi.2021.103938] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/10/2021] [Revised: 08/19/2021] [Accepted: 10/17/2021] [Indexed: 12/15/2022]
Abstract
The current mode of use of Electronic Health Records (EHR) elicits text redundancy. Clinicians often populate new documents by duplicating existing notes, then updating accordingly. Data duplication can lead to propagation of errors, inconsistencies and misreporting of care. Therefore, measures to quantify information redundancy play an essential role in evaluating innovations that operate on clinical narratives. This work is a quantitative examination of information redundancy in EHR notes. We present and evaluate two methods to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model. Our first measure trains large Transformer-based language models using clinical text from a large openly available US-based ICU dataset and a large multi-site UK-based hospital. By comparing the information-theoretic efficient encoding of clinical text against open-domain corpora, we find that clinical text is ∼1.5× to ∼3× less efficient than open-domain corpora at conveying information. Our second measure uses the automated summarisation metrics ROUGE and BERTScore to evaluate successive note pairs, demonstrating lexicosyntactic and semantic redundancy, with averages from ∼43 to ∼65%.
Affiliation(s)
- Thomas Searle
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
| | - Zina Ibrahim
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
| | - James Teo
- King's College Hospital NHS Foundation Trust, London, UK.
| | - Richard Dobson
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Institute of Health Informatics, University College London, London, UK.
| |
|
47
|
Lee S, Kang S, Eun Y, Won HH, Kim H, Cha HS, Koh EM, Lee J. A cluster analysis of patients with axial spondyloarthritis using tumour necrosis factor alpha inhibitors based on clinical characteristics. Arthritis Res Ther 2021; 23:284. [PMID: 34782006 PMCID: PMC8591959 DOI: 10.1186/s13075-021-02647-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2021] [Accepted: 10/12/2021] [Indexed: 11/24/2022] Open
Abstract
Background This study aimed to classify the distinct group of patients with axial spondyloarthritis (SpA) on tumour necrosis factor alpha inhibitors (TNFi) according to the baseline characteristics using a clustering algorithm. Methods The clinical characteristics and demographic data of patients with axial SpA included in the Korean College of Rheumatology Biologics and Targeted Therapy registry were investigated. The patterns of disease manifestations were examined using divisive hierarchical cluster analysis. After clustering, we compared the clinical characteristics of patients and the drug survival of TNFi between the classified groups. Results A total of 1042 patients were analysed. The cluster analysis classified patients into two groups: axial group predominantly showing isolated axial manifestations (n = 828) and extra-axial group more frequently showing extra-axial symptoms (n = 214). Almost all extra-axial symptoms (peripheral arthritis, enthesitis, uveitis, and psoriasis) were more frequently observed in the extra-axial group than in the axial group. Moreover, patients in the extra-axial group had shorter disease duration, later disease onset, and higher disease activity than those in the axial group. The disease activity was comparable between the two groups after 1 year of treatment with TNFi. Interestingly, the extra-axial group had a lower drug survival with TNFi than the axial group (p = 0.001). Conclusions Cluster analysis of patients with axial SpA using TNFi classified two distinct clinical phenotypes. These clusters had different TNFi drug survival, clinical characteristics, and disease activity. Supplementary Information The online version contains supplementary material available at 10.1186/s13075-021-02647-z.
Affiliation(s)
- Seulkee Lee
- Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea
| | - Seonyoung Kang
- Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea
| | - Yeonghee Eun
- Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea
| | - Hong-Hee Won
- Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea
| | - Hyungjin Kim
- Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea
| | - Hoon-Suk Cha
- Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea
| | - Eun-Mi Koh
- Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea
| | - Jaejoon Lee
- Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea.
| |
|
48
|
Wanyan T, Honarvar H, Jaladanki SK, Zang C, Naik N, Somani S, De Freitas JK, Paranjpe I, Vaid A, Zhang J, Miotto R, Wang Z, Nadkarni GN, Zitnik M, Azad A, Wang F, Ding Y, Glicksberg BS. Contrastive Learning Improves Critical Event Prediction in COVID-19 Patients. PATTERNS 2021; 2:100389. [PMID: 34723227 PMCID: PMC8542449 DOI: 10.1016/j.patter.2021.100389] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/11/2021] [Revised: 09/12/2021] [Accepted: 10/21/2021] [Indexed: 12/30/2022]
Abstract
Deep Learning (DL) models typically require large-scale, balanced training data to be robust, generalizable, and effective in the context of healthcare. This has been a major issue for developing DL models for the coronavirus-disease 2019 (COVID-19) pandemic where data are highly class imbalanced. Conventional approaches in DL use cross-entropy loss (CEL) which often suffers from poor margin classification. We show that contrastive loss (CL) improves the performance of CEL especially in imbalanced electronic health records (EHR) data for COVID-19 analyses. We use a diverse EHR data set to predict three outcomes: mortality, intubation, and intensive care unit (ICU) transfer in hospitalized COVID-19 patients over multiple time windows. To compare the performance of CEL and CL, models are tested on the full data set and a restricted data set. CL models consistently outperform CEL models with differences ranging from 0.04 to 0.15 for AUPRC and 0.05 to 0.1 for AUROC.
Affiliation(s)
- Tingyi Wanyan
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Hossein Honarvar
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Suraj K Jaladanki
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chengxi Zang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Nidhi Naik
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sulaiman Somani
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jessica K De Freitas
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ishan Paranjpe
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Akhil Vaid
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jing Zhang
- Renmin University of China, Beijing, China
- Riccardo Miotto
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Zhangyang Wang
- Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, USA
- Girish N Nadkarni
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, USA
- Ariful Azad
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Ying Ding
- Dell Medical School, University of Texas at Austin, Austin, TX, USA
- School of Informatics, University of Texas at Austin, Austin, TX, USA
- Benjamin S Glicksberg
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
49
Peng J, Jury EC, Dönnes P, Ciurtin C. Machine Learning Techniques for Personalised Medicine Approaches in Immune-Mediated Chronic Inflammatory Diseases: Applications and Challenges. Front Pharmacol 2021; 12:720694. [PMID: 34658859 PMCID: PMC8514674 DOI: 10.3389/fphar.2021.720694] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/04/2021] [Accepted: 09/14/2021] [Indexed: 12/12/2022]
Abstract
In the past decade, the emergence of machine learning (ML) applications has led to significant advances towards implementation of personalised medicine approaches for improved health care, due to the exceptional performance of ML models when utilising complex big data. The immune-mediated chronic inflammatory diseases are a group of complex disorders associated with dysregulated immune responses resulting in inflammation affecting various organs and systems. The heterogeneous nature of these diseases poses great challenges for tailored disease management and addressing unmet patient needs. Applying novel ML techniques to the clinical study of chronic inflammatory diseases shows promising results and great potential for precision medicine applications in clinical research and practice. In this review, we highlight the clinical applications of various ML techniques for prediction, diagnosis and prognosis of autoimmune rheumatic diseases, inflammatory bowel disease, autoimmune chronic kidney disease, and multiple sclerosis, as well as ML applications for patient stratification and treatment selection. We highlight the use of ML in drug development, including target identification, validation and drug repurposing, as well as challenges related to data interpretation and validation, and ethical concerns related to the use of artificial intelligence in clinical research.
Affiliation(s)
- Junjie Peng
- Department of Medicine, Centre for Adolescent Rheumatology Versus Arthritis, University College London, London, United Kingdom
- Elizabeth C. Jury
- Department of Medicine, Centre for Adolescent Rheumatology Versus Arthritis, University College London, London, United Kingdom
- Department of Medicine, Centre for Rheumatology Research, University College London, London, United Kingdom
- Coziana Ciurtin
- Department of Medicine, Centre for Adolescent Rheumatology Versus Arthritis, University College London, London, United Kingdom
50
Ziletti A, Berns C, Treichel O, Weber T, Liang J, Kammerath S, Schwaerzler M, Virayah J, Ruau D, Ma X, Mattern A. Discovering Key Topics From Short, Real-World Medical Inquiries via Natural Language Processing. Front Comput Sci 2021. [DOI: 10.3389/fcomp.2021.672867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Millions of unsolicited medical inquiries are received by pharmaceutical companies every year. It has been hypothesized that these inquiries represent a treasure trove of information, potentially giving insight into matters regarding medicinal products and the associated medical treatments. However, due to the large volume and specialized nature of the inquiries, it is difficult to perform timely, recurrent, and comprehensive analyses. Here, we combine biomedical word embeddings, non-linear dimensionality reduction, and hierarchical clustering to automatically discover key topics in real-world medical inquiries from customers. This approach requires neither ontologies nor annotations. The discovered topics are meaningful and medically relevant, as judged by medical information specialists, thus demonstrating that unsolicited medical inquiries are a source of valuable customer insights. Our work paves the way for the machine-learning-driven analysis of medical inquiries in the pharmaceutical industry, which ultimately aims at improving patient care.