1
Eghbali N, Klochko C, Mahdi Z, Alhiari L, Lee J, Knisely B, Craig J, Ghassemi MM. Enhancing Radiology Clinical Histories Through Transformer-Based Automated Clinical Note Summarization. Journal of Imaging Informatics in Medicine 2025:10.1007/s10278-025-01477-8. [PMID: 40195229 DOI: 10.1007/s10278-025-01477-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 03/05/2025] [Accepted: 03/07/2025] [Indexed: 04/09/2025]
Abstract
Insufficient clinical information provided in radiology requests, coupled with the cumbersome nature of electronic health records (EHRs), poses significant challenges for radiologists in extracting pertinent clinical data and compiling detailed radiology reports. Considering the challenges and time involved in navigating electronic medical records (EMR), an automated method to accurately compress the text while maintaining key semantic information could significantly enhance the efficiency of radiologists' workflow. The purpose of this study is to develop and demonstrate an automated tool for clinical note summarization with the goal of extracting the most pertinent clinical information for the radiological assessments. We adopted a transfer learning methodology from the natural language processing domain to fine-tune a transformer model for abstracting clinical reports. We employed a dataset consisting of 1000 clinical notes from 970 patients who underwent knee MRI, all manually summarized by radiologists. The fine-tuning process involved a two-stage approach starting with self-supervised denoising and then focusing on the summarization task. The model successfully condensed clinical notes by 97% while aligning closely with radiologist-written summaries evidenced by a 0.9 cosine similarity and a ROUGE-1 score of 40.18. In addition, statistical analysis, indicated by a Fleiss kappa score of 0.32, demonstrated fair agreement among specialists on the model's effectiveness in producing more relevant clinical histories compared to those included in the exam requests. The proposed model effectively summarized clinical notes for knee MRI studies, thereby demonstrating potential for improving radiology reporting efficiency and accuracy.
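The Fleiss kappa reported above summarizes how consistently several raters assign items to categories. As a minimal sketch of that statistic (not the authors' analysis code), the function below computes it from a table of rating counts; the name `fleiss_kappa` and the toy ratings are illustrative assumptions.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who put item i into category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 summaries, 3 raters, 2 categories
# ("model history more relevant" vs "request history more relevant").
toy_ratings = [[2, 1], [1, 2], [3, 0], [0, 3]]
print(round(fleiss_kappa(toy_ratings), 3))  # 0.333
```

On the widely used Landis and Koch scale, values between 0.21 and 0.40 are labeled fair agreement, which is consistent with how the abstract describes a score of 0.32.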
2
Liu M, Deng K, Wang M, He Q, Xu J, Li G, Zou K, Sun X, Wang W. Methods for identifying health status from routinely collected health data: An overview. Integr Med Res 2025; 14:101100. [PMID: 39897572 PMCID: PMC11786076 DOI: 10.1016/j.imr.2024.101100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 11/01/2024] [Accepted: 11/13/2024] [Indexed: 02/04/2025] Open
Abstract
Routinely collected health data (RCD) are currently accelerating publications that evaluate the effectiveness and safety of medicines and medical devices. One of the fundamental steps in using these data is developing algorithms to identify health status that can be used for observational studies. However, the process and methodologies for identifying health status from RCD remain insufficiently understood. While most current methods rely on International Classification of Diseases (ICD) codes, they may not be universally applicable. Although machine learning methods hold promise for more accurately identifying the health status, they remain underutilized in RCD studies. To address these significant methodological gaps, we outline key steps and methodological considerations for identifying health statuses in observational studies using RCD. This review has the potential to boost the credibility of findings from observational studies that use RCD.
Collapse
Affiliation(s)
- Mei Liu
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Ke Deng
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Mingqi Wang
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Qiao He
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Jiayue Xu
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Guowei Li
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangzhou, Guangdong, China
- Biostatistics Unit, Research Institute at St. Joseph's Healthcare Hamilton, Hamilton, ON, Canada
- Kang Zou
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Xin Sun
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
- Wen Wang
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
3
Bressler T, Song J, Kamalumpundi V, Chae S, Song H, Tark A. Leveraging Artificial Intelligence/Machine Learning Models to Identify Potential Palliative Care Beneficiaries: A Systematic Review. J Gerontol Nurs 2025; 51:7-14. [PMID: 39746126 DOI: 10.3928/00989134-20241210-01] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2025]
Abstract
PURPOSE The current review examined the application of artificial intelligence (AI) and machine learning (ML) techniques in palliative care, specifically focusing on models used to identify potential beneficiaries of palliative services among individuals with chronic and terminal illnesses. METHODS A systematic review was conducted across four electronic databases. Five studies met inclusion criteria, all of which applied AI/ML models to predict outcomes relevant to palliative care, such as mortality or the need for services. RESULTS Of 1,504 studies screened, five studies used supervised ML algorithms, whereas one used natural language processing with a deep learning model to identify potential palliative care candidates. The most common AI/ML algorithms included neural network-based models, logistic regression, and tree-based models. CONCLUSION AI and ML models offer promising avenues for identifying palliative care beneficiaries. As AI continues to evolve, its potential to reshape palliative care through early identification is significant, providing opportunities for timely and targeted care interventions. [Journal of Gerontological Nursing, 51(1), 7-14.].
4
Noda R, Ichikawa D, Shibagaki Y. Machine learning-based diagnostic prediction of minimal change disease: model development study. Sci Rep 2024; 14:23460. [PMID: 39379539 PMCID: PMC11461711 DOI: 10.1038/s41598-024-73898-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 09/23/2024] [Indexed: 10/10/2024] Open
Abstract
Minimal change disease (MCD) is a common cause of nephrotic syndrome. Due to its rapid progression, early detection is essential; however, definitive diagnosis requires invasive kidney biopsy. This study aims to develop non-invasive predictive models for diagnosing MCD using machine learning. We retrospectively collected data on demographic characteristics, blood tests, and urine tests from patients with nephrotic syndrome who underwent kidney biopsy. We applied four machine learning algorithms (TabPFN, LightGBM, Random Forest, and Artificial Neural Network) and logistic regression. We compared their performance using stratified 5-repeated 5-fold cross-validation for the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Variable importance was evaluated using the SHapley Additive exPlanations (SHAP) method. A total of 248 patients were included, of whom 82 (33%) were diagnosed with MCD. TabPFN demonstrated the best performance with an AUROC of 0.915 (95% CI 0.896-0.932) and an AUPRC of 0.840 (95% CI 0.807-0.872). The SHAP method identified C3, total cholesterol, and urine red blood cells as key predictors for TabPFN, consistent with previous reports. Machine learning models could be valuable non-invasive diagnostic tools for MCD.
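The evaluation scheme named above, stratified 5-repeated 5-fold cross-validation scored by AUROC, can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the study's code: the helper names `auroc` and `repeated_stratified_folds` are hypothetical, and the label vector is synthetic, built only to mirror the reported class counts (82 of 248).

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def repeated_stratified_folds(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) with class balance kept per fold."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold gets its share.
            for position, idx in enumerate(shuffled):
                folds[position % n_splits].append(idx)
        for fold_id, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold_id, train_idx, sorted(test_idx)


# Synthetic labels mirroring the reported cohort: 82 MCD cases among 248 patients.
labels = [1] * 82 + [0] * 166
splits = list(repeated_stratified_folds(labels))
print(len(splits))  # 25 train/test splits (5 repeats x 5 folds)
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice a model would be fitted on each `train_idx` and scored on each `test_idx`, and the 25 AUROC values would then be summarized, for example by their mean and an interval.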
Affiliation(s)
- Ryunosuke Noda
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, Miyamae-ku, Kawasaki, Kanagawa, 216-8511, Japan.
- Daisuke Ichikawa
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, Miyamae-ku, Kawasaki, Kanagawa, 216-8511, Japan
- Yugo Shibagaki
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, Miyamae-ku, Kawasaki, Kanagawa, 216-8511, Japan
5
Bandyopadhyay A, Albashayreh A, Zeinali N, Fan W, Gilbertson-White S. Using real-world electronic health record data to predict the development of 12 cancer-related symptoms in the context of multimorbidity. JAMIA Open 2024; 7:ooae082. [PMID: 39282082 PMCID: PMC11397936 DOI: 10.1093/jamiaopen/ooae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 08/09/2024] [Accepted: 09/05/2024] [Indexed: 09/18/2024] Open
Abstract
Objective This study uses electronic health record (EHR) data to predict 12 common cancer symptoms, assessing the efficacy of machine learning (ML) models in identifying symptom influencers. Materials and Methods We analyzed EHR data of 8156 adults diagnosed with cancer who underwent cancer treatment from 2017 to 2020. Structured and unstructured EHR data were sourced from the Enterprise Data Warehouse for Research at the University of Iowa Hospital and Clinics. Several predictive models, including logistic regression, random forest (RF), and XGBoost, were employed to forecast symptom development. The performances of the models were evaluated by F1-score and area under the curve (AUC) on the testing set. The SHapley Additive exPlanations framework was used to interpret these models and identify the predictive risk factors associated with fatigue as an exemplar. Results The RF model exhibited superior performance with a macro average AUC of 0.755 and an F1-score of 0.729 in predicting a range of cancer-related symptoms. For instance, the RF model achieved an AUC of 0.954 and an F1-score of 0.914 for pain prediction. Key predictive factors identified included clinical history, cancer characteristics, treatment modalities, and patient demographics depending on the symptom. For example, the odds ratio (OR) for fatigue was significantly influenced by allergy (OR = 2.3, 95% CI: 1.8-2.9) and colitis (OR = 1.9, 95% CI: 1.5-2.4). Discussion Our research emphasizes the critical integration of multimorbidity and patient characteristics in modeling cancer symptoms, revealing the considerable influence of chronic conditions beyond cancer itself. Conclusion We highlight the potential of ML for predicting cancer symptoms, suggesting a pathway for integrating such models into clinical systems to enhance personalized care and symptom management.
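The macro-averaged scores quoted above are unweighted means of per-symptom scores, so a rare symptom counts as much as a common one. A small sketch of that calculation follows; the function names and the toy predictions are assumptions made for illustration, not values from the study.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_average(per_symptom_scores):
    """Unweighted mean of one score per symptom."""
    values = list(per_symptom_scores.values())
    return sum(values) / len(values)


# Hypothetical predictions for two symptoms (1 = symptom develops).
toy = {
    "pain": ([1, 1, 0, 0], [1, 1, 0, 0]),
    "fatigue": ([1, 1, 0, 0], [1, 0, 1, 0]),
}
scores = {name: f1_score(truth, pred) for name, (truth, pred) in toy.items()}
print(scores)                 # {'pain': 1.0, 'fatigue': 0.5}
print(macro_average(scores))  # 0.75
```

This is why a strong single-symptom result (such as the pain model) can sit alongside a noticeably lower macro average across all 12 symptoms.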
Affiliation(s)
- Anindita Bandyopadhyay
- Department of Business Analytics, University of Iowa, Iowa City, IA 52242, United States
- Alaa Albashayreh
- College of Nursing, University of Iowa, Iowa City, IA 52242, United States
- Nahid Zeinali
- Department of Informatics, University of Iowa, Iowa City, IA 52242, United States
- Weiguo Fan
- Department of Business Analytics, University of Iowa, Iowa City, IA 52242, United States
6
Xian S, Grabowska ME, Kullo IJ, Luo Y, Smoller JW, Wei WQ, Jarvik G, Mooney S, Crosslin D. Language-model-based patient embedding using electronic health records facilitates phenotyping, disease forecasting, and progression analysis. Research Square 2024:rs.3.rs-4708839. [PMID: 39399661 PMCID: PMC11469380 DOI: 10.21203/rs.3.rs-4708839/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Current studies regarding the secondary use of electronic health records (EHR) predominantly rely on domain expertise and existing medical knowledge. Though significant efforts have been devoted to investigating the application of machine learning algorithms in the EHR, efficient and powerful representation of patients is needed to unleash the potential of discovering new medical patterns underlying the EHR. Here, we present an unsupervised method for embedding high-dimensional EHR data at the patient level, aimed at characterizing patient heterogeneity in complex diseases and identifying new disease patterns associated with clinical outcome disparities. Inspired by the architecture of modern language models, specifically transformers with attention mechanisms, we use patient diagnosis and procedure codes as vocabularies and treat each patient as a sentence to perform the patient embedding. We applied this approach to 34,851 unique medical codes across 1,046,649 longitudinal patient events, including 102,739 patients from the electronic Medical Records and GEnomics (eMERGE) Network. The resulting patient vectors demonstrated excellent performance in predicting future disease events (median AUROC = 0.87 within one year) and bulk phenotyping (median AUROC = 0.84). We then illustrated the utility of these patient vectors in revealing heterogeneous comorbidity patterns, exemplified by disease subtypes in colorectal cancer and systemic lupus erythematosus, and capturing distinct longitudinal disease trajectories. External validation using EHR data from the University of Washington confirmed robust model performance, with median AUROCs of 0.83 and 0.84 for bulk phenotyping tasks and disease onset prediction, respectively. Importantly, the model reproduced the clustering results of disease subtypes identified in the eMERGE cohort and uncovered variations in overall mortality among these subtypes.
Together, these results underscore the potential of representation learning in EHRs to enhance patient characterization and associated clinical outcomes, thereby advancing disease forecasting and facilitating personalized medicine.
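The "patient as a sentence, codes as vocabulary" framing in this abstract implies a tokenization step before any transformer is applied. The sketch below shows only that step, under assumptions: the helper names, the special tokens, and the example codes are hypothetical, and nothing here reproduces the authors' model or preprocessing.

```python
from collections import Counter

PAD, UNK = "[PAD]", "[UNK]"  # assumed special tokens, as in common tokenizers


def build_vocab(patients, min_count=1):
    """Map each medical code seen at least min_count times to an integer id."""
    counts = Counter(code for codes in patients.values() for code in codes)
    vocab = {PAD: 0, UNK: 1}
    for code in sorted(counts):
        if counts[code] >= min_count:
            vocab[code] = len(vocab)
    return vocab


def encode(codes, vocab, max_len):
    """Turn one patient's ordered codes into fixed-length ids plus an attention mask."""
    ids = [vocab.get(code, vocab[UNK]) for code in codes[:max_len]]
    mask = [1] * len(ids)          # 1 marks a real code
    padding = max_len - len(ids)
    return ids + [vocab[PAD]] * padding, mask + [0] * padding  # 0 marks padding


# Hypothetical longitudinal records: each patient is a "sentence" of codes.
patients = {
    "patient_a": ["E11.9", "I10"],
    "patient_b": ["I10"],
}
vocab = build_vocab(patients)
print(vocab)                                # {'[PAD]': 0, '[UNK]': 1, 'E11.9': 2, 'I10': 3}
print(encode(["I10", "Z99.9"], vocab, 4))   # ([3, 1, 0, 0], [1, 1, 0, 0])
```

The id sequences and masks are the kind of input an attention-based encoder consumes; the patient vector itself would come from the trained model, which is outside this sketch.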
Affiliation(s)
- Su Xian
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA
- Monika E Grabowska
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Iftikhar J Kullo
- Department of Cardiovascular Medicine and the Gonda Vascular Center, Mayo Clinic Rochester Minnesota
- Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine
- Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Gail Jarvik
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
- Sean Mooney
- Center for Information Technology, National Institutes of Health
- David Crosslin
- Department of Medicine, Division of Biomedical Informatics and Genomics, Tulane University, New Orleans, LA
7
Irandoust K, Parsakia K, Estifa A, Zoormand G, Knechtle B, Rosemann T, Weiss K, Taheri M. Predicting and comparing the long-term impact of lifestyle interventions on individuals with eating disorders in active population: a machine learning evaluation. Front Nutr 2024; 11:1390751. [PMID: 39171102 PMCID: PMC11337873 DOI: 10.3389/fnut.2024.1390751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 07/15/2024] [Indexed: 08/23/2024] Open
Abstract
Objective This study aims to evaluate and predict the long-term effectiveness of five lifestyle interventions for individuals with eating disorders using machine learning techniques. Methods This study, conducted at Dr. Irandoust's Health Center in Qazvin from August 2021 to August 2023, evaluated the effects of five lifestyle interventions on individuals with eating disorders, initially diagnosed using the Eating Disorder Diagnostic Scale (EDDS). The interventions were: (1) counseling, exercise, and dietary regime; (2) aerobic exercises with dietary regime; (3) walking and dietary regime; (4) exercise with a flexible diet; and (5) exercises through online programs and applications. Of 955 enrolled participants, 706 completed the study, which measured Body Fat Percentage (BFP), Waist-Hip Ratio (WHR), Fasting Blood Sugar (FBS), Low-Density Lipoprotein (LDL) Cholesterol, Total Cholesterol (CHO), Weight, and Triglycerides (TG) at baseline, during, and at the end of the intervention. Random Forest and Gradient Boosting Regressors, following feature engineering, were used to analyze the data, focusing on the interventions' long-term effectiveness on health outcomes related to eating disorders. Results After feature engineering, the Random Forest and Gradient Boosting Regressors reached accuracies of 85% and 89%, respectively, rising to 89% and 90% after dataset balancing. The interventions were ranked based on predicted effectiveness: counseling with exercise and dietary regime, aerobic exercises with dietary regime, walking with dietary regime, exercise with a flexible diet, and exercises through online programs. Conclusion The results show that machine learning (ML) models effectively predicted the long-term effectiveness of lifestyle interventions. The current study suggests significant potential for tailored health strategies and highlights the most effective interventions for individuals with eating disorders.
The results also suggest that future work should include broader participant demographics and geographic locations, a longer study duration, more advanced machine learning techniques, and psychological and social adherence factors. Ultimately, these results can guide healthcare providers and policymakers in creating targeted lifestyle intervention strategies, emphasizing personalized health plans, and leveraging machine learning for predictive healthcare solutions.
Affiliation(s)
- Khadijeh Irandoust
- Department of Sport Sciences, Imam Khomeini International University, Qazvin, Iran
- Kamdin Parsakia
- Department of Psychology and Counseling, KMAN Research Institute, Richmond Hill, ON, Canada
- Ali Estifa
- Department of Sport Sciences, Imam Khomeini International University, Qazvin, Iran
- Gholamreza Zoormand
- Department of Physical Education, Huanggang Normal University, Huanggang, China
- Beat Knechtle
- Medbase St. Gallen Am Vadianplatz, St. Gallen, Switzerland
- Thomas Rosemann
- Institute of Primary Care, University of Zürich, Zürich, Switzerland
- Katja Weiss
- Institute of Primary Care, University of Zürich, Zürich, Switzerland
- Morteza Taheri
- Department of Cognitive and Behavioural Sciences in Sport, Faculty of Sport Science and Health, University of Tehran, Tehran, Iran
8
Wang W, Jin YH, Liu M, He Q, Xu JY, Wang MQ, Li GW, Fu B, Yan SY, Zou K, Sun X. Guidance of development, validation, and evaluation of algorithms for populating health status in observational studies of routinely collected data (DEVELOP-RCD). Mil Med Res 2024; 11:52. [PMID: 39107834 PMCID: PMC11302358 DOI: 10.1186/s40779-024-00559-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/22/2024] [Accepted: 07/24/2024] [Indexed: 03/17/2025] Open
Abstract
BACKGROUND In recent years, there has been a growing trend in the utilization of observational studies that make use of routinely collected healthcare data (RCD). These studies rely on algorithms to identify specific health conditions (e.g., diabetes or sepsis) for statistical analyses. However, there has been substantial variation in algorithm development and validation, leading to frequently suboptimal performance and posing a significant threat to the validity of study findings. Unfortunately, these issues are often overlooked. METHODS We systematically developed guidance for the development, validation, and evaluation of algorithms designed to identify health status (DEVELOP-RCD). Our initial efforts involved conducting both a narrative review and a systematic review of published studies on the concepts and methodological issues related to algorithm development, validation, and evaluation. Subsequently, we conducted an empirical study on an algorithm for identifying sepsis. Based on these findings, we formulated a specific workflow and recommendations for algorithm development, validation, and evaluation within the guidance. Finally, the guidance underwent independent review by a panel of 20 external experts who then convened a consensus meeting to finalize it. RESULTS A standardized workflow for algorithm development, validation, and evaluation was established. Guided by specific health status considerations, the workflow comprises four integrated steps: assessing an existing algorithm's suitability for the target health status; developing a new algorithm using recommended methods; validating the algorithm using prescribed performance measures; and evaluating the impact of the algorithm on study results. Additionally, 13 good practice recommendations were formulated with detailed explanations. Furthermore, a practical study on sepsis identification was included to demonstrate the application of this guidance.
CONCLUSIONS The establishment of guidance is intended to aid researchers and clinicians in the appropriate and accurate development and application of algorithms for identifying health status from RCD. This guidance has the potential to enhance the credibility of findings from observational studies involving RCD.
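The validation step in the workflow above compares an algorithm's output with a reference standard. The abstract does not name the prescribed performance measures, so the sketch below assumes four that are commonly reported for case-identification algorithms (sensitivity, specificity, PPV, NPV); the function name and the counts are hypothetical.

```python
def validation_measures(tp, fp, fn, tn):
    """Accuracy measures for a case-identification algorithm vs. a reference standard.

    tp: algorithm-positive, reference-positive   fp: algorithm-positive, reference-negative
    fn: algorithm-negative, reference-positive   tn: algorithm-negative, reference-negative
    """
    def ratio(numerator, denominator):
        # None signals that the measure is undefined for this table.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases the algorithm finds
        "specificity": ratio(tn, tn + fp),  # share of non-cases it correctly excludes
        "ppv": ratio(tp, tp + fp),          # share of flagged patients who are true cases
        "npv": ratio(tn, tn + fn),          # share of unflagged patients who are non-cases
    }


# Hypothetical chart-review sample of 200 records for a sepsis algorithm.
measures = validation_measures(tp=40, fp=10, fn=20, tn=130)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how common the condition is in the validation sample, they should be read alongside that sample's prevalence.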
Affiliation(s)
- Wen Wang
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China.
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China.
- Ying-Hui Jin
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Mei Liu
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
- Qiao He
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
- Jia-Yue Xu
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
- Ming-Qi Wang
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
- Guo-Wei Li
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, L8S 4L8, Canada
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangzhou, 510317, China
- Biostatistics Unit, Research Institute at St. Joseph's Healthcare Hamilton, Hamilton, ON, L8N 4A6, Canada
- Bo Fu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Si-Yu Yan
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Kang Zou
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China
- Xin Sun
- Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China.
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, 610041, China.
- West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, 610041, China.
9
Scott IA, De Guzman KR, Falconer N, Canaris S, Bonilla O, McPhail SM, Marxen S, Van Garderen A, Abdel-Hafez A, Barras M. Evaluating automated machine learning platforms for use in healthcare. JAMIA Open 2024; 7:ooae031. [PMID: 38863963 PMCID: PMC11165368 DOI: 10.1093/jamiaopen/ooae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/06/2024] [Accepted: 04/22/2024] [Indexed: 06/13/2024] Open
Abstract
Objective To describe development and application of a checklist of criteria for selecting an automated machine learning (Auto ML) platform for use in creating clinical ML models. Materials and Methods Evaluation criteria for selecting an Auto ML platform suited to ML needs of a local health district were developed in 3 steps: (1) identification of key requirements, (2) a market scan, and (3) an assessment process with desired outcomes. Results The final checklist comprising 21 functional and 6 non-functional criteria was applied to vendor submissions in selecting a platform for creating a ML heparin dosing model as a use case. Discussion A team of clinicians, data scientists, and key stakeholders developed a checklist which can be adapted to ML needs of healthcare organizations, the use case providing a relevant example. Conclusion An evaluative checklist was developed for selecting Auto ML platforms which requires validation in larger multi-site studies.
Affiliation(s)
- Ian A Scott
- Centre for Health Services Research, University of Queensland, Brisbane, 4102, Australia
- Department of Internal Medicine and Clinical Epidemiology, Princess Alexandra Hospital, Brisbane, 4102, Australia
- Keshia R De Guzman
- Department of Pharmacy, Princess Alexandra Hospital, Brisbane, 4102, Australia
- School of Pharmacy, The University of Queensland, Brisbane, 4102, Australia
- Nazanin Falconer
- Department of Pharmacy, Princess Alexandra Hospital, Brisbane, 4102, Australia
- School of Pharmacy, The University of Queensland, Brisbane, 4102, Australia
- Stephen Canaris
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Oscar Bonilla
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Steven M McPhail
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, Brisbane, 4059, Australia
- Sven Marxen
- Pharmacy Service, Logan and Beaudesert Hospitals, Logan, 4131, Australia
- Aaron Van Garderen
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Pharmacy Service, Logan and Beaudesert Hospitals, Logan, 4131, Australia
- Ahmad Abdel-Hafez
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, Brisbane, 4059, Australia
| | - Michael Barras
- Department of Pharmacy, Princess Alexandra Hospital, Brisbane, 4102, Australia
- School of Pharmacy, The University of Queensland, Brisbane, 4102, Australia
| |
Collapse
|
10
|
Wu JJ, Hauben M, Younus M. Current Approaches in Postapproval Vaccine Safety Studies Using Real-World Data: A Systematic Review of Published Literature. Clin Ther 2024; 46:555-564. [PMID: 39142925 DOI: 10.1016/j.clinthera.2024.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 05/06/2024] [Accepted: 06/05/2024] [Indexed: 08/16/2024]
Abstract
PURPOSE Well-designed observational postmarketing studies using real-world data (RWD) are critical in supporting an evidence base and bolstering public confidence in vaccine safety. This systematic review presents current research methodologies in vaccine safety research in postapproval settings, technological advancements contributing to research resources and capabilities, and their major strengths and limitations. METHODS A comprehensive search was conducted using PubMed to identify relevant articles published from January 1, 2019, to December 31, 2022. Eligible studies were summarized overall by study design and other study characteristics (eg, country, vaccine studied, types of data source, and study population). An in-depth review of select studies representative of conventional or new designs, analytical approaches, or data collection methods was conducted to summarize current methods in vaccine safety research. FINDINGS Out of 977 articles screened for inclusion, 135 were reviewed. The review shows that recent advancements in scientific methods, digital technology, and analytic approaches have significantly contributed to postapproval vaccine safety studies using RWD. "Near real-time surveillance" using large datasets (via collaborative or distributed databases) has been used to facilitate rapid signal detection that complements passive surveillance. There was increasing appreciation for self-controlled case-only designs (self-controlled case series and self-controlled risk interval) to assess acute-onset safety outcomes; for artificial intelligence and natural language processing to improve outcome accuracy and study timeliness; and for emerging artificial intelligence-based analysis to capture adverse events from social media platforms. IMPLICATIONS Continued development in the area of vaccine safety research methodologies using RWD is warranted. The future of successful vaccine safety research, especially evaluation of rare safety events, is likely to comprise digital technologies including linking RWD networks, machine learning, and advanced analytic methods to generate rapid and robust real-world safety information.
Affiliation(s)
- Juan Joanne Wu
  - Safety Surveillance Research, Worldwide Medical and Safety, Pfizer Inc, New York, NY
- Manfred Hauben
  - Department of Family and Community Medicine, New York Medical College, Valhalla, NY and Truliant Consulting, Baltimore, Maryland
- Muhammad Younus
  - Safety Surveillance Research, Worldwide Medical and Safety, Pfizer Inc, New York, NY
|
11
|
Mesquita F, Bernardino J, Henriques J, Raposo JF, Ribeiro RT, Paredes S. Machine learning techniques to predict the risk of developing diabetic nephropathy: a literature review. J Diabetes Metab Disord 2024; 23:825-839. [PMID: 38932857 PMCID: PMC11196462 DOI: 10.1007/s40200-023-01357-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 11/20/2023] [Indexed: 06/28/2024]
Abstract
Purpose Diabetes is a major public health challenge with widespread prevalence, often leading to complications such as Diabetic Nephropathy (DN), a chronic condition that progressively impairs kidney function. In this context, it is important to evaluate whether machine learning models can exploit the inherent temporal factor in clinical data to predict the risk of developing DN faster and more accurately than current clinical models. Methods Three different databases were used for this literature review: Scopus, Web of Science, and PubMed. Only articles written in English and published between January 2015 and December 2022 were included. Results We included 11 studies, from which we discuss a number of algorithms capable of extracting knowledge from clinical data, incorporating dynamic aspects in patient assessment, and exploring their evolution over time. We also present a comparison of the different approaches, their performance, advantages, disadvantages, interpretation, and the value that the time factor can bring to a more successful prediction of diabetic nephropathy. Conclusion Our analysis showed that some studies ignored the temporal factor, while others partially exploited it. Greater use of the temporal aspect inherent in Electronic Health Records (EHR) data, together with the integration of omics data, could lead to the development of more reliable and powerful predictive models.
Affiliation(s)
- F. Mesquita
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
- J. Bernardino
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
- J. Henriques
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
- JF. Raposo
  - Education and Research Center, APDP Diabetes Portugal, Rua Do Salitre 118-120, 1250-203 Lisbon, Portugal
- RT. Ribeiro
  - Education and Research Center, APDP Diabetes Portugal, Rua Do Salitre 118-120, 1250-203 Lisbon, Portugal
- S. Paredes
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
|
12
|
Noda R, Ichikawa D, Shibagaki Y. Machine learning-based diagnostic prediction of IgA nephropathy: model development and validation study. Sci Rep 2024; 14:12426. [PMID: 38816457 PMCID: PMC11139869 DOI: 10.1038/s41598-024-63339-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 05/28/2024] [Indexed: 06/01/2024] Open
Abstract
IgA nephropathy progresses to kidney failure, making early detection important. However, definitive diagnosis depends on invasive kidney biopsy. This study aimed to develop non-invasive prediction models for IgA nephropathy using machine learning. We collected retrospective data on demographic characteristics, blood tests, and urine tests of patients who underwent kidney biopsy. The dataset was divided into derivation and validation cohorts, with temporal validation. We employed five machine learning models (eXtreme Gradient Boosting [XGBoost], LightGBM, Random Forest, Artificial Neural Network, and 1-Dimensional Convolutional Neural Network [1D-CNN]) and logistic regression, evaluated performance via the area under the receiver operating characteristic curve (AUROC), and explored variable importance through the SHapley Additive exPlanations method. The study included 1268 participants, with 353 (28%) diagnosed with IgA nephropathy. In the derivation cohort, LightGBM achieved the highest AUROC of 0.913 (95% CI 0.906-0.919), significantly higher than logistic regression, Artificial Neural Network, and 1D-CNN, but not significantly different from XGBoost and Random Forest. In the validation cohort, XGBoost demonstrated the highest AUROC of 0.894 (95% CI 0.850-0.935), maintaining its robust performance. Key predictors identified were age, serum albumin, IgA/C3, and urine red blood cells, aligning with existing clinical insights. Machine learning can be a valuable non-invasive tool for IgA nephropathy.
Affiliation(s)
- Ryunosuke Noda
  - Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Daisuke Ichikawa
  - Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Yugo Shibagaki
  - Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
|
13
|
Gonzalez-Estrada A, Park MA, Accarino JJO, Banerji A, Carrillo-Martin I, D'Netto ME, Garzon-Siatoya WT, Hardway HD, Joundi H, Kinate S, Plager JH, Rank MA, Rukasin CRF, Samarakoon U, Volcheck GW, Weston AD, Wolfson AR, Blumenthal KG. Predicting Penicillin Allergy: A United States Multicenter Retrospective Study. THE JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY. IN PRACTICE 2024; 12:1181-1191.e10. [PMID: 38242531 DOI: 10.1016/j.jaip.2024.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 12/29/2023] [Accepted: 01/07/2024] [Indexed: 01/21/2024]
Abstract
BACKGROUND Using the reaction history in logistic regression and machine learning (ML) models to predict penicillin allergy has been reported based on non-US data. OBJECTIVE We developed ML positive penicillin allergy testing prediction models from multisite US data. METHODS Retrospective data from 4 US-based hospitals were grouped into 4 datasets: enriched training (1:3 case-control matched cohort), enriched testing, nonenriched internal testing, and nonenriched external testing. ML algorithms were used for model development. We determined area under the curve (AUC) and applied the Shapley Additive exPlanations (SHAP) framework to interpret risk drivers. RESULTS Of 4777 patients (mean age 60 [standard deviation: 17] years; 68% women, 91% White, and 86% non-Hispanic) evaluated for penicillin allergy labels, 513 (11%) had positive penicillin allergy testing. Model input variables were frequently missing: immediate or delayed onset (71%), signs or symptoms (13%), and treatment (31%). The gradient-boosted model was the strongest model with an AUC of 0.67 (95% confidence interval [CI]: 0.57-0.77), which improved to 0.87 (95% CI: 0.73-1) when only cases with complete data were used. Top SHAP drivers for positive testing were reactions within the last year and reactions requiring medical attention; female sex and reaction of hives/urticaria were also positive drivers. CONCLUSIONS An ML prediction model for positive penicillin allergy skin testing using US-based retrospective data did not achieve performance strong enough for acceptance and adoption. The optimal ML prediction model for positive penicillin allergy testing was driven by time since reaction, seeking medical attention, female sex, and hives/urticaria.
Affiliation(s)
- Alexei Gonzalez-Estrada
  - Division of Pulmonary, Allergy, and Sleep Medicine, Department of Medicine, Mayo Clinic, Jacksonville, Fla
- Miguel A Park
  - Division of Allergic Diseases, Department of Internal Medicine, Mayo Clinic, Rochester, Minn
- John J O Accarino
  - Division of Rheumatology, Allergy, and Immunology, Department of Medicine, Massachusetts General Hospital, Boston, Mass
- Aleena Banerji
  - Division of Rheumatology, Allergy, and Immunology, Department of Medicine, Massachusetts General Hospital, Boston, Mass; Harvard Medical School, Boston, Mass
- Ismael Carrillo-Martin
  - Division of Pulmonary, Allergy, and Sleep Medicine, Department of Medicine, Mayo Clinic, Jacksonville, Fla
- Michael E D'Netto
  - Division of Allergic Diseases, Department of Internal Medicine, Mayo Clinic, Rochester, Minn
- W Tatiana Garzon-Siatoya
  - Division of Pulmonary, Allergy, and Sleep Medicine, Department of Medicine, Mayo Clinic, Jacksonville, Fla
- Heather D Hardway
  - Digital Innovation Lab, Department of Health Sciences Research, Mayo Clinic, Jacksonville, Fla
- Hajara Joundi
  - Division of Pulmonary, Allergy, and Sleep Medicine, Department of Medicine, Mayo Clinic, Jacksonville, Fla
- Susan Kinate
  - Division of Allergy, Asthma, and Clinical Immunology, Department of Medicine, Mayo Clinic, Scottsdale, Ariz
- Jessica H Plager
  - Department of Medicine, Massachusetts General Hospital, Boston, Mass
- Matthew A Rank
  - Division of Allergy, Asthma, and Clinical Immunology, Department of Medicine, Mayo Clinic, Scottsdale, Ariz; Section of Allergy, Immunology, Division of Pulmonary, Phoenix Children's Hospital, Phoenix, Ariz
- Christine R F Rukasin
  - Division of Allergy, Asthma, and Clinical Immunology, Department of Medicine, Mayo Clinic, Scottsdale, Ariz; Section of Allergy, Immunology, Division of Pulmonary, Phoenix Children's Hospital, Phoenix, Ariz
- Upeka Samarakoon
  - Division of Rheumatology, Allergy, and Immunology, Department of Medicine, Massachusetts General Hospital, Boston, Mass
- Gerald W Volcheck
  - Division of Allergic Diseases, Department of Internal Medicine, Mayo Clinic, Rochester, Minn
- Alexander D Weston
  - Digital Innovation Lab, Department of Health Sciences Research, Mayo Clinic, Jacksonville, Fla
- Anna R Wolfson
  - Division of Rheumatology, Allergy, and Immunology, Department of Medicine, Massachusetts General Hospital, Boston, Mass; Harvard Medical School, Boston, Mass
- Kimberly G Blumenthal
  - Division of Rheumatology, Allergy, and Immunology, Department of Medicine, Massachusetts General Hospital, Boston, Mass; Harvard Medical School, Boston, Mass
|
14
|
Wang W, Liu M, He Q, Wang M, Xu J, Li L, Li G, He L, Zou K, Sun X. Validation and impact of algorithms for identifying variables in observational studies of routinely collected data. J Clin Epidemiol 2024; 166:111232. [PMID: 38043830 DOI: 10.1016/j.jclinepi.2023.111232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 11/23/2023] [Accepted: 11/28/2023] [Indexed: 12/05/2023]
Abstract
BACKGROUND AND OBJECTIVES Among observational studies of routinely collected health data (RCD) for exploring treatment effects, algorithms are used to identify study variables. However, the extent to which algorithms are reliable and impact the credibility of effect estimates is far from clear. This study aimed to investigate the validation of algorithms for identifying study variables from RCD, and examine the impact of alternative algorithms on treatment effects. METHODS We searched PubMed for observational studies published in 2018 that used RCD to explore drug treatment effects. Information regarding the reporting, validation, and interpretation of algorithms was extracted. We summarized the reporting and methodological characteristics of algorithms and validation. We also assessed the divergence in effect estimates given alternative algorithms by calculating the ratio of estimates of the primary vs. alternative analyses. RESULTS A total of 222 studies were included, of which 93 (41.9%) provided a complete list of algorithms for identifying participants, 36 (16.2%) for exposure, and 132 (59.5%) for outcomes, and 15 (6.8%) for all study variables including population, exposure, and outcomes. Fifty-nine (26.6%) studies stated that the algorithms were validated, and 54 (24.3%) studies reported methodological characteristics of 66 validations, among which 61 validations in 49 studies were from the cross-referenced validation studies. Of those 66 validations, 22 (33.3%) reported sensitivity and 16 (24.2%) reported specificity. A total of 63.6% of studies reporting sensitivity and 56.3% reporting specificity used test-result-based sampling, an approach that potentially biases effect estimates. Twenty-eight (12.6%) studies used alternative algorithms to identify study variables, and 24 reported the effects estimated by primary analyses and sensitivity analyses. 
Of these, 20% had differential effect estimates when using alternative algorithms for identifying population, 18.2% for identifying exposure, and 45.5% for classifying outcomes. Only 32 (14.4%) studies discussed how the algorithms may affect treatment estimates. CONCLUSION In observational studies of RCD, the algorithms for variable identification were not regularly validated, and, even if validated, the methodological approach and performance of the validation were often poor. More seriously, different algorithms may yield differential treatment effects, but their impact is often ignored by researchers. Strong efforts, including recommendations, are warranted to improve good practice.
Affiliation(s)
- Wen Wang
  - Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu 610041, China; NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu 610041, China; Sichuan Center of Technology Innovation for Real World Data, Chengdu 610041, China
- Mei Liu
  - Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu 610041, China; NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu 610041, China; Sichuan Center of Technology Innovation for Real World Data, Chengdu 610041, China
- Qiao He
  - Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu 610041, China; NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu 610041, China; Sichuan Center of Technology Innovation for Real World Data, Chengdu 610041, China
- Mingqi Wang
  - Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu 610041, China; NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu 610041, China; Sichuan Center of Technology Innovation for Real World Data, Chengdu 610041, China
- Jiayue Xu
  - Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu 610041, China; NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu 610041, China; Sichuan Center of Technology Innovation for Real World Data, Chengdu 610041, China
- Ling Li
  - Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu 610041, China; NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu 610041, China; Sichuan Center of Technology Innovation for Real World Data, Chengdu 610041, China
- Guowei Li
  - Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario L8S 4L8, Canada; Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangzhou, Guangdong 510317, China; Biostatistics Unit, Research Institute at St. Joseph's Healthcare Hamilton, Hamilton, Ontario L8N 4A6, Canada
- Lin He
  - Intelligence Library Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Kang Zou
  - Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu 610041, China; NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu 610041, China; Sichuan Center of Technology Innovation for Real World Data, Chengdu 610041, China
- Xin Sun
  - Institute of Integrated Traditional Chinese and Western Medicine, Chinese Evidence-based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu 610041, China; NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu 610041, China; Sichuan Center of Technology Innovation for Real World Data, Chengdu 610041, China
|
15
|
van Leeuwen JR, Penne EL, Rabelink T, Knevel R, Teng YKO. Using an artificial intelligence tool incorporating natural language processing to identify patients with a diagnosis of ANCA-associated vasculitis in electronic health records. Comput Biol Med 2024; 168:107757. [PMID: 38039893 DOI: 10.1016/j.compbiomed.2023.107757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 11/14/2023] [Accepted: 11/21/2023] [Indexed: 12/03/2023]
Abstract
BACKGROUND Because anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV) is a rare, life-threatening, auto-immune disease, conducting research is difficult but essential. A long-lasting challenge is to identify rare AAV patients within the electronic health record (EHR) system to facilitate real-world research. Artificial intelligence (AI) search tools using natural language processing (NLP) for text-mining are increasingly postulated as a solution. METHODS We employed an AI tool that combined text-mining with NLP-based exclusion to accurately identify rare AAV patients within large EHR systems (>2,000,000 records). We developed an identification method in an academic center with an established AAV training set (n = 203) and validated the method in a non-academic center with an AAV validation set (n = 84). To assess accuracy, anonymized patient records were manually reviewed. RESULTS Based on an iterative process, a text-mining search was developed on disease description, laboratory measurements, medication and specialisms. In the training center, 608 patients were identified with a sensitivity of 97.0% (95% CI [93.7, 98.9]) and positive predictive value (PPV) of 56.9% (95% CI [52.9, 60.1]). NLP-based exclusion resulted in 444 patients, increasing PPV to 77.9% (95% CI [73.7, 81.7]) while sensitivity remained 96.3% (95% CI [93.8, 98.0]). In the validation center, text-mining identified 333 patients (sensitivity 97.6% (95% CI [91.6, 99.7]), PPV 58.2% (95% CI [52.8, 63.6])) and NLP-based exclusion resulted in 223 patients, increasing PPV to 86.1% (95% CI [80.9, 90.4]) with 98.0% (95% CI [94.9, 99.4]) sensitivity. Our identification method outperformed ICD-10 coding, predominantly in identifying MPO+ and organ-limited AAV patients. CONCLUSIONS Our study highlights the advantages of implementing AI, notably NLP, to accurately identify rare AAV patients within large EHR systems and demonstrates its applicability and transportability. 
Therefore, this method can reduce efforts to identify AAV patients and accelerate real-world research, while avoiding bias by ICD-10-coding.
Affiliation(s)
- Jolijn R van Leeuwen
  - Center of Expertise for Lupus-, Vasculitis- and Complement-mediated Systemic diseases (LuVaCs), Department of Internal Medicine - Nephrology Section, Leiden University Medical Center, Leiden, the Netherlands
- Erik L Penne
  - Department of Internal Medicine - Nephrology Section, Northwest Clinics, Alkmaar, the Netherlands
- Ton Rabelink
  - Center of Expertise for Lupus-, Vasculitis- and Complement-mediated Systemic diseases (LuVaCs), Department of Internal Medicine - Nephrology Section, Leiden University Medical Center, Leiden, the Netherlands
- Rachel Knevel
  - Department of Rheumatology, Leiden University Medical Center, Leiden, the Netherlands
- Y K Onno Teng
  - Center of Expertise for Lupus-, Vasculitis- and Complement-mediated Systemic diseases (LuVaCs), Department of Internal Medicine - Nephrology Section, Leiden University Medical Center, Leiden, the Netherlands
|
16
|
Grzenda A, Widge AS. Electronic health records and stratified psychiatry: bridge to precision treatment? Neuropsychopharmacology 2024; 49:285-290. [PMID: 37667021 PMCID: PMC10700348 DOI: 10.1038/s41386-023-01724-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 08/24/2023] [Accepted: 08/27/2023] [Indexed: 09/06/2023]
Abstract
The use of a stratified psychiatry approach that combines electronic health records (EHR) data with machine learning (ML) is one potentially fruitful path toward rapidly improving precision treatment in clinical practice. This strategy, however, requires confronting pervasive methodological flaws as well as deficiencies in transparency and reporting in the current conduct of ML-based studies for treatment prediction. EHR data shares many of the same data quality issues as other types of data used in ML prediction, plus some unique challenges. To fully leverage EHR data's power for patient stratification, increased attention to data quality and collection of patient-reported outcome data is needed.
Affiliation(s)
- Adrienne Grzenda
  - Department of Psychiatry & Biobehavioral Sciences, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA, USA
  - Olive View-UCLA Medical Center, Sylmar, CA, USA
- Alik S Widge
  - Department of Psychiatry & Behavioral Sciences, University of Minnesota, Minneapolis, MN, USA
|
17
|
Smith CM, Weathers AL, Lewis SL. An overview of clinical machine learning applications in neurology. J Neurol Sci 2023; 455:122799. [PMID: 37979413 DOI: 10.1016/j.jns.2023.122799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 10/26/2023] [Accepted: 11/12/2023] [Indexed: 11/20/2023]
Abstract
Machine learning techniques for clinical applications are evolving, and the potential impact this will have on clinical neurology is important to recognize. By providing a broad overview on this growing paradigm of clinical tools, this article aims to help healthcare professionals in neurology prepare to navigate both the opportunities and challenges brought on through continued advancements in machine learning. This narrative review first elaborates on how machine learning models are organized and implemented. Machine learning tools are then classified by clinical application, with examples of uses within neurology described in more detail. Finally, this article addresses limitations and considerations regarding clinical machine learning applications in neurology.
Affiliation(s)
- Colin M Smith
  - Lehigh Valley Fleming Neuroscience Institute, 1250 S Cedar Crest Blvd., Allentown, PA 18103, USA
- Allison L Weathers
  - Cleveland Clinic Information Technology Division, 9500 Euclid Ave. Cleveland, OH 44195, USA
- Steven L Lewis
  - Lehigh Valley Fleming Neuroscience Institute, 1250 S Cedar Crest Blvd., Allentown, PA 18103, USA
|
18
|
Wang M, Sushil M, Miao BY, Butte AJ. Bottom-up and top-down paradigms of artificial intelligence research approaches to healthcare data science using growing real-world big data. J Am Med Inform Assoc 2023; 30:1323-1332. [PMID: 37187158 PMCID: PMC10280344 DOI: 10.1093/jamia/ocad085] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/22/2023] [Revised: 04/03/2023] [Accepted: 05/04/2023] [Indexed: 05/17/2023] Open
Abstract
OBJECTIVES As real-world electronic health record (EHR) data continue to grow exponentially, novel methodologies involving artificial intelligence (AI) are increasingly being applied to enable efficient data-driven learning and, ultimately, to advance healthcare. Our objective is to provide readers with an understanding of evolving computational methods and help in deciding on methods to pursue. TARGET AUDIENCE The sheer diversity of existing methods presents a challenge for health scientists who are beginning to apply computational methods to their research. Therefore, this tutorial is aimed at scientists working with EHR data who are early entrants into the field of applying AI methodologies. SCOPE This manuscript describes the diverse and growing AI research approaches in healthcare data science and categorizes them into 2 distinct paradigms, the bottom-up and top-down paradigms, to provide health scientists venturing into artificial intelligence research with an understanding of the evolving computational methods and help in deciding on methods to pursue through the lens of real-world healthcare data.
Affiliation(s)
- Michelle Wang
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
- Madhumita Sushil
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
- Brenda Y Miao
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
- Atul J Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
  - Department of Pediatrics, University of California, San Francisco, San Francisco, California, USA
|
19
|
Cohen RY, Kovacheva VP. A Methodology for a Scalable, Collaborative, and Resource-Efficient Platform, MERLIN, to Facilitate Healthcare AI Research. IEEE J Biomed Health Inform 2023; 27:3014-3025. [PMID: 37030761 PMCID: PMC10275625 DOI: 10.1109/jbhi.2023.3259395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 03/22/2023]
Abstract
Healthcare artificial intelligence (AI) holds the potential to increase patient safety, augment efficiency and improve patient outcomes, yet research is often limited by data access, cohort curation, and tools for analysis. Collection and translation of electronic health record data, live data, and real-time high-resolution device data can be challenging and time-consuming. The development of clinically relevant AI tools requires overcoming challenges in data acquisition, scarce hospital resources, and requirements for data governance. These bottlenecks may result in resource-heavy needs and long delays in research and development of AI systems. We present a system and methodology to accelerate data acquisition, dataset development and analysis, and AI model development. We created an interactive platform that relies on a scalable microservice architecture. This system can ingest 15,000 patient records per hour, where each record represents thousands of multimodal measurements, text notes, and high-resolution data. Collectively, these records can approach a terabyte of data. The platform can further perform cohort generation and preliminary dataset analysis in 2-5 minutes. As a result, multiple users can collaborate simultaneously to iterate on datasets and models in real time. We anticipate that this approach will accelerate clinical AI model development, and, in the long run, meaningfully improve healthcare delivery.
20
McElroy SJ, Lueschow SR. State of the art review on machine learning and artificial intelligence in the study of neonatal necrotizing enterocolitis. Front Pediatr 2023; 11:1182597. [PMID: 37303753 PMCID: PMC10250644 DOI: 10.3389/fped.2023.1182597] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/09/2023] [Accepted: 04/25/2023] [Indexed: 06/13/2023] Open
Abstract
Necrotizing enterocolitis (NEC) is one of the leading causes of gastrointestinal emergency in preterm infants. Although NEC was formally described in the 1960s, it remains difficult to diagnose and, ultimately, to treat, due in part to the multifactorial nature of the disease. Artificial intelligence (AI) and machine learning (ML) techniques have been applied by healthcare researchers over the past 30 years to better understand various diseases. Specifically, NEC researchers have used AI and ML to predict NEC diagnosis and prognosis, discover biomarkers, and evaluate treatment strategies. In this review, we discuss AI and ML techniques, the current literature that has applied AI and ML to NEC, and some of the limitations in the field.
Affiliation(s)
- Steven J. McElroy
- Department of Pediatrics, University of California Davis, Sacramento, CA, United States
- Shiloh R. Lueschow
- Stead Family Department of Pediatrics, University of Iowa, Iowa City, IA, United States
21
Williams DD, Ferro D, Mullaney C, Skrabonja L, Barnes MS, Patton SR, Lockee B, Tallon EM, Vandervelden CA, Schweisberger C, Mehta S, McDonough R, Lind M, D'Avolio L, Clements MA. Development of an "all-data-on-hand" deep learning model to predict hospitalization for diabetic ketoacidosis (DKA) in youth with type 1 diabetes (T1D). JMIR Diabetes 2023. [PMID: 37224506 PMCID: PMC10394604 DOI: 10.2196/47592] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/26/2023] Open
Abstract
BACKGROUND: While prior research has identified multiple risk factors for diabetic ketoacidosis (DKA), clinicians continue to lack clinic-ready models to predict dangerous and costly episodes of DKA. We asked whether we could apply deep learning, specifically a long short-term memory (LSTM) model, to accurately predict 180-day risk of DKA-related hospitalization for youth with type 1 diabetes (T1D).
OBJECTIVE: To describe the development of an LSTM model to predict 180-day risk of DKA-related hospitalization for youth with T1D.
METHODS: We used 17 consecutive calendar quarters of clinical data (01/10/2016-03/18/2020) for 1745 youth aged 8 to 18 years with T1D from a pediatric diabetes clinic network in the Midwestern US. Input data included demographics, discrete clinical observations (lab results, vital signs, anthropometric measures, diagnosis and procedure codes), medications, visit counts by type of encounter, number of historic DKA episodes, number of days since last DKA admission, patient-reported outcomes (answers to clinic intake questions), and data features derived from diabetes- and non-diabetes-related clinical notes via natural language processing (NLP). We trained the model using input data from quarters 1-7 (n=1377), validated using input from quarters 3-9 in a partial out-of-sample cohort (OOS-P; n=1505), and further validated in a full out-of-sample cohort (OOS-F; n=354) with input from quarters 10-15.
RESULTS: DKA admissions occurred at a rate of 5% per 180 days in both OOS cohorts. For the OOS-P and OOS-F cohorts, respectively: median age was 13.7 years (IQR=11.3,15.8) and 13.1 years (10.7,15.5); HbA1c at enrollment was 8.6% (7.6,9.8) [70 (60,84) mmol/mol] and 8.1% (6.9,9.5) [65 (52,80) mmol/mol]; 14% and 13% had prior DKA admissions (post-T1D-diagnosis); and recall was 0.33 and 0.50 for the top-ranked 5% of youth with T1D. For lists rank-ordered by probability of hospitalization, precision increased from 0.33 to 0.56 to 1.0 for positions 1-80, 1-25, and 1-10 in the OOS-P cohort and from 0.50 to 0.60 to 0.80 for positions 1-18, 1-10, and 1-5 in the OOS-F cohort.
CONCLUSIONS: The proposed LSTM model for predicting 180-day DKA-related hospitalization is valid in the present sample. Future work should evaluate model validity in multiple populations and settings to account for health inequities that may be present in different segments of the population (e.g., racially and/or socioeconomically diverse cohorts). Rank-ordering youth by probability of DKA-related hospitalization will allow clinics to identify the most at-risk youth. The clinical implication is that clinics may then create and evaluate novel preventive interventions based on available resources.
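Note: the abstract above reports recall for the top-ranked 5% of youth and precision over the top positions of a rank-ordered list. The sketch below shows how such precision-at-k and recall-at-k figures are commonly computed. It is an illustration only: the function name, scores, and labels are made up for the example and are not the study's data or code.

```python
def precision_recall_at_k(scores, labels, k):
    """Return (precision@k, recall@k) for the k highest-scoring items.

    scores: predicted probabilities (higher = more at risk).
    labels: 1 if the event occurred, 0 otherwise (hypothetical toy labels).
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Rank items from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    total_positives = sum(labels)
    precision = hits / k
    recall = hits / total_positives if total_positives else 0.0
    return precision, recall


# Toy data, invented for illustration only.
scores = [0.91, 0.85, 0.40, 0.77, 0.10, 0.66, 0.30, 0.05, 0.52, 0.20]
labels = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]

for k in (2, 4, 10):
    p, r = precision_recall_at_k(scores, labels, k)
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

Shortening the list typically trades recall for precision, which is the pattern the abstract describes when precision rises as the list narrows to the highest-ranked positions.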
Affiliation(s)
- David D Williams
- Health Services and Outcomes Research, Children's Mercy - Kansas City, 2401 Gillham Road, Kansas City, US
- Diana Ferro
- Predictive and Preventive Medicine Research Unit, Bambino Gesù Children Hospital, Roma, IT
- Department of Endocrinology, Children's Mercy - Kansas City, Kansas City, US
- Mitchell S Barnes
- Department of Endocrinology, Children's Mercy - Kansas City, Kansas City, US
- Susana R Patton
- Center for Healthcare Delivery Science, Nemours Children's Health, Jacksonville, US
- Brent Lockee
- Department of Endocrinology, Children's Mercy - Kansas City, Kansas City, US
- Erin M Tallon
- Department of Endocrinology, Children's Mercy - Kansas City, Kansas City, US
- Ryan McDonough
- Department of Endocrinology, Children's Mercy - Kansas City, Kansas City, US
- Marcus Lind
- Department of Medicine, NU-Hospital Group, Uddevalla, SE
- Department of Molecular and Clinical Medicine, University of Gothenburg, Gothenburg, SE
- Department of Internal Medicine, Sahlgrenska University Hospital, Gothenburg, SE
- Mark A Clements
- Department of Endocrinology, Children's Mercy - Kansas City, Kansas City, US
22
Nurmambetova E, Pan J, Zhang Z, Wu G, Lee S, Southern DA, Martin EA, Ho C, Xu Y, Eastwood CA. Developing an Inpatient Electronic Medical Record Phenotype for Hospital-Acquired Pressure Injuries: Case Study Using Natural Language Processing Models. JMIR AI 2023; 2:e41264. [PMID: 38875552 PMCID: PMC11041460 DOI: 10.2196/41264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/24/2022] [Revised: 01/01/2023] [Accepted: 01/15/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND: Surveillance of hospital-acquired pressure injuries (HAPIs) is often suboptimal when relying on administrative health data, as International Classification of Diseases (ICD) codes are known to have long delays and are undercoded. We leveraged natural language processing (NLP) applications on free-text notes, particularly the inpatient nursing notes, from electronic medical records (EMRs), to identify HAPIs more accurately and in a more timely manner.
OBJECTIVE: This study aimed to show that EMR-based phenotyping algorithms are better suited to detecting HAPIs than ICD-10-CA algorithms alone, while the clinical logs are recorded with higher accuracy via NLP using nursing notes.
METHODS: Patients with HAPIs were identified from head-to-toe skin assessments in a local tertiary acute care hospital during a clinical trial that took place from 2015 to 2018 in Calgary, Alberta, Canada. Clinical notes documented during the trial were extracted from the EMR database after the linkage with the discharge abstract database. Different combinations of several types of clinical notes were processed by sequential forward selection during the model development. Text classification algorithms for HAPI detection were developed using random forest (RF), extreme gradient boosting (XGBoost), and deep learning models. The classification threshold was tuned to enable the model to achieve similar specificity to an ICD-based phenotyping study. Each model's performance was assessed, and comparisons were made between the metrics, including sensitivity, positive predictive value, negative predictive value, and F1-score.
RESULTS: Data from 280 eligible patients were used in this study, among whom 97 patients had HAPIs during the trial. RF was the best-performing model, with a sensitivity of 0.464 (95% CI 0.365-0.563), specificity of 0.984 (95% CI 0.965-1.000), and F1-score of 0.612 (95% CI 0.473-0.751). The machine learning (ML) model reached higher sensitivity without sacrificing much specificity compared to the previously reported performance of ICD-based algorithms.
CONCLUSIONS: The EMR-based NLP phenotyping algorithms demonstrated improved performance in HAPI case detection over ICD-10-CA codes alone. Daily generated nursing notes in EMRs are a valuable data resource for ML models to accurately detect adverse events. The study contributes to enhancing automated health care quality and safety surveillance.
Affiliation(s)
- Elvira Nurmambetova
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Jie Pan
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Zilong Zhang
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Guosong Wu
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Seungwon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Alberta Health Services, Edmonton, AB, Canada
- Danielle A Southern
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Elliot A Martin
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Alberta Health Services, Edmonton, AB, Canada
- Chester Ho
- Department of Medicine, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
- Yuan Xu
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Oncology, University of Calgary, Tom Baker Cancer Centre, Calgary, AB, Canada
- Department of Surgery, Foothills Medical Centre, University of Calgary, Calgary, AB, Canada
- Cathy A Eastwood
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
23
Ten Considerations for Integrating Patient-Reported Outcomes into Clinical Care for Childhood Cancer Survivors. Cancers (Basel) 2023; 15:cancers15041024. [PMID: 36831370 PMCID: PMC9954048 DOI: 10.3390/cancers15041024] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Revised: 01/28/2023] [Accepted: 02/01/2023] [Indexed: 02/08/2023] Open
Abstract
Patient-reported outcome measures (PROMs) are subjective assessments of health status or health-related quality of life. In childhood cancer survivors, PROMs can be used to evaluate the adverse effects of cancer treatment and guide cancer survivorship care. However, there are barriers to integrating PROMs into clinical practice, such as constraints in clinical validity, meaningful interpretation, and technology-enabled administration of the measures. This article discusses these barriers and proposes 10 important considerations for appropriate PROM integration into clinical care for choosing the right measure (considering the purpose of using a PROM, health profile vs. health preference approaches, measurement properties), ensuring survivors complete the PROMs (data collection method, data collection frequency, survivor capacity, self- vs. proxy reports), interpreting the results (scoring methods, clinical meaning and interpretability), and selecting a strategy for clinical response (integration into the clinical workflow). An example framework for integrating novel patient-reported outcome (PRO) data collection into the clinical workflow for childhood cancer survivorship care is also discussed. As we continuously improve the clinical validity of PROMs and address implementation barriers, routine PRO assessment and monitoring in pediatric cancer survivorship offer opportunities to facilitate clinical decision making and improve the quality of survivorship care.
24
Ayaz M, Pasha MF, Le TY, Alahmadi TJ, Abdullah NNB, Alhababi ZA. A Framework for Automatic Clustering of EHR Messages Using a Spatial Clustering Approach. Healthcare (Basel) 2023; 11:390. [PMID: 36766965 PMCID: PMC9914110 DOI: 10.3390/healthcare11030390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 01/12/2023] [Accepted: 01/16/2023] [Indexed: 02/03/2023] Open
Abstract
Although Health Level Seven (HL7) message standards (v2, v3, Clinical Document Architecture (CDA)) have been commonly adopted, there are still issues associated with them, especially semantic interoperability issues and a lack of support for smart devices (e.g., smartphones, fitness trackers, and smartwatches). In addition, healthcare organizations in many countries are still using proprietary electronic health record (EHR) message formats, making it challenging to convert to other data formats, particularly the latest HL7 Fast Healthcare Interoperability Resources (FHIR) data standard. FHIR is based on modern web technologies such as HTTP, XML, and JSON and would be capable of overcoming the shortcomings of the previous standards and supporting modern smart devices. Therefore, the FHIR standard could help the healthcare industry take advantage of the latest technologies and improve data interoperability. Data representation and mapping from the legacy data standards (i.e., HL7 v2 and EHR) to FHIR is necessary for the healthcare sector. However, direct data mapping or conversion from the traditional data standards to the FHIR data standard is challenging because of the nature and formats of the data. Therefore, in this article, we propose a framework that aims to convert proprietary EHR messages into the HL7 v2 format and apply an unsupervised clustering approach using the DBSCAN (density-based spatial clustering of applications with noise) algorithm to automatically group a variety of these HL7 v2 messages regardless of their semantic origins. The proposed framework's implementation lays the groundwork to provide a generic mapping model with multi-point and multi-format data conversion input into FHIR. Our experimental results show the proposed framework's ability to automatically cluster various HL7 v2 message formats and provide analytic insight behind them.
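Note: the abstract above names DBSCAN as the clustering step but does not say how messages are turned into feature vectors. The sketch below is a minimal pure-Python DBSCAN applied to a hypothetical featurization, counts of HL7 v2 segment IDs per message, chosen only to make the example concrete. The helper names, the toy messages, and the `eps` and `min_pts` values are all assumptions, not the paper's configuration.

```python
import math

NOISE = -1


def segment_ids(message):
    """Segment IDs of an HL7 v2 message (segments end with a carriage return,
    fields are separated by '|')."""
    return [line.split("|", 1)[0] for line in message.splitlines() if line]


def count_vector(message, vocabulary):
    """Hypothetical featurization: how often each segment ID occurs."""
    ids = segment_ids(message)
    return [ids.count(seg) for seg in vocabulary]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual definition.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


def toy_message(ids):
    """Build a toy message from segment IDs (field contents are placeholders)."""
    return "\r".join(f"{seg}|1" for seg in ids)


# Toy messages, invented for illustration only.
messages = [
    toy_message(["MSH", "EVN", "PID", "PV1"]),
    toy_message(["MSH", "EVN", "PID", "PV1"]),
    toy_message(["MSH", "EVN", "PID", "PV1", "NK1"]),
    toy_message(["MSH", "PID", "OBR", "OBX", "OBX", "OBX"]),
    toy_message(["MSH", "PID", "OBR", "OBX", "OBX", "OBX"]),
    toy_message(["MSH", "PID", "OBR", "OBX", "OBX"]),
    toy_message(["MSH", "SCH", "PID", "AIL", "AIL", "AIL", "AIL", "AIL"]),
]
vocabulary = sorted({seg for m in messages for seg in segment_ids(m)})
vectors = [count_vector(m, vocabulary) for m in messages]
labels = dbscan(vectors, eps=1.5, min_pts=2)
print(labels)
```

With these toy inputs the admission-like and result-like messages form two groups and the last message is left as noise. DBSCAN suits this kind of task because the number of message families does not have to be fixed in advance.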
Affiliation(s)
- Muhammad Ayaz
- Malaysia School of Information Technology, Monash University, Jalan Lagoon Selatan Bandar Sunway, Subang Jaya 47500, Selangor, Malaysia
- Muhammad Fermi Pasha
- Malaysia School of Information Technology, Monash University, Jalan Lagoon Selatan Bandar Sunway, Subang Jaya 47500, Selangor, Malaysia
- Tham Yu Le
- Malaysia School of Information Technology, Monash University, Jalan Lagoon Selatan Bandar Sunway, Subang Jaya 47500, Selangor, Malaysia
- Tahani Jaser Alahmadi
- Department of Information System, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Nik Nailah Binti Abdullah
- Malaysia School of Information Technology, Monash University, Jalan Lagoon Selatan Bandar Sunway, Subang Jaya 47500, Selangor, Malaysia
- Zaid Ali Alhababi
- Riyadh First Health Cluster, Ministry of Health, Riyadh 11622, Saudi Arabia
25
Cardozo G, Tirloni SF, Pereira Moro AR, Marques JLB. Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2022; 3:e40473. [PMID: 36644762 PMCID: PMC9828303 DOI: 10.2196/40473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 08/28/2022] [Accepted: 10/31/2022] [Indexed: 11/05/2022]
Abstract
Background: In recent decades, the use of artificial intelligence has been widely explored in health care. Similarly, the amount of data generated in the most varied medical processes has practically doubled every year, requiring new methods of analysis and treatment of these data. Mainly aimed at aiding in the diagnosis and prevention of diseases, this precision medicine has shown great potential in different medical disciplines. Laboratory tests, for example, almost always present their results separately as individual values. However, physicians need to analyze a set of results to propose a supposed diagnosis, which leads us to think that sets of laboratory tests may contain more information than those presented separately for each result. In this way, the processes of medical laboratories can be strongly affected by these techniques.
Objective: In this sense, we sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases.
Methods: The methodology adopted used the population, intervention, comparison, and outcomes principle, searching the main engineering and health sciences databases. The search terms were defined based on the list of terms used in the Medical Subject Heading database. Data from this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for quality assessment of articles. During the analysis, the inclusion and exclusion criteria were independently applied by 2 authors, with a third author being consulted in cases of disagreement.
Results: Following the defined requirements, 40 studies presenting good quality in the analysis process were selected and evaluated. We found that, in recent years, there has been a significant increase in the number of works that have used this methodology, mainly because of COVID-19. In general, the studies used machine learning classification models to predict new information, and the most used parameters were data from routine laboratory tests such as the complete blood count.
Conclusions: Finally, we conclude that laboratory tests, together with machine learning techniques, can predict new tests, thus helping the search for new diagnoses. This process has proved to be advantageous and innovative for medical laboratories. It is making it possible to discover hidden information and propose additional tests, reducing the number of false negatives and helping in the early discovery of unknown diseases.
Affiliation(s)
- Glauco Cardozo
- Federal Institute of Santa Catarina, Florianópolis, Brazil
26
Detection of factors affecting kidney function using machine learning methods. Sci Rep 2022; 12:21740. [PMID: 36526702 PMCID: PMC9758148 DOI: 10.1038/s41598-022-26160-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022] Open
Abstract
Due to the increasing prevalence of chronic kidney disease and its high mortality rate, the study of risk factors affecting the progression of the disease is of great importance. In this work, we aim to develop a framework for using machine learning methods to identify factors affecting kidney function. To this end, classification methods are trained to predict the serum creatinine level, assigned to one of three classes representing different ranges of its values, from the numerical values of other blood test parameters. Models are trained using blood test results from healthy subjects and patients, covering 46 different blood test parameters. The best-performing models are random forest and LightGBM. Interpretation of the resulting model reveals a direct relationship between vitamin D and blood creatinine level. The detected association between these two parameters is considered reliable given the relatively high predictive performance of the random forest model, which reaches an AUC of 0.90 and an accuracy of 0.74. Moreover, we develop a Bayesian network to infer the direct relationships between blood test parameters, which gives results consistent with the classification models. The proposed framework uses an inclusive set of advanced imputation methods to deal with the main challenge of working with electronic health data: missing values. Hence, it can be applied to similar clinical studies to investigate and discover the relationships between the factors under study.
27
Eichler F, Sevin C, Barth M, Pang F, Howie K, Walz M, Wilds A, Calcagni C, Chanson C, Campbell L. Understanding caregiver descriptions of initial signs and symptoms to improve diagnosis of metachromatic leukodystrophy. Orphanet J Rare Dis 2022; 17:370. [PMID: 36195888 PMCID: PMC9531467 DOI: 10.1186/s13023-022-02518-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/14/2022] [Accepted: 09/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Metachromatic leukodystrophy (MLD), a relentlessly progressive and ultimately fatal condition, is a rare autosomal recessive lysosomal storage disorder caused by a deficiency of the enzyme arylsulfatase A (ARSA). Historically, management has been palliative or supportive care. Hematopoietic stem cell transplantation is poorly effective in early-onset MLD, and its benefit in late-onset MLD remains controversial. A hematopoietic stem cell gene therapy, Libmeldy (atidarsagene autotemcel), was recently approved by the European Medicines Agency for early-onset MLD. Treatment benefit is mainly observed at an early disease stage, indicating the need for early diagnosis and intervention. This study contributes insights into the caregiver language used to describe initial MLD symptomatology, and thereby aims to improve communication between clinicians and families impacted by this condition and promote a faster path to diagnosis.
RESULTS: Data were collected through a moderator-assisted online 60-min survey and a 30-min semi-structured follow-up telephone interview with 31 MLD caregivers in the United States (n = 10), France (n = 10), the United Kingdom (n = 5), and Germany (n = 6). All respondents were primary caregivers of a person with late infantile (n = 20), juvenile (n = 11) or borderline late infantile/juvenile (n = 1) MLD (one caregiver reported for 2 children, leading to a sample of 32 individuals with MLD). Caregivers were asked questions related to their child's initial signs and symptoms, time to diagnosis and interactions with healthcare providers. These results highlight the caregiver language used to describe the most common initial symptoms of MLD and provide added context to help elevate the index of suspicion of disease. Distinctions between caregiver descriptions of late infantile and juvenile MLD in symptom onset and disease course were also identified.
CONCLUSIONS: This study captures the caregiver description of the physical, behavioral, and cognitive signs of MLD prior to diagnosis. The understanding of the caregiver language at symptom onset sheds light on a critical window of often missed opportunity for earlier diagnosis and therapeutic intervention in MLD.
Affiliation(s)
- F Eichler
- Center for Rare Neurological Diseases, Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Caroline Sevin
- Service de Neuropédiatrie, centre de reference des leucodystrophies et leucoencephalopathies genetiques de cause rare, CHU Paris-Sud-Hôpital de Bicêtre, Le Kremlin-Bicêtre, France
- M Barth
- Service de Génétique, Hôpital Universitaire d'Angers, Angers, France
- F Pang
- Orchard Therapeutics, 245 Hammersmith Road, London, W6 8PW, UK
- K Howie
- Magnolia Innovation, Hoboken, NJ, USA
- M Walz
- Magnolia Innovation, Hoboken, NJ, USA
- A Wilds
- Magnolia Innovation, Hoboken, NJ, USA
- C Chanson
- Orchard Therapeutics, 245 Hammersmith Road, London, W6 8PW, UK
- L Campbell
- Orchard Therapeutics, 245 Hammersmith Road, London, W6 8PW, UK
28
Ahuja Y, Zou Y, Verma A, Buckeridge D, Li Y. MixEHR-Guided: A guided multi-modal topic modeling approach for large-scale automatic phenotyping using the electronic health record. J Biomed Inform 2022; 134:104190. [PMID: 36058522 DOI: 10.1016/j.jbi.2022.104190] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/06/2022] [Revised: 08/27/2022] [Accepted: 08/28/2022] [Indexed: 01/18/2023]
Abstract
Electronic health records (EHRs) contain rich clinical data collected at the point of care, and their increasing adoption offers exciting opportunities for clinical informatics, disease risk prediction, and personalized treatment recommendation. However, effective use of EHR data for research and clinical decision support is often hampered by a lack of reliable disease labels. To compile gold-standard labels, researchers often rely on clinical experts to develop rule-based phenotyping algorithms from billing codes and other surrogate features. This process is tedious and error-prone due to recall and observer biases in how codes and measures are selected, and some phenotypes are incompletely captured by a handful of surrogate features. To address this challenge, we present a novel automatic phenotyping model called MixEHR-Guided (MixEHR-G), a multimodal hierarchical Bayesian topic model that efficiently models the EHR generative process by identifying latent phenotype structure in the data. Unlike existing topic modeling algorithms wherein the inferred topics are not identifiable, MixEHR-G uses prior information from informative surrogate features to align topics with known phenotypes. We applied MixEHR-G to an openly available EHR dataset of 38,597 intensive care patients (MIMIC-III) in Boston, USA and to administrative claims data for a population-based cohort (PopHR) of 1.3 million people in Quebec, Canada. Qualitatively, we demonstrate that MixEHR-G learns interpretable phenotypes and yields meaningful insights about phenotype similarities, comorbidities, and epidemiological associations. Quantitatively, MixEHR-G outperforms existing unsupervised phenotyping methods on a phenotype label annotation task, and it can accurately estimate relative phenotype prevalence functions without gold-standard phenotype information. Altogether, MixEHR-G is an important step towards building an interpretable and automated phenotyping system using EHR data.
Affiliation(s)
- Yuri Ahuja
- Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA; Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
- Yuesong Zou
- School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada
- Aman Verma
- School of Population and Global Health, McGill University, 2001 McGill College Avenue, Montreal, Québec H3A 1G1, Canada
- David Buckeridge
- School of Population and Global Health, McGill University, 2001 McGill College Avenue, Montreal, Québec H3A 1G1, Canada
- Yue Li
- School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada
29
Noori A, Magdamo C, Liu X, Tyagi T, Li Z, Kondepudi A, Alabsi H, Rudmann E, Wilcox D, Brenner L, Robbins GK, Moura L, Zafar S, Benson NM, Hsu J, R Dickson J, Serrano-Pozo A, Hyman BT, Blacker D, Westover MB, Mukerji SS, Das S. Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study. J Med Internet Res 2022; 24:e40384. [PMID: 36040790 PMCID: PMC9472045 DOI: 10.2196/40384] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/17/2022] [Revised: 07/29/2022] [Accepted: 07/31/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Electronic health records (EHRs) with large sample sizes and rich information offer great potential for dementia research, but current methods of phenotyping cognitive status are not scalable.
OBJECTIVE: The aim of this study was to evaluate whether natural language processing (NLP)-powered semiautomated annotation can improve the speed and interrater reliability of chart reviews for phenotyping cognitive status.
METHODS: In this diagnostic study, we developed and evaluated a semiautomated NLP-powered annotation tool (NAT) to facilitate phenotyping of cognitive status. Clinical experts adjudicated the cognitive status of 627 patients at Mass General Brigham (MGB) health care, using NAT or traditional chart reviews. Patient charts contained EHR data from two data sets: (1) records from January 1, 2017, to December 31, 2018, for 100 Medicare beneficiaries from the MGB Accountable Care Organization and (2) records from 2 years prior to COVID-19 diagnosis to the date of COVID-19 diagnosis for 527 MGB patients. All EHR data from the relevant period were extracted; diagnosis codes, medications, and laboratory test values were processed and summarized; clinical notes were processed through an NLP pipeline; and a web tool was developed to present an integrated view of all data. Cognitive status was rated as cognitively normal, cognitively impaired, or undetermined. Assessment time and interrater agreement of NAT compared to manual chart reviews for cognitive status phenotyping were evaluated.
RESULTS: NAT adjudication provided higher interrater agreement (Cohen κ=0.89 vs κ=0.80) and a significant speedup (time difference mean 1.4, SD 1.3 minutes; P<.001; ratio median 2.2, min-max 0.4-20) over manual chart reviews. There was moderate agreement with manual chart reviews (Cohen κ=0.67). In the cases that exhibited disagreement with manual chart reviews, NAT adjudication was able to produce assessments that had broader clinical consensus due to its integrated view of highlighted relevant information and semiautomated NLP features.
CONCLUSIONS: NAT adjudication improves the speed and interrater reliability for phenotyping cognitive status compared to manual chart reviews. This study underscores the potential of an NLP-based clinically adjudicated method to build large-scale dementia research cohorts from EHRs.
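Note: the abstract above reports interrater agreement as Cohen κ. For readers unfamiliar with the statistic, the sketch below computes it for two raters as (observed agreement minus chance agreement) divided by (1 minus chance agreement). The ratings are invented for illustration and use the abstract's three categories only as labels; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Toy ratings, invented for illustration only.
N, I, U = "normal", "impaired", "undetermined"
rater_a = [N, N, N, N, I, I, I, U, U, N]
rater_b = [N, N, N, I, I, I, I, U, N, N]

kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa={kappa:.2f}")
```

In this toy case the raters agree on 8 of 10 items, but kappa is lower than 0.8 (about 0.67) because some of that agreement would be expected by chance given how often each rater uses each category.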
Affiliation(s)
- Ayush Noori
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Colin Magdamo
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Xiao Liu
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Tanish Tyagi
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Zhaozhi Li
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Akhil Kondepudi
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Haitham Alabsi
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Emily Rudmann
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Vaccine and Immunotherapy Center, Division of Infectious Disease, Boston, MA, United States
- Douglas Wilcox
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Laura Brenner
  - Harvard Medical School, Boston, MA, United States
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, United States
- Gregory K Robbins
  - Harvard Medical School, Boston, MA, United States
  - Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
- Lidia Moura
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Sahar Zafar
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Nicole M Benson
  - Harvard Medical School, Boston, MA, United States
  - Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
  - McLean Hospital, Belmont, MA, United States
- John Hsu
  - Harvard Medical School, Boston, MA, United States
  - Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
- John R Dickson
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Alberto Serrano-Pozo
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Bradley T Hyman
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Deborah Blacker
  - Harvard Medical School, Boston, MA, United States
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States
- M Brandon Westover
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Shibani S Mukerji
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
- Sudeshna Das
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
30
Vyas S, Shabaz M, Pandit P, Parvathy LR, Ofori I. Integration of Artificial Intelligence and Blockchain Technology in Healthcare and Agriculture. J FOOD QUALITY 2022; 2022:1-11. [DOI: 10.1155/2022/4228448] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/09/2024] Open
Abstract
Over the last decade, the healthcare sector has accelerated its digitization and its adoption of electronic health records (EHRs). As information technology progresses, the notion of intelligent health is also gaining popularity. By combining technologies such as the internet of things (IoT) and artificial intelligence (AI), innovative healthcare modifies and enhances traditional medical systems in terms of efficiency, service, and personalization. On the other hand, intelligent healthcare systems are highly vulnerable to data breaches and other malicious attacks. Recently, blockchain technology has emerged as a potentially transformative option for enhancing data management, access control, and integrity inside healthcare systems. Integrating these advanced approaches in agriculture is critical for managing food supply chains, drug supply chains, quality maintenance, and intelligent prediction. This study reviews the literature, formulates a research topic, and analyzes the applicability of blockchain to the agriculture/food industry and healthcare, with a particular emphasis on AI and IoT. This article summarizes research on the newest blockchain solutions paired with AI technologies for strengthening and inventing new technological standards for healthcare ecosystems and the food industry.
Affiliation(s)
- Sonali Vyas
  - University of Petroleum and Energy Studies, Dehradun, India
- Mohammad Shabaz
  - Model Institute of Engineering and Technology, Jammu, J&K, India
- Prajjawal Pandit
  - Department of Computer Science & Engineering, Lovely Professional University, Phagwāra, Punjab, India
- L. Rama Parvathy
  - Department of Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, India
- Isaac Ofori
  - Department of Environmental and Safety Engineering, University of Mines and Technology, Tarkwa, Ghana
31
Machine Learning in Prediction of Bladder Cancer on Clinical Laboratory Data. Diagnostics (Basel) 2022; 12:diagnostics12010203. [PMID: 35054370 PMCID: PMC8774436 DOI: 10.3390/diagnostics12010203] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/23/2021] [Revised: 01/09/2022] [Accepted: 01/13/2022] [Indexed: 12/19/2022] Open
Abstract
Bladder cancer has been increasing globally. Urinary cytology is considered a major screening method for bladder cancer, but it has poor sensitivity. This study aimed to utilize clinical laboratory data and machine learning methods to build predictive models of bladder cancer. A total of 1336 patients with cystitis, bladder cancer, kidney cancer, uterine cancer, and prostate cancer were enrolled in this study. Two-step feature selection combined with WEKA and forward selection was performed. Furthermore, five machine learning models, including decision tree, random forest, support vector machine, extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), were applied. Features including calcium, alkaline phosphatase (ALP), albumin, urine ketone, urine occult blood, creatinine, alanine aminotransferase (ALT), and diabetes were selected. The LightGBM model obtained an accuracy of 84.8% to 86.9%, a sensitivity of 84% to 87.8%, a specificity of 82.9% to 86.7%, and an area under the curve (AUC) of 0.88 to 0.92 in discriminating bladder cancer from cystitis and other cancers. Our study provides a demonstration of utilizing clinical laboratory data to predict bladder cancer.
32
Applying Machine Learning in Distributed Data Networks for Pharmacoepidemiologic and Pharmacovigilance Studies: Opportunities, Challenges, and Considerations. Drug Saf 2022; 45:493-510. [PMID: 35579813 PMCID: PMC9112258 DOI: 10.1007/s40264-022-01158-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 02/13/2022] [Indexed: 01/28/2023]
Abstract
Increasing availability of electronic health databases capturing real-world experiences with medical products has garnered much interest in their use for pharmacoepidemiologic and pharmacovigilance studies. The traditional practice of having numerous groups use single databases to accomplish similar tasks and address common questions about medical products can be made more efficient through well-coordinated multi-database studies, greatly facilitated through distributed data network (DDN) architectures. Access to larger amounts of electronic health data within DDNs has created a growing interest in using data-adaptive machine learning (ML) techniques that can automatically model complex associations in high-dimensional data with minimal human guidance. However, the siloed storage and diverse nature of the databases in DDNs create unique challenges for using ML. In this paper, we discuss opportunities, challenges, and considerations for applying ML in DDNs for pharmacoepidemiologic and pharmacovigilance studies. We first discuss major types of activities performed by DDNs and how ML may be used. Next, we discuss practical data-related factors influencing how DDNs work in practice. We then combine these discussions and jointly consider how opportunities for ML are affected by practical data-related factors for DDNs, leading to several challenges. We present different approaches for addressing these challenges and highlight efforts that real-world DDNs have taken or are currently taking to help mitigate them. Despite these challenges, the time is ripe for the emerging interest to use ML in DDNs, and the utility of these data-adaptive modeling techniques in pharmacoepidemiologic and pharmacovigilance studies will likely continue to increase in the coming years.
33
Prodel M, Finkielsztejn L, Roustand L, Nachbaur G, De Leotoing L, Genreau M, Bonnet F, Ghosn J. Costs and mortality associated with HIV: a machine learning analysis of the French national health insurance database. J Public Health Res 2021; 11:2601. [PMID: 34850620 PMCID: PMC8958442 DOI: 10.4081/jphr.2021.2601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Accepted: 10/10/2021] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND The objective is to characterise the economic burden to the healthcare system of people living with HIV (PLWHIV) in France and to help decision makers in identifying risk factors associated with high-cost and high mortality profiles. DESIGN AND METHODS The study is a retrospective analysis of PLWHIV identified in the French National Health Insurance database (SNDS). All PLWHIV present in the database in 2013 were identified. All healthcare resource consumption from 2008 to 2015 inclusive was documented and costed (for 2013 to 2015) from the perspective of public health insurance. High-cost and high mortality patient profiles were identified by a machine learning algorithm. RESULTS In 2013, 96,423 PLWHIV were identified in the SNDS database, including 3,373 incident cases. Overall, 3,224 PLWHIV died during the three-year follow-up period (mean annual mortality rate: 1.1%). The mean annual per capita cost incurred by PLWHIV was € 14,223, corresponding to a total management cost of HIV of € 1,370 million in 2013. The largest contribution came from the cost of antiretroviral medication (M€ 870; 63%) followed by hospitalisation (M€ 154; 11%). The costs incurred in the year preceding death were considerably higher. Four specific patient profiles were identified for under/over-expressing these costs, suggesting ways to reduce them. CONCLUSIONS Even though current therapeutic regimens provide excellent virological control in most patients, PLWHIV have excess mortality. Other factors such as comorbidities, lifestyle factors and screening for cancer and cardiovascular disease, need to be targeted in order to lower the mortality and cost associated with HIV infection.
Affiliation(s)
- Fabrice Bonnet
  - CHU de Bordeaux, Service de Médecine Interne et Maladies Infectieuses, Hôpital Saint-André, Bordeaux; Université de Bordeaux, INSERM U1219, ISPED, Bordeaux.
- Jade Ghosn
  - Assistance Publique - Hôpitaux de Paris, APHP; Nord-Université de Paris, Hôpital Bichat-Claude-Bernard, Service des Maladies Infectieuses et Tropicales, Paris.
34
Lu Z, Sim JA, Wang JX, Forrest CB, Krull KR, Srivastava D, Hudson MM, Robison LL, Baker JN, Huang IC. Natural Language Processing and Machine Learning Methods to Characterize Unstructured Patient-Reported Outcomes: Validation Study. J Med Internet Res 2021; 23:e26777. [PMID: 34730546 PMCID: PMC8600437 DOI: 10.2196/26777] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/25/2020] [Revised: 03/20/2021] [Accepted: 08/12/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Assessing patient-reported outcomes (PROs) through interviews or conversations during clinical encounters provides insightful information about survivorship. OBJECTIVE This study aims to test the validity of natural language processing (NLP) and machine learning (ML) algorithms in identifying different attributes of pain interference and fatigue symptoms experienced by child and adolescent survivors of cancer versus the judgment by PRO content experts as the gold standard to validate NLP/ML algorithms. METHODS This cross-sectional study focused on child and adolescent survivors of cancer, aged 8 to 17 years, and caregivers, from whom 391 meaning units in the pain interference domain and 423 in the fatigue domain were generated for analyses. Data were collected from the After Completion of Therapy Clinic at St. Jude Children's Research Hospital. Experienced pain interference and fatigue symptoms were reported through in-depth interviews. After verbatim transcription, analyzable sentences (ie, meaning units) were semantically labeled by 2 content experts for each attribute (physical, cognitive, social, or unclassified). Two NLP/ML methods were used to extract and validate the semantic features: bidirectional encoder representations from transformers (BERT) and Word2vec plus one of the ML methods, the support vector machine or extreme gradient boosting. Receiver operating characteristic and precision-recall curves were used to evaluate the accuracy and validity of the NLP/ML methods. RESULTS Compared with Word2vec/support vector machine and Word2vec/extreme gradient boosting, BERT demonstrated higher accuracy in both symptom domains, with 0.931 (95% CI 0.905-0.957) and 0.916 (95% CI 0.887-0.941) for problems with cognitive and social attributes on pain interference, respectively, and 0.929 (95% CI 0.903-0.953) and 0.917 (95% CI 0.891-0.943) for problems with cognitive and social attributes on fatigue, respectively. 
In addition, BERT yielded superior areas under the receiver operating characteristic curve for cognitive attributes on pain interference and fatigue domains (0.923, 95% CI 0.879-0.997; 0.948, 95% CI 0.922-0.979) and superior areas under the precision-recall curve for cognitive attributes on pain interference and fatigue domains (0.818, 95% CI 0.735-0.917; 0.855, 95% CI 0.791-0.930). CONCLUSIONS The BERT method performed better than the other methods. As an alternative to using standard PRO surveys, collecting unstructured PROs via interviews or conversations during clinical encounters and applying NLP/ML methods can facilitate PRO assessment in child and adolescent cancer survivors.
Affiliation(s)
- Zhaohua Lu
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, United States
- Jin-Ah Sim
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
  - School of AI Convergence, Hallym University, Chuncheon, Republic of Korea
- Jade X Wang
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, United States
- Christopher B Forrest
  - Roberts Center for Pediatric Research, Children's Hospital of Philadelphia, Philadelphia, PA, United States
- Kevin R Krull
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Deokumar Srivastava
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, United States
- Melissa M Hudson
  - Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, United States
- Leslie L Robison
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Justin N Baker
  - Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, United States
- I-Chan Huang
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
35
Field M, Vinod S, Aherne N, Carolan M, Dekker A, Delaney G, Greenham S, Hau E, Lehmann J, Ludbrook J, Miller A, Rezo A, Selvaraj J, Sykes J, Holloway L, Thwaites D. Implementation of the Australian Computer-Assisted Theragnostics (AusCAT) network for radiation oncology data extraction, reporting and distributed learning. J Med Imaging Radiat Oncol 2021; 65:627-636. [PMID: 34331748 DOI: 10.1111/1754-9485.13287] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/09/2021] [Accepted: 06/29/2021] [Indexed: 12/28/2022]
Abstract
INTRODUCTION There is significant potential to analyse and model routinely collected data for radiotherapy patients to provide evidence to support clinical decisions, particularly where clinical trials evidence is limited or non-existent. However, in practice there are administrative, ethical, technical, logistical and legislative barriers to having coordinated data analysis platforms across radiation oncology centres. METHODS A distributed learning network of computer systems is presented, with software tools to extract and report on oncology data and to enable statistical model development. A distributed or federated learning approach keeps data in the local centre, but models are developed from the entire cohort. RESULTS The feasibility of this approach is demonstrated across six Australian oncology centres, using routinely collected lung cancer data from oncology information systems. The infrastructure was used to validate and develop machine learning for model-based clinical decision support and for one centre to assess patient eligibility criteria for two major lung cancer radiotherapy clinical trials (RTOG-9410, RTOG-0617). External validation of a 2-year overall survival model for non-small cell lung cancer (NSCLC) gave an AUC of 0.65 and C-index of 0.62 across the network. For one centre, 65% of Stage III NSCLC patients did not meet eligibility criteria for either of the two practice-changing clinical trials, and these patients had poorer survival than eligible patients (10.6 m vs. 15.8 m, P = 0.024). CONCLUSION Population-based studies on routine data are possible using a distributed learning approach. This has the potential for decision support models for patients for whom supporting clinical trial evidence is not applicable.
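The abstract's central idea is that patient-level data stay in each centre while a single model is developed from the whole cohort. The sketch below illustrates that principle with a deliberately simple case, a one-variable least-squares fit built from per-centre aggregate sums. It is not the AusCAT software or its algorithms, which the abstract does not describe; the function names `local_summary` and `fit_from_summaries` and all numbers are hypothetical.

```python
def local_summary(xs, ys):
    """Run inside one centre: reduce patient-level rows to aggregate sums."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def fit_from_summaries(summaries):
    """Run at the coordinator: least-squares slope and intercept from pooled sums."""
    n = sum(s["n"] for s in summaries)
    sx = sum(s["sx"] for s in summaries)
    sy = sum(s["sy"] for s in summaries)
    sxx = sum(s["sxx"] for s in summaries)
    sxy = sum(s["sxy"] for s in summaries)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("predictor has no variance across the cohort")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Invented toy data for three centres; only the summaries leave each centre.
centres = [
    ([1.0, 2.0, 3.0], [2.0, 4.1, 6.2]),
    ([4.0, 5.0], [7.9, 10.1]),
    ([6.0, 7.0, 8.0], [12.0, 13.8, 16.1]),
]
summaries = [local_summary(xs, ys) for xs, ys in centres]
slope, intercept = fit_from_summaries(summaries)
print(slope, intercept)
```

For this model the aggregated fit is identical to the fit on pooled rows, because least squares depends on the data only through these sums. Iterative models such as logistic regression typically exchange gradients or parameter updates over several rounds instead.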
Affiliation(s)
- Matthew Field
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia
- Shalini Vinod
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; Liverpool and Macarthur Cancer Therapy Centres, Liverpool, New South Wales, Australia
- Noel Aherne
  - Mid North Coast Cancer Institute, Coffs Harbour, New South Wales, Australia; Rural Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Martin Carolan
  - Illawarra Cancer Care Centre, Wollongong, New South Wales, Australia
- Andre Dekker
  - Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, The Netherlands
- Geoff Delaney
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; Liverpool and Macarthur Cancer Therapy Centres, Liverpool, New South Wales, Australia
- Stuart Greenham
  - Mid North Coast Cancer Institute, Coffs Harbour, New South Wales, Australia
- Eric Hau
  - Sydney West Radiation Oncology Network, Sydney, Australia; Westmead Clinical School, University of Sydney, Sydney, New South Wales, Australia
- Joerg Lehmann
  - School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, New South Wales, Australia; Department of Radiation Oncology, Calvary Mater, Newcastle, New South Wales, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
- Joanna Ludbrook
  - Department of Radiation Oncology, Calvary Mater, Newcastle, New South Wales, Australia
- Andrew Miller
  - Illawarra Cancer Care Centre, Wollongong, New South Wales, Australia
- Angela Rezo
  - Canberra Health Services, Canberra, Australian Capital Territory, Australia
- Jothybasu Selvaraj
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Canberra Health Services, Canberra, Australian Capital Territory, Australia
- Jonathan Sykes
  - Sydney West Radiation Oncology Network, Sydney, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
- Lois Holloway
  - South Western Sydney Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia; Liverpool and Macarthur Cancer Therapy Centres, Liverpool, New South Wales, Australia; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
- David Thwaites
  - Institute of Medical Physics, School of Physics, University of Sydney, Sydney, New South Wales, Australia
36
Watson NF, Fernandez CR. Artificial intelligence and sleep: Advancing sleep medicine. Sleep Med Rev 2021; 59:101512. [PMID: 34166990 DOI: 10.1016/j.smrv.2021.101512] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/13/2020] [Revised: 05/20/2021] [Accepted: 05/21/2021] [Indexed: 02/07/2023]
Abstract
Artificial intelligence (AI) allows analysis of "big data" combining clinical, environmental and laboratory based objective measures to allow a deeper understanding of sleep and sleep disorders. This development has the potential to transform sleep medicine in coming years to the betterment of patient care and our collective understanding of human sleep. This review addresses the current state of the field starting with a broad definition of the various components and analytic methods deployed in AI. We review examples of AI use in screening, endotyping, diagnosing, and treating sleep disorders and place this in the context of precision/personalized sleep medicine. We explore the opportunities for AI to both facilitate and extend providers' clinical impact and present ethical considerations regarding AI derived prognostic information. We cover early adopting specialties of AI in the clinical realm, such as radiology and pathology, to provide a road map for the challenges sleep medicine is likely to face when deploying this technology. Finally, we discuss pitfalls to ensure clinical AI implementation proceeds in the safest and most effective manner possible.
Affiliation(s)
- Nathaniel F Watson
  - Department of Neurology, University of Washington (UW) School of Medicine, USA; UW Medicine Sleep Center, USA.
37
Yang L, Gabriel N, Hernandez I, Winterstein AG, Guo J. Using machine learning to identify diabetes patients with canagliflozin prescriptions at high-risk of lower extremity amputation using real-world data. Pharmacoepidemiol Drug Saf 2021; 30:644-651. [PMID: 33606340 DOI: 10.1002/pds.5206] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/17/2020] [Accepted: 02/16/2021] [Indexed: 02/06/2023]
Abstract
AIMS Canagliflozin, a sodium-glucose cotransporter 2 inhibitor indicated for lowering glucose, has been increasingly used in diabetes patients because of its beneficial effects on cardiovascular and renal outcomes. However, clinical trials have documented an increased risk of lower extremity amputations (LEA) associated with canagliflozin. We applied machine learning methods to predict LEA among diabetes patients treated with canagliflozin. METHODS Using claims data from a 5% random sample of Medicare beneficiaries, we identified 13 904 individuals with diabetes initiating canagliflozin between April 2013 and December 2016. The samples were randomly and equally split into training and testing sets. We identified 41 predictor candidates using information from the year prior to canagliflozin initiation, and applied four machine learning approaches (elastic net, least absolute shrinkage and selection operator [LASSO], gradient boosting machine and random forests) to predict LEA risk after canagliflozin initiation. RESULTS The incidence rate of LEA was 0.57% over a median follow-up of 1.5 years. LASSO produced the best prediction, yielding a C-statistic of 0.81 (95% CI: 0.76, 0.86). Among individuals categorized in the top 5% of the risk score, the actual incidence rate of LEA was 3.74%. Among the 16 factors selected by LASSO, history of LEA [adjusted odds ratio (aOR): 33.6 (13.8, 81.9)] and loop diuretic use [aOR: 3.6 (1.8, 7.3)] had the strongest associations with LEA incidence. CONCLUSIONS Our machine learning model efficiently predicted the risk of LEA among diabetes patients undergoing canagliflozin treatment. The risk score may support optimized treatment decisions and thus improve health outcomes of diabetes patients.
Affiliation(s)
- Lanting Yang
  - Department of Pharmacy and Therapeutics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Nico Gabriel
  - Department of Pharmacy and Therapeutics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Inmaculada Hernandez
  - Department of Pharmacy and Therapeutics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Center for Pharmaceutical Policy and Prescribing, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Almut G Winterstein
  - Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, Florida, USA
- Jingchuan Guo
  - Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, Florida, USA
38
Geva A, Liu M, Panickan VA, Avillach P, Cai T, Mandl KD. A high-throughput phenotyping algorithm is portable from adult to pediatric populations. J Am Med Inform Assoc 2021; 28:1265-1269. [PMID: 33594412 DOI: 10.1093/jamia/ocaa343] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/12/2020] [Revised: 11/27/2020] [Accepted: 12/28/2020] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE Multimodal automated phenotyping (MAP) is a scalable, high-throughput phenotyping method, developed using electronic health record (EHR) data from an adult population. We tested transportability of MAP to a pediatric population. MATERIALS AND METHODS Without additional feature engineering or supervised training, we applied MAP to a pediatric population enrolled in a biobank and evaluated performance against physician-reviewed medical records. We also compared performance of MAP at the pediatric institution and the original adult institution where MAP was developed, including for 6 phenotypes validated at both institutions against physician-reviewed medical records. RESULTS MAP performed equally well in the pediatric setting (average AUC 0.98) as it did at the general adult hospital system (average AUC 0.96). MAP's performance in the pediatric sample was similar across the 6 specific phenotypes also validated against gold-standard labels in the adult biobank. CONCLUSIONS MAP is highly transportable across diverse populations and has potential for wide-scale use.
Affiliation(s)
- Alon Geva
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Anaesthesia, Harvard Medical School, Boston, Massachusetts, USA
- Molei Liu
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Vidul A Panickan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Paul Avillach
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Tianxi Cai
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Kenneth D Mandl
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
39
Kwon O, Na W, Kang H, Jun TJ, Kweon J, Park GM, Cho Y, Hur C, Chae J, Kang DY, Lee PH, Ahn JM, Park DW, Kang SJ, Lee SW, Lee CW, Park SW, Park SJ, Yang DH, Kim YH. Electronic Medical Record-Based Machine Learning Approach to Predict the Risk of 30-Day Adverse Cardiac Events after Invasive Coronary Treatment (Preprint). JMIR Med Inform 2020; 10:e26801. [PMID: 35544292 PMCID: PMC9133980 DOI: 10.2196/26801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2020] [Revised: 06/10/2021] [Accepted: 01/31/2022] [Indexed: 11/13/2022] Open
Affiliation(s)
- Osung Kwon
  - Division of Cardiology, Department of Internal Medicine, Eunpyeong St Mary's Hospital, Catholic University of Korea, Seoul, Republic of Korea
- Wonjun Na
  - Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Heejun Kang
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Tae Joon Jun
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jihoon Kweon
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Gyung-Min Park
  - Division of Cardiology, Department of Internal Medicine, Ulsan University Hospital, University of Ulsan College of Medicine, Ulsan, Republic of Korea
- YongHyun Cho
  - Artificial Intelligence Lab, Linewalks, Inc, Seoul, Republic of Korea
- Cinyoung Hur
  - Artificial Intelligence Lab, Linewalks, Inc, Seoul, Republic of Korea
- Jungwoo Chae
  - Artificial Intelligence Lab, Linewalks, Inc, Seoul, Republic of Korea
- Do-Yoon Kang
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Pil Hyung Lee
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jung-Min Ahn
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Duk-Woo Park
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Soo-Jin Kang
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Seung-Whan Lee
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Cheol Whan Lee
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Seong-Wook Park
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Seung-Jung Park
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Dong Hyun Yang
- Department of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Young-Hak Kim
- Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| |
|
40
|
Zeldow B, Flory J, Stephens-Shields A, Raebel M, Roy JA. Functional clustering methods for longitudinal data with application to electronic health records. Stat Methods Med Res 2020; 30:655-670. [PMID: 33176615 DOI: 10.1177/0962280220965630] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
We develop a method to estimate subject-level trajectory functions from longitudinal data. The approach can be used for patient phenotyping, feature extraction, or, as in our motivating example, outcome identification, which refers to the process of identifying disease status through patient laboratory tests rather than through diagnosis codes or prescription information. We model the joint distribution of a continuous longitudinal outcome and baseline covariates using an enriched Dirichlet process prior. This joint model decomposes into (local) semiparametric linear mixed models for the outcome given the covariates and simple (local) marginals for the covariates. The nonparametric enriched Dirichlet process prior is placed on the regression and spline coefficients, the error variance, and the parameters governing the predictor space. This leads to clustering of patients based on their outcomes and covariates. We predict the outcome at unobserved time points for subjects with data at other time points as well as for new subjects with only baseline covariates. We find improved prediction over mixed models with Dirichlet process priors when there are a large number of covariates. Our method is demonstrated with electronic health records consisting of initiators of second-generation antipsychotic medications, which are known to increase the risk of diabetes. We use our model to predict laboratory values indicative of diabetes for each individual and assess incidence of suspected diabetes from the predicted dataset.
Affiliation(s)
- Bret Zeldow
- Department of Mathematics and Statistics, Colby College, Waterville, ME, USA
| | - James Flory
- Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Alisa Stephens-Shields
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Marsha Raebel
- Institute for Health Research, Kaiser Permanente Colorado, Aurora, CO, USA
| | - Jason A Roy
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, New Brunswick, NJ, USA
| |
|
41
|
Bate A, Hobbiger SF. Artificial Intelligence, Real-World Automation and the Safety of Medicines. Drug Saf 2020; 44:125-132. [PMID: 33026641 DOI: 10.1007/s40264-020-01001-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Accepted: 09/08/2020] [Indexed: 12/16/2022]
Abstract
Despite huge technological advances in the capabilities to capture, store, link and analyse data electronically, there has been some but limited impact on routine pharmacovigilance. We discuss emerging research in the use of artificial intelligence, machine learning and automation across the pharmacovigilance lifecycle including pre-licensure. Reasons are provided on why adoption is challenging and we also provide a perspective on changes needed to accelerate adoption, and thereby improve patient safety. Last, we make clear that while technologies could be superimposed on existing pharmacovigilance processes for incremental improvements, these great societal advances in data and technology also provide us with a timely opportunity to reconsider everything we do in pharmacovigilance operations to maximise the benefit of these advances.
Affiliation(s)
- Andrew Bate
- Clinical Safety and Pharmacovigilance, GSK, 980 Great West Road, Brentford, Middlesex, TW8 9GS, UK
- Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK
| | - Steve F Hobbiger
- Clinical Safety and Pharmacovigilance, GSK, 980 Great West Road, Brentford, Middlesex, TW8 9GS, UK
| |
|
42
|
Bertoncelli CM, Solla F. Machine learning for monitoring and evaluating physical activity in cerebral palsy. Dev Med Child Neurol 2020; 62:1010. [PMID: 32543715 DOI: 10.1111/dmcn.14596] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/07/2020] [Accepted: 05/11/2020] [Indexed: 11/29/2022]
Affiliation(s)
- Carlo M Bertoncelli
- Department of Physical Therapy & Neuroscience, Florida International University, Miami, FL, USA; Department of Orthopedic Surgery, Lenval University Children Hospital, Nice, France
| | - Federico Solla
- Department of Orthopedic Surgery, Lenval University Children Hospital, Nice, France
| |
|
43
|
Alami H, Lehoux P, Auclair Y, de Guise M, Gagnon MP, Shaw J, Roy D, Fleet R, Ag Ahmed MA, Fortin JP. Artificial Intelligence and Health Technology Assessment: Anticipating a New Level of Complexity. J Med Internet Res 2020; 22:e17707. [PMID: 32406850 PMCID: PMC7380986 DOI: 10.2196/17707] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 01/11/2020] [Revised: 04/25/2020] [Accepted: 05/13/2020] [Indexed: 12/12/2022] Open
Abstract
Artificial intelligence (AI) is seen as a strategic lever to improve access, quality, and efficiency of care and services and to build learning and value-based health systems. Many studies have examined the technical performance of AI within an experimental context. These studies provide limited insights into the issues that its use in a real-world context of care and services raises. To help decision makers address these issues in a systemic and holistic manner, this viewpoint paper relies on the health technology assessment core model to contrast the expectations of the health sector toward the use of AI with the risks that should be mitigated for its responsible deployment. The analysis adopts the perspective of payers (ie, health system organizations and agencies) because of their central role in regulating, financing, and reimbursing novel technologies. This paper suggests that AI-based systems should be seen as a health system transformation lever, rather than a discrete set of technological devices. Their use could bring significant changes and impacts at several levels: technological, clinical, human and cognitive (patient and clinician), professional and organizational, economic, legal, and ethical. The assessment of AI's value proposition should thus go beyond technical performance and cost logic by performing a holistic analysis of its value in a real-world context of care and services. To guide AI development, generate knowledge, and draw lessons that can be translated into action, the right political, regulatory, organizational, clinical, and technological conditions for innovation should be created as a first step.
Affiliation(s)
- Hassane Alami
- Public Health Research Center, Université de Montréal, Montreal, QC, Canada
- Department of Health Management, Evaluation and Policy, École de santé publique de l'Université de Montréal, Montreal, QC, Canada
- Institut national d'excellence en santé et services sociaux, Montréal, QC, Canada
| | - Pascale Lehoux
- Public Health Research Center, Université de Montréal, Montreal, QC, Canada
- Department of Health Management, Evaluation and Policy, École de santé publique de l'Université de Montréal, Montreal, QC, Canada
| | - Yannick Auclair
- Institut national d'excellence en santé et services sociaux, Montréal, QC, Canada
| | - Michèle de Guise
- Institut national d'excellence en santé et services sociaux, Montréal, QC, Canada
| | - Marie-Pierre Gagnon
- Research Center on Healthcare and Services in Primary Care, Université Laval, Quebec, QC, Canada
- Faculty of Nursing Science, Université Laval, Quebec, QC, Canada
| | - James Shaw
- Joint Centre for Bioethics, University of Toronto, Toronto, ON, Canada
- Institute for Health System Solutions and Virtual Care, Women's College Hospital, Toronto, ON, Canada
| | - Denis Roy
- Institut national d'excellence en santé et services sociaux, Montréal, QC, Canada
| | - Richard Fleet
- Research Center on Healthcare and Services in Primary Care, Université Laval, Quebec, QC, Canada
- Department of Family Medicine and Emergency Medicine, Faculty of Medicine, Université Laval, Quebec, QC, Canada
- Research Chair in Emergency Medicine, Université Laval - CHAU Hôtel-Dieu de Lévis, Lévis, QC, Canada
| | - Mohamed Ali Ag Ahmed
- Research Chair on Chronic Diseases in Primary Care, Université de Sherbrooke, Chicoutimi, QC, Canada
| | - Jean-Paul Fortin
- Research Center on Healthcare and Services in Primary Care, Université Laval, Quebec, QC, Canada
- Department of Social and Preventive Medicine, Faculty of Medicine, Université Laval, Quebec, QC, Canada
| |
|
44
|
Straub L, Gagne JJ, Maro JC, Nguyen MD, Beaulieu N, Brown JS, Kennedy A, Johnson M, Wright A, Zhou L, Wang SV. Evaluation of Use of Technologies to Facilitate Medical Chart Review. Drug Saf 2020; 42:1071-1080. [PMID: 31111340 DOI: 10.1007/s40264-019-00838-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/17/2022]
Abstract
INTRODUCTION: While medical chart review remains the gold standard to validate health conditions or events identified in administrative claims and electronic health record databases, it is time consuming, expensive and can involve subjective decisions. AIM: The aim of this study was to describe the landscape of technology-enhanced approaches that could be used to facilitate medical chart review within and across distributed data networks. METHOD: We conducted a semi-structured survey regarding processes for medical chart review with organizations that either routinely do medical chart review or use technologies that could facilitate chart review. RESULTS: Fifteen out of 17 interviewed organizations used optical character recognition (OCR) or natural language processing (NLP) in their chart review process. None used handwriting recognition software. While these organizations found OCR and NLP to be useful for expediting extraction of useful information from medical charts, they also mentioned several challenges. Quality of medical scans can be variable, interfering with the accuracy of OCR. Additionally, linguistic complexity in medical notes and heterogeneity in reporting templates used by different healthcare systems can reduce the transportability of NLP-based algorithms to diverse healthcare settings. CONCLUSION: New technologies including OCR and NLP are currently in use by various organizations involved in medical chart review. While technology-enhanced approaches could scale up capacity to validate key variables and make information about important clinical variables from medical records more generally available for research purposes, they often require considerable customization when employed in a distributed data environment with multiple, diverse healthcare settings.
Affiliation(s)
- Loreen Straub
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 1620 Tremont Street, Suite 3030, Boston, MA, 02120, USA
| | - Joshua J Gagne
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 1620 Tremont Street, Suite 3030, Boston, MA, 02120, USA
| | - Judith C Maro
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA
| | - Michael D Nguyen
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
| | - Nicolas Beaulieu
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA
| | - Jeffrey S Brown
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA
| | - Adee Kennedy
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA
| | - Margaret Johnson
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA
| | - Adam Wright
- Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Li Zhou
- Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Shirley V Wang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 1620 Tremont Street, Suite 3030, Boston, MA, 02120, USA
| |
|
45
|
Pépin J, Bailly S, Tamisier R. Big Data in sleep apnoea: Opportunities and challenges. Respirology 2019; 25:486-494. [DOI: 10.1111/resp.13669] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 05/13/2019] [Revised: 06/13/2019] [Accepted: 07/23/2019] [Indexed: 12/13/2022]
Affiliation(s)
- Jean‐Louis Pépin
- HP2 Laboratory, INSERM U1042, University Grenoble Alpes, Grenoble, France
- EFCR Laboratory, CHU de Grenoble Alpes, Grenoble, France
| | - Sébastien Bailly
- HP2 Laboratory, INSERM U1042, University Grenoble Alpes, Grenoble, France
- EFCR Laboratory, CHU de Grenoble Alpes, Grenoble, France
| | - Renaud Tamisier
- HP2 Laboratory, INSERM U1042, University Grenoble Alpes, Grenoble, France
- EFCR Laboratory, CHU de Grenoble Alpes, Grenoble, France
| |
|
46
|
Grabar N, Grouin C. A Year of Papers Using Biomedical Texts: Findings from the Section on Natural Language Processing of the IMIA Yearbook. Yearb Med Inform 2019; 28:218-222. [PMID: 31419835 PMCID: PMC6697498 DOI: 10.1055/s-0039-1677937] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/09/2023] Open
Abstract
OBJECTIVES: To analyze the content of publications within the medical Natural Language Processing (NLP) domain in 2018. METHODS: Automatic and manual pre-selection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS: Two best papers have been selected this year: one dedicated to the generation of multi-document summaries and another dedicated to the generation of imaging reports. We also proposed an analysis of the content of the main research trends of NLP publications in 2018. CONCLUSIONS: The year 2018 is very rich with regard to the NLP issues and topics addressed. It shows the will of researchers to move towards robust and reproducible results. Researchers also prove to be creative in proposing original issues and approaches.
Affiliation(s)
- Natalia Grabar
- LIMSI, CNRS, Université Paris-Saclay, Orsay, France
- STL, CNRS, Université de Lille, Villeneuve-d'Ascq, France
| | - Cyril Grouin
- LIMSI, CNRS, Université Paris-Saclay, Orsay, France
| | | |
|